Re: [PATCH v6 2/7] perf/x86/intel: Record branch type

2017-05-07 Thread Jin, Yao



On 4/24/2017 8:47 AM, Jin, Yao wrote:



On 4/23/2017 9:55 PM, Jiri Olsa wrote:

On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:

SNIP


+#define X86_BR_TYPE_MAP_MAX	16
+
+static int
+common_branch_type(int type)
+{
+	int i, mask;
+	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
+		PERF_BR_CALL,		/* X86_BR_CALL */
+		PERF_BR_RET,		/* X86_BR_RET */
+		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
+		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
+		PERF_BR_INT,		/* X86_BR_INT */
+		PERF_BR_IRET,		/* X86_BR_IRET */
+		PERF_BR_JCC,		/* X86_BR_JCC */
+		PERF_BR_JMP,		/* X86_BR_JMP */
+		PERF_BR_IRQ,		/* X86_BR_IRQ */
+		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
+		PERF_BR_NONE,		/* X86_BR_ABORT */
+		PERF_BR_NONE,		/* X86_BR_IN_TX */
+		PERF_BR_NONE,		/* X86_BR_NO_TX */
+		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
+		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
+		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
+	};
+
+	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+	mask = ~(~0 << 1);

is that a fancy way to get 1 into the mask? what do I miss?
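For reference, ~(~0 << n) is the usual idiom for a mask of the n low bits, so with n = 1 it does reduce to just 1. A minimal standalone check (the helper name here is made up for illustration):

```c
/* low_bits_mask(n): the ~(~0 << n) idiom -- a mask of the n low bits.
 * Valid for 0 < n < 32 with a 32-bit unsigned int. */
static unsigned int low_bits_mask(unsigned int n)
{
	return ~(~0u << n);
}
```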


+
+	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
+		if (type & mask)
+			return branch_map[i];

I wonder some bit search would be faster in here, but maybe not big deal

jirka


I just think the branch_map[] doesn't contain many entries (16 here), so 
checking one bit at a time should be acceptable. I just want to keep the 
code simple.


But if the number of entries were larger (e.g. 64), it might be better 
to check 2 or 4 bits at a time.
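For the record, the bit-search variant Jiri mentions could look roughly like this (a standalone sketch assuming at most one type bit is set after the shift; the helper name and the -1 fallback are made up here, where the real code would return PERF_BR_NONE):

```c
/* Hypothetical bit-search variant of common_branch_type(): find the set
 * bit's index directly instead of testing one bit per loop iteration.
 * Assumes at most one type bit remains set after shifting out
 * X86_BR_USER/X86_BR_KERNEL. */
static int branch_type_by_ctz(unsigned int type, const int *branch_map,
			      int map_max)
{
	int i;

	type >>= 2;			/* skip X86_BR_USER and X86_BR_KERNEL */
	if (!type)
		return -1;		/* stand-in for PERF_BR_NONE */

	i = __builtin_ctz(type);	/* index of the lowest set bit */
	return i < map_max ? branch_map[i] : -1;
}
```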


Thanks
Jin Yao



Hi,

Is this explanation OK? The tools part already has an Acked-by from Jiri 
Olsa; I just want to know whether the kernel part is OK as well.


Thanks
Jin Yao



[PATCH] powerpc/mm: Use seq_putc() in two functions

2017-05-07 Thread SF Markus Elfring
From: Markus Elfring 
Date: Sun, 7 May 2017 16:32:04 +0200

Two single characters (line breaks) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.
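The semantic patch for this kind of conversion is typically only a few lines; something along these lines would flag the two call sites (a sketch, not necessarily the exact rule that was used):

```
@@
expression s;
@@
-	seq_puts(s, "\n");
+	seq_putc(s, '\n');
```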

Signed-off-by: Markus Elfring 
---
 arch/powerpc/mm/dump_hashpagetable.c   | 2 +-
 arch/powerpc/mm/dump_linuxpagetables.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/dump_hashpagetable.c b/arch/powerpc/mm/dump_hashpagetable.c
index c6b900f54c07..31d248c82f95 100644
--- a/arch/powerpc/mm/dump_hashpagetable.c
+++ b/arch/powerpc/mm/dump_hashpagetable.c
@@ -205,7 +205,7 @@ static void dump_hpte_info(struct pg_state *st, unsigned long ea, u64 v, u64 r,
aps_index = calculate_pagesize(st, aps, "actual");
if (aps_index != 2)
seq_printf(st->seq, "LP enc: %lx", lp);
-   seq_puts(st->seq, "\n");
+   seq_putc(st->seq, '\n');
 }
 
 
diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
index d659345a98d6..5c08de16339d 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -349,7 +349,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
  st->current_flags,
  pg_level[st->level].num);
 
-   seq_puts(st->seq, "\n");
+   seq_putc(st->seq, '\n');
}
 
/*
-- 
2.12.2



Re: [RFC 1/2] powerpc/mm: Add marker for contexts requiring global TLB invalidations

2017-05-07 Thread Frederic Barrat



On 04/05/2017 at 11:42, Michael Ellerman wrote:

Frederic Barrat  writes:


Introduce a new 'flags' attribute per context and define its first bit
to be a marker requiring all TLBIs for that context to be broadcasted
globally. Once that marker is set on a context, it cannot be removed.

Such a marker is useful for memory contexts used by devices behind the
NPU and CAPP/PSL. The NPU and the PSL keep their own
translation cache so they need to see all the TLBIs for those
contexts.

Signed-off-by: Frederic Barrat 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  9 +
 arch/powerpc/include/asm/tlb.h   | 10 --
 arch/powerpc/mm/mmu_context_book3s64.c   |  1 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 77529a3e3811..7b640ab1cbeb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -78,8 +78,12 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8

+/* Bits definition for the context flags */
+#define MM_CONTEXT_GLOBAL_TLBI 1   /* TLBI must be global */


I think I'd prefer MM_GLOBAL_TLBIE, it's shorter and tlbie is the name
of the instruction so is something people can search for.



OK



@@ -164,5 +168,10 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif

+static inline void mm_context_set_global_tlbi(mm_context_t *ctx)
+{
+	set_bit(MM_CONTEXT_GLOBAL_TLBI, &ctx->flags);
+}


set_bit() and test_bit() are atomic, but unordered vs other loads
and stores.

So the caller will need to be careful they have a barrier between this
and whatever it is they do that creates mappings that might need to be
invalidated.


Agreed, I missed the barrier. So I need to set the flag, then issue a write 
memory barrier. Then, in the case of cxl, we can attach to the accelerator.
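In code, the write side described here might look roughly like this (a sketch with hypothetical names; attach_context_to_device() stands in for the real cxl attach path):

```c
	/* Writer (cxl attach path): set the flag, then make sure it is
	 * visible before the device can observe and cache a translation. */
	set_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
	smp_wmb();
	rc = attach_context_to_device(ctx);	/* hypothetical name */
```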




Similarly on the read side we should have a barrier between the store
that makes the PTE invalid and the load of the flag.


That one is more subtle, at least to me, but I think I now see what you 
mean. With no extra read barrier, we would be exposed to the following 
ordering:


CPU1                  CPU2                  device
                      read flag => local
set global flag
write barrier
attach
                                            read PTE
                      update PTE
                      tlbiel                not seen, hence broken


Is that what you meant?
That would mean an extra read barrier for each tlbie.
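On the reader side, the fix being discussed would amount to something like the following sketch (hypothetical names; a full barrier is shown since a store has to be ordered against a later load):

```c
	/* Reader (TLB flush path): invalidate the PTE, then decide
	 * between local and global invalidation. */
	pte_update(mm, addr, ptep);	/* make the PTE invalid */
	smp_mb();			/* order the PTE store vs the flag load */
	if (test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags))
		_tlbie(addr);		/* global: seen by the device */
	else
		_tlbiel(addr);		/* local to this thread/core */
```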



Which makes me think cxl_ctx_in_use() is buggy :/, hmm. But it's late so
hopefully I'm wrong :D


Unfortunately, I think you're right. And we're missing the same 2 
barriers: a write barrier when cxl atomically increments the use count 
before attaching, and a read barrier like the one above.


  Fred




diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 609557569f65..bd18ed083011 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -71,8 +71,14 @@ static inline int mm_is_core_local(struct mm_struct *mm)

 static inline int mm_is_thread_local(struct mm_struct *mm)
 {
-   return cpumask_equal(mm_cpumask(mm),
- cpumask_of(smp_processor_id()));
+   int rc;
+
+   rc = cpumask_equal(mm_cpumask(mm),
+   cpumask_of(smp_processor_id()));
+#ifdef CONFIG_PPC_BOOK3S_64
+	rc = rc && !test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags);
+#endif


The ifdef's a bit ugly, but I guess it's not worth putting it in a
static inline.

I'd be interested to see the generated code for this, and for the
reverse, ie. putting the test_bit() first, and doing an early return if
it's true. That way once the bit is set we can just skip the cpumask
comparison.
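The reversed version suggested here would read roughly like this (untested sketch; once the sticky flag is set, the cpumask comparison is skipped entirely):

```c
static inline int mm_is_thread_local(struct mm_struct *mm)
{
#ifdef CONFIG_PPC_BOOK3S_64
	/* The flag never clears once set, so check it first. */
	if (test_bit(MM_CONTEXT_GLOBAL_TLBI, &mm->context.flags))
		return 0;
#endif
	return cpumask_equal(mm_cpumask(mm),
			     cpumask_of(smp_processor_id()));
}
```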

cheers





Re: [RFC 2/2] cxl: Mark context requiring global TLBIs

2017-05-07 Thread Frederic Barrat



On 04/05/2017 at 09:39, Balbir Singh wrote:

On Wed, 2017-05-03 at 16:29 +0200, Frederic Barrat wrote:

The PSL needs to see all TLBI pertinent to the memory contexts used on
the cxl adapter. For the hash memory model, it was done by making all
TLBIs global as soon as the cxl driver is in use. For radix, we need
something similar, but we can refine it and only convert to global the
invalidations for contexts actually used by the device.

So mark the contexts being attached to the cxl adapter as requiring
global TLBIs.




+#ifdef CONFIG_PPC_BOOK3S_64
+   if (ctx->mm)
+		mm_context_set_global_tlbi(&ctx->mm->context);


Just curious and wondering

Could we do mm_context_set_global_tlbi() before ->attach_process()? That
way we won't need atomic tests (set_bit() and test_bit()); maybe a memory
barrier would suffice. Not 100% sure, hence checking



You're right, I need to move mm_context_set_global_tlbi() before the 
attach and have a write memory barrier.
If the attach fails, then the context will still be marked for global 
TLBIs (some other driver may also set the bit for a different reason). 
But I would expect the life expectancy of a process designed to use an 
accelerator and failing to attach to be pretty short.


I still think we need the atomic set_bit() though, but that's really for 
the future, in case somebody introduces new bits in the context 'flags'.


  Fred



Balbir Singh.





Re: WARNING: CPU: 0 PID: 1 at /build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208 check_features+0x38/0x7c

2017-05-07 Thread Linux User #330250

On 2017-05-06 at 16:11, Linux User #330250 wrote:

On 2017-05-04 at 12:15, Michael Ellerman wrote:

Mathieu Malaterre  writes:


Hi all,

Does this dmesg output speak to anyone here (smp kernel):


[4.767389] [ cut here ]
[4.774668] WARNING: CPU: 0 PID: 1 at /build/linux-dp17Ba/linux-4.9.18/arch/powerpc/lib/feature-fixups.c:208 check_features+0x38/0x7c

Is there anything prior to the "cut here" line?

cheers



# uname -a
Linux G4QS 4.9.0-2-powerpc-smp #1 SMP Debian 4.9.18-1 (2017-03-30) ppc GNU/Linux


Follow-up: I updated to 4.9.0-3-powerpc-smp Debian kernel, which is 
4.9.25-1 (2017-05-02) and the fault is still the same.
Overnight I also compiled a vanilla 4.11.0 kernel using the Debian 
config as a starting point and "make olddefconfig" for a couple of new 
config options, booted it today and it gives the same messages.


This is 4.9.25-1 (Debian 4.9.0-3) smp kernel: https://pastebin.com/vq7XQ39U
NET: Registered protocol family 17
mpls_gso: MPLS GSO support
CPU features changed after feature patching!
[ cut here ]
WARNING: CPU: 0 PID: 1 at /build/linux-pGBmiG/linux-4.9.25/arch/powerpc/lib/feature-fixups.c:208 check_features+0x38/0x7c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-3-powerpc-smp #1 Debian 4.9.25-1
task: df4aa9a0 task.stack: df4e8000
NIP: c07e3a00 LR: c07e3a00 CTR: 
REGS: df4e9dd0 TRAP: 0700   Not tainted  (4.9.0-3-powerpc-smp Debian 4.9.25-1)
MSR: 00029032   CR: 28000422  XER: 
GPR00: c07e3a00 df4e9e80 df4aa9a0 002c 9032 010d
GPR08: 0001 c0861064 c0861064  22000444  c00048b8
GPR16:         c08d
GPR24: c07dbcec c08449b4 c07d3ab4 c0823434  c08d6610 c08d c082

NIP [c07e3a00] check_features+0x38/0x7c
LR [c07e3a00] check_features+0x38/0x7c
Call Trace:
[df4e9e80] [c07e3a00] check_features+0x38/0x7c (unreliable)
[df4e9e90] [c0004214] do_one_initcall+0x4c/0x188
[df4e9f00] [c07dc5c8] kernel_init_freeable+0x198/0x230
[df4e9f30] [c00048dc] kernel_init+0x24/0x128
[df4e9f40] [c0018080] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at   (null)
LR =   (null)
Instruction dump:
bfc10008 90010014 3fc0c08d 811e6028 3fe0c082 80ff45c4 8108000c 7f883800
41be0014 3c60c072 3863cb38 4be574f5 <0fe0> 815e6028 3bff45c4 813f0004
---[ end trace db276e67fcbb7c47 ]---
registered taskstats version 1
zswap: loaded using pool lzo/zbud
ima: No TPM chip found, activating TPM-bypass!


This is my vanilla 4.11.0 kernel (based on Debian kernel config): 
https://pastebin.com/n5nPMS9g

NET: Registered protocol family 17
mpls_gso: MPLS GSO support
CPU features changed after feature patching!
[ cut here ]
WARNING: CPU: 0 PID: 1 at arch/powerpc/lib/feature-fixups.c:209 check_features+0x38/0x7c
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0 #1
task: df4da020 task.stack: df4dc000
NIP: c0822768 LR: c0822768 CTR: 
REGS: df4ddde0 TRAP: 0700   Not tainted  (4.11.0)
MSR: 00029032   CR: 28301422  XER: 
GPR00: c0822768 df4dde90 df4da020 002c 1032 010e  1f2b3000
GPR08: 1032 c08a2208 c08a2208  28301422  c0004880
GPR16:         c092
GPR24: c081a9c8 c0883c2c c08129a0 c0863494  c091a670 c092 c086

NIP [c0822768] check_features+0x38/0x7c
LR [c0822768] check_features+0x38/0x7c
Call Trace:
[df4dde90] [c0822768] check_features+0x38/0x7c (unreliable)
[df4ddea0] [c00041ec] do_one_initcall+0x4c/0x178
[df4ddf00] [c081b2b4] kernel_init_freeable+0x1a0/0x238
[df4ddf30] [c00048a4] kernel_init+0x24/0x120
[df4ddf40] [c0018900] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
bfc10008 90010014 3fc0c092 811ea028 3fe0c086 80ff4624 8108000c 7f883800
41be0014 3c60c076 3863d154 4be44e45 <0fe0> 815ea028 3bff4624 813f0004
---[ end trace c67815660f5f31c3 ]---
registered taskstats version 1
zswap: loaded using pool lzo/zbud
ima: No TPM chip found, activating TPM-bypass! (rc=-19)