Re: pkeys on POWER: Access rights not reset on execve

2018-05-18 Thread Florian Weimer

On 05/19/2018 03:50 AM, Andy Lutomirski wrote:

Now another thread calls pkey_alloc(), so UAMR is asynchronously changed,
and the thread will write zero to the relevant AMR bits.  If I understand
correctly, this means that the decision to mask off unallocated keys via
UAMR effectively forces the initial value of newly-allocated keys in other
threads in the allocating process to be zero, whatever zero means.  (I
didn't get far enough in the POWER docs to figure out what zero means.)  So


(Note that this belongs on the other thread; here I originally wanted to 
talk about the lack of reset of AMR to the default value on execve.)


I don't think UAMOR is updated asynchronously.  On pkey_alloc, it is 
only changed for the current thread, and future threads launched from 
it.  Existing threads are unaffected.  This still results in a 
programming model which is substantially different from x86.



I don't think you're doing anyone any favors by making UAMR dynamic.


This is still true, I think.

Thanks,
Florian


Re: [PATCH 1/2] m68k: Set default dma mask for platform devices

2018-05-18 Thread Finn Thain
On Fri, 18 May 2018, Christoph Hellwig wrote:

> > This implementation of arch_setup_pdev_archdata() differs from the 
> > powerpc one, in that this one avoids clobbering a device dma mask 
> > which has already been initialized.
> 
> I think your implementation should move into the generic implementation 
> in drivers/base/platform.c instead of being stuck in m68k.
> 
> Also powerpc probably wants fixing, but that's something left to the
> ppc folks..

On powerpc, all platform devices get a dma mask. But they don't do that in 
drivers/base/platform.c; they use arch_setup_pdev_archdata(). Why didn't 
they take the approach you suggest?

How would I support the claim that replacing an empty platform device dma 
mask with 0x is safe on all architectures and platforms?
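For concreteness, the kind of generic fallback under discussion would look
roughly like the sketch below. This is only an illustration of the "fill the
mask only if it is still empty" idea; the 32-bit default and the exact guard
are assumptions, not the actual m68k or powerpc code.

/*
 * Sketch only: give a platform device a default DMA mask, but never
 * clobber a mask that a board file or bus code has already set up.
 */
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>

void arch_setup_pdev_archdata(struct platform_device *pdev)
{
	/* Already initialized elsewhere?  Then leave it alone. */
	if (pdev->dev.dma_mask || pdev->dev.coherent_dma_mask)
		return;

	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
	pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
}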

Is there no code conditional upon dev.coherent_dma_mask or dev.dma_mask 
that could misbehave? (Didn't I cite an example in the other thread?*)

If you can convince me that it is safe, I'd be happy to submit the patch 
you asked for.

For now, I still think that patching the platform driver was the correct 
patch*.

Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping: warn 
when there is no coherent_dma_mask"), because it assumes that all dma_ops 
implementations care about coherent_dma_mask.

The dma_map_ops implementations that do use coherent_dma_mask should 
simply fail when it is unset, right?

Would it not be better to revert your patch and fix the dma_map_ops 
failure paths, than to somehow audit all the platform drivers and patch 
drivers/base/platform.c?

Thanks.

* https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet

-- 


Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Florian Weimer

On 05/19/2018 02:52 AM, Ram Pai wrote:


The POWER semantics make it very hard for a multithreaded program to
meaningfully use protection keys to prevent accidental access to important
memory.


And you can change access rights for unallocated keys (unallocated
at thread start time, allocated later) on x86.  I have extended the
misc/tst-pkeys test to verify that, and it passes on x86, but not on
POWER, where the access rights are stuck.


This is something I do not understand. How can a thread change permissions
on a key that is not even allocated in the first place?


It was allocated by another thread, and there is synchronization so that 
the allocation happens before the change in access rights.



Do you consider a key
allocated in some other thread's context as allocated in this thread's
context?


Yes, x86 does that.


If not, does that mean -- On x86, you can activate a key just
by changing its permission?


This is also true on x86, but just an artifact of the implementation.  You 
are supposed to call pkey_alloc before changing the flag.


Thanks,
Florian


Re: pkeys on POWER: Access rights not reset on execve

2018-05-18 Thread Florian Weimer

On 05/19/2018 03:19 AM, Ram Pai wrote:


New AMR value (PID 112291, before execl): 0x0c00
AMR (PID 112291): 0x0c00


The issue is here.  The second line is after the execl (printed from the 
start of main), and the AMR value is not reset to zero.



Allocated key (PID 112291): 2

I think this is a real bug and needs to be fixed even if the
defaults are kept as-is (see the other thread).


The issue you may be talking about here is that  --

"when you set the AMR register to 0x, it
just sets it to 0x0c00."

To me it looks like exec/fork are not related to the issue.
Or are they also somehow connected to the issue?


Yes, this is the other issue.  It is not really important here.

Thanks,
Florian


[PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU

2018-05-18 Thread Nicholas Piggin
Similar to previous patches, hard disable interrupts when a CPU is
in panic. This reduces the chance the watchdog has to interfere with
the panic, and avoids any other type of masked interrupt being
executed when crashing, which minimises the length of the crash path.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup-common.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 0af5c11b9e78..d45fb522fe8a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -706,12 +706,19 @@ EXPORT_SYMBOL(check_legacy_ioport);
 static int ppc_panic_event(struct notifier_block *this,
  unsigned long event, void *ptr)
 {
+   /*
+* panic does a local_irq_disable, but we really
+* want interrupts to be hard disabled.
+*/
+   hard_irq_disable();
+
/*
 * If firmware-assisted dump has been registered then trigger
 * firmware-assisted dump and let firmware handle everything else.
 */
crash_fadump(NULL, ptr);
-   ppc_md.panic(ptr);  /* May not return */
+   if (ppc_md.panic)
+   ppc_md.panic(ptr);  /* May not return */
return NOTIFY_DONE;
 }
 
@@ -722,7 +729,8 @@ static struct notifier_block ppc_panic_block = {
 
 void __init setup_panic(void)
 {
-   if (!ppc_md.panic)
+   /* PPC64 always does a hard irq disable in its panic handler */
+   if (!IS_ENABLED(CONFIG_PPC64) && !ppc_md.panic)
return;
atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block);
 }
-- 
2.17.0



[PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs

2018-05-18 Thread Nicholas Piggin
Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G  D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c017c21c] set_task_cpu+0x22c/0x240
LR [c017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c1017700] runqueues+0x0/0xb00 (unreliable)
[c017d580] try_to_wake_up+0x230/0x720
[c015a214] insert_work+0x104/0x140
[c015adb0] __queue_work+0x230/0x690
[c03fc5007910] [c015b26c] queue_work_on+0x5c/0x90
[c008135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c00813608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c0081362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 
[mlx5_core]
[c00813628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c00813625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c015bcec] process_one_work+0x1bc/0x5f0
[c015ecac] worker_thread+0xac/0x6b0
[c0168338] kthread+0x168/0x1b0
[c000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because, firstly, the CPU is not really offline in the
usual sense: processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so
one CPU can have marked itself offline, while another CPU is still
running processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline. It's more like
frozen in time, so offline does not really reflect its state properly
anyway. There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/smp.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9ca7148b5881..6d6cf14009cf 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -579,9 +579,6 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
nmi_ipi_busy_count--;
nmi_ipi_unlock();
 
-   /* Remove this CPU */
-   set_cpu_online(smp_processor_id(), false);
-
spin_begin();
while (1)
spin_cpu_relax();
@@ -596,9 +593,6 @@ void smp_send_stop(void)
 
 static void stop_this_cpu(void *dummy)
 {
-   /* Remove this CPU */
-   set_cpu_online(smp_processor_id(), false);
-
hard_irq_disable();
spin_begin();
while (1)
-- 
2.17.0



[PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop

2018-05-18 Thread Nicholas Piggin
Similarly to commit 855bfe0de1 ("powerpc: hard disable irqs in
smp_send_stop loop"), irqs should be hard disabled by
panic_smp_self_stop.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b78f142a4148..d122321ad542 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -380,6 +380,14 @@ void early_setup_secondary(void)
 
 #endif /* CONFIG_SMP */
 
+void panic_smp_self_stop(void)
+{
+   hard_irq_disable();
+   spin_begin();
+   while (1)
+   spin_cpu_relax();
+}
+
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-- 
2.17.0



[PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths

2018-05-18 Thread Nicholas Piggin
Patches 1 and 3 are more of the same, just upgrading local_irq_disable
to hard_irq_disable in places.

Patch 2 fixes a warning people have been hitting in crash testing on
busy systems.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc/64: hard disable irqs in panic_smp_self_stop
  powerpc: smp_send_stop do not offline stopped CPUs
  powerpc/64: hard disable irqs on the panic()ing CPU

 arch/powerpc/kernel/setup-common.c | 12 ++--
 arch/powerpc/kernel/setup_64.c |  8 
 arch/powerpc/kernel/smp.c  |  6 --
 3 files changed, 18 insertions(+), 8 deletions(-)

-- 
2.17.0



Re: pkeys on POWER: Access rights not reset on execve

2018-05-18 Thread Andy Lutomirski
On Fri, May 18, 2018 at 6:19 PM Ram Pai  wrote:

> However the fundamental issue is still the same, as mentioned in the
> other thread.

> "Should the permissions on a key be allowed to be changed, if the key
> is not allocated in the first place?".

> my answer is NO. Let's debate :)

As a preface, here's my two-minute attempt to understand POWER's behavior:
there are two registers, AMR and UAMR.  AMR contains both kernel-relevant
state and user-relevant state.  UAMR specifies which bits of AMR are
available for user code to directly access.  AMR bits outside UAMR are read
as zero and are unaffected by writes.  I'm assuming that the kernel
reserves some subset of AMR bits in advance to correspond to allocatable
pkeys.

Here's my question: given that disallowed AMR bits are read-as-zero, there
can always be a thread that is in the middle of a sequence like:

unsigned long old = amr;
amr |= whatever;
...  <- thread is here
amr = old;

Now another thread calls pkey_alloc(), so UAMR is asynchronously changed,
and the thread will write zero to the relevant AMR bits.  If I understand
correctly, this means that the decision to mask off unallocated keys via
UAMR effectively forces the initial value of newly-allocated keys in other
threads in the allocating process to be zero, whatever zero means.  (I
didn't get far enough in the POWER docs to figure out what zero means.)  So
I don't think you're doing anyone any favors by making UAMR dynamic.

IOW both x86 and POWER screwed up the ISA.
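To make the sequence above concrete, here it is as compilable C (a sketch
only; the helper names are made up, and SPR 13 is the AMR, as in Florian's
test program elsewhere in this thread):

/* Sketch of the save/modify/restore pattern described above. */
static inline unsigned long amr_read (void)
{
  unsigned long v;
  __asm__ volatile ("mfspr %0, 13" : "=r" (v));
  return v;
}

static inline void amr_write (unsigned long v)
{
  __asm__ volatile ("mtspr 13, %0" : : "r" (v));
}

void with_extra_restrictions (unsigned long deny_bits)
{
  /* Bits outside UAMR read back as zero in this thread. */
  unsigned long old = amr_read ();

  amr_write (old | deny_bits);
  /* If another thread calls pkey_alloc() here and the new key's bit
     becomes writable in this thread, the saved value still has that
     bit as zero ...  */
  amr_write (old);   /* ... so this restore forces it to zero. */
}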


Re: pkeys on POWER: Access rights not reset on execve

2018-05-18 Thread Ram Pai
On Fri, May 18, 2018 at 04:27:14PM +0200, Florian Weimer wrote:
> This test program:
> 
> #include <err.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> 
> /* Return the value of the AMR register.  */
> static inline unsigned long int
> pkey_read (void)
> {
>   unsigned long int result;
>   __asm__ volatile ("mfspr %0, 13" : "=r" (result));
>   return result;
> }
> 
> /* Overwrite the AMR register with VALUE.  */
> static inline void
> pkey_write (unsigned long int value)
> {
>   __asm__ volatile ("mtspr 13, %0" : : "r" (value));
> }
> 
> int
> main (int argc, char **argv)
> {
>   printf ("AMR (PID %d): 0x%016lx\n", (int) getpid (), pkey_read());
>   if (argc > 1)
> {
>   int key = syscall (__NR_pkey_alloc, 0, 0);
>   if (key < 0)
> err (1, "pkey_alloc");
>   printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
>   return 0;
> }
> 
>   pid_t pid = fork ();
>   if (pid == 0)
> {
>   execl ("/proc/self/exe", argv[0], "subprocess", NULL);
>   _exit (1);
> }
>   if (pid < 0)
> err (1, "fork");
>   int status;
>   if (waitpid (pid, &status, 0) < 0)
> err (1, "waitpid");
> 
>   int key = syscall (__NR_pkey_alloc, 0, 0);
>   if (key < 0)
> err (1, "pkey_alloc");
>   printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
> 
>   unsigned long int amr = -1;
>   printf ("Setting AMR: 0x%016lx\n", amr);
>   pkey_write (amr);
>   printf ("New AMR value (PID %d, before execl): 0x%016lx\n",
>   (int) getpid (), pkey_read());
>   execl ("/proc/self/exe", argv[0], "subprocess", NULL);
>   err (1, "exec");
>   return 1;
> }
> 
> shows that the AMR register value is not reset on execve:
> 
> AMR (PID 112291): 0x
> AMR (PID 112292): 0x
> Allocated key (PID 112292): 2
> Allocated key (PID 112291): 2
> Setting AMR: 0x
> New AMR value (PID 112291, before execl): 0x0c00
> AMR (PID 112291): 0x0c00
> Allocated key (PID 112291): 2
> 
> I think this is a real bug and needs to be fixed even if the
> defaults are kept as-is (see the other thread).

The issue you may be talking about here is that  --

"when you set the AMR register to 0x, it 
just sets it to 0x0c00."

To me it looks like exec/fork are not related to the issue.
Or are they also somehow connected to the issue?


The reason the AMR register does not get set to 0x,
is because none of those keys, except key 2, are active. So it ignores
all other bits and just sets the bits corresponding to key 2.

However the fundamental issue is still the same, as mentioned in the
other thread.

"Should the permissions on a key be allowed to be changed, if the key
is not allocated in the first place?".

my answer is NO. Let's debate :)
RP



Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Ram Pai
On Fri, May 18, 2018 at 11:13:30PM +0200, Florian Weimer wrote:
> On 05/18/2018 09:39 PM, Andy Lutomirski wrote:
> >The difference is that x86 starts out with deny-all instead of allow-all.

Ah! This explains the discrepancy. But it still does not explain one
thing... see below.

> >The POWER semantics make it very hard for a multithreaded program to
> >meaningfully use protection keys to prevent accidental access to important
> >memory.
> 
> And you can change access rights for unallocated keys (unallocated
> at thread start time, allocated later) on x86.  I have extended the
> misc/tst-pkeys test to verify that, and it passes on x86, but not on
> POWER, where the access rights are stuck.

This is something I do not understand. How can a thread change permissions
on a key that is not even allocated in the first place? Do you consider a key
allocated in some other thread's context as allocated in this thread's
context? If not, does that mean -- On x86, you can activate a key just
by changing its permission?


RP



Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Florian Weimer

On 05/18/2018 09:39 PM, Andy Lutomirski wrote:

The difference is that x86 starts out with deny-all instead of allow-all.
The POWER semantics make it very hard for a multithreaded program to
meaningfully use protection keys to prevent accidental access to important
memory.


And you can change access rights for unallocated keys (unallocated at 
thread start time, allocated later) on x86.  I have extended the 
misc/tst-pkeys test to verify that, and it passes on x86, but not on 
POWER, where the access rights are stuck.


I believe this is due to an incorrect UAMOR setting.

Thanks,
Florian


Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Florian Weimer

On 05/18/2018 07:44 PM, Ram Pai wrote:

Florian, is the behavior on x86 any different? A key allocated in the
context of one thread is not meaningful in the context of any other
thread.

Since thread B was created prior to the creation of the key, and the key
was created in the context of thread A, thread B neither inherits the
key nor its permissions. At least that is how the semantics are supposed
to work as per the man page.

man 7 pkey

" Applications  using  threads  and  protection  keys  should
be especially careful.  Threads inherit the protection key rights of the
parent at the time of the clone(2), system call.  Applications should
either ensure that their own permissions are appropriate for child
threads at the time when clone(2) is  called,  or ensure that each child
thread can perform its own initialization of protection key rights."


I reported two separate issues (actually three, but the execve bug is in 
a separate issue): the default, and the write restrictions.


The default is just a difference from x86 (however, x86 can be booted with 
init_pkru=0 and behaves the same way, but we're probably going to remove 
that).


The POWER implementation has the additional wrinkle that threads 
launched early, before key allocation, can never change access rights 
because they inherited not just the access rights, but also the access 
rights access mask.  This is different from x86, where all threads can 
freely update access rights, and contradicts the behavior in the manpage 
which says that “each child thread can perform its own initialization of 
protection key rights”.  It can't do that if it is launched before key 
allocation, which is not the right behavior IMO.
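A minimal reproducer for that scenario looks roughly like this (a sketch
only; it assumes the pkey_alloc/pkey_set/pkey_get wrappers from the glibc
work discussed here, and error handling is omitted):

/* Thread B exists before the key is allocated and later tries to set
   its own access rights for that key.  On x86 the pkey_set call takes
   effect; on POWER, with the current behavior, it is masked out.  */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

static pthread_barrier_t barrier;
static int key;

static void *
thread_b (void *arg)
{
  pthread_barrier_wait (&barrier);      /* key has been allocated by now */
  pkey_set (key, PKEY_DISABLE_ACCESS);
  printf ("rights seen by thread B: %d\n", pkey_get (key));
  return NULL;
}

int
main (void)
{
  pthread_t thr;
  pthread_barrier_init (&barrier, NULL, 2);
  pthread_create (&thr, NULL, thread_b, NULL);  /* B starts before pkey_alloc */
  key = pkey_alloc (0, 0);
  pthread_barrier_wait (&barrier);
  pthread_join (thr, NULL);
  return 0;
}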


Thanks,
Florian


Re: [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs

2018-05-18 Thread Michael Bringmann
See below.

On 05/18/2018 02:57 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:41 PM, Michael Bringmann wrote:
>> [Replace/withdraw previous patch submission to ensure that testing
>> of related patches on similar hardware progresses together.]
>>
>> This patch fixes a memory parsing bug when using of_prop_next_u32
>> calls at the start of a structure.  Depending upon the value of
>> "cur" memory pointer argument to of_prop_next_u32, it will or it
>> won't advance the value of the returned memory pointer by the
>> size of one u32.  This patch corrects the code to deal with that
>> indexing feature when parsing the ibm,drc-info structs for CPUs.
>> Also, we need to advance the pointer at the end of of_read_drc_info_cell
>> for the same reason.
>>
> 
> I see that you provide an update for of_read_drc_info_cell to fix the
> unexpected behavior you're seeing, but I'm not sure why you're updating
> the code in pseries_energy.c and rpaphp_core.c? can you provide some
> additional information as to why these functions also need to be updated.

The changes are related.

of_prop_next_u32() does not read a u32 and then advance the pointer.
It advances the pointer and then reads a u32.  It does an error check
to see whether the u32 read is within the boundary of the property
value, but it returns a pointer to the u32 that was read.
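In other words, its behavior is roughly the following (paraphrased for
illustration; not the literal drivers/of/ implementation):

const __be32 *of_prop_next_u32(struct property *prop, const __be32 *cur,
			       u32 *pu)
{
	const void *curv = cur;

	if (!prop)
		return NULL;

	if (!cur) {
		curv = prop->value;	/* first call: start of the property */
		goto out_val;
	}

	curv += sizeof(*cur);		/* advance past *cur first ... */
	if (curv >= prop->value + prop->length)
		return NULL;		/* ran off the end of the property */

out_val:
	*pu = be32_to_cpup(curv);	/* ... then read */
	return curv;			/* points at the u32 just read */
}

So a caller that wants to continue parsing after the value it just read has
to advance the returned pointer itself, which is what the added increments
in this patch do.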

> 
>> Signed-off-by: Michael Bringmann 
>> Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info 
>> firmware feature" -- end of patch series applied to powerpc next)
>> ---
>> Changes in V3:
>>   -- Rebased patch to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/platforms/pseries/of_helpers.c |5 ++---
>>  arch/powerpc/platforms/pseries/pseries_energy.c |2 ++
>>  drivers/pci/hotplug/rpaphp_core.c   |1 +
>>  3 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c 
>> b/arch/powerpc/platforms/pseries/of_helpers.c
>> index 6df192f..20598b2 100644
>> --- a/arch/powerpc/platforms/pseries/of_helpers.c
>> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
>> @@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop, const 
>> __be32 **curval,
>>
>>  /* Get drc-index-start:encode-int */
>>  p2 = (const __be32 *)p;
>> -p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
>> -if (!p2)
>> -return -EINVAL;
>> +data->drc_index_start = of_read_number(p2, 1);
> 
> This appears to resolve advancing the pointer for the beginning of a struct.
> 
>>
>>  /* Get drc-name-suffix-start:encode-int */
>>  p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
>> @@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop, const 
>> __be32 **curval,
>>  p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>>  if (!p2)
>>  return -EINVAL;
>> +p2++;
> 
> ...but why is the advancement needed here? of_prop_next_u32 should have 
> advanced it, correct?
> 
> -Nathan
> 
>>
>>  /* Should now know end of current entry */
>>  (*curval) = (void *)p2;
>> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
>> b/arch/powerpc/platforms/pseries/pseries_energy.c
>> index 6ed2212..c7d84aa 100644
>> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
>> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
>> @@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
>>  value = of_prop_next_u32(info, NULL, &num_set_entries);
>>  if (!value)
>>  goto err_of_node_put;
>> +value++;
>>
>>  for (j = 0; j < num_set_entries; j++) {
>>
>> @@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
>>  value = of_prop_next_u32(info, NULL, &num_set_entries);
>>  if (!value)
>>  goto err_of_node_put;
>> +value++;
>>
>>  for (j = 0; j < num_set_entries; j++) {
>>
>> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
>> b/drivers/pci/hotplug/rpaphp_core.c
>> index fb5e084..dccdf62 100644
>> --- a/drivers/pci/hotplug/rpaphp_core.c
>> +++ b/drivers/pci/hotplug/rpaphp_core.c
>> @@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
>> *dn, char *drc_name,
>>  value = of_prop_next_u32(info, NULL, &entries);
>>  if (!value)
>>  return -EINVAL;
>> +value++;
>>
>>  for (j = 0; j < entries; j++) {
>>  of_read_drc_info_cell(&info, &value, &drc);
>>
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes

2018-05-18 Thread Michael Bringmann


On 05/18/2018 02:28 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:26 PM, Michael Bringmann wrote:
>> powerpc migration/cpu: Now apply changes to the associativity of cpus
>> for the topology of LPARS in Post Migration events.  Recognize more
>> changes to the associativity of memory blocks described by the
>> 'cpu' properties when processing the topology of LPARS in Post Migration
>> events.  Previous efforts only recognized whether a memory block's
>> assignment had changed in the property.  Changes here include:
>>
>> * Provide hotplug CPU 'readd by index' operation
>> * Checking for changes in cpu associativity and making 'readd' calls
>>   when differences are observed.
>> * Queue up  changes to CPU properties so that they may take place
>>   after all PowerPC device-tree changes have been applied i.e. after
>>   the device hotplug is released in the mobility code.
> 
> This kinda feels like three different patches in one. Any reason to not split
> this into three patches? Perhaps at the least split the last item into its
> own patch.

E.g.
1) Conditional 'acquire_drc'
2) PSERIES_HP_ELOG_ACTION_READD call to dlpar_cpu_readd_by_index
3) Add lock/unlock device hotplug
> 
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>> Changes include:
>>   -- Rearrange patches to co-locate CPU property-related changes.
>>   -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index acquire
>>  or release operations during the CPU readd process.
>>   -- Correct a bug in DRC index selection for queued operation.
>>   -- Rebase to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c |  123 
>> +++---
>>  arch/powerpc/platforms/pseries/mobility.c|3 +
>>  2 files changed, 95 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index a408217..23d4cb8 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct device_node 
>> *parent, u32 drc_index)
>>  );
>>  }
>>
>> -static ssize_t dlpar_cpu_add(u32 drc_index)
>> +static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
>>  {
>>  struct device_node *dn, *parent;
>>  int rc, saved_rc;
>> @@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  return -EINVAL;
>>  }
>>
>> -rc = dlpar_acquire_drc(drc_index);
>> -if (rc) {
>> -pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
>> -rc, drc_index);
>> -of_node_put(parent);
>> -return -EINVAL;
>> +if (acquire_drc) {
>> +rc = dlpar_acquire_drc(drc_index);
>> +if (rc) {
>> +pr_warn("Failed to acquire DRC, rc: %d, drc index: 
>> %x\n",
>> +rc, drc_index);
>> +of_node_put(parent);
>> +return -EINVAL;
>> +}
>>  }
>>
>>  dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>>  if (!dn) {
>>  pr_warn("Failed call to configure-connector, drc index: %x\n",
>>  drc_index);
>> -dlpar_release_drc(drc_index);
>> +if (acquire_drc)
>> +dlpar_release_drc(drc_index);
>>  of_node_put(parent);
>>  return -EINVAL;
>>  }
>> @@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
>>  dn->name, rc, drc_index);
>>
>> -rc = dlpar_release_drc(drc_index);
>> -if (!rc)
>> +if (acquire_drc)
>> +rc = dlpar_release_drc(drc_index);
>> +if (!rc || acquire_drc)
>>  dlpar_free_cc_nodes(dn);
> 
> This seems like it would be more readable if everything were inside the
> if (acquire_drc) block.
> 
>>
>>  return saved_rc;
>> @@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  dn->name, rc, drc_index);
>>
>>  rc = dlpar_detach_node(dn);
>> -if (!rc)
>> +if (!rc && acquire_drc)
>>  dlpar_release_drc(drc_index);
>>
>>  return saved_rc;
>> @@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct device_node *dn)
>>
>>  }
>>
>> -static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>> +static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
>> +bool release_drc)
>>  {
>>  int rc;
>>
>> -pr_debug("Attempting to remove CPU %s, drc index: %x\n",
>> - dn->name, drc_index);
>> +pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
>> + dn->name, drc_index, release_drc);
>>
>>  rc = 

Re: [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features

2018-05-18 Thread Michael Bringmann
See below.

On 05/18/2018 02:17 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:26 PM, Michael Bringmann wrote:
>> powerpc migration/drmem: Export many of the functions of DRMEM to
>> parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
>> hotplug operations and for Post Migration events.
>>
>> Also modify the DRMEM initialization code to allow it to,
>>
>> * Be called after system initialization
>> * Provide a separate user copy of the LMB array that it produces
>> * Free the user copy upon request
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>> Changes in RFC:
>>   -- Separate DRMEM changes into a standalone patch
>>   -- Do not export excess functions.  Make exported names more explicit.
>>   -- Add new iterator to work through a pair of drmem_info arrays.
>>   -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
>>  dt_mem_next_cell, as these are only available at first boot.
>>   -- Rebase to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/include/asm/drmem.h |   10 +
>>  arch/powerpc/mm/drmem.c  |   78 
>> +++---
>>  2 files changed, 66 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/drmem.h 
>> b/arch/powerpc/include/asm/drmem.h
>> index ce242b9..c964b89 100644
>> --- a/arch/powerpc/include/asm/drmem.h
>> +++ b/arch/powerpc/include/asm/drmem.h
>> @@ -35,6 +35,13 @@ struct drmem_lmb_info {
>>  &drmem_info->lmbs[0],   \
>>  &drmem_info->lmbs[drmem_info->n_lmbs - 1])
>>
>> +#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2) \
>> +for ((lmb1) = (&dinfo1->lmbs[0]),   \
>> + (lmb2) = (&dinfo2->lmbs[0]);   \
>> + ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) && \
>> + ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));   \
>> + (lmb1)++, (lmb2)++)
>> +
>>  /*
>>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
>>   * specified in the ibm,dynamic-memory device tree property.
>> @@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>>  void (*func)(struct drmem_lmb *, const __be32 **));
>>  int drmem_update_dt(void);
>>
>> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
>> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
>> +
>>  #ifdef CONFIG_PPC_PSERIES
>>  void __init walk_drmem_lmbs_early(unsigned long node,
>>  void (*func)(struct drmem_lmb *, const __be32 **));
>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>> index 3f18036..d9b281c 100644
>> --- a/arch/powerpc/mm/drmem.c
>> +++ b/arch/powerpc/mm/drmem.c
>> @@ -20,6 +20,7 @@
>>
>>  static struct drmem_lmb_info __drmem_info;
>>  struct drmem_lmb_info *drmem_info = &__drmem_info;
>> +static int n_root_addr_cells;
>>
>>  u64 drmem_lmb_memory_max(void)
>>  {
>> @@ -193,12 +194,13 @@ int drmem_update_dt(void)
>>  return rc;
>>  }
>>
>> -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
>> +static void read_drconf_v1_cell(struct drmem_lmb *lmb,
>> const __be32 **prop)
>>  {
>>  const __be32 *p = *prop;
>>
>> -lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
>> +lmb->base_addr = of_read_number(p, n_root_addr_cells);
>> +p += n_root_addr_cells;
>>  lmb->drc_index = of_read_number(p++, 1);
>>
>>  p++; /* skip reserved field */
>> @@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb 
>> *lmb,
>>  *prop = p;
>>  }
>>
>> -static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 
>> *usm,
>> +static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *data,
>>  void (*func)(struct drmem_lmb *, const __be32 **))
>>  {
>>  struct drmem_lmb lmb;
>> @@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const __be32 
>> *prop, const __be32 *usm,
>>
>>  for (i = 0; i < n_lmbs; i++) {
>>  read_drconf_v1_cell(&lmb, &prop);
>> -func(&lmb, &usm);
>> +func(&lmb, &data);
> 
> Is there a need to change the variable name from usm to data (bot here and 
> below)?

Actually, this was your recommendation to me when we talked about
updating this code a couple of months ago.  Before we changed the
code to create a copy of the DRMEM lmbs and compare it.

I will change it back.

> 
>>  }
>>  }
>>
>> -static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>> +static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>> const __be32 **prop)
>>  {
>>  const __be32 *p = *prop;
>>
>>  dr_cell->seq_lmbs = of_read_number(p++, 1);
>> -dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
>> +dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
>> +p += n_root_addr_cells;
>>  dr_cell->drc_index = of_read_number(p++, 1);
>>  dr_cell->aa_index = of_read_number(p++, 1);
>>  

Re: [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs

2018-05-18 Thread Nathan Fontenot
On 05/17/2018 05:41 PM, Michael Bringmann wrote:
> [Replace/withdraw previous patch submission to ensure that testing
> of related patches on similar hardware progresses together.]
> 
> This patch fixes a memory parsing bug when using of_prop_next_u32
> calls at the start of a structure.  Depending upon the value of
> "cur" memory pointer argument to of_prop_next_u32, it will or it
> won't advance the value of the returned memory pointer by the
> size of one u32.  This patch corrects the code to deal with that
> indexing feature when parsing the ibm,drc-info structs for CPUs.
> Also, we need to advance the pointer at the end of of_read_drc_info_cell
> for the same reason.
> 

I see that you provide an update for of_read_drc_info_cell to fix the
unexpected behavior you're seeing, but I'm not sure why you're updating
the code in pseries_energy.c and rpaphp_core.c? can you provide some
additional information as to why these functions also need to be updated.

> Signed-off-by: Michael Bringmann 
> Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new drc-info 
> firmware feature" -- end of patch series applied to powerpc next)
> ---
> Changes in V3:
>   -- Rebased patch to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/of_helpers.c |5 ++---
>  arch/powerpc/platforms/pseries/pseries_energy.c |2 ++
>  drivers/pci/hotplug/rpaphp_core.c   |1 +
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c 
> b/arch/powerpc/platforms/pseries/of_helpers.c
> index 6df192f..20598b2 100644
> --- a/arch/powerpc/platforms/pseries/of_helpers.c
> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
> @@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop, const 
> __be32 **curval,
> 
>   /* Get drc-index-start:encode-int */
>   p2 = (const __be32 *)p;
> - p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
> - if (!p2)
> - return -EINVAL;
> + data->drc_index_start = of_read_number(p2, 1);

This appears to resolve advancing the pointer for the beginning of a struct.

> 
>   /* Get drc-name-suffix-start:encode-int */
>   p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
> @@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop, const 
> __be32 **curval,
>   p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>   if (!p2)
>   return -EINVAL;
> + p2++;

...but why is the advancement needed here? of_prop_next_u32 should have 
advanced it, correct?

-Nathan

> 
>   /* Should now know end of current entry */
>   (*curval) = (void *)p2;
> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c 
> b/arch/powerpc/platforms/pseries/pseries_energy.c
> index 6ed2212..c7d84aa 100644
> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> @@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
>   value = of_prop_next_u32(info, NULL, &num_set_entries);
>   if (!value)
>   goto err_of_node_put;
> + value++;
> 
>   for (j = 0; j < num_set_entries; j++) {
> 
> @@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
>   value = of_prop_next_u32(info, NULL, &num_set_entries);
>   if (!value)
>   goto err_of_node_put;
> + value++;
> 
>   for (j = 0; j < num_set_entries; j++) {
> 
> diff --git a/drivers/pci/hotplug/rpaphp_core.c 
> b/drivers/pci/hotplug/rpaphp_core.c
> index fb5e084..dccdf62 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct device_node 
> *dn, char *drc_name,
>   value = of_prop_next_u32(info, NULL, &entries);
>   if (!value)
>   return -EINVAL;
> + value++;
> 
>   for (j = 0; j < entries; j++) {
>   of_read_drc_info_cell(&info, &value, &drc);
> 



Re: [PATCH bpf v2 5/6] tools: bpftool: resolve calls without using imm field

2018-05-18 Thread Jakub Kicinski
On Fri, 18 May 2018 18:20:38 +0530, Sandipan Das wrote:
> Currently, we resolve the callee's address for a JITed function
> call by using the imm field of the call instruction as an offset
> from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
> use this address to get the callee's kernel symbol's name.
> 
> For some architectures, such as powerpc64, the imm field is not
> large enough to hold this offset. So, instead of assigning this
> offset to the imm field, the verifier now assigns the subprog
> id. Also, a list of kernel symbol addresses for all the JITed
> functions is provided in the program info. We now use the imm
> field as an index for this list to lookup a callee's symbol's
> address and resolve its name.
> 
> Suggested-by: Daniel Borkmann 
> Signed-off-by: Sandipan Das 
> ---
> v2:
>  - Order variables from longest to shortest
>  - Make sure that ksyms_ptr and ksyms_len are always initialized
>  - Simplify code

Thanks for the improvements!  Since there will be a v3, two minor
nitpicks still :)

>  tools/bpf/bpftool/prog.c  | 29 +
>  tools/bpf/bpftool/xlated_dumper.c | 10 +-
>  tools/bpf/bpftool/xlated_dumper.h |  2 ++
>  3 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..e2f8f8f259fc 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -421,19 +421,26 @@ static int do_show(int argc, char **argv)
>  static int do_dump(int argc, char **argv)
>  {
>   struct bpf_prog_info info = {};
> + unsigned long *addrs = NULL;
>   struct dump_data dd = {};
>   __u32 len = sizeof(info);
>   unsigned int buf_size;
> + unsigned int nr_addrs;
>   char *filepath = NULL;
>   bool opcodes = false;
>   bool visual = false;
>   unsigned char *buf;
>   __u32 *member_len;
>   __u64 *member_ptr;
> + __u32 *ksyms_len;
> + __u64 *ksyms_ptr;
>   ssize_t n;
>   int err;
>   int fd;
>  
> + ksyms_len = &info.nr_jited_ksyms;
> + ksyms_ptr = &info.jited_ksyms;

I'm not sure why you need these, why not just access
info.nr_jited_ksyms and info.jited_ksyms directly?  "member" variables
are there because jited and xlated images get returned in different
member of struct bpf_prog_info.

>   if (is_prefix(*argv, "jited")) {
>   member_len = &info.jited_prog_len;
>   member_ptr = &info.jited_prog_insns;
> @@ -496,10 +503,22 @@ static int do_dump(int argc, char **argv)
>   return -1;
>   }
>  
> + nr_addrs = *ksyms_len;
> + if (nr_addrs) {
> + addrs = malloc(nr_addrs * sizeof(__u64));
> + if (!addrs) {
> + p_err("mem alloc failed");
> + close(fd);
> + goto err_free;
> + }
> + }
> +
>   memset(&info, 0, sizeof(info));
>  
>   *member_ptr = ptr_to_u64(buf);
>   *member_len = buf_size;
> + *ksyms_ptr = ptr_to_u64(addrs);
> + *ksyms_len = nr_addrs;
>  
>   err = bpf_obj_get_info_by_fd(fd, &info, &len);
>   close(fd);
> @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
>   goto err_free;
>   }
>  
> + if (*ksyms_len > nr_addrs) {
> + p_err("too many addresses returned");
> + goto err_free;
> + }
> +
>   if ((member_len == &info.jited_prog_len &&
>info.jited_prog_insns == 0) ||
>   (member_len == &info.xlated_prog_len &&
> @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
>   dump_xlated_cfg(buf, *member_len);
>   } else {
>   kernel_syms_load();
> + dd.nr_jited_ksyms = *ksyms_len;
> + dd.jited_ksyms = (__u64 *) *ksyms_ptr;
> +
>   if (json_output)
>   dump_xlated_json(&dd, buf, *member_len, opcodes);
>   else

> diff --git a/tools/bpf/bpftool/xlated_dumper.c 
> b/tools/bpf/bpftool/xlated_dumper.c
> index 7a3173b76c16..fb065b55db6d 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -203,6 +207,10 @@ static const char *print_call(void *private_data,
>   unsigned long address = dd->address_call_base + insn->imm;
>   struct kernel_sym *sym;
>  
> + if (insn->src_reg == BPF_PSEUDO_CALL &&
> + (__u32) insn->imm < dd->nr_jited_ksyms)

Indentation seems off.

> + address = dd->jited_ksyms[insn->imm];
> +
>   sym = kernel_syms_search(dd, address);
>   if (insn->src_reg == BPF_PSEUDO_CALL)
>   return print_call_pcrel(dd, sym, address, insn);



Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Andy Lutomirski
On Fri, May 18, 2018 at 10:45 AM Ram Pai  wrote:

> On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> > I'm working on adding POWER pkeys support to glibc.  The coding work
> > is done, but I'm faced with some test suite failures.
> >
> > Unlike the default x86 configuration, on POWER, existing threads
> > have full access to newly allocated keys.
> >
> > Or, more precisely, in this scenario:
> >
> > * Thread A launches thread B
> > * Thread B waits
> > * Thread A allocates a protection key with pkey_alloc
> > * Thread A applies the key to a page
> > * Thread A signals thread B
> > * Thread B starts to run and accesses the page
> >
> > Then at the end, the access will be granted.
> >
> > I hope it's not too late to change this to denied access.
> >
> > Furthermore, I think the UAMOR value is wrong as well because it
> > prevents thread B at the end to set the AMR register.  In
> > particular, if I do this
> >
> > * … (as before)
> > * Thread A signals thread B
> > * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> > * Thread B reads the current access rights for the key
> >
> > then it still gets 0 (all access permitted) because the original
> > UAMOR value inherited from thread A prior to the key allocation
> > masks out the access right update for the newly allocated key.

> Florian, is the behavior on x86 any different? A key allocated in the
> context of one thread is not meaningful in the context of any other
> thread.


The difference is that x86 starts out with deny-all instead of allow-all.
The POWER semantics make it very hard for a multithreaded program to
meaningfully use protection keys to prevent accidental access to important
memory.


Re: [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes

2018-05-18 Thread Nathan Fontenot
On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/cpu: Now apply changes to the associativity of cpus
> for the topology of LPARS in Post Migration events.  Recognize more
> changes to the associativity of memory blocks described by the
> 'cpu' properties when processing the topology of LPARS in Post Migration
> events.  Previous efforts only recognized whether a memory block's
> assignment had changed in the property.  Changes here include:
> 
> * Provide hotplug CPU 'readd by index' operation
> * Checking for changes in cpu associativity and making 'readd' calls
>   when differences are observed.
> * Queue up  changes to CPU properties so that they may take place
>   after all PowerPC device-tree changes have been applied i.e. after
>   the device hotplug is released in the mobility code.

This kinda feels like three different patches in one. Any reason to not split
this into three patches? Perhaps at the least split the last item into its
own patch.

> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes include:
>   -- Rearrange patches to co-locate CPU property-related changes.
>   -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index acquire
>  or release operations during the CPU readd process.
>   -- Correct a bug in DRC index selection for queued operation.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |  123 
> +++---
>  arch/powerpc/platforms/pseries/mobility.c|3 +
>  2 files changed, 95 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a408217..23d4cb8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct device_node 
> *parent, u32 drc_index)
>   );
>  }
> 
> -static ssize_t dlpar_cpu_add(u32 drc_index)
> +static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
>  {
>   struct device_node *dn, *parent;
>   int rc, saved_rc;
> @@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   return -EINVAL;
>   }
> 
> - rc = dlpar_acquire_drc(drc_index);
> - if (rc) {
> - pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
> - rc, drc_index);
> - of_node_put(parent);
> - return -EINVAL;
> + if (acquire_drc) {
> + rc = dlpar_acquire_drc(drc_index);
> + if (rc) {
> + pr_warn("Failed to acquire DRC, rc: %d, drc index: 
> %x\n",
> + rc, drc_index);
> + of_node_put(parent);
> + return -EINVAL;
> + }
>   }
> 
>   dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>   if (!dn) {
>   pr_warn("Failed call to configure-connector, drc index: %x\n",
>   drc_index);
> - dlpar_release_drc(drc_index);
> + if (acquire_drc)
> + dlpar_release_drc(drc_index);
>   of_node_put(parent);
>   return -EINVAL;
>   }
> @@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
>   dn->name, rc, drc_index);
> 
> - rc = dlpar_release_drc(drc_index);
> - if (!rc)
> + if (acquire_drc)
> + rc = dlpar_release_drc(drc_index);
> + if (!rc || acquire_drc)
>   dlpar_free_cc_nodes(dn);

This seems like it would be more readable if everything were inside the
if (acquire_drc) block.

> 
>   return saved_rc;
> @@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   dn->name, rc, drc_index);
> 
>   rc = dlpar_detach_node(dn);
> - if (!rc)
> + if (!rc && acquire_drc)
>   dlpar_release_drc(drc_index);
> 
>   return saved_rc;
> @@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct device_node *dn)
> 
>  }
> 
> -static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
> +static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
> + bool release_drc)
>  {
>   int rc;
> 
> - pr_debug("Attempting to remove CPU %s, drc index: %x\n",
> -  dn->name, drc_index);
> + pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
> +  dn->name, drc_index, release_drc);
> 
>   rc = dlpar_offline_cpu(dn);
>   if (rc) {
> @@ -621,12 +626,14 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, 
> u32 drc_index)
>   return -EINVAL;
>   }
> 
> - rc = dlpar_release_drc(drc_index);
> - if (rc) {
> -  

Re: [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features

2018-05-18 Thread Nathan Fontenot
On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/drmem: Export many of the functions of DRMEM to
> parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
> hotplug operations and for Post Migration events.
> 
> Also modify the DRMEM initialization code to allow it to,
> 
> * Be called after system initialization
> * Provide a separate user copy of the LMB array that it produces
> * Free the user copy upon request
> 
> Signed-off-by: Michael Bringmann 
> ---
> Changes in RFC:
>   -- Separate DRMEM changes into a standalone patch
>   -- Do not export excess functions.  Make exported names more explicit.
>   -- Add new iterator to work through a pair of drmem_info arrays.
>   -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
>  dt_mem_next_cell, as these are only available at first boot.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/include/asm/drmem.h |   10 +
>  arch/powerpc/mm/drmem.c  |   78 
> +++---
>  2 files changed, 66 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/drmem.h 
> b/arch/powerpc/include/asm/drmem.h
> index ce242b9..c964b89 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -35,6 +35,13 @@ struct drmem_lmb_info {
>   &drmem_info->lmbs[0],   \
>   &drmem_info->lmbs[drmem_info->n_lmbs - 1])
> 
> +#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2)  \
> + for ((lmb1) = (&dinfo1->lmbs[0]),   \
> +  (lmb2) = (&dinfo2->lmbs[0]);   \
> + ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&  \
> + ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));\
> +  (lmb1)++, (lmb2)++)
> +
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
>   * specified in the ibm,dynamic-memory device tree property.
> @@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>   void (*func)(struct drmem_lmb *, const __be32 **));
>  int drmem_update_dt(void);
> 
> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
> +
>  #ifdef CONFIG_PPC_PSERIES
>  void __init walk_drmem_lmbs_early(unsigned long node,
>   void (*func)(struct drmem_lmb *, const __be32 **));
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 3f18036..d9b281c 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -20,6 +20,7 @@
> 
>  static struct drmem_lmb_info __drmem_info;
>  struct drmem_lmb_info *drmem_info = &__drmem_info;
> +static int n_root_addr_cells;
> 
>  u64 drmem_lmb_memory_max(void)
>  {
> @@ -193,12 +194,13 @@ int drmem_update_dt(void)
>   return rc;
>  }
> 
> -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
> +static void read_drconf_v1_cell(struct drmem_lmb *lmb,
>  const __be32 **prop)
>  {
>   const __be32 *p = *prop;
> 
> - lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> + lmb->base_addr = of_read_number(p, n_root_addr_cells);
> + p += n_root_addr_cells;
>   lmb->drc_index = of_read_number(p++, 1);
> 
>   p++; /* skip reserved field */
> @@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb 
> *lmb,
>   *prop = p;
>  }
> 
> -static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 
> *usm,
> +static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *data,
>   void (*func)(struct drmem_lmb *, const __be32 **))
>  {
>   struct drmem_lmb lmb;
> @@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const __be32 
> *prop, const __be32 *usm,
> 
>   for (i = 0; i < n_lmbs; i++) {
>   read_drconf_v1_cell(&lmb, &prop);
> - func(&lmb, &usm);
> + func(&lmb, &data);

Is there a need to change the variable name from usm to data (bot here and 
below)?

>   }
>  }
> 
> -static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
> +static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>  const __be32 **prop)
>  {
>   const __be32 *p = *prop;
> 
>   dr_cell->seq_lmbs = of_read_number(p++, 1);
> - dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> + dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
> + p += n_root_addr_cells;
>   dr_cell->drc_index = of_read_number(p++, 1);
>   dr_cell->aa_index = of_read_number(p++, 1);
>   dr_cell->flags = of_read_number(p++, 1);
> @@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct 
> of_drconf_cell_v2 *dr_cell,
>   *prop = p;
>  }
> 
> -static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 
> *usm,
> +static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *data,
>   void 

Re: [PATCH] crypto: reorder paes test lexicographically

2018-05-18 Thread Herbert Xu
On Fri, May 11, 2018 at 09:04:06AM +0100, Gilad Ben-Yossef wrote:
> Due to a snafu "paes" testmgr tests were not ordered
> lexicographically, which led to boot time warnings.
> Reorder the tests as needed.
> 
> Fixes: a794d8d ("crypto: ccree - enable support for hardware keys")
> Reported-by: Abdul Haleem 
> Signed-off-by: Gilad Ben-Yossef 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: nx: fix spelling mistake: "seqeunce" -> "sequence"

2018-05-18 Thread Herbert Xu
On Wed, May 09, 2018 at 10:16:36AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in CSB_ERR error message text
> 
> Signed-off-by: Colin Ian King 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 07/14] powerpc: Add support for restartable sequences

2018-05-18 Thread Mathieu Desnoyers
- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.f...@gmail.com wrote:
[...]
>> > I think you're right. So we have to introduce callsite to rseq_syscall()
>> > in syscall path, something like:
>> > 
>> > diff --git a/arch/powerpc/kernel/entry_64.S 
>> > b/arch/powerpc/kernel/entry_64.S
>> > index 51695608c68b..a25734a96640 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -222,6 +222,9 @@ system_call_exit:
>> >mtmsrd  r11,1
>> > #endif /* CONFIG_PPC_BOOK3E */
>> > 
>> > +  addi    r3,r1,STACK_FRAME_OVERHEAD
>> > +  bl  rseq_syscall
>> > +
>> >ld  r9,TI_FLAGS(r12)
>> >li  r11,-MAX_ERRNO
>> >andi.
>> >
>> > r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> > 

By the way, I think this is not the right spot to call rseq_syscall, because
interrupts are disabled. I think we should move this hunk right after 
system_call_exit.

Would you like to implement and test an updated patch adding those calls for 
ppc 32 and 64?

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Ram Pai
On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> I'm working on adding POWER pkeys support to glibc.  The coding work
> is done, but I'm faced with some test suite failures.
> 
> Unlike the default x86 configuration, on POWER, existing threads
> have full access to newly allocated keys.
> 
> Or, more precisely, in this scenario:
> 
> * Thread A launches thread B
> * Thread B waits
> * Thread A allocates a protection key with pkey_alloc
> * Thread A applies the key to a page
> * Thread A signals thread B
> * Thread B starts to run and accesses the page
> 
> Then at the end, the access will be granted.
> 
> I hope it's not too late to change this to denied access.
> 
> Furthermore, I think the UAMOR value is wrong as well because it
> prevents thread B at the end to set the AMR register.  In
> particular, if I do this
> 
> * … (as before)
> * Thread A signals thread B
> * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> * Thread B reads the current access rights for the key
> 
> then it still gets 0 (all access permitted) because the original
> UAMOR value inherited from thread A prior to the key allocation
> masks out the access right update for the newly allocated key.

Florian, is the behavior on x86 any different? A key allocated in the
context of one thread is not meaningful in the context of any other
thread. 

Since thread B was created prior to the creation of the key, and the key
was created in the context of thread A, thread B neither inherits the
key nor its permissions. At least that is how the semantics are supposed
to work as per the man page.

man 7 pkey 

" Applications  using  threads  and  protection  keys  should
be especially careful.  Threads inherit the protection key rights of the
parent at the time of the clone(2), system call.  Applications should
either ensure that their own permissions are appropriate for child
threads at the time when clone(2) is  called,  or ensure that each child
thread can perform its own initialization of protection key rights."


RP



Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls

2018-05-18 Thread Sandipan Das

On 05/18/2018 08:45 PM, Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> The imm field of a bpf instruction is a signed 32-bit integer.
>> For JIT bpf-to-bpf function calls, it stores the offset of the
>> start address of the callee's JITed image from __bpf_call_base.
>>
>> For some architectures, such as powerpc64, this offset may be
>> as large as 64 bits and cannot be accommodated in the imm field
>> without truncation.
>>
>> We resolve this by:
>>
>> [1] Additionally using the auxiliary data of each function to
>> keep a list of start addresses of the JITed images for all
>> functions determined by the verifier.
>>
>> [2] Retaining the subprog id inside the off field of the call
>> instructions and using it to index into the list mentioned
>> above and lookup the callee's address.
>>
>> To make sure that the existing JIT compilers continue to work
>> without requiring changes, we keep the imm field as it is.
>>
>> Signed-off-by: Sandipan Das 
>> ---
>>  kernel/bpf/verifier.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a9e4b1372da6..6c56cce9c4e3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>  insn->src_reg != BPF_PSEUDO_CALL)
>>  continue;
>>  subprog = insn->off;
>> -insn->off = 0;
>>  insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>  func[subprog]->bpf_func -
>>  __bpf_call_base;
>>  }
>> +
>> +/* we use the aux data to keep a list of the start addresses
>> + * of the JITed images for each function in the program
>> + *
>> + * for some architectures, such as powerpc64, the imm field
>> + * might not be large enough to hold the offset of the start
>> + * address of the callee's JITed image from __bpf_call_base
>> + *
>> + * in such cases, we can lookup the start address of a callee
>> + * by using its subprog id, available from the off field of
>> + * the call instruction, as an index for this list
>> + */
>> +func[i]->aux->func = func;
>> +func[i]->aux->func_cnt = env->subprog_cnt + 1;
> 
> The target tree you have here is infact bpf, since in bpf-next there was a
> cleanup where the + 1 is removed. Just for the record that we need to keep
> this in mind for bpf into bpf-next merge since this would otherwise subtly
> break.
> 

Sorry about the wrong tag. This series is indeed based off bpf-next.

- Sandipan

>>  }
>>  for (i = 0; i < env->subprog_cnt; i++) {
>>  old_bpf_func = func[i]->bpf_func;
>>
> 
> 



Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header

2018-05-18 Thread Alexei Starovoitov
On Fri, May 18, 2018 at 5:50 AM, Sandipan Das
 wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das 
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
> __u32 xlated_prog_len;
> __aligned_u64 jited_prog_insns;
> __aligned_u64 xlated_prog_insns;
> +   __aligned_u64 jited_ksyms;
> +   __u32 nr_jited_ksyms;
> __u64 load_time;/* ns since boottime */
> __u32 created_by_uid;
> __u32 nr_map_ids;

this breaks uapi.
New fields can only be added to the end.


Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler. The JIT compiler converts the bpf call instructions
>>> to native branch instructions. After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known. Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits. So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address. Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxiliary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das 
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 
>>> ++-
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
>>> b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
>>> codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>    struct codegen_context *ctx,
>>> -  u32 *addrs)
>>> +  u32 *addrs, bool extra_pass)
>>>  {
>>>  const struct bpf_insn *insn = fp->insnsi;
>>>  int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, 
>>> u32 *image,
>>>  break;
>>>  
>>>  /*
>>> - * Call kernel helper
>>> + * Call kernel helper or bpf function
>>>   */
>>>  case BPF_JMP | BPF_CALL:
>>>  ctx->seen |= SEEN_FUNC;
>>> -    func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +    /* bpf function call */
>>> +    if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some dummy
>> address as otherwise the 'kernel helper call' branch used for this is a bit
>> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
>> optimizes the immediate addr, I presume the JIT can handle situations where
>> in the final extra_pass the image needs to grow/shrink again (due to 
>> different
>> final address for the call)?
> 
> That's a good catch. We don't handle that -- we expect to get the size right 
> on first pass. We could probably have PPC_FUNC_ADDR() pad the result with 
> nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order to always
emit 4 insns for it.

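On ppc64 the equivalent would presumably be to always emit the full
5-instruction sequence for the call address, e.g. (rough, untested sketch,
macro names as in arch/powerpc/net/bpf_jit.h):

/* unconditionally emit all five instructions so the image size cannot
 * change between the initial pass and the extra pass
 */
#define PPC_LI64_FIXED(d, i) do {                                     \
	PPC_LIS(d, ((uintptr_t)(i) >> 48) & 0xffff);                  \
	PPC_ORI(d, d, ((uintptr_t)(i) >> 32) & 0xffff);               \
	PPC_SLDI(d, d, 32);                                           \
	PPC_ORIS(d, d, ((uintptr_t)(i) >> 16) & 0xffff);              \
	PPC_ORI(d, d, (uintptr_t)(i) & 0xffff);                       \
} while (0)
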
Thanks,
Daniel


Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Naveen N. Rao

Daniel Borkmann wrote:

On 05/18/2018 02:50 PM, Sandipan Das wrote:

This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fixup the branch target addresses, we need
to perform an extra pass.

Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.

Signed-off-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit_comp64.c | 79 ++-
 1 file changed, 69 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 1bdb1aff0619..25939892d8f7 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
  struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
break;
 
 		/*

-* Call kernel helper
+* Call kernel helper or bpf function
 */
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
-   func = (u8 *) __bpf_call_base + imm;
+
+   /* bpf function call */
+   if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)


Perhaps it might make sense here for !extra_pass to set func to some dummy
address as otherwise the 'kernel helper call' branch used for this is a bit
misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
optimizes the immediate addr, I presume the JIT can handle situations where
in the final extra_pass the image needs to grow/shrink again (due to different
final address for the call)?


That's a good catch. We don't handle that -- we expect to get the size 
right on first pass. We could probably have PPC_FUNC_ADDR() pad the 
result with nops to make it a constant 5-instruction sequence.





+   if (fp->aux->func && off < fp->aux->func_cnt)
+   /* use the subprog id from the off
+* field to lookup the callee address
+*/
+   func = (u8 *) 
fp->aux->func[off]->bpf_func;
+   else
+   return -EINVAL;
+   /* kernel helper call */
+   else
+   func = (u8 *) __bpf_call_base + imm;
 
 			bpf_jit_emit_func_call(image, ctx, (u64)func);
 
@@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,

return 0;
 }
 
+struct powerpc64_jit_data {

+   struct bpf_binary_header *header;
+   u32 *addrs;
+   u8 *image;
+   u32 proglen;
+   struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
u32 proglen;
@@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+   struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+   bool extra_pass = false;
 
 	if (!fp->jit_requested)

return org_fp;
@@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
fp = tmp_fp;
}
 
+	jit_data = fp->aux->jit_data;

+   if (!jit_data) {
+   jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+   if (!jit_data) {
+   fp = org_fp;
+   goto out;
+   }
+   fp->aux->jit_data = jit_data;
+   }
+
flen = fp->len;
+   addrs = jit_data->addrs;
+   if (addrs) {
+   cgctx = jit_data->ctx;
+   

Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>   loaded_at 2018-05-16T11:43:38+0530  uid 0
>   xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>   loaded_at 2018-05-16T12:13:01+0530  uid 0
>   xlated 216B  jited 308B  memlock 65536B
>   ...

That's really nice! One comment inline below:

>   # bpftool prog dump jited id 1
>  0:   nop
>  4:   nop
>  8:   mflr    r0
>  c:   std     r0,16(r1)
> 10:   stdu    r1,-112(r1)
> 14:   std     r31,104(r1)
> 18:   addi    r31,r1,48
> 1c:   li  r3,10
>   ...
> 
> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/syscall.c | 38 --
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 54a72fafe57c..2430d159078c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   struct bpf_prog_info info = {};
>   u32 info_len = attr->info.info_len;
>   char __user *uinsns;
> - u32 ulen;
> + u32 ulen, i;
>   int err;
>  
>   err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   ulen = min_t(u32, info.nr_map_ids, ulen);
>   if (ulen) {
>   u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
> - u32 i;
>  
>   for (i = 0; i < ulen; i++)
>   if (put_user(prog->aux->used_maps[i]->id,
> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>* for offload.
>*/
>   ulen = info.jited_prog_len;
> - info.jited_prog_len = prog->jited_len;
> + if (prog->aux->func_cnt) {
> + info.jited_prog_len = 0;
> + for (i = 0; i < prog->aux->func_cnt; i++)
> + info.jited_prog_len += prog->aux->func[i]->jited_len;
> + } else {
> + info.jited_prog_len = prog->jited_len;
> + }
> +
>   if (info.jited_prog_len && ulen) {
>   if (bpf_dump_raw_ok()) {
>   uinsns = u64_to_user_ptr(info.jited_prog_insns);
>   ulen = min_t(u32, info.jited_prog_len, ulen);
> - if (copy_to_user(uinsns, prog->bpf_func, ulen))
> - return -EFAULT;
> +
> + /* for multi-function programs, copy the JITed
> +  * instructions for all the functions
> +  */
> + if (prog->aux->func_cnt) {
> + u32 len, free;
> + u8 *img;
> +
> + free = ulen;
> + for (i = 0; i < prog->aux->func_cnt; i++) {
> + len = prog->aux->func[i]->jited_len;
> + img = (u8 *) 
> prog->aux->func[i]->bpf_func;
> + if (len > free)
> + break;
> + if (copy_to_user(uinsns, img, len))
> + return -EFAULT;
> + uinsns += len;
> + free -= len;

Is there any way we can introduce a delimiter between the different
images such that they could be more easily correlated with the call
from the main (or other sub-)program instead of having one contiguous
dump blob?

> + }
> + } else {
> + if (copy_to_user(uinsns, prog->bpf_func, ulen))
> + return -EFAULT;
> + }
>   } else {
>   info.jited_prog_insns = 0;
>   }
> @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   if (info.nr_jited_ksyms && ulen) {
>   u64 __user *user_jited_ksyms = 
> u64_to_user_ptr(info.jited_ksyms);
>   ulong ksym_addr;
> - u32 i;
>  
>   /* copy the address of the kernel symbol corresponding to
>

Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds two new fields to struct bpf_prog_info. For
> multi-function programs, these fields can be used to pass
> a list of kernel symbol addresses for all functions in a
> given program and to userspace using the bpf system call
> with the BPF_OBJ_GET_INFO_BY_FD command.
> 
> When bpf_jit_kallsyms is enabled, we can get the address
> of the corresponding kernel symbol for a callee function
> and resolve the symbol's name. The address is determined
> by adding the value of the call instruction's imm field
> to __bpf_call_base. This offset gets assigned to the imm
> field by the verifier.
> 
> For some architectures, such as powerpc64, the imm field
> is not large enough to hold this offset.
> 
> We resolve this by:
> 
> [1] Assigning the subprog id to the imm field of a call
> instruction in the verifier instead of the offset of
> the callee's symbol's address from __bpf_call_base.
> 
> [2] Determining the address of a callee's corresponding
> symbol by using the imm field as an index for the
> list of kernel symbol addresses now available from
> the program info.
> 
> Suggested-by: Daniel Borkmann 
> Signed-off-by: Sandipan Das 
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/syscall.c | 20 
>  kernel/bpf/verifier.c|  7 +--
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>   __u32 xlated_prog_len;
>   __aligned_u64 jited_prog_insns;
>   __aligned_u64 xlated_prog_insns;
> + __aligned_u64 jited_ksyms;
> + __u32 nr_jited_ksyms;
>   __u64 load_time;/* ns since boottime */
>   __u32 created_by_uid;
>   __u32 nr_map_ids;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bfcde949c7f8..54a72fafe57c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   if (!capable(CAP_SYS_ADMIN)) {
>   info.jited_prog_len = 0;
>   info.xlated_prog_len = 0;
> + info.nr_jited_ksyms = 0;
>   goto done;
>   }
>  
> @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
> *prog,
>   }
>   }
>  
> + ulen = info.nr_jited_ksyms;
> + info.nr_jited_ksyms = prog->aux->func_cnt;
> + if (info.nr_jited_ksyms && ulen) {

Since this exposes addresses (though masked ones, which is correct), this
definitely needs to be guarded with bpf_dump_raw_ok() like we do in other
places here (see JIT dump for example).

> + u64 __user *user_jited_ksyms = 
> u64_to_user_ptr(info.jited_ksyms);
> + ulong ksym_addr;
> + u32 i;
> +
> + /* copy the address of the kernel symbol corresponding to
> +  * each function
> +  */
> + ulen = min_t(u32, info.nr_jited_ksyms, ulen);
> + for (i = 0; i < ulen; i++) {
> + ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> + ksym_addr &= PAGE_MASK;
> + if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
> + return -EFAULT;
> + }
> + }
> +
>  done:
>   if (copy_to_user(uinfo, &info, info_len) ||
>   put_user(info_len, &uattr->info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6c56cce9c4e3..e826c396aba2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>* later look the same as if they were interpreted only.
>*/
>   for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
> - unsigned long addr;
> -
>   if (insn->code != (BPF_JMP | BPF_CALL) ||
>   insn->src_reg != BPF_PSEUDO_CALL)
>   continue;
>   insn->off = env->insn_aux_data[i].call_imm;
>   subprog = find_subprog(env, i + insn->off + 1);
> - addr  = (unsigned long)func[subprog]->bpf_func;

Hmm, in current bpf tree this says 'subprog + 1' here, so this is not
rebased against bpf tree but bpf-next (unlike what subject says)?

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351

> - addr &= PAGE_MASK;
> - insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
> - addr - __bpf_call_base;
> + insn->imm = subprog;
>   }
>  
>   prog->jited = 1;
> 



Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> The imm field of a bpf instruction is a signed 32-bit integer.
> For JIT bpf-to-bpf function calls, it stores the offset of the
> start address of the callee's JITed image from __bpf_call_base.
> 
> For some architectures, such as powerpc64, this offset may be
> as large as 64 bits and cannot be accommodated in the imm field
> without truncation.
> 
> We resolve this by:
> 
> [1] Additionally using the auxiliary data of each function to
> keep a list of start addresses of the JITed images for all
> functions determined by the verifier.
> 
> [2] Retaining the subprog id inside the off field of the call
> instructions and using it to index into the list mentioned
> above and lookup the callee's address.
> 
> To make sure that the existing JIT compilers continue to work
> without requiring changes, we keep the imm field as it is.
> 
> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/verifier.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a9e4b1372da6..6c56cce9c4e3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>   insn->src_reg != BPF_PSEUDO_CALL)
>   continue;
>   subprog = insn->off;
> - insn->off = 0;
>   insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>   func[subprog]->bpf_func -
>   __bpf_call_base;
>   }
> +
> + /* we use the aux data to keep a list of the start addresses
> +  * of the JITed images for each function in the program
> +  *
> +  * for some architectures, such as powerpc64, the imm field
> +  * might not be large enough to hold the offset of the start
> +  * address of the callee's JITed image from __bpf_call_base
> +  *
> +  * in such cases, we can lookup the start address of a callee
> +  * by using its subprog id, available from the off field of
> +  * the call instruction, as an index for this list
> +  */
> + func[i]->aux->func = func;
> + func[i]->aux->func_cnt = env->subprog_cnt + 1;

The target tree you have here is infact bpf, since in bpf-next there was a
cleanup where the + 1 is removed. Just for the record that we need to keep
this in mind for bpf into bpf-next merge since this would otherwise subtly
break.

>   }
>   for (i = 0; i < env->subprog_cnt; i++) {
>   old_bpf_func = func[i]->bpf_func;
> 



Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Daniel Borkmann
On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds support for bpf-to-bpf function calls in the powerpc64
> JIT compiler. The JIT compiler converts the bpf call instructions
> to native branch instructions. After a round of the usual passes,
> the start addresses of the JITed images for the callee functions
> are known. Finally, to fixup the branch target addresses, we need
> to perform an extra pass.
> 
> Because of the address range in which JITed images are allocated
> on powerpc64, the offsets of the start addresses of these images
> from __bpf_call_base are as large as 64 bits. So, for a function
> call, we cannot use the imm field of the instruction to determine
> the callee's address. Instead, we use the alternative method of
> getting it from the list of function addresses in the auxiliary
> data of the caller by using the off field as an index.
> 
> Signed-off-by: Sandipan Das 
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 79 
> ++-
>  1 file changed, 69 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 1bdb1aff0619..25939892d8f7 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
> codegen_context *ctx, u32
>  /* Assemble the body code between the prologue & epilogue */
>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
> struct codegen_context *ctx,
> -   u32 *addrs)
> +   u32 *addrs, bool extra_pass)
>  {
>   const struct bpf_insn *insn = fp->insnsi;
>   int flen = fp->len;
> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
> *image,
>   break;
>  
>   /*
> -  * Call kernel helper
> +  * Call kernel helper or bpf function
>*/
>   case BPF_JMP | BPF_CALL:
>   ctx->seen |= SEEN_FUNC;
> - func = (u8 *) __bpf_call_base + imm;
> +
> + /* bpf function call */
> + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)

Perhaps it might make sense here for !extra_pass to set func to some dummy
address as otherwise the 'kernel helper call' branch used for this is a bit
misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
optimizes the immediate addr, I presume the JIT can handle situations where
in the final extra_pass the image needs to grow/shrink again (due to different
final address for the call)?

> + if (fp->aux->func && off < fp->aux->func_cnt)
> + /* use the subprog id from the off
> +  * field to lookup the callee address
> +  */
> + func = (u8 *) 
> fp->aux->func[off]->bpf_func;
> + else
> + return -EINVAL;
> + /* kernel helper call */
> + else
> + func = (u8 *) __bpf_call_base + imm;
>  
>   bpf_jit_emit_func_call(image, ctx, (u64)func);
>  
> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
> *image,
>   return 0;
>  }
>  
> +struct powerpc64_jit_data {
> + struct bpf_binary_header *header;
> + u32 *addrs;
> + u8 *image;
> + u32 proglen;
> + struct codegen_context ctx;
> +};
> +
>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  {
>   u32 proglen;
> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   u8 *image = NULL;
>   u32 *code_base;
>   u32 *addrs;
> + struct powerpc64_jit_data *jit_data;
>   struct codegen_context cgctx;
>   int pass;
>   int flen;
> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   struct bpf_prog *org_fp = fp;
>   struct bpf_prog *tmp_fp;
>   bool bpf_blinded = false;
> + bool extra_pass = false;
>  
>   if (!fp->jit_requested)
>   return org_fp;
> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>   fp = tmp_fp;
>   }
>  
> + jit_data = fp->aux->jit_data;
> + if (!jit_data) {
> + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
> + if (!jit_data) {
> + fp = org_fp;
> + goto out;
> + }
> + fp->aux->jit_data = jit_data;
> + }
> +
>   flen = fp->len;
> + addrs = jit_data->addrs;
> + if (addrs) {
> + cgctx = jit_data->ctx;
> + image = jit_data->image;
> + bpf_hdr = jit_data->header;
> + proglen 

Re: [PATCH 0/7] Miscellaneous patches and bug fixes

2018-05-18 Thread Martin K. Petersen

Uma,

> This patch series adds a few improvements to the cxlflash driver and it
> also contains a couple of bug fixes.

Applied to 4.18/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes

2018-05-18 Thread Segher Boessenkool
On Fri, May 18, 2018 at 12:35:48PM +0200, Christophe Leroy wrote:
> On 05/17/2018 03:55 PM, Segher Boessenkool wrote:
> >On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote:
> >>In my 8xx configuration, I get 208 calls to memcmp()
> >Could you show results with a more recent GCC?  What version was this?
> 
> It was with the latest GCC version I have available in my environment, 
> that is GCC 5.4. Is that too old ?

Since GCC 7 the compiler knows how to do this, for powerpc; in GCC 8
it has improved still.

> It seems that version inlines memcmp() when length is 1. All other 
> lengths call memcmp()

Yup.

> c000d018 :
> c000d018: 80 64 00 00 lwz r3,0(r4)
> c000d01c: 81 25 00 00 lwz r9,0(r5)
> c000d020: 7c 69 18 50 subfr3,r9,r3
> c000d024: 4e 80 00 20 blr

This is incorrect, it does not get the sign of the result correct.
Say when comparing 0xff 0xff 0xff 0xff to 0 0 0 0.  This should return
positive, but it returns negative.

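A quick way to see it:

	/* not a valid memcmp(): the 32-bit difference wraps around */
	unsigned int a = 0xffffffff;	/* bytes ff ff ff ff */
	unsigned int b = 0x00000000;	/* bytes 00 00 00 00 */
	int r = a - b;			/* r == -1, but memcmp() must return > 0 */
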
For Power9 GCC does

lwz 3,0(3)
lwz 9,0(4)
cmpld 7,3,9
setb 3,7

and for Power7/Power8,

lwz 9,0(3)
lwz 3,0(4)
subfc 3,3,9
popcntd 3,3
subfe 9,9,9
or 3,3,9

(and it gives up for earlier CPUs, there is no nice simple code sequence
as far as we know.  Code size matters when generating inline code).

(Generating code for -m32 it is the same, just w instead of d in a few
places).

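(That is what you get for a compare of a small known size, e.g.

	#include <string.h>

	int cmp4(const void *a, const void *b)
	{
		return memcmp(a, b, 4);
	}

built with a recent GCC at -O2.)
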
> c000d09c :
> c000d09c: 81 25 00 04 lwz r9,4(r5)
> c000d0a0: 80 64 00 04 lwz r3,4(r4)
> c000d0a4: 81 04 00 00 lwz r8,0(r4)
> c000d0a8: 81 45 00 00 lwz r10,0(r5)
> c000d0ac: 7c 69 18 10 subfc   r3,r9,r3
> c000d0b0: 7d 2a 41 10 subfe   r9,r10,r8
> c000d0b4: 7d 2a fe 70 srawi   r10,r9,31
> c000d0b8: 7d 48 4b 79 or. r8,r10,r9
> c000d0bc: 4d a2 00 20 bclr+   12,eq
> c000d0c0: 7d 23 4b 78 mr  r3,r9
> c000d0c4: 4e 80 00 20 blr

> This shows that on PPC32, the 8 bytes comparison is not optimal, I will 
> improve it.

It's not correct either (same problem as with length 4).


Segher


[PATCH] powerpc/32: Implement csum_ipv6_magic in assembly

2018-05-18 Thread Christophe Leroy
The generic csum_ipv6_magic() generates a pretty bad result

 :
   0:   81 23 00 00 lwz r9,0(r3)
   4:   81 03 00 04 lwz r8,4(r3)
   8:   7c e7 4a 14 add r7,r7,r9
   c:   7d 29 38 10 subfc   r9,r9,r7
  10:   7d 4a 51 10 subfe   r10,r10,r10
  14:   7d 27 42 14 add r9,r7,r8
  18:   7d 2a 48 50 subfr9,r10,r9
  1c:   80 e3 00 08 lwz r7,8(r3)
  20:   7d 08 48 10 subfc   r8,r8,r9
  24:   7d 4a 51 10 subfe   r10,r10,r10
  28:   7d 29 3a 14 add r9,r9,r7
  2c:   81 03 00 0c lwz r8,12(r3)
  30:   7d 2a 48 50 subfr9,r10,r9
  34:   7c e7 48 10 subfc   r7,r7,r9
  38:   7d 4a 51 10 subfe   r10,r10,r10
  3c:   7d 29 42 14 add r9,r9,r8
  40:   7d 2a 48 50 subfr9,r10,r9
  44:   80 e4 00 00 lwz r7,0(r4)
  48:   7d 08 48 10 subfc   r8,r8,r9
  4c:   7d 4a 51 10 subfe   r10,r10,r10
  50:   7d 29 3a 14 add r9,r9,r7
  54:   7d 2a 48 50 subfr9,r10,r9
  58:   81 04 00 04 lwz r8,4(r4)
  5c:   7c e7 48 10 subfc   r7,r7,r9
  60:   7d 4a 51 10 subfe   r10,r10,r10
  64:   7d 29 42 14 add r9,r9,r8
  68:   7d 2a 48 50 subfr9,r10,r9
  6c:   80 e4 00 08 lwz r7,8(r4)
  70:   7d 08 48 10 subfc   r8,r8,r9
  74:   7d 4a 51 10 subfe   r10,r10,r10
  78:   7d 29 3a 14 add r9,r9,r7
  7c:   7d 2a 48 50 subfr9,r10,r9
  80:   81 04 00 0c lwz r8,12(r4)
  84:   7c e7 48 10 subfc   r7,r7,r9
  88:   7d 4a 51 10 subfe   r10,r10,r10
  8c:   7d 29 42 14 add r9,r9,r8
  90:   7d 2a 48 50 subfr9,r10,r9
  94:   7d 08 48 10 subfc   r8,r8,r9
  98:   7d 4a 51 10 subfe   r10,r10,r10
  9c:   7d 29 2a 14 add r9,r9,r5
  a0:   7d 2a 48 50 subfr9,r10,r9
  a4:   7c a5 48 10 subfc   r5,r5,r9
  a8:   7c 63 19 10 subfe   r3,r3,r3
  ac:   7d 29 32 14 add r9,r9,r6
  b0:   7d 23 48 50 subfr9,r3,r9
  b4:   7c c6 48 10 subfc   r6,r6,r9
  b8:   7c 63 19 10 subfe   r3,r3,r3
  bc:   7c 63 48 50 subfr3,r3,r9
  c0:   54 6a 80 3e rotlwi  r10,r3,16
  c4:   7c 63 52 14 add r3,r3,r10
  c8:   7c 63 18 f8 not r3,r3
  cc:   54 63 84 3e rlwinm  r3,r3,16,16,31
  d0:   4e 80 00 20 blr

This patch implements it in assembly for PPC32

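For reference, the rotlwi/add/not/rlwinm tail in both versions is the usual
32-bit to 16-bit ones-complement fold; roughly, in C:

static inline unsigned int fold16(unsigned int sum)
{
	unsigned int rot = (sum << 16) | (sum >> 16);	/* rotlwi */
	sum += rot;		/* add: low-half carries land in the high half */
	return (~sum) >> 16;	/* not + rlwinm: complement, keep the top 16 bits */
}
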
Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/checksum.h |  8 
 arch/powerpc/lib/checksum_32.S  | 33 +
 2 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/asm/checksum.h 
b/arch/powerpc/include/asm/checksum.h
index 54065caa40b3..c41c280c252f 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -13,6 +13,7 @@
 #include 
 #else
 #include 
+#include 
 /*
  * Computes the checksum of a memory block at src, length len,
  * and adds in "sum" (32-bit), while copying the block to dst.
@@ -211,6 +212,13 @@ static inline __sum16 ip_compute_csum(const void *buff, 
int len)
return csum_fold(csum_partial(buff, len, 0));
 }
 
+#ifdef CONFIG_PPC32
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+   const struct in6_addr *daddr,
+   __u32 len, __u8 proto, __wsum sum);
+#endif
+
 #endif
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 9a671c774b22..f9c047145696 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -293,3 +293,36 @@ dst_error:
EX_TABLE(51b, dst_error);
 
 EXPORT_SYMBOL(csum_partial_copy_generic)
+
+/*
+ * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ *   const struct in6_addr *daddr,
+ *   __u32 len, __u8 proto, __wsum sum)
+ */
+
+_GLOBAL(csum_ipv6_magic)
+   lwz r8, 0(r3)
+   lwz r9, 4(r3)
+   lwz r10, 8(r3)
+   lwz r11, 12(r3)
+   addc r0, r5, r6
+   adde r0, r0, r7
+   adde r0, r0, r8
+   adde r0, r0, r9
+   adde r0, r0, r10
+   adde r0, r0, r11
+   lwz r8, 0(r4)
+   lwz r9, 4(r4)
+   lwz r10, 8(r4)
+   lwz r11, 12(r4)
+   adde r0, r0, r8
+   adde r0, r0, r9
+   adde r0, r0, r10
+   adde r0, r0, r11
+   addze r0, r0
+   rotlwi  r3, r0, 16
+   add r3, r0, r3
+   not r3, r3
+   rlwinm  r3, r3, 16, 16, 31
+   blr
+EXPORT_SYMBOL(csum_ipv6_magic)
-- 
2.13.3



Re: pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Andy Lutomirski
On Fri, May 18, 2018 at 6:17 AM Florian Weimer  wrote:

> I'm working on adding POWER pkeys support to glibc.  The coding work is
> done, but I'm faced with some test suite failures.

> Unlike the default x86 configuration, on POWER, existing threads have
> full access to newly allocated keys.

> Or, more precisely, in this scenario:

> * Thread A launches thread B
> * Thread B waits
> * Thread A allocates a protection key with pkey_alloc
> * Thread A applies the key to a page
> * Thread A signals thread B
> * Thread B starts to run and accesses the page

> Then at the end, the access will be granted.

> I hope it's not too late to change this to denied access.

> Furthermore, I think the UAMOR value is wrong as well because it
> prevents thread B at the end from setting the AMR register.  In particular, if
> I do this

> * … (as before)
> * Thread A signals thread B
> * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> * Thread B reads the current access rights for the key

> then it still gets 0 (all access permitted) because the original UAMOR
> value inherited from thread A prior to the key allocation masks out the
> access right update for the newly allocated key.

This type of issue is why I think that a good protection key ISA would not
have a usermode read-the-whole-register or write-the-whole-register
operation at all.  It's still not clear to me that there is any good
kernel-mode solution.  But at least x86 defaults to deny-everything, which
is more annoying but considerably safer than POWER's behavior.

--Andy


pkeys on POWER: Access rights not reset on execve

2018-05-18 Thread Florian Weimer

This test program:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the value of the AMR register.  */
static inline unsigned long int
pkey_read (void)
{
  unsigned long int result;
  __asm__ volatile ("mfspr %0, 13" : "=r" (result));
  return result;
}

/* Overwrite the AMR register with VALUE.  */
static inline void
pkey_write (unsigned long int value)
{
  __asm__ volatile ("mtspr 13, %0" : : "r" (value));
}

int
main (int argc, char **argv)
{
  printf ("AMR (PID %d): 0x%016lx\n", (int) getpid (), pkey_read());
  if (argc > 1)
{
  int key = syscall (__NR_pkey_alloc, 0, 0);
  if (key < 0)
err (1, "pkey_alloc");
  printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
  return 0;
}

  pid_t pid = fork ();
  if (pid == 0)
{
  execl ("/proc/self/exe", argv[0], "subprocess", NULL);
  _exit (1);
}
  if (pid < 0)
err (1, "fork");
  int status;
  if (waitpid (pid, &status, 0) < 0)
err (1, "waitpid");

  int key = syscall (__NR_pkey_alloc, 0, 0);
  if (key < 0)
err (1, "pkey_alloc");
  printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);

  unsigned long int amr = -1;
  printf ("Setting AMR: 0x%016lx\n", amr);
  pkey_write (amr);
  printf ("New AMR value (PID %d, before execl): 0x%016lx\n",
  (int) getpid (), pkey_read());
  execl ("/proc/self/exe", argv[0], "subprocess", NULL);
  err (1, "exec");
  return 1;
}

shows that the AMR register value is not reset on execve:

AMR (PID 112291): 0x
AMR (PID 112292): 0x
Allocated key (PID 112292): 2
Allocated key (PID 112291): 2
Setting AMR: 0x
New AMR value (PID 112291, before execl): 0x0c00
AMR (PID 112291): 0x0c00
Allocated key (PID 112291): 2

I think this is a real bug and needs to be fixed even if the defaults 
are kept as-is (see the other thread).


(Seen on 4.17.0-rc5.)

Thanks,
Florian


[GIT PULL] Please pull powerpc/linux.git powerpc-4.17-6 tag

2018-05-18 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 4.17:

The following changes since commit 6c0a8f6b5a45ac892a763b6299bd3c5324fc5e02:

  powerpc/pseries: Fix CONFIG_NUMA=n build (2018-05-08 14:59:56 +1000)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.17-6

for you to fetch changes up to c1d2a31397ec51f0370f6bd17b19b39152c263cb:

  powerpc/powernv: Fix NVRAM sleep in invalid context when crashing (2018-05-18 
00:23:07 +1000)

- 
powerpc fixes for 4.17 #6

Just three commits.

The two cxl ones are not fixes per se, but they modify code that was added this
cycle so that it will work with a recent firmware change.

And then a fix for a recent commit that added sleeps in the NVRAM code, which
needs to be more careful and not sleep if eg. we're called in the panic() path.

Thanks to:
  Nicholas Piggin, Philippe Bergheaud, Christophe Lombard.

- 
Nicholas Piggin (1):
  powerpc/powernv: Fix NVRAM sleep in invalid context when crashing

Philippe Bergheaud (2):
  cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
  cxl: Report the tunneled operations status

 Documentation/ABI/testing/sysfs-class-cxl   |  8 
 arch/powerpc/platforms/powernv/opal-nvram.c | 14 --
 drivers/misc/cxl/cxl.h  |  1 +
 drivers/misc/cxl/pci.c  | 12 
 drivers/misc/cxl/sysfs.c| 10 ++
 5 files changed, 43 insertions(+), 2 deletions(-)
-BEGIN PGP SIGNATURE-

iQIcBAEBCAAGBQJa/tj4AAoJEFHr6jzI4aWAZBoP/2/kddNPHzMwnzcx0B5WzEfo
3JQcTLZPEqsb5n0xwaWOgb0o0u8EpHDtPZlJlZZWwPKbphlZvQ+reAJAXCQXXj6P
n6BrbfpHlJo1J2wZVeihOuVTtSiXqq82YEtCzdUCUQNa8+2rQ7bLOB56cLdnyFWx
wvK/xOM4rG/nPDi3tlH+IvhNeUdc2Y1vCB6N/RVvJVJOduCED1zCxcWj7uJYGt1x
UdFPcah4OPSF9qPemReu6CjiwesvEm3y3S9TYLLJLXBDmKvs7BDRbfczP52tY90P
XMErJN4MtnNasotCJVcZylfNhSTCyyaQ4si+G/COAM0eh8oPdcIH3C2yXnQHXVPz
pUpe7TF3N7DOoa66WHXsh9dr9G7E0DhcpBgpz30/HmYasaYSEYpDsCH6UvbmvpzO
xtyocfgtuyaF28Rz4WVYSVGkJwE/+NhdsvfQUFI6DGQXoVu08Xxx3rhOlNKVecTw
8yuaIpPSz0i1pFg3SRa1MaS0F4FqGs0T3OLFsEihjdsc3EpJeB3Vp0IiqbmhR6Pa
1rhsjPY4yS5tPJS9K4CQszgw3Ke6C7KYIttGo7BxVKPLXsp52HhTqFpp0LHDcw8y
zFjOS7kh07NCgn8tSwMLUTRUUXEDLE2fJOVWCDdvkXEzpd/LfaP15gjZ3m66OMTS
Oj72EjBAGNX4dN4yQuo7
=4xib
-END PGP SIGNATURE-


Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads

2018-05-18 Thread Michael Ellerman
"Gautham R. Shenoy"  writes:

> diff --git a/arch/powerpc/kernel/setup-common.c 
> b/arch/powerpc/kernel/setup-common.c
> index 0af5c11..884dff2 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -436,8 +438,56 @@ static void __init cpu_init_thread_core_maps(int tpc)
>   printk(KERN_DEBUG " (thread shift is %d)\n", threads_shift);
>  }
>  
> -
>  u32 *cpu_to_phys_id = NULL;
> +/*
> + * check_for_interleaved_big_core - Checks if the core represented by
> + *dn is a big-core whose threads are interleavings of the
> + *threads of the component small cores.
> + *
> + * @dn: device node corresponding to the core.
> + *
> + * Returns true if the core is a interleaved big-core.
> + * Returns false otherwise.
> + */
> +static inline bool check_for_interleaved_big_core(struct device_node *dn)
> +{
> + int len, nr_groups, threads_per_group;
> + const __be32 *thread_groups;
> + __be32 *thread_list, *first_cpu_idx;
> + int cur_cpu, next_cpu, i, j;
> +
> + thread_groups = of_get_property(dn, "ibm,thread-groups", &len);
> + if (!thread_groups)
> + return false;

There are better device tree APIs than bare of_get_property() these
days, can you try to use those?

> + nr_groups = be32_to_cpu(*(thread_groups + 1));
> + if (nr_groups <= 1)
> + return false;

eg. this would be of_property_read_u32_index()

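Something like this, perhaps (untested):

	u32 nr_groups;

	if (of_property_read_u32_index(dn, "ibm,thread-groups", 1, &nr_groups))
		return false;
	if (nr_groups <= 1)
		return false;
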
> +
> + threads_per_group = be32_to_cpu(*(thread_groups + 2));
> + thread_list = (__be32 *)thread_groups + 3;
> +
> + /*
> +  * In case of an interleaved big-core, the thread-ids of the
> +  * big-core can be obtained by interleaving the thread-ids
> +  * of the component small cores.
> +  *
> +  * Eg: On a 8-thread big-core with two SMT4 small cores, the
> +  * threads of the two component small cores will be
> +  * {0, 2, 4, 6} and {1, 3, 5, 7}.
> +  */
> + for (i = 0; i < nr_groups; i++) {
> + first_cpu_idx = thread_list + i * threads_per_group;
> +
> + for (j = 0; j < threads_per_group - 1; j++) {
> + cur_cpu = be32_to_cpu(*(first_cpu_idx + j));
> + next_cpu = be32_to_cpu(*(first_cpu_idx + j + 1));
> + if (next_cpu != cur_cpu + nr_groups)
> + return false;
> + }
> + }
> + return true;
> +}
>  
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
> @@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void)
>   vdso_data->processorCount = num_present_cpus();
>  #endif /* CONFIG_PPC64 */
>  
> -/* Initialize CPU <=> thread mapping/
> + dn = of_find_node_by_type(NULL, "cpu");
> + if (dn) {
> + if (check_for_interleaved_big_core(dn)) {
> + has_interleaved_big_core = true;
> + pr_info("Detected interleaved big-cores\n");
> + }
> + of_node_put(dn);
> + }

This is a bit untidy, given how unlikely it is that you would have no
CPUs :)

You should be able to do the lookup of the property and the setting of
has_interleaved_big_core all inside check_for_interleaved_big_core().
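
ie. something like this (sketch only, core_is_interleaved() being the
existing property check renamed):

static void __init check_for_interleaved_big_core(void)
{
	struct device_node *dn;

	dn = of_find_node_by_type(NULL, "cpu");
	if (!dn)
		return;
	if (core_is_interleaved(dn)) {
		has_interleaved_big_core = true;
		pr_info("Detected interleaved big-cores\n");
	}
	of_node_put(dn);
}

and the caller in smp_setup_cpu_maps() shrinks to a single
check_for_interleaved_big_core() call.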

cheers


pkeys on POWER: Default AMR, UAMOR values

2018-05-18 Thread Florian Weimer
I'm working on adding POWER pkeys support to glibc.  The coding work is 
done, but I'm faced with some test suite failures.


Unlike the default x86 configuration, on POWER, existing threads have 
full access to newly allocated keys.


Or, more precisely, in this scenario:

* Thread A launches thread B
* Thread B waits
* Thread A allocates a protection key with pkey_alloc
* Thread A applies the key to a page
* Thread A signals thread B
* Thread B starts to run and accesses the page

Then at the end, the access will be granted.

I hope it's not too late to change this to denied access.

Furthermore, I think the UAMOR value is wrong as well because it 
prevents thread B at the end from setting the AMR register.  In particular, if 
I do this


* … (as before)
* Thread A signals thread B
* Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
* Thread B reads the current access rights for the key

then it still gets 0 (all access permitted) because the original UAMOR 
value inherited from thread A prior to the key allocation masks out the 
access right update for the newly allocated key.


Thanks,
Florian


Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads

2018-05-18 Thread Michael Ellerman
Gautham R Shenoy  writes:
...
>> > @@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void)
>> >vdso_data->processorCount = num_present_cpus();
>> >  #endif /* CONFIG_PPC64 */
>> >  
>> > -/* Initialize CPU <=> thread mapping/
>> > +  dn = of_find_node_by_type(NULL, "cpu");
>> > +  if (dn) {
>> > +  if (check_for_interleaved_big_core(dn)) {
>> > +  has_interleaved_big_core = true;
>> > +  pr_info("Detected interleaved big-cores\n");
>> 
>> Is there a runtime way to check this also?  If the dmesg buffer overflows, we
>> lose this.
>
> Where do you suggest we put this ? Should it be a part of
> /proc/cpuinfo ?

Hmm, it'd be nice not to pollute it with more junk.

Can you just look at the pir files in sysfs?

eg. on a normal system:

  # cd /sys/devices/system/cpu
  # grep . cpu[0-7]/pir
  cpu0/pir:20
  cpu1/pir:21
  cpu2/pir:22
  cpu3/pir:23
  cpu4/pir:24
  cpu5/pir:25
  cpu6/pir:26
  cpu7/pir:27


cheers


Re: [PATCH 2/2] powerpc: Enable ASYM_SMT on interleaved big-core systems

2018-05-18 Thread Michael Ellerman
Gautham R Shenoy  writes:

> On Mon, May 14, 2018 at 01:22:07PM +1000, Michael Neuling wrote:
>> On Fri, 2018-05-11 at 16:47 +0530, Gautham R. Shenoy wrote:
>> > From: "Gautham R. Shenoy" 
>> > 
>> > Each of the SMT4 cores forming a fused-core are more or less
>> > independent units. Thus when multiple tasks are scheduled to run on
>> > the fused core, we get the best performance when the tasks are spread
>> > across the pair of SMT4 cores.
>> > 
>> > Since the threads in the pair of SMT4 cores of an interleaved big-core
>> > are numbered {0,2,4,6} and {1,3,5,7} respectively, enable ASYM_SMT on
>> > such interleaved big-cores that will bias the load-balancing of tasks
>> > on smaller numbered threads, which will automatically result in
>> > spreading the tasks uniformly across the associated pair of SMT4
>> > cores.
>> > 
>> > Signed-off-by: Gautham R. Shenoy 
>> > ---
>> >  arch/powerpc/kernel/smp.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> > index 9ca7148..0153f01 100644
>> > --- a/arch/powerpc/kernel/smp.c
>> > +++ b/arch/powerpc/kernel/smp.c
>> > @@ -1082,7 +1082,7 @@ static int powerpc_smt_flags(void)
>> >  {
>> >int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
>> >  
>> > -  if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
>> > +  if (cpu_has_feature(CPU_FTR_ASYM_SMT) || has_interleaved_big_core) {
>> 
>> Shouldn't we just set CPU_FTR_ASYM_SMT and leave this code unchanged?
>
> Yes, that would have the same effect. I refrained from doing that
> since I thought CPU_FTR_ASYM_SMT has the "lower numbered threads
> expedite thread-folding" connotation from the POWER7 generation.

The above code is the only use of the feature, so I don't think we need
to worry about any other connotations.

> If it is ok to overload CPU_FTR_ASYM_SMT, we can do what you suggest
> and have all the changes in setup-common.c

Yeah let's do that.

cheers


[PATCH v2] powerpc/lib: Adjust .balign inside string functions for PPC32

2018-05-18 Thread Christophe Leroy
commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops

A simple benchmark on an 8xx calling 10x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.

Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy 
---
 v2: Define IFETCH_ALIGN_SHIFT for PPC32 and use IFETCH_ALIGN_BYTES
for the alignment
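
 (IFETCH_ALIGN_BYTES should already fall out of the shift via asm/cache.h,
 i.e. something like:

	#define IFETCH_ALIGN_BYTES	(1 << IFETCH_ALIGN_SHIFT)

 so only the shift needs defining per platform.)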

 arch/powerpc/include/asm/cache.h | 3 +++
 arch/powerpc/lib/string.S| 7 ---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index c1d257aa4c2d..66298461b640 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -9,11 +9,14 @@
 #if defined(CONFIG_PPC_8xx) || defined(CONFIG_403GCX)
 #define L1_CACHE_SHIFT 4
 #define MAX_COPY_PREFETCH  1
+#define IFETCH_ALIGN_SHIFT 2
 #elif defined(CONFIG_PPC_E500MC)
 #define L1_CACHE_SHIFT 6
 #define MAX_COPY_PREFETCH  4
+#define IFETCH_ALIGN_SHIFT 3
 #elif defined(CONFIG_PPC32)
 #define MAX_COPY_PREFETCH  4
+#define IFETCH_ALIGN_SHIFT 3   /* 603 fetches 2 insn at a time */
 #if defined(CONFIG_PPC_47x)
 #define L1_CACHE_SHIFT 7
 #else
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index a787776822d8..0378def28d41 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <asm/cache.h>
 
.text

@@ -23,7 +24,7 @@ _GLOBAL(strncpy)
mtctr   r5
addir6,r3,-1
addir4,r4,-1
-   .balign 16
+   .balign IFETCH_ALIGN_BYTES
 1: lbzur0,1(r4)
cmpwi   0,r0,0
stbur0,1(r6)
@@ -43,7 +44,7 @@ _GLOBAL(strncmp)
mtctr   r5
addir5,r3,-1
addir4,r4,-1
-   .balign 16
+   .balign IFETCH_ALIGN_BYTES
 1: lbzur3,1(r5)
cmpwi   1,r3,0
lbzur0,1(r4)
@@ -77,7 +78,7 @@ _GLOBAL(memchr)
beq-2f
mtctr   r5
addir3,r3,-1
-   .balign 16
+   .balign IFETCH_ALIGN_BYTES
 1: lbzur0,1(r3)
cmpw0,r0,r4
bdnzf   2,1b
-- 
2.13.3



Re: [PATCH 2/2] powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG

2018-05-18 Thread Michael Ellerman
Michael Neuling  writes:
> In this change:
>   e2a800beac powerpc/hw_brk: Fix off by one error when validating DAWR region 
> end
>
> We fixed setting the DAWR end point to its max value via
> PPC_PTRACE_SETHWDEBUG. Unfortunately we broke PTRACE_SET_DEBUGREG when
> setting a 512 byte aligned breakpoint.
>
> PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to
> zero (memset() in hw_breakpoint_init()).  This worked with
> arch_validate_hwbkpt_settings() before the above patch was applied but
> is now broken if the breakpoint is 512 byte aligned.
>
> This sets the length of the breakpoint to 8 bytes when using
> PTRACE_SET_DEBUGREG.
>
> Signed-off-by: Michael Neuling 
> Cc: sta...@vger.kernel.org # 3.10+

If this is "fixing" e2a800beac then I think v3.11 is right for the
stable tag?

$ git describe --contains --long e2a800beaca1
v3.11-rc1~94^2~4

cheers


[PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall

2018-05-18 Thread Sandipan Das
Currently, for multi-function programs, we cannot get the JITed
instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
command. Because of this, userspace tools such as bpftool fail
to identify a multi-function program as being JITed or not.

With the JIT enabled and the test program running, this can be
verified as follows:

  # cat /proc/sys/net/core/bpf_jit_enable
  1

Before applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
  loaded_at 2018-05-16T11:43:38+0530  uid 0
  xlated 216B  not jited  memlock 65536B
  ...

  # bpftool prog dump jited id 1
  no instructions returned

After applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
  loaded_at 2018-05-16T12:13:01+0530  uid 0
  xlated 216B  jited 308B  memlock 65536B
  ...

  # bpftool prog dump jited id 1
 0:   nop
 4:   nop
 8:   mflr    r0
 c:   std     r0,16(r1)
10:   stdu    r1,-112(r1)
14:   std     r31,104(r1)
18:   addi    r31,r1,48
1c:   li  r3,10
  ...

Signed-off-by: Sandipan Das 
---
 kernel/bpf/syscall.c | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 54a72fafe57c..2430d159078c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
struct bpf_prog_info info = {};
u32 info_len = attr->info.info_len;
char __user *uinsns;
-   u32 ulen;
+   u32 ulen, i;
int err;
 
err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
@@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
ulen = min_t(u32, info.nr_map_ids, ulen);
if (ulen) {
u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
-   u32 i;
 
for (i = 0; i < ulen; i++)
if (put_user(prog->aux->used_maps[i]->id,
@@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog 
*prog,
 * for offload.
 */
ulen = info.jited_prog_len;
-   info.jited_prog_len = prog->jited_len;
+   if (prog->aux->func_cnt) {
+   info.jited_prog_len = 0;
+   for (i = 0; i < prog->aux->func_cnt; i++)
+   info.jited_prog_len += prog->aux->func[i]->jited_len;
+   } else {
+   info.jited_prog_len = prog->jited_len;
+   }
+
if (info.jited_prog_len && ulen) {
if (bpf_dump_raw_ok()) {
uinsns = u64_to_user_ptr(info.jited_prog_insns);
ulen = min_t(u32, info.jited_prog_len, ulen);
-   if (copy_to_user(uinsns, prog->bpf_func, ulen))
-   return -EFAULT;
+
+   /* for multi-function programs, copy the JITed
+* instructions for all the functions
+*/
+   if (prog->aux->func_cnt) {
+   u32 len, free;
+   u8 *img;
+
+   free = ulen;
+   for (i = 0; i < prog->aux->func_cnt; i++) {
+   len = prog->aux->func[i]->jited_len;
+   img = (u8 *) 
prog->aux->func[i]->bpf_func;
+   if (len > free)
+   break;
+   if (copy_to_user(uinsns, img, len))
+   return -EFAULT;
+   uinsns += len;
+   free -= len;
+   }
+   } else {
+   if (copy_to_user(uinsns, prog->bpf_func, ulen))
+   return -EFAULT;
+   }
} else {
info.jited_prog_insns = 0;
}
@@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
if (info.nr_jited_ksyms && ulen) {
u64 __user *user_jited_ksyms = 
u64_to_user_ptr(info.jited_ksyms);
ulong ksym_addr;
-   u32 i;
 
/* copy the address of the kernel symbol corresponding to
 * each function
-- 
2.14.3



[PATCH bpf v2 5/6] tools: bpftool: resolve calls without using imm field

2018-05-18 Thread Sandipan Das
Currently, we resolve the callee's address for a JITed function
call by using the imm field of the call instruction as an offset
from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
use this address to get the callee's kernel symbol's name.

For some architectures, such as powerpc64, the imm field is not
large enough to hold this offset. So, instead of assigning this
offset to the imm field, the verifier now assigns the subprog
id. Also, a list of kernel symbol addresses for all the JITed
functions is provided in the program info. We now use the imm
field as an index for this list to lookup a callee's symbol's
address and resolve its name.

Suggested-by: Daniel Borkmann 
Signed-off-by: Sandipan Das 
---
v2:
 - Order variables from longest to shortest
 - Make sure that ksyms_ptr and ksyms_len are always initialized
 - Simplify code
---
 tools/bpf/bpftool/prog.c  | 29 +
 tools/bpf/bpftool/xlated_dumper.c | 10 +-
 tools/bpf/bpftool/xlated_dumper.h |  2 ++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 9bdfdf2d3fbe..e2f8f8f259fc 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -421,19 +421,26 @@ static int do_show(int argc, char **argv)
 static int do_dump(int argc, char **argv)
 {
struct bpf_prog_info info = {};
+   unsigned long *addrs = NULL;
struct dump_data dd = {};
__u32 len = sizeof(info);
unsigned int buf_size;
+   unsigned int nr_addrs;
char *filepath = NULL;
bool opcodes = false;
bool visual = false;
unsigned char *buf;
__u32 *member_len;
__u64 *member_ptr;
+   __u32 *ksyms_len;
+   __u64 *ksyms_ptr;
ssize_t n;
int err;
int fd;
 
+   ksyms_len = &info.nr_jited_ksyms;
+   ksyms_ptr = &info.jited_ksyms;
+
if (is_prefix(*argv, "jited")) {
member_len = &info.jited_prog_len;
member_ptr = &info.jited_prog_insns;
@@ -496,10 +503,22 @@ static int do_dump(int argc, char **argv)
return -1;
}
 
+   nr_addrs = *ksyms_len;
+   if (nr_addrs) {
+   addrs = malloc(nr_addrs * sizeof(__u64));
+   if (!addrs) {
+   p_err("mem alloc failed");
+   close(fd);
+   goto err_free;
+   }
+   }
+
memset(&info, 0, sizeof(info));
 
*member_ptr = ptr_to_u64(buf);
*member_len = buf_size;
+   *ksyms_ptr = ptr_to_u64(addrs);
+   *ksyms_len = nr_addrs;
 
err = bpf_obj_get_info_by_fd(fd, &info, &len);
close(fd);
@@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
goto err_free;
}
 
+   if (*ksyms_len > nr_addrs) {
+   p_err("too many addresses returned");
+   goto err_free;
+   }
+
if ((member_len == &info.jited_prog_len &&
 info.jited_prog_insns == 0) ||
(member_len == &info.xlated_prog_len &&
@@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
dump_xlated_cfg(buf, *member_len);
} else {
kernel_syms_load();
+   dd.nr_jited_ksyms = *ksyms_len;
+   dd.jited_ksyms = (__u64 *) *ksyms_ptr;
+
if (json_output)
dump_xlated_json(, buf, *member_len, opcodes);
else
@@ -566,10 +593,12 @@ static int do_dump(int argc, char **argv)
}
 
free(buf);
+   free(addrs);
return 0;
 
 err_free:
free(buf);
+   free(addrs);
return -1;
 }
 
diff --git a/tools/bpf/bpftool/xlated_dumper.c 
b/tools/bpf/bpftool/xlated_dumper.c
index 7a3173b76c16..fb065b55db6d 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -174,7 +174,11 @@ static const char *print_call_pcrel(struct dump_data *dd,
unsigned long address,
const struct bpf_insn *insn)
 {
-   if (sym)
+   if (!dd->nr_jited_ksyms)
+   /* Do not show address for interpreted programs */
+   snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+   "%+d", insn->off);
+   else if (sym)
snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 "%+d#%s", insn->off, sym->name);
else
@@ -203,6 +207,10 @@ static const char *print_call(void *private_data,
unsigned long address = dd->address_call_base + insn->imm;
struct kernel_sym *sym;
 
+   if (insn->src_reg == BPF_PSEUDO_CALL &&
+   (__u32) insn->imm < dd->nr_jited_ksyms)
+   address = dd->jited_ksyms[insn->imm];
+
sym = kernel_syms_search(dd, address);
if (insn->src_reg == BPF_PSEUDO_CALL)
return print_call_pcrel(dd, sym, address, insn);
diff 

[PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header

2018-05-18 Thread Sandipan Das
Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a JITed program.

Signed-off-by: Sandipan Das 
---
 tools/include/uapi/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333a8225..040c9cac7303 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2188,6 +2188,8 @@ struct bpf_prog_info {
__u32 xlated_prog_len;
__aligned_u64 jited_prog_insns;
__aligned_u64 xlated_prog_insns;
+   __aligned_u64 jited_ksyms;
+   __u32 nr_jited_ksyms;
__u64 load_time;/* ns since boottime */
__u32 created_by_uid;
__u32 nr_map_ids;
-- 
2.14.3



[PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall

2018-05-18 Thread Sandipan Das
This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program and to userspace using the bpf system call
with the BPF_OBJ_GET_INFO_BY_FD command.

When bpf_jit_kallsyms is enabled, we can get the address
of the corresponding kernel symbol for a callee function
and resolve the symbol's name. The address is determined
by adding the value of the call instruction's imm field
to __bpf_call_base. This offset gets assigned to the imm
field by the verifier.

For some architectures, such as powerpc64, the imm field
is not large enough to hold this offset.

We resolve this by:

[1] Assigning the subprog id to the imm field of a call
instruction in the verifier instead of the offset of
the callee's symbol's address from __bpf_call_base.

[2] Determining the address of a callee's corresponding
symbol by using the imm field as an index for the
list of kernel symbol addresses now available from
the program info.

Suggested-by: Daniel Borkmann 
Signed-off-by: Sandipan Das 
---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c | 20 
 kernel/bpf/verifier.c|  7 +--
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333a8225..040c9cac7303 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2188,6 +2188,8 @@ struct bpf_prog_info {
__u32 xlated_prog_len;
__aligned_u64 jited_prog_insns;
__aligned_u64 xlated_prog_insns;
+   __aligned_u64 jited_ksyms;
+   __u32 nr_jited_ksyms;
__u64 load_time;/* ns since boottime */
__u32 created_by_uid;
__u32 nr_map_ids;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bfcde949c7f8..54a72fafe57c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
if (!capable(CAP_SYS_ADMIN)) {
info.jited_prog_len = 0;
info.xlated_prog_len = 0;
+   info.nr_jited_ksyms = 0;
goto done;
}
 
@@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
}
}
 
+   ulen = info.nr_jited_ksyms;
+   info.nr_jited_ksyms = prog->aux->func_cnt;
+   if (info.nr_jited_ksyms && ulen) {
+   u64 __user *user_jited_ksyms = 
u64_to_user_ptr(info.jited_ksyms);
+   ulong ksym_addr;
+   u32 i;
+
+   /* copy the address of the kernel symbol corresponding to
+* each function
+*/
+   ulen = min_t(u32, info.nr_jited_ksyms, ulen);
+   for (i = 0; i < ulen; i++) {
+   ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
+   ksym_addr &= PAGE_MASK;
+   if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
+   return -EFAULT;
+   }
+   }
+
 done:
if (copy_to_user(uinfo, &info, info_len) ||
    put_user(info_len, &uattr->info.info_len))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6c56cce9c4e3..e826c396aba2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 * later look the same as if they were interpreted only.
 */
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
-   unsigned long addr;
-
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
insn->off = env->insn_aux_data[i].call_imm;
subprog = find_subprog(env, i + insn->off + 1);
-   addr  = (unsigned long)func[subprog]->bpf_func;
-   addr &= PAGE_MASK;
-   insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
-   addr - __bpf_call_base;
+   insn->imm = subprog;
}
 
prog->jited = 1;
-- 
2.14.3



[PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs

2018-05-18 Thread Sandipan Das
This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fixup the branch target addresses, we need
to perform an extra pass.

Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.
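
As a rough illustration of the magnitude involved (addresses are only
representative of the 64-bit Book3S hash layout, not taken from the patch):

	__bpf_call_base:  0xc000...  (kernel text)
	JITed image:      0xd000...  (vmalloc/module space)
	offset:           on the order of 2^60 bytes, far beyond the +/-2GB
	                  range a 32-bit imm can express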

Signed-off-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit_comp64.c | 79 ++-
 1 file changed, 69 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 1bdb1aff0619..25939892d8f7 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
  struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
break;
 
/*
-* Call kernel helper
+* Call kernel helper or bpf function
 */
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
-   func = (u8 *) __bpf_call_base + imm;
+
+   /* bpf function call */
+   if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
+   if (fp->aux->func && off < fp->aux->func_cnt)
+   /* use the subprog id from the off
+* field to lookup the callee address
+*/
+   func = (u8 *) 
fp->aux->func[off]->bpf_func;
+   else
+   return -EINVAL;
+   /* kernel helper call */
+   else
+   func = (u8 *) __bpf_call_base + imm;
 
bpf_jit_emit_func_call(image, ctx, (u64)func);
 
@@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
return 0;
 }
 
+struct powerpc64_jit_data {
+   struct bpf_binary_header *header;
+   u32 *addrs;
+   u8 *image;
+   u32 proglen;
+   struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
u32 proglen;
@@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+   struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+   bool extra_pass = false;
 
if (!fp->jit_requested)
return org_fp;
@@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
fp = tmp_fp;
}
 
+   jit_data = fp->aux->jit_data;
+   if (!jit_data) {
+   jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+   if (!jit_data) {
+   fp = org_fp;
+   goto out;
+   }
+   fp->aux->jit_data = jit_data;
+   }
+
flen = fp->len;
+   addrs = jit_data->addrs;
+   if (addrs) {
+   cgctx = jit_data->ctx;
+   image = jit_data->image;
+   bpf_hdr = jit_data->header;
+   proglen = jit_data->proglen;
+   alloclen = proglen + FUNCTION_DESCR_SIZE;
+   extra_pass = true;
+   goto skip_init_ctx;
+   }
+
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL) {
fp = org_fp;
@@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+   if (bpf_jit_build_body(fp, 0, 

[PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls

2018-05-18 Thread Sandipan Das
The imm field of a bpf instruction is a signed 32-bit integer.
For JIT bpf-to-bpf function calls, it stores the offset of the
start address of the callee's JITed image from __bpf_call_base.

For some architectures, such as powerpc64, this offset may be
as large as 64 bits and cannot be accommodated in the imm field
without truncation.

We resolve this by:

[1] Additionally using the auxiliary data of each function to
keep a list of start addresses of the JITed images for all
functions determined by the verifier.

[2] Retaining the subprog id inside the off field of the call
instructions and using it to index into the list mentioned
above and lookup the callee's address.

To make sure that the existing JIT compilers continue to work
without requiring changes, we keep the imm field as it is.

Signed-off-by: Sandipan Das 
---
 kernel/bpf/verifier.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a9e4b1372da6..6c56cce9c4e3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->src_reg != BPF_PSEUDO_CALL)
continue;
subprog = insn->off;
-   insn->off = 0;
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
}
+
+   /* we use the aux data to keep a list of the start addresses
+* of the JITed images for each function in the program
+*
+* for some architectures, such as powerpc64, the imm field
+* might not be large enough to hold the offset of the start
+* address of the callee's JITed image from __bpf_call_base
+*
+* in such cases, we can lookup the start address of a callee
+* by using its subprog id, available from the off field of
+* the call instruction, as an index for this list
+*/
+   func[i]->aux->func = func;
+   func[i]->aux->func_cnt = env->subprog_cnt + 1;
}
for (i = 0; i < env->subprog_cnt; i++) {
old_bpf_func = func[i]->bpf_func;
-- 
2.14.3



[PATCH bpf v2 0/6] bpf: enhancements for multi-function programs

2018-05-18 Thread Sandipan Das
This patch series introduces the following:

[1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.

[2] Provide a way for resolving function calls because of the way JITed
images are allocated in powerpc64.

[3] Fix to get JITed instruction dumps for multi-function programs from
the bpf system call.

v2:
 - Incorporate review comments from Jakub

Sandipan Das (6):
  bpf: support 64-bit offsets for bpf function calls
  bpf: powerpc64: add JIT support for multi-function programs
  bpf: get kernel symbol addresses via syscall
  tools: bpf: sync bpf uapi header
  tools: bpftool: resolve calls without using imm field
  bpf: fix JITed dump for multi-function programs via syscall

 arch/powerpc/net/bpf_jit_comp64.c | 79 ++-
 include/uapi/linux/bpf.h  |  2 +
 kernel/bpf/syscall.c  | 56 ---
 kernel/bpf/verifier.c | 22 +++
 tools/bpf/bpftool/prog.c  | 29 ++
 tools/bpf/bpftool/xlated_dumper.c | 10 -
 tools/bpf/bpftool/xlated_dumper.h |  2 +
 tools/include/uapi/linux/bpf.h|  2 +
 8 files changed, 179 insertions(+), 23 deletions(-)

-- 
2.14.3



Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode

2018-05-18 Thread Frederic Barrat



On 18/05/2018 at 11:42, Vaibhav Jain wrote:

From: Vaibhav Jain 

Currently we see a kernel-oops reported on Power-9 while attaching a
context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
to anything other than 'none'. The backtrace of the oops is of this
form:

Unable to handle kernel paging request for data at address 0x0080
Faulting instruction address: 0xc0080bcf3b20
cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
 pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
 lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
 sp: c0037f003a80
msr: 90009033
dar: 80
  dsisr: 4000
   current = 0xc0037f28
   paca= 0xc003e600   softe: 3irq_happened: 0x01
 pid   = 3529, comm = afp_no_int

[c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
[c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl]
[c0037f003b90] c0080bcf944c 
cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
[c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
[c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
[c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
[c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
[c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
[c0037f003e30] c000b320 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 10053bb0

The issue is caused because on Power-8 the AFU attr 'prefault_mode' was
used to improve initial storage fault performance by prefaulting
process segments. However on Power-9 with radix mode we don't have
Storage-Segments that we can prefault. Also prefaulting process Pages
will be too costly and fine-grained.

Hence, since the prefaulting mechanism doesn't make sense in
radix mode, this patch updates prefault_mode_store() to not allow any
other value apart from CXL_PREFAULT_NONE when radix mode is enabled.

Cc: 
Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Signed-off-by: Vaibhav Jain 
---


Thanks!
Acked-by: Frederic Barrat 




Change-log:

Resend  ->  Updated the commit description to add more info on the
issue seen [Andrew]
---
  Documentation/ABI/testing/sysfs-class-cxl |  4 +++-
  drivers/misc/cxl/sysfs.c  | 16 
  2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl 
b/Documentation/ABI/testing/sysfs-class-cxl
index 640f65e79ef1..267920a1874b 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -69,7 +69,9 @@ Date:   September 2014
  Contact:linuxppc-dev@lists.ozlabs.org
  Description:read/write
  Set the mode for prefaulting in segments into the segment 
table
-when performing the START_WORK ioctl. Possible values:
+when performing the START_WORK ioctl. Only applicable when
+running under hashed page table mmu.
+Possible values:
  none: No prefaulting (default)
  work_element_descriptor: Treat the work element
   descriptor as an effective address and
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index 4b5a4c5d3c01..629e2e156412 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device,
struct cxl_afu *afu = to_cxl_afu(device);
enum prefault_modes mode = -1;

-   if (!strncmp(buf, "work_element_descriptor", 23))
-   mode = CXL_PREFAULT_WED;
-   if (!strncmp(buf, "all", 3))
-   mode = CXL_PREFAULT_ALL;
if (!strncmp(buf, "none", 4))
mode = CXL_PREFAULT_NONE;
+   else {
+   if (!radix_enabled()) {
+
+   /* only allowed when not in radix mode */
+   if (!strncmp(buf, "work_element_descriptor", 23))
+   mode = CXL_PREFAULT_WED;
+   if (!strncmp(buf, "all", 3))
+   mode = CXL_PREFAULT_ALL;
+   } else {
+   dev_err(device, "Cannot prefault with radix enabled\n");
+   }
+   }

if (mode == -1)
return -EINVAL;





Re: [PATCH 07/14] powerpc: Add support for restartable sequences

2018-05-18 Thread Michael Ellerman
Mathieu Desnoyers  writes:
> - On May 16, 2018, at 9:19 PM, Boqun Feng boqun.f...@gmail.com wrote:
>> On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote:
>>> - On May 16, 2018, at 12:18 PM, Peter Zijlstra pet...@infradead.org 
>>> wrote:
>>> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote:
>>> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> >> index c32a181a7cbb..ed21a777e8c6 100644
>>> >> --- a/arch/powerpc/Kconfig
>>> >> +++ b/arch/powerpc/Kconfig
>>> >> @@ -223,6 +223,7 @@ config PPC
>>> >>  select HAVE_SYSCALL_TRACEPOINTS
>>> >>  select HAVE_VIRT_CPU_ACCOUNTING
>>> >>  select HAVE_IRQ_TIME_ACCOUNTING
>>> >> +select HAVE_RSEQ
>>> >>  select IRQ_DOMAIN
>>> >>  select IRQ_FORCED_THREADING
>>> >>  select MODULES_USE_ELF_RELA
>>> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
>>> >> index 61db86ecd318..d3bb3aaaf5ac 100644
>>> >> --- a/arch/powerpc/kernel/signal.c
>>> >> +++ b/arch/powerpc/kernel/signal.c
>>> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk)
>>> >>  /* Re-enable the breakpoints for the signal stack */
>>> >>  thread_change_pc(tsk, tsk->thread.regs);
>>> >>  
>>> >> +rseq_signal_deliver(tsk->thread.regs);
>>> >> +
>>> >>  if (is32) {
>>> >>  if (ksig.ka.sa.sa_flags & SA_SIGINFO)
>>> >>  ret = handle_rt_signal32(, oldset, tsk);
>>> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned 
>>> >> long
>>> >> thread_info_flags)
>>> >>  if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>>> >>  clear_thread_flag(TIF_NOTIFY_RESUME);
>>> >>  tracehook_notify_resume(regs);
>>> >> +rseq_handle_notify_resume(regs);
>>> >>  }
>>> >>  
>>> >>  user_enter();
>>> > 
>>> > Again no rseq_syscall().
>>> 
>>> Same question for PowerPC as for ARM:
>>> 
>>> Considering that rseq_syscall is implemented as follows:
>>> 
>>> +void rseq_syscall(struct pt_regs *regs)
>>> +{
>>> +   unsigned long ip = instruction_pointer(regs);
>>> +   struct task_struct *t = current;
>>> +   struct rseq_cs rseq_cs;
>>> +
>>> +   if (!t->rseq)
>>> +   return;
>>> +   if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) ||
>>> +   rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs))
>>> +   force_sig(SIGSEGV, t);
>>> +}
>>> 
>>> and that x86 calls it from syscall_return_slowpath() (which AFAIU is
>>> now used in the fast-path since KPTI), I wonder where we should call
>> 
>> So we actually detect this after the syscall takes effect, right? I
>> wonder whether this could be problematic, because "disallowing syscall"
>> in rseq areas may mean the syscall won't take effect to some people, I
>> guess?
>> 
>>> this on PowerPC ?  I was under the impression that PowerPC return to
>>> userspace fast-path was not calling C code unless work flags were set,
>>> but I might be wrong.
>>> 
>> 
>> I think you're right. So we have to introduce callsite to rseq_syscall()
>> in syscall path, something like:
>> 
>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 51695608c68b..a25734a96640 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -222,6 +222,9 @@ system_call_exit:
>>  mtmsrd  r11,1
>> #endif /* CONFIG_PPC_BOOK3E */
>> 
>> +addi    r3,r1,STACK_FRAME_OVERHEAD
>> +bl  rseq_syscall
>> +
>>  ld  r9,TI_FLAGS(r12)
>>  li  r11,-MAX_ERRNO
>>  andi.
>>  
>> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> 
>> But I think it's important for us to first decide where (before or after
>> the syscall) we do the detection.
>
> As Peter said, we don't really care whether it's on syscall entry or exit, as
> long as the process gets killed when the erroneous use is detected. I think 
> doing
> it on syscall exit is a bit easier because we can clearly access the userspace
> TLS, which AFAIU may be less straightforward on syscall entry.

Coming in to the thread late, sorry if I'm missing the point.

> We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you
> proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y.

That sounds good. A function call is not free even if it returns immediately.
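
That is, something like wrapping Boqun's proposed call site (sketch only):

	#ifdef CONFIG_DEBUG_RSEQ
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	rseq_syscall
	#endif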

> On the ARM leg of the email thread, Will Deacon suggests to test whether 
> current->rseq
> is non-NULL before calling rseq_syscall(). I wonder if this added check is 
> justified
> as the assembly level, considering that this is just a debugging option. We 
> already do
> that check at the very beginning of rseq_syscall().

I guess it depends if this is one of those "debugging options" that's
going to end up turned on in distro kernels?

I think in that code we'd need to check 

Re: [PATCH v2 00/10] KVM: PPC: reimplement mmio emulation with analyse_instr()

2018-05-18 Thread Paul Mackerras
On Mon, May 07, 2018 at 02:20:06PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> We already have analyse_instr() which analyzes instructions for the 
> instruction
> type, size, addtional flags, etc. What kvmppc_emulate_loadstore() did is 
> somehow
> duplicated and it will be good to utilize analyse_instr() to reimplement the
> code. The advantage is that the code logic will be shared and more clean to 
> be 
> maintained.

I have put patches 1-3 into my kvm-ppc-next branch.

Thanks,
Paul.


Re: [PATCH 1/3] powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()

2018-05-18 Thread Michael Ellerman
Samuel Mendoza-Jonas  writes:

> On Mon, 2018-05-14 at 22:50 +1000, Michael Ellerman wrote:
>> Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq().
>> 
>> This allows us to avoid sparse warnings caused by passing __be64 to
>> __raw_writeq(), which takes unsigned long:
>> 
>>   arch/powerpc/platforms/powernv/pci-ioda.c:1981:38:
>>   warning: incorrect type in argument 1 (different base types)
>>   expected unsigned long [unsigned] v
>>   got restricted __be64 [usertype] 
>> 
>> It's also generally preferable to use a byte-swapping accessor rather
>> than doing it by hand in the code, which is more bug prone.
>> 
>> Signed-off-by: Michael Ellerman 
>
> For this and the following patches:
>
> Reviewed-by: Samuel Mendoza-Jonas 

Thanks.

cheers


Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes

2018-05-18 Thread Christophe Leroy



On 05/17/2018 03:55 PM, Segher Boessenkool wrote:

On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote:

In my 8xx configuration, I get 208 calls to memcmp()
Within those 208 calls, about half of them have constant sizes,
46 have a size of 8, 17 have a size of 16, only a few have a
size over 16. Other fixed sizes are mostly 4, 6 and 10.

This patch inlines calls to memcmp() when size
is constant and lower than or equal to 16

In my 8xx configuration, this reduces the number of calls
to memcmp() from 208 to 123

The following table shows the number of TB timeticks to perform
a constant size memcmp() before and after the patch depending on
the size

Size    Before   After   Improvement
01:       7577    5682   25%
02:      41668    5682   86%
03:      51137   13258   74%
04:      45455    5682   87%
05:      58713   13258   77%
06:      58712   13258   77%
07:      68183   20834   70%
08:      56819   15153   73%
09:      70077   28411   60%
10:      70077   28411   60%
11:      79546   35986   55%
12:      68182   28411   58%
13:      81440   35986   55%
14:      81440   39774   51%
15:      94697   43562   54%
16:      79546   37881   52%


Could you show results with a more recent GCC?  What version was this?


It was with the latest GCC version I have available in my environment, 
that is GCC 5.4. Is that too old ?


It seems that version inlines memcmp() when length is 1. All other 
lengths call memcmp()




What is this really measuring?  I doubt it takes 7577 (or 5682) timebase
ticks to do a 1-byte memcmp, which is just 3 instructions after all.


Well, I looked again at my tests and it seems some results are wrong; I can't
remember why, I probably did something wrong when I did the tests.


Anyway, the principle is to call a function tstcmpX() 100000 times in
a loop, reading the timebase (mftb) before and after the loop.
Then we subtract from the elapsed time the time spent calling
tstcmp0(), which is only a blr.

Therefore, we really get only the time spent in the comparison itself.
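
In C, the harness is essentially the following (a sketch of the method
described above; tstcmpN()/tstcmp0()/mftb() follow the listings below, the
rest is illustrative):

	static unsigned long bench(int (*tst)(const void *, const void *, size_t),
				   const void *a, const void *b, size_t n)
	{
		unsigned long t0, t1;
		int i;

		t0 = mftb();
		for (i = 0; i < 100000; i++)
			tst(a, b, n);
		t1 = mftb();

		return t1 - t0;
	}

	/* reported time = bench(tstcmpN, a, b, N) - bench(tstcmp0, a, b, N),
	 * where tstcmp0() is an empty function (a single blr), so only the
	 * comparison itself is counted.
	 */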

Here is the loop:

c06243b0:   7f ac 42 e6 mftb    r29
c06243b4:   3f 60 00 01 lis r27,1
c06243b8:   63 7b 86 a0 ori r27,r27,34464
c06243bc:   38 a0 00 02 li  r5,2
c06243c0:   7f c4 f3 78 mr  r4,r30
c06243c4:   7f 83 e3 78 mr  r3,r28
c06243c8:   4b 9e 8c 09 bl  c000cfd0 
c06243cc:   2c 1b 00 01 cmpwi   r27,1
c06243d0:   3b 7b ff ff addi    r27,r27,-1
c06243d4:   40 82 ff e8 bne c06243bc 
c06243d8:   7c ac 42 e6 mftb    r5
c06243dc:   7c bd 28 50 subf    r5,r29,r5
c06243e0:   7c bf 28 50 subf    r5,r31,r5
c06243e4:   38 80 00 02 li  r4,2
c06243e8:   7f 43 d3 78 mr  r3,r26
c06243ec:   4b a2 e4 45 bl  c0052830 

Before the patch:
c000cfc4 :
c000cfc4:   4e 80 00 20 blr

c000cfc8 :
c000cfc8:   38 a0 00 01 li  r5,1
c000cfcc:   48 00 72 08 b   c00141d4 <__memcmp>

c000cfd0 :
c000cfd0:   38 a0 00 02 li  r5,2
c000cfd4:   48 00 72 00 b   c00141d4 <__memcmp>

c000cfd8 :
c000cfd8:   38 a0 00 03 li  r5,3
c000cfdc:   48 00 71 f8 b   c00141d4 <__memcmp>

After the patch:
c000cfc4 :
c000cfc4:   4e 80 00 20 blr

c000cfd8 :
c000cfd8:   88 64 00 00 lbz r3,0(r4)
c000cfdc:   89 25 00 00 lbz r9,0(r5)
c000cfe0:   7c 69 18 50 subf    r3,r9,r3
c000cfe4:   4e 80 00 20 blr

c000cfe8 :
c000cfe8:   a0 64 00 00 lhz r3,0(r4)
c000cfec:   a1 25 00 00 lhz r9,0(r5)
c000cff0:   7c 69 18 50 subf    r3,r9,r3
c000cff4:   4e 80 00 20 blr

c000cff8 :
c000cff8:   a1 24 00 00 lhz r9,0(r4)
c000cffc:   a0 65 00 00 lhz r3,0(r5)
c000d000:   7c 63 48 51 subf.   r3,r3,r9
c000d004:   4c 82 00 20 bnelr
c000d008:   88 64 00 02 lbz r3,2(r4)
c000d00c:   89 25 00 02 lbz r9,2(r5)
c000d010:   7c 69 18 50 subf    r3,r9,r3
c000d014:   4e 80 00 20 blr

c000d018 :
c000d018:   80 64 00 00 lwz r3,0(r4)
c000d01c:   81 25 00 00 lwz r9,0(r5)
c000d020:   7c 69 18 50 subf    r3,r9,r3
c000d024:   4e 80 00 20 blr

c000d028 :
c000d028:   81 24 00 00 lwz r9,0(r4)
c000d02c:   80 65 00 00 lwz r3,0(r5)
c000d030:   7c 63 48 51 subf.   r3,r3,r9
c000d034:   4c 82 00 20 bnelr
c000d038:   88 64 00 04 lbz r3,4(r4)
c000d03c:   89 25 00 04 lbz r9,4(r5)
c000d040:   7c 69 18 50 subf    r3,r9,r3
c000d044:   4e 80 00 20 blr

c000d048 :
c000d048:   81 24 00 00 lwz r9,0(r4)
c000d04c:   80 65 00 00 lwz r3,0(r5)
c000d050:   7c 63 48 51 subf.   r3,r3,r9
c000d054:   4c 82 00 20 bnelr
c000d058:   a0 64 00 04 lhz r3,4(r4)
c000d05c:   a1 25 00 04 lhz r9,4(r5)
c000d060:   7c 69 18 50 subf    r3,r9,r3
c000d064:

[PATCH-RESEND] cxl: Disable prefault_mode in Radix mode

2018-05-18 Thread Vaibhav Jain
From: Vaibhav Jain 

Currently we see a kernel-oops reported on Power-9 while attaching a
context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
to anything other than 'none'. The backtrace of the oops is of this
form:

Unable to handle kernel paging request for data at address 0x0080
Faulting instruction address: 0xc0080bcf3b20
cpu 0x1: Vector: 300 (Data Access) at [c0037f003800]
pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
sp: c0037f003a80
   msr: 90009033
   dar: 80
 dsisr: 4000
  current = 0xc0037f28
  paca= 0xc003e600   softe: 3irq_happened: 0x01
pid   = 3529, comm = afp_no_int

[c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl]
[c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl]
[c0037f003b90] c0080bcf944c 
cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
[c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl]
[c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl]
[c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890
[c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0
[c0037f003df0] c039e270 sys_ioctl+0x40/0xa0
[c0037f003e30] c000b320 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 10053bb0

The issue is caused because on Power-8 the AFU attr 'prefault_mode' was
used to improve initial storage fault performance by prefaulting
process segments. However on Power-9 with radix mode we don't have
Storage-Segments that we can prefault. Also prefaulting process Pages
will be too costly and fine-grained.

Hence, since the prefaulting mechanism doesn't make sense in
radix mode, this patch updates prefault_mode_store() to not allow any
other value apart from CXL_PREFAULT_NONE when radix mode is enabled.

Cc: 
Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Signed-off-by: Vaibhav Jain 
---
Change-log:

Resend  ->  Updated the commit description to add more info on the
issue seen [Andrew]
---
 Documentation/ABI/testing/sysfs-class-cxl |  4 +++-
 drivers/misc/cxl/sysfs.c  | 16 
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl 
b/Documentation/ABI/testing/sysfs-class-cxl
index 640f65e79ef1..267920a1874b 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -69,7 +69,9 @@ Date:   September 2014
 Contact:linuxppc-dev@lists.ozlabs.org
 Description:read/write
 Set the mode for prefaulting in segments into the segment table
-when performing the START_WORK ioctl. Possible values:
+when performing the START_WORK ioctl. Only applicable when
+running under hashed page table mmu.
+Possible values:
 none: No prefaulting (default)
 work_element_descriptor: Treat the work element
  descriptor as an effective address and
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index 4b5a4c5d3c01..629e2e156412 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device,
struct cxl_afu *afu = to_cxl_afu(device);
enum prefault_modes mode = -1;
 
-   if (!strncmp(buf, "work_element_descriptor", 23))
-   mode = CXL_PREFAULT_WED;
-   if (!strncmp(buf, "all", 3))
-   mode = CXL_PREFAULT_ALL;
if (!strncmp(buf, "none", 4))
mode = CXL_PREFAULT_NONE;
+   else {
+   if (!radix_enabled()) {
+
+   /* only allowed when not in radix mode */
+   if (!strncmp(buf, "work_element_descriptor", 23))
+   mode = CXL_PREFAULT_WED;
+   if (!strncmp(buf, "all", 3))
+   mode = CXL_PREFAULT_ALL;
+   } else {
+   dev_err(device, "Cannot prefault with radix enabled\n");
+   }
+   }
 
if (mode == -1)
return -EINVAL;
-- 
2.17.0



[PATCH] powerpc: fix spelling mistake: "Discharching" -> "Discharging"

2018-05-18 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in battery_charging array

Signed-off-by: Colin Ian King 
---
 arch/powerpc/kernel/rtas-proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index d49063d0baa4..ed0ed0c9b7b3 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -504,7 +504,7 @@ static void ppc_rtas_process_sensor(struct seq_file *m,
"EPOW power off" };
const char * battery_cyclestate[]  = { "None", "In progress", 
"Requested" };
-   const char * battery_charging[]= { "Charging", "Discharching", 
+   const char * battery_charging[]= { "Charging", "Discharging",
"No current flow" };
const char * ibm_drconnector[] = { "Empty", "Present", "Unusable", 
"Exchange" };
-- 
2.17.0



Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Greg Kroah-Hartman
On Fri, May 18, 2018 at 10:30:24AM +0200, Jiri Slaby wrote:
> On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote:
> > 4.9-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Jiri Slaby 
> > 
> > commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
> ...
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1458,6 +1458,45 @@ out:
> > return ret;
> >  }
> >  
> > +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user 
> > *uaddr)
> > +{
> > +   unsigned int op = (encoded_op & 0x70000000) >> 28;
> > +   unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
> > +   int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
> > +   int cmparg = sign_extend32(encoded_op & 0x00000fff, 12);
> 
> 12 is wrong here – wherever you apply this, you need also a follow-up fix:
> commit d70ef22892ed6c066e51e118b225923c9b74af34
> Author: Jiri Slaby 
> Date:   Thu Nov 30 15:35:44 2017 +0100
> 
> futex: futex_wake_op, fix sign_extend32 sign bits

Thanks for letting me know, I've now queued it up to the needed trees.

greg k-h


Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Jiri Slaby
On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote:
> 4.9-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Jiri Slaby 
> 
> commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.
...
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1458,6 +1458,45 @@ out:
>   return ret;
>  }
>  
> +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
> +{
> + unsigned int op = (encoded_op & 0x70000000) >> 28;
> + unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
> + int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12);
> + int cmparg = sign_extend32(encoded_op & 0x00000fff, 12);

12 is wrong here – wherever you apply this, you need also a follow-up fix:
commit d70ef22892ed6c066e51e118b225923c9b74af34
Author: Jiri Slaby 
Date:   Thu Nov 30 15:35:44 2017 +0100

futex: futex_wake_op, fix sign_extend32 sign bits
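
For reference, that follow-up changes the sign-bit index in both decodes
(sign_extend32() takes the 0-based index of the sign bit, and these fields
are 12 bits wide), roughly:

	int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
	int cmparg = sign_extend32(encoded_op & 0x00000fff, 11);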

thanks,
-- 
js
suse labs


[PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour

2018-05-18 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Slaby 

commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream.

There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby 
Signed-off-by: Thomas Gleixner 
Acked-by: Russell King 
Acked-by: Michael Ellerman  (powerpc)
Acked-by: Heiko Carstens  [s390]
Acked-by: Chris Metcalf  [for tile]
Reviewed-by: Darren Hart (VMware) 
Reviewed-by: Will Deacon  [core/arm64]
Cc: linux-m...@linux-mips.org
Cc: Rich Felker 
Cc: linux-i...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: pet...@infradead.org
Cc: Benjamin Herrenschmidt 
Cc: Max Filippov 
Cc: Paul Mackerras 
Cc: sparcli...@vger.kernel.org
Cc: Jonas Bonn 
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: Yoshinori Sato 
Cc: linux-hexa...@vger.kernel.org
Cc: Helge Deller 
Cc: "James E.J. Bottomley" 
Cc: Catalin Marinas 
Cc: Matt Turner 
Cc: linux-snps-...@lists.infradead.org
Cc: Fenghua Yu 
Cc: Arnd Bergmann 
Cc: linux-xte...@linux-xtensa.org
Cc: Stefan Kristiansson 
Cc: openr...@lists.librecores.org
Cc: Ivan Kokshaysky 
Cc: Stafford Horne 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Richard Henderson 
Cc: Chris Zankel 
Cc: Michal Simek 
Cc: Tony Luck 
Cc: linux-par...@vger.kernel.org
Cc: Vineet Gupta 
Cc: Ralf Baechle 
Cc: Richard Kuo 
Cc: linux-al...@vger.kernel.org
Cc: Martin Schwidefsky 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" 
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jsl...@suse.cz
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/alpha/include/asm/futex.h  |   26 +++---
 arch/arc/include/asm/futex.h|   40 +++-
 arch/arm/include/asm/futex.h|   26 ++
 arch/arm64/include/asm/futex.h  |   27 ++-
 arch/frv/include/asm/futex.h|3 +-
 arch/frv/kernel/futex.c |   27 ++-
 arch/hexagon/include/asm/futex.h|   38 ++-
 arch/ia64/include/asm/futex.h   |   25 ++
 arch/microblaze/include/asm/futex.h |   38 ++-
 arch/mips/include/asm/futex.h   |   25 ++
 arch/parisc/include/asm/futex.h |   26 ++
 arch/powerpc/include/asm/futex.h|   26 +++---
 arch/s390/include/asm/futex.h   |   23 +++-
 arch/sh/include/asm/futex.h |   26 ++
 arch/sparc/include/asm/futex_64.h   |   26 +++---
 arch/tile/include/asm/futex.h   |   40 +++-
 arch/x86/include/asm/futex.h|   40 +++-
 arch/xtensa/include/asm/futex.h |   27 +++
 include/asm-generic/futex.h |   50 ++--
 kernel/futex.c  |   39 
 20 files changed, 126 insertions(+), 472 deletions(-)

--- a/arch/alpha/include/asm/futex.h
+++ b/arch/alpha/include/asm/futex.h
@@ -29,18 +29,10 @@
:   "r" (uaddr), "r"(oparg) \
:   "memory")
 
-static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
+   u32 __user *uaddr)
 {
-   int op = (encoded_op 

[PATCH] powerpc/perf: Remove sched_task function defined for thread-imc

2018-05-18 Thread Anju T Sudhakar
Call trace observed while running perf-fuzzer:

[  329.228068] CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 
4.13.0-32-generic #35~lp1746225
[  329.228070] task: c03f776ac900 task.stack: c03f77728000
[  329.228071] NIP: c0299b70 LR: c02a4534 CTR: c029bb80
[  329.228073] REGS: c03f7772b760 TRAP: 0700   Not tainted  
(4.13.0-32-generic)
[  329.228073] MSR: 9282b033 
[  329.228079]   CR: 24008822  XER: 
[  329.228080] CFAR: c0299a70 SOFTE: 0
GPR00: c02a4534 c03f7772b9e0 c1606200 c03fef858908
GPR04: c03f776ac900 0001  003fee73
GPR08:   c11220d8 0002
GPR12: c029bb80 c7a3d900  
GPR16:    
GPR20:   c03f776ad090 c0c71354
GPR24: c03fef716780 003fee73 c03fe69d4200 c03f776ad330
GPR28: c11220d8 0001 c14c6108 c03fef858900
[  329.228098] NIP [c0299b70] perf_pmu_sched_task+0x170/0x180
[  329.228100] LR [c02a4534] __perf_event_task_sched_in+0xc4/0x230
[  329.228101] Call Trace:
[  329.228102] [c03f7772b9e0] [c02a0678] 
perf_iterate_sb+0x158/0x2a0 (unreliable)
[  329.228105] [c03f7772ba30] [c02a4534] 
__perf_event_task_sched_in+0xc4/0x230
[  329.228107] [c03f7772bab0] [c01396dc] 
finish_task_switch+0x21c/0x310
[  329.228109] [c03f7772bb60] [c0c71354] __schedule+0x304/0xb80
[  329.228111] [c03f7772bc40] [c0c71c10] schedule+0x40/0xc0
[  329.228113] [c03f7772bc60] [c01033f4] do_wait+0x254/0x2e0
[  329.228115] [c03f7772bcd0] [c0104ac0] kernel_wait4+0xa0/0x1a0
[  329.228117] [c03f7772bd70] [c0104c24] SyS_wait4+0x64/0xc0
[  329.228121] [c03f7772be30] [c000b184] system_call+0x58/0x6c
[  329.228121] Instruction dump:
[  329.228123] 3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 
38210050
[  329.228127] eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe0> 4bbc 6000 
6042
[  329.228131] ---[ end trace 8c46856d314c1811 ]---
[  375.755943] hrtimer: interrupt took 31601 ns


The context-switch callbacks for thread-imc are defined in the sched_task
function. So when thread-imc events are grouped with software PMU events,
perf_pmu_sched_task() hits the WARN_ON_ONCE condition, since software PMUs
are assumed not to have a sched_task callback defined.

Move the thread-imc enable/disable OPAL calls from the sched_task callback
to the event_[add/del] functions.
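
A sketch of the resulting shape (based on the refcounting removed below; the
exact placement inside thread_imc_event_add()/thread_imc_event_del() and the
return value are assumptions, not the literal patch):

	static int thread_imc_event_add(struct perf_event *event, int flags)
	{
		int core_id = smp_processor_id() / threads_per_core;
		struct imc_pmu_ref *ref = &core_imc_refc[core_id];

		/* ... existing LDBAR/memory setup ... */

		mutex_lock(&ref->lock);
		if (ref->refc == 0 &&
		    opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
				get_hard_smp_processor_id(smp_processor_id()))) {
			mutex_unlock(&ref->lock);
			pr_err("thread-imc: Unable to start the counter for core %d\n",
			       core_id);
			return -EINVAL;
		}
		++ref->refc;
		mutex_unlock(&ref->lock);
		return 0;
	}

	/* thread_imc_event_del() does the mirror image: take the lock, drop the
	 * reference, and call opal_imc_counters_stop() when refc reaches zero.
	 */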

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/perf/imc-pmu.c | 108 +---
 1 file changed, 51 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index d7532e7..71d9ba7 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -866,59 +866,6 @@ static int thread_imc_cpu_init(void)
  ppc_thread_imc_cpu_offline);
 }
 
-void thread_imc_pmu_sched_task(struct perf_event_context *ctx,
- bool sched_in)
-{
-   int core_id;
-   struct imc_pmu_ref *ref;
-
-   if (!is_core_imc_mem_inited(smp_processor_id()))
-   return;
-
-   core_id = smp_processor_id() / threads_per_core;
-   /*
-* imc pmus are enabled only when it is used.
-* See if this is triggered for the first time.
-* If yes, take the mutex lock and enable the counters.
-* If not, just increment the count in ref count struct.
-*/
-   ref = &core_imc_refc[core_id];
-   if (!ref)
-   return;
-
-   if (sched_in) {
-   mutex_lock(&ref->lock);
-   if (ref->refc == 0) {
-   if (opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
-   get_hard_smp_processor_id(smp_processor_id()))) {
-   mutex_unlock(&ref->lock);
-   pr_err("thread-imc: Unable to start the counter\
-   for core %d\n", 
core_id);
-   return;
-   }
-   }
-   ++ref->refc;
-   mutex_unlock(&ref->lock);
-   } else {
-   mutex_lock(&ref->lock);
-   ref->refc--;
-   if (ref->refc == 0) {
-   if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,
-   get_hard_smp_processor_id(smp_processor_id()))) {
-   mutex_unlock(&ref->lock);
-   pr_err("thread-imc: Unable to stop the counters\
-   for core %d\n", 
core_id);
-   

[PATCH] cxl: Disable prefault_mode in Radix mode

2018-05-18 Thread Vaibhav Jain
From: Vaibhav Jain 

On Power-8 the AFU attr prefault_mode tried to improve storage fault
performance by prefaulting process segments. However Power-9 radix
mode doesn't have Storage-Segments, and prefaulting Pages is too
fine-grained.

So this patch updates prefault_mode_store() to not allow any other
value apart from CXL_PREFAULT_NONE when radix mode is enabled.

Cc: 
Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Signed-off-by: Vaibhav Jain 
---
 Documentation/ABI/testing/sysfs-class-cxl |  4 +++-
 drivers/misc/cxl/sysfs.c  | 16 
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl 
b/Documentation/ABI/testing/sysfs-class-cxl
index 640f65e79ef1..267920a1874b 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -69,7 +69,9 @@ Date:   September 2014
 Contact:linuxppc-dev@lists.ozlabs.org
 Description:read/write
 Set the mode for prefaulting in segments into the segment table
-when performing the START_WORK ioctl. Possible values:
+when performing the START_WORK ioctl. Only applicable when
+running under hashed page table mmu.
+Possible values:
 none: No prefaulting (default)
 work_element_descriptor: Treat the work element
  descriptor as an effective address and
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index 4b5a4c5d3c01..629e2e156412 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device,
struct cxl_afu *afu = to_cxl_afu(device);
enum prefault_modes mode = -1;
 
-   if (!strncmp(buf, "work_element_descriptor", 23))
-   mode = CXL_PREFAULT_WED;
-   if (!strncmp(buf, "all", 3))
-   mode = CXL_PREFAULT_ALL;
if (!strncmp(buf, "none", 4))
mode = CXL_PREFAULT_NONE;
+   else {
+   if (!radix_enabled()) {
+
+   /* only allowed when not in radix mode */
+   if (!strncmp(buf, "work_element_descriptor", 23))
+   mode = CXL_PREFAULT_WED;
+   if (!strncmp(buf, "all", 3))
+   mode = CXL_PREFAULT_ALL;
+   } else {
+   dev_err(device, "Cannot prefault with radix enabled\n");
+   }
+   }
 
if (mode == -1)
return -EINVAL;
-- 
2.17.0



Re: [PATCH 1/3] powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()

2018-05-18 Thread Samuel Mendoza-Jonas
On Mon, 2018-05-14 at 22:50 +1000, Michael Ellerman wrote:
> Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq().
> 
> This allows us to avoid sparse warnings caused by passing __be64 to
> __raw_writeq(), which takes unsigned long:
> 
>   arch/powerpc/platforms/powernv/pci-ioda.c:1981:38:
>   warning: incorrect type in argument 1 (different base types)
>   expected unsigned long [unsigned] v
>   got restricted __be64 [usertype] 
> 
> It's also generally preferable to use a byte-swapping accessor rather
> than doing it by hand in the code, which is more bug prone.
> 
> Signed-off-by: Michael Ellerman 

For this and the following patches:

Reviewed-by: Samuel Mendoza-Jonas 

> ---
>  arch/powerpc/include/asm/io.h | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index af074923d598..e0331e754568 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -367,6 +367,11 @@ static inline void __raw_writeq(unsigned long v, 
> volatile void __iomem *addr)
>   *(volatile unsigned long __force *)PCI_FIX_ADDR(addr) = v;
>  }
>  
> +static inline void __raw_writeq_be(unsigned long v, volatile void __iomem 
> *addr)
> +{
> + __raw_writeq((__force unsigned long)cpu_to_be64(v), addr);
> +}
> +
>  /*
>   * Real mode versions of the above. Those instructions are only supposed
>   * to be used in hypervisor real mode as per the architecture spec.
> @@ -395,6 +400,11 @@ static inline void __raw_rm_writeq(u64 val, volatile 
> void __iomem *paddr)
>   : : "r" (val), "r" (paddr) : "memory");
>  }
>  
> +static inline void __raw_rm_writeq_be(u64 val, volatile void __iomem *paddr)
> +{
> + __raw_rm_writeq((__force u64)cpu_to_be64(val), paddr);
> +}
> +
>  static inline u8 __raw_rm_readb(volatile void __iomem *paddr)
>  {
>   u8 ret;



Re: [PATCH v4 3/4] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()

2018-05-18 Thread Simon Guo
Hi Michael,
On Fri, May 18, 2018 at 12:13:52AM +1000, Michael Ellerman wrote:
> wei.guo.si...@gmail.com writes:
> > From: Simon Guo 
> >
> > This patch is based on the previous VMX patch on memcmp().
> >
> > To optimize ppc64 memcmp() with VMX instruction, we need to think about
> > the VMX penalty brought with: If kernel uses VMX instruction, it needs
> > to save/restore current thread's VMX registers. There are 32 x 128 bits
> > VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store.
> >
> > The major concern regarding the memcmp() performance in kernel is KSM,
> > who will use memcmp() frequently to merge identical pages. So it will
> > make sense to take some measures/enhancement on KSM to see whether any
> > improvement can be done here.  Cyril Bur indicates that the memcmp() for
> > KSM has a higher possibility to fail (unmatch) early in previous bytes
> > in following mail.
> > https://patchwork.ozlabs.org/patch/817322/#1773629
> > And I am taking a follow-up on this with this patch.
> >
> > Per some testing, it shows KSM memcmp() will fail early at previous 32
> > bytes.  More specifically:
> > - 76% cases will fail/unmatch before 16 bytes;
> > - 83% cases will fail/unmatch before 32 bytes;
> > - 84% cases will fail/unmatch before 64 bytes;
> > So 32 bytes looks a better choice than other bytes for pre-checking.
> >
> > This patch adds a 32 bytes pre-checking firstly before jumping into VMX
> > operations, to avoid the unnecessary VMX penalty. And the testing shows
> > ~20% improvement on memcmp() average execution time with this patch.
> >
> > The detail data and analysis is at:
> > https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md
> >
> > Any suggestion is welcome.
> 
> Thanks for digging into that, really great work.
> 
> I'm inclined to make this not depend on KSM though. It seems like a good
> optimisation to do in general.
> 
> So can we just call it the 'pre-check' or something, and always do it?
> 
Sounds reasonable to me.
I will expand the change to .Ldiffoffset_vmx_cmp case and test accordingly.
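
In C pseudo-code, the pre-check described in the quoted commit message is
roughly the following (the real change lives in the memcmp_64.S assembly;
memcmp_gpr()/memcmp_vmx() are illustrative names only):

	int memcmp(const void *v1, const void *v2, size_t n)
	{
		size_t pre = n < 32 ? n : 32;
		int d = memcmp_gpr(v1, v2, pre);	/* plain GPR loop, no VMX */

		if (d || n <= 32)
			return d;
		/* only pay the VMX save/restore cost once the first
		 * 32 bytes are known to match
		 */
		return memcmp_vmx((const char *)v1 + 32,
				  (const char *)v2 + 32, n - 32);
	}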

Thanks,
- Simon