Re: pkeys on POWER: Access rights not reset on execve
On 05/19/2018 03:50 AM, Andy Lutomirski wrote:
> Now another thread calls pkey_alloc(), so UAMR is asynchronously
> changed, and the thread will write zero to the relevant AMR bits.  If
> I understand correctly, this means that the decision to mask off
> unallocated keys via UAMR effectively forces the initial value of
> newly-allocated keys in other threads in the allocating process to be
> zero, whatever zero means.  (I didn't get far enough in the POWER
> docs to figure out what zero means.)  So

(Note that this belongs on the other thread, here I originally wanted
to talk about the lack of reset of AMR to the default value on
execve.)

I don't think UAMOR is updated asynchronously.  On pkey_alloc, it is
only changed for the current thread, and future threads launched from
it.  Existing threads are unaffected.  This still results in a
programming model which is substantially different from x86.

> I don't think you're doing anyone any favors by making UAMR dynamic.

This is still true, I think.

Thanks,
Florian
Re: [PATCH 1/2] m68k: Set default dma mask for platform devices
On Fri, 18 May 2018, Christoph Hellwig wrote:

> > This implementation of arch_setup_pdev_archdata() differs from the
> > powerpc one, in that this one avoids clobbering a device dma mask
> > which has already been initialized.
>
> I think your implementation should move into the generic implementation
> in drivers/base/platform.c instead of being stuck in m68k.
>
> Also powerpc probably wants fixing, but that's something left to the
> ppc folks..

On powerpc, all platform devices get a dma mask.  But they don't do
that in drivers/base/platform.c, they use arch_setup_pdev_archdata().
Why didn't they take the approach you suggest?

How would I support the claim that replacing an empty platform device
dma mask with 0x is safe on all architectures and platforms?

Is there no code conditional upon dev.coherent_dma_mask or
dev.dma_mask that could misbehave?  (Didn't I cite an example in the
other thread?*)

If you can convince me that it is safe, I'd be happy to submit the
patch you asked for.  For now, I still think that patching the
platform driver was the correct patch*.

Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping:
warn when there is no coherent_dma_mask"), because it assumes that
all dma_ops implementations care about coherent_dma_mask.

The dma_map_ops implementations that do use coherent_dma_mask should
simply fail when it is unset, right?  Would it not be better to
revert your patch and fix the dma_map_ops failure paths, than to
somehow audit all the platform drivers and patch
drivers/base/platform.c?

Thanks.

* https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet

--
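For reference, a minimal sketch of the kind of non-clobbering
arch_setup_pdev_archdata() being discussed.  This is an illustration,
not the submitted patch: it assumes the architecture's pdev_archdata
provides a dma_mask field (as on powerpc), and DMA_BIT_MASK(32) is an
assumed default, since the value in the original message was
truncated:

	/* Hedged sketch: supply a default dma mask only when nothing
	 * has initialized it yet.  The archdata.dma_mask field and
	 * the DMA_BIT_MASK(32) default are assumptions. */
	void arch_setup_pdev_archdata(struct platform_device *pdev)
	{
		if (pdev->dev.coherent_dma_mask == 0)
			pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
		if (pdev->dev.dma_mask == NULL) {
			pdev->archdata.dma_mask = pdev->dev.coherent_dma_mask;
			pdev->dev.dma_mask = &pdev->archdata.dma_mask;
		}
	}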
Re: pkeys on POWER: Default AMR, UAMOR values
On 05/19/2018 02:52 AM, Ram Pai wrote:
>>> The POWER semantics make it very hard for a multithreaded program
>>> to meaningfully use protection keys to prevent accidental access
>>> to important memory.
>>
>> And you can change access rights for unallocated keys (unallocated
>> at thread start time, allocated later) on x86.  I have extended the
>> misc/tst-pkeys test to verify that, and it passes on x86, but not
>> on POWER, where the access rights are stuck.
>
> This is something I do not understand.  How can a thread change
> permissions on a key, that is not even allocated in the first place.

It was allocated by another thread, and there is synchronization so
that the allocation happens before the change in access rights.

> Do you consider a key allocated in some other thread's context, as
> allocated in this thread's context?

Yes, x86 does that.

> If not, does that mean -- On x86, you can activate a key just by
> changing its permission?

This is also true on x86, but just an artifact of the implementation.
You are supposed to call pkey_alloc before changing the flag.

Thanks,
Florian
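To make the supported ordering concrete, a minimal sketch (an
illustration, not code from the thread), assuming the glibc
pkey_alloc/pkey_set wrappers from <sys/mman.h> with _GNU_SOURCE:

	/* Sketch only: allocate a key before touching its access
	   rights, which is the ordering the pkey interfaces expect.  */
	static int
	alloc_and_open_key (void)
	{
	  int key = pkey_alloc (0, PKEY_DISABLE_ACCESS);
	  if (key < 0)
	    return -1;          /* no pkey support */
	  pkey_set (key, 0);    /* allocated, so changing rights is defined */
	  return key;
	}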
Re: pkeys on POWER: Access rights not reset on execve
On 05/19/2018 03:19 AM, Ram Pai wrote:
>> New AMR value (PID 112291, before execl): 0x0c00
>> AMR (PID 112291): 0x0c00

The issue is here.  The second line is after the execl (printed from
the start of main), and the AMR value is not reset to zero.

>> Allocated key (PID 112291): 2
>>
>> I think this is a real bug and needs to be fixed even if the
>> defaults are kept as-is (see the other thread).
>
> The issue you may be talking about here is that -- "when you set the
> AMR register to 0x, it just sets it to 0x0c00."  To me it looks
> like, exec/fork are not related to the issue.  Or are they also
> somehow connected to the issue?

Yes, this is the other issue.  It is not really important here.

Thanks,
Florian
[PATCH 3/3] powerpc/64: hard disable irqs on the panic()ing CPU
Similar to previous patches, hard disable interrupts when a CPU is in
panic.  This reduces the chance the watchdog has to interfere with the
panic, and avoids any other type of masked interrupt being executed
when crashing which minimises the length of the crash path.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/setup-common.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 0af5c11b9e78..d45fb522fe8a 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -706,12 +706,19 @@ EXPORT_SYMBOL(check_legacy_ioport);
 static int ppc_panic_event(struct notifier_block *this,
                              unsigned long event, void *ptr)
 {
+	/*
+	 * panic does a local_irq_disable, but we really
+	 * want interrupts to be hard disabled.
+	 */
+	hard_irq_disable();
+
 	/*
 	 * If firmware-assisted dump has been registered then trigger
 	 * firmware-assisted dump and let firmware handle everything else.
 	 */
 	crash_fadump(NULL, ptr);
-	ppc_md.panic(ptr);  /* May not return */
+	if (ppc_md.panic)
+		ppc_md.panic(ptr);  /* May not return */
 	return NOTIFY_DONE;
 }

@@ -722,7 +729,8 @@ static struct notifier_block ppc_panic_block = {

 void __init setup_panic(void)
 {
-	if (!ppc_md.panic)
+	/* PPC64 always does a hard irq disable in its panic handler */
+	if (!IS_ENABLED(CONFIG_PPC64) && !ppc_md.panic)
 		return;
 	atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block);
 }
--
2.17.0
[PATCH 2/3] powerpc: smp_send_stop do not offline stopped CPUs
Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups.  This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:

  WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
  CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G D
  Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
  [...]
  NIP [c017c21c] set_task_cpu+0x22c/0x240
  LR [c017d580] try_to_wake_up+0x230/0x720
  Call Trace:
  [c1017700] runqueues+0x0/0xb00 (unreliable)
  [c017d580] try_to_wake_up+0x230/0x720
  [c015a214] insert_work+0x104/0x140
  [c015adb0] __queue_work+0x230/0x690
  [c03fc5007910] [c015b26c] queue_work_on+0x5c/0x90
  [c008135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
  [c00813608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
  [c0081362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
  [c00813628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
  [c00813625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
  [c015bcec] process_one_work+0x1bc/0x5f0
  [c015ecac] worker_thread+0xac/0x6b0
  [c0168338] kthread+0x168/0x1b0
  [c000b628] ret_from_kernel_thread+0x5c/0xb4

This happens because firstly the CPU is not really offline in the
usual sense, processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so one
CPU can have marked itself offline, while another CPU is still running
processes or interrupts which can affect the first CPU.

Fix this by just not marking the CPU as offline.  It's more like
frozen in time, so offline does not really reflect its state properly
anyway.  There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/smp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9ca7148b5881..6d6cf14009cf 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -579,9 +579,6 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
 	nmi_ipi_busy_count--;
 	nmi_ipi_unlock();

-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	spin_begin();
 	while (1)
 		spin_cpu_relax();
@@ -596,9 +593,6 @@ void smp_send_stop(void)

 static void stop_this_cpu(void *dummy)
 {
-	/* Remove this CPU */
-	set_cpu_online(smp_processor_id(), false);
-
 	hard_irq_disable();
 	spin_begin();
 	while (1)
--
2.17.0
[PATCH 1/3] powerpc/64: hard disable irqs in panic_smp_self_stop
Similarly to commit 855bfe0de1 ("powerpc: hard disable irqs in
smp_send_stop loop"), irqs should be hard disabled by
panic_smp_self_stop.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/setup_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index b78f142a4148..d122321ad542 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -380,6 +380,14 @@ void early_setup_secondary(void)

 #endif /* CONFIG_SMP */

+void panic_smp_self_stop(void)
+{
+	hard_irq_disable();
+	spin_begin();
+	while (1)
+		spin_cpu_relax();
+}
+
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
--
2.17.0
[PATCH 0/3] further reduce warnings and watchdogs in crash/shutdown paths
Patches 1 and 3 are more of the same, just upgrading
local_irq_disable to hard_irq_disable in places.  Patch 2 fixes a
warning people have been hitting in crash testing on busy systems.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc/64: hard disable irqs in panic_smp_self_stop
  powerpc: smp_send_stop do not offline stopped CPUs
  powerpc/64: hard disable irqs on the panic()ing CPU

 arch/powerpc/kernel/setup-common.c | 12 ++++++++++--
 arch/powerpc/kernel/setup_64.c     |  8 ++++++++
 arch/powerpc/kernel/smp.c          |  6 ------
 3 files changed, 18 insertions(+), 8 deletions(-)

--
2.17.0
Re: pkeys on POWER: Access rights not reset on execve
On Fri, May 18, 2018 at 6:19 PM Ram Pai wrote:

> However the fundamental issue is still the same, as mentioned in the
> other thread.
> "Should the permissions on a key be allowed to be changed, if the key
> is not allocated in the first place?".
> my answer is NO. Lets debate :)

As a preface, here's my two-minute attempt to understand POWER's
behavior: there are two registers, AMR and UAMR.  AMR contains both
kernel-relevant state and user-relevant state.  UAMR specifies which
bits of AMR are available for user code to directly access.  AMR bits
outside UAMR are read as zero and are unaffected by writes.  I'm
assuming that the kernel reserves some subset of AMR bits in advance
to correspond to allocatable pkeys.

Here's my question: given that disallowed AMR bits are read-as-zero,
there can always be a thread that is in the middle of a sequence like:

    unsigned long old = amr;
    amr |= whatever;
    ...   <- thread is here
    amr = old;

Now another thread calls pkey_alloc(), so UAMR is asynchronously
changed, and the thread will write zero to the relevant AMR bits.  If
I understand correctly, this means that the decision to mask off
unallocated keys via UAMR effectively forces the initial value of
newly-allocated keys in other threads in the allocating process to be
zero, whatever zero means.  (I didn't get far enough in the POWER
docs to figure out what zero means.)  So I don't think you're doing
anyone any favors by making UAMR dynamic.

IOW both x86 and POWER screwed up the ISA.
Re: pkeys on POWER: Access rights not reset on execve
On Fri, May 18, 2018 at 04:27:14PM +0200, Florian Weimer wrote:
> This test program:
>
> #include <err.h>
> #include <stdio.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> /* Return the value of the AMR register.  */
> static inline unsigned long int
> pkey_read (void)
> {
>   unsigned long int result;
>   __asm__ volatile ("mfspr %0, 13" : "=r" (result));
>   return result;
> }
>
> /* Overwrite the AMR register with VALUE.  */
> static inline void
> pkey_write (unsigned long int value)
> {
>   __asm__ volatile ("mtspr 13, %0" : : "r" (value));
> }
>
> int
> main (int argc, char **argv)
> {
>   printf ("AMR (PID %d): 0x%016lx\n", (int) getpid (), pkey_read ());
>   if (argc > 1)
>     {
>       int key = syscall (__NR_pkey_alloc, 0, 0);
>       if (key < 0)
>         err (1, "pkey_alloc");
>       printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
>       return 0;
>     }
>
>   pid_t pid = fork ();
>   if (pid == 0)
>     {
>       execl ("/proc/self/exe", argv[0], "subprocess", NULL);
>       _exit (1);
>     }
>   if (pid < 0)
>     err (1, "fork");
>   int status;
>   if (waitpid (pid, &status, 0) < 0)
>     err (1, "waitpid");
>
>   int key = syscall (__NR_pkey_alloc, 0, 0);
>   if (key < 0)
>     err (1, "pkey_alloc");
>   printf ("Allocated key (PID %d): %d\n", (int) getpid (), key);
>
>   unsigned long int amr = -1;
>   printf ("Setting AMR: 0x%016lx\n", amr);
>   pkey_write (amr);
>   printf ("New AMR value (PID %d, before execl): 0x%016lx\n",
>           (int) getpid (), pkey_read ());
>   execl ("/proc/self/exe", argv[0], "subprocess", NULL);
>   err (1, "exec");
>   return 1;
> }
>
> shows that the AMR register value is not reset on execve:
>
> AMR (PID 112291): 0x
> AMR (PID 112292): 0x
> Allocated key (PID 112292): 2
> Allocated key (PID 112291): 2
> Setting AMR: 0x
> New AMR value (PID 112291, before execl): 0x0c00
> AMR (PID 112291): 0x0c00
> Allocated key (PID 112291): 2
>
> I think this is a real bug and needs to be fixed even if the
> defaults are kept as-is (see the other thread).

The issue you may be talking about here is that -- "when you set the
AMR register to 0x, it just sets it to 0x0c00."  To me it looks like,
exec/fork are not related to the issue.  Or are they also somehow
connected to the issue?

The reason the AMR register does not get set to 0x, is because none
of those keys; except key 2, are active.  So it ignores all other
bits and just sets the bits corresponding to key 2.

However the fundamental issue is still the same, as mentioned in the
other thread.
"Should the permissions on a key be allowed to be changed, if the key
is not allocated in the first place?".
my answer is NO. Lets debate :)

RP
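As a hedged illustration of the masking Ram describes (my reading of
the behavior, not text from the thread): a problem-state write to the
AMR only takes effect for bits enabled in UAMOR, roughly:

	/* Sketch: with only key 2 allocated, UAMOR enables only key
	 * 2's bits, so just those bits of the requested value
	 * survive the write. */
	static unsigned long
	effective_amr (unsigned long old_amr, unsigned long requested_amr,
	               unsigned long uamor)
	{
	  return (requested_amr & uamor) | (old_amr & ~uamor);
	}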
Re: pkeys on POWER: Default AMR, UAMOR values
On Fri, May 18, 2018 at 11:13:30PM +0200, Florian Weimer wrote:
> On 05/18/2018 09:39 PM, Andy Lutomirski wrote:
> > The difference is that x86 starts out with deny-all instead of
> > allow-all.

Ah! This explains the discrepancy.  But still does not explain one
thing.. see below.

> > The POWER semantics make it very hard for a multithreaded program
> > to meaningfully use protection keys to prevent accidental access
> > to important memory.
>
> And you can change access rights for unallocated keys (unallocated
> at thread start time, allocated later) on x86.  I have extended the
> misc/tst-pkeys test to verify that, and it passes on x86, but not on
> POWER, where the access rights are stuck.

This is something I do not understand.  How can a thread change
permissions on a key, that is not even allocated in the first place.

Do you consider a key allocated in some other thread's context, as
allocated in this thread's context?

If not, does that mean -- On x86, you can activate a key just by
changing its permission?

RP
Re: pkeys on POWER: Default AMR, UAMOR values
On 05/18/2018 09:39 PM, Andy Lutomirski wrote:
> The difference is that x86 starts out with deny-all instead of
> allow-all.
>
> The POWER semantics make it very hard for a multithreaded program to
> meaningfully use protection keys to prevent accidental access to
> important memory.

And you can change access rights for unallocated keys (unallocated at
thread start time, allocated later) on x86.  I have extended the
misc/tst-pkeys test to verify that, and it passes on x86, but not on
POWER, where the access rights are stuck.  I believe this is due to
an incorrect UAMOR setting.

Thanks,
Florian
Re: pkeys on POWER: Default AMR, UAMOR values
On 05/18/2018 07:44 PM, Ram Pai wrote:
> Florian, is the behavior on x86 any different?  A key allocated in
> the context of one thread is not meaningful in the context of any
> other thread.
>
> Since thread B was created prior to the creation of the key, and the
> key was created in the context of thread A, thread B neither
> inherits the key nor its permissions.  At least that is how the
> semantics are supposed to work as per the man page.
>
> man 7 pkey
>
> "Applications using threads and protection keys should be especially
> careful.  Threads inherit the protection key rights of the parent at
> the time of the clone(2) system call.  Applications should either
> ensure that their own permissions are appropriate for child threads
> at the time when clone(2) is called, or ensure that each child
> thread can perform its own initialization of protection key rights."

I reported two separate issues (actually three, but the execve bug is
in a separate issue): the default, and the write restrictions.

The default is just a difference to x86 (however, x86 can be booted
with init_pkru=0 and behaves the same way, but we're probably going
to remove that).

The POWER implementation has the additional wrinkle that threads
launched early, before key allocation, can never change access rights
because they inherited not just the access rights, but also the
access rights access mask.  This is different from x86, where all
threads can freely update access rights, and contradicts the behavior
in the manpage which says that “each child thread can perform its own
initialization of protection key rights”.  It can't do that if it is
launched before key allocation, which is not the right behavior IMO.

Thanks,
Florian
Re: [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs
See below.

On 05/18/2018 02:57 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:41 PM, Michael Bringmann wrote:
>> [Replace/withdraw previous patch submission to ensure that testing
>> of related patches on similar hardware progresses together.]
>>
>> This patch fixes a memory parsing bug when using of_prop_next_u32
>> calls at the start of a structure.  Depending upon the value of
>> "cur" memory pointer argument to of_prop_next_u32, it will or it
>> won't advance the value of the returned memory pointer by the
>> size of one u32.  This patch corrects the code to deal with that
>> indexing feature when parsing the ibm,drc-info structs for CPUs.
>> Also, need to advance the pointer at the end of
>> of_read_drc_info_cell for same reason.
>>
> I see that you provide an update for of_read_drc_info_cell to fix
> the unexpected behavior you're seeing, but I'm not sure why you're
> updating the code in pseries_energy.c and rpaphp_core.c?  Can you
> provide some additional information as to why these functions also
> need to be updated.

The changes are related.  of_prop_next_u32() does not read a u32 and
then advance the pointer.  It advances the pointer and then reads a
u32.  It does an error check to see whether the u32 read is within
the boundary of the property value, but it returns a pointer to the
u32 that was read.

>
>> Signed-off-by: Michael Bringmann
>> Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new
>> drc-info firmware feature" -- end of patch series applied to
>> powerpc next)
>> ---
>> Changes in V3:
>>   -- Rebased patch to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/platforms/pseries/of_helpers.c     | 5 ++---
>>  arch/powerpc/platforms/pseries/pseries_energy.c | 2 ++
>>  drivers/pci/hotplug/rpaphp_core.c               | 1 +
>>  3 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c
>> b/arch/powerpc/platforms/pseries/of_helpers.c
>> index 6df192f..20598b2 100644
>> --- a/arch/powerpc/platforms/pseries/of_helpers.c
>> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
>> @@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop,
>> const __be32 **curval,
>>
>>  	/* Get drc-index-start:encode-int */
>>  	p2 = (const __be32 *)p;
>> -	p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
>> -	if (!p2)
>> -		return -EINVAL;
>> +	data->drc_index_start = of_read_number(p2, 1);
>
> This appears to resolve advancing the pointer for the beginning of a
> struct.
>
>>
>>  	/* Get drc-name-suffix-start:encode-int */
>>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
>> @@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop,
>> const __be32 **curval,
>>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>>  	if (!p2)
>>  		return -EINVAL;
>> +	p2++;
>
> ...but why is the advancement needed here?  of_prop_next_u32 should
> have advanced it, correct?
>
> -Nathan
>
>>
>>  	/* Should now know end of current entry */
>>  	(*curval) = (void *)p2;
>> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c
>> b/arch/powerpc/platforms/pseries/pseries_energy.c
>> index 6ed2212..c7d84aa 100644
>> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
>> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
>> @@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
>>  	value = of_prop_next_u32(info, NULL, &num_set_entries);
>>  	if (!value)
>>  		goto err_of_node_put;
>> +	value++;
>>
>>  	for (j = 0; j < num_set_entries; j++) {
>>
>> @@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
>>  	value = of_prop_next_u32(info, NULL, &num_set_entries);
>>  	if (!value)
>>  		goto err_of_node_put;
>> +	value++;
>>
>>  	for (j = 0; j < num_set_entries; j++) {
>>
>> diff --git a/drivers/pci/hotplug/rpaphp_core.c
>> b/drivers/pci/hotplug/rpaphp_core.c
>> index fb5e084..dccdf62 100644
>> --- a/drivers/pci/hotplug/rpaphp_core.c
>> +++ b/drivers/pci/hotplug/rpaphp_core.c
>> @@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct
>> device_node *dn, char *drc_name,
>>  	value = of_prop_next_u32(info, NULL, &entries);
>>  	if (!value)
>>  		return -EINVAL;
>> +	value++;
>>
>>  	for (j = 0; j < entries; j++) {
>>  		of_read_drc_info_cell(&info, &value, &drc);
>>

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
m...@linux.vnet.ibm.com
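A short sketch of the calling pattern being described, assuming the
upstream of_prop_next_u32() semantics where the returned cursor
points at the u32 that was just read (illustration only; the helper
name is hypothetical):

	/* Read a leading entry count, then step past it by hand
	 * before parsing the entries, because of_prop_next_u32()
	 * leaves the cursor pointing AT the value it read, not
	 * after it. */
	static int count_entries(struct device_node *dn)
	{
		struct property *info;
		const __be32 *value;
		u32 entries;

		info = of_find_property(dn, "ibm,drc-info", NULL);
		value = of_prop_next_u32(info, NULL, &entries);
		if (!value)
			return -EINVAL;
		value++;  /* now points just past the count */
		/* ... parse 'entries' records starting at 'value' ... */
		return entries;
	}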
Re: [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes
On 05/18/2018 02:28 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:26 PM, Michael Bringmann wrote:
>> powerpc migration/cpu: Now apply changes to the associativity of
>> cpus for the topology of LPARS in Post Migration events.  Recognize
>> more changes to the associativity of memory blocks described by the
>> 'cpu' properties when processing the topology of LPARS in Post
>> Migration events.  Previous efforts only recognized whether a
>> memory block's assignment had changed in the property.  Changes
>> here include:
>>
>> * Provide hotplug CPU 'readd by index' operation
>> * Checking for changes in cpu associativity and making 'readd'
>>   calls when differences are observed.
>> * Queue up changes to CPU properties so that they may take place
>>   after all PowerPC device-tree changes have been applied i.e.
>>   after the device hotplug is released in the mobility code.
>
> This kinda feels like three different patches in one.  Any reason to
> not split this into three patches?  Perhaps at the least split the
> last item into its own patch.

E.g.
1) Conditional 'acquire_drc'
2) PSERIES_HP_ELOG_ACTION_READD call to dlpar_cpu_readd_by_index
3) Add lock/unlock device hotplug

>
>>
>> Signed-off-by: Michael Bringmann
>> ---
>> Changes include:
>>   -- Rearrange patches to co-locate CPU property-related changes.
>>   -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index
>>      acquire or release operations during the CPU readd process.
>>   -- Correct a bug in DRC index selection for queued operation.
>>   -- Rebase to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 123 +++---
>>  arch/powerpc/platforms/pseries/mobility.c    |   3 +
>>  2 files changed, 95 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index a408217..23d4cb8 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct
>> device_node *parent, u32 drc_index)
>>  	);
>>  }
>>
>> -static ssize_t dlpar_cpu_add(u32 drc_index)
>> +static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
>>  {
>>  	struct device_node *dn, *parent;
>>  	int rc, saved_rc;
>> @@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  		return -EINVAL;
>>  	}
>>
>> -	rc = dlpar_acquire_drc(drc_index);
>> -	if (rc) {
>> -		pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
>> -			rc, drc_index);
>> -		of_node_put(parent);
>> -		return -EINVAL;
>> +	if (acquire_drc) {
>> +		rc = dlpar_acquire_drc(drc_index);
>> +		if (rc) {
>> +			pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
>> +				rc, drc_index);
>> +			of_node_put(parent);
>> +			return -EINVAL;
>> +		}
>>  	}
>>
>>  	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>>  	if (!dn) {
>>  		pr_warn("Failed call to configure-connector, drc index: %x\n",
>>  			drc_index);
>> -		dlpar_release_drc(drc_index);
>> +		if (acquire_drc)
>> +			dlpar_release_drc(drc_index);
>>  		of_node_put(parent);
>>  		return -EINVAL;
>>  	}
>> @@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  		pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
>>  			dn->name, rc, drc_index);
>>
>> -		rc = dlpar_release_drc(drc_index);
>> -		if (!rc)
>> +		if (acquire_drc)
>> +			rc = dlpar_release_drc(drc_index);
>> +		if (!rc || acquire_drc)
>>  			dlpar_free_cc_nodes(dn);
>
> This seems like it would be more readable if everything were inside
> the if (acquire_drc) block.

>
>>
>>  		return saved_rc;
>> @@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>>  			dn->name, rc, drc_index);
>>
>>  		rc = dlpar_detach_node(dn);
>> -		if (!rc)
>> +		if (!rc && acquire_drc)
>>  			dlpar_release_drc(drc_index);
>>
>>  		return saved_rc;
>> @@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct
>> device_node *dn)
>>
>>  }
>>
>> -static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>> +static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
>> +				bool release_drc)
>>  {
>>  	int rc;
>>
>> -	pr_debug("Attempting to remove CPU %s, drc index: %x\n",
>> -		 dn->name, drc_index);
>> +	pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
>> +		 dn->name, drc_index, release_drc);
>>
>>  	rc =
Re: [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features
See below.

On 05/18/2018 02:17 PM, Nathan Fontenot wrote:
> On 05/17/2018 05:26 PM, Michael Bringmann wrote:
>> powerpc migration/drmem: Export many of the functions of DRMEM to
>> parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
>> hotplug operations and for Post Migration events.
>>
>> Also modify the DRMEM initialization code to allow it to,
>>
>> * Be called after system initialization
>> * Provide a separate user copy of the LMB array that it produces
>> * Free the user copy upon request
>>
>> Signed-off-by: Michael Bringmann
>> ---
>> Changes in RFC:
>>   -- Separate DRMEM changes into a standalone patch
>>   -- Do not export excess functions.  Make exported names more
>>      explicit.
>>   -- Add new iterator to work through a pair of drmem_info arrays.
>>   -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
>>      dt_mem_next_cell, as these are only available at first boot.
>>   -- Rebase to 4.17-rc5 kernel
>> ---
>>  arch/powerpc/include/asm/drmem.h | 10 +
>>  arch/powerpc/mm/drmem.c          | 78 +++---
>>  2 files changed, 66 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/drmem.h
>> b/arch/powerpc/include/asm/drmem.h
>> index ce242b9..c964b89 100644
>> --- a/arch/powerpc/include/asm/drmem.h
>> +++ b/arch/powerpc/include/asm/drmem.h
>> @@ -35,6 +35,13 @@ struct drmem_lmb_info {
>>  		&drmem_info->lmbs[0],				\
>>  		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
>>
>> +#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2)	\
>> +	for ((lmb1) = (&dinfo1->lmbs[0]),			\
>> +	     (lmb2) = (&dinfo2->lmbs[0]);			\
>> +	     ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&	\
>> +	     ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));	\
>> +	     (lmb1)++, (lmb2)++)
>> +
>>  /*
>>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
>>   * specified in the ibm,dynamic-memory device tree property.
>> @@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>>  			void (*func)(struct drmem_lmb *, const __be32 **));
>>  int drmem_update_dt(void);
>>
>> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
>> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
>> +
>>  #ifdef CONFIG_PPC_PSERIES
>>  void __init walk_drmem_lmbs_early(unsigned long node,
>>  			void (*func)(struct drmem_lmb *, const __be32 **));
>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>> index 3f18036..d9b281c 100644
>> --- a/arch/powerpc/mm/drmem.c
>> +++ b/arch/powerpc/mm/drmem.c
>> @@ -20,6 +20,7 @@
>>
>>  static struct drmem_lmb_info __drmem_info;
>>  struct drmem_lmb_info *drmem_info = &__drmem_info;
>> +static int n_root_addr_cells;
>>
>>  u64 drmem_lmb_memory_max(void)
>>  {
>> @@ -193,12 +194,13 @@ int drmem_update_dt(void)
>>  	return rc;
>>  }
>>
>> -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
>> +static void read_drconf_v1_cell(struct drmem_lmb *lmb,
>>  				       const __be32 **prop)
>>  {
>>  	const __be32 *p = *prop;
>>
>> -	lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
>> +	lmb->base_addr = of_read_number(p, n_root_addr_cells);
>> +	p += n_root_addr_cells;
>>  	lmb->drc_index = of_read_number(p++, 1);
>>
>>  	p++; /* skip reserved field */
>> @@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct
>> drmem_lmb *lmb,
>>  	*prop = p;
>>  }
>>
>> -static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const
>> __be32 *usm,
>> +static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32
>> *data,
>>  		void (*func)(struct drmem_lmb *, const __be32 **))
>>  {
>>  	struct drmem_lmb lmb;
>> @@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const
>> __be32 *prop, const __be32 *usm,
>>
>>  	for (i = 0; i < n_lmbs; i++) {
>>  		read_drconf_v1_cell(&lmb, &prop);
>> -		func(&lmb, &usm);
>> +		func(&lmb, &data);
>
> Is there a need to change the variable name from usm to data (both
> here and below)?

Actually, this was your recommendation to me when we talked about
updating this code a couple of months ago.  Before we changed the
code to create a copy of the DRMEM lmbs and compare it.  I will
change it back.

>
>>  	}
>>  }
>>
>> -static void __init read_drconf_v2_cell(struct of_drconf_cell_v2
>> *dr_cell,
>> +static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>>  				       const __be32 **prop)
>>  {
>>  	const __be32 *p = *prop;
>>
>>  	dr_cell->seq_lmbs = of_read_number(p++, 1);
>> -	dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
>> +	dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
>> +	p += n_root_addr_cells;
>>  	dr_cell->drc_index = of_read_number(p++, 1);
>>  	dr_cell->aa_index = of_read_number(p++, 1);
>>
Re: [RFC v3 1/4] powerpc/hotplug/drcinfo: Fix bugs parsing ibm,drc-info structs
On 05/17/2018 05:41 PM, Michael Bringmann wrote:
> [Replace/withdraw previous patch submission to ensure that testing
> of related patches on similar hardware progresses together.]
>
> This patch fixes a memory parsing bug when using of_prop_next_u32
> calls at the start of a structure.  Depending upon the value of
> "cur" memory pointer argument to of_prop_next_u32, it will or it
> won't advance the value of the returned memory pointer by the
> size of one u32.  This patch corrects the code to deal with that
> indexing feature when parsing the ibm,drc-info structs for CPUs.
> Also, need to advance the pointer at the end of
> of_read_drc_info_cell for same reason.
>

I see that you provide an update for of_read_drc_info_cell to fix the
unexpected behavior you're seeing, but I'm not sure why you're
updating the code in pseries_energy.c and rpaphp_core.c?  Can you
provide some additional information as to why these functions also
need to be updated.

> Signed-off-by: Michael Bringmann
> Fixes: 3f38000eda48 ("powerpc/firmware: Add definitions for new
> drc-info firmware feature" -- end of patch series applied to powerpc
> next)
> ---
> Changes in V3:
>   -- Rebased patch to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/of_helpers.c     | 5 ++---
>  arch/powerpc/platforms/pseries/pseries_energy.c | 2 ++
>  drivers/pci/hotplug/rpaphp_core.c               | 1 +
>  3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/of_helpers.c
> b/arch/powerpc/platforms/pseries/of_helpers.c
> index 6df192f..20598b2 100644
> --- a/arch/powerpc/platforms/pseries/of_helpers.c
> +++ b/arch/powerpc/platforms/pseries/of_helpers.c
> @@ -65,9 +65,7 @@ int of_read_drc_info_cell(struct property **prop,
> const __be32 **curval,
>
>  	/* Get drc-index-start:encode-int */
>  	p2 = (const __be32 *)p;
> -	p2 = of_prop_next_u32(*prop, p2, &data->drc_index_start);
> -	if (!p2)
> -		return -EINVAL;
> +	data->drc_index_start = of_read_number(p2, 1);

This appears to resolve advancing the pointer for the beginning of a
struct.

>
>  	/* Get drc-name-suffix-start:encode-int */
>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_name_suffix_start);
> @@ -88,6 +86,7 @@ int of_read_drc_info_cell(struct property **prop,
> const __be32 **curval,
>  	p2 = of_prop_next_u32(*prop, p2, &data->drc_power_domain);
>  	if (!p2)
>  		return -EINVAL;
> +	p2++;

...but why is the advancement needed here?  of_prop_next_u32 should
have advanced it, correct?

-Nathan

>
>  	/* Should now know end of current entry */
>  	(*curval) = (void *)p2;
> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c
> b/arch/powerpc/platforms/pseries/pseries_energy.c
> index 6ed2212..c7d84aa 100644
> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> @@ -64,6 +64,7 @@ static u32 cpu_to_drc_index(int cpu)
>  	value = of_prop_next_u32(info, NULL, &num_set_entries);
>  	if (!value)
>  		goto err_of_node_put;
> +	value++;
>
>  	for (j = 0; j < num_set_entries; j++) {
>
> @@ -126,6 +127,7 @@ static int drc_index_to_cpu(u32 drc_index)
>  	value = of_prop_next_u32(info, NULL, &num_set_entries);
>  	if (!value)
>  		goto err_of_node_put;
> +	value++;
>
>  	for (j = 0; j < num_set_entries; j++) {
>
> diff --git a/drivers/pci/hotplug/rpaphp_core.c
> b/drivers/pci/hotplug/rpaphp_core.c
> index fb5e084..dccdf62 100644
> --- a/drivers/pci/hotplug/rpaphp_core.c
> +++ b/drivers/pci/hotplug/rpaphp_core.c
> @@ -239,6 +239,7 @@ static int rpaphp_check_drc_props_v2(struct
> device_node *dn, char *drc_name,
>  	value = of_prop_next_u32(info, NULL, &entries);
>  	if (!value)
>  		return -EINVAL;
> +	value++;
>
>  	for (j = 0; j < entries; j++) {
>  		of_read_drc_info_cell(&info, &value, &drc);
>
Re: [PATCH bpf v2 5/6] tools: bpftool: resolve calls without using imm field
On Fri, 18 May 2018 18:20:38 +0530, Sandipan Das wrote:
> Currently, we resolve the callee's address for a JITed function
> call by using the imm field of the call instruction as an offset
> from __bpf_call_base.  If bpf_jit_kallsyms is enabled, we further
> use this address to get the callee's kernel symbol's name.
>
> For some architectures, such as powerpc64, the imm field is not
> large enough to hold this offset.  So, instead of assigning this
> offset to the imm field, the verifier now assigns the subprog
> id.  Also, a list of kernel symbol addresses for all the JITed
> functions is provided in the program info.  We now use the imm
> field as an index for this list to lookup a callee's symbol's
> address and resolve its name.
>
> Suggested-by: Daniel Borkmann
> Signed-off-by: Sandipan Das
> ---
> v2:
>  - Order variables from longest to shortest
>  - Make sure that ksyms_ptr and ksyms_len are always initialized
>  - Simplify code

Thanks for the improvements!  Since there will be v3 two minor nit
picks still :)

>  tools/bpf/bpftool/prog.c          | 29 +
>  tools/bpf/bpftool/xlated_dumper.c | 10 +-
>  tools/bpf/bpftool/xlated_dumper.h |  2 ++
>  3 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 9bdfdf2d3fbe..e2f8f8f259fc 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -421,19 +421,26 @@ static int do_show(int argc, char **argv)
>  static int do_dump(int argc, char **argv)
>  {
>  	struct bpf_prog_info info = {};
> +	unsigned long *addrs = NULL;
>  	struct dump_data dd = {};
>  	__u32 len = sizeof(info);
>  	unsigned int buf_size;
> +	unsigned int nr_addrs;
>  	char *filepath = NULL;
>  	bool opcodes = false;
>  	bool visual = false;
>  	unsigned char *buf;
>  	__u32 *member_len;
>  	__u64 *member_ptr;
> +	__u32 *ksyms_len;
> +	__u64 *ksyms_ptr;
>  	ssize_t n;
>  	int err;
>  	int fd;
>
> +	ksyms_len = &info.nr_jited_ksyms;
> +	ksyms_ptr = &info.jited_ksyms;

I'm not sure why you need these, why not just access
info.nr_jited_ksyms and info.jited_ksyms directly?  "member"
variables are there because jited and xlated images get returned in
different member of struct bpf_prog_info.

>  	if (is_prefix(*argv, "jited")) {
>  		member_len = &info.jited_prog_len;
>  		member_ptr = &info.jited_prog_insns;
> @@ -496,10 +503,22 @@ static int do_dump(int argc, char **argv)
>  		return -1;
>  	}
>
> +	nr_addrs = *ksyms_len;
> +	if (nr_addrs) {
> +		addrs = malloc(nr_addrs * sizeof(__u64));
> +		if (!addrs) {
> +			p_err("mem alloc failed");
> +			close(fd);
> +			goto err_free;
> +		}
> +	}
> +
>  	memset(&info, 0, sizeof(info));
>
>  	*member_ptr = ptr_to_u64(buf);
>  	*member_len = buf_size;
> +	*ksyms_ptr = ptr_to_u64(addrs);
> +	*ksyms_len = nr_addrs;
>
>  	err = bpf_obj_get_info_by_fd(fd, &info, &len);
>  	close(fd);
> @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv)
>  		goto err_free;
>  	}
>
> +	if (*ksyms_len > nr_addrs) {
> +		p_err("too many addresses returned");
> +		goto err_free;
> +	}
> +
>  	if ((member_len == &info.jited_prog_len &&
>  	     info.jited_prog_insns == 0) ||
>  	    (member_len == &info.xlated_prog_len &&
> @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv)
>  		dump_xlated_cfg(buf, *member_len);
>  	} else {
>  		kernel_syms_load(&dd);
> +		dd.nr_jited_ksyms = *ksyms_len;
> +		dd.jited_ksyms = (__u64 *) *ksyms_ptr;
> +
>  		if (json_output)
>  			dump_xlated_json(&dd, buf, *member_len, opcodes);
>  		else
> diff --git a/tools/bpf/bpftool/xlated_dumper.c
> b/tools/bpf/bpftool/xlated_dumper.c
> index 7a3173b76c16..fb065b55db6d 100644
> @@ -203,6 +207,10 @@ static const char *print_call(void *private_data,
>  	unsigned long address = dd->address_call_base + insn->imm;
>  	struct kernel_sym *sym;
>
> +	if (insn->src_reg == BPF_PSEUDO_CALL &&
> +		(__u32) insn->imm < dd->nr_jited_ksyms)

Indentation seems off.

> +		address = dd->jited_ksyms[insn->imm];
> +
>  	sym = kernel_syms_search(dd, address);
>  	if (insn->src_reg == BPF_PSEUDO_CALL)
>  		return print_call_pcrel(dd, sym, address, insn);
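A sketch of what Jakub is suggesting for the first nit (my
illustration, not the v3 patch): drop the ksyms_len/ksyms_ptr
indirection and touch the struct members directly, since only one
pair exists:

	/* Sketch only: no indirection needed for the ksyms pair */
	memset(&info, 0, sizeof(info));
	*member_ptr = ptr_to_u64(buf);
	*member_len = buf_size;
	info.jited_ksyms = ptr_to_u64(addrs);
	info.nr_jited_ksyms = nr_addrs;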
Re: pkeys on POWER: Default AMR, UAMOR values
On Fri, May 18, 2018 at 10:45 AM Ram Pai wrote:

> On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> > I'm working on adding POWER pkeys support to glibc.  The coding
> > work is done, but I'm faced with some test suite failures.
> >
> > Unlike the default x86 configuration, on POWER, existing threads
> > have full access to newly allocated keys.
> >
> > Or, more precisely, in this scenario:
> >
> > * Thread A launches thread B
> > * Thread B waits
> > * Thread A allocates a protection key with pkey_alloc
> > * Thread A applies the key to a page
> > * Thread A signals thread B
> > * Thread B starts to run and accesses the page
> >
> > Then at the end, the access will be granted.
> >
> > I hope it's not too late to change this to denied access.
> >
> > Furthermore, I think the UAMOR value is wrong as well because it
> > prevents thread B at the end to set the AMR register.  In
> > particular, if I do this
> >
> > * … (as before)
> > * Thread A signals thread B
> > * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> > * Thread B reads the current access rights for the key
> >
> > then it still gets 0 (all access permitted) because the original
> > UAMOR value inherited from thread A prior to the key allocation
> > masks out the access right update for the newly allocated key.
>
> Florian, is the behavior on x86 any different?  A key allocated in
> the context of one thread is not meaningful in the context of any
> other thread.

The difference is that x86 starts out with deny-all instead of
allow-all.

The POWER semantics make it very hard for a multithreaded program to
meaningfully use protection keys to prevent accidental access to
important memory.
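To make the scenario concrete, a minimal sketch of the sequence
Florian describes (my illustration, not from the thread), assuming
the glibc >= 2.27 pkey wrappers and pthreads, with error handling
trimmed:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sys/mman.h>

	/* Thread B exists before the key is allocated, then accesses
	 * the protected page.  On x86 the access is denied; the
	 * complaint is that POWER grants it. */
	static pthread_barrier_t barrier;
	static char *page;

	static void *
	thread_b (void *arg)
	{
	  pthread_barrier_wait (&barrier);   /* wait for A's signal */
	  return (void *) (long) page[0];    /* access the protected page */
	}

	int
	main (void)
	{
	  pthread_t thr;
	  pthread_barrier_init (&barrier, NULL, 2);
	  pthread_create (&thr, NULL, thread_b, NULL);   /* A launches B */
	  page = mmap (NULL, 4096, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	  int key = pkey_alloc (0, PKEY_DISABLE_ACCESS);  /* A allocates */
	  pkey_mprotect (page, 4096, PROT_READ | PROT_WRITE, key);
	  pthread_barrier_wait (&barrier);   /* A signals B */
	  pthread_join (thr, NULL);
	  return 0;
	}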
Re: [RFC v4 2/3] powerpc migration/cpu: Associativity & cpu changes
On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/cpu: Now apply changes to the associativity of
> cpus for the topology of LPARS in Post Migration events.  Recognize
> more changes to the associativity of memory blocks described by the
> 'cpu' properties when processing the topology of LPARS in Post
> Migration events.  Previous efforts only recognized whether a memory
> block's assignment had changed in the property.  Changes here
> include:
>
> * Provide hotplug CPU 'readd by index' operation
> * Checking for changes in cpu associativity and making 'readd' calls
>   when differences are observed.
> * Queue up changes to CPU properties so that they may take place
>   after all PowerPC device-tree changes have been applied i.e. after
>   the device hotplug is released in the mobility code.

This kinda feels like three different patches in one.  Any reason to
not split this into three patches?  Perhaps at the least split the
last item into its own patch.

>
> Signed-off-by: Michael Bringmann
> ---
> Changes include:
>   -- Rearrange patches to co-locate CPU property-related changes.
>   -- Modify dlpar_cpu_add & dlpar_cpu_remove to skip DRC index
>      acquire or release operations during the CPU readd process.
>   -- Correct a bug in DRC index selection for queued operation.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 123 +++---
>  arch/powerpc/platforms/pseries/mobility.c    |   3 +
>  2 files changed, 95 insertions(+), 31 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a408217..23d4cb8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -474,7 +474,7 @@ static bool valid_cpu_drc_index(struct
> device_node *parent, u32 drc_index)
>  	);
>  }
>
> -static ssize_t dlpar_cpu_add(u32 drc_index)
> +static ssize_t dlpar_cpu_add(u32 drc_index, bool acquire_drc)
>  {
>  	struct device_node *dn, *parent;
>  	int rc, saved_rc;
> @@ -499,19 +499,22 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  		return -EINVAL;
>  	}
>
> -	rc = dlpar_acquire_drc(drc_index);
> -	if (rc) {
> -		pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
> -			rc, drc_index);
> -		of_node_put(parent);
> -		return -EINVAL;
> +	if (acquire_drc) {
> +		rc = dlpar_acquire_drc(drc_index);
> +		if (rc) {
> +			pr_warn("Failed to acquire DRC, rc: %d, drc index: %x\n",
> +				rc, drc_index);
> +			of_node_put(parent);
> +			return -EINVAL;
> +		}
>  	}
>
>  	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>  	if (!dn) {
>  		pr_warn("Failed call to configure-connector, drc index: %x\n",
>  			drc_index);
> -		dlpar_release_drc(drc_index);
> +		if (acquire_drc)
> +			dlpar_release_drc(drc_index);
>  		of_node_put(parent);
>  		return -EINVAL;
>  	}
> @@ -526,8 +529,9 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  		pr_warn("Failed to attach node %s, rc: %d, drc index: %x\n",
>  			dn->name, rc, drc_index);
>
> -		rc = dlpar_release_drc(drc_index);
> -		if (!rc)
> +		if (acquire_drc)
> +			rc = dlpar_release_drc(drc_index);
> +		if (!rc || acquire_drc)
>  			dlpar_free_cc_nodes(dn);

This seems like it would be more readable if everything were inside
the if (acquire_drc) block.

>
>  		return saved_rc;
> @@ -540,7 +544,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>  			dn->name, rc, drc_index);
>
>  		rc = dlpar_detach_node(dn);
> -		if (!rc)
> +		if (!rc && acquire_drc)
>  			dlpar_release_drc(drc_index);
>
>  		return saved_rc;
> @@ -608,12 +612,13 @@ static int dlpar_offline_cpu(struct device_node *dn)
>
>  }
>
> -static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
> +static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index,
> +				bool release_drc)
> {
>  	int rc;
>
> -	pr_debug("Attempting to remove CPU %s, drc index: %x\n",
> -		 dn->name, drc_index);
> +	pr_debug("Attempting to remove CPU %s, drc index: %x (%d)\n",
> +		 dn->name, drc_index, release_drc);
>
>  	rc = dlpar_offline_cpu(dn);
>  	if (rc) {
>  		return -EINVAL;
>  	}
>
> -	rc = dlpar_release_drc(drc_index);
> -	if (rc) {
> -
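One hedged reading of that suggestion (an illustration, not code from
the thread), which preserves the posted patch's behavior on the
assumption that rc is already nonzero on this failure path:

	/* Sketch only: group the release/free cleanup under the
	 * acquire_drc condition, as Nathan suggests. */
	if (acquire_drc) {
		rc = dlpar_release_drc(drc_index);
		dlpar_free_cc_nodes(dn);
	}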
Re: [RFC v4 1/3] powerpc migration/drmem: Modify DRMEM code to export more features
On 05/17/2018 05:26 PM, Michael Bringmann wrote:
> powerpc migration/drmem: Export many of the functions of DRMEM to
> parse "ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during
> hotplug operations and for Post Migration events.
>
> Also modify the DRMEM initialization code to allow it to,
>
> * Be called after system initialization
> * Provide a separate user copy of the LMB array that it produces
> * Free the user copy upon request
>
> Signed-off-by: Michael Bringmann
> ---
> Changes in RFC:
>   -- Separate DRMEM changes into a standalone patch
>   -- Do not export excess functions.  Make exported names more
>      explicit.
>   -- Add new iterator to work through a pair of drmem_info arrays.
>   -- Modify DRMEM code to replace usages of dt_root_addr_cells, and
>      dt_mem_next_cell, as these are only available at first boot.
>   -- Rebase to 4.17-rc5 kernel
> ---
>  arch/powerpc/include/asm/drmem.h | 10 +
>  arch/powerpc/mm/drmem.c          | 78 +++---
>  2 files changed, 66 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/drmem.h
> b/arch/powerpc/include/asm/drmem.h
> index ce242b9..c964b89 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -35,6 +35,13 @@ struct drmem_lmb_info {
>  		&drmem_info->lmbs[0],				\
>  		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
>
> +#define for_each_pair_drmem_lmb(dinfo1, lmb1, dinfo2, lmb2)	\
> +	for ((lmb1) = (&dinfo1->lmbs[0]),			\
> +	     (lmb2) = (&dinfo2->lmbs[0]);			\
> +	     ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&	\
> +	     ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));	\
> +	     (lmb1)++, (lmb2)++)
> +
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
>   * specified in the ibm,dynamic-memory device tree property.
> @@ -94,6 +101,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
>  			void (*func)(struct drmem_lmb *, const __be32 **));
>  int drmem_update_dt(void);
>
> +struct drmem_lmb_info* drmem_init_lmbs(struct property *prop);
> +void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
> +
>  #ifdef CONFIG_PPC_PSERIES
>  void __init walk_drmem_lmbs_early(unsigned long node,
>  			void (*func)(struct drmem_lmb *, const __be32 **));
> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
> index 3f18036..d9b281c 100644
> --- a/arch/powerpc/mm/drmem.c
> +++ b/arch/powerpc/mm/drmem.c
> @@ -20,6 +20,7 @@
>
>  static struct drmem_lmb_info __drmem_info;
>  struct drmem_lmb_info *drmem_info = &__drmem_info;
> +static int n_root_addr_cells;
>
>  u64 drmem_lmb_memory_max(void)
>  {
> @@ -193,12 +194,13 @@ int drmem_update_dt(void)
>  	return rc;
>  }
>
> -static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
> +static void read_drconf_v1_cell(struct drmem_lmb *lmb,
>  				       const __be32 **prop)
>  {
>  	const __be32 *p = *prop;
>
> -	lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> +	lmb->base_addr = of_read_number(p, n_root_addr_cells);
> +	p += n_root_addr_cells;
>  	lmb->drc_index = of_read_number(p++, 1);
>
>  	p++; /* skip reserved field */
> @@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct
> drmem_lmb *lmb,
>  	*prop = p;
>  }
>
> -static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const
> __be32 *usm,
> +static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32
> *data,
>  		void (*func)(struct drmem_lmb *, const __be32 **))
>  {
>  	struct drmem_lmb lmb;
> @@ -221,17 +223,18 @@ static void __init __walk_drmem_v1_lmbs(const
> __be32 *prop, const __be32 *usm,
>
>  	for (i = 0; i < n_lmbs; i++) {
>  		read_drconf_v1_cell(&lmb, &prop);
> -		func(&lmb, &usm);
> +		func(&lmb, &data);

Is there a need to change the variable name from usm to data (both
here and below)?

>  	}
>  }
>
> -static void __init read_drconf_v2_cell(struct of_drconf_cell_v2
> *dr_cell,
> +static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
>  				       const __be32 **prop)
> {
>  	const __be32 *p = *prop;
>
>  	dr_cell->seq_lmbs = of_read_number(p++, 1);
> -	dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
> +	dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
> +	p += n_root_addr_cells;
>  	dr_cell->drc_index = of_read_number(p++, 1);
>  	dr_cell->aa_index = of_read_number(p++, 1);
>  	dr_cell->flags = of_read_number(p++, 1);
> @@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct
> of_drconf_cell_v2 *dr_cell,
>  	*prop = p;
> }
>
> -static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const
> __be32 *usm,
> +static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32
> *data,
>  		void
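For context, a hedged sketch (mine, not from the patch) of how the
new pair iterator might be used to spot associativity changes between
an old and a new LMB array:

	/* Sketch only: walk two drmem_lmb_info arrays in lockstep
	 * and flag LMBs whose associativity index changed after
	 * migration. */
	static void note_aa_changes(struct drmem_lmb_info *old_dinfo,
				    struct drmem_lmb_info *new_dinfo)
	{
		struct drmem_lmb *old_lmb, *new_lmb;

		for_each_pair_drmem_lmb(old_dinfo, old_lmb,
					new_dinfo, new_lmb) {
			if (old_lmb->aa_index != new_lmb->aa_index)
				pr_debug("LMB %x: associativity changed\n",
					 new_lmb->drc_index);
		}
	}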
Re: [PATCH] crypto: reorder paes test lexicographically
On Fri, May 11, 2018 at 09:04:06AM +0100, Gilad Ben-Yossef wrote:
> Due to a snafu "paes" testmgr tests were not ordered
> lexicographically, which led to boot time warnings.
> Reorder the tests as needed.
>
> Fixes: a794d8d ("crypto: ccree - enable support for hardware keys")
> Reported-by: Abdul Haleem
> Signed-off-by: Gilad Ben-Yossef

Patch applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: nx: fix spelling mistake: "seqeunce" -> "sequence"
On Wed, May 09, 2018 at 10:16:36AM +0100, Colin King wrote:
> From: Colin Ian King
>
> Trivial fix to spelling mistake in CSB_ERR error message text
>
> Signed-off-by: Colin Ian King

Patch applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 07/14] powerpc: Add support for restartable sequences
- On May 17, 2018, at 7:50 PM, Boqun Feng boqun.f...@gmail.com wrote:

[...]

>> > I think you're right. So we have to introduce callsite to rseq_syscall()
>> > in syscall path, something like:
>> >
>> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> > index 51695608c68b..a25734a96640 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -222,6 +222,9 @@ system_call_exit:
>> > 	mtmsrd	r11,1
>> > #endif /* CONFIG_PPC_BOOK3E */
>> >
>> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
>> > +	bl	rseq_syscall
>> > +
>> > 	ld	r9,TI_FLAGS(r12)
>> > 	li	r11,-MAX_ERRNO
>> > 	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>> >

By the way, I think this is not the right spot to call rseq_syscall,
because interrupts are disabled.  I think we should move this hunk
right after system_call_exit.

Would you like to implement and test an updated patch adding those
calls for ppc 32 and 64?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: pkeys on POWER: Default AMR, UAMOR values
On Fri, May 18, 2018 at 03:17:14PM +0200, Florian Weimer wrote:
> I'm working on adding POWER pkeys support to glibc.  The coding work
> is done, but I'm faced with some test suite failures.
>
> Unlike the default x86 configuration, on POWER, existing threads
> have full access to newly allocated keys.
>
> Or, more precisely, in this scenario:
>
> * Thread A launches thread B
> * Thread B waits
> * Thread A allocates a protection key with pkey_alloc
> * Thread A applies the key to a page
> * Thread A signals thread B
> * Thread B starts to run and accesses the page
>
> Then at the end, the access will be granted.
>
> I hope it's not too late to change this to denied access.
>
> Furthermore, I think the UAMOR value is wrong as well because it
> prevents thread B at the end to set the AMR register.  In
> particular, if I do this
>
> * … (as before)
> * Thread A signals thread B
> * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS
> * Thread B reads the current access rights for the key
>
> then it still gets 0 (all access permitted) because the original
> UAMOR value inherited from thread A prior to the key allocation
> masks out the access right update for the newly allocated key.

Florian, is the behavior on x86 any different?  A key allocated in
the context of one thread is not meaningful in the context of any
other thread.  Since thread B was created prior to the creation of
the key, and the key was created in the context of thread A, thread B
neither inherits the key nor its permissions.  At least that is how
the semantics are supposed to work as per the man page.

man 7 pkey

"Applications using threads and protection keys should be especially
careful.  Threads inherit the protection key rights of the parent at
the time of the clone(2) system call.  Applications should either
ensure that their own permissions are appropriate for child threads
at the time when clone(2) is called, or ensure that each child thread
can perform its own initialization of protection key rights."

RP
Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
On 05/18/2018 08:45 PM, Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> The imm field of a bpf instruction is a signed 32-bit integer.
>> For JIT bpf-to-bpf function calls, it stores the offset of the
>> start address of the callee's JITed image from __bpf_call_base.
>>
>> For some architectures, such as powerpc64, this offset may be
>> as large as 64 bits and cannot be accommodated in the imm field
>> without truncation.
>>
>> We resolve this by:
>>
>> [1] Additionally using the auxiliary data of each function to
>>     keep a list of start addresses of the JITed images for all
>>     functions determined by the verifier.
>>
>> [2] Retaining the subprog id inside the off field of the call
>>     instructions and using it to index into the list mentioned
>>     above and lookup the callee's address.
>>
>> To make sure that the existing JIT compilers continue to work
>> without requiring changes, we keep the imm field as it is.
>>
>> Signed-off-by: Sandipan Das
>> ---
>>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a9e4b1372da6..6c56cce9c4e3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>  		    insn->src_reg != BPF_PSEUDO_CALL)
>>  			continue;
>>  		subprog = insn->off;
>> -		insn->off = 0;
>>  		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>  			func[subprog]->bpf_func -
>>  			__bpf_call_base;
>>  	}
>> +
>> +	/* we use the aux data to keep a list of the start addresses
>> +	 * of the JITed images for each function in the program
>> +	 *
>> +	 * for some architectures, such as powerpc64, the imm field
>> +	 * might not be large enough to hold the offset of the start
>> +	 * address of the callee's JITed image from __bpf_call_base
>> +	 *
>> +	 * in such cases, we can lookup the start address of a callee
>> +	 * by using its subprog id, available from the off field of
>> +	 * the call instruction, as an index for this list
>> +	 */
>> +	func[i]->aux->func = func;
>> +	func[i]->aux->func_cnt = env->subprog_cnt + 1;
>
> The target tree you have here is in fact bpf, since in bpf-next
> there was a cleanup where the + 1 is removed.  Just for the record
> that we need to keep this in mind for bpf into bpf-next merge since
> this would otherwise subtly break.

Sorry about the wrong tag.  This series is indeed based off bpf-next.

- Sandipan

>>  	}
>>  	for (i = 0; i < env->subprog_cnt; i++) {
>>  		old_bpf_func = func[i]->bpf_func;
>>
Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
On Fri, May 18, 2018 at 5:50 AM, Sandipan Das wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>  	__u32 xlated_prog_len;
>  	__aligned_u64 jited_prog_insns;
>  	__aligned_u64 xlated_prog_insns;
> +	__aligned_u64 jited_ksyms;
> +	__u32 nr_jited_ksyms;
>  	__u64 load_time;	/* ns since boottime */
>  	__u32 created_by_uid;
>  	__u32 nr_map_ids;

this breaks uapi. New fields can only be added to the end.
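A hedged sketch of the fix Alexei is asking for (my illustration, not
the actual resend): append the new members after every field that
existing binaries already know about, so the offsets of shipped
fields are unchanged.  The assumption here is that netns_ino was the
last member of the then-current struct:

	/* Sketch only: new fields go at the tail of the uapi struct */
	struct bpf_prog_info {
		/* ... existing fields, unchanged and in order ... */
		__u64 netns_dev;
		__u64 netns_ino;
		/* new members appended at the very end */
		__u32 nr_jited_ksyms;
		__aligned_u64 jited_ksyms;
	} __attribute__((aligned(8)));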
Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler.  The JIT compiler converts the bpf call instructions
>>> to native branch instructions.  After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known.  Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits.  So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address.  Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxiliary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++-
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>  			      struct codegen_context *ctx,
>>> -			      u32 *addrs)
>>> +			      u32 *addrs, bool extra_pass)
>>>  {
>>>  	const struct bpf_insn *insn = fp->insnsi;
>>>  	int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>  		break;
>>>
>>>  	/*
>>> -	 * Call kernel helper
>>> +	 * Call kernel helper or bpf function
>>>  	 */
>>>  	case BPF_JMP | BPF_CALL:
>>>  		ctx->seen |= SEEN_FUNC;
>>> -		func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +		/* bpf function call */
>>> +		if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some
>> dummy address as otherwise the 'kernel helper call' branch used for
>> this is a bit misleading in that sense.  The PPC_LI64() used in
>> bpf_jit_emit_func_call() optimizes the immediate addr, I presume the
>> JIT can handle situations where in the final extra_pass the image
>> needs to grow/shrink again (due to different final address for the
>> call)?
>
> That's a good catch.  We don't handle that -- we expect to get the
> size right on first pass.  We could probably have PPC_FUNC_ADDR() pad
> the result with nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order
to always emit 4 insns for it.

Thanks,
Daniel
Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
Daniel Borkmann wrote: On 05/18/2018 02:50 PM, Sandipan Das wrote: This adds support for bpf-to-bpf function calls in the powerpc64 JIT compiler. The JIT compiler converts the bpf call instructions to native branch instructions. After a round of the usual passes, the start addresses of the JITed images for the callee functions are known. Finally, to fixup the branch target addresses, we need to perform an extra pass. Because of the address range in which JITed images are allocated on powerpc64, the offsets of the start addresses of these images from __bpf_call_base are as large as 64 bits. So, for a function call, we cannot use the imm field of the instruction to determine the callee's address. Instead, we use the alternative method of getting it from the list of function addresses in the auxillary data of the caller by using the off field as an index. Signed-off-by: Sandipan Das--- arch/powerpc/net/bpf_jit_comp64.c | 79 ++- 1 file changed, 69 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 1bdb1aff0619..25939892d8f7 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 /* Assemble the body code between the prologue & epilogue */ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs) + u32 *addrs, bool extra_pass) { const struct bpf_insn *insn = fp->insnsi; int flen = fp->len; @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, break; /* -* Call kernel helper +* Call kernel helper or bpf function */ case BPF_JMP | BPF_CALL: ctx->seen |= SEEN_FUNC; - func = (u8 *) __bpf_call_base + imm; + + /* bpf function call */ + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass) Perhaps it might make sense here for !extra_pass to set func to some dummy address as otherwise the 'kernel helper call' branch used for this is a bit misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call() optimizes the immediate addr, I presume the JIT can handle situations where in the final extra_pass the image needs to grow/shrink again (due to different final address for the call)? That's a good catch. We don't handle that -- we expect to get the size right on first pass. We could probably have PPC_FUNC_ADDR() pad the result with nops to make it a constant 5-instruction sequence. 
+ if (fp->aux->func && off < fp->aux->func_cnt) + /* use the subprog id from the off +* field to lookup the callee address +*/ + func = (u8 *) fp->aux->func[off]->bpf_func; + else + return -EINVAL; + /* kernel helper call */ + else + func = (u8 *) __bpf_call_base + imm; bpf_jit_emit_func_call(image, ctx, (u64)func); @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, return 0; } +struct powerpc64_jit_data { + struct bpf_binary_header *header; + u32 *addrs; + u8 *image; + u32 proglen; + struct codegen_context ctx; +}; + struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) { u32 proglen; @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) u8 *image = NULL; u32 *code_base; u32 *addrs; + struct powerpc64_jit_data *jit_data; struct codegen_context cgctx; int pass; int flen; @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) struct bpf_prog *org_fp = fp; struct bpf_prog *tmp_fp; bool bpf_blinded = false; + bool extra_pass = false; if (!fp->jit_requested) return org_fp; @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) fp = tmp_fp; } + jit_data = fp->aux->jit_data; + if (!jit_data) { + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); + if (!jit_data) { + fp = org_fp; + goto out; + } + fp->aux->jit_data = jit_data; + } + flen = fp->len; + addrs = jit_data->addrs; + if (addrs) { + cgctx = jit_data->ctx; +
Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
On 05/18/2018 02:50 PM, Sandipan Das wrote: > Currently, for multi-function programs, we cannot get the JITed > instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD > command. Because of this, userspace tools such as bpftool fail > to identify a multi-function program as being JITed or not. > > With the JIT enabled and the test program running, this can be > verified as follows: > > # cat /proc/sys/net/core/bpf_jit_enable > 1 > > Before applying this patch: > > # bpftool prog list > 1: kprobe name foo tag b811aab41a39ad3d gpl > loaded_at 2018-05-16T11:43:38+0530 uid 0 > xlated 216B not jited memlock 65536B > ... > > # bpftool prog dump jited id 1 > no instructions returned > > After applying this patch: > > # bpftool prog list > 1: kprobe name foo tag b811aab41a39ad3d gpl > loaded_at 2018-05-16T12:13:01+0530 uid 0 > xlated 216B jited 308B memlock 65536B > ... That's really nice! One comment inline below: > # bpftool prog dump jited id 1 > 0: nop > 4: nop > 8: mflrr0 > c: std r0,16(r1) > 10: stdur1,-112(r1) > 14: std r31,104(r1) > 18: addir31,r1,48 > 1c: li r3,10 > ... > > Signed-off-by: Sandipan Das> --- > kernel/bpf/syscall.c | 38 -- > 1 file changed, 32 insertions(+), 6 deletions(-) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 54a72fafe57c..2430d159078c 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > struct bpf_prog_info info = {}; > u32 info_len = attr->info.info_len; > char __user *uinsns; > - u32 ulen; > + u32 ulen, i; > int err; > > err = check_uarg_tail_zero(uinfo, sizeof(info), info_len); > @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > ulen = min_t(u32, info.nr_map_ids, ulen); > if (ulen) { > u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); > - u32 i; > > for (i = 0; i < ulen; i++) > if (put_user(prog->aux->used_maps[i]->id, > @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, >* for offload. >*/ > ulen = info.jited_prog_len; > - info.jited_prog_len = prog->jited_len; > + if (prog->aux->func_cnt) { > + info.jited_prog_len = 0; > + for (i = 0; i < prog->aux->func_cnt; i++) > + info.jited_prog_len += prog->aux->func[i]->jited_len; > + } else { > + info.jited_prog_len = prog->jited_len; > + } > + > if (info.jited_prog_len && ulen) { > if (bpf_dump_raw_ok()) { > uinsns = u64_to_user_ptr(info.jited_prog_insns); > ulen = min_t(u32, info.jited_prog_len, ulen); > - if (copy_to_user(uinsns, prog->bpf_func, ulen)) > - return -EFAULT; > + > + /* for multi-function programs, copy the JITed > + * instructions for all the functions > + */ > + if (prog->aux->func_cnt) { > + u32 len, free; > + u8 *img; > + > + free = ulen; > + for (i = 0; i < prog->aux->func_cnt; i++) { > + len = prog->aux->func[i]->jited_len; > + img = (u8 *) > prog->aux->func[i]->bpf_func; > + if (len > free) > + break; > + if (copy_to_user(uinsns, img, len)) > + return -EFAULT; > + uinsns += len; > + free -= len; Is there any way we can introduce a delimiter between the different images such that they could be more easily correlated with the call from the main (or other sub-)program instead of having one contiguous dump blob? 
> + } > + } else { > + if (copy_to_user(uinsns, prog->bpf_func, ulen)) > + return -EFAULT; > + } > } else { > info.jited_prog_insns = 0; > } > @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > if (info.nr_jited_ksyms && ulen) { > u64 __user *user_jited_ksyms = > u64_to_user_ptr(info.jited_ksyms); > ulong ksym_addr; > - u32 i; > > /* copy the address of the kernel symbol corresponding to >
Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
On 05/18/2018 02:50 PM, Sandipan Das wrote: > This adds new two new fields to struct bpf_prog_info. For > multi-function programs, these fields can be used to pass > a list of kernel symbol addresses for all functions in a > given program and to userspace using the bpf system call > with the BPF_OBJ_GET_INFO_BY_FD command. > > When bpf_jit_kallsyms is enabled, we can get the address > of the corresponding kernel symbol for a callee function > and resolve the symbol's name. The address is determined > by adding the value of the call instruction's imm field > to __bpf_call_base. This offset gets assigned to the imm > field by the verifier. > > For some architectures, such as powerpc64, the imm field > is not large enough to hold this offset. > > We resolve this by: > > [1] Assigning the subprog id to the imm field of a call > instruction in the verifier instead of the offset of > the callee's symbol's address from __bpf_call_base. > > [2] Determining the address of a callee's corresponding > symbol by using the imm field as an index for the > list of kernel symbol addresses now available from > the program info. > > Suggested-by: Daniel Borkmann> Signed-off-by: Sandipan Das > --- > include/uapi/linux/bpf.h | 2 ++ > kernel/bpf/syscall.c | 20 > kernel/bpf/verifier.c| 7 +-- > 3 files changed, 23 insertions(+), 6 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index d94d333a8225..040c9cac7303 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -2188,6 +2188,8 @@ struct bpf_prog_info { > __u32 xlated_prog_len; > __aligned_u64 jited_prog_insns; > __aligned_u64 xlated_prog_insns; > + __aligned_u64 jited_ksyms; > + __u32 nr_jited_ksyms; > __u64 load_time;/* ns since boottime */ > __u32 created_by_uid; > __u32 nr_map_ids; > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index bfcde949c7f8..54a72fafe57c 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > if (!capable(CAP_SYS_ADMIN)) { > info.jited_prog_len = 0; > info.xlated_prog_len = 0; > + info.nr_jited_ksyms = 0; > goto done; > } > > @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog > *prog, > } > } > > + ulen = info.nr_jited_ksyms; > + info.nr_jited_ksyms = prog->aux->func_cnt; > + if (info.nr_jited_ksyms && ulen) { Since this exposes addresses (though masked one which is correct), this definitely needs to be guarded with bpf_dump_raw_ok() like we do in other places here (see JIT dump for example). > + u64 __user *user_jited_ksyms = > u64_to_user_ptr(info.jited_ksyms); > + ulong ksym_addr; > + u32 i; > + > + /* copy the address of the kernel symbol corresponding to > + * each function > + */ > + ulen = min_t(u32, info.nr_jited_ksyms, ulen); > + for (i = 0; i < ulen; i++) { > + ksym_addr = (ulong) prog->aux->func[i]->bpf_func; > + ksym_addr &= PAGE_MASK; > + if (put_user((u64) ksym_addr, _jited_ksyms[i])) > + return -EFAULT; > + } > + } > + > done: > if (copy_to_user(uinfo, , info_len) || > put_user(info_len, >info.info_len)) > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 6c56cce9c4e3..e826c396aba2 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env) >* later look the same as if they were interpreted only. 
>*/ > for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) { > - unsigned long addr; > - > if (insn->code != (BPF_JMP | BPF_CALL) || > insn->src_reg != BPF_PSEUDO_CALL) > continue; > insn->off = env->insn_aux_data[i].call_imm; > subprog = find_subprog(env, i + insn->off + 1); > - addr = (unsigned long)func[subprog]->bpf_func; Hmm, in current bpf tree this says 'subprog + 1' here, so this is not rebased against bpf tree but bpf-next (unlike what subject says)? https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351 > - addr &= PAGE_MASK; > - insn->imm = (u64 (*)(u64, u64, u64, u64, u64)) > - addr - __bpf_call_base; > + insn->imm = subprog; > } > > prog->jited = 1; >
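Concretely, the guard Daniel is asking for would mirror the existing jited_prog_insns handling in bpf_prog_get_info_by_fd() (a sketch on top of the patch, not the final fix):

	ulen = info.nr_jited_ksyms;
	info.nr_jited_ksyms = prog->aux->func_cnt;
	if (info.nr_jited_ksyms && ulen) {
		if (bpf_dump_raw_ok()) {
			/* ... copy the page-masked symbol addresses,
			 * as in the patch ...
			 */
		} else {
			info.jited_ksyms = 0;
		}
	}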
Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
On 05/18/2018 02:50 PM, Sandipan Das wrote: > The imm field of a bpf instruction is a signed 32-bit integer. > For JIT bpf-to-bpf function calls, it stores the offset of the > start address of the callee's JITed image from __bpf_call_base. > > For some architectures, such as powerpc64, this offset may be > as large as 64 bits and cannot be accomodated in the imm field > without truncation. > > We resolve this by: > > [1] Additionally using the auxillary data of each function to > keep a list of start addresses of the JITed images for all > functions determined by the verifier. > > [2] Retaining the subprog id inside the off field of the call > instructions and using it to index into the list mentioned > above and lookup the callee's address. > > To make sure that the existing JIT compilers continue to work > without requiring changes, we keep the imm field as it is. > > Signed-off-by: Sandipan Das> --- > kernel/bpf/verifier.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index a9e4b1372da6..6c56cce9c4e3 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env) > insn->src_reg != BPF_PSEUDO_CALL) > continue; > subprog = insn->off; > - insn->off = 0; > insn->imm = (u64 (*)(u64, u64, u64, u64, u64)) > func[subprog]->bpf_func - > __bpf_call_base; > } > + > + /* we use the aux data to keep a list of the start addresses > + * of the JITed images for each function in the program > + * > + * for some architectures, such as powerpc64, the imm field > + * might not be large enough to hold the offset of the start > + * address of the callee's JITed image from __bpf_call_base > + * > + * in such cases, we can lookup the start address of a callee > + * by using its subprog id, available from the off field of > + * the call instruction, as an index for this list > + */ > + func[i]->aux->func = func; > + func[i]->aux->func_cnt = env->subprog_cnt + 1; The target tree you have here is infact bpf, since in bpf-next there was a cleanup where the + 1 is removed. Just for the record that we need to keep this in mind for bpf into bpf-next merge since this would otherwise subtly break. > } > for (i = 0; i < env->subprog_cnt; i++) { > old_bpf_func = func[i]->bpf_func; >
Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
On 05/18/2018 02:50 PM, Sandipan Das wrote: > This adds support for bpf-to-bpf function calls in the powerpc64 > JIT compiler. The JIT compiler converts the bpf call instructions > to native branch instructions. After a round of the usual passes, > the start addresses of the JITed images for the callee functions > are known. Finally, to fixup the branch target addresses, we need > to perform an extra pass. > > Because of the address range in which JITed images are allocated > on powerpc64, the offsets of the start addresses of these images > from __bpf_call_base are as large as 64 bits. So, for a function > call, we cannot use the imm field of the instruction to determine > the callee's address. Instead, we use the alternative method of > getting it from the list of function addresses in the auxillary > data of the caller by using the off field as an index. > > Signed-off-by: Sandipan Das> --- > arch/powerpc/net/bpf_jit_comp64.c | 79 > ++- > 1 file changed, 69 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 1bdb1aff0619..25939892d8f7 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct > codegen_context *ctx, u32 > /* Assemble the body code between the prologue & epilogue */ > static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, > struct codegen_context *ctx, > - u32 *addrs) > + u32 *addrs, bool extra_pass) > { > const struct bpf_insn *insn = fp->insnsi; > int flen = fp->len; > @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 > *image, > break; > > /* > - * Call kernel helper > + * Call kernel helper or bpf function >*/ > case BPF_JMP | BPF_CALL: > ctx->seen |= SEEN_FUNC; > - func = (u8 *) __bpf_call_base + imm; > + > + /* bpf function call */ > + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass) Perhaps it might make sense here for !extra_pass to set func to some dummy address as otherwise the 'kernel helper call' branch used for this is a bit misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call() optimizes the immediate addr, I presume the JIT can handle situations where in the final extra_pass the image needs to grow/shrink again (due to different final address for the call)? 
> + if (fp->aux->func && off < fp->aux->func_cnt) > + /* use the subprog id from the off > + * field to lookup the callee address > + */ > + func = (u8 *) > fp->aux->func[off]->bpf_func; > + else > + return -EINVAL; > + /* kernel helper call */ > + else > + func = (u8 *) __bpf_call_base + imm; > > bpf_jit_emit_func_call(image, ctx, (u64)func); > > @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 > *image, > return 0; > } > > +struct powerpc64_jit_data { > + struct bpf_binary_header *header; > + u32 *addrs; > + u8 *image; > + u32 proglen; > + struct codegen_context ctx; > +}; > + > struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > { > u32 proglen; > @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > u8 *image = NULL; > u32 *code_base; > u32 *addrs; > + struct powerpc64_jit_data *jit_data; > struct codegen_context cgctx; > int pass; > int flen; > @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > struct bpf_prog *org_fp = fp; > struct bpf_prog *tmp_fp; > bool bpf_blinded = false; > + bool extra_pass = false; > > if (!fp->jit_requested) > return org_fp; > @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) > fp = tmp_fp; > } > > + jit_data = fp->aux->jit_data; > + if (!jit_data) { > + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); > + if (!jit_data) { > + fp = org_fp; > + goto out; > + } > + fp->aux->jit_data = jit_data; > + } > + > flen = fp->len; > + addrs = jit_data->addrs; > + if (addrs) { > + cgctx = jit_data->ctx; > + image = jit_data->image; > + bpf_hdr = jit_data->header; > + proglen
Re: [PATCH 0/7] Miscellaneous patches and bug fixes
Uma, > This patch series adds a few improvements to the cxlflash driver and it > also contains a couple of bug fixes. Applied to 4.18/scsi-queue, thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes
On Fri, May 18, 2018 at 12:35:48PM +0200, Christophe Leroy wrote: > On 05/17/2018 03:55 PM, Segher Boessenkool wrote: > >On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote: > >>In my 8xx configuration, I get 208 calls to memcmp() > >Could you show results with a more recent GCC? What version was this? > > It was with the latest GCC version I have available in my environment, > that is GCC 5.4. Is that too old ? Since GCC 7 the compiler knows how to do this, for powerpc; in GCC 8 it has improved still. > It seems that version inlines memcmp() when length is 1. All other > lengths call memcmp() Yup. > c000d018 : > c000d018: 80 64 00 00 lwz r3,0(r4) > c000d01c: 81 25 00 00 lwz r9,0(r5) > c000d020: 7c 69 18 50 subf r3,r9,r3 > c000d024: 4e 80 00 20 blr This is incorrect, it does not get the sign of the result correct. Say when comparing 0xff 0xff 0xff 0xff to 0 0 0 0. This should return positive, but it returns negative. For Power9 GCC does lwz 3,0(3) lwz 9,0(4) cmpld 7,3,9 setb 3,7 and for Power7/Power8, lwz 9,0(3) lwz 3,0(4) subfc 3,3,9 popcntd 3,3 subfe 9,9,9 or 3,3,9 (and it gives up for earlier CPUs, there is no nice simple code sequence as far as we know. Code size matters when generating inline code). (Generating code for -m32 it is the same, just w instead of d in a few places). > c000d09c : > c000d09c: 81 25 00 04 lwz r9,4(r5) > c000d0a0: 80 64 00 04 lwz r3,4(r4) > c000d0a4: 81 04 00 00 lwz r8,0(r4) > c000d0a8: 81 45 00 00 lwz r10,0(r5) > c000d0ac: 7c 69 18 10 subfc r3,r9,r3 > c000d0b0: 7d 2a 41 10 subfe r9,r10,r8 > c000d0b4: 7d 2a fe 70 srawi r10,r9,31 > c000d0b8: 7d 48 4b 79 or. r8,r10,r9 > c000d0bc: 4d a2 00 20 bclr+ 12,eq > c000d0c0: 7d 23 4b 78 mr r3,r9 > c000d0c4: 4e 80 00 20 blr > This shows that on PPC32, the 8 bytes comparison is not optimal, I will > improve it. It's not correct either (same problem as with length 4). Segher
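A tiny userspace demonstration of the sign problem Segher points out (plain C; assumes the word compare is lowered to a 32-bit subtract, as in the inlined sequence above):

#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned char a[4] = { 0xff, 0xff, 0xff, 0xff };
	unsigned char b[4] = { 0x00, 0x00, 0x00, 0x00 };
	unsigned int x = 0xffffffffu, y = 0;

	/* memcmp compares bytes as unsigned values, so this is > 0 */
	printf("memcmp: %d\n", memcmp(a, b, 4));
	/* a 32-bit subtract truncates the result: -1, wrong sign */
	printf("subf:   %d\n", (int)(x - y));
	return 0;
}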
[PATCH] powerpc/32: Implement csum_ipv6_magic in assembly
The generic csum_ipv6_magic() generates a pretty bad result : 0: 81 23 00 00 lwz r9,0(r3) 4: 81 03 00 04 lwz r8,4(r3) 8: 7c e7 4a 14 add r7,r7,r9 c: 7d 29 38 10 subfc r9,r9,r7 10: 7d 4a 51 10 subfe r10,r10,r10 14: 7d 27 42 14 add r9,r7,r8 18: 7d 2a 48 50 subf r9,r10,r9 1c: 80 e3 00 08 lwz r7,8(r3) 20: 7d 08 48 10 subfc r8,r8,r9 24: 7d 4a 51 10 subfe r10,r10,r10 28: 7d 29 3a 14 add r9,r9,r7 2c: 81 03 00 0c lwz r8,12(r3) 30: 7d 2a 48 50 subf r9,r10,r9 34: 7c e7 48 10 subfc r7,r7,r9 38: 7d 4a 51 10 subfe r10,r10,r10 3c: 7d 29 42 14 add r9,r9,r8 40: 7d 2a 48 50 subf r9,r10,r9 44: 80 e4 00 00 lwz r7,0(r4) 48: 7d 08 48 10 subfc r8,r8,r9 4c: 7d 4a 51 10 subfe r10,r10,r10 50: 7d 29 3a 14 add r9,r9,r7 54: 7d 2a 48 50 subf r9,r10,r9 58: 81 04 00 04 lwz r8,4(r4) 5c: 7c e7 48 10 subfc r7,r7,r9 60: 7d 4a 51 10 subfe r10,r10,r10 64: 7d 29 42 14 add r9,r9,r8 68: 7d 2a 48 50 subf r9,r10,r9 6c: 80 e4 00 08 lwz r7,8(r4) 70: 7d 08 48 10 subfc r8,r8,r9 74: 7d 4a 51 10 subfe r10,r10,r10 78: 7d 29 3a 14 add r9,r9,r7 7c: 7d 2a 48 50 subf r9,r10,r9 80: 81 04 00 0c lwz r8,12(r4) 84: 7c e7 48 10 subfc r7,r7,r9 88: 7d 4a 51 10 subfe r10,r10,r10 8c: 7d 29 42 14 add r9,r9,r8 90: 7d 2a 48 50 subf r9,r10,r9 94: 7d 08 48 10 subfc r8,r8,r9 98: 7d 4a 51 10 subfe r10,r10,r10 9c: 7d 29 2a 14 add r9,r9,r5 a0: 7d 2a 48 50 subf r9,r10,r9 a4: 7c a5 48 10 subfc r5,r5,r9 a8: 7c 63 19 10 subfe r3,r3,r3 ac: 7d 29 32 14 add r9,r9,r6 b0: 7d 23 48 50 subf r9,r3,r9 b4: 7c c6 48 10 subfc r6,r6,r9 b8: 7c 63 19 10 subfe r3,r3,r3 bc: 7c 63 48 50 subf r3,r3,r9 c0: 54 6a 80 3e rotlwi r10,r3,16 c4: 7c 63 52 14 add r3,r3,r10 c8: 7c 63 18 f8 not r3,r3 cc: 54 63 84 3e rlwinm r3,r3,16,16,31 d0: 4e 80 00 20 blr This patch implements it in assembly for PPC32 Link: https://github.com/linuxppc/linux/issues/9 Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/checksum.h | 8 arch/powerpc/lib/checksum_32.S | 33 + 2 files changed, 41 insertions(+) diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h index 54065caa40b3..c41c280c252f 100644 --- a/arch/powerpc/include/asm/checksum.h +++ b/arch/powerpc/include/asm/checksum.h @@ -13,6 +13,7 @@ #include #else #include +#include /* * Computes the checksum of a memory block at src, length len, * and adds in "sum" (32-bit), while copying the block to dst.
@@ -211,6 +212,13 @@ static inline __sum16 ip_compute_csum(const void *buff, int len) return csum_fold(csum_partial(buff, len, 0)); } +#ifdef CONFIG_PPC32 +#define _HAVE_ARCH_IPV6_CSUM +__sum16 csum_ipv6_magic(const struct in6_addr *saddr, + const struct in6_addr *daddr, + __u32 len, __u8 proto, __wsum sum); +#endif + #endif #endif /* __KERNEL__ */ #endif diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S index 9a671c774b22..f9c047145696 100644 --- a/arch/powerpc/lib/checksum_32.S +++ b/arch/powerpc/lib/checksum_32.S @@ -293,3 +293,36 @@ dst_error: EX_TABLE(51b, dst_error); EXPORT_SYMBOL(csum_partial_copy_generic) + +/* + * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr, + * const struct in6_addr *daddr, + * __u32 len, __u8 proto, __wsum sum) + */ + +_GLOBAL(csum_ipv6_magic) + lwz r8, 0(r3) + lwz r9, 4(r3) + lwz r10, 8(r3) + lwz r11, 12(r3) + addc r0, r5, r6 + adde r0, r0, r7 + adde r0, r0, r8 + adde r0, r0, r9 + adde r0, r0, r10 + adde r0, r0, r11 + lwz r8, 0(r4) + lwz r9, 4(r4) + lwz r10, 8(r4) + lwz r11, 12(r4) + adde r0, r0, r8 + adde r0, r0, r9 + adde r0, r0, r10 + adde r0, r0, r11 + addze r0, r0 + rotlwi r3, r0, 16 + add r3, r0, r3 + not r3, r3 + rlwinm r3, r3, 16, 16, 31 + blr +EXPORT_SYMBOL(csum_ipv6_magic) -- 2.13.3
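For readers who don't speak PPC32 assembly, here is a minimal C model of what the routine computes (an illustrative sketch only, not kernel code): the eight 32-bit address words plus len, proto and the incoming sum are accumulated with end-around carry (the addc/adde/addze chain), then the 32-bit sum is folded to 16 bits and complemented (rotlwi/add/not/rlwinm):

#include <stdint.h>

static uint16_t csum_ipv6_magic_model(const uint32_t saddr[4],
				      const uint32_t daddr[4],
				      uint32_t len, uint32_t proto,
				      uint32_t sum)
{
	uint64_t acc = (uint64_t)len + proto + sum;
	uint32_t s;
	int i;

	for (i = 0; i < 4; i++)
		acc += (uint64_t)saddr[i] + daddr[i];
	/* fold the carries back in, as addc/adde/addze do */
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffffffu) + (acc >> 32);
	s = (uint32_t)acc;
	s += (s << 16) | (s >> 16);	/* rotlwi + add */
	return (uint16_t)(~s >> 16);	/* not + rlwinm */
}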
Re: pkeys on POWER: Default AMR, UAMOR values
On Fri, May 18, 2018 at 6:17 AM Florian Weimer wrote: > I'm working on adding POWER pkeys support to glibc. The coding work is > done, but I'm faced with some test suite failures. > Unlike the default x86 configuration, on POWER, existing threads have > full access to newly allocated keys. > Or, more precisely, in this scenario: > * Thread A launches thread B > * Thread B waits > * Thread A allocates a protection key with pkey_alloc > * Thread A applies the key to a page > * Thread A signals thread B > * Thread B starts to run and accesses the page > Then at the end, the access will be granted. > I hope it's not too late to change this to denied access. > Furthermore, I think the UAMOR value is wrong as well because it > prevents thread B at the end from setting the AMR register. In particular, if > I do this > * … (as before) > * Thread A signals thread B > * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS > * Thread B reads the current access rights for the key > then it still gets 0 (all access permitted) because the original UAMOR > value inherited from thread A prior to the key allocation masks out the > access right update for the newly allocated key. This type of issue is why I think that a good protection key ISA would not have a usermode read-the-whole-register or write-the-whole-register operation at all. It's still not clear to me that there is any good kernel-mode solution. But at least x86 defaults to deny-everything, which is more annoying but considerably safer than POWER's behavior. --Andy
pkeys on POWER: Access rights not reset on execve
This test program: #include <err.h> #include <stdio.h> #include <sys/syscall.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> /* Return the value of the AMR register. */ static inline unsigned long int pkey_read (void) { unsigned long int result; __asm__ volatile ("mfspr %0, 13" : "=r" (result)); return result; } /* Overwrite the AMR register with VALUE. */ static inline void pkey_write (unsigned long int value) { __asm__ volatile ("mtspr 13, %0" : : "r" (value)); } int main (int argc, char **argv) { printf ("AMR (PID %d): 0x%016lx\n", (int) getpid (), pkey_read()); if (argc > 1) { int key = syscall (__NR_pkey_alloc, 0, 0); if (key < 0) err (1, "pkey_alloc"); printf ("Allocated key (PID %d): %d\n", (int) getpid (), key); return 0; } pid_t pid = fork (); if (pid == 0) { execl ("/proc/self/exe", argv[0], "subprocess", NULL); _exit (1); } if (pid < 0) err (1, "fork"); int status; if (waitpid (pid, &status, 0) < 0) err (1, "waitpid"); int key = syscall (__NR_pkey_alloc, 0, 0); if (key < 0) err (1, "pkey_alloc"); printf ("Allocated key (PID %d): %d\n", (int) getpid (), key); unsigned long int amr = -1; printf ("Setting AMR: 0x%016lx\n", amr); pkey_write (amr); printf ("New AMR value (PID %d, before execl): 0x%016lx\n", (int) getpid (), pkey_read()); execl ("/proc/self/exe", argv[0], "subprocess", NULL); err (1, "exec"); return 1; } shows that the AMR register value is not reset on execve: AMR (PID 112291): 0x AMR (PID 112292): 0x Allocated key (PID 112292): 2 Allocated key (PID 112291): 2 Setting AMR: 0x New AMR value (PID 112291, before execl): 0x0c00 AMR (PID 112291): 0x0c00 Allocated key (PID 112291): 2 I think this is a real bug and needs to be fixed even if the defaults are kept as-is (see the other thread). (Seen on 4.17.0-rc5.) Thanks, Florian
[GIT PULL] Please pull powerpc/linux.git powerpc-4.17-6 tag
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Linus, Please pull some more powerpc fixes for 4.17: The following changes since commit 6c0a8f6b5a45ac892a763b6299bd3c5324fc5e02: powerpc/pseries: Fix CONFIG_NUMA=n build (2018-05-08 14:59:56 +1000) are available in the git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.17-6 for you to fetch changes up to c1d2a31397ec51f0370f6bd17b19b39152c263cb: powerpc/powernv: Fix NVRAM sleep in invalid context when crashing (2018-05-18 00:23:07 +1000) - powerpc fixes for 4.17 #6 Just three commits. The two cxl ones are not fixes per se, but they modify code that was added this cycle so that it will work with a recent firmware change. And then a fix for a recent commit that added sleeps in the NVRAM code, which needs to be more careful and not sleep if eg. we're called in the panic() path. Thanks to: Nicholas Piggin, Philippe Bergheaud, Christophe Lombard. - Nicholas Piggin (1): powerpc/powernv: Fix NVRAM sleep in invalid context when crashing Philippe Bergheaud (2): cxl: Set the PBCQ Tunnel BAR register when enabling capi mode cxl: Report the tunneled operations status Documentation/ABI/testing/sysfs-class-cxl | 8 arch/powerpc/platforms/powernv/opal-nvram.c | 14 -- drivers/misc/cxl/cxl.h | 1 + drivers/misc/cxl/pci.c | 12 drivers/misc/cxl/sysfs.c| 10 ++ 5 files changed, 43 insertions(+), 2 deletions(-) -BEGIN PGP SIGNATURE- iQIcBAEBCAAGBQJa/tj4AAoJEFHr6jzI4aWAZBoP/2/kddNPHzMwnzcx0B5WzEfo 3JQcTLZPEqsb5n0xwaWOgb0o0u8EpHDtPZlJlZZWwPKbphlZvQ+reAJAXCQXXj6P n6BrbfpHlJo1J2wZVeihOuVTtSiXqq82YEtCzdUCUQNa8+2rQ7bLOB56cLdnyFWx wvK/xOM4rG/nPDi3tlH+IvhNeUdc2Y1vCB6N/RVvJVJOduCED1zCxcWj7uJYGt1x UdFPcah4OPSF9qPemReu6CjiwesvEm3y3S9TYLLJLXBDmKvs7BDRbfczP52tY90P XMErJN4MtnNasotCJVcZylfNhSTCyyaQ4si+G/COAM0eh8oPdcIH3C2yXnQHXVPz pUpe7TF3N7DOoa66WHXsh9dr9G7E0DhcpBgpz30/HmYasaYSEYpDsCH6UvbmvpzO xtyocfgtuyaF28Rz4WVYSVGkJwE/+NhdsvfQUFI6DGQXoVu08Xxx3rhOlNKVecTw 8yuaIpPSz0i1pFg3SRa1MaS0F4FqGs0T3OLFsEihjdsc3EpJeB3Vp0IiqbmhR6Pa 1rhsjPY4yS5tPJS9K4CQszgw3Ke6C7KYIttGo7BxVKPLXsp52HhTqFpp0LHDcw8y zFjOS7kh07NCgn8tSwMLUTRUUXEDLE2fJOVWCDdvkXEzpd/LfaP15gjZ3m66OMTS Oj72EjBAGNX4dN4yQuo7 =4xib -END PGP SIGNATURE-
Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads
"Gautham R. Shenoy"writes: > diff --git a/arch/powerpc/kernel/setup-common.c > b/arch/powerpc/kernel/setup-common.c > index 0af5c11..884dff2 100644 > --- a/arch/powerpc/kernel/setup-common.c > +++ b/arch/powerpc/kernel/setup-common.c > @@ -436,8 +438,56 @@ static void __init cpu_init_thread_core_maps(int tpc) > printk(KERN_DEBUG " (thread shift is %d)\n", threads_shift); > } > > - > u32 *cpu_to_phys_id = NULL; > +/* > + * check_for_interleaved_big_core - Checks if the core represented by > + *dn is a big-core whose threads are interleavings of the > + *threads of the component small cores. > + * > + * @dn: device node corresponding to the core. > + * > + * Returns true if the core is a interleaved big-core. > + * Returns false otherwise. > + */ > +static inline bool check_for_interleaved_big_core(struct device_node *dn) > +{ > + int len, nr_groups, threads_per_group; > + const __be32 *thread_groups; > + __be32 *thread_list, *first_cpu_idx; > + int cur_cpu, next_cpu, i, j; > + > + thread_groups = of_get_property(dn, "ibm,thread-groups", ); > + if (!thread_groups) > + return false; There are better device tree APIs than bare of_get_property() these days, can you try to use those? > + nr_groups = be32_to_cpu(*(thread_groups + 1)); > + if (nr_groups <= 1) > + return false; eg. this would be of_property_read_u32_index() > + > + threads_per_group = be32_to_cpu(*(thread_groups + 2)); > + thread_list = (__be32 *)thread_groups + 3; > + > + /* > + * In case of an interleaved big-core, the thread-ids of the > + * big-core can be obtained by interleaving the the thread-ids > + * of the component small > + * > + * Eg: On a 8-thread big-core with two SMT4 small cores, the > + * threads of the two component small cores will be > + * {0, 2, 4, 6} and {1, 3, 5, 7}. > + */ > + for (i = 0; i < nr_groups; i++) { > + first_cpu_idx = thread_list + i * threads_per_group; > + > + for (j = 0; j < threads_per_group - 1; j++) { > + cur_cpu = be32_to_cpu(*(first_cpu_idx + j)); > + next_cpu = be32_to_cpu(*(first_cpu_idx + j + 1)); > + if (next_cpu != cur_cpu + nr_groups) > + return false; > + } > + } > + return true; > +} > > /** > * setup_cpu_maps - initialize the following cpu maps: > @@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void) > vdso_data->processorCount = num_present_cpus(); > #endif /* CONFIG_PPC64 */ > > -/* Initialize CPU <=> thread mapping/ > + dn = of_find_node_by_type(NULL, "cpu"); > + if (dn) { > + if (check_for_interleaved_big_core(dn)) { > + has_interleaved_big_core = true; > + pr_info("Detected interleaved big-cores\n"); > + } > + of_node_put(dn); > + } This is a bit untidy, given how unlikely it is that you would have no CPUs :) You should be able to do the lookup of the property and the setting of has_interleaved_big_core all inside check_for_interleaved_big_core(). cheers
pkeys on POWER: Default AMR, UAMOR values
I'm working on adding POWER pkeys support to glibc. The coding work is done, but I'm faced with some test suite failures. Unlike the default x86 configuration, on POWER, existing threads have full access to newly allocated keys. Or, more precisely, in this scenario: * Thread A launches thread B * Thread B waits * Thread A allocates a protection key with pkey_alloc * Thread A applies the key to a page * Thread A signals thread B * Thread B starts to run and accesses the page Then at the end, the access will be granted. I hope it's not too late to change this to denied access. Furthermore, I think the UAMOR value is wrong as well because it prevents thread B at the end from setting the AMR register. In particular, if I do this * … (as before) * Thread A signals thread B * Thread B sets the access rights for the key to PKEY_DISABLE_ACCESS * Thread B reads the current access rights for the key then it still gets 0 (all access permitted) because the original UAMOR value inherited from thread A prior to the key allocation masks out the access right update for the newly allocated key. Thanks, Florian
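In code, the first scenario looks roughly like this (a sketch using the glibc syscall wrappers; page setup and thread synchronization are elided, and signal_thread_b is a hypothetical helper):

	/* Thread A */
	int key = pkey_alloc (0, 0);
	pkey_mprotect (page, pagesize, PROT_READ | PROT_WRITE, key);
	signal_thread_b ();

	/* Thread B, after the signal: x86 denies this access because
	   new keys default to access-disabled in threads that did not
	   allocate them; on POWER, as described above, it succeeds. */
	*(volatile char *) page;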
Re: [PATCH 1/2] powerpc: Detect the presence of big-core with interleaved threads
Gautham R Shenoy writes: ... >> > @@ -565,7 +615,16 @@ void __init smp_setup_cpu_maps(void) >> >vdso_data->processorCount = num_present_cpus(); >> > #endif /* CONFIG_PPC64 */ >> > >> > -/* Initialize CPU <=> thread mapping/ >> > + dn = of_find_node_by_type(NULL, "cpu"); >> > + if (dn) { >> > + if (check_for_interleaved_big_core(dn)) { >> > + has_interleaved_big_core = true; >> > + pr_info("Detected interleaved big-cores\n"); >> >> Is there a runtime way to check this also? If the dmesg buffer overflows, we >> lose this. > > Where do you suggest we put this ? Should it be a part of > /proc/cpuinfo ? Hmm, it'd be nice not to pollute it with more junk. Can you just look at the pir files in sysfs? eg. on a normal system: # cd /sys/devices/system/cpu # grep . cpu[0-7]/pir cpu0/pir:20 cpu1/pir:21 cpu2/pir:22 cpu3/pir:23 cpu4/pir:24 cpu5/pir:25 cpu6/pir:26 cpu7/pir:27 cheers
Re: [PATCH 2/2] powerpc: Enable ASYM_SMT on interleaved big-core systems
Gautham R Shenoy writes: > On Mon, May 14, 2018 at 01:22:07PM +1000, Michael Neuling wrote: >> On Fri, 2018-05-11 at 16:47 +0530, Gautham R. Shenoy wrote: >> > From: "Gautham R. Shenoy" >> > >> > Each of the SMT4 cores forming a fused-core are more or less >> > independent units. Thus when multiple tasks are scheduled to run on >> > the fused core, we get the best performance when the tasks are spread >> > across the pair of SMT4 cores. >> > >> > Since the threads in the pair of SMT4 cores of an interleaved big-core >> > are numbered {0,2,4,6} and {1,3,5,7} respectively, enable ASYM_SMT on >> > such interleaved big-cores that will bias the load-balancing of tasks >> > on smaller numbered threads, which will automatically result in >> > spreading the tasks uniformly across the associated pair of SMT4 >> > cores. >> > >> > Signed-off-by: Gautham R. Shenoy >> > --- >> > arch/powerpc/kernel/smp.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c >> > index 9ca7148..0153f01 100644 >> > --- a/arch/powerpc/kernel/smp.c >> > +++ b/arch/powerpc/kernel/smp.c >> > @@ -1082,7 +1082,7 @@ static int powerpc_smt_flags(void) >> > { >> >int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES; >> > >> > - if (cpu_has_feature(CPU_FTR_ASYM_SMT)) { >> > + if (cpu_has_feature(CPU_FTR_ASYM_SMT) || has_interleaved_big_core) { >> >> Shouldn't we just set CPU_FTR_ASYM_SMT and leave this code > unchanged? > > Yes, that would have the same effect. I refrained from doing that > since I thought CPU_FTR_ASYM_SMT has the "lower numbered threads > expedite thread-folding" connotation from the POWER7 generation. The above code is the only use of the feature, so I don't think we need to worry about any other connotations. > If it is ok to overload CPU_FTR_ASYM_SMT, we can do what you suggest > and have all the changes in setup-common.c Yeah let's do that. cheers
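The agreed direction would then look something like this in smp_setup_cpu_maps() (a sketch; the placement is mine, not from a posted patch):

	dn = of_find_node_by_type(NULL, "cpu");
	if (dn) {
		if (check_for_interleaved_big_core(dn)) {
			cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
			pr_info("Detected interleaved big-cores\n");
		}
		of_node_put(dn);
	}

which leaves powerpc_smt_flags() untouched.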
[PATCH v2] powerpc/lib: Adjust .balign inside string functions for PPC32
commit 87a156fb18fe1 ("Align hot loops of some string functions") degraded the performance of string functions by adding useless nops. A simple benchmark on an 8xx calling 10x a memchr() that matches the first byte runs in 41668 TB ticks before this patch and in 35986 TB ticks after this patch. So this gives an improvement of approx 10%. Another benchmark doing the same with a memchr() matching the 128th byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks after this patch, so regardless on the number of loops, removing those useless nops improves the test by 5683 TB ticks. Fixes: 87a156fb18fe1 ("Align hot loops of some string functions") Signed-off-by: Christophe Leroy --- v2: Define IFETCH_ALIGN_SHIFT for PPC32 and use IFETCH_ALIGN_BYTES for the alignment arch/powerpc/include/asm/cache.h | 3 +++ arch/powerpc/lib/string.S| 7 --- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index c1d257aa4c2d..66298461b640 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -9,11 +9,14 @@ #if defined(CONFIG_PPC_8xx) || defined(CONFIG_403GCX) #define L1_CACHE_SHIFT 4 #define MAX_COPY_PREFETCH 1 +#define IFETCH_ALIGN_SHIFT 2 #elif defined(CONFIG_PPC_E500MC) #define L1_CACHE_SHIFT 6 #define MAX_COPY_PREFETCH 4 +#define IFETCH_ALIGN_SHIFT 3 #elif defined(CONFIG_PPC32) #define MAX_COPY_PREFETCH 4 +#define IFETCH_ALIGN_SHIFT 3 /* 603 fetches 2 insn at a time */ #if defined(CONFIG_PPC_47x) #define L1_CACHE_SHIFT 7 #else diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S index a787776822d8..0378def28d41 100644 --- a/arch/powerpc/lib/string.S +++ b/arch/powerpc/lib/string.S @@ -12,6 +12,7 @@ #include #include #include +#include .text @@ -23,7 +24,7 @@ _GLOBAL(strncpy) mtctr r5 addi r6,r3,-1 addi r4,r4,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r0,1(r4) cmpwi 0,r0,0 stbu r0,1(r6) @@ -43,7 +44,7 @@ _GLOBAL(strncmp) mtctr r5 addi r5,r3,-1 addi r4,r4,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r3,1(r5) cmpwi 1,r3,0 lbzu r0,1(r4) @@ -77,7 +78,7 @@ _GLOBAL(memchr) beq- 2f mtctr r5 addi r3,r3,-1 - .balign 16 + .balign IFETCH_ALIGN_BYTES 1: lbzu r0,1(r3) cmpw 0,r0,r4 bdnzf 2,1b -- 2.13.3
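For reference, the .balign operand works because asm/cache.h derives the byte count from the shift; the existing convention the patch plugs into is (quoting from memory, worth double-checking against the header):

#define IFETCH_ALIGN_BYTES	(1 << IFETCH_ALIGN_SHIFT)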
Re: [PATCH 2/2] powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG
Michael Neuling writes: > In this change: > e2a800beac powerpc/hw_brk: Fix off by one error when validating DAWR region > end > > We fixed setting the DAWR end point to its max value via > PPC_PTRACE_SETHWDEBUG. Unfortunately we broke PTRACE_SET_DEBUGREG when > setting a 512 byte aligned breakpoint. > > PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to > zero (memset() in hw_breakpoint_init()). This worked with > arch_validate_hwbkpt_settings() before the above patch was applied but > is now broken if the breakpoint is 512byte aligned. > > This sets the length of the breakpoint to 8 bytes when using > PTRACE_SET_DEBUGREG. > > Signed-off-by: Michael Neuling > Cc: sta...@vger.kernel.org # 3.10+ If this is "fixing" e2a800beac then I think v3.11 is right for the stable tag? $ git describe --contains --long e2a800beaca1 v3.11-rc1~94^2~4 cheers
[PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
Currently, for multi-function programs, we cannot get the JITed instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD command. Because of this, userspace tools such as bpftool fail to identify a multi-function program as being JITed or not. With the JIT enabled and the test program running, this can be verified as follows: # cat /proc/sys/net/core/bpf_jit_enable 1 Before applying this patch: # bpftool prog list 1: kprobe name foo tag b811aab41a39ad3d gpl loaded_at 2018-05-16T11:43:38+0530 uid 0 xlated 216B not jited memlock 65536B ... # bpftool prog dump jited id 1 no instructions returned After applying this patch: # bpftool prog list 1: kprobe name foo tag b811aab41a39ad3d gpl loaded_at 2018-05-16T12:13:01+0530 uid 0 xlated 216B jited 308B memlock 65536B ... # bpftool prog dump jited id 1 0: nop 4: nop 8: mflr r0 c: std r0,16(r1) 10: stdu r1,-112(r1) 14: std r31,104(r1) 18: addi r31,r1,48 1c: li r3,10 ... Signed-off-by: Sandipan Das --- kernel/bpf/syscall.c | 38 -- 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 54a72fafe57c..2430d159078c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, struct bpf_prog_info info = {}; u32 info_len = attr->info.info_len; char __user *uinsns; - u32 ulen; + u32 ulen, i; int err; err = check_uarg_tail_zero(uinfo, sizeof(info), info_len); @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, ulen = min_t(u32, info.nr_map_ids, ulen); if (ulen) { u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids); - u32 i; for (i = 0; i < ulen; i++) if (put_user(prog->aux->used_maps[i]->id, @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, * for offload. */ ulen = info.jited_prog_len; - info.jited_prog_len = prog->jited_len; + if (prog->aux->func_cnt) { + info.jited_prog_len = 0; + for (i = 0; i < prog->aux->func_cnt; i++) + info.jited_prog_len += prog->aux->func[i]->jited_len; + } else { + info.jited_prog_len = prog->jited_len; + } + if (info.jited_prog_len && ulen) { if (bpf_dump_raw_ok()) { uinsns = u64_to_user_ptr(info.jited_prog_insns); ulen = min_t(u32, info.jited_prog_len, ulen); - if (copy_to_user(uinsns, prog->bpf_func, ulen)) - return -EFAULT; + + /* for multi-function programs, copy the JITed + * instructions for all the functions + */ + if (prog->aux->func_cnt) { + u32 len, free; + u8 *img; + + free = ulen; + for (i = 0; i < prog->aux->func_cnt; i++) { + len = prog->aux->func[i]->jited_len; + img = (u8 *) prog->aux->func[i]->bpf_func; + if (len > free) + break; + if (copy_to_user(uinsns, img, len)) + return -EFAULT; + uinsns += len; + free -= len; + } + } else { + if (copy_to_user(uinsns, prog->bpf_func, ulen)) + return -EFAULT; + } } else { info.jited_prog_insns = 0; } @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, if (info.nr_jited_ksyms && ulen) { u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms); ulong ksym_addr; - u32 i; /* copy the address of the kernel symbol corresponding to * each function -- 2.14.3
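As a usage sketch, this is the two-call pattern a tool would use against the fixed kernel (assumes libbpf's bpf_obj_get_info_by_fd() and a valid prog_fd; error handling trimmed):

	struct bpf_prog_info info = {};
	__u32 info_len = sizeof(info);
	__u32 jited_len;
	unsigned char *img;

	/* first call: learn the total JITed length, which with this
	 * patch is the sum over all functions of a multi-function
	 * program
	 */
	if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len))
		return -1;
	jited_len = info.jited_prog_len;

	/* second call: hand the kernel a buffer to fill */
	img = calloc(1, jited_len);
	memset(&info, 0, sizeof(info));
	info.jited_prog_insns = (__u64) (unsigned long) img;
	info.jited_prog_len = jited_len;
	if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len))
		return -1;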
[PATCH bpf v2 5/6] tools: bpftool: resolve calls without using imm field
Currently, we resolve the callee's address for a JITed function call by using the imm field of the call instruction as an offset from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further use this address to get the callee's kernel symbol's name. For some architectures, such as powerpc64, the imm field is not large enough to hold this offset. So, instead of assigning this offset to the imm field, the verifier now assigns the subprog id. Also, a list of kernel symbol addresses for all the JITed functions is provided in the program info. We now use the imm field as an index for this list to lookup a callee's symbol's address and resolve its name. Suggested-by: Daniel Borkmann Signed-off-by: Sandipan Das --- v2: - Order variables from longest to shortest - Make sure that ksyms_ptr and ksyms_len are always initialized - Simplify code --- tools/bpf/bpftool/prog.c | 29 + tools/bpf/bpftool/xlated_dumper.c | 10 +- tools/bpf/bpftool/xlated_dumper.h | 2 ++ 3 files changed, 40 insertions(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 9bdfdf2d3fbe..e2f8f8f259fc 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -421,19 +421,26 @@ static int do_show(int argc, char **argv) static int do_dump(int argc, char **argv) { struct bpf_prog_info info = {}; + unsigned long *addrs = NULL; struct dump_data dd = {}; __u32 len = sizeof(info); unsigned int buf_size; + unsigned int nr_addrs; char *filepath = NULL; bool opcodes = false; bool visual = false; unsigned char *buf; __u32 *member_len; __u64 *member_ptr; + __u32 *ksyms_len; + __u64 *ksyms_ptr; ssize_t n; int err; int fd; + ksyms_len = &info.nr_jited_ksyms; + ksyms_ptr = &info.jited_ksyms; + if (is_prefix(*argv, "jited")) { member_len = &info.jited_prog_len; member_ptr = &info.jited_prog_insns; @@ -496,10 +503,22 @@ static int do_dump(int argc, char **argv) return -1; } + nr_addrs = *ksyms_len; + if (nr_addrs) { + addrs = malloc(nr_addrs * sizeof(__u64)); + if (!addrs) { + p_err("mem alloc failed"); + close(fd); + goto err_free; + } + } + memset(&info, 0, sizeof(info)); *member_ptr = ptr_to_u64(buf); *member_len = buf_size; + *ksyms_ptr = ptr_to_u64(addrs); + *ksyms_len = nr_addrs; err = bpf_obj_get_info_by_fd(fd, &info, &len); close(fd); @@ -513,6 +532,11 @@ static int do_dump(int argc, char **argv) goto err_free; } + if (*ksyms_len > nr_addrs) { + p_err("too many addresses returned"); + goto err_free; + } + if ((member_len == &info.jited_prog_len && info.jited_prog_insns == 0) || (member_len == &info.xlated_prog_len && @@ -558,6 +582,9 @@ static int do_dump(int argc, char **argv) dump_xlated_cfg(buf, *member_len); } else { kernel_syms_load(); + dd.nr_jited_ksyms = *ksyms_len; + dd.jited_ksyms = (__u64 *) *ksyms_ptr; + if (json_output) dump_xlated_json(&dd, buf, *member_len, opcodes); else @@ -566,10 +593,12 @@ static int do_dump(int argc, char **argv) } free(buf); + free(addrs); return 0; err_free: free(buf); + free(addrs); return -1; } diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c index 7a3173b76c16..fb065b55db6d 100644 --- a/tools/bpf/bpftool/xlated_dumper.c +++ b/tools/bpf/bpftool/xlated_dumper.c @@ -174,7 +174,11 @@ static const char *print_call_pcrel(struct dump_data *dd, unsigned long address, const struct bpf_insn *insn) { - if (sym) + if (!dd->nr_jited_ksyms) + /* Do not show address for interpreted programs */ + snprintf(dd->scratch_buff, sizeof(dd->scratch_buff), + "%+d", insn->off); + else if (sym) snprintf(dd->scratch_buff, sizeof(dd->scratch_buff), "%+d#%s", insn->off, sym->name); else @@ -203,6 +207,10 @@ static const 
char *print_call(void *private_data, unsigned long address = dd->address_call_base + insn->imm; struct kernel_sym *sym; + if (insn->src_reg == BPF_PSEUDO_CALL && + (__u32) insn->imm < dd->nr_jited_ksyms) + address = dd->jited_ksyms[insn->imm]; + sym = kernel_syms_search(dd, address); if (insn->src_reg == BPF_PSEUDO_CALL) return print_call_pcrel(dd, sym, address, insn); diff
[PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
Syncing the bpf.h uapi header with tools so that struct bpf_prog_info has the two new fields for passing on the addresses of the kernel symbols corresponding to each function in a JITed program. Signed-off-by: Sandipan Das --- tools/include/uapi/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d94d333a8225..040c9cac7303 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2188,6 +2188,8 @@ struct bpf_prog_info { __u32 xlated_prog_len; __aligned_u64 jited_prog_insns; __aligned_u64 xlated_prog_insns; + __aligned_u64 jited_ksyms; + __u32 nr_jited_ksyms; __u64 load_time; /* ns since boottime */ __u32 created_by_uid; __u32 nr_map_ids; -- 2.14.3
[PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
This adds two new fields to struct bpf_prog_info. For multi-function programs, these fields can be used to pass a list of kernel symbol addresses for all functions in a given program to userspace using the bpf system call with the BPF_OBJ_GET_INFO_BY_FD command. When bpf_jit_kallsyms is enabled, we can get the address of the corresponding kernel symbol for a callee function and resolve the symbol's name. The address is determined by adding the value of the call instruction's imm field to __bpf_call_base. This offset gets assigned to the imm field by the verifier. For some architectures, such as powerpc64, the imm field is not large enough to hold this offset. We resolve this by: [1] Assigning the subprog id to the imm field of a call instruction in the verifier instead of the offset of the callee's symbol's address from __bpf_call_base. [2] Determining the address of a callee's corresponding symbol by using the imm field as an index for the list of kernel symbol addresses now available from the program info. Suggested-by: Daniel Borkmann Signed-off-by: Sandipan Das --- include/uapi/linux/bpf.h | 2 ++ kernel/bpf/syscall.c | 20 kernel/bpf/verifier.c| 7 +-- 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index d94d333a8225..040c9cac7303 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2188,6 +2188,8 @@ struct bpf_prog_info { __u32 xlated_prog_len; __aligned_u64 jited_prog_insns; __aligned_u64 xlated_prog_insns; + __aligned_u64 jited_ksyms; + __u32 nr_jited_ksyms; __u64 load_time; /* ns since boottime */ __u32 created_by_uid; __u32 nr_map_ids; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index bfcde949c7f8..54a72fafe57c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, if (!capable(CAP_SYS_ADMIN)) { info.jited_prog_len = 0; info.xlated_prog_len = 0; + info.nr_jited_ksyms = 0; goto done; } @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog, } } + ulen = info.nr_jited_ksyms; + info.nr_jited_ksyms = prog->aux->func_cnt; + if (info.nr_jited_ksyms && ulen) { + u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms); + ulong ksym_addr; + u32 i; + + /* copy the address of the kernel symbol corresponding to + * each function + */ + ulen = min_t(u32, info.nr_jited_ksyms, ulen); + for (i = 0; i < ulen; i++) { + ksym_addr = (ulong) prog->aux->func[i]->bpf_func; + ksym_addr &= PAGE_MASK; + if (put_user((u64) ksym_addr, &user_jited_ksyms[i])) + return -EFAULT; + } + } + done: if (copy_to_user(uinfo, &info, info_len) || put_user(info_len, &uattr->info.info_len)) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6c56cce9c4e3..e826c396aba2 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env) * later look the same as if they were interpreted only. 
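On the consumer side, the new fields follow the same length-negotiation pattern as the other variable-size arrays in bpf_prog_info. A userspace sketch (assumes a first BPF_OBJ_GET_INFO_BY_FD call already returned the function count in info.nr_jited_ksyms; error handling trimmed):

	__u32 nr = info.nr_jited_ksyms;
	__u64 *addrs = calloc(nr, sizeof(*addrs));

	memset(&info, 0, sizeof(info));
	info.jited_ksyms = (__u64) (unsigned long) addrs;
	info.nr_jited_ksyms = nr;
	/* the second call fills addrs[0..nr-1] with the page-masked
	 * symbol address of each function
	 */
	if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len))
		return -1;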
[PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
This adds support for bpf-to-bpf function calls in the powerpc64 JIT compiler. The JIT compiler converts the bpf call instructions to native branch instructions. After a round of the usual passes, the start addresses of the JITed images for the callee functions are known. Finally, to fixup the branch target addresses, we need to perform an extra pass. Because of the address range in which JITed images are allocated on powerpc64, the offsets of the start addresses of these images from __bpf_call_base are as large as 64 bits. So, for a function call, we cannot use the imm field of the instruction to determine the callee's address. Instead, we use the alternative method of getting it from the list of function addresses in the auxiliary data of the caller by using the off field as an index. Signed-off-by: Sandipan Das --- arch/powerpc/net/bpf_jit_comp64.c | 79 ++- 1 file changed, 69 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 1bdb1aff0619..25939892d8f7 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 /* Assemble the body code between the prologue & epilogue */ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx, - u32 *addrs) + u32 *addrs, bool extra_pass) { const struct bpf_insn *insn = fp->insnsi; int flen = fp->len; @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, break; /* - * Call kernel helper + * Call kernel helper or bpf function */ case BPF_JMP | BPF_CALL: ctx->seen |= SEEN_FUNC; - func = (u8 *) __bpf_call_base + imm; + + /* bpf function call */ + if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass) + if (fp->aux->func && off < fp->aux->func_cnt) + /* use the subprog id from the off + * field to lookup the callee address + */ + func = (u8 *) fp->aux->func[off]->bpf_func; + else + return -EINVAL; + /* kernel helper call */ + else + func = (u8 *) __bpf_call_base + imm; bpf_jit_emit_func_call(image, ctx, (u64)func); @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, return 0; } +struct powerpc64_jit_data { + struct bpf_binary_header *header; + u32 *addrs; + u8 *image; + u32 proglen; + struct codegen_context ctx; +}; + struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) { u32 proglen; @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) u8 *image = NULL; u32 *code_base; u32 *addrs; + struct powerpc64_jit_data *jit_data; struct codegen_context cgctx; int pass; int flen; @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) struct bpf_prog *org_fp = fp; struct bpf_prog *tmp_fp; bool bpf_blinded = false; + bool extra_pass = false; if (!fp->jit_requested) return org_fp; @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) fp = tmp_fp; } + jit_data = fp->aux->jit_data; + if (!jit_data) { + jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL); + if (!jit_data) { + fp = org_fp; + goto out; + } + fp->aux->jit_data = jit_data; + } + flen = fp->len; + addrs = jit_data->addrs; + if (addrs) { + cgctx = jit_data->ctx; + image = jit_data->image; + bpf_hdr = jit_data->header; + proglen = jit_data->proglen; + alloclen = proglen + FUNCTION_DESCR_SIZE; + extra_pass = true; + goto skip_init_ctx; + } + addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL); if (addrs == NULL) { fp = org_fp; @@ -904,10 +947,10 @@ struct 
bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) cgctx.stack_size = round_up(fp->aux->stack_depth, 16); /* Scouting faux-generate pass 0 */ - if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) { + if (bpf_jit_build_body(fp, 0,
[PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
The imm field of a bpf instruction is a signed 32-bit integer. For JIT bpf-to-bpf function calls, it stores the offset of the start address of the callee's JITed image from __bpf_call_base. For some architectures, such as powerpc64, this offset may be as large as 64 bits and cannot be accommodated in the imm field without truncation. We resolve this by: [1] Additionally using the auxiliary data of each function to keep a list of start addresses of the JITed images for all functions determined by the verifier. [2] Retaining the subprog id inside the off field of the call instructions and using it to index into the list mentioned above and lookup the callee's address. To make sure that the existing JIT compilers continue to work without requiring changes, we keep the imm field as it is. Signed-off-by: Sandipan Das --- kernel/bpf/verifier.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a9e4b1372da6..6c56cce9c4e3 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env) insn->src_reg != BPF_PSEUDO_CALL) continue; subprog = insn->off; - insn->off = 0; insn->imm = (u64 (*)(u64, u64, u64, u64, u64)) func[subprog]->bpf_func - __bpf_call_base; } + + /* we use the aux data to keep a list of the start addresses + * of the JITed images for each function in the program + * + * for some architectures, such as powerpc64, the imm field + * might not be large enough to hold the offset of the start + * address of the callee's JITed image from __bpf_call_base + * + * in such cases, we can lookup the start address of a callee + * by using its subprog id, available from the off field of + * the call instruction, as an index for this list + */ + func[i]->aux->func = func; + func[i]->aux->func_cnt = env->subprog_cnt + 1; The target tree you have here is infact bpf, since in bpf-next there was a cleanup where the + 1 is removed. Just for the record that we need to keep this in mind for bpf into bpf-next merge since this would otherwise subtly break. > } > for (i = 0; i < env->subprog_cnt; i++) { > old_bpf_func = func[i]->bpf_func; >
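To summarize the contract this patch sets up between the verifier and the JITs (a sketch restating the commit message; field semantics as described above):

	/* for insn->code == (BPF_JMP | BPF_CALL) with
	 * insn->src_reg == BPF_PSEUDO_CALL:
	 *
	 *   insn->imm : offset of the callee's image from
	 *               __bpf_call_base (may be truncated on
	 *               architectures where the offset needs
	 *               64 bits)
	 *   insn->off : subprog id, usable as an index
	 */
	u64 callee = (u64) fp->aux->func[insn->off]->bpf_func;

Existing JITs that only look at imm keep working; a JIT that cannot use imm switches to the off-based lookup, as the powerpc64 patch in this series does.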
[PATCH bpf v2 0/6] bpf: enhancements for multi-function programs
This patch series introduces the following: [1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler. [2] A way to resolve function calls that accommodates the address range in which JITed images are allocated on powerpc64. [3] A fix to get JITed instruction dumps for multi-function programs from the bpf system call. v2: - Incorporate review comments from Jakub Sandipan Das (6): bpf: support 64-bit offsets for bpf function calls bpf: powerpc64: add JIT support for multi-function programs bpf: get kernel symbol addresses via syscall tools: bpf: sync bpf uapi header tools: bpftool: resolve calls without using imm field bpf: fix JITed dump for multi-function programs via syscall arch/powerpc/net/bpf_jit_comp64.c | 79 ++- include/uapi/linux/bpf.h | 2 + kernel/bpf/syscall.c | 56 --- kernel/bpf/verifier.c | 22 +++ tools/bpf/bpftool/prog.c | 29 ++ tools/bpf/bpftool/xlated_dumper.c | 10 - tools/bpf/bpftool/xlated_dumper.h | 2 + tools/include/uapi/linux/bpf.h| 2 + 8 files changed, 179 insertions(+), 23 deletions(-) -- 2.14.3
Re: [PATCH-RESEND] cxl: Disable prefault_mode in Radix mode
On 18/05/2018 at 11:42, Vaibhav Jain wrote: From: Vaibhav Jain Currently we see a kernel-oops reported on Power-9 while attaching a context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set to anything other than 'none'. The backtrace of the oops is of this form: Unable to handle kernel paging request for data at address 0x0080 Faulting instruction address: 0xc0080bcf3b20 cpu 0x1: Vector: 300 (Data Access) at [c0037f003800] pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl] lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl] sp: c0037f003a80 msr: 90009033 dar: 80 dsisr: 4000 current = 0xc0037f28 paca= 0xc003e600 softe: 3irq_happened: 0x01 pid = 3529, comm = afp_no_int [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl] [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl] [c0037f003b90] c0080bcf944c cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl] [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl] [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl] [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890 [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0 [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0 [c0037f003e30] c000b320 system_call+0x58/0x6c --- Exception: c01 (System Call) at 10053bb0 The issue arises because on Power-8 the AFU attr 'prefault_mode' was used to improve initial storage fault performance by prefaulting process segments. However, on Power-9 with radix mode we don't have Storage-Segments that we can prefault. Also, prefaulting process Pages would be too costly and fine-grained. Hence, since the prefaulting mechanism doesn't make sense in radix mode, this patch updates prefault_mode_store() to not allow any other value apart from CXL_PREFAULT_NONE when radix mode is enabled. Cc: Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Signed-off-by: Vaibhav Jain --- Thanks! Acked-by: Frederic Barrat Change-log: Resend -> Updated the commit description to add more info on the issue seen [Andrew] --- Documentation/ABI/testing/sysfs-class-cxl | 4 +++- drivers/misc/cxl/sysfs.c | 16 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index 640f65e79ef1..267920a1874b 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -69,7 +69,9 @@ Date: September 2014 Contact:linuxppc-dev@lists.ozlabs.org Description:read/write Set the mode for prefaulting in segments into the segment table -when performing the START_WORK ioctl. Possible values: +when performing the START_WORK ioctl. Only applicable when +running under hashed page table mmu. 
+Possible values: none: No prefaulting (default) work_element_descriptor: Treat the work element descriptor as an effective address and diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c index 4b5a4c5d3c01..629e2e156412 100644 --- a/drivers/misc/cxl/sysfs.c +++ b/drivers/misc/cxl/sysfs.c @@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device, struct cxl_afu *afu = to_cxl_afu(device); enum prefault_modes mode = -1; - if (!strncmp(buf, "work_element_descriptor", 23)) - mode = CXL_PREFAULT_WED; - if (!strncmp(buf, "all", 3)) - mode = CXL_PREFAULT_ALL; if (!strncmp(buf, "none", 4)) mode = CXL_PREFAULT_NONE; + else { + if (!radix_enabled()) { + + /* only allowed when not in radix mode */ + if (!strncmp(buf, "work_element_descriptor", 23)) + mode = CXL_PREFAULT_WED; + if (!strncmp(buf, "all", 3)) + mode = CXL_PREFAULT_ALL; + } else { + dev_err(device, "Cannot prefault with radix enabled\n"); + } + } if (mode == -1) return -EINVAL;
Re: [PATCH 07/14] powerpc: Add support for restartable sequences
Mathieu Desnoyers writes: > - On May 16, 2018, at 9:19 PM, Boqun Feng boqun.f...@gmail.com wrote: >> On Wed, May 16, 2018 at 04:13:16PM -0400, Mathieu Desnoyers wrote: >>> - On May 16, 2018, at 12:18 PM, Peter Zijlstra pet...@infradead.org >>> wrote: >>> > On Mon, Apr 30, 2018 at 06:44:26PM -0400, Mathieu Desnoyers wrote: >>> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >>> >> index c32a181a7cbb..ed21a777e8c6 100644 >>> >> --- a/arch/powerpc/Kconfig >>> >> +++ b/arch/powerpc/Kconfig >>> >> @@ -223,6 +223,7 @@ config PPC >>> >> select HAVE_SYSCALL_TRACEPOINTS >>> >> select HAVE_VIRT_CPU_ACCOUNTING >>> >> select HAVE_IRQ_TIME_ACCOUNTING >>> >> +select HAVE_RSEQ >>> >> select IRQ_DOMAIN >>> >> select IRQ_FORCED_THREADING >>> >> select MODULES_USE_ELF_RELA >>> >> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c >>> >> index 61db86ecd318..d3bb3aaaf5ac 100644 >>> >> --- a/arch/powerpc/kernel/signal.c >>> >> +++ b/arch/powerpc/kernel/signal.c >>> >> @@ -133,6 +133,8 @@ static void do_signal(struct task_struct *tsk) >>> >> /* Re-enable the breakpoints for the signal stack */ >>> >> thread_change_pc(tsk, tsk->thread.regs); >>> >> >>> >> +rseq_signal_deliver(tsk->thread.regs); >>> >> + >>> >> if (is32) { >>> >> if (ksig.ka.sa.sa_flags & SA_SIGINFO) >>> >> ret = handle_rt_signal32(&ksig, oldset, tsk); >>> >> @@ -164,6 +166,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned >>> >> long >>> >> thread_info_flags) >>> >> if (thread_info_flags & _TIF_NOTIFY_RESUME) { >>> >> clear_thread_flag(TIF_NOTIFY_RESUME); >>> >> tracehook_notify_resume(regs); >>> >> +rseq_handle_notify_resume(regs); >>> >> } >>> >> >>> >> user_enter(); >>> > >>> > Again no rseq_syscall(). >>> >>> Same question for PowerPC as for ARM: >>> >>> Considering that rseq_syscall is implemented as follows: >>> >>> +void rseq_syscall(struct pt_regs *regs) >>> +{ >>> + unsigned long ip = instruction_pointer(regs); >>> + struct task_struct *t = current; >>> + struct rseq_cs rseq_cs; >>> + >>> + if (!t->rseq) >>> + return; >>> + if (!access_ok(VERIFY_READ, t->rseq, sizeof(*t->rseq)) || >>> + rseq_get_rseq_cs(t, &rseq_cs) || in_rseq_cs(ip, &rseq_cs)) >>> + force_sig(SIGSEGV, t); >>> +} >>> >>> and that x86 calls it from syscall_return_slowpath() (which AFAIU is >>> now used in the fast-path since KPTI), I wonder where we should call >> >> So we actually detect this after the syscall takes effect, right? I >> wonder whether this could be problematic, because "disallowing syscall" >> in rseq areas may mean the syscall won't take effect to some people, I >> guess? >> >>> this on PowerPC? I was under the impression that PowerPC return to >>> userspace fast-path was not calling C code unless work flags were set, >>> but I might be wrong. >>> >> >> I think you're right. So we have to introduce a callsite to rseq_syscall() >> in the syscall path, something like: >> >> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S >> index 51695608c68b..a25734a96640 100644 >> --- a/arch/powerpc/kernel/entry_64.S >> +++ b/arch/powerpc/kernel/entry_64.S >> @@ -222,6 +222,9 @@ system_call_exit: >> mtmsrd r11,1 >> #endif /* CONFIG_PPC_BOOK3E */ >> >> +addi r3,r1,STACK_FRAME_OVERHEAD >> +bl rseq_syscall >> + >> ld r9,TI_FLAGS(r12) >> li r11,-MAX_ERRNO >> andi. >> >> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) >> >> But I think it's important for us to first decide where (before or after >> the syscall) we do the detection. 
> > As Peter said, we don't really care whether it's on syscall entry or exit, as > long as the process gets killed when the erroneous use is detected. I think > doing > it on syscall exit is a bit easier because we can clearly access the userspace > TLS, which AFAIU may be less straightforward on syscall entry. Coming into the thread late, sorry if I'm missing the point. > We may want to add #ifdef CONFIG_DEBUG_RSEQ / #endif around the code you > proposed above, so it's only compiled in if CONFIG_DEBUG_RSEQ=y. That sounds good. A function call is not free even if it returns immediately. > On the ARM leg of the email thread, Will Deacon suggests to test whether > current->rseq > is non-NULL before calling rseq_syscall(). I wonder if this added check is > justified > at the assembly level, considering that this is just a debugging option. We > already do > that check at the very beginning of rseq_syscall(). I guess it depends if this is one of those "debugging options" that's going to end up turned on in distro kernels? I think in that code we'd need to check
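One way to keep the check out of non-debug builds, as suggested above, is the usual Kconfig stub pattern. A minimal sketch (this is an assumption about the shape of the change, not necessarily the merged code):

/* Sketch: header stub so the syscall-exit check compiles away entirely
 * unless CONFIG_DEBUG_RSEQ=y; with the stub, call sites need no #ifdef.
 */
#ifdef CONFIG_DEBUG_RSEQ
void rseq_syscall(struct pt_regs *regs);	/* kills task on misuse */
#else
static inline void rseq_syscall(struct pt_regs *regs)
{
}
#endif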
Re: [PATCH v2 00/10] KVM: PPC: reimplement mmio emulation with analyse_instr()
On Mon, May 07, 2018 at 02:20:06PM +0800, wei.guo.si...@gmail.com wrote: > From: Simon Guo > > We already have analyse_instr() which analyzes instructions for the > instruction > type, size, additional flags, etc. What kvmppc_emulate_loadstore() did is > somewhat > duplicated and it will be good to utilize analyse_instr() to reimplement the > code. The advantage is that the code logic will be shared and cleaner to > maintain. I have put patches 1-3 into my kvm-ppc-next branch. Thanks, Paul.
Re: [PATCH 1/3] powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()
Samuel Mendoza-Jonas writes: > On Mon, 2018-05-14 at 22:50 +1000, Michael Ellerman wrote: >> Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq(). >> >> This allows us to avoid sparse warnings caused by passing __be64 to >> __raw_writeq(), which takes unsigned long: >> >> arch/powerpc/platforms/powernv/pci-ioda.c:1981:38: >> warning: incorrect type in argument 1 (different base types) >> expected unsigned long [unsigned] v >> got restricted __be64 [usertype] >> >> It's also generally preferable to use a byte-swapping accessor rather >> than doing it by hand in the code, which is more bug prone. >> >> Signed-off-by: Michael Ellerman > > For this and the following patches: > > Reviewed-by: Samuel Mendoza-Jonas Thanks. cheers
Re: [PATCH v2 5/5] powerpc/lib: inline memcmp() for small constant sizes
On 05/17/2018 03:55 PM, Segher Boessenkool wrote: On Thu, May 17, 2018 at 12:49:58PM +0200, Christophe Leroy wrote: In my 8xx configuration, I get 208 calls to memcmp(). Within those 208 calls, about half of them have constant sizes, 46 have a size of 8, 17 have a size of 16, only a few have a size over 16. Other fixed sizes are mostly 4, 6 and 10. This patch inlines calls to memcmp() when size is constant and lower than or equal to 16. In my 8xx configuration, this reduces the number of calls to memcmp() from 208 to 123. The following table shows the number of TB timeticks to perform a constant size memcmp() before and after the patch depending on the size:

      Before   After   Improvement
01:     7577    5682   25%
02:    41668    5682   86%
03:    51137   13258   74%
04:    45455    5682   87%
05:    58713   13258   77%
06:    58712   13258   77%
07:    68183   20834   70%
08:    56819   15153   73%
09:    70077   28411   60%
10:    70077   28411   60%
11:    79546   35986   55%
12:    68182   28411   58%
13:    81440   35986   55%
14:    81440   39774   51%
15:    94697   43562   54%
16:    79546   37881   52%

Could you show results with a more recent GCC? What version was this? It was with the latest GCC version I have available in my environment, that is GCC 5.4. Is that too old? It seems that version inlines memcmp() when length is 1. All other lengths call memcmp(). What is this really measuring? I doubt it takes 7577 (or 5682) timebase ticks to do a 1-byte memcmp, which is just 3 instructions after all. Well I looked again at my tests and it seems some results are wrong; I can't remember why, I probably did something wrong when I did the tests. Anyway, the principle is to call a function tstcmpX() 100000 times from a loop, and getting the mftb before and after the loop. Then we remove from the elapsed time the time spent when calling tstcmp0() which is only a blr. Therefore, we get really the time spent in the comparison only. Here is the loop:

c06243b0: 7f ac 42 e6  mftb    r29
c06243b4: 3f 60 00 01  lis     r27,1
c06243b8: 63 7b 86 a0  ori     r27,r27,34464
c06243bc: 38 a0 00 02  li      r5,2
c06243c0: 7f c4 f3 78  mr      r4,r30
c06243c4: 7f 83 e3 78  mr      r3,r28
c06243c8: 4b 9e 8c 09  bl      c000cfd0 <tstcmp2>
c06243cc: 2c 1b 00 01  cmpwi   r27,1
c06243d0: 3b 7b ff ff  addi    r27,r27,-1
c06243d4: 40 82 ff e8  bne     c06243bc
c06243d8: 7c ac 42 e6  mftb    r5
c06243dc: 7c bd 28 50  subf    r5,r29,r5
c06243e0: 7c bf 28 50  subf    r5,r31,r5
c06243e4: 38 80 00 02  li      r4,2
c06243e8: 7f 43 d3 78  mr      r3,r26
c06243ec: 4b a2 e4 45  bl      c0052830

Before the patch:

c000cfc4 <tstcmp0>:
c000cfc4: 4e 80 00 20  blr
c000cfc8 <tstcmp1>:
c000cfc8: 38 a0 00 01  li      r5,1
c000cfcc: 48 00 72 08  b       c00141d4 <__memcmp>
c000cfd0 <tstcmp2>:
c000cfd0: 38 a0 00 02  li      r5,2
c000cfd4: 48 00 72 00  b       c00141d4 <__memcmp>
c000cfd8 <tstcmp3>:
c000cfd8: 38 a0 00 03  li      r5,3
c000cfdc: 48 00 71 f8  b       c00141d4 <__memcmp>

After the patch:

c000cfc4 <tstcmp0>:
c000cfc4: 4e 80 00 20  blr
c000cfd8 <tstcmp1>:
c000cfd8: 88 64 00 00  lbz     r3,0(r4)
c000cfdc: 89 25 00 00  lbz     r9,0(r5)
c000cfe0: 7c 69 18 50  subf    r3,r9,r3
c000cfe4: 4e 80 00 20  blr
c000cfe8 <tstcmp2>:
c000cfe8: a0 64 00 00  lhz     r3,0(r4)
c000cfec: a1 25 00 00  lhz     r9,0(r5)
c000cff0: 7c 69 18 50  subf    r3,r9,r3
c000cff4: 4e 80 00 20  blr
c000cff8 <tstcmp3>:
c000cff8: a1 24 00 00  lhz     r9,0(r4)
c000cffc: a0 65 00 00  lhz     r3,0(r5)
c000d000: 7c 63 48 51  subf.   r3,r3,r9
c000d004: 4c 82 00 20  bnelr
c000d008: 88 64 00 02  lbz     r3,2(r4)
c000d00c: 89 25 00 02  lbz     r9,2(r5)
c000d010: 7c 69 18 50  subf    r3,r9,r3
c000d014: 4e 80 00 20  blr
c000d018 <tstcmp4>:
c000d018: 80 64 00 00  lwz     r3,0(r4)
c000d01c: 81 25 00 00  lwz     r9,0(r5)
c000d020: 7c 69 18 50  subf    r3,r9,r3
c000d024: 4e 80 00 20  blr
c000d028 <tstcmp5>:
c000d028: 81 24 00 00  lwz     r9,0(r4)
c000d02c: 80 65 00 00  lwz     r3,0(r5)
c000d030: 7c 63 48 51  subf.   r3,r3,r9
c000d034: 4c 82 00 20  bnelr
c000d038: 88 64 00 04  lbz     r3,4(r4)
c000d03c: 89 25 00 04  lbz     r9,4(r5)
c000d040: 7c 69 18 50  subf    r3,r9,r3
c000d044: 4e 80 00 20  blr
c000d048 <tstcmp6>:
c000d048: 81 24 00 00  lwz     r9,0(r4)
c000d04c: 80 65 00 00  lwz     r3,0(r5)
c000d050: 7c 63 48 51  subf.   r3,r3,r9
c000d054: 4c 82 00 20  bnelr
c000d058: a0 64 00 04  lhz     r3,4(r4)
c000d05c: a1 25 00 04  lhz     r9,4(r5)
c000d060: 7c 69 18 50  subf    r3,r9,r3
c000d064:
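The inlining being measured replaces a library call with a handful of loads and a subtract. A generic-C illustration of the same idea for the 8-byte case (the patch itself emits powerpc code; the function name here is ours):

#include <stdint.h>
#include <string.h>

/* Compare 8 bytes with one word load per side.  memcmp() semantics are
 * "first differing byte wins", i.e. big-endian order, so on a
 * little-endian host the loaded words must be byte-swapped before the
 * numeric compare; on big-endian (like the 8xx) the compare is direct.
 */
static inline int memcmp_const8(const void *a, const void *b)
{
	uint64_t x, y;

	memcpy(&x, a, 8);	/* memcpy keeps the loads alignment-safe */
	memcpy(&y, b, 8);
	if (x == y)
		return 0;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	x = __builtin_bswap64(x);
	y = __builtin_bswap64(y);
#endif
	return x > y ? 1 : -1;
}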
[PATCH-RESEND] cxl: Disable prefault_mode in Radix mode
From: Vaibhav Jain Currently we see a kernel-oops reported on Power-9 while attaching a context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set to anything other than 'none'. The backtrace of the oops is of this form: Unable to handle kernel paging request for data at address 0x0080 Faulting instruction address: 0xc0080bcf3b20 cpu 0x1: Vector: 300 (Data Access) at [c0037f003800] pc: c0080bcf3b20: cxl_load_segment+0x178/0x290 [cxl] lr: c0080bcf39f0: cxl_load_segment+0x48/0x290 [cxl] sp: c0037f003a80 msr: 90009033 dar: 80 dsisr: 4000 current = 0xc0037f28 paca= 0xc003e600 softe: 3irq_happened: 0x01 pid = 3529, comm = afp_no_int [c0037f003af0] c0080bcf4424 cxl_prefault+0xfc/0x248 [cxl] [c0037f003b50] c0080bcf8a40 process_element_entry_psl9+0xd8/0x1a0 [cxl] [c0037f003b90] c0080bcf944c cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl] [c0037f003bd0] c0080bcf5448 native_attach_process+0xc0/0x130 [cxl] [c0037f003c50] c0080bcf16cc afu_ioctl+0x3f4/0x5e0 [cxl] [c0037f003d00] c039d98c do_vfs_ioctl+0xdc/0x890 [c0037f003da0] c039e1a8 ksys_ioctl+0x68/0xf0 [c0037f003df0] c039e270 sys_ioctl+0x40/0xa0 [c0037f003e30] c000b320 system_call+0x58/0x6c --- Exception: c01 (System Call) at 10053bb0 The issue arises because on Power-8 the AFU attr 'prefault_mode' was used to improve initial storage fault performance by prefaulting process segments. However, on Power-9 with radix mode we don't have Storage-Segments that we can prefault. Also, prefaulting process Pages would be too costly and fine-grained. Hence, since the prefaulting mechanism doesn't make sense in radix mode, this patch updates prefault_mode_store() to not allow any other value apart from CXL_PREFAULT_NONE when radix mode is enabled. Cc: Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Signed-off-by: Vaibhav Jain --- Change-log: Resend -> Updated the commit description to add more info on the issue seen [Andrew] --- Documentation/ABI/testing/sysfs-class-cxl | 4 +++- drivers/misc/cxl/sysfs.c | 16 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index 640f65e79ef1..267920a1874b 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -69,7 +69,9 @@ Date: September 2014 Contact:linuxppc-dev@lists.ozlabs.org Description:read/write Set the mode for prefaulting in segments into the segment table -when performing the START_WORK ioctl. Possible values: +when performing the START_WORK ioctl. Only applicable when +running under hashed page table mmu. 
+Possible values: none: No prefaulting (default) work_element_descriptor: Treat the work element descriptor as an effective address and diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c index 4b5a4c5d3c01..629e2e156412 100644 --- a/drivers/misc/cxl/sysfs.c +++ b/drivers/misc/cxl/sysfs.c @@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device, struct cxl_afu *afu = to_cxl_afu(device); enum prefault_modes mode = -1; - if (!strncmp(buf, "work_element_descriptor", 23)) - mode = CXL_PREFAULT_WED; - if (!strncmp(buf, "all", 3)) - mode = CXL_PREFAULT_ALL; if (!strncmp(buf, "none", 4)) mode = CXL_PREFAULT_NONE; + else { + if (!radix_enabled()) { + + /* only allowed when not in radix mode */ + if (!strncmp(buf, "work_element_descriptor", 23)) + mode = CXL_PREFAULT_WED; + if (!strncmp(buf, "all", 3)) + mode = CXL_PREFAULT_ALL; + } else { + dev_err(device, "Cannot prefault with radix enabled\n"); + } + } if (mode == -1) return -EINVAL; -- 2.17.0
[PATCH] powerpc: fix spelling mistake: "Discharching" -> "Discharging"
From: Colin Ian King Trivial fix to spelling mistake in battery_charging array Signed-off-by: Colin Ian King --- arch/powerpc/kernel/rtas-proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c index d49063d0baa4..ed0ed0c9b7b3 100644 --- a/arch/powerpc/kernel/rtas-proc.c +++ b/arch/powerpc/kernel/rtas-proc.c @@ -504,7 +504,7 @@ static void ppc_rtas_process_sensor(struct seq_file *m, "EPOW power off" }; const char * battery_cyclestate[] = { "None", "In progress", "Requested" }; - const char * battery_charging[]= { "Charging", "Discharching", + const char * battery_charging[]= { "Charging", "Discharging", "No current flow" }; const char * ibm_drconnector[] = { "Empty", "Present", "Unusable", "Exchange" }; -- 2.17.0
Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour
On Fri, May 18, 2018 at 10:30:24AM +0200, Jiri Slaby wrote: > On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote: > > 4.9-stable review patch. If anyone has any objections, please let me know. > > > > -- > > > > From: Jiri Slaby > > > > commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream. > ... > > --- a/kernel/futex.c > > +++ b/kernel/futex.c > > @@ -1458,6 +1458,45 @@ out: > > return ret; > > } > > > > +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user > > *uaddr) > > +{ > > + unsigned int op = (encoded_op & 0x70000000) >> 28; > > + unsigned int cmp =(encoded_op & 0x0f000000) >> 24; > > + int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12); > > + int cmparg = sign_extend32(encoded_op & 0x00000fff, 12); 12 is wrong here – wherever you apply this, you need also a follow-up fix: commit d70ef22892ed6c066e51e118b225923c9b74af34 Author: Jiri Slaby Date: Thu Nov 30 15:35:44 2017 +0100 futex: futex_wake_op, fix sign_extend32 sign bits Thanks for letting me know, I've now queued it up to the needed trees. greg k-h
Re: [PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour
On 05/18/2018, 10:16 AM, Greg Kroah-Hartman wrote: > 4.9-stable review patch. If anyone has any objections, please let me know. > > -- > > From: Jiri Slaby > > commit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream. ... > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -1458,6 +1458,45 @@ out: > return ret; > } > > +static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr) > +{ > + unsigned int op = (encoded_op & 0x70000000) >> 28; > + unsigned int cmp =(encoded_op & 0x0f000000) >> 24; > + int oparg = sign_extend32((encoded_op & 0x00fff000) >> 12, 12); > + int cmparg = sign_extend32(encoded_op & 0x00000fff, 12); 12 is wrong here – wherever you apply this, you need also a follow-up fix: commit d70ef22892ed6c066e51e118b225923c9b74af34 Author: Jiri Slaby Date: Thu Nov 30 15:35:44 2017 +0100 futex: futex_wake_op, fix sign_extend32 sign bits thanks, -- js suse labs
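The follow-up commit changes the sign-bit index from 12 to 11: sign_extend32() takes the zero-based position of the sign bit, and oparg/cmparg are 12-bit fields. A sketch of the corrected decoding:

#include <linux/bitops.h>

/* Corrected decoding per commit d70ef22892ed: the 12-bit oparg/cmparg
 * fields have their sign bit at bit 11, so sign_extend32(..., 11); the
 * backported code above passes 12, reading one bit past the field.
 */
static void futex_decode_op(unsigned int encoded_op,
			    unsigned int *op, unsigned int *cmp,
			    int *oparg, int *cmparg)
{
	*op	= (encoded_op & 0x70000000) >> 28;
	*cmp	= (encoded_op & 0x0f000000) >> 24;
	*oparg	= sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
	*cmparg	= sign_extend32(encoded_op & 0x00000fff, 11);
}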
[PATCH 4.9 27/33] futex: Remove duplicated code and fix undefined behaviour
4.9-stable review patch. If anyone has any objections, please let me know. -- From: Jiri Slabycommit 30d6e0a4190d37740e9447e4e4815f06992dd8c3 upstream. There is code duplicated over all architecture's headers for futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, and comparison of the result. Remove this duplication and leave up to the arches only the needed assembly which is now in arch_futex_atomic_op_inuser. This effectively distributes the Will Deacon's arm64 fix for undefined behaviour reported by UBSAN to all architectures. The fix was done in commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump. And as suggested by Thomas, check for negative oparg too, because it was also reported to cause undefined behaviour report. Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: remove pointless access_ok() checks") as access_ok there returns true. We introduce it back to the helper for the sake of simplicity (it gets optimized away anyway). Signed-off-by: Jiri Slaby Signed-off-by: Thomas Gleixner Acked-by: Russell King Acked-by: Michael Ellerman (powerpc) Acked-by: Heiko Carstens [s390] Acked-by: Chris Metcalf [for tile] Reviewed-by: Darren Hart (VMware) Reviewed-by: Will Deacon [core/arm64] Cc: linux-m...@linux-mips.org Cc: Rich Felker Cc: linux-i...@vger.kernel.org Cc: linux...@vger.kernel.org Cc: pet...@infradead.org Cc: Benjamin Herrenschmidt Cc: Max Filippov Cc: Paul Mackerras Cc: sparcli...@vger.kernel.org Cc: Jonas Bonn Cc: linux-s...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: Yoshinori Sato Cc: linux-hexa...@vger.kernel.org Cc: Helge Deller Cc: "James E.J. Bottomley" Cc: Catalin Marinas Cc: Matt Turner Cc: linux-snps-...@lists.infradead.org Cc: Fenghua Yu Cc: Arnd Bergmann Cc: linux-xte...@linux-xtensa.org Cc: Stefan Kristiansson Cc: openr...@lists.librecores.org Cc: Ivan Kokshaysky Cc: Stafford Horne Cc: linux-arm-ker...@lists.infradead.org Cc: Richard Henderson Cc: Chris Zankel Cc: Michal Simek Cc: Tony Luck Cc: linux-par...@vger.kernel.org Cc: Vineet Gupta Cc: Ralf Baechle Cc: Richard Kuo Cc: linux-al...@vger.kernel.org Cc: Martin Schwidefsky Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" Link: http://lkml.kernel.org/r/20170824073105.3901-1-jsl...@suse.cz Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/alpha/include/asm/futex.h | 26 +++--- arch/arc/include/asm/futex.h| 40 +++- arch/arm/include/asm/futex.h| 26 ++ arch/arm64/include/asm/futex.h | 27 ++- arch/frv/include/asm/futex.h|3 +- arch/frv/kernel/futex.c | 27 ++- arch/hexagon/include/asm/futex.h| 38 ++- arch/ia64/include/asm/futex.h | 25 ++ arch/microblaze/include/asm/futex.h | 38 ++- arch/mips/include/asm/futex.h | 25 ++ arch/parisc/include/asm/futex.h | 26 ++ arch/powerpc/include/asm/futex.h| 26 +++--- arch/s390/include/asm/futex.h | 23 +++- arch/sh/include/asm/futex.h | 26 ++ arch/sparc/include/asm/futex_64.h | 26 +++--- arch/tile/include/asm/futex.h | 40 +++- arch/x86/include/asm/futex.h| 40 +++- arch/xtensa/include/asm/futex.h | 27 +++ include/asm-generic/futex.h | 50 ++-- kernel/futex.c | 39 20 files changed, 126 insertions(+), 472 deletions(-) --- a/arch/alpha/include/asm/futex.h +++ b/arch/alpha/include/asm/futex.h @@ -29,18 +29,10 @@ : "r" (uaddr), "r"(oparg) \ : "memory") -static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) +static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval, + u32 __user *uaddr) { - int op = (encoded_op
[PATCH] powerpc/perf: Remove sched_task function defined for thread-imc
Call trace observed while running perf-fuzzer: [ 329.228068] CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225 [ 329.228070] task: c03f776ac900 task.stack: c03f77728000 [ 329.228071] NIP: c0299b70 LR: c02a4534 CTR: c029bb80 [ 329.228073] REGS: c03f7772b760 TRAP: 0700 Not tainted (4.13.0-32-generic) [ 329.228073] MSR: 9282b033[ 329.228079] CR: 24008822 XER: [ 329.228080] CFAR: c0299a70 SOFTE: 0 GPR00: c02a4534 c03f7772b9e0 c1606200 c03fef858908 GPR04: c03f776ac900 0001 003fee73 GPR08: c11220d8 0002 GPR12: c029bb80 c7a3d900 GPR16: GPR20: c03f776ad090 c0c71354 GPR24: c03fef716780 003fee73 c03fe69d4200 c03f776ad330 GPR28: c11220d8 0001 c14c6108 c03fef858900 [ 329.228098] NIP [c0299b70] perf_pmu_sched_task+0x170/0x180 [ 329.228100] LR [c02a4534] __perf_event_task_sched_in+0xc4/0x230 [ 329.228101] Call Trace: [ 329.228102] [c03f7772b9e0] [c02a0678] perf_iterate_sb+0x158/0x2a0 (unreliable) [ 329.228105] [c03f7772ba30] [c02a4534] __perf_event_task_sched_in+0xc4/0x230 [ 329.228107] [c03f7772bab0] [c01396dc] finish_task_switch+0x21c/0x310 [ 329.228109] [c03f7772bb60] [c0c71354] __schedule+0x304/0xb80 [ 329.228111] [c03f7772bc40] [c0c71c10] schedule+0x40/0xc0 [ 329.228113] [c03f7772bc60] [c01033f4] do_wait+0x254/0x2e0 [ 329.228115] [c03f7772bcd0] [c0104ac0] kernel_wait4+0xa0/0x1a0 [ 329.228117] [c03f7772bd70] [c0104c24] SyS_wait4+0x64/0xc0 [ 329.228121] [c03f7772be30] [c000b184] system_call+0x58/0x6c [ 329.228121] Instruction dump: [ 329.228123] 3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050 [ 329.228127] eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <0fe0> 4bbc 6000 6042 [ 329.228131] ---[ end trace 8c46856d314c1811 ]--- [ 375.755943] hrtimer: interrupt took 31601 ns The context-switch callbacks for thread-imc are defined in the sched_task function. So when thread-imc events are grouped with software pmu events, perf_pmu_sched_task hits the WARN_ON_ONCE condition, since software PMUs are assumed not to have a sched_task defined. This patch moves the thread-imc enable/disable OPAL callbacks from sched_task to the event_[add/del] functions. Signed-off-by: Anju T Sudhakar --- arch/powerpc/perf/imc-pmu.c | 108 +--- 1 file changed, 51 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index d7532e7..71d9ba7 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -866,59 +866,6 @@ static int thread_imc_cpu_init(void) ppc_thread_imc_cpu_offline); } -void thread_imc_pmu_sched_task(struct perf_event_context *ctx, - bool sched_in) -{ - int core_id; - struct imc_pmu_ref *ref; - - if (!is_core_imc_mem_inited(smp_processor_id())) - return; - - core_id = smp_processor_id() / threads_per_core; - /* -* imc pmus are enabled only when it is used. -* See if this is triggered for the first time. -* If yes, take the mutex lock and enable the counters. -* If not, just increment the count in ref count struct. -*/ - ref = &core_imc_refc[core_id]; - if (!ref) - return; - - if (sched_in) { - mutex_lock(&ref->lock); - if (ref->refc == 0) { - if (opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE, -get_hard_smp_processor_id(smp_processor_id()))) { - mutex_unlock(&ref->lock); - pr_err("thread-imc: Unable to start the counter\ - for core %d\n", core_id); - return; - } - } - ++ref->refc; - mutex_unlock(&ref->lock); - } else { - mutex_lock(&ref->lock); - ref->refc--; - if (ref->refc == 0) { - if (opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE, - get_hard_smp_processor_id(smp_processor_id()))) { - mutex_unlock(&ref->lock); - pr_err("thread-imc: Unable to stop the counters\ - for core %d\n", core_id); -
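Per the commit message, the same refcounted start/stop now happens in the event callbacks. A sketch of the add side only (names follow the driver; the exact placement inside thread_imc_event_add() is an assumption based on the commit message):

/* Sketch: start the core IMC engine when the first thread-imc event is
 * added on a core, mirroring the refcounting the sched_task hook used.
 */
static int thread_imc_event_add(struct perf_event *event, int flags)
{
	int core_id = smp_processor_id() / threads_per_core;
	struct imc_pmu_ref *ref = &core_imc_refc[core_id];

	mutex_lock(&ref->lock);
	if (ref->refc == 0 &&
	    opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
			get_hard_smp_processor_id(smp_processor_id()))) {
		mutex_unlock(&ref->lock);
		pr_err("thread-imc: Unable to start the counter for core %d\n",
		       core_id);
		return -EINVAL;
	}
	++ref->refc;
	mutex_unlock(&ref->lock);
	return 0;
}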
[PATCH] cxl: Disable prefault_mode in Radix mode
From: Vaibhav Jain On Power-8 the AFU attr prefault_mode tried to improve storage fault performance by prefaulting process segments. However, Power-9 radix mode doesn't have Storage-Segments and prefaulting Pages is too fine-grained. So this patch updates prefault_mode_store() to not allow any other value apart from CXL_PREFAULT_NONE when radix mode is enabled. Cc: Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Signed-off-by: Vaibhav Jain --- Documentation/ABI/testing/sysfs-class-cxl | 4 +++- drivers/misc/cxl/sysfs.c | 16 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index 640f65e79ef1..267920a1874b 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -69,7 +69,9 @@ Date: September 2014 Contact:linuxppc-dev@lists.ozlabs.org Description:read/write Set the mode for prefaulting in segments into the segment table -when performing the START_WORK ioctl. Possible values: +when performing the START_WORK ioctl. Only applicable when +running under hashed page table mmu. +Possible values: none: No prefaulting (default) work_element_descriptor: Treat the work element descriptor as an effective address and diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c index 4b5a4c5d3c01..629e2e156412 100644 --- a/drivers/misc/cxl/sysfs.c +++ b/drivers/misc/cxl/sysfs.c @@ -353,12 +353,20 @@ static ssize_t prefault_mode_store(struct device *device, struct cxl_afu *afu = to_cxl_afu(device); enum prefault_modes mode = -1; - if (!strncmp(buf, "work_element_descriptor", 23)) - mode = CXL_PREFAULT_WED; - if (!strncmp(buf, "all", 3)) - mode = CXL_PREFAULT_ALL; if (!strncmp(buf, "none", 4)) mode = CXL_PREFAULT_NONE; + else { + if (!radix_enabled()) { + + /* only allowed when not in radix mode */ + if (!strncmp(buf, "work_element_descriptor", 23)) + mode = CXL_PREFAULT_WED; + if (!strncmp(buf, "all", 3)) + mode = CXL_PREFAULT_ALL; + } else { + dev_err(device, "Cannot prefault with radix enabled\n"); + } + } if (mode == -1) return -EINVAL; -- 2.17.0
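As an aside, the strncmp() prefix matches in the patch accept trailing garbage ("noneabc" still parses as none). A stricter variant using sysfs_streq(), sketched here for contrast and not part of the patch:

/* Sketch: sysfs_streq() requires an exact match (modulo one trailing
 * newline), unlike the prefix-matching strncmp() calls in the patch.
 */
if (sysfs_streq(buf, "none"))
	mode = CXL_PREFAULT_NONE;
else if (!radix_enabled() && sysfs_streq(buf, "work_element_descriptor"))
	mode = CXL_PREFAULT_WED;
else if (!radix_enabled() && sysfs_streq(buf, "all"))
	mode = CXL_PREFAULT_ALL;
else if (radix_enabled())
	dev_err(device, "Cannot prefault with radix enabled\n");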
Re: [PATCH 1/3] powerpc/io: Add __raw_writeq_be() __raw_rm_writeq_be()
On Mon, 2018-05-14 at 22:50 +1000, Michael Ellerman wrote: > Add byte-swapping versions of __raw_writeq() and __raw_rm_writeq(). > > This allows us to avoid sparse warnings caused by passing __be64 to > __raw_writeq(), which takes unsigned long: > > arch/powerpc/platforms/powernv/pci-ioda.c:1981:38: > warning: incorrect type in argument 1 (different base types) > expected unsigned long [unsigned] v > got restricted __be64 [usertype] > > It's also generally preferable to use a byte-swapping accessor rather > than doing it by hand in the code, which is more bug prone. > > Signed-off-by: Michael Ellerman For this and the following patches: Reviewed-by: Samuel Mendoza-Jonas > --- > arch/powerpc/include/asm/io.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h > index af074923d598..e0331e754568 100644 > --- a/arch/powerpc/include/asm/io.h > +++ b/arch/powerpc/include/asm/io.h > @@ -367,6 +367,11 @@ static inline void __raw_writeq(unsigned long v, > volatile void __iomem *addr) > *(volatile unsigned long __force *)PCI_FIX_ADDR(addr) = v; > } > > +static inline void __raw_writeq_be(unsigned long v, volatile void __iomem > *addr) > +{ > + __raw_writeq((__force unsigned long)cpu_to_be64(v), addr); > +} > + > /* > * Real mode versions of the above. Those instructions are only supposed > * to be used in hypervisor real mode as per the architecture spec. > @@ -395,6 +400,11 @@ static inline void __raw_rm_writeq(u64 val, volatile > void __iomem *paddr) > : : "r" (val), "r" (paddr) : "memory"); > } > > +static inline void __raw_rm_writeq_be(u64 val, volatile void __iomem *paddr) > +{ > + __raw_rm_writeq((__force u64)cpu_to_be64(val), paddr); > +} > + > static inline u8 __raw_rm_readb(volatile void __iomem *paddr) > { > u8 ret;
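For reference, the call-site cleanup this enables looks like the following (illustrative; the variable names are ours, based on the pci-ioda.c warning quoted above):

/* Before: open-coded swap; sparse warns because __raw_writeq() takes
 * an unsigned long rather than a __be64.
 */
__raw_writeq((__force unsigned long)cpu_to_be64(val), ioaddr);

/* After: the byte swap is hidden inside the accessor. */
__raw_writeq_be(val, ioaddr);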
Re: [PATCH v4 3/4] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
Hi Michael, On Fri, May 18, 2018 at 12:13:52AM +1000, Michael Ellerman wrote: > wei.guo.si...@gmail.com writes: > > From: Simon Guo > > > > This patch is based on the previous VMX patch on memcmp(). > > > > To optimize ppc64 memcmp() with VMX instruction, we need to think about > > the VMX penalty brought with: If kernel uses VMX instruction, it needs > > to save/restore current thread's VMX registers. There are 32 x 128 bits > > VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store. > > > > The major concern regarding the memcmp() performance in kernel is KSM, > > which will use memcmp() frequently to merge identical pages. So it will > > make sense to take some measures/enhancements on KSM to see whether any > > improvement can be done here. Cyril Bur indicates that the memcmp() for > > KSM has a higher possibility to fail (unmatch) early in previous bytes > > in the following mail. > > https://patchwork.ozlabs.org/patch/817322/#1773629 > > And I am taking a follow-up on this with this patch. > > > > Per some testing, it shows KSM memcmp() will fail early within the first 32 > > bytes. More specifically: > > - 76% cases will fail/unmatch before 16 bytes; > > - 83% cases will fail/unmatch before 32 bytes; > > - 84% cases will fail/unmatch before 64 bytes; > > So 32 bytes looks a better choice than other bytes for pre-checking. > > > > This patch adds a 32 bytes pre-checking firstly before jumping into VMX > > operations, to avoid the unnecessary VMX penalty. And the testing shows > > ~20% improvement on memcmp() average execution time with this patch. > > > > The detail data and analysis is at: > > https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md > > > > Any suggestion is welcome. > > Thanks for digging into that, really great work. > > I'm inclined to make this not depend on KSM though. It seems like a good > optimisation to do in general. > > So can we just call it the 'pre-check' or something, and always do it? > Sounds reasonable to me. I will expand the change to the .Ldiffoffset_vmx_cmp case and test accordingly. Thanks, - Simon
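Schematically, the pre-check amounts to the following (a C rendering of the asm strategy in arch/powerpc/lib/memcmp_64.S; memcmp_gpr() and memcmp_vmx_loop() are hypothetical stand-ins for the scalar and vector paths):

/* Sketch: compare up to 32 bytes with ordinary GPR loads before paying
 * the VMX save/restore cost; per the KSM data above, roughly 83% of
 * calls resolve here and never touch VMX state.
 */
static int memcmp_precheck(const void *a, const void *b, size_t n)
{
	size_t pre = n < 32 ? n : 32;
	int ret = memcmp_gpr(a, b, pre);	/* hypothetical scalar path */

	if (ret || n <= 32)
		return ret;
	/* long identical prefix: now the VMX setup cost can amortize */
	return memcmp_vmx_loop((const char *)a + 32,
			       (const char *)b + 32, n - 32);
}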