Re: [PATCH 1/3] selftests/bpf: Update LLVM Phabricator links

2024-01-11 Thread Alexei Starovoitov
On Thu, Jan 11, 2024 at 11:40 AM Nathan Chancellor  wrote:
>
> Hi Yonghong,
>
> On Wed, Jan 10, 2024 at 08:05:36PM -0800, Yonghong Song wrote:
> >
> > On 1/9/24 2:16 PM, Nathan Chancellor wrote:
> > > reviews.llvm.org was LLVM's Phabricator instance for code review. It
> > > has been abandoned in favor of GitHub pull requests. While the majority
> > > of links in the kernel sources still work because of the work Fangrui
> > > has done turning the dynamic Phabricator instance into a static archive,
> > > there are some issues with that work, so preemptively convert all the
> > > links in the kernel sources to point to the commit on GitHub.
> > >
> > > Most of the commits have the corresponding differential review link in
> > > the commit message itself so there should not be any loss of fidelity in
> > > the relevant information.
> > >
> > > Additionally, fix a typo in the xdpwall.c print ("LLMV" -> "LLVM") while
> > > in the area.
> > >
> > > Link: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/172
> > > Signed-off-by: Nathan Chancellor 
> >
> > Ack with one nit below.
> >
> > Acked-by: Yonghong Song 
>
> 
>
> > > @@ -304,6 +304,6 @@ from running test_progs will look like:
> > >   .. code-block:: console
> > > -  test_xdpwall:FAIL:Does LLVM have https://reviews.llvm.org/D109073? unexpected error: -4007
> > > +  test_xdpwall:FAIL:Does LLVM have https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5? unexpected error: -4007
> > > -__ https://reviews.llvm.org/D109073
> > > +__ https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d
> >
> > To be consistent with other links, could you add the missing last alnum '5' 
> > to the above link?
>
> Thanks a lot for catching this and providing an ack. Andrew, could you
> squash this update into selftests-bpf-update-llvm-phabricator-links.patch?

Please send a new patch.
We'd like to take all bpf patches through the bpf tree to avoid conflicts.


Re: [PATCH bpf-next 5/6] bpf, arm32: Always zero extend for LDX with B/H/W

2023-09-12 Thread Alexei Starovoitov
On Tue, Sep 12, 2023 at 4:17 PM Puranjay Mohan  wrote:
>
> On Wed, Sep 13, 2023 at 1:04 AM Russell King (Oracle)
>  wrote:
> >
> > On Tue, Sep 12, 2023 at 10:46:53PM +, Puranjay Mohan wrote:
> > > The JITs should not depend on the verifier for zero extending the upper
> > > 32 bits of the destination register when loading a byte, half-word, or
> > > word.
> > >
> > > A following patch will make the verifier stop patching zext instructions
> > > after LDX.
> >
> > This was introduced by:
> >
> > 163541e6ba34 ("arm: bpf: eliminate zero extension code-gen")
> >
> > along with an additional function. So three points:
> >
> > 1) the commit should probably explain why it has now become undesirable
> > to access this verifier state, whereas it appears it was explicitly
> > added to permit this optimisation.
>
> I added some details in the cover letter.
>
> For the complete discussion see: [1]
>
> > 2) you state that jits should not depend on this state, but the above
> > commit adds more references than you're removing, so aren't there still
> > references to the verifier remaining after this patch? I count a total
> > of 10, and the patch below removes three.
>
> The JITs should not depend on this state for LDX (loading
> a B/H/W). This patch removes the usage only for LDX.
>
> > 3) what about the bpf_jit_needs_zext() function that was added to
> > support the export of this zext state?
>
> That is still applicable. The verifier will still emit zext
> instructions for other operations like BPF_ALU / BPF_ALU64.
>
> >
> > Essentially, the logic stated in the commit message doesn't seem to be
> > reflected by the proposed code change.
>
> I will try to provide more information.
> Currently I have asked Alexei if we really need this in [2].
> I still think this optimization is useful and we should keep it.

Right. subreg tracking is indeed functional for narrow loads.
Let's drop this patch set.


Re: [PATCH v2 06/12] mm/execmem: introduce execmem_data_alloc()

2023-06-20 Thread Alexei Starovoitov
On Tue, Jun 20, 2023 at 7:51 AM Steven Rostedt  wrote:
>
> On Mon, 19 Jun 2023 02:43:58 +0200
> Thomas Gleixner  wrote:
>
> > Now you might argue that it _is_ a "hotpath" due to the BPF usage, but
> > then even more so as any intermediate wrapper which converts from one
> > data representation to another data representation is not going to
> > increase performance, right?
>
> Just as a side note. BPF can not attach its return calling code to
> functions that have more than 6 parameters (3 on 32 bit x86), because of
> the way BPF return path trampoline works. It is a requirement that all
> parameters live in registers, and none on the stack.

It's actually 7, and that restriction is being lifted.
A patch set to allow attaching to functions with <= 12 arguments is being discussed.


Re: [PATCH 00/34] selftests: Fix incorrect kernel headers search path

2023-01-30 Thread Alexei Starovoitov
On Mon, Jan 30, 2023 at 3:48 PM Shuah Khan  wrote:
>
> >>
> >> These will be applied by maintainers to their trees.
> >
> > Not in this form. They break the build.
>
> Mathieu is sending you the patches in the format you requested in
> the thread on this patch.

It's not the format, but the patch itself is incorrect.


Re: [PATCH 00/34] selftests: Fix incorrect kernel headers search path

2023-01-30 Thread Alexei Starovoitov
On Mon, Jan 30, 2023 at 2:46 PM Shuah Khan  wrote:
>
> On 1/27/23 06:57, Mathieu Desnoyers wrote:
> > Hi,
> >
> > This series fixes incorrect kernel header search path in kernel
> > selftests.
> >
> > Near the end of the series, a few changes are not tagged as "Fixes"
> > because the current behavior is to rely on the kernel sources uapi files
> > rather than on the installed kernel header files. Nevertheless, those
> > are updated for consistency.
> >
> > There are situations where "../../../../include/" was added to -I search
> > path, which is bogus for userspace tests and caused issues with types.h.
> > Those are removed.
> >
> > Thanks,
> >
> > Mathieu
> >
> > Mathieu Desnoyers (34):
>
> The below patches are now applied to linux-kselftest next for Linux 6.3-rc1
>
> >selftests: arm64: Fix incorrect kernel headers search path
> >selftests: clone3: Fix incorrect kernel headers search path
> >selftests: core: Fix incorrect kernel headers search path
> >selftests: dma: Fix incorrect kernel headers search path
> >selftests: dmabuf-heaps: Fix incorrect kernel headers search path
> >selftests: drivers: Fix incorrect kernel headers search path
> >selftests: filesystems: Fix incorrect kernel headers search path
> >selftests: futex: Fix incorrect kernel headers search path
> >selftests: gpio: Fix incorrect kernel headers search path
> >selftests: ipc: Fix incorrect kernel headers search path
> >selftests: kcmp: Fix incorrect kernel headers search path
> >selftests: media_tests: Fix incorrect kernel headers search path
> >selftests: membarrier: Fix incorrect kernel headers search path
> >selftests: mount_setattr: Fix incorrect kernel headers search path
> >selftests: move_mount_set_group: Fix incorrect kernel headers search
> >  path
> >selftests: perf_events: Fix incorrect kernel headers search path
> >selftests: pid_namespace: Fix incorrect kernel headers search path
> >selftests: pidfd: Fix incorrect kernel headers search path
> >selftests: ptp: Fix incorrect kernel headers search path
> >selftests: rseq: Fix incorrect kernel headers search path
> >selftests: sched: Fix incorrect kernel headers search path
> >selftests: seccomp: Fix incorrect kernel headers search path
> >selftests: sync: Fix incorrect kernel headers search path
> >selftests: user_events: Fix incorrect kernel headers search path
> >selftests: vm: Fix incorrect kernel headers search path
> >selftests: x86: Fix incorrect kernel headers search path
> >selftests: iommu: Use installed kernel headers search path
> >selftests: memfd: Use installed kernel headers search path
> >selftests: ptrace: Use installed kernel headers search path
> >selftests: tdx: Use installed kernel headers search path
> >
>
> These will be applied by maintainers to their trees.

Not in this form. They break the build.

> >selftests: bpf: Fix incorrect kernel headers search path # 02/34
> >selftests: net: Fix incorrect kernel headers search path # 17/34
> >selftests: powerpc: Fix incorrect kernel headers search path # 21/34
> >selftests: bpf docs: Use installed kernel headers search path # 30/34
>
> thanks,
> -- Shuah


Re: [PATCH 1/2] powerpc/bpf: Fix detecting BPF atomic instructions

2021-07-01 Thread Alexei Starovoitov
On Thu, Jul 1, 2021 at 12:32 PM Naveen N. Rao
 wrote:
>
> Alexei Starovoitov wrote:
> > On Thu, Jul 1, 2021 at 8:09 AM Naveen N. Rao
> >  wrote:
> >>
> >> Commit 91c960b0056672 ("bpf: Rename BPF_XADD and prepare to encode other
> >> atomics in .imm") converted BPF_XADD to BPF_ATOMIC and added a way to
> >> distinguish instructions based on the immediate field. Existing JIT
> >> implementations were updated to check for the immediate field and to
> >> reject programs utilizing anything more than BPF_ADD (such as BPF_FETCH)
> >> in the immediate field.
> >>
> >> However, the check added to powerpc64 JIT did not look at the correct
> >> BPF instruction. Due to this, such programs would be accepted and
> >> incorrectly JIT'ed resulting in soft lockups, as seen with the atomic
> >> bounds test. Fix this by looking at the correct immediate value.
> >>
> >> Fixes: 91c960b0056672 ("bpf: Rename BPF_XADD and prepare to encode other atomics in .imm")
> >> Reported-by: Jiri Olsa 
> >> Tested-by: Jiri Olsa 
> >> Signed-off-by: Naveen N. Rao 
> >> ---
> >> Hi Jiri,
> >> FYI: I made a small change in this patch -- using 'imm' directly, rather
> >> than insn[i].imm. I've still added your Tested-by since this shouldn't
> >> impact the fix in any way.
> >>
> >> - Naveen
> >
> > Excellent debugging! You guys are awesome.
>
> Thanks. Jiri and Brendan did the bulk of the work :)
>
> > How do you want this fix routed? via bpf tree?
>
> Michael has a few BPF patches queued up in powerpc tree for v5.14, so it
> might be easier to take these patches through the powerpc tree unless he
> feels otherwise. Michael?

Works for me. Thanks!


Re: [PATCH 1/2] powerpc/bpf: Fix detecting BPF atomic instructions

2021-07-01 Thread Alexei Starovoitov
On Thu, Jul 1, 2021 at 8:09 AM Naveen N. Rao
 wrote:
>
> Commit 91c960b0056672 ("bpf: Rename BPF_XADD and prepare to encode other
> atomics in .imm") converted BPF_XADD to BPF_ATOMIC and added a way to
> distinguish instructions based on the immediate field. Existing JIT
> implementations were updated to check for the immediate field and to
> reject programs utilizing anything more than BPF_ADD (such as BPF_FETCH)
> in the immediate field.
>
> However, the check added to powerpc64 JIT did not look at the correct
> BPF instruction. Due to this, such programs would be accepted and
> incorrectly JIT'ed resulting in soft lockups, as seen with the atomic
> bounds test. Fix this by looking at the correct immediate value.
>
> Fixes: 91c960b0056672 ("bpf: Rename BPF_XADD and prepare to encode other atomics in .imm")
> Reported-by: Jiri Olsa 
> Tested-by: Jiri Olsa 
> Signed-off-by: Naveen N. Rao 
> ---
> Hi Jiri,
> FYI: I made a small change in this patch -- using 'imm' directly, rather
> than insn[i].imm. I've still added your Tested-by since this shouldn't
> impact the fix in any way.
>
> - Naveen

Excellent debugging! You guys are awesome.
How do you want this fix routed? via bpf tree?


Re: [PATCH v2] lockdown,selinux: avoid bogus SELinux lockdown permission checks

2021-06-04 Thread Alexei Starovoitov
On Fri, Jun 4, 2021 at 4:34 PM Paul Moore  wrote:
>
> > Again, the problem is not limited to BPF at all. kprobes is doing
> > register-time hooks which are equivalent to the ones of BPF. Anything
> > in run-time trying to prevent probe_read_kernel by kprobes or BPF is
> > broken by design.
>
> Not being an expert on kprobes I can't really comment on that, but
> right now I'm focused on trying to make things work for the BPF
> helpers.  I suspect that if we can get the SELinux lockdown
> implementation working properly for BPF the solution for kprobes won't
> be far off.

Paul,

Both kprobe and bpf can call probe_read_kernel==copy_from_kernel_nofault
from all contexts.
Including NMI. Most of audit_log_* is not acceptable.
Just removing a wakeup is not solving anything.
Audit hooks don't belong in NMI.
Audit design needs memory allocation. Hence it's not suitable
for NMI and hardirq. But kprobes and bpf progs do run just fine there.
BPF, for example, only uses pre-allocated memory.


Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-20 Thread Alexei Starovoitov
On Sat, Apr 17, 2021 at 1:16 AM Christophe Leroy
 wrote:
>
>
>
> > On 16/04/2021 at 01:49, Alexei Starovoitov wrote:
> > > On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet wrote:
> >>
> >> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann 
> >>> On 4/15/21 11:32 AM, Jianlin Lv wrote:
> >>>> For debugging JITs, dumping the JITed image to the kernel log is discouraged;
> >>>> "bpftool prog dump jited" is a much better way to examine JITed dumps.
> >>>> This patch gets rid of the code related to bpf_jit_enable=2 mode,
> >>>> updates the proc handler of bpf_jit_enable, and adds auxiliary
> >>>> information to explain how to use the bpf_jit_disasm tool after this change.
> >>>>
> >>>> Signed-off-by: Jianlin Lv 
> >>
> >> Hello,
> >>
> >> For what it's worth, I have already seen people dump the JIT image in
> >> kernel logs in Qemu VMs running with just a busybox, not for kernel
> >> development, but in a context where building/using bpftool was not
> >> possible.
> >
> > If building/using bpftool is not possible then the majority of selftests
> > won't be exercised. I don't think such an environment is suitable for any
> > kind of bpf development, much less for JIT debugging.
> > bpf_jit_enable=2 is nothing but a debugging tool for JIT developers, and
> > I'd rather nuke that code instead of carrying it from kernel to kernel.
> >
>
> When I implemented JIT for PPC32, it was extremely helpful.
>
> As far as I understand, for the time being bpftool is not usable in my
> environment because it doesn't support cross compilation when the
> target's endianness differs from the building host endianness, see
> discussion at
> https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/
>
> That's right that selftests can't be exercised because they don't build.
>
> The question might be candid as I didn't investigate much about the
> replacement of "bpf_jit_enable=2 debugging mode" by bpftool. How do we
> use bpftool exactly for that? Especially when using the BPF test module?

Kernel developers can add any amount of printk and dumps to debug their
code, but such a debugging aid should not be part of the production kernel.
That sysctl was two things at once: a debugging tool for kernel devs and
introspection for users.
bpftool's jit dump solves the 2nd part: it provides JIT introspection to users.
Debugging of the kernel can be done with any amount of auxiliary code,
including calling print_hex_dump() during jiting.


Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-15 Thread Alexei Starovoitov
On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet  wrote:
>
> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann 
> > On 4/15/21 11:32 AM, Jianlin Lv wrote:
> >> For debugging JITs, dumping the JITed image to the kernel log is discouraged;
> >> "bpftool prog dump jited" is a much better way to examine JITed dumps.
> >> This patch gets rid of the code related to bpf_jit_enable=2 mode,
> >> updates the proc handler of bpf_jit_enable, and adds auxiliary
> >> information to explain how to use the bpf_jit_disasm tool after this change.
> >>
> >> Signed-off-by: Jianlin Lv 
>
> Hello,
>
> For what it's worth, I have already seen people dump the JIT image in
> kernel logs in Qemu VMs running with just a busybox, not for kernel
> development, but in a context where building/using bpftool was not
> possible.

If building/using bpftool is not possible then the majority of selftests
won't be exercised. I don't think such an environment is suitable for any
kind of bpf development, much less for JIT debugging.
bpf_jit_enable=2 is nothing but a debugging tool for JIT developers, and
I'd rather nuke that code instead of carrying it from kernel to kernel.


Re: [RFC PATCH v1 7/7] powerpc/bpf: Implement extended BPF on PPC32

2020-12-16 Thread Alexei Starovoitov
On Wed, Dec 16, 2020 at 10:07:37AM +, Christophe Leroy wrote:
> Implement Extended Berkeley Packet Filter on Powerpc 32
> 
> Test result with test_bpf module:
> 
>   test_bpf: Summary: 378 PASSED, 0 FAILED, [354/366 JIT'ed]

nice!

> Registers mapping:
> 
>   [BPF_REG_0] = r11-r12
>   /* function arguments */
>   [BPF_REG_1] = r3-r4
>   [BPF_REG_2] = r5-r6
>   [BPF_REG_3] = r7-r8
>   [BPF_REG_4] = r9-r10
>   [BPF_REG_5] = r21-r22 (Args 9 and 10 come in via the stack)
>   /* non volatile registers */
>   [BPF_REG_6] = r23-r24
>   [BPF_REG_7] = r25-r26
>   [BPF_REG_8] = r27-r28
>   [BPF_REG_9] = r29-r30
>   /* frame pointer aka BPF_REG_10 */
>   [BPF_REG_FP] = r31
>   /* eBPF jit internal registers */
>   [BPF_REG_AX] = r19-r20
>   [TMP_REG] = r18
> 
> As PPC32 doesn't have a redzone in the stack,
> use r17 as tail call counter.
> 
> r0 is used as temporary register as much as possible. It is referenced
> directly in the code in order to avoid misuse of it, because some
> instructions interpret it as value 0 instead of register r0
> (ex: addi, addis, stw, lwz, ...)
> 
> The following operations are not implemented:
> 
>   case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
>   case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
>   case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
> 
> The following operations are only implemented for power of two constants:
> 
>   case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
>   case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */

Those are sensible limitations. MOD and DIV are rare, but XADD is common.
Please consider doing it as a cmpxchg loop in the future.

Also please run test_progs. It will give a lot better coverage than test_bpf.ko


Re: [PATCH v5 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

2020-01-21 Thread Alexei Starovoitov
On Tue, Jan 21, 2020 at 9:31 AM Alexey Budankov
 wrote:
>
>
> On 21.01.2020 17:43, Stephen Smalley wrote:
> > On 1/20/20 6:23 AM, Alexey Budankov wrote:
> >>
> >> Introduce CAP_PERFMON capability designed to secure system performance
> >> monitoring and observability operations so that CAP_PERFMON would assist
> >> CAP_SYS_ADMIN capability in its governing role for perf_events, i915_perf
> >> and other performance monitoring and observability subsystems.
> >>
> >> CAP_PERFMON intends to harden system security and integrity during system
> >> performance monitoring and observability operations by decreasing attack
> >> surface that is available to a CAP_SYS_ADMIN privileged process [1].
> >> Providing access to system performance monitoring and observability
> >> operations under CAP_PERFMON capability singly, without the rest of
> >> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> >> makes operation more secure.
> >>
> >> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> >> system performance monitoring and observability operations and balance
> >> amount of CAP_SYS_ADMIN credentials following the recommendations in the
> >> capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
> >> overloaded; see Notes to kernel developers, below."
> >>
> >> Although the software running under CAP_PERFMON can not ensure avoidance
> >> of related hardware issues, the software can still mitigate these issues
> >> following the official embargoed hardware issues mitigation procedure [2].
> >> The bugs in the software itself could be fixed following the standard
> >> kernel development process [3] to maintain and harden security of system
> >> performance monitoring and observability operations.
> >>
> >> [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
> >> [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
> >> [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html
> >>
> >> Signed-off-by: Alexey Budankov 
> >> ---
> >>   include/linux/capability.h  | 12 
> >>   include/uapi/linux/capability.h |  8 +++-
> >>   security/selinux/include/classmap.h |  4 ++--
> >>   3 files changed, 21 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/capability.h b/include/linux/capability.h
> >> index ecce0f43c73a..8784969d91e1 100644
> >> --- a/include/linux/capability.h
> >> +++ b/include/linux/capability.h
> >> @@ -251,6 +251,18 @@ extern bool privileged_wrt_inode_uidgid(struct user_namespace *ns, const struct
> >>   extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
> >>   extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
> >>   extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
> >> +static inline bool perfmon_capable(void)
> >> +{
> >> +struct user_namespace *ns = &init_user_ns;
> >> +
> >> +if (ns_capable_noaudit(ns, CAP_PERFMON))
> >> +return ns_capable(ns, CAP_PERFMON);
> >> +
> >> +if (ns_capable_noaudit(ns, CAP_SYS_ADMIN))
> >> +return ns_capable(ns, CAP_SYS_ADMIN);
> >> +
> >> +return false;
> >> +}
> >
> > Why _noaudit()?  Normally only used when a permission failure is
> > non-fatal to the operation.  Otherwise, we want the audit message.
>
> Some of ideas from v4 review.

Well, in the requested changes from v4 I wrote:
return capable(CAP_PERFMON);
instead of
return false;

That's what Andy suggested earlier for CAP_BPF.
I think that should resolve Stephen's concern.


Re: linux-next: build warning after merge of the bpf-next tree

2020-01-13 Thread Alexei Starovoitov
On Sun, Jan 12, 2020 at 8:33 PM Zong Li  wrote:
>
> I'm not quite familiar with btf, so I have no idea why there are two
> weak symbols be added in 8580ac9404f6 ("bpf: Process in-kernel BTF")

I can explain what these weak symbols are for, but that won't change
the fact that the compiler or linker is buggy. The weak symbols should
work in all cases and the compiler should pick the correct relocation.
In this case it sounds like the compiler picked a relative relocation
and failed to reach zero from that address.


Re: Re: linux-next: build warning after merge of the bpf-next tree

2020-01-10 Thread Alexei Starovoitov
On Fri, Jan 10, 2020 at 2:28 PM Alexandre Ghiti  wrote:
>
> Hi guys,
>
> On 10/27/19 8:02 PM, Stephen Rothwell wrote:
> > Hi all,
> >
> > On Fri, 18 Oct 2019 10:56:57 +1100 Stephen Rothwell wrote:
> >> Hi all,
> >>
> >> After merging the bpf-next tree, today's linux-next build (powerpc
> >> ppc64_defconfig) produced this warning:
> >>
> >> WARNING: 2 bad relocations
> >> c1998a48 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_start
> >> c1998a50 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_end
> >>
> >> Introduced by commit
> >>
> >>8580ac9404f6 ("bpf: Process in-kernel BTF")
> > This warning now appears in the net-next tree build.
> >
> >
> I'm bumping this thread up because Zong also noticed that 2 new
> relocations for those symbols appeared in my riscv relocatable kernel
> branch following that commit.
>
> I also noticed 2 new relocations R_AARCH64_ABS64 appearing in arm64 kernel.
>
> Those 2 weak undefined symbols have existed since commit
> 341dfcf8d78e ("btf: expose BTF info through sysfs"), but it is the fact
> of declaring those symbols in btf.c that produced those relocations.
>
> I'm not sure what this all means, but this is not something I expected
> for riscv for a kernel linked with -shared/-fpie. Maybe should we just
> leave them at zero?
>
> I think that deserves a deeper look if someone understands all this
> better than I do.

Are you saying there is a warning for arm64 as well?
Can ppc folks explain the above warning?
What does "2 bad relocations" mean?
The code is doing:
extern char __weak _binary__btf_vmlinux_bin_start[];
extern char __weak _binary__btf_vmlinux_bin_end[];
Since they are weak they should be zero when not defined.
What's the issue?


Re: [PATCH] libbpf: Fix readelf output parsing for Fedora

2019-12-15 Thread Alexei Starovoitov
On Fri, Dec 13, 2019 at 9:02 AM Andrii Nakryiko
 wrote:
>
> On Fri, Dec 13, 2019 at 2:11 AM Thadeu Lima de Souza Cascardo
>  wrote:
> >
> > Fedora binutils has been patched to show "other info" for a symbol at the
> > end of the line. This was done in order to support unmaintained scripts
> > that would break with the extra info. [1]
> >
> > [1] 
> > https://src.fedoraproject.org/rpms/binutils/c/b8265c46f7ddae23a792ee8306fbaaeacba83bf8
> >
> > This in turn has been done to fix the build of ruby, because of checksec.
> > [2] Thanks Michael Ellerman for the pointer.
> >
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1479302
> >
> > As libbpf Makefile is not unmaintained, we can simply deal with either
> > output format, by just removing the "other info" field, as it always comes
> > inside brackets.
> >
> > Cc: Aurelien Jarno 
> > > Fixes: 3464afdf11f9 (libbpf: Fix readelf output parsing on powerpc with recent binutils)
> > Reported-by: Justin Forbes 
> > Signed-off-by: Thadeu Lima de Souza Cascardo 
> > ---
>
> I was briefly playing with it and trying to make it use nm to dump
> symbols, instead of parsing more human-oriented output of readelf, but
> somehow nm doesn't output symbols with @@LIBBPF.* suffix at the end,
> so I just gave up. So I think this one is good.
>
> This should go through bpf-next tree.
>
> Acked-by: Andrii Nakryiko 

Applied. Thanks


Re: linux-next: build warning after merge of the bpf-next tree

2019-10-17 Thread Alexei Starovoitov
On Fri, Oct 18, 2019 at 10:56:57AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the bpf-next tree, today's linux-next build (powerpc
> ppc64_defconfig) produced this warning:
> 
> WARNING: 2 bad relocations
> c1998a48 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_start
> c1998a50 R_PPC64_ADDR64 _binary__btf_vmlinux_bin_end

Can ppc folks help me figure out what this warning means?



Re: [PATCH bpf] bpf: powerpc: fix broken uapi for BPF_PROG_TYPE_PERF_EVENT

2018-12-09 Thread Alexei Starovoitov
On Thu, Dec 06, 2018 at 02:57:01PM +0530, Sandipan Das wrote:
> Now that there are different variants of pt_regs for userspace and
> kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must
> be changed by exporting the user_pt_regs structure instead of the
> pt_regs structure that is in-kernel only.
> 
> Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs")
> Signed-off-by: Sandipan Das 

Thanks! Applied to bpf tree.



Re: [PATCH net-next 0/6] Remove VLAN.CFI overload

2018-11-16 Thread Alexei Starovoitov
On Sat, Nov 10, 2018 at 1:48 PM David Miller  wrote:
>
> From: Michał Mirosław 
> Date: Sat, 10 Nov 2018 19:58:29 +0100
>
> > Fix BPF code/JITs to allow for separate VLAN_PRESENT flag
> > storage and finally move the flag to separate storage in skbuff.
> >
> > This is the final step to make VLAN.CFI transparent to the core Linux
> > networking stack.
> >
> > An #ifdef is introduced temporarily to mark fragments masking
> > VLAN_TAG_PRESENT. This is removed altogether in the final patch.
>
> Daniel and Alexei, please review.

It was on my todo list.
All reviews got delayed due to LPC.

I guess too late to comment now.
Anyhow I don't see the value in this patch set.
Seems like code churn.

Michal, could you please explain the reasoning?


Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header

2018-05-18 Thread Alexei Starovoitov
On Fri, May 18, 2018 at 5:50 AM, Sandipan Das
 wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das 
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
> __u32 xlated_prog_len;
> __aligned_u64 jited_prog_insns;
> __aligned_u64 xlated_prog_insns;
> +   __aligned_u64 jited_ksyms;
> +   __u32 nr_jited_ksyms;
> __u64 load_time;/* ns since boottime */
> __u32 created_by_uid;
> __u32 nr_map_ids;

this breaks uapi.
New fields can only be added to the end.


Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls

2018-03-05 Thread Alexei Starovoitov

On 3/1/18 12:51 AM, Naveen N. Rao wrote:

Daniel Borkmann wrote:

On 02/27/2018 01:13 PM, Sandipan Das wrote:

With this patch, it will look like this:
   0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3


(Note the +2 is the insn->off already.)


   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

where 8f85936f29a7790a is the tag of the bpf program and 3 is
the offset to the start of the subprog from the start of the
program.


The problem with this approach would be that right now the name is
something like bpf_prog_5f76847930402518_F where the subprog tag is
just a placeholder so in future, this may well adapt to e.g. the actual
function name from the elf file. Note that when kallsyms is enabled
then a name like bpf_prog_5f76847930402518_F will also appear in stack
traces, perf records, etc, so for correlation/debugging it would really
help to have them the same everywhere.

Worst case if there's nothing better, potentially what one could do in
bpf_prog_get_info_by_fd() is to dump an array of full addresses and
have the imm part as the index pointing to one of them, just unfortunate
that it's likely only needed in ppc64.


Ok. We seem to have discussed a few different aspects in this thread.
Let me summarize the different aspects we have discussed:
1. Passing address of JIT'ed function to the JIT engines:
   Two approaches discussed:
   a. Existing approach, where the subprog address is encoded as an
offset from __bpf_call_base() in imm32 field of the BPF call
instruction. This requires the JIT'ed function to be within 2GB of
__bpf_call_base(), which won't be true on ppc64, at the least. So,
this won't work on ppc64 (or any other architecture where vmalloc'ed
(module_alloc()) memory is from a different, far, address range).


it looks like ppc64 doesn't guarantee today that all of module_alloc()
will be within 32-bit, but I think it should be trivial to add such
guarantee. If so, we can define another __bpf_call_base specifically
for bpf-to-bpf calls when jit is on.
Then jit_subprogs() math will fit:
insn->imm = func[subprog]->bpf_func - __bpf_call_base_for_jited_progs;
and will make it easier for ppc64 jit to optimize and use
near calls for bpf-to-bpf calls while still using trampoline
for bpf-to-kernel.
Also it solves bpftool issue.
For all other archs we can keep
__bpf_call_base_for_jited_progs == __bpf_call_base


   There is a third option we can consider:
   c. Convert BPF pseudo call instruction into a 2-instruction sequence
   (similar to BPF_DW) and encode the full 64-bit call target in the
second bpf instruction. To distinguish this from other instruction
forms, we can set imm32 to -1.


Adding new instruction just for that case looks like overkill.



Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-09 Thread Alexei Starovoitov

On 2/9/18 8:54 AM, Naveen N. Rao wrote:

Naveen N. Rao wrote:

Alexei Starovoitov wrote:

On 2/8/18 4:03 AM, Sandipan Das wrote:

The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may be as large as 64 bits because of which this
cannot be accommodated in the imm field without truncation.

To resolve this, we additionally use the aux data within each
bpf_prog associated with the caller functions to store the
addresses of their respective callees.

Signed-off-by: Sandipan Das <sandi...@linux.vnet.ibm.com>
---
 kernel/bpf/verifier.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..52088b4ca02f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,6 +5282,19 @@ static int jit_subprogs(struct
bpf_verifier_env *env)
  * run last pass of JIT
  */
 for (i = 0; i <= env->subprog_cnt; i++) {
+u32 flen = func[i]->len, callee_cnt = 0;
+struct bpf_prog **callee;
+
+/* for now assume that the maximum number of bpf function
+ * calls that can be made by a caller must be at most the
+ * number of bpf instructions in that function
+ */
+callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
+if (!callee) {
+err = -ENOMEM;
+goto out_free;
+}
+
 insn = func[i]->insnsi;
 for (j = 0; j < func[i]->len; j++, insn++) {
 if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5292,6 +5305,26 @@ static int jit_subprogs(struct
bpf_verifier_env *env)
 insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
 func[subprog]->bpf_func -
 __bpf_call_base;
+
+/* the offset to the callee from __bpf_call_base
+ * may be larger than what the 32 bit integer imm
+ * can accomodate which will truncate the higher
+ * order bits
+ *
+ * to avoid this, we additionally utilize the aux
+ * data of each caller function for storing the
+ * addresses of every callee associated with it
+ */
+callee[callee_cnt++] = func[subprog];


can you share typical /proc/kallsyms ?
Are you saying that kernel and kernel modules are allocated from
address spaces that are always more than 32-bit apart?


Yes. On ppc64, kernel text is linearly mapped from 0xc000000000000000,
while the vmalloc'ed area starts from 0xd000000000000000 (for radix, this is
different, but still beyond a 32-bit offset).


That would mean that all kernel calls into modules are far calls,
and the other way around from .ko into the kernel?
Performance is probably suffering because every call needs to be built
with a full 64-bit offset. No?


Possibly, and I think Michael can give a better perspective, but I think
this is due to our ABI. For inter-module calls, we need to set up the TOC
pointer (or the address of the function being called, with ABIv2),
which would require us to load a full address regardless.


Thinking more about this, as an optimization, for bpf-to-bpf calls, we
could detect a near call and just emit a relative branch since we don't
care about the TOC with BPF. But, this will depend on whether the different
BPF functions are close enough (within 32MB) to one another.


so that will be just an optimization. Understood.
How about instead of doing callee = kzalloc(sizeof(func[i]) * flen..
we keep  insn->off pointing to subprog and move
prog->aux->func = func;
before the last JIT pass.
Then you won't need to alloc this extra array.



Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-08 Thread Alexei Starovoitov

On 2/8/18 4:03 AM, Sandipan Das wrote:

The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may be as large as 64 bits because of which this
cannot be accommodated in the imm field without truncation.

To resolve this, we additionally use the aux data within each
bpf_prog associated with the caller functions to store the
addresses of their respective callees.

Signed-off-by: Sandipan Das 
---
 kernel/bpf/verifier.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..52088b4ca02f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 * run last pass of JIT
 */
for (i = 0; i <= env->subprog_cnt; i++) {
+   u32 flen = func[i]->len, callee_cnt = 0;
+   struct bpf_prog **callee;
+
+   /* for now assume that the maximum number of bpf function
+* calls that can be made by a caller must be at most the
+* number of bpf instructions in that function
+*/
+   callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
+   if (!callee) {
+   err = -ENOMEM;
+   goto out_free;
+   }
+
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
+
+   /* the offset to the callee from __bpf_call_base
+* may be larger than what the 32 bit integer imm
+* can accomodate which will truncate the higher
+* order bits
+*
+* to avoid this, we additionally utilize the aux
+* data of each caller function for storing the
+* addresses of every callee associated with it
+*/
+   callee[callee_cnt++] = func[subprog];


can you share typical /proc/kallsyms ?
Are you saying that kernel and kernel modules are allocated from
address spaces that are always more than 32-bit apart?
That would mean that all kernel calls into modules are far calls,
and the other way around from .ko into the kernel?
Performance is probably suffering because every call needs to be built
with a full 64-bit offset. No?



Re: [PATCH v3 00/20] Speculative page faults

2017-10-05 Thread Alexei Starovoitov
On Wed, Oct 04, 2017 at 08:50:49AM +0200, Laurent Dufour wrote:
> On 25/09/2017 18:27, Alexei Starovoitov wrote:
> > On Mon, Sep 18, 2017 at 12:15 AM, Laurent Dufour
> > <lduf...@linux.vnet.ibm.com> wrote:
> >> Despite the unprovable lockdep warning raised by Sergey, I didn't get any
> >> feedback on this series.
> >>
> >> Is there a chance to get it moved upstream ?
> > 
> > what is the status ?
> > We're eagerly looking forward for this set to land,
> > since we have several use cases for tracing that
> > will build on top of this set as discussed at Plumbers.
> 
> Hi Alexei,
> 
> Based on Plumber's note [1], it sounds that the use case is tied to BPF
> tracing, where a call to find_vma() will be made in a process's context
> to fetch user space's symbols.
> 
> Am I right ?
> Is the find_vma() call made in the context of the process owning the mm
> struct ?

Hi Laurent,

we're thinking about several use cases on top of your work.
First one is translation of user address to file_handle where
we need to do find_vma() from preempt_disabled context of bpf program.
My understanding that srcu should solve that nicely.
Second is making probe_read() try harder when an address is causing a
minor fault. We're thinking that find_vma() followed by some new
lightweight filemap_access() that doesn't sleep will do the trick.
In both cases the program will be accessing current->mm.



Re: [PATCH v3 00/20] Speculative page faults

2017-09-25 Thread Alexei Starovoitov
On Mon, Sep 18, 2017 at 12:15 AM, Laurent Dufour
 wrote:
> Despite the unprovable lockdep warning raised by Sergey, I didn't get any
> feedback on this series.
>
> Is there a chance to get it moved upstream ?

what is the status ?
We're eagerly looking forward for this set to land,
since we have several use cases for tracing that
will build on top of this set as discussed at Plumbers.


Re: [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer

2017-01-13 Thread Alexei Starovoitov
On Fri, Jan 13, 2017 at 10:40:01PM +0530, Naveen N. Rao wrote:
> With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
> the rest of the space with illegal instructions to mitigate BPF spraying
> attacks, while having the actual JIT'ed BPF program at a random location
> within the allocated space. Under this scenario, it would be better to
> flush the entire allocated buffer rather than just the part containing
> the actual program. We already flush the buffer from start to the end of
> the BPF program. Extend this to include the illegal instructions after
> the BPF program.
> 
> Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <a...@kernel.org>



Re: [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image

2017-01-13 Thread Alexei Starovoitov
On Fri, Jan 13, 2017 at 10:40:00PM +0530, Naveen N. Rao wrote:
> From: Daniel Borkmann <dan...@iogearbox.net>
> 
> We have a check earlier to ensure we don't proceed if image is NULL. As
> such, the redundant check can be removed.
> 
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> [Added similar changes for classic BPF JIT]
> Signed-off-by: Naveen N. Rao <naveen.n....@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <a...@kernel.org>



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> >Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> >programs. This can be achieved either by:
> >(1) retaining the stack setup by the first eBPF program and having all
> >subsequent eBPF programs re-using it, or,
> >(2) by unwinding/tearing down the stack and having each eBPF program
> >deal with its own stack as it sees fit.
> >
> >To ensure that this does not create loops, there is a limit to how many
> >tail calls can be done (currently 32). This requires the JIT'ed code to
> >maintain a count of the number of tail calls done so far.
> >
> >Approach (1) is simple, but requires every eBPF program to have (almost)
> >the same prologue/epilogue, regardless of whether they need it. This is
> >inefficient for small eBPF programs which may not sometimes need a
> >prologue at all. As such, to minimize impact of tail call
> >implementation, we use approach (2) here which needs each eBPF program
> >in the chain to use its own prologue/epilogue. This is not ideal when
> >many tail calls are involved and when all the eBPF programs in the chain
> >have similar prologue/epilogue. However, the impact is restricted to
> >programs that do tail calls. Individual eBPF programs are not affected.
> >
> >We maintain the tail call count in a fixed location on the stack and
> >updated tail call count values are passed in through this. The very
> >first eBPF program in a chain sets this up to 0 (the first 2
> >instructions). Subsequent tail calls skip the first two eBPF JIT
> >instructions to maintain the count. For programs that don't do tail
> >calls themselves, the first two instructions are NOPs.
> >
> >Signed-off-by: Naveen N. Rao 
> 
> Thanks for adding support, Naveen, that's really great! I think 2) seems
> fine as well in this context as prologue size can vary quite a bit here,
> and depending on program types likelihood of tail call usage as well (but
> I wouldn't expect deep nesting). Thanks a lot!

Great stuff. In these circumstances approach 2 makes sense to me as well.



Re: [PATCH 2/2] bpf samples: update tracex5 sample to use __seccomp_filter

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 02:10:05AM +0530, Naveen N. Rao wrote:
> seccomp_phase1() does not exist anymore. Instead, update sample to use
> __seccomp_filter(). While at it, set max locked memory to unlimited.
> 
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <a...@kernel.org>



Re: [PATCH 1/2] bpf samples: fix compiler errors with sockex2 and sockex3

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 02:10:04AM +0530, Naveen N. Rao wrote:
> These samples fail to compile as 'struct flow_keys' conflicts with
> definition in net/flow_dissector.h. Fix the same by renaming the
> structure used in the sample.
> 
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>

Thanks for the fix.
Acked-by: Alexei Starovoitov <a...@kernel.org>



Re: [PATCH] ppc: Fix BPF JIT for ABIv2

2016-06-21 Thread Alexei Starovoitov

On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote:


The calling convention is different with ABIv2 and so we'll need changes
in bpf_slow_path_common() and sk_negative_common().


How big would those changes be? Do we know?

How come no one reported this was broken previously? This is the first I've
heard of it being broken.



I just heard of it less than two weeks ago, and only could investigate it last
week, when I realized mainline was also affected.

It looks like the little-endian support for the classic JIT was done before the
conversion to ABIv2. And as the JIT is disabled by default, no one seems to have
exercised it.


it's not a surprise unfortunately. The JITs that were written before
test_bpf.ko was developed were missing corner cases. Typical tcpdump
would be fine, but fragmented packets, negative offsets and
out-of-bounds wouldn't be handled correctly.
I'd suggest validating the stable backport with test_bpf as well.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-06-07 Thread Alexei Starovoitov
On Tue, Jun 07, 2016 at 07:02:23PM +0530, Naveen N. Rao wrote:
> PPC64 eBPF JIT compiler.
> 
> Enable with:
> echo 1 > /proc/sys/net/core/bpf_jit_enable
> or
> echo 2 > /proc/sys/net/core/bpf_jit_enable
> 
> ... to see the generated JIT code. This can further be processed with
> tools/net/bpf_jit_disasm.
> 
> With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
> test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]
> 
> ... on both ppc64 BE and LE.

Nice. That's even better than on x64 which cannot jit one test:
test_bpf: #262 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 168 PASS
which was designed specifically to hit x64 jit pass limit.
ppc jit has a predictable number of passes and doesn't have this problem
as expected. Great.

> The details of the approach are documented through various comments in
> the code.
> 
> Cc: Matt Evans <m...@ozlabs.org>
> Cc: Denis Kirjanov <k...@linux-powerpc.org>
> Cc: Michael Ellerman <m...@ellerman.id.au>
> Cc: Paul Mackerras <pau...@samba.org>
> Cc: Alexei Starovoitov <a...@fb.com>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: "David S. Miller" <da...@davemloft.net>
> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
> ---
>  arch/powerpc/Kconfig  |   3 +-
>  arch/powerpc/include/asm/asm-compat.h |   2 +
>  arch/powerpc/include/asm/ppc-opcode.h |  20 +-
>  arch/powerpc/net/Makefile |   4 +
>  arch/powerpc/net/bpf_jit.h|  53 +-
>  arch/powerpc/net/bpf_jit64.h  | 102 
>  arch/powerpc/net/bpf_jit_asm64.S  | 180 +++
>  arch/powerpc/net/bpf_jit_comp64.c | 956 
> ++
>  8 files changed, 1317 insertions(+), 3 deletions(-)
>  create mode 100644 arch/powerpc/net/bpf_jit64.h
>  create mode 100644 arch/powerpc/net/bpf_jit_asm64.S
>  create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

don't see any issues with the code.
Thank you for working on this.

Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [PATCH net 4/4] lib/test_bpf: Add additional BPF_ADD tests

2016-04-05 Thread Alexei Starovoitov

On 4/5/16 3:02 AM, Naveen N. Rao wrote:

Some of these tests proved useful with the powerpc eBPF JIT port due to
sign-extended 16-bit immediate loads. Though some of these aspects get
covered in other tests, it is better to have explicit tests so as to
quickly tag the precise problem.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Makes sense. Looks like ppc jit will be using quite a bit of
available ppc instructions. Nice.

I'm assuming all these new tests passed with x64 jit?

Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [PATCH net 3/4] lib/test_bpf: Add test to check for result of 32-bit add that overflows

2016-04-05 Thread Alexei Starovoitov

On 4/5/16 3:02 AM, Naveen N. Rao wrote:

BPF_ALU32 and BPF_ALU64 tests for adding two 32-bit values that result in
32-bit overflow.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Acked-by: Alexei Starovoitov <a...@kernel.org>



Re: [PATCH net 2/4] lib/test_bpf: Add tests for unsigned BPF_JGT

2016-04-05 Thread Alexei Starovoitov

On 4/5/16 3:02 AM, Naveen N. Rao wrote:

Unsigned Jump-if-Greater-Than.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


I think some of the tests already cover it, but extra tests are
always great.
Acked-by: Alexei Starovoitov <a...@kernel.org>

I think the whole set belongs in net-next.
Next time you submit the patches please say [PATCH net-next] in subject.
[PATCH net] is for bugfixes only.
Thanks a bunch!


Re: [PATCH net 1/4] lib/test_bpf: Fix JMP_JSET tests

2016-04-05 Thread Alexei Starovoitov

On 4/5/16 3:02 AM, Naveen N. Rao wrote:

JMP_JSET tests incorrectly used BPF_JNE. Fix the same.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>


Good catch.
Fixes: cffc642d93f9 ("test_bpf: add 173 new testcases for eBPF")
Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [PATCHv2 net 3/3] samples/bpf: Enable powerpc support

2016-04-04 Thread Alexei Starovoitov
On Mon, Apr 04, 2016 at 10:31:34PM +0530, Naveen N. Rao wrote:
> Add the necessary definitions for building bpf samples on ppc.
> 
> Since ppc doesn't store function return address on the stack, modify how
> PT_REGS_RET() and PT_REGS_FP() work.
> 
> Also, introduce PT_REGS_IP() to access the instruction pointer.
> 
> Cc: Alexei Starovoitov <a...@fb.com>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> Cc: Michael Ellerman <m...@ellerman.id.au>
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [PATCHv2 net 2/3] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-04-04 Thread Alexei Starovoitov
On Mon, Apr 04, 2016 at 10:31:33PM +0530, Naveen N. Rao wrote:
> While at it, remove the generation of .s files and fix some typos in the
> related comment.
> 
> Cc: Alexei Starovoitov <a...@fb.com>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> Cc: Michael Ellerman <m...@ellerman.id.au>
> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [RFC PATCH 6/6] ppc: ebpf/jit: Implement JIT compiler for extended BPF

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 2:58 AM, Naveen N. Rao wrote:

PPC64 eBPF JIT compiler. Works for both ABIv1 and ABIv2.

Enable with:
echo 1 > /proc/sys/net/core/bpf_jit_enable
or
echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':
test_bpf: Summary: 291 PASSED, 0 FAILED, [234/283 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code, as are the TODOs. Some of the prominent TODOs include
implementing BPF tail calls and skb loads.

Cc: Matt Evans <m...@ozlabs.org>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Alexei Starovoitov <a...@fb.com>
Cc: "David S. Miller" <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
---
  arch/powerpc/include/asm/ppc-opcode.h |  19 +-
  arch/powerpc/net/Makefile |   4 +
  arch/powerpc/net/bpf_jit.h|  66 ++-
  arch/powerpc/net/bpf_jit64.h  |  58 +++
  arch/powerpc/net/bpf_jit_comp64.c | 828 ++
  5 files changed, 973 insertions(+), 2 deletions(-)
  create mode 100644 arch/powerpc/net/bpf_jit64.h
  create mode 100644 arch/powerpc/net/bpf_jit_comp64.c

...

-#ifdef CONFIG_PPC64
+#if defined(CONFIG_PPC64) && (!defined(_CALL_ELF) || _CALL_ELF != 2)


impressive stuff!
Everything nicely documented. Could you add a few words for the above
condition as well?
Or maybe a new macro, since it occurs many times?
What do these _CALL_ELF == 2 and != 2 conditions mean? ppc ABIs?
Will there ever be a v3?

So far most of the bpf jits were going via net-next tree, but if
in this case no changes to the core is necessary then I guess it's fine
to do it via powerpc tree. What's your plan?


Re: [PATCH 4/4] samples/bpf: Enable powerpc support

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 7:41 AM, Naveen N. Rao wrote:

On 2016/03/31 10:52AM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:
...

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx){ (ip) = (ctx)->link; }
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) BPF_KPROBE_READ_RET_IP(ip, ctx)
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)
\
+   bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx))
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) 
\
+   bpf_probe_read(&(ip), sizeof(ip),   \
+   (void *)(PT_REGS_FP(ctx) + sizeof(ip)))


makes sense, but please use ({ }) gcc extension instead of {} and
open call to make sure that macro body is scoped.


To be sure I understand this right, do you mean something like this?

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)({ (ip) = (ctx)->link; 
})
+#define BPF_KRETPROBE_READ_RET_IP  BPF_KPROBE_READ_RET_IP
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)({  
\
+   bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({  
\
+   bpf_probe_read(&(ip), sizeof(ip),   
\
+   (void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
+#endif


yes. Thanks!


Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-04-01 Thread Alexei Starovoitov

On 4/1/16 7:37 AM, Naveen N. Rao wrote:

On 2016/03/31 08:19PM, Daniel Borkmann wrote:

On 03/31/2016 07:46 PM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
  clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
--O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s


that was a workaround when clang/llvm didn't have bpf support.
Now clang 3.7 and 3.8 have bpf built in, so it makes sense to remove
the manual calls to llc completely.
Just use 'clang -target bpf -O2 -D... -c $< -o $@'


+1, the clang part in that Makefile should also more correctly be called
with '-target bpf' as it turns out (despite llc with '-march=bpf' ...).
Better to use clang directly as suggested by Alexei.


I'm likely missing something obvious, but I cannot get this to work.
With this diff:

 $(obj)/%.o: $(src)/%.c
clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value 
-Wno-pointer-sign \
-   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf 
-filetype=obj -o $@
-   clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
-   -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value 
-Wno-pointer-sign \
-   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf 
-filetype=asm -o $@.s
+   -O2 -target bpf -c $< -o $@

I see far too many errors thrown starting with:
./arch/x86/include/asm/arch_hweight.h:31:10: error: invalid output 
constraint '=a' in asm
 : "="REG_OUT (res)


ahh. yes. when processing kernel headers clang has to assume x86 style
inline asm, though all of these functions will be ignored.
I don't have a quick fix for this yet.
Let's go back to your original change $(LLC)->llc


Re: [PATCH 3/4] samples/bpf: Simplify building BPF samples

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 11:51 AM, Naveen N. Rao wrote:

On 2016/03/31 10:49AM, Alexei Starovoitov wrote:

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

Make BPF samples build depend on CONFIG_SAMPLE_BPF. We still don't add a
Kconfig option since that will add a dependency on llvm for allyesconfig
builds which may not be desirable.

Those who need to build the BPF samples can now just do:

make CONFIG_SAMPLE_BPF=y

or:

export CONFIG_SAMPLE_BPF=y
make


I don't like this 'simplification'.
make samples/bpf/
is much easier to type than capital letters.


This started out as a patch to have the BPF samples built with a Kconfig
option. As stated in the commit description, I realised that it won't
work for allyesconfig builds. However, the reason I retained this patch
is since it gets us one step closer to building the samples as part of
the kernel build.

The 'simplification' is since I can now have the export in my .bashrc
and the kernel build will now build the BPF samples too without
requiring an additional 'make samples/bpf/' step.

I agree this is subjective, so I am ok if this isn't taken in.


If you can change it that 'make samples/bpf/' still works then it would
be fine. As it is it breaks our testing setup.


Re: [PATCH 1/4] samples/bpf: Fix build breakage with map_perf_test_user.c

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 11:46 AM, Naveen N. Rao wrote:

It's failing this way on powerpc? Odd.

This fails for me on x86_64 too -- RHEL 7.1.


indeed. fails on centos 7.1, whereas centos 6.7 is fine.


Re: [PATCH 3/4] samples/bpf: Simplify building BPF samples

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

Make BPF samples build depend on CONFIG_SAMPLE_BPF. We still don't add a
Kconfig option since that will add a dependency on llvm for allyesconfig
builds which may not be desirable.

Those who need to build the BPF samples can now just do:

make CONFIG_SAMPLE_BPF=y

or:

export CONFIG_SAMPLE_BPF=y
make


I don't like this 'simplification'.
make samples/bpf/
is much easier to type than capital letters.


diff --git a/samples/Makefile b/samples/Makefile
index 48001d7..3c77fc8 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -2,4 +2,4 @@

  obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \
   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
-  configfs/
+  configfs/ bpf/
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 88bc5a0..bc5b675 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -2,23 +2,23 @@
  obj- := dummy.o

  # List of programs to build
-hostprogs-y := test_verifier test_maps
-hostprogs-y += sock_example
-hostprogs-y += fds_example
-hostprogs-y += sockex1
-hostprogs-y += sockex2
-hostprogs-y += sockex3
-hostprogs-y += tracex1
-hostprogs-y += tracex2
-hostprogs-y += tracex3
-hostprogs-y += tracex4
-hostprogs-y += tracex5
-hostprogs-y += tracex6
-hostprogs-y += trace_output
-hostprogs-y += lathist
-hostprogs-y += offwaketime
-hostprogs-y += spintest
-hostprogs-y += map_perf_test
+hostprogs-$(CONFIG_SAMPLE_BPF) := test_verifier test_maps
+hostprogs-$(CONFIG_SAMPLE_BPF) += sock_example
+hostprogs-$(CONFIG_SAMPLE_BPF) += fds_example
+hostprogs-$(CONFIG_SAMPLE_BPF) += sockex1
+hostprogs-$(CONFIG_SAMPLE_BPF) += sockex2
+hostprogs-$(CONFIG_SAMPLE_BPF) += sockex3
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex1
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex2
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex3
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex4
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex5
+hostprogs-$(CONFIG_SAMPLE_BPF) += tracex6
+hostprogs-$(CONFIG_SAMPLE_BPF) += trace_output
+hostprogs-$(CONFIG_SAMPLE_BPF) += lathist
+hostprogs-$(CONFIG_SAMPLE_BPF) += offwaketime
+hostprogs-$(CONFIG_SAMPLE_BPF) += spintest
+hostprogs-$(CONFIG_SAMPLE_BPF) += map_perf_test

  test_verifier-objs := test_verifier.o libbpf.o
  test_maps-objs := test_maps.o libbpf.o
@@ -39,8 +39,8 @@ offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
  spintest-objs := bpf_load.o libbpf.o spintest_user.o
  map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o

-# Tell kbuild to always build the programs
-always := $(hostprogs-y)
+ifdef CONFIG_SAMPLE_BPF
+always := $(hostprogs-$(CONFIG_SAMPLE_BPF))
  always += sockex1_kern.o
  always += sockex2_kern.o
  always += sockex3_kern.o
@@ -56,6 +56,7 @@ always += lathist_kern.o
  always += offwaketime_kern.o
  always += spintest_kern.o
  always += map_perf_test_kern.o
+endif

  HOSTCFLAGS += -I$(objtree)/usr/include





Re: [PATCH 2/4] samples/bpf: Use llc in PATH, rather than a hardcoded value

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

While at it, fix some typos in the comment.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
---
  samples/bpf/Makefile | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 502c9fc..88bc5a0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -76,16 +76,13 @@ HOSTLOADLIBES_offwaketime += -lelf
  HOSTLOADLIBES_spintest += -lelf
  HOSTLOADLIBES_map_perf_test += -lelf -lrt

-# point this to your LLVM backend with bpf support
-LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
-
-# asm/sysreg.h inline assmbly used by it is incompatible with llvm.
-# But, ehere is not easy way to fix it, so just exclude it since it is
+# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
+# But, there is no easy way to fix it, so just exclude it since it is
  # useless for BPF samples.
  $(obj)/%.o: $(src)/%.c
	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+		-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=obj -o $@
	clang $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=asm -o $@.s
+		-O2 -emit-llvm -c $< -o -| llc -march=bpf -filetype=asm -o $@.s


that was a workaround from when clang/llvm didn't have bpf support.
Now that clang 3.7 and 3.8 have bpf built in, it makes sense to remove
the manual calls to llc completely.
Just use 'clang -target bpf -O2 -D... -c $< -o $@'


Re: [PATCH 1/4] samples/bpf: Fix build breakage with map_perf_test_user.c

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

Building BPF samples is failing with the below error:

samples/bpf/map_perf_test_user.c: In function ‘main’:
samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
initializer but incomplete type
   struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
  ^
Fix this by including the necessary header file.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
---
  samples/bpf/map_perf_test_user.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 95af56e..3147377 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "libbpf.h"
  #include "bpf_load.h"


It's failing this way on powerpc? Odd.
Such a hidden header dependency was always puzzling to me. Anyway:
Acked-by: Alexei Starovoitov <a...@kernel.org>

I'm assuming you want this set to go via 'net' tree, so please resubmit
with [PATCH net 1/4] subjects and cc netdev.

Reviewing your other patches...


Re: [PATCH 4/4] samples/bpf: Enable powerpc support

2016-03-31 Thread Alexei Starovoitov

On 3/31/16 4:25 AM, Naveen N. Rao wrote:

Add the necessary definitions for building bpf samples on ppc.

Since ppc doesn't store function return address on the stack, modify how
PT_REGS_RET() and PT_REGS_FP() work.

Also, introduce PT_REGS_IP() to access the instruction pointer. I have
fixed this to work with x86_64 and arm64, but not s390.

Cc: Alexei Starovoitov <a...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
---

...

+
+#ifdef __powerpc__
+#define BPF_KPROBE_READ_RET_IP(ip, ctx){ (ip) = (ctx)->link; }
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) BPF_KPROBE_READ_RET_IP(ip, ctx)
+#else
+#define BPF_KPROBE_READ_RET_IP(ip, ctx)
\
+   bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx))
+#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) 
\
+   bpf_probe_read(&(ip), sizeof(ip),   \
+   (void *)(PT_REGS_FP(ctx) + sizeof(ip)))


makes sense, but please use ({ }) gcc extension instead of {} and
open call to make sure that macro body is scoped.


Re: [PATCH] net: filter: make JITs zero A for SKF_AD_ALU_XOR_X

2016-01-05 Thread Alexei Starovoitov
On Tue, Jan 05, 2016 at 05:36:47PM +0100, Daniel Borkmann wrote:
> On 01/05/2016 04:23 PM, Rabin Vincent wrote:
> >The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
> >instructions since it XORs A with X while all the others replace A with
> >some loaded value.  All the BPF JITs fail to clear A if this is used as
> >the first instruction in a filter.  This was found using american fuzzy
> >lop.
> >
> >Add a helper to determine if A needs to be cleared given the first
> >instruction in a filter, and use this in the JITs.  Except for ARM, the
> >rest have only been compile-tested.
> >
> >Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum")
> >Signed-off-by: Rabin Vincent <ra...@rab.in>
> 
> Excellent catch, thanks a lot! The fix looks good to me and should
> go to -net tree.
> 
> Acked-by: Daniel Borkmann <dan...@iogearbox.net>

good catch indeed.
Classic bpf jits didn't have much love. Great to see this work.

Acked-by: Alexei Starovoitov <a...@kernel.org>


Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32

2015-02-16 Thread Alexei Starovoitov
On Mon, Feb 16, 2015 at 2:13 AM, Denis Kirjanov k...@linux-powerpc.org wrote:
 On 2/15/15, Daniel Borkmann dan...@iogearbox.net wrote:
 On 02/15/2015 07:06 PM, Denis Kirjanov wrote:
 This patch series enables BPF JIT on ppc32. There are relatively
 few changes in the code needed to make it work.

 All test_bpf tests passed both on 7447a and P2041-based machines.

 I'm just wondering, next to the feedback that has already been
 provided, would opening this up for ppc32 make it significantly
 more difficult in future to migrate from classic BPF JIT to eBPF
 JIT eventually (which is what we want long-term)? Being curious,
 is there any ongoing effort from ppc people?


 Well, I don't see significant challenges to enabling eBPF on ppc64 in
 the future. I'll start working on it after I get this merged.

sounds great. looking forward to it :)

Re: [PATCH 2/3] module: remove mod arg from module_free, rename module_memfree().

2015-01-07 Thread Alexei Starovoitov
On Wed, Jan 7, 2015 at 4:58 PM, Rusty Russell ru...@rustcorp.com.au wrote:
 --- a/kernel/bpf/core.c
 +++ b/kernel/bpf/core.c
  void bpf_jit_binary_free(struct bpf_binary_header *hdr)
  {
 -   module_free(NULL, hdr);
 +   module_memfree(hdr);
  }
...
 -void __weak module_free(struct module *mod, void *module_region)
 +void __weak module_memfree(void *module_region)
  {
 vfree(module_region);
  }

Looks obviously correct.
Acked-by: Alexei Starovoitov a...@kernel.org

Re: [PATCH net-next] PPC: bpf_jit_comp: Unify BPF_MOD | BPF_X and BPF_DIV | BPF_X

2014-11-18 Thread Alexei Starovoitov
On Mon, Nov 17, 2014 at 10:58 PM, Denis Kirjanov k...@linux-powerpc.org wrote:
 Hi Michael,

 This patch added no new functionality so I haven't put the test
 results (of course I ran the test suite to check the patch).

 The output:
 [  650.198958] test_bpf: Summary: 60 PASSED, 0 FAILED

Acked-by: Alexei Starovoitov a...@plumgrid.com

btw, please don't top post.

Re: [PATCH v2 net-next] PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction

2014-11-10 Thread Alexei Starovoitov
On Sun, Nov 9, 2014 at 9:59 PM, Denis Kirjanov k...@linux-powerpc.org wrote:
 Add BPF extension SKF_AD_HATYPE to ppc JIT to check
 the hw type of the interface

 Before:
 [   57.723666] test_bpf: #20 LD_HATYPE
 [   57.723675] BPF filter opcode 0020 (@0) unsupported
 [   57.724168] 48 48 PASS

 After:
 [  103.053184] test_bpf: #20 LD_HATYPE 7 6 PASS

 CC: Alexei Starovoitov <alexei.starovoi...@gmail.com>
 CC: Daniel Borkmann <dbork...@redhat.com>
 CC: Philippe Bergheaud <fe...@linux.vnet.ibm.com>
 Signed-off-by: Denis Kirjanov <k...@linux-powerpc.org>

 v2: address Alexei's comments
 ---
  arch/powerpc/net/bpf_jit_comp.c | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

 diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
 index d110e28..d3fa80d 100644
 --- a/arch/powerpc/net/bpf_jit_comp.c
 +++ b/arch/powerpc/net/bpf_jit_comp.c
 @@ -361,6 +361,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 protocol));
 break;
 case BPF_ANC | SKF_AD_IFINDEX:
 +   case BPF_ANC | SKF_AD_HATYPE:
 +   BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
 +   ifindex) != 4);
 +   BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
 +   type) != 2);
 			PPC_LD_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
 								dev));
 PPC_CMPDI(r_scratch1, 0);
 @@ -368,14 +373,18 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 				PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
 			} else {
 				/* Exit, returning 0; first pass hits here. */
 -				PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
 +				PPC_BCC_SHORT(COND_NE, ctx->idx * 4 + 12);
 PPC_LI(r_ret, 0);
 PPC_JMP(exit_addr);
 }
 -   BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
 - ifindex) != 4);
 -   PPC_LWZ_OFFS(r_A, r_scratch1,
 +   if (code == (BPF_ANC | SKF_AD_IFINDEX)) {
 +   PPC_LWZ_OFFS(r_A, r_scratch1,
  offsetof(struct net_device, ifindex));
 +   } else {
 +   PPC_LHZ_OFFS(r_A, r_scratch1,
 +offsetof(struct net_device, type));

formatting is a bit off here, but that's minor.
Acked-by: Alexei Starovoitov a...@plumgrid.com

Re: [PATCH net-next] PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction

2014-11-08 Thread Alexei Starovoitov
On Wed, Nov 5, 2014 at 10:02 PM, Denis Kirjanov k...@linux-powerpc.org wrote:
 Add BPF extension SKF_AD_HATYPE to ppc JIT to check
 the hw type of the interface

 JIT off:
 [   69.106783] test_bpf: #20 LD_HATYPE 48 48 PASS
 JIT on:
 [   64.721757] test_bpf: #20 LD_HATYPE 7 6 PASS

 CC: Alexei Starovoitov <alexei.starovoi...@gmail.com>
 CC: Daniel Borkmann <dbork...@redhat.com>
 CC: Philippe Bergheaud <fe...@linux.vnet.ibm.com>
 Signed-off-by: Denis Kirjanov <k...@linux-powerpc.org>
 ---
  arch/powerpc/net/bpf_jit_comp.c | 16 
  1 file changed, 16 insertions(+)

 diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
 index d110e28..8bf4fc2 100644
 --- a/arch/powerpc/net/bpf_jit_comp.c
 +++ b/arch/powerpc/net/bpf_jit_comp.c
 @@ -412,6 +412,22 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 PPC_ANDI(r_A, r_A, PKT_TYPE_MAX);
 PPC_SRWI(r_A, r_A, 5);
 break;
 +   case BPF_ANC | SKF_AD_HATYPE:
 +			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, type) != 2);
 +			PPC_LD_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
 +								dev));
 +   PPC_CMPDI(r_scratch1, 0);
 +			if (ctx->pc_ret0 != -1) {
 +				PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
 +			} else {
 +				/* Exit, returning 0; first pass hits here. */
 +				PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);

please use canonical formatting ctx->idx * 4 + 12

 +   PPC_LI(r_ret, 0);
 +   PPC_JMP(exit_addr);
 +   }
 +   PPC_LHZ_OFFS(r_A, r_scratch1,
 +offsetof(struct net_device, type));

the whole thing looks like a copy-paste from 'case ifindex'.
Would be nice to handle them together to reduce the
duplicated code, since only the last load is different.

Also in commit log please do _both_ runs with JIT on.
You should see a difference before/after applying this patch.

Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-11-03 Thread Alexei Starovoitov
On Mon, Nov 3, 2014 at 9:06 AM, David Miller da...@davemloft.net wrote:
 From: Denis Kirjanov k...@linux-powerpc.org
 Date: Thu, 30 Oct 2014 09:12:15 +0300

 Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load
 skb->pkt_type field.

 Before:
 [   88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS
 [   88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS

 After:
 [   80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS
 [   80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS

 CC: Alexei Starovoitov <alexei.starovoi...@gmail.com>
 CC: Michael Ellerman <m...@ellerman.id.au>
 Cc: Matt Evans <m...@ozlabs.org>
 Signed-off-by: Denis Kirjanov <k...@linux-powerpc.org>

 v2: Added test rusults

 So, can I apply this now?

I think this question is more towards ppc folks,
since both Daniel and myself said before that it looks ok.
Philippe just tested the previous version of this patch on ppc64le...
I'm guessing that Matt (original author of bpf jit for ppc) is not replying,
because he has no objections.
Either way the addition is tiny and contained, so can go in now.

Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-10-30 Thread Alexei Starovoitov
On Wed, Oct 29, 2014 at 11:12 PM, Denis Kirjanov k...@linux-powerpc.org wrote:
 Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load
 skb->pkt_type field.

 Before:
 [   88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS
 [   88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS

 After:
 [   80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS
 [   80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS

if you'd only quoted #12, it would all make sense ;)
but #11 test is not using PKTTYPE. So your patch shouldn't
make a difference. Are these numbers with JIT on and off?

Re: [PATCH] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-10-29 Thread Alexei Starovoitov
On Wed, Oct 29, 2014 at 2:21 AM, Denis Kirjanov k...@linux-powerpc.org wrote:
 Any feedback from PPC folks?

not a ppc guy, but looks reasonable to me.
What does lib/test_bpf say? E.g. the performance difference before/after
for the LD_PKTTYPE test...

Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq

2013-10-04 Thread Alexei Starovoitov
On Thu, Oct 3, 2013 at 10:16 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Thu, 2013-10-03 at 21:11 -0700, Alexei Starovoitov wrote:

 -static inline unsigned int sk_filter_len(const struct sk_filter *fp)
 +static inline unsigned int sk_filter_size(const struct sk_filter *fp,
 + unsigned int proglen)
  {
 -   return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
 +   return max(sizeof(*fp),
 +  offsetof(struct sk_filter, insns[proglen]));
  }

indeed that's cleaner.
Like this then:
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(unsigned int proglen)
 {
-   return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+   return max(sizeof(struct sk_filter),
+  offsetof(struct sk_filter, insns[proglen]));
 }

testing it... will send v4 shortly


[PATCH v3 net-next] fix unsafe set_memory_rw from softirq

2013-10-04 Thread Alexei Starovoitov
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]CPU0
[   56.786807]
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   Interrupt
[   56.805889] lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [8118f44c] 
vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, 
BIOS 3007 07/26/2012
[   56.882004]  821944c0 88080bbdb8c8 8175a145 
0007
[   56.895630]  88080bbd5f40 88080bbdb928 81755b14 
0001
[   56.909313]  88080001 8808 8101178f 
0001
[   56.923006] Call Trace:
[   56.929532]  [8175a145] dump_stack+0x55/0x76
[   56.936067]  [81755b14] print_usage_bug+0x1f7/0x208
[   56.942445]  [8101178f] ? save_stack_trace+0x2f/0x50
[   56.948932]  [810cc0a0] ? check_usage_backwards+0x150/0x150
[   56.955470]  [810ccb52] mark_lock+0x282/0x2c0
[   56.961945]  [810ccfed] __lock_acquire+0x45d/0x1d50
[   56.968474]  [810cce6e] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [81393bf5] ? cpumask_next_and+0x55/0x90
[   56.981942]  [810cef72] lock_acquire+0x92/0x1d0
[   56.988745]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [817628f1] _raw_spin_lock+0x41/0x50
[   57.002493]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [8118f52a] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [8118f44c] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [810436b0] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [810cfb8d] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [811a8330] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [811b59c3] ? free_object_rcu+0x93/0xa0
[   57.051720]  [81043d9f] set_memory_rw+0x2f/0x40
[   57.058727]  [8104e17c] bpf_jit_free+0x2c/0x40
[   57.065577]  [81642cba] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [811108e2] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [81057f17] __do_softirq+0xf7/0x3f0
[   57.085373]  [81058245] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/arm/net/bpf_jit_32.c   |1 +
 arch/powerpc/net/bpf_jit_comp.c |1 +
 arch/s390/net/bpf_jit_comp.c|4 +++-
 arch/sparc/net/bpf_jit_comp.c   |1 +
 arch/x86/net/bpf_jit_comp.c |   20 +++-
 include/linux/filter.h  |   11 +--
 net/core/filter.c   |   11 +++
 7 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f50d223..99b44e0 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..2345bdb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 7092392..a5df511 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
struct bpf_binary_header *header = (void *)addr;
 
	if (fp->bpf_func == sk_run_filter)
-		return;
+		goto free_filter;
	set_memory_rw(addr, header->pages);
module_free(NULL, header);
+free_filter:
+   kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 9c7be59..218b6b2 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..1396a0a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,23 @@ out:
return;
 }
 
+static void


[PATCH v4 net-next] fix unsafe set_memory_rw from softirq

2013-10-04 Thread Alexei Starovoitov
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]CPU0
[   56.786807]
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   Interrupt
[   56.805889] lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [8118f44c] 
vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, 
BIOS 3007 07/26/2012
[   56.882004]  821944c0 88080bbdb8c8 8175a145 
0007
[   56.895630]  88080bbd5f40 88080bbdb928 81755b14 
0001
[   56.909313]  88080001 8808 8101178f 
0001
[   56.923006] Call Trace:
[   56.929532]  [8175a145] dump_stack+0x55/0x76
[   56.936067]  [81755b14] print_usage_bug+0x1f7/0x208
[   56.942445]  [8101178f] ? save_stack_trace+0x2f/0x50
[   56.948932]  [810cc0a0] ? check_usage_backwards+0x150/0x150
[   56.955470]  [810ccb52] mark_lock+0x282/0x2c0
[   56.961945]  [810ccfed] __lock_acquire+0x45d/0x1d50
[   56.968474]  [810cce6e] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [81393bf5] ? cpumask_next_and+0x55/0x90
[   56.981942]  [810cef72] lock_acquire+0x92/0x1d0
[   56.988745]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [817628f1] _raw_spin_lock+0x41/0x50
[   57.002493]  [8118f52a] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [8118f52a] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [8118f44c] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [810436b0] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [810cfb8d] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [811a8330] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [811b59c3] ? free_object_rcu+0x93/0xa0
[   57.051720]  [81043d9f] set_memory_rw+0x2f/0x40
[   57.058727]  [8104e17c] bpf_jit_free+0x2c/0x40
[   57.065577]  [81642cba] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [811108e2] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [81057f17] __do_softirq+0xf7/0x3f0
[   57.085373]  [81058245] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 arch/arm/net/bpf_jit_32.c   |1 +
 arch/powerpc/net/bpf_jit_comp.c |1 +
 arch/s390/net/bpf_jit_comp.c|4 +++-
 arch/sparc/net/bpf_jit_comp.c   |1 +
 arch/x86/net/bpf_jit_comp.c |   18 +-
 include/linux/filter.h  |   15 +++
 include/net/sock.h  |6 ++
 net/core/filter.c   |8 
 8 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f50d223..99b44e0 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..2345bdb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 7092392..a5df511 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
struct bpf_binary_header *header = (void *)addr;
 
	if (fp->bpf_func == sk_run_filter)
-		return;
+		goto free_filter;
	set_memory_rw(addr, header->pages);
module_free(NULL, header);
+free_filter:
+   kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 9c7be59..218b6b2 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
	if (fp->bpf_func != sk_run_filter)
		module_free(NULL, fp->bpf_func);
+   kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..516593e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,21 @@ out