Re: [PATCH net-next,v4 01/12] flow_offload: add flow_rule and flow_match structures and use them (fwd)

2018-11-30 Thread Julia Lawall
Hello,

It looks like the kfree on line 1573 should be moved down a few lines.

julia

-- Forwarded message --
Date: Sat, 1 Dec 2018 11:11:55 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH net-next,
v4 01/12] flow_offload: add flow_rule and flow_match structures and use them

CC: kbuild-...@01.org
In-Reply-To: <20181129022231.2740-2-pa...@netfilter.org>
References: <20181129022231.2740-2-pa...@netfilter.org>
TO: Pablo Neira Ayuso 
CC: netdev@vger.kernel.org

Hi Pablo,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on next-20181130]
[cannot apply to v4.20-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pablo-Neira-Ayuso/add-flow_rule-infrastructure/20181130-204709
:: branch date: 14 hours ago
:: commit date: 14 hours ago

>> net/sched/cls_flower.c:1586:7-9: ERROR: reference preceded by free on line 
>> 1573

# 
https://github.com/0day-ci/linux/commit/012ba7afb48db31be34a84d5fe82c4ceb409fd2d
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 012ba7afb48db31be34a84d5fe82c4ceb409fd2d
vim +1586 net/sched/cls_flower.c

34738452 Jiri Pirko2018-07-23  1544
b95ec7eb Jiri Pirko2018-07-23  1545  static void 
*fl_tmplt_create(struct net *net, struct tcf_chain *chain,
b95ec7eb Jiri Pirko2018-07-23  1546  struct 
nlattr **tca,
b95ec7eb Jiri Pirko2018-07-23  1547  struct 
netlink_ext_ack *extack)
b95ec7eb Jiri Pirko2018-07-23  1548  {
b95ec7eb Jiri Pirko2018-07-23  1549 struct fl_flow_tmplt *tmplt;
b95ec7eb Jiri Pirko2018-07-23  1550 struct nlattr **tb;
b95ec7eb Jiri Pirko2018-07-23  1551 int err;
b95ec7eb Jiri Pirko2018-07-23  1552
b95ec7eb Jiri Pirko2018-07-23  1553 if (!tca[TCA_OPTIONS])
b95ec7eb Jiri Pirko2018-07-23  1554 return ERR_PTR(-EINVAL);
b95ec7eb Jiri Pirko2018-07-23  1555
b95ec7eb Jiri Pirko2018-07-23  1556 tb = kcalloc(TCA_FLOWER_MAX + 
1, sizeof(struct nlattr *), GFP_KERNEL);
b95ec7eb Jiri Pirko2018-07-23  1557 if (!tb)
b95ec7eb Jiri Pirko2018-07-23  1558 return 
ERR_PTR(-ENOBUFS);
b95ec7eb Jiri Pirko2018-07-23  1559 err = nla_parse_nested(tb, 
TCA_FLOWER_MAX, tca[TCA_OPTIONS],
b95ec7eb Jiri Pirko2018-07-23  1560
fl_policy, NULL);
b95ec7eb Jiri Pirko2018-07-23  1561 if (err)
b95ec7eb Jiri Pirko2018-07-23  1562 goto errout_tb;
b95ec7eb Jiri Pirko2018-07-23  1563
b95ec7eb Jiri Pirko2018-07-23  1564 tmplt = kzalloc(sizeof(*tmplt), 
GFP_KERNEL);
1cbc36a5 Dan Carpenter 2018-08-03  1565 if (!tmplt) {
1cbc36a5 Dan Carpenter 2018-08-03  1566 err = -ENOMEM;
b95ec7eb Jiri Pirko2018-07-23  1567 goto errout_tb;
1cbc36a5 Dan Carpenter 2018-08-03  1568 }
b95ec7eb Jiri Pirko2018-07-23  1569 tmplt->chain = chain;
b95ec7eb Jiri Pirko2018-07-23  1570 err = fl_set_key(net, tb, 
&tmplt->dummy_key, &tmplt->mask, extack);
b95ec7eb Jiri Pirko2018-07-23  1571 if (err)
b95ec7eb Jiri Pirko2018-07-23  1572 goto errout_tmplt;
b95ec7eb Jiri Pirko2018-07-23 @1573 kfree(tb);
b95ec7eb Jiri Pirko2018-07-23  1574
b95ec7eb Jiri Pirko2018-07-23  1575 
fl_init_dissector(&tmplt->dissector, &tmplt->mask);
b95ec7eb Jiri Pirko2018-07-23  1576
012ba7af Pablo Neira Ayuso 2018-11-29  1577 err = fl_hw_create_tmplt(chain, 
tmplt);
012ba7af Pablo Neira Ayuso 2018-11-29  1578 if (err)
012ba7af Pablo Neira Ayuso 2018-11-29  1579 goto errout_tmplt;
34738452 Jiri Pirko2018-07-23  1580
b95ec7eb Jiri Pirko2018-07-23  1581 return tmplt;
b95ec7eb Jiri Pirko2018-07-23  1582
b95ec7eb Jiri Pirko2018-07-23  1583  errout_tmplt:
b95ec7eb Jiri Pirko2018-07-23  1584 kfree(tmplt);
b95ec7eb Jiri Pirko2018-07-23  1585  errout_tb:
b95ec7eb Jiri Pirko2018-07-23 @1586 kfree(tb);
b95ec7eb Jiri Pirko2018-07-23  1587 return ERR_PTR(err);
b95ec7eb Jiri Pirko2018-07-23  1588  }
b95ec7eb Jiri Pirko2018-07-23  1589

:: The code at line 1586 was first introduced by commit
:: b95ec7eb3b4d2f158dd15c912cf670b546f09571 net: sched: cls_flower: 
implement chain templates

:: TO: Jiri Pirko 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH bpf-next v2 0/2] sample: xdp1 improvements

2018-11-30 Thread Alexei Starovoitov
On Sat, Dec 01, 2018 at 01:23:04AM +0100, Matteo Croce wrote:
> Small improvements to improve the readability and easiness
> to use of the xdp1 sample.

Applied to bpf-next.

I think that sample code could be more useful if it's wrapped with bash
script like selftests/test_xdp* tests do.
At that point it can move to selftests to get 0bot coverage.
Would you be interested in doing that?



VIEW THE ATTACHED FILE YOUR CERTIFIED BANK DRAFT FOR CONFIRMATION

2018-11-30 Thread REV,DR.JOHN H.WILLIAMS
DEAR BENEFICIARY,


I AM REV,DR.JOHN H.WILLIAMS,THE AUDITOR GENERAL OF THE FEDERAL
REPUBLIC OF NIGERIA AND I WISH TO KNOW IF YOU HAVE RECEIVED THE SCAN
COPY OF YOUR CERTIFIED BANK DRAFT IN THE AMOUNT OF $10,5MUSD WHICH I
SENT TO YOUR EMAIL ADDRESS YESTERDAY AND THIS APPROVED DRAFT INCLUDES
YOUR FUND INTEREST AMOUNT FOR THIS YEARS AWAITING,SO CALL ME NOW TO
CONFIRM IT OKAY:-+234-8132902282

NOTE THAT WE ARE PAYING YOU THROUGH OUR OIL RESERVE BANK ACCOUNT WITH
THE JP MORGAN CHASE BANK NEW YORK,USA AND THEY WILL CREDIT YOUR FUND
ONCE YOU HAVE THE ORIGINAL CERTIFIED BANK DRAFT HARD COPY AND DEPOSIT
IT TO ANY BANK OF YOUR CHOICE IN THE WORLD.

YOU ARE HEREBY ADVICE TO TAKE THE SCAN COPY OF YOUR CERTIFIED BANK
DRAFT TO YOUR LOCAL BANK AND CONFIRM IT BEFORE WE CAN SEND YOU THE
ORIGINAL HARD COPY OR YOU CAN COME DOWN FOR THE COLLECTION HERE OKAY.

I AWAIT TO HEAR FROM YOU SOONEST.

YOURS,

REV,DR.JOHN H.WILLIAMS


Re: [PATCH v2 bpf-next 0/4] bpf: Improve verifier test coverage on sparc64 et al.

2018-11-30 Thread Alexei Starovoitov
On Fri, Nov 30, 2018 at 09:07:54PM -0800, David Miller wrote:
> 
> On sparc64 a ton of test cases in test_verifier.c fail because
> the memory accesses in the test case are unaligned (or cannot
> be proven to be aligned by the verifier).
> 
> Perhaps we can eventually try to (carefully) modify each test case
> which has this problem to not use unaligned accesses but:
> 
> 1) That is delicate work.
> 
> 2) The changes might not fully respect the original
>intention of the testcase.
> 
> 3) In some cases, such a transformation might not even
>be feasible at all.
> 
> So add an "any alignment" flag to tell the verifier to forcefully
> disable it's alignment checks completely.
> 
> test_verifier.c is then annotated to use this flag when necessary.
> 
> The presence of the flag in each test case is good documentation to
> anyone who wants to actually tackle the job of eliminating the
> unaligned memory accesses in the test cases.
> 
> I've also seen several weird things in test cases, like trying to
> access __skb->mark in a packet buffer.
> 
> This gets rid of 104 test_verifier.c failures on sparc64.
> 
> Changes since v1:
> 
> 1) Explain the new BPF_PROG_LOAD flag in easier to understand terms.
>Suggested by Alexei.
> 
> 2) Make bpf_verify_program() just take a __u32 prog_flags instead of
>just accumulating boolean arguments over and over.  Also suggested
>by Alexei.
> 
> Changes since RFC:
> 
> 1) Only the admin can allow the relaxation of alignment restrictions
>on inefficient unaligned access architectures.
> 
> 2) Use F_NEEDS_EFFICIENT_UNALIGNED_ACCESS instead of making a new
>flag.
> 
> 3) Annotate in the output, when we have a test case that the verifier
>accepted but we did not try to execute because we are on an
>inefficient unaligned access platform.  Maybe with some arch
>machinery we can avoid this in the future.
> 
> Signed-off-by: David S. Miller 

The patch 2 didn't apply as-is to bpf-next, since I applied your earlier fix
"bpf: Fix verifier log string check for bad alignment" to bpf tree.
So I applied that fix to bpf-next as well and then pushed your series on top.
I think git should do the right thing when bpf and bpf-next trees converge.
But... let me know if I should drop that fix from bpf tree...
just to be on a safe side.



Re: [PATCH v2 bpf-next 0/4] bpf: Improve verifier test coverage on sparc64 et al.

2018-11-30 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 30 Nov 2018 21:55:02 -0800

> The patch 2 didn't apply as-is to bpf-next, since I applied your
> earlier fix "bpf: Fix verifier log string check for bad alignment"
> to bpf tree.  So I applied that fix to bpf-next as well and then
> pushed your series on top.

That seems best, thanks for resolving this.


Re: [PATCH bpf-next 1/4] bpf: Add BPF_F_ANY_ALIGNMENT.

2018-11-30 Thread Alexei Starovoitov
On Fri, Nov 30, 2018 at 08:44:20PM -0800, David Miller wrote:
> From: Alexei Starovoitov 
> Date: Fri, 30 Nov 2018 13:58:20 -0800
> 
> > On Thu, Nov 29, 2018 at 07:32:41PM -0800, David Miller wrote:
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 426b5c8..c9647ea 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -232,6 +232,16 @@ enum bpf_attach_type {
> >>   */
> >>  #define BPF_F_STRICT_ALIGNMENT(1U << 0)
> >>  
> >> +/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
> >> + * verifier will allow any alignment whatsoever.  This bypasses
> >> + * what CONFIG_EFFICIENT_UNALIGNED_ACCESS would cause it to do.
> > 
> > I think majority of user space folks who read uapi/bpf.h have no idea
> > what that kernel config does.
> > Could you reword the comment here to say that this flag is only
> > effective on architectures and like sparc and mips that don't
> > have efficient unaligned access and ignored on x86/arm64 ?
> 
> I just want to point out in passing that your feeback applies also to
> the comment above BPF_F_STRICT_ALIGNMENT, which I used as a model for
> my comment.

Good point. Missed that earlier.
NET_IP_ALIGN is even more cryptic and it's not the same as 
HAVE_EFFICIENT_UNALIGNED_ACCESS.
Example: s390
We need to reword it.



[PATCH v2 bpf-next 4/4] bpf: Apply F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to more ACCEPT test cases.

2018-11-30 Thread David Miller


If a testcase has alignment problems but is expected to be ACCEPT,
verify it using F_NEEDS_EFFICIENT_UNALIGNED_ACCESS too.

Maybe in the future if we add some architecture specific code to elide
the unaligned memory access warnings during the test, we can execute
these as well.

Signed-off-by: David S. Miller 
---
 tools/testing/selftests/bpf/test_verifier.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 5d97fb8..beec10b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3886,6 +3886,7 @@ static struct bpf_test tests[] = {
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test21 (x += pkt_ptr, 2)",
@@ -3911,6 +3912,7 @@ static struct bpf_test tests[] = {
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test22 (x += pkt_ptr, 3)",
@@ -3941,6 +3943,7 @@ static struct bpf_test tests[] = {
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test23 (x += pkt_ptr, 4)",
@@ -3993,6 +3996,7 @@ static struct bpf_test tests[] = {
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test25 (marking on <, good access)",
@@ -7675,6 +7679,7 @@ static struct bpf_test tests[] = {
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.retval = 0 /* csum_diff of 64-byte packet */,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"helper access to variable memory: size = 0 not allowed on NULL 
(!ARG_PTR_TO_MEM_OR_NULL)",
@@ -9637,6 +9642,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_data' > pkt_end, bad access 1",
@@ -9808,6 +9814,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_end < pkt_data', bad access 1",
@@ -9920,6 +9927,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_end >= pkt_data', bad access 1",
@@ -9977,6 +9985,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_data' <= pkt_end, bad access 1",
@@ -10089,6 +10098,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_meta' > pkt_data, bad access 1",
@@ -10260,6 +10270,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_data < pkt_meta', bad access 1",
@@ -10372,6 +10383,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_data >= pkt_meta', bad access 1",
@@ -10429,6 +10441,7 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_meta' <= pkt_data, bad access 1",
@@ -12348,6 +12361,7 @@ static struct bpf_test tests[] = {
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
.retval = 1,
+   .flags = F_NEEDS_EFFICI

[PATCH v2 bpf-next 3/4] bpf: Make more use of 'any' alignment in test_verifier.c

2018-11-30 Thread David Miller


Use F_NEEDS_EFFICIENT_UNALIGNED_ACCESS in more tests where the
expected result is REJECT.

Signed-off-by: David S. Miller 
---
 tools/testing/selftests/bpf/test_verifier.c | 44 +
 1 file changed, 44 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 428a84d..5d97fb8 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1823,6 +1823,7 @@ static struct bpf_test tests[] = {
.errstr = "invalid bpf_context access",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SK_MSG,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet read for SK_MSG",
@@ -2187,6 +2188,8 @@ static struct bpf_test tests[] = {
},
.errstr = "invalid bpf_context access",
.result = REJECT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"check cb access: half, wrong type",
@@ -3249,6 +3252,7 @@ static struct bpf_test tests[] = {
.result = REJECT,
.errstr = "R0 invalid mem access 'inv'",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"raw_stack: skb_load_bytes, spilled regs corruption 2",
@@ -3279,6 +3283,7 @@ static struct bpf_test tests[] = {
.result = REJECT,
.errstr = "R3 invalid mem access 'inv'",
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"raw_stack: skb_load_bytes, spilled regs + data",
@@ -3778,6 +3783,7 @@ static struct bpf_test tests[] = {
.errstr = "R2 invalid mem access 'inv'",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test16 (arith on data_end)",
@@ -3961,6 +3967,7 @@ static struct bpf_test tests[] = {
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "invalid access to packet, off=0 size=8, 
R5(id=1,off=0,r=0)",
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"direct packet access: test24 (x += pkt_ptr, 5)",
@@ -5117,6 +5124,7 @@ static struct bpf_test tests[] = {
.result = REJECT,
.errstr = "invalid access to map value, value_size=64 off=-2 
size=4",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"invalid cgroup storage access 5",
@@ -5233,6 +5241,7 @@ static struct bpf_test tests[] = {
.result = REJECT,
.errstr = "invalid access to map value, value_size=64 off=-2 
size=4",
.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"invalid per-cpu cgroup storage access 5",
@@ -7149,6 +7158,7 @@ static struct bpf_test tests[] = {
.errstr = "invalid mem access 'inv'",
.result = REJECT,
.result_unpriv = REJECT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"map element value illegal alu op, 5",
@@ -7171,6 +7181,7 @@ static struct bpf_test tests[] = {
.fixup_map_hash_48b = { 3 },
.errstr = "R0 invalid mem access 'inv'",
.result = REJECT,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"map element value is preserved across register spilling",
@@ -9663,6 +9674,7 @@ static struct bpf_test tests[] = {
.errstr = "R1 offset is outside of the packet",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_end > pkt_data', good access",
@@ -9701,6 +9713,7 @@ static struct bpf_test tests[] = {
.errstr = "R1 offset is outside of the packet",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
{
"XDP pkt read, pkt_end > pkt_data', bad access 2",
@@ -9719,6 +9732,7 @@ static struct bpf_test tests[] = {
.errstr = "R1 offset is outside of the packet",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_XDP,
+   .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},

[PATCH v2 bpf-next 2/4] bpf: Adjust F_NEEDS_EFFICIENT_UNALIGNED_ACCESS handling in test_verifier.c

2018-11-30 Thread David Miller


Make it set the flag argument to bpf_verify_program() which will relax
the alignment restrictions.

Now all such test cases will go properly through the verifier even on
inefficient unaligned access architectures.

On inefficient unaligned access architectures do not try to run such
programs, instead mark the test case as passing but annotate the
result similarly to how it is done now in the presence of this flag.

So, we get complete full coverage for all REJECT test cases, and at
least verifier level coverage for ACCEPT test cases.

Signed-off-by: David S. Miller 
---
 tools/testing/selftests/bpf/test_verifier.c | 42 -
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index 4224133..428a84d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -14200,13 +14200,14 @@ static int set_admin(bool admin)
 static void do_test_single(struct bpf_test *test, bool unpriv,
   int *passes, int *errors)
 {
-   int fd_prog, expected_ret, reject_from_alignment;
+   int fd_prog, expected_ret, alignment_prevented_execution;
int prog_len, prog_type = test->prog_type;
struct bpf_insn *prog = test->insns;
int map_fds[MAX_NR_MAPS];
const char *expected_err;
uint32_t expected_val;
uint32_t retval;
+   __u32 pflags;
int i, err;
 
for (i = 0; i < MAX_NR_MAPS; i++)
@@ -14217,9 +14218,12 @@ static void do_test_single(struct bpf_test *test, bool 
unpriv,
do_test_fixup(test, prog_type, prog, map_fds);
prog_len = probe_filter_length(prog);
 
-   fd_prog = bpf_verify_program(prog_type, prog, prog_len,
-test->flags & F_LOAD_WITH_STRICT_ALIGNMENT 
?
-BPF_F_STRICT_ALIGNMENT : 0,
+   pflags = 0;
+   if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
+   pflags |= BPF_F_STRICT_ALIGNMENT;
+   if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)
+   pflags |= BPF_F_ANY_ALIGNMENT;
+   fd_prog = bpf_verify_program(prog_type, prog, prog_len, pflags,
 "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1);
 
expected_ret = unpriv && test->result_unpriv != UNDEF ?
@@ -14229,28 +14233,27 @@ static void do_test_single(struct bpf_test *test, 
bool unpriv,
expected_val = unpriv && test->retval_unpriv ?
   test->retval_unpriv : test->retval;
 
-   reject_from_alignment = fd_prog < 0 &&
-   (test->flags & 
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS) &&
-   strstr(bpf_vlog, "misaligned");
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-   if (reject_from_alignment) {
-   printf("FAIL\nFailed due to alignment despite having efficient 
unaligned access: '%s'!\n",
-  strerror(errno));
-   goto fail_log;
-   }
-#endif
+   alignment_prevented_execution = 0;
+
if (expected_ret == ACCEPT) {
-   if (fd_prog < 0 && !reject_from_alignment) {
+   if (fd_prog < 0) {
printf("FAIL\nFailed to load prog '%s'!\n",
   strerror(errno));
goto fail_log;
}
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+   if (fd_prog >= 0 &&
+   (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)) {
+   alignment_prevented_execution = 1;
+   goto test_ok;
+   }
+#endif
} else {
if (fd_prog >= 0) {
printf("FAIL\nUnexpected success to load!\n");
goto fail_log;
}
-   if (!strstr(bpf_vlog, expected_err) && !reject_from_alignment) {
+   if (!strstr(bpf_vlog, expected_err)) {
printf("FAIL\nUnexpected error message!\n\tEXP: 
%s\n\tRES: %s\n",
  expected_err, bpf_vlog);
goto fail_log;
@@ -14278,9 +14281,12 @@ static void do_test_single(struct bpf_test *test, bool 
unpriv,
goto fail_log;
}
}
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+test_ok:
+#endif
(*passes)++;
-   printf("OK%s\n", reject_from_alignment ?
-  " (NOTE: reject due to unknown alignment)" : "");
+   printf("OK%s\n", alignment_prevented_execution ?
+  " (NOTE: not executed due to unknown alignment)" : "");
 close_fds:
close(fd_prog);
for (i = 0; i < MAX_NR_MAPS; i++)
-- 
2.1.2.532.g19b5d50



[PATCH v2 bpf-next 0/4] bpf: Improve verifier test coverage on sparc64 et al.

2018-11-30 Thread David Miller


On sparc64 a ton of test cases in test_verifier.c fail because
the memory accesses in the test case are unaligned (or cannot
be proven to be aligned by the verifier).

Perhaps we can eventually try to (carefully) modify each test case
which has this problem to not use unaligned accesses but:

1) That is delicate work.

2) The changes might not fully respect the original
   intention of the testcase.

3) In some cases, such a transformation might not even
   be feasible at all.

So add an "any alignment" flag to tell the verifier to forcefully
disable it's alignment checks completely.

test_verifier.c is then annotated to use this flag when necessary.

The presence of the flag in each test case is good documentation to
anyone who wants to actually tackle the job of eliminating the
unaligned memory accesses in the test cases.

I've also seen several weird things in test cases, like trying to
access __skb->mark in a packet buffer.

This gets rid of 104 test_verifier.c failures on sparc64.

Changes since v1:

1) Explain the new BPF_PROG_LOAD flag in easier to understand terms.
   Suggested by Alexei.

2) Make bpf_verify_program() just take a __u32 prog_flags instead of
   just accumulating boolean arguments over and over.  Also suggested
   by Alexei.

Changes since RFC:

1) Only the admin can allow the relaxation of alignment restrictions
   on inefficient unaligned access architectures.

2) Use F_NEEDS_EFFICIENT_UNALIGNED_ACCESS instead of making a new
   flag.

3) Annotate in the output, when we have a test case that the verifier
   accepted but we did not try to execute because we are on an
   inefficient unaligned access platform.  Maybe with some arch
   machinery we can avoid this in the future.

Signed-off-by: David S. Miller 


[PATCH v2 bpf-next 1/4] bpf: Add BPF_F_ANY_ALIGNMENT.

2018-11-30 Thread David Miller


Often we want to write tests cases that check things like bad context
offset accesses.  And one way to do this is to use an odd offset on,
for example, a 32-bit load.

This unfortunately triggers the alignment checks first on platforms
that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS.  So the test
case see the alignment failure rather than what it was testing for.

It is often not completely possible to respect the original intention
of the test, or even test the same exact thing, while solving the
alignment issue.

Another option could have been to check the alignment after the
context and other validations are performed by the verifier, but
that is a non-trivial change to the verifier.

Signed-off-by: David S. Miller 
---
 include/uapi/linux/bpf.h| 14 ++
 kernel/bpf/syscall.c|  7 ++-
 kernel/bpf/verifier.c   |  2 ++
 tools/include/uapi/linux/bpf.h  | 14 ++
 tools/lib/bpf/bpf.c |  8 
 tools/lib/bpf/bpf.h |  2 +-
 tools/testing/selftests/bpf/test_align.c|  4 ++--
 tools/testing/selftests/bpf/test_verifier.c |  3 ++-
 8 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 426b5c8..6620c37 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -232,6 +232,20 @@ enum bpf_attach_type {
  */
 #define BPF_F_STRICT_ALIGNMENT (1U << 0)
 
+/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
+ * verifier will allow any alignment whatsoever.  On platforms
+ * with strict alignment requirements for loads ands stores (such
+ * as sparc and mips) the verifier validates that all loads and
+ * stores provably follow this requirement.  This flag turns that
+ * checking and enforcement off.
+ *
+ * It is mostly used for testing when we want to validate the
+ * context and memory access aspects of the verifier, but because
+ * of an unaligned access the alignment check would trigger before
+ * the one we are interested in.
+ */
+#define BPF_F_ANY_ALIGNMENT(1U << 1)
+
 /* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
 #define BPF_PSEUDO_MAP_FD  1
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cf5040f..cae65bb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1450,9 +1450,14 @@ static int bpf_prog_load(union bpf_attr *attr)
if (CHECK_ATTR(BPF_PROG_LOAD))
return -EINVAL;
 
-   if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
+   if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT))
return -EINVAL;
 
+   if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
+   (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* copy eBPF program license from user space */
if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
  sizeof(license) - 1) < 0)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6dd4195..2fefeae 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6350,6 +6350,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
+   if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
+   env->strict_alignment = false;
 
ret = replace_map_fd_with_map_ptr(env);
if (ret < 0)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 426b5c8..6620c37 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -232,6 +232,20 @@ enum bpf_attach_type {
  */
 #define BPF_F_STRICT_ALIGNMENT (1U << 0)
 
+/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
+ * verifier will allow any alignment whatsoever.  On platforms
+ * with strict alignment requirements for loads ands stores (such
+ * as sparc and mips) the verifier validates that all loads and
+ * stores provably follow this requirement.  This flag turns that
+ * checking and enforcement off.
+ *
+ * It is mostly used for testing when we want to validate the
+ * context and memory access aspects of the verifier, but because
+ * of an unaligned access the alignment check would trigger before
+ * the one we are interested in.
+ */
+#define BPF_F_ANY_ALIGNMENT(1U << 1)
+
 /* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
 #define BPF_PSEUDO_MAP_FD  1
 
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 03f9bcc..b9b8c5a 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -231,9 +231,9 @@ int bpf_load_program(enum bpf_prog_type type, const struct 
bpf_insn *insns,
 }
 
 int bpf_verify_program(enum bpf_prog_type ty

[PATCH] net/core: tidy up an error message

2018-11-30 Thread Qian Cai
netif_napi_add() could report an error like this below due to it allows
to pass a format string for wildcarding before calling
dev_get_valid_name(),

"netif_napi_add() called with weight 256 on device eth%d"

For example, hns_enet_drv module does this.

hns_nic_try_get_ae
  hns_nic_init_ring_data
netif_napi_add
  register_netdev
dev_get_valid_name

Hence, make it a bit more human-readable.

Signed-off-by: Qian Cai 
---
 net/core/dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ddc551f24ba2..bbd7cdbbbebd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6205,7 +6205,8 @@ void netif_napi_add(struct net_device *dev, struct 
napi_struct *napi,
napi->poll = poll;
if (weight > NAPI_POLL_WEIGHT)
pr_err_once("netif_napi_add() called with weight %d on device 
%s\n",
-   weight, dev->name);
+   weight,
+   !strchr(dev->name, '%') ? dev->name : "unknown");
napi->weight = weight;
list_add(&napi->dev_list, &dev->napi_list);
napi->dev = dev;
-- 
2.17.2 (Apple Git-113)



Re: [PATCH bpf-next 1/4] bpf: Add BPF_F_ANY_ALIGNMENT.

2018-11-30 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 30 Nov 2018 13:58:20 -0800

> On Thu, Nov 29, 2018 at 07:32:41PM -0800, David Miller wrote:
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 426b5c8..c9647ea 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -232,6 +232,16 @@ enum bpf_attach_type {
>>   */
>>  #define BPF_F_STRICT_ALIGNMENT  (1U << 0)
>>  
>> +/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
>> + * verifier will allow any alignment whatsoever.  This bypasses
>> + * what CONFIG_EFFICIENT_UNALIGNED_ACCESS would cause it to do.
> 
> I think majority of user space folks who read uapi/bpf.h have no idea
> what that kernel config does.
> Could you reword the comment here to say that this flag is only
> effective on architectures and like sparc and mips that don't
> have efficient unaligned access and ignored on x86/arm64 ?

I just want to point out in passing that your feeback applies also to
the comment above BPF_F_STRICT_ALIGNMENT, which I used as a model for
my comment.


Re: consistency for statistics with XDP mode

2018-11-30 Thread Jakub Kicinski
On Fri, 30 Nov 2018 13:35:53 -0700, David Ahern wrote:
> On 11/30/18 1:30 PM, Michael S. Tsirkin wrote:
>  I would like to see basic packets, bytes, and dropped counters
>  tracked
>  for Rx and Tx via the standard netdev counters for all devices.   
> >>
> >> The problem of reporting XDP_DROP in the netedev drop counter is that
> >> they don't fit this counter description : "no space in linux buffers"
> >> and it will be hard for the user to determine whether these drops are
> >> coming from XDP or because no buffer is available, which will make it
> >> impossible to estimate packet rate performance without looking at
> >> ethtool stats.
> >> And reporting XDP_DROP in the netdev rx packets counter is somehow
> >> misleading.. since those packets never made it out of this driver.. 
> >>
> >>
> >> And reporting XDP_DROP in the netdev rx packets counter is somehow
> >> misleading.. since those packets never made it out of this driver..  
> > 
> > I think I agree. XDP needs minimal overhead - if user wants to do
> > counters then user can via maps. And in a sense XDP dropping packet
> > is much like e.g. TCP dropping packet - it is not counted
> > against the driver since it's not driver's fault.
> >   
> 
> XDP dropping a packet is completely different.
> 
> stats are important. packets disappearing with no counters -- standard
> counters visible by standard tools -- is a user nightmare. If the
> agreement is for XDP drops to be in driver level (e.g., xdp_drop) that
> is fine since it is still retrievable by ethtool -S (existing APIs and
> existing tools).

I don't think that's completely fair.  Disappearing packets are a
nightmare, but if the user installed a program which silently drops
packets without incrementing any counter it's their own fault.  If
cls_bpf returns STOLEN or TRAP, I don't think that's gonna get counted
anywhere.

I don't think DPDK drivers maintain "just in case" statistics, either..


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 08:39:01PM +, Abuse wrote:
> On Friday, 30 November 2018 20:35:07 GMT David Miller wrote:
> > From: Jens Axboe 
> > Date: Fri, 30 Nov 2018 13:12:26 -0700
> > 
> > > On 11/30/18 12:56 PM, Davidlohr Bueso wrote:
> > >> On Fri, 30 Nov 2018, Kees Cook wrote:
> > >> 
> > >>> On Fri, Nov 30, 2018 at 11:27 AM Jarkko Sakkinen
> > >>>  wrote:
> > 
> >  In order to comply with the CoC, replace  with a hug.
> > >> 
> > >> I hope this is some kind of joke. How would anyone get offended by 
> > >> reading
> > >> technical comments? This is all beyond me...
> > > 
> > > Agree, this is insanity.
> > 
> > And even worse it is censorship.
> > 
> 
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> Fuck
> 
> I assume I will now be barred.
> 
> 

Thank you for taking the opportunity to practice free speech in the
always so welcoming and inclusive on-line world :-) Now I'll just have
to find my notebook and write this prose down so that I'll never forget
it. Thanks again.

/Jarkko


RE: [Intel-wired-lan] [PATCH net-next] e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait

2018-11-30 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of YueHaibing
> Sent: Monday, November 19, 2018 4:48 AM
> To: da...@davemloft.net; Kirsher, Jeffrey T 
> Cc: netdev@vger.kernel.org; YueHaibing ; intel-
> wired-...@lists.osuosl.org; linux-ker...@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net-next] e100: Fix passing zero to
> 'PTR_ERR' warning in e100_load_ucode_wait
> 
> Fix a static code checker warning:
> drivers/net/ethernet/intel/e100.c:1349
>  e100_load_ucode_wait() warn: passing zero to 'PTR_ERR'
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/net/ethernet/intel/e100.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown 


Re: [PATCH net-next v3 01/13] virtio: add packed ring types and macros

2018-11-30 Thread Tiwei Bie
On Fri, Nov 30, 2018 at 11:46:57AM -0500, Michael S. Tsirkin wrote:
> On Sat, Dec 01, 2018 at 12:24:16AM +0800, Tiwei Bie wrote:
> > On Fri, Nov 30, 2018 at 10:53:07AM -0500, Michael S. Tsirkin wrote:
> > > On Fri, Nov 30, 2018 at 11:37:37PM +0800, Tiwei Bie wrote:
> > > > On Fri, Nov 30, 2018 at 08:52:42AM -0500, Michael S. Tsirkin wrote:
> > > > > On Fri, Nov 30, 2018 at 02:01:06PM +0100, Maxime Coquelin wrote:
> > > > > > On 11/30/18 1:47 PM, Michael S. Tsirkin wrote:
> > > > > > > On Fri, Nov 30, 2018 at 05:53:40PM +0800, Tiwei Bie wrote:
> > > > > > > > On Fri, Nov 30, 2018 at 04:10:55PM +0800, Jason Wang wrote:
> > > > > > > > > 
> > > > > > > > > On 2018/11/21 下午6:03, Tiwei Bie wrote:
> > > > > > > > > > Add types and macros for packed ring.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Tiwei Bie 
> > > > > > > > > > ---
> > > > > > > > > >include/uapi/linux/virtio_config.h |  3 +++
> > > > > > > > > >include/uapi/linux/virtio_ring.h   | 52 
> > > > > > > > > > ++
> > > > > > > > > >2 files changed, 55 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/include/uapi/linux/virtio_config.h 
> > > > > > > > > > b/include/uapi/linux/virtio_config.h
> > > > > > > > > > index 449132c76b1c..1196e1c1d4f6 100644
> > > > > > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > > > > > @@ -75,6 +75,9 @@
> > > > > > > > > > */
> > > > > > > > > >#define VIRTIO_F_IOMMU_PLATFORM  33
> > > > > > > > > > +/* This feature indicates support for the packed virtqueue 
> > > > > > > > > > layout. */
> > > > > > > > > > +#define VIRTIO_F_RING_PACKED   34
> > > > > > > > > > +
> > > > > > > > > >/*
> > > > > > > > > > * Does the device support Single Root I/O 
> > > > > > > > > > Virtualization?
> > > > > > > > > > */
> > > > > > > > > > diff --git a/include/uapi/linux/virtio_ring.h 
> > > > > > > > > > b/include/uapi/linux/virtio_ring.h
> > > > > > > > > > index 6d5d5faa989b..2414f8af26b3 100644
> > > > > > > > > > --- a/include/uapi/linux/virtio_ring.h
> > > > > > > > > > +++ b/include/uapi/linux/virtio_ring.h
> > > > > > > > > > @@ -44,6 +44,13 @@
> > > > > > > > > >/* This means the buffer contains a list of buffer 
> > > > > > > > > > descriptors. */
> > > > > > > > > >#define VRING_DESC_F_INDIRECT4
> > > > > > > > > > +/*
> > > > > > > > > > + * Mark a descriptor as available or used in packed ring.
> > > > > > > > > > + * Notice: they are defined as shifts instead of shifted 
> > > > > > > > > > values.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This looks inconsistent to previous flags, any reason for 
> > > > > > > > > using shifts?
> > > > > > > > 
> > > > > > > > Yeah, it was suggested to use shifts, as _F_ should be a bit
> > > > > > > > number, not a shifted value:
> > > > > > > > 
> > > > > > > > https://patchwork.ozlabs.org/patch/942296/#1989390
> > > > > > > 
> > > > > > > But let's add a _SPLIT_ variant that uses shifts consistently.
> > > > > > 
> > > > > > Maybe we could avoid adding SPLIT and PACKED, but define as follow:
> > > > > > 
> > > > > > #define VRING_DESC_F_INDIRECT_SHIFT 2
> > > > > > #define VRING_DESC_F_INDIRECT (1ull <<  VRING_DESC_F_INDIRECT_SHIFT)
> > > > > > 
> > > > > > #define VRING_DESC_F_AVAIL_SHIFT 7
> > > > > > #define VRING_DESC_F_AVAIL (1ull << VRING_DESC_F_AVAIL_SHIFT)
> > > > > > 
> > > > > > I think it would be better for consistency.
> > > > > 
> > > > > I don't think that will help. the problem is that
> > > > > most of the existing virtio code consistently uses _F_ as shifts.
> > > > > So we just need to do something about these 5 being inconsistent:
> > > > > 
> > > > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_NEXT  1
> > > > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_WRITE 2
> > > > > include/uapi/linux/virtio_ring.h:#define VRING_DESC_F_INDIRECT  4
> > > > > include/uapi/linux/virtio_ring.h:#define VRING_USED_F_NO_NOTIFY 1
> > > > > include/uapi/linux/virtio_ring.h:#define VRING_AVAIL_F_NO_INTERRUPT   
> > > > >   1
> > > > > 
> > > > > I do not want all of virtio to become verbose with _SHIFT, ergo
> > > > > we need to change the above 5 to have names which are with _F_ and
> > > > > have the bit number.
> > > > 
> > > > How about something like this:
> > > > 
> > > > #define VRING_COMM_DESC_F_NEXT  0
> > > > #define VRING_COMM_DESC_F_WRITE 1
> > > > #define VRING_COMM_DESC_F_INDIRECT  2
> > > > 
> > > > #define VRING_SPLIT_USED_F_NO_NOTIFY0
> > > > #define VRING_SPLIT_AVAIL_F_NO_INTERRUPT0
> > > > 
> > > > or
> > > > 
> > > > #define VRING_SPLIT_DESC_F_NEXT 0
> > > > #define VRING_SPLIT_DESC_F_WRITE1
> > > > #define VRING_SPLIT_DESC_F_INDIRECT 2
> > > > 
> > > > #define VRING_SPLIT_USED_F_NO_NOTIFY0
> > > > #def

Re: [PATCHv3 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Alexei Starovoitov
On Fri, Nov 30, 2018 at 03:32:20PM -0800, Joe Stringer wrote:
> David Ahern and Nicolas Dichtel report that the handling of the netns id
> 0 is incorrect for the BPF socket lookup helpers: rather than finding
> the netns with id 0, it is resolving to the current netns. This renders
> the netns_id 0 inaccessible.
> 
> To fix this, adjust the API for the netns to treat all negative s32
> values as a lookup in the current netns (including u64 values which when
> truncated to s32 become negative), while any values with a positive
> value in the signed 32-bit integer space would result in a lookup for a
> socket in the netns corresponding to that id. As before, if the netns
> with that ID does not exist, no socket will be found. Any netns outside
> of these ranges will fail to find a corresponding socket, as those
> values are reserved for future usage.
> 
> Signed-off-by: Joe Stringer 
> Acked-by: Nicolas Dichtel 

Applied both. Thanks everyone.

Joe, please provide a cover letter 0/N next time for the series
or if they're really separate patches submit them one by one.



Re: [PATCH net] tun: forbid iface creation with rtnl ops

2018-11-30 Thread David Miller
From: Nicolas Dichtel 
Date: Thu, 29 Nov 2018 14:45:39 +0100

> It's not supported right now (the goal of the initial patch was to support
> 'ip link del' only).
> 
> Before the patch:
> $ ip link add foo type tun
> [  239.632660] BUG: unable to handle kernel NULL pointer dereference at 
> 
> [snip]
> [  239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0
> 
> This panic occurs because dev->netdev_ops is not set by tun_setup(). But to
> have something usable, it will require more than just setting
> netdev_ops.
> 
> Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX")
> CC: Eric W. Biederman 
> Signed-off-by: Nicolas Dichtel 

Super old bug, scary.

Applied, thanks.


Re: [PATCH v2] net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend

2018-11-30 Thread David Miller
From: Nathan Chancellor 
Date: Wed, 28 Nov 2018 23:01:05 -0700

> Clang warns:
> 
> drivers/net/usb/aqc111.c:1326:37: warning: suggest braces around
> initialization of subobject [-Wmissing-braces]
> struct aqc111_wol_cfg wol_cfg = { 0 };
>   ^
>   {}
> 1 warning generated.
> 
> Use memset to initialize the object to take compiler instrumentation out
> of the equation.
> 
> Fixes: e58ba4544c77 ("net: usb: aqc111: Add support for wake on LAN by MAGIC 
> packet")
> Signed-off-by: Nathan Chancellor 

Applied.


Re: [PATCH net] virtio-net: keep vnet header zeroed after processing XDP

2018-11-30 Thread David Miller
From: Jason Wang 
Date: Thu, 29 Nov 2018 13:53:16 +0800

> We copy vnet header unconditionally in page_to_skb() this is wrong
> since XDP may modify the packet data. So let's keep a zeroed vnet
> header for not confusing the conversion between vnet header and skb
> metadata.
> 
> In the future, we should able to detect whether or not the packet was
> modified and keep using the vnet header when packet was not touched.
> 
> Fixes: f600b6905015 ("virtio_net: Add XDP support")
> Reported-by: Pavel Popa 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next] net: qualcomm: rmnet: Remove set but not used variable 'cmd'

2018-11-30 Thread David Miller
From: YueHaibing 
Date: Thu, 29 Nov 2018 02:36:32 +

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c: In function 
> 'rmnet_map_do_flow_control':
> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:23:36: warning:
>  variable 'cmd' set but not used [-Wunused-but-set-variable]
>   struct rmnet_map_control_command *cmd;
> 
> 'cmd' not used anymore now, should also be removed.
> 
> Signed-off-by: YueHaibing 

Applied.


Re: [PATCH net 0/3] fixes in timeout and retransmission accounting

2018-11-30 Thread David Miller
From: Yuchung Cheng 
Date: Wed, 28 Nov 2018 16:06:42 -0800

> This patch set has assorted fixes of minor accounting issues in
> timeout, window probe, and retransmission stats.

Looks good, series applied.


Re: [PATCH net] liquidio: read sc->iq_no before release sc

2018-11-30 Thread David Miller
From: Pan Bian 
Date: Thu, 29 Nov 2018 07:54:22 +0800

> The function lio_vf_rep_packet_sent_callback releases the occupation of
> sc via octeon_free_soft_command. sc should not be used after that.
> Unfortunately, sc->iq_no is read. To fix this, the patch stores sc->iq_no
> into a local variable before releasing sc and then uses the local variable
> instead of sc->iq_no.
> 
> Signed-off-by: Pan Bian 

Applied.


Re: [PATCHv3 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Joey Pabalinas
On Fri, Nov 30, 2018 at 02:36:57PM -1000, Joey Pabalinas wrote:
> On Fri, Nov 30, 2018 at 03:32:20PM -0800, Joe Stringer wrote:
> > + * the netns associated with the *ctx*. *netns* values beyond the
> > + * range of 32-bit integers are reserved for future use.
> 
> Would adding a word or two before "*netns*" here be helpful to placate
> the english pedants among us (such as myself)? Starting a sentence with
> a lowercase word, even if it's a variable name, just never really sits
> right with me.
> 
>   Any *netns* values ...
> 
> Doesn't that kind of flow a bit better anyway?

Hah,

> Anyway, now with that unimportant nitpick if off my chest, the rest
^
|
I make an absolutely terrible pedant. --

-- 
Cheers,
Joey Pabalinas


signature.asc
Description: PGP signature


Re: [Patch net] mlx5: fix get_ip_proto()

2018-11-30 Thread David Miller
From: Cong Wang 
Date: Wed, 28 Nov 2018 15:04:05 -0800

> IP header is not necessarily located right after struct ethhdr,
> there could be multiple 802.1Q headers in between, this is why
> we call __vlan_get_protocol().
> 
> Fixes: fe1dc069990c ("net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets")
> Cc: Alaa Hleihel 
> Cc: Or Gerlitz 
> Cc: Saeed Mahameed 
> Signed-off-by: Cong Wang 

Applied.


Re: [PATCH net] net: dsa: Fix tagging attribute location

2018-11-30 Thread David Miller
From: Florian Fainelli 
Date: Wed, 28 Nov 2018 13:40:04 -0800

> While introducing the DSA tagging protocol attribute, it was added to the DSA
> slave network devices, but those actually see untagged traffic (that is their
> whole purpose). Correct this mistake by putting the tagging sysfs attribute
> under the DSA master network device where this is the information that we 
> need.
> 
> While at it, also correct the sysfs documentation mistake that missed the
> "dsa/" directory component of the attribute.
> 
> Fixes: 98cdb4807123 ("net: dsa: Expose tagging protocol to user-space")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.


Re: [PATCH net-next] tun: implement carrier change

2018-11-30 Thread David Miller
From: Nicolas Dichtel 
Date: Wed, 28 Nov 2018 19:12:56 +0100

> The userspace may need to control the carrier state.
> 
> Signed-off-by: Nicolas Dichtel 
> Signed-off-by: Didier Pallard 

Applied.


Re: [PATCH bpf] bpf: fix pointer offsets in context for 32 bit

2018-11-30 Thread Alexei Starovoitov
On Sat, Dec 01, 2018 at 01:18:53AM +0100, Daniel Borkmann wrote:
> Currently, pointer offsets in three BPF context structures are
> broken in two scenarios: i) 32 bit compiled applications running
> on 64 bit kernels, and ii) LLVM compiled BPF programs running
> on 32 bit kernels. The latter is due to BPF target machine being
> strictly 64 bit. So in each of the cases the offsets will mismatch
> in verifier when checking / rewriting context access. Fix this by
> providing a helper macro __bpf_md_ptr() that will enforce padding
> up to 64 bit and proper alignment, and for context access a macro
> bpf_ctx_range_ptr() which will cover full 64 bit member range on
> 32 bit archs. For flow_keys, we additionally need to force the
> size check to sizeof(__u64) as with other pointer types.
> 
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket 
> TX/RX data")
> Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
> Reported-by: David S. Miller 
> Signed-off-by: Daniel Borkmann 
> Acked-by: David S. Miller 

Applied. Thanks everyone



Re: [PATCH net] net/sched: act_police: fix memory leak in case of invalid control action

2018-11-30 Thread David Miller
From: Davide Caratti 
Date: Wed, 28 Nov 2018 18:43:42 +0100

> when users set an invalid control action, kmemleak complains as follows:
> 
>  # echo clear >/sys/kernel/debug/kmemleak
>  # ./tdc.py -e b48b
>  Test b48b: Add police action with exceed goto chain control action
>  All test results:
> 
>  1..1
>  ok 1 - b48b # Add police action with exceed goto chain control action
>  about to flush the tap output if tests need to be skipped
>  done flushing skipped test tap output
>  # echo scan >/sys/kernel/debug/kmemleak
>  # cat /sys/kernel/debug/kmemleak
>  unreferenced object 0xa0fafbc3dde0 (size 96):
>   comm "tc", pid 2358, jiffies 4294922738 (age 17.022s)
>   hex dump (first 32 bytes):
> 2a 00 00 20 00 00 00 00 00 00 7d 00 00 00 00 00  *.. ..}.
> f8 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [<648803d2>] tcf_action_init_1+0x384/0x4c0
> [] tcf_action_init+0x12b/0x1a0
> [<847ef0d4>] tcf_action_add+0x73/0x170
> [<93656e14>] tc_ctl_action+0x122/0x160
> [<23c98e32>] rtnetlink_rcv_msg+0x263/0x2d0
> [<3493ae9c>] netlink_rcv_skb+0x4d/0x130
> [] netlink_unicast+0x209/0x2d0
> [] netlink_sendmsg+0x2c1/0x3c0
> [<7a9e0753>] sock_sendmsg+0x33/0x40
> [<457c6d2e>] ___sys_sendmsg+0x2a0/0x2f0
> [] __sys_sendmsg+0x5e/0xa0
> [<446eafce>] do_syscall_64+0x5b/0x180
> [<4aa871f2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<450c38ef>] 0x
> 
> change tcf_police_init() to avoid leaking 'new' in case TCA_POLICE_RESULT
> contains TC_ACT_GOTO_CHAIN extended action.
> 
> Fixes: c08f5ed5d625 ("net/sched: act_police: disallow 'goto chain' on 
> fallback control action")
> Reported-by: Dan Carpenter 
> Signed-off-by: Davide Caratti 

Applied.


Re: [PATCH net-next] net: reorder flowi_common fields to avoid holes

2018-11-30 Thread David Miller
From: Paolo Abeni 
Date: Wed, 28 Nov 2018 17:37:53 +0100

> the flowi* structures are used and memsetted by server functions
> in critical path. Currently flowi_common has a couple of holes that
> we can eliminate reordering the struct fields. As a side effect,
> both flowi4 and flowi6 shrink by 8 bytes.
 ...
> Signed-off-by: Paolo Abeni 

Applied, thanks.


Re: [PATCH net-next 0/8] mlxsw: Add VxLAN support with VLAN-aware bridges

2018-11-30 Thread David Miller
From: Ido Schimmel 
Date: Wed, 28 Nov 2018 20:06:56 +

> Commit 53e50a6ec24d ("Merge branch 'mlxsw-Add-VxLAN-support'") added
> mlxsw support for VxLAN when the VxLAN device was enslaved to
> VLAN-unaware bridges. This patchset extends mlxsw to also support VxLAN
> with VLAN-aware bridges.
> 
> With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN
> that is configured as 'pvid untagged' on the corresponding bridge port.
> To prevent ambiguity, mlxsw forbids configurations in which the same
> VLAN is configured as 'pvid untagged' on multiple VxLAN devices.
> 
> Patches #1-#2 add the necessary APIs in mlxsw and the bridge driver.
> 
> Patches #3-#4 perform small refactoring in order to prepare mlxsw for
> VLAN-aware support.
> 
> Patch #5 finally enables the enslavement of VxLAN devices to a
> VLAN-aware bridge. Among other things, it extends mlxsw to handle
> switchdev notifications about VLAN add / delete on a VxLAN device
> enslaved to an offloaded VLAN-aware bridge.
> 
> Patches #6-#8 add selftests to test the new functionality.

Nice clean changes, nice test cases, series applied.

Thanks!


Re: BPF uapi structures and 32-bit

2018-11-30 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 30 Nov 2018 15:33:08 -0800

> On Wed, Nov 28, 2018 at 11:02:00AM -0800, David Miller wrote:
>> From: Daniel Borkmann 
>> Date: Wed, 28 Nov 2018 11:34:55 +0100
>> 
>> > Yeah fully agree. Thinking diff below should address it, do you
>> > have a chance to give this a spin for sparc / 32 bit to check if
>> > test_verifier still explodes?
>> 
>> Great, let me play with this.
>> 
>> I did something simpler yesterday, just changing the data pointers to
>> "u64" and that made at least one test pass that didn't before :-)
>> 
>> I'll get back to you with results.
> 
> Did you have a chance to test it ?
> 
> I'd like to add a tested-by before I apply Daniel's patch
> which looks good to me. btw.

Tested-by: David S. Miller 

Yes, it does the trick.


Re: [PATCH bpf-next] bpf: allow BPF read access to qdisc pkt_len

2018-11-30 Thread Daniel Borkmann
On 12/01/2018 12:42 AM, Willem de Bruijn wrote:
> On Fri, Nov 30, 2018 at 5:48 PM Song Liu  wrote:
>>
>> On Fri, Nov 30, 2018 at 12:09 PM Willem de Bruijn
>>  wrote:
>>>
>>> From: Petar Penkov 
>>>
>>> The pkt_len field in qdisc_skb_cb stores the skb length as it will
>>> appear on the wire after segmentation. For byte accounting, this value
>>> is more accurate than skb->len. It is computed on entry to the TC
>>> layer, so only valid there.
>>>
>>> Allow read access to this field from BPF tc classifier and action
>>> programs. The implementation is analogous to tc_classid, aside from
>>> restricting to read access.
>>>
>>> Signed-off-by: Petar Penkov 
>>> Signed-off-by: Vlad Dumitrescu 
>>> Signed-off-by: Willem de Bruijn 
>>> ---
>>>  include/uapi/linux/bpf.h|  1 +
>>>  net/core/filter.c   | 16 +++
>>>  tools/include/uapi/linux/bpf.h  |  1 +
>>>  tools/testing/selftests/bpf/test_verifier.c | 32 +
>>>  4 files changed, 50 insertions(+)
>>
>> Please split this into 3 patches:
>> 1 for include/uapi/linux/bpf.h and filter.c
>> 1 for tools/include/uapi/linux/bpf.h
>> 1 for tools/testing/selftests/bpf/test_verifier.c
>>
>> Other than this
>> Acked-by: Song Liu 
> 
> Thanks for the fast review.
> 
> I'm happy to resend in three parts, of course, but am curious: what is
> the rationale for splitting this up?
> 
> This patch follows the process for commit  f11216b24219 ("bpf: add
> skb->tstamp r/w access from tc clsact and cg skb progs"), which went
> in as a single patch just last week.

Yeah, I think it's fine as is, one small thing I'm wondering though is
given that we now would have both 'skb->len' and 'skb->pkt_len', would
it be more intuitive for a BPF program developer to distinguish the two
by having the latter named e.g. 'skb->wire_len' so it's slightly more
obvious that it's including header size at post-segmentation? If not
probably some comment in the uapi header similar as in qdisc_pkt_len_init()
might be helpful in any case.

Thanks,
Daniel


Re: [PATCHv3 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Joey Pabalinas
On Fri, Nov 30, 2018 at 03:32:20PM -0800, Joe Stringer wrote:
> + *   the netns associated with the *ctx*. *netns* values beyond the
> + *   range of 32-bit integers are reserved for future use.

Would adding a word or two before "*netns*" here be helpful to placate
the english pedants among us (such as myself)? Starting a sentence with
a lowercase word, even if it's a variable name, just never really sits
right with me.

Any *netns* values ...

Doesn't that kind of flow a bit better anyway?

Anyway, now with that unimportant nitpick if off my chest, the rest
looks fine to me. Ack with or without any changes to that comment.

Acked-by: Joey Pabalinas 

On Fri, Nov 30, 2018 at 03:32:20PM -0800, Joe Stringer wrote:
> David Ahern and Nicolas Dichtel report that the handling of the netns id
> 0 is incorrect for the BPF socket lookup helpers: rather than finding
> the netns with id 0, it is resolving to the current netns. This renders
> the netns_id 0 inaccessible.
> 
> To fix this, adjust the API for the netns to treat all negative s32
> values as a lookup in the current netns (including u64 values which when
> truncated to s32 become negative), while any values with a positive
> value in the signed 32-bit integer space would result in a lookup for a
> socket in the netns corresponding to that id. As before, if the netns
> with that ID does not exist, no socket will be found. Any netns outside
> of these ranges will fail to find a corresponding socket, as those
> values are reserved for future usage.
> 
> Signed-off-by: Joe Stringer 
> Acked-by: Nicolas Dichtel 
> ---
>  include/uapi/linux/bpf.h  | 35 ++---
>  net/core/filter.c | 11 +++---
>  tools/include/uapi/linux/bpf.h| 39 ---
>  tools/testing/selftests/bpf/bpf_helpers.h |  4 +-
>  .../selftests/bpf/test_sk_lookup_kern.c   | 18 -
>  5 files changed, 63 insertions(+), 44 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 852dc17ab47a..ad68b472dad2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2170,7 +2170,7 @@ union bpf_attr {
>   *   Return
>   *   0 on success, or a negative error in case of failure.
>   *
> - * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple 
> *tuple, u32 tuple_size, u32 netns, u64 flags)
> + * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple 
> *tuple, u32 tuple_size, u64 netns, u64 flags)
>   *   Description
>   *   Look for TCP socket matching *tuple*, optionally in a child
>   *   network namespace *netns*. The return value must be checked,
> @@ -2187,12 +2187,14 @@ union bpf_attr {
>   *   **sizeof**\ (*tuple*\ **->ipv6**)
>   *   Look for an IPv6 socket.
>   *
> - *   If the *netns* is zero, then the socket lookup table in the
> - *   netns associated with the *ctx* will be used. For the TC hooks,
> - *   this in the netns of the device in the skb. For socket hooks,
> - *   this in the netns of the socket. If *netns* is non-zero, then
> - *   it specifies the ID of the netns relative to the netns
> - *   associated with the *ctx*.
> + *   If the *netns* is a negative signed 32-bit integer, then the
> + *   socket lookup table in the netns associated with the *ctx* will
> + *   will be used. For the TC hooks, this is the netns of the device
> + *   in the skb. For socket hooks, this is the netns of the socket.
> + *   If *netns* is any other signed 32-bit value greater than or
> + *   equal to zero then it specifies the ID of the netns relative to
> + *   the netns associated with the *ctx*. *netns* values beyond the
> + *   range of 32-bit integers are reserved for future use.
>   *
>   *   All values for *flags* are reserved for future usage, and must
>   *   be left at zero.
> @@ -2202,7 +2204,7 @@ union bpf_attr {
>   *   Return
>   *   Pointer to *struct bpf_sock*, or NULL in case of failure.
>   *
> - * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple 
> *tuple, u32 tuple_size, u32 netns, u64 flags)
> + * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple 
> *tuple, u32 tuple_size, u64 netns, u64 flags)
>   *   Description
>   *   Look for UDP socket matching *tuple*, optionally in a child
>   *   network namespace *netns*. The return value must be checked,
> @@ -2219,12 +2221,14 @@ union bpf_attr {
>   *   **sizeof**\ (*tuple*\ **->ipv6**)
>   *   Look for an IPv6 socket.
>   *
> - *   If the *netns* is zero, then the socket lookup table in the
> - *   netns associated with the *ctx* will be used. For the TC hooks,
> - *   this in the netns of the device in the skb. For socket hooks,
> - *   this in

Re: [PATCH net-next v2 2/3] vxlan: extack support for some changelink cases

2018-11-30 Thread kbuild test robot
Hi Roopa,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Roopa-Prabhu/vxlan-a-few-minor-cleanups/20181130-184658
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/vxlan.c: In function 'vxlan_nl2conf':
>> drivers/net/vxlan.c:3527:3: error: expected '}' before 'else'
  else {
  ^~~~

vim +3527 drivers/net/vxlan.c

  3376  
  3377  static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
  3378   struct net_device *dev, struct vxlan_config 
*conf,
  3379   bool changelink, struct netlink_ext_ack 
*extack)
  3380  {
  3381  struct vxlan_dev *vxlan = netdev_priv(dev);
  3382  
  3383  memset(conf, 0, sizeof(*conf));
  3384  
  3385  /* if changelink operation, start with old existing cfg */
  3386  if (changelink)
  3387  memcpy(conf, &vxlan->cfg, sizeof(*conf));
  3388  
  3389  if (data[IFLA_VXLAN_ID]) {
  3390  __be32 vni = 
cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
  3391  
  3392  if (changelink && (vni != conf->vni)) {
  3393  NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_ID],
  3394  "Cannot change vni");
  3395  return -EOPNOTSUPP;
  3396  }
  3397  conf->vni = 
cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
  3398  }
  3399  
  3400  if (data[IFLA_VXLAN_GROUP]) {
  3401  if (changelink && (conf->remote_ip.sa.sa_family != 
AF_INET)) {
  3402  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_GROUP],
  3403  "New group addr family does 
not match old group");
  3404  return -EOPNOTSUPP;
  3405  }
  3406  conf->remote_ip.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
  3407  conf->remote_ip.sa.sa_family = AF_INET;
  3408  } else if (data[IFLA_VXLAN_GROUP6]) {
  3409  if (!IS_ENABLED(CONFIG_IPV6)) {
  3410  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_GROUP6],
  3411  "IPv6 support not enabled 
in the kernel");
  3412  return -EPFNOSUPPORT;
  3413  }
  3414  
  3415  if (changelink && (conf->remote_ip.sa.sa_family != 
AF_INET6)) {
  3416  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_GROUP6],
  3417  "New group addr family does 
not match old group");
  3418  return -EOPNOTSUPP;
  3419  }
  3420  
  3421  conf->remote_ip.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
  3422  conf->remote_ip.sa.sa_family = AF_INET6;
  3423  }
  3424  
  3425  if (data[IFLA_VXLAN_LOCAL]) {
  3426  if (changelink && (conf->saddr.sa.sa_family != 
AF_INET)) {
  3427  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_LOCAL],
  3428  "New local addr family does 
not match old");
  3429  return -EOPNOTSUPP;
  3430  }
  3431  
  3432  conf->saddr.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
  3433  conf->saddr.sa.sa_family = AF_INET;
  3434  } else if (data[IFLA_VXLAN_LOCAL6]) {
  3435  if (!IS_ENABLED(CONFIG_IPV6)) {
  3436  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_LOCAL6],
  3437  "IPv6 support not enabled 
in the kernel");
  3438  return -EPFNOSUPPORT;
  3439  }
  3440  
  3441  if (changelink && (conf->saddr.sa.sa_family != 
AF_INET6)) {
  3442  NL_SET_ERR_MSG_ATTR(extack, 
tb[IFLA_VXLAN_LOCAL6],
  3443  "New local6 addr family 
does not match old");
  3444  return -EOPNOTSUPP;
  3445  }
  3446  
  3447  /* TODO: respect scope id */
  3448  conf->saddr.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
  3449  conf->saddr.sa.sa_family = AF_INET6;
  3450  }
  3451  
  3452  if (data[IFLA_VXLAN_LINK])
 

[PATCH bpf-next v2 0/2] sample: xdp1 improvements

2018-11-30 Thread Matteo Croce
Small improvements to improve the readability and easiness
to use of the xdp1 sample.

Matteo Croce (2):
  samples: bpf: improve xdp1 example
  samples: bpf: get ifindex from ifname

 samples/bpf/xdp1_user.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
2.19.1



[PATCH bpf-next v2 1/2] samples: bpf: improve xdp1 example

2018-11-30 Thread Matteo Croce
Store only the total packet count for every protocol, instead of the
whole per-cpu array.
Use bpf_map_get_next_key() to iterate the map, instead of looking up
all the protocols.

Signed-off-by: Matteo Croce 
---
 samples/bpf/xdp1_user.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b02c531510ed..4f3d824fc044 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -34,26 +34,24 @@ static void int_exit(int sig)
 static void poll_stats(int map_fd, int interval)
 {
unsigned int nr_cpus = bpf_num_possible_cpus();
-   const unsigned int nr_keys = 256;
-   __u64 values[nr_cpus], prev[nr_keys][nr_cpus];
-   __u32 key;
+   __u64 values[nr_cpus], prev[UINT8_MAX] = { 0 };
int i;
 
-   memset(prev, 0, sizeof(prev));
-
while (1) {
+   __u32 key = UINT32_MAX;
+
sleep(interval);
 
-   for (key = 0; key < nr_keys; key++) {
+   while (bpf_map_get_next_key(map_fd, &key, &key) != -1) {
__u64 sum = 0;
 
assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
-   sum += (values[i] - prev[key][i]);
-   if (sum)
+   sum += values[i];
+   if (sum > prev[key])
printf("proto %u: %10llu pkt/s\n",
-  key, sum / interval);
-   memcpy(prev[key], values, sizeof(values));
+  key, (sum - prev[key]) / interval);
+   prev[key] = sum;
}
}
 }
-- 
2.19.1



[PATCH bpf-next v2 2/2] samples: bpf: get ifindex from ifname

2018-11-30 Thread Matteo Croce
Find the ifindex with if_nametoindex() instead of requiring the
numeric ifindex.

Signed-off-by: Matteo Croce 
---
v2: use if_nametoindex() instead of ioctl()

 samples/bpf/xdp1_user.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 4f3d824fc044..0a197f86ac43 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bpf_util.h"
 #include "bpf/bpf.h"
@@ -59,7 +60,7 @@ static void poll_stats(int map_fd, int interval)
 static void usage(const char *prog)
 {
fprintf(stderr,
-   "usage: %s [OPTS] IFINDEX\n\n"
+   "usage: %s [OPTS] IFACE\n\n"
"OPTS:\n"
"-Suse skb-mode\n"
"-Nenforce native mode\n",
@@ -102,7 +103,11 @@ int main(int argc, char **argv)
return 1;
}
 
-   ifindex = strtoul(argv[optind], NULL, 0);
+   ifindex = if_nametoindex(argv[1]);
+   if (!ifindex) {
+   perror("if_nametoindex");
+   return 1;
+   }
 
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
prog_load_attr.file = filename;
-- 
2.19.1



[PATCH bpf] bpf: fix pointer offsets in context for 32 bit

2018-11-30 Thread Daniel Borkmann
Currently, pointer offsets in three BPF context structures are
broken in two scenarios: i) 32 bit compiled applications running
on 64 bit kernels, and ii) LLVM compiled BPF programs running
on 32 bit kernels. The latter is due to BPF target machine being
strictly 64 bit. So in each of the cases the offsets will mismatch
in verifier when checking / rewriting context access. Fix this by
providing a helper macro __bpf_md_ptr() that will enforce padding
up to 64 bit and proper alignment, and for context access a macro
bpf_ctx_range_ptr() which will cover full 64 bit member range on
32 bit archs. For flow_keys, we additionally need to force the
size check to sizeof(__u64) as with other pointer types.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket 
TX/RX data")
Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Reported-by: David S. Miller 
Signed-off-by: Daniel Borkmann 
Acked-by: David S. Miller 
---
 include/linux/filter.h |  7 +++
 include/uapi/linux/bpf.h   | 17 -
 net/core/filter.c  | 16 
 tools/include/uapi/linux/bpf.h | 17 -
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 448dcc4..795ff0b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -449,6 +449,13 @@ struct sock_reuseport;
offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
 #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2) 
\
offsetof(TYPE, MEMBER1) ... offsetofend(TYPE, MEMBER2) - 1
+#if BITS_PER_LONG == 64
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1
+#else
+# define bpf_ctx_range_ptr(TYPE, MEMBER)   
\
+   offsetof(TYPE, MEMBER) ... offsetof(TYPE, MEMBER) + 8 - 1
+#endif /* BITS_PER_LONG == 64 */
 
 #define bpf_target_off(TYPE, MEMBER, SIZE, PTR_SIZE)   
\
({  
\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17..426b5c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2422,6 +2422,12 @@ enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6_INLINE
 };
 
+#define __bpf_md_ptr(type, name)   \
+union {\
+   type name;  \
+   __u64 :64;  \
+} __attribute__((aligned(8)))
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
@@ -2456,7 +2462,7 @@ struct __sk_buff {
/* ... here. */
 
__u32 data_meta;
-   struct bpf_flow_keys *flow_keys;
+   __bpf_md_ptr(struct bpf_flow_keys *, flow_keys);
 };
 
 struct bpf_tunnel_key {
@@ -2572,8 +2578,8 @@ enum sk_action {
  * be added to the end of this structure
  */
 struct sk_msg_md {
-   void *data;
-   void *data_end;
+   __bpf_md_ptr(void *, data);
+   __bpf_md_ptr(void *, data_end);
 
__u32 family;
__u32 remote_ip4;   /* Stored in network byte order */
@@ -2589,8 +2595,9 @@ struct sk_reuseport_md {
 * Start of directly accessible data. It begins from
 * the tcp/udp header.
 */
-   void *data;
-   void *data_end; /* End of directly accessible data */
+   __bpf_md_ptr(void *, data);
+   /* End of directly accessible data */
+   __bpf_md_ptr(void *, data_end);
/*
 * Total length of packet (starting from the tcp/udp header).
 * Note that the directly accessible bytes (data_end - data)
diff --git a/net/core/filter.c b/net/core/filter.c
index 9a1327e..6ee605d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5435,8 +5435,8 @@ static bool bpf_skb_is_valid_access(int off, int size, 
enum bpf_access_type type
if (size != size_default)
return false;
break;
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
-   if (size != sizeof(struct bpf_flow_keys *))
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
+   if (size != sizeof(__u64))
return false;
break;
default:
@@ -5464,7 +5464,7 @@ static bool sk_filter_is_valid_access(int off, int size,
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_meta):
case bpf_ctx_range(struct __sk_buff, data_end):
-   case bpf_ctx_range(struct __sk_buff, flow_keys):
+   case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
return false;
}
@@ -5489,7 +5489,7 @@ static bool cg_

Re: [PATCH bpf-next 1/2] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap

2018-11-30 Thread Peter Oskolkov
On Fri, Nov 30, 2018 at 3:52 PM David Ahern  wrote:
>
> On 11/30/18 4:35 PM, Peter Oskolkov wrote:
> > Thanks, David! This is for egress only, so I'll add an appropriate
> > check. I'll also address your other comments/concerns in a v2 version
> > of this patchset.
>
> Why are you limiting this to egress only?

I'm focusing on egress because this is a use case that we have and
understand well, and would like to have a solution for sooner rather
than later.

Without understanding why anybody would want to do lwt-bpf encap on
ingress, I don't know, for example, what a good test would look like;
but I'd be happy to look into a specific ingress use case if you have
one.


[PATCH v3 net] mv88e6060: disable hardware level MAC learning

2018-11-30 Thread Anderson Luiz Alves
Disable hardware level MAC learning because it breaks station roaming.
When enabled it drops all frames that arrive from a MAC address
that is on a different port at learning table.

Signed-off-by: Anderson Luiz Alves 
---

Notes:
v2: Updated code comments.
v3: Sent with correct whitespaces.

 drivers/net/dsa/mv88e6060.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 65f10fec2..0b3e51f24 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -116,8 +116,7 @@ static int mv88e6060_switch_reset(struct dsa_switch *ds)
/* Reset the switch. */
REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
  GLOBAL_ATU_CONTROL_SWRESET |
- GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
- GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);
+ GLOBAL_ATU_CONTROL_LEARNDIS);
 
/* Wait up to one second for reset to complete. */
timeout = jiffies + 1 * HZ;
@@ -142,13 +141,10 @@ static int mv88e6060_setup_global(struct dsa_switch *ds)
 */
REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, GLOBAL_CONTROL_MAX_FRAME_1536);
 
-   /* Enable automatic address learning, set the address
-* database size to 1024 entries, and set the default aging
-* time to 5 minutes.
+   /* Disable automatic address learning.
 */
REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
- GLOBAL_ATU_CONTROL_ATUSIZE_1024 |
- GLOBAL_ATU_CONTROL_ATE_AGE_5MIN);
+ GLOBAL_ATU_CONTROL_LEARNDIS);
 
return 0;
 }
-- 
2.17.1



Re: consistency for statistics with XDP mode

2018-11-30 Thread Saeed Mahameed
On Fri, 2018-11-30 at 15:30 -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 30, 2018 at 08:10:58PM +, Saeed Mahameed wrote:
> > On Thu, 2018-11-22 at 18:00 +0100, Toke Høiland-Jørgensen wrote:
> > > David Ahern  writes:
> > > 
> > > > On 11/22/18 1:26 AM, Toke Høiland-Jørgensen wrote:
> > > > > Saeed Mahameed  writes:
> > > > > 
> > > > > > > > I'd say it sounds reasonable to include XDP in the
> > > > > > > > normal
> > > > > > > > traffic
> > > > > > > > counters, but having the detailed XDP-specific counters
> > > > > > > > is
> > > > > > > > quite
> > > > > > > > useful
> > > > > > > > as well... So can't we do both (for all drivers)?
> > > > > > > > 
> > > > > > 
> > > > > > What are you thinking ? 
> > > > > > reporting XDP_DROP in interface dropped counter ?
> > > > > > and XDP_TX/REDIRECT in the TX counter ?
> > > > > > XDP_ABORTED in the  err/drop counter ?
> > > > > > 
> > > > > > how about having a special XDP command in the .ndo_bpf that
> > > > > > would query
> > > > > > the standardized XDP stats ?
> > > > > the XDP-specific stats are useful to have separately as well
> > > > > :)
> > > > > 
> > > > 
> > > > I would like to see basic packets, bytes, and dropped counters
> > > > tracked
> > > > for Rx and Tx via the standard netdev counters for all
> > > > devices. 
> > 
> > The problem of reporting XDP_DROP in the netedev drop counter is
> > that
> > they don't fit this counter description : "no space in linux
> > buffers"
> > and it will be hard for the user to determine whether these drops
> > are
> > coming from XDP or because no buffer is available, which will make
> > it
> > impossible to estimate packet rate performance without looking at
> > ethtool stats.
> > And reporting XDP_DROP in the netdev rx packets counter is somehow
> > misleading.. since those packets never made it out of this
> > driver.. 
> > 
> > 
> > And reporting XDP_DROP in the netdev rx packets counter is somehow
> > misleading.. since those packets never made it out of this driver..
> 
> I think I agree. XDP needs minimal overhead - if user wants to do
> counters then user can via maps. And in a sense XDP dropping packet
> is much like e.g. TCP dropping packet - it is not counted
> against the driver since it's not driver's fault.

So we should count all XDP RX packets as successful rx packets i.e
netdev->stats.rx_packets++; regardless of the XDP program decision ? 

this implies that XDP_TX packets will be counted twice once in 
netdev->stats.rx_packets and once in netdev->stats.tx_packets

I think this is the only valid option if we are going to use standard
netdev stats for XDP use cases.





Re: BPF uapi structures and 32-bit

2018-11-30 Thread Daniel Borkmann
On 12/01/2018 12:33 AM, Alexei Starovoitov wrote:
> On Wed, Nov 28, 2018 at 11:02:00AM -0800, David Miller wrote:
>> From: Daniel Borkmann 
>> Date: Wed, 28 Nov 2018 11:34:55 +0100
>>
>>> Yeah fully agree. Thinking diff below should address it, do you
>>> have a chance to give this a spin for sparc / 32 bit to check if
>>> test_verifier still explodes?
>>
>> Great, let me play with this.
>>
>> I did something simpler yesterday, just changing the data pointers to
>> "u64" and that made at least one test pass that didn't before :-)
>>
>> I'll get back to you with results.
> 
> Did you have a chance to test it ?

David got back to me and mentioned it worked fine on sparc.

> I'd like to add a tested-by before I apply Daniel's patch
> which looks good to me. btw.

Yeah, wanted to cook an official patch today, let me do that now.


Re: [PATCH bpf-next 1/2] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap

2018-11-30 Thread David Ahern
On 11/30/18 4:35 PM, Peter Oskolkov wrote:
> Thanks, David! This is for egress only, so I'll add an appropriate
> check. I'll also address your other comments/concerns in a v2 version
> of this patchset.

Why are you limiting this to egress only?


Re: [PATCH bpf-next] bpf: allow BPF read access to qdisc pkt_len

2018-11-30 Thread Willem de Bruijn
On Fri, Nov 30, 2018 at 5:48 PM Song Liu  wrote:
>
> On Fri, Nov 30, 2018 at 12:09 PM Willem de Bruijn
>  wrote:
> >
> > From: Petar Penkov 
> >
> > The pkt_len field in qdisc_skb_cb stores the skb length as it will
> > appear on the wire after segmentation. For byte accounting, this value
> > is more accurate than skb->len. It is computed on entry to the TC
> > layer, so only valid there.
> >
> > Allow read access to this field from BPF tc classifier and action
> > programs. The implementation is analogous to tc_classid, aside from
> > restricting to read access.
> >
> > Signed-off-by: Petar Penkov 
> > Signed-off-by: Vlad Dumitrescu 
> > Signed-off-by: Willem de Bruijn 
> > ---
> >  include/uapi/linux/bpf.h|  1 +
> >  net/core/filter.c   | 16 +++
> >  tools/include/uapi/linux/bpf.h  |  1 +
> >  tools/testing/selftests/bpf/test_verifier.c | 32 +
> >  4 files changed, 50 insertions(+)
>
> Please split this into 3 patches:
> 1 for include/uapi/linux/bpf.h and filter.c
> 1 for tools/include/uapi/linux/bpf.h
> 1 for tools/testing/selftests/bpf/test_verifier.c
>
> Other than this
> Acked-by: Song Liu 

Thanks for the fast review.

I'm happy to resend in three parts, of course, but am curious: what is
the rationale for splitting this up?

This patch follows the process for commit  f11216b24219 ("bpf: add
skb->tstamp r/w access from tc clsact and cg skb progs"), which went
in as a single patch just last week.


Re: [PATCH 7/8] socket: Add SO_TIMESTAMP[NS]_NEW

2018-11-30 Thread Willem de Bruijn
On Fri, Nov 30, 2018 at 5:43 PM Deepa Dinamani  wrote:
>
> On Sun, Nov 25, 2018 at 6:33 AM Willem de Bruijn
>  wrote:
> >
> > On Sun, Nov 25, 2018 at 12:28 AM Deepa Dinamani  
> > wrote:
> > >
> > > > > > +   if (type == SO_TIMESTAMP_NEW || type == SO_TIMESTAMPNS_NEW)
> > > > > > +   sock_set_flag(sk, SOCK_TSTAMP_NEW);
> > > > > > +   else
> > > > > > +   sock_reset_flag(sk, SOCK_TSTAMP_NEW);
> > > > > > +
> > > > >
> > > > > if adding a boolean whether the socket uses new or old-style
> > > > > timestamps, perhaps fail hard if a process tries to set a new-style
> > > > > option while an old-style is already set and vice versa. Also include
> > > > > SO_TIMESTAMPING_NEW as it toggles the same option.
> > >
> > > I do not think this is a problem.
> > > Consider this example, if there is a user application with updated
> > > socket timestamps is linking into a library that is yet to be updated.
> >
> > Also consider applications that do not use libraries.
>
> Arnd and I talked about this.
> We thought that the new options should behave like the already
> existing options. The patch already does this.
> Eg: Today if we set SO_TIMESTAMP and then try to switch to
> SO_TIMESTAMPNS then there is no fail.

> Do you still want a hard fail?

I do think that it is preferable. In general, and in this specific case.

We have had had many bug reports from syzkaller where the fuzzer
manages to trigger unexpected behavior by combining two APIs
that were never intended to be used together.

However inane the combination may be, once an API is published,
we cannot simply add an EINVAL and stop supporting it. So it is safer
to explicitly block unsafe combinations from the start. If there is a
legitimate use it is always possible to loosen that restriction later.

I don't see any sensible use for mixing both the old and the new
interface on the same socket.

That said, just a suggestion.


Re: [PATCH bpf-next 1/2] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap

2018-11-30 Thread Peter Oskolkov
Thanks, David! This is for egress only, so I'll add an appropriate
check. I'll also address your other comments/concerns in a v2 version
of this patchset.
On Fri, Nov 30, 2018 at 12:08 PM David Ahern  wrote:
>
> On 11/28/18 6:34 PM, Peter Oskolkov wrote:
> > On Wed, Nov 28, 2018 at 4:47 PM David Ahern  wrote:
> >>
> >> On 11/28/18 5:22 PM, Peter Oskolkov wrote:
> >>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>> index bd0df75dc7b6..17f3c37218e5 100644
> >>> --- a/net/core/filter.c
> >>> +++ b/net/core/filter.c
> >>> @@ -4793,6 +4793,60 @@ static int bpf_push_seg6_encap(struct sk_buff 
> >>> *skb, u32 type, void *hdr, u32 len
> >>>  }
> >>>  #endif /* CONFIG_IPV6_SEG6_BPF */
> >>>
> >>> +static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len)
> >>> +{
> >>> + struct dst_entry *dst;
> >>> + struct rtable *rt;
> >>> + struct iphdr *iph;
> >>> + struct net *net;
> >>> + int err;
> >>> +
> >>> + if (skb->protocol != htons(ETH_P_IP))
> >>> + return -EINVAL;  /* ETH_P_IPV6 not yet supported */
> >>> +
> >>> + iph = (struct iphdr *)hdr;
> >>> +
> >>> + if (unlikely(len < sizeof(struct iphdr) || len > 
> >>> LWTUNNEL_MAX_ENCAP_HSIZE))
> >>> + return -EINVAL;
> >>> + if (unlikely(iph->version != 4 || iph->ihl * 4 > len))
> >>> + return -EINVAL;
> >>> +
> >>> + if (skb->sk)
> >>> + net = sock_net(skb->sk);
> >>> + else {
> >>> + net = dev_net(skb_dst(skb)->dev);
> >>> + }
> >>> + rt = ip_route_output(net, iph->daddr, 0, 0, 0);
> >>
> >> That is a very limited use case. e.g., oif = 0 means you are not
> >> considering any kind of policy routing (e.g., VRF).
> >
> > Hi David! Could you be a bit more specific re: what you would like to
> > see here? Thanks!
> >
>
> Is the encap happening on ingress or egress? Seems like the current code
> does not assume either direction for lwt (BPF_PROG_TYPE_LWT_IN vs
> BPF_PROG_TYPE_LWT_OUT), yet your change does - output only. Basically,
> you should be filling in a flow struct and doing a proper lookup.
>
> When the potentially custom encap header is pushed on, seems to me skb
> marks should still be honored for the route lookup. If not, you should
> handle that in the API.
>
> From there skb->dev at a minimum should be used as either iif (ingress)
> or oif (egress).
>
> The iph is already set so you have quick access to the tos.
>
> Also, this should implement IPv6 as well before going in.


Re: [PATCH 2/8] sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD

2018-11-30 Thread Willem de Bruijn
On Fri, Nov 30, 2018 at 5:38 PM Deepa Dinamani  wrote:
>
> On Sat, Nov 24, 2018 at 7:59 PM Willem de Bruijn
>  wrote:
> >
> > On Sat, Nov 24, 2018 at 3:58 AM Deepa Dinamani  
> > wrote:
> > >
> > > SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
> > > way they are currently defined, are not y2038 safe.
> > > Subsequent patches in the series add new y2038 safe versions
> > > of these options which provide 64 bit timestamps on all
> > > architectures uniformly.
> > > Hence, rename existing options with OLD tag suffixes.
> >
> > Why do the existing interfaces have to be renamed when new interfaces are 
> > added?
>
> Existing options need to be renamed because of the macro below:
>
> #define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? \
> SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
>
> SO_TIMESTAMP is now dependent on size of time_t because of the libc flag.

Yes, I understand the mechanism based on libc's definition of time_t
after Arnd's explanation. Please do capture that in the commit
message, for possible future readers who stumble upon the code with
git blame.


Re: No address after suspend/resume with 4.18

2018-11-30 Thread Jeff Kirsher
On Fri, 2018-11-30 at 15:26 -0800, Stephen Hemminger wrote:
> On my box with Debian testing, I see a new problem with
> suspend/resume of wired network device.
> Using stock Debian kernel 4.18.0-2-amd64
> 
> After suspend/resume cycle, IP address is lost.
> 
> Device Info:
> $ /sbin/ethtool -i enp12s0
> driver: igb
> version: 5.4.0-k
> firmware-version:  0. 6-1
> expansion-rom-version: 
> bus-info: :0c:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> 
> 
> $ lspci -v -s :0c:00.0
> 0c:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> Connection (rev 03)
>   Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network
> Connection
>   Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0
>   Memory at dfb0 (32-bit, non-prefetchable) [size=128K]
>   I/O ports at c000 [size=32]
>   Memory at dfb2 (32-bit, non-prefetchable) [size=16K]
>   Capabilities: 
>   Kernel driver in use: igb
>   Kernel modules: igb
> 
> State before suspend:
> $ ip addr show dev enp12s0
> 4: enp12s0:  mtu 1500 qdisc mq state
> UP group default qlen 1000
> link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.18/24 brd 192.168.1.255 scope global enp12s0
>valid_lft forever preferred_lft forever
> 
> State after resume:
> 
> $ ip addr show dev enp12s0
> 4: enp12s0:  mtu 1500 qdisc mq state
> UP group default qlen 1000
> link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:f
> 
> Doing ifdown/ifup which restarts the DHCP client does restore the
> address.
> 
> Not sure if this is a kernel issue with carrier handling, Intel
> driver issue, or DHCP client issue.

Thanks Stephen, I will have someone look into it.


signature.asc
Description: This is a digitally signed message part


Re: BPF uapi structures and 32-bit

2018-11-30 Thread Alexei Starovoitov
On Wed, Nov 28, 2018 at 11:02:00AM -0800, David Miller wrote:
> From: Daniel Borkmann 
> Date: Wed, 28 Nov 2018 11:34:55 +0100
> 
> > Yeah fully agree. Thinking diff below should address it, do you
> > have a chance to give this a spin for sparc / 32 bit to check if
> > test_verifier still explodes?
> 
> Great, let me play with this.
> 
> I did something simpler yesterday, just changing the data pointers to
> "u64" and that made at least one test pass that didn't before :-)
> 
> I'll get back to you with results.

Did you have a chance to test it ?

I'd like to add a tested-by before I apply Daniel's patch
which looks good to me. btw.



[PATCHv3 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Joe Stringer
David Ahern and Nicolas Dichtel report that the handling of the netns id
0 is incorrect for the BPF socket lookup helpers: rather than finding
the netns with id 0, it is resolving to the current netns. This renders
the netns_id 0 inaccessible.

To fix this, adjust the API for the netns to treat all negative s32
values as a lookup in the current netns (including u64 values which when
truncated to s32 become negative), while any values with a positive
value in the signed 32-bit integer space would result in a lookup for a
socket in the netns corresponding to that id. As before, if the netns
with that ID does not exist, no socket will be found. Any netns outside
of these ranges will fail to find a corresponding socket, as those
values are reserved for future usage.

Signed-off-by: Joe Stringer 
Acked-by: Nicolas Dichtel 
---
 include/uapi/linux/bpf.h  | 35 ++---
 net/core/filter.c | 11 +++---
 tools/include/uapi/linux/bpf.h| 39 ---
 tools/testing/selftests/bpf/bpf_helpers.h |  4 +-
 .../selftests/bpf/test_sk_lookup_kern.c   | 18 -
 5 files changed, 63 insertions(+), 44 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17ab47a..ad68b472dad2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2170,7 +2170,7 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u32 netns, u64 flags)
+ * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u64 netns, u64 flags)
  * Description
  * Look for TCP socket matching *tuple*, optionally in a child
  * network namespace *netns*. The return value must be checked,
@@ -2187,12 +2187,14 @@ union bpf_attr {
  * **sizeof**\ (*tuple*\ **->ipv6**)
  * Look for an IPv6 socket.
  *
- * If the *netns* is zero, then the socket lookup table in the
- * netns associated with the *ctx* will be used. For the TC hooks,
- * this in the netns of the device in the skb. For socket hooks,
- * this in the netns of the socket. If *netns* is non-zero, then
- * it specifies the ID of the netns relative to the netns
- * associated with the *ctx*.
+ * If the *netns* is a negative signed 32-bit integer, then the
+ * socket lookup table in the netns associated with the *ctx* will
+ * will be used. For the TC hooks, this is the netns of the device
+ * in the skb. For socket hooks, this is the netns of the socket.
+ * If *netns* is any other signed 32-bit value greater than or
+ * equal to zero then it specifies the ID of the netns relative to
+ * the netns associated with the *ctx*. *netns* values beyond the
+ * range of 32-bit integers are reserved for future use.
  *
  * All values for *flags* are reserved for future usage, and must
  * be left at zero.
@@ -2202,7 +2204,7 @@ union bpf_attr {
  * Return
  * Pointer to *struct bpf_sock*, or NULL in case of failure.
  *
- * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u32 netns, u64 flags)
+ * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u64 netns, u64 flags)
  * Description
  * Look for UDP socket matching *tuple*, optionally in a child
  * network namespace *netns*. The return value must be checked,
@@ -2219,12 +2221,14 @@ union bpf_attr {
  * **sizeof**\ (*tuple*\ **->ipv6**)
  * Look for an IPv6 socket.
  *
- * If the *netns* is zero, then the socket lookup table in the
- * netns associated with the *ctx* will be used. For the TC hooks,
- * this in the netns of the device in the skb. For socket hooks,
- * this in the netns of the socket. If *netns* is non-zero, then
- * it specifies the ID of the netns relative to the netns
- * associated with the *ctx*.
+ * If the *netns* is a negative signed 32-bit integer, then the
+ * socket lookup table in the netns associated with the *ctx* will
+ * will be used. For the TC hooks, this is the netns of the device
+ * in the skb. For socket hooks, this is the netns of the socket.
+ * If *netns* is any other signed 32-bit value greater than or
+ * equal to zero then it specifies the ID of the netns relative to
+ * the netns associated with the *ctx*. *netns* values beyond the
+ * range of 32-bit integers are reserved for future use.
  *
  * All values for *flags* are 

[PATCHv3 bpf 2/2] bpf: Improve socket lookup reuseport documentation

2018-11-30 Thread Joe Stringer
Improve the wording around socket lookup for reuseport sockets, and
ensure that both bpf.h headers are in sync.

Signed-off-by: Joe Stringer 
---
 include/uapi/linux/bpf.h   | 4 
 tools/include/uapi/linux/bpf.h | 8 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ad68b472dad2..47f620b5cc5c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2203,6 +2203,8 @@ union bpf_attr {
  * **CONFIG_NET** configuration option.
  * Return
  * Pointer to *struct bpf_sock*, or NULL in case of failure.
+ * For sockets with reuseport option, the *struct bpf_sock*
+ * result is from reuse->socks[] using the hash of the tuple.
  *
  * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u64 netns, u64 flags)
  * Description
@@ -2237,6 +2239,8 @@ union bpf_attr {
  * **CONFIG_NET** configuration option.
  * Return
  * Pointer to *struct bpf_sock*, or NULL in case of failure.
+ * For sockets with reuseport option, the *struct bpf_sock*
+ * result is from reuse->socks[] using the hash of the tuple.
  *
  * int bpf_sk_release(struct bpf_sock *sk)
  * Description
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index de2072ef475b..47f620b5cc5c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2203,8 +2203,8 @@ union bpf_attr {
  * **CONFIG_NET** configuration option.
  * Return
  * Pointer to *struct bpf_sock*, or NULL in case of failure.
- * For sockets with reuseport option, *struct bpf_sock*
- * return is from reuse->socks[] using hash of the packet.
+ * For sockets with reuseport option, the *struct bpf_sock*
+ * result is from reuse->socks[] using the hash of the tuple.
  *
  * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, 
u32 tuple_size, u64 netns, u64 flags)
  * Description
@@ -2239,8 +2239,8 @@ union bpf_attr {
  * **CONFIG_NET** configuration option.
  * Return
  * Pointer to *struct bpf_sock*, or NULL in case of failure.
- * For sockets with reuseport option, *struct bpf_sock*
- * return is from reuse->socks[] using hash of the packet.
+ * For sockets with reuseport option, the *struct bpf_sock*
+ * result is from reuse->socks[] using the hash of the tuple.
  *
  * int bpf_sk_release(struct bpf_sock *sk)
  * Description
-- 
2.19.1



Re: [PATCH 3/8] socket: Disentangle SOCK_RCVTSTAMPNS from SOCK_RCVTSTAMP

2018-11-30 Thread Willem de Bruijn
On Fri, Nov 30, 2018 at 5:16 PM Deepa Dinamani  wrote:
>
> On Sun, Nov 25, 2018 at 10:19 AM David Miller  wrote:
> >
> > From: Willem de Bruijn 
> > Date: Sun, 25 Nov 2018 09:18:55 -0500
> >
> > > The existing logic is as is for a reason. There is no need to change
> > > it to satisfy the main purpose of your patchset?
> > >
> > > It is structured as one bit to test whether a timestamp is requested
> > > and another to select among two variants usec/nsec. Just add another
> > > layer of branching between new/old in cases where this distinction is
> > > needed.
> > >
> > > Please avoid code churn unless needed.
> >
> > +1
>
> This patch makes it easier to add logic for 2 new socket time options.
> But, if you prefer for all of the options to depend on SOCK_RCVTSTAMP
> then I will drop it.

Yes, please keep as is.

I don't see how this change is needed to significantly simplify the
main patchset, and an unnecessary change can cause an unforeseen
regression (as was the case with doubling the tests in the hot path).

The current approach has one branch in the hot path where timestamps
are disabled and then selects from two variants where it is enabled:

if (rcvtstamp) {
  if (rcvtstamp_ns)
..
  else
..
}

Both of these need to be split into new and old variants. The way to
achieve that with minimal code perturbation is

if (rcvtstamp) {
  +if (sk_timestamping_new)
  + return __sock_recv_timestamp_new(..)
  +
  if (rcvtstamp_ns)
..
  else
..
   }

Or alternatively add a check for new in each of the inner branches. In
any case, please be consistent between sock_recv_sw_timestamp and
tcp_recv_sw_timestamp. The current patchset alternates them.


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 02:40:19PM -0800, Jarkko Sakkinen wrote:
> Got you... Well I now read the 2nd amendment now through, and yeah, kind
> of way I work/function anyway.

Ugh, looked up the word from dictionary for something that makes
additions to some guidelines because did not know the english word.
Not meant as a political reference of any kind. Just don't know
any better English word.

/Jarkko


Re: [PATCHv2 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Joe Stringer
On Fri, 30 Nov 2018 at 15:27, Alexei Starovoitov
 wrote:
>
> On Fri, Nov 30, 2018 at 03:18:25PM -0800, Joe Stringer wrote:
> > On Fri, 30 Nov 2018 at 14:42, Alexei Starovoitov
> >  wrote:
> > >
> > > On Thu, Nov 29, 2018 at 04:29:33PM -0800, Joe Stringer wrote:
> > > > David Ahern and Nicolas Dichtel report that the handling of the netns id
> > > > 0 is incorrect for the BPF socket lookup helpers: rather than finding
> > > > the netns with id 0, it is resolving to the current netns. This renders
> > > > the netns_id 0 inaccessible.
> > > >
> > > > To fix this, adjust the API for the netns to treat all negative s32
> > > > values as a lookup in the current netns, while any values with a
> > > > positive value in the signed 32-bit integer space would result in a
> > > > lookup for a socket in the netns corresponding to that id. As before, if
> > > > the netns with that ID does not exist, no socket will be found.
> > > > Furthermore, if any bits are set in the upper 32-bits, then no socket
> > > > will be found.
> > > >
> > > > Signed-off-by: Joe Stringer 
> > > ..
> > > > +/* Current network namespace */
> > > > +#define BPF_CURRENT_NETNS(-1L)
> > >
> > > I was about to apply it, but then noticed that the name doesn't match
> > > the rest of the names.
> > > Could you rename it to BPF_F_CURRENT_NETNS ?
> >
> > I skipped the F_ part since it's not really a flag, it's a value. I
> > can put it back though.
>
> BPF_F_ prefix has smaller chance of conflicts.
> I wish we did that sooner.
> In retrospect BPF_ANY, BPF_EXIST were poorly picked names.

OK, I'll send out a v3 shortly.


Re: [PATCHv2 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Alexei Starovoitov
On Fri, Nov 30, 2018 at 03:18:25PM -0800, Joe Stringer wrote:
> On Fri, 30 Nov 2018 at 14:42, Alexei Starovoitov
>  wrote:
> >
> > On Thu, Nov 29, 2018 at 04:29:33PM -0800, Joe Stringer wrote:
> > > David Ahern and Nicolas Dichtel report that the handling of the netns id
> > > 0 is incorrect for the BPF socket lookup helpers: rather than finding
> > > the netns with id 0, it is resolving to the current netns. This renders
> > > the netns_id 0 inaccessible.
> > >
> > > To fix this, adjust the API for the netns to treat all negative s32
> > > values as a lookup in the current netns, while any values with a
> > > positive value in the signed 32-bit integer space would result in a
> > > lookup for a socket in the netns corresponding to that id. As before, if
> > > the netns with that ID does not exist, no socket will be found.
> > > Furthermore, if any bits are set in the upper 32-bits, then no socket
> > > will be found.
> > >
> > > Signed-off-by: Joe Stringer 
> > ..
> > > +/* Current network namespace */
> > > +#define BPF_CURRENT_NETNS(-1L)
> >
> > I was about to apply it, but then noticed that the name doesn't match
> > the rest of the names.
> > Could you rename it to BPF_F_CURRENT_NETNS ?
> 
> I skipped the F_ part since it's not really a flag, it's a value. I
> can put it back though.

BPF_F_ prefix has smaller chance of conflicts.
I wish we did that sooner.
In retrospect BPF_ANY, BPF_EXIST were poorly picked names.



[PATCH net] macvlan: return correct error value

2018-11-30 Thread Matteo Croce
A MAC address must be unique among all the macvlan devices with the same
lower device. The only exception is the passthru [sic] mode,
which shares the lower device address.

When duplicate addresses are detected, EBUSY is returned when bringing
the interface up:

# ip link add macvlan0 link eth0 type macvlan
# read addr 
---
 drivers/net/macvlan.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index fc8d5f1ee1ad..0da3d36b283b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -608,7 +608,7 @@ static int macvlan_open(struct net_device *dev)
goto hash_add;
}
 
-   err = -EBUSY;
+   err = -EADDRINUSE;
if (macvlan_addr_busy(vlan->port, dev->dev_addr))
goto out;
 
@@ -706,7 +706,7 @@ static int macvlan_sync_address(struct net_device *dev, 
unsigned char *addr)
} else {
/* Rehash and update the device filters */
if (macvlan_addr_busy(vlan->port, addr))
-   return -EBUSY;
+   return -EADDRINUSE;
 
if (!macvlan_passthru(port)) {
err = dev_uc_add(lowerdev, addr);
@@ -747,6 +747,9 @@ static int macvlan_set_mac_address(struct net_device *dev, 
void *p)
return dev_set_mac_address(vlan->lowerdev, addr);
}
 
+   if (macvlan_addr_busy(vlan->port, addr->sa_data))
+   return -EADDRINUSE;
+
return macvlan_sync_address(dev, addr->sa_data);
 }
 
-- 
2.19.1



No address after suspend/resume with 4.18

2018-11-30 Thread Stephen Hemminger
On my box with Debian testing, I see a new problem with suspend/resume of wired 
network device.
Using stock Debian kernel 4.18.0-2-amd64

After suspend/resume cycle, IP address is lost.

Device Info:
$ /sbin/ethtool -i enp12s0
driver: igb
version: 5.4.0-k
firmware-version:  0. 6-1
expansion-rom-version: 
bus-info: :0c:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


$ lspci -v -s :0c:00.0
0c:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection 
(rev 03)
Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network Connection
Flags: bus master, fast devsel, latency 0, IRQ 17, NUMA node 0
Memory at dfb0 (32-bit, non-prefetchable) [size=128K]
I/O ports at c000 [size=32]
Memory at dfb2 (32-bit, non-prefetchable) [size=16K]
Capabilities: 
Kernel driver in use: igb
Kernel modules: igb

State before suspend:
$ ip addr show dev enp12s0
4: enp12s0:  mtu 1500 qdisc mq state UP group 
default qlen 1000
link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:ff
inet 192.168.1.18/24 brd 192.168.1.255 scope global enp12s0
   valid_lft forever preferred_lft forever

State after resume:

$ ip addr show dev enp12s0
4: enp12s0:  mtu 1500 qdisc mq state UP group 
default qlen 1000
link/ether 1c:1b:0d:0a:4b:0e brd ff:ff:ff:ff:ff:f

Doing ifdown/ifup which restarts the DHCP client does restore the address.

Not sure if this is a kernel issue with carrier handling, Intel driver issue, 
or DHCP client issue.


Re: [PATCHv2 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Joe Stringer
On Fri, 30 Nov 2018 at 14:42, Alexei Starovoitov
 wrote:
>
> On Thu, Nov 29, 2018 at 04:29:33PM -0800, Joe Stringer wrote:
> > David Ahern and Nicolas Dichtel report that the handling of the netns id
> > 0 is incorrect for the BPF socket lookup helpers: rather than finding
> > the netns with id 0, it is resolving to the current netns. This renders
> > the netns_id 0 inaccessible.
> >
> > To fix this, adjust the API for the netns to treat all negative s32
> > values as a lookup in the current netns, while any values with a
> > positive value in the signed 32-bit integer space would result in a
> > lookup for a socket in the netns corresponding to that id. As before, if
> > the netns with that ID does not exist, no socket will be found.
> > Furthermore, if any bits are set in the upper 32-bits, then no socket
> > will be found.
> >
> > Signed-off-by: Joe Stringer 
> ..
> > +/* Current network namespace */
> > +#define BPF_CURRENT_NETNS(-1L)
>
> I was about to apply it, but then noticed that the name doesn't match
> the rest of the names.
> Could you rename it to BPF_F_CURRENT_NETNS ?

I skipped the F_ part since it's not really a flag, it's a value. I
can put it back though.

> Also reword the commit log so it's less misleading.

Can do.

Cheers,
Joe


Re: [PATCH net] bpf: uninitialized variables in test code

2018-11-30 Thread Alexei Starovoitov
On Thu, Nov 29, 2018 at 01:27:03PM +0300, Dan Carpenter wrote:
> Smatch complains that if bpf_test_run() fails with -ENOMEM at the
> begining then the "duration" is uninitialized.  We then copy the
> unintialized variables to the user inside the bpf_test_finish()
> function.  The functions require CAP_SYS_ADMIN so it's not really an
> information leak.
> 
> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> Signed-off-by: Dan Carpenter 

That is incorrect fixes tag.
It should be pointing to commit f42ee093be29 ("bpf/test_run: support cgroup 
local storage")

bpf_test_run() can only return the value that bpf program returned.
It cannot return -ENOMEM.
That code needs to be refactored.
I think the proper way for bpf_test_run() would be to return 0 or -ENOMEM
and store bpf's retval into extra pointer.
Proper checks need to be added in the callers (bpf_prog_test_run_skb, etc).

Dan, can you do such refactoring or you want to punt back to Roman ?



Re: [PATCH net] bpf: uninitialized variables in test code

2018-11-30 Thread Song Liu
On Thu, Nov 29, 2018 at 2:28 AM Dan Carpenter  wrote:
>
> Smatch complains that if bpf_test_run() fails with -ENOMEM at the
> begining then the "duration" is uninitialized.  We then copy the
> unintialized variables to the user inside the bpf_test_finish()
> function.  The functions require CAP_SYS_ADMIN so it's not really an
> information leak.
>
> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> Signed-off-by: Dan Carpenter 
Acked-by: Song Liu 

> ---
>  net/bpf/test_run.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index c89c22c49015..49304192a031 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -114,7 +114,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> bool is_l2 = false, is_direct_pkt_access = false;
> u32 size = kattr->test.data_size_in;
> u32 repeat = kattr->test.repeat;
> -   u32 retval, duration;
> +   u32 retval, duration = 0;
> int hh_len = ETH_HLEN;
> struct sk_buff *skb;
> struct sock *sk;
> @@ -196,7 +196,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const 
> union bpf_attr *kattr,
> u32 repeat = kattr->test.repeat;
> struct netdev_rx_queue *rxqueue;
> struct xdp_buff xdp = {};
> -   u32 retval, duration;
> +   u32 retval, duration = 0;
> void *data;
> int ret;
>
> --
> 2.11.0
>


Re: [PATCH bpf-next] bpf: allow BPF read access to qdisc pkt_len

2018-11-30 Thread Song Liu
On Fri, Nov 30, 2018 at 12:09 PM Willem de Bruijn
 wrote:
>
> From: Petar Penkov 
>
> The pkt_len field in qdisc_skb_cb stores the skb length as it will
> appear on the wire after segmentation. For byte accounting, this value
> is more accurate than skb->len. It is computed on entry to the TC
> layer, so only valid there.
>
> Allow read access to this field from BPF tc classifier and action
> programs. The implementation is analogous to tc_classid, aside from
> restricting to read access.
>
> Signed-off-by: Petar Penkov 
> Signed-off-by: Vlad Dumitrescu 
> Signed-off-by: Willem de Bruijn 
> ---
>  include/uapi/linux/bpf.h|  1 +
>  net/core/filter.c   | 16 +++
>  tools/include/uapi/linux/bpf.h  |  1 +
>  tools/testing/selftests/bpf/test_verifier.c | 32 +
>  4 files changed, 50 insertions(+)

Please split this into 3 patches:
1 for include/uapi/linux/bpf.h and filter.c
1 for tools/include/uapi/linux/bpf.h
1 for tools/testing/selftests/bpf/test_verifier.c

Other than this
Acked-by: Song Liu 


>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 597afdbc1ab9..ce028482d423 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2483,6 +2483,7 @@ struct __sk_buff {
> __u32 data_meta;
> struct bpf_flow_keys *flow_keys;
> __u64 tstamp;
> +   __u32 pkt_len;
>  };
>
>  struct bpf_tunnel_key {
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bd0df75dc7b6..af071ef22c0a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5773,6 +5773,7 @@ static bool sk_filter_is_valid_access(int off, int size,
> case bpf_ctx_range(struct __sk_buff, flow_keys):
> case bpf_ctx_range_till(struct __sk_buff, family, local_port):
> case bpf_ctx_range(struct __sk_buff, tstamp):
> +   case bpf_ctx_range(struct __sk_buff, pkt_len):
> return false;
> }
>
> @@ -5797,6 +5798,7 @@ static bool cg_skb_is_valid_access(int off, int size,
> case bpf_ctx_range(struct __sk_buff, tc_classid):
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range(struct __sk_buff, flow_keys):
> +   case bpf_ctx_range(struct __sk_buff, pkt_len):
> return false;
> case bpf_ctx_range(struct __sk_buff, data):
> case bpf_ctx_range(struct __sk_buff, data_end):
> @@ -5843,6 +5845,7 @@ static bool lwt_is_valid_access(int off, int size,
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range(struct __sk_buff, flow_keys):
> case bpf_ctx_range(struct __sk_buff, tstamp):
> +   case bpf_ctx_range(struct __sk_buff, pkt_len):
> return false;
> }
>
> @@ -6273,6 +6276,7 @@ static bool sk_skb_is_valid_access(int off, int size,
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range(struct __sk_buff, flow_keys):
> case bpf_ctx_range(struct __sk_buff, tstamp):
> +   case bpf_ctx_range(struct __sk_buff, pkt_len):
> return false;
> }
>
> @@ -6360,6 +6364,7 @@ static bool flow_dissector_is_valid_access(int off, int 
> size,
> case bpf_ctx_range(struct __sk_buff, data_meta):
> case bpf_ctx_range_till(struct __sk_buff, family, local_port):
> case bpf_ctx_range(struct __sk_buff, tstamp):
> +   case bpf_ctx_range(struct __sk_buff, pkt_len):
> return false;
> }
>
> @@ -6685,6 +6690,17 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type 
> type,
>   bpf_target_off(struct sk_buff,
>  tstamp, 8,
>  target_size));
> +   break;
> +
> +   case offsetof(struct __sk_buff, pkt_len):
> +   BUILD_BUG_ON(FIELD_SIZEOF(struct qdisc_skb_cb, pkt_len) != 4);
> +
> +   off = si->off;
> +   off -= offsetof(struct __sk_buff, pkt_len);
> +   off += offsetof(struct sk_buff, cb);
> +   off += offsetof(struct qdisc_skb_cb, pkt_len);
> +   *target_size = 4;
> +   *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, off);
> }
>
> return insn - insn_buf;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 597afdbc1ab9..ce028482d423 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2483,6 +2483,7 @@ struct __sk_buff {
> __u32 data_meta;
> struct bpf_flow_keys *flow_keys;
> __u64 tstamp;
> +   __u32 pkt_len;
>  };
>
>  struct bpf_tunnel_key {
> diff --git a/tools/testing/selftests/bpf/test_verifier.c 
> b/tools/testing/selftests/bpf/test_verifier.c
> index 17021d2b6bfe..0ab37fb4afad 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/to

Re: [PATCH bpf] bpf: Fix verifier log string check for bad alignment.

2018-11-30 Thread Alexei Starovoitov
On Wed, Nov 28, 2018 at 10:33:53PM -0800, David Miller wrote:
> 
> The message got changed a lot time ago.
> 
> This was responsible for 36 test case failures on sparc64.
> 
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Signed-off-by: David S. Miller 

Applied to bpf tree. Thanks!



Re: [PATCH 7/8] socket: Add SO_TIMESTAMP[NS]_NEW

2018-11-30 Thread Deepa Dinamani
On Sun, Nov 25, 2018 at 6:33 AM Willem de Bruijn
 wrote:
>
> On Sun, Nov 25, 2018 at 12:28 AM Deepa Dinamani  
> wrote:
> >
> > > > > +   if (type == SO_TIMESTAMP_NEW || type == SO_TIMESTAMPNS_NEW)
> > > > > +   sock_set_flag(sk, SOCK_TSTAMP_NEW);
> > > > > +   else
> > > > > +   sock_reset_flag(sk, SOCK_TSTAMP_NEW);
> > > > > +
> > > >
> > > > if adding a boolean whether the socket uses new or old-style
> > > > timestamps, perhaps fail hard if a process tries to set a new-style
> > > > option while an old-style is already set and vice versa. Also include
> > > > SO_TIMESTAMPING_NEW as it toggles the same option.
> >
> > I do not think this is a problem.
> > Consider this example, if there is a user application with updated
> > socket timestamps is linking into a library that is yet to be updated.
>
> Also consider applications that do not use libraries.

Arnd and I talked about this.
We thought that the new options should behave like the already
existing options. The patch already does this.
Eg: Today if we set SO_TIMESTAMP and then try to switch to
SO_TIMESTAMPNS then there is no fail.

Do you still want a hard fail?

-Deepa


Re: [PATCHv2 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Alexei Starovoitov
On Thu, Nov 29, 2018 at 04:29:33PM -0800, Joe Stringer wrote:
> David Ahern and Nicolas Dichtel report that the handling of the netns id
> 0 is incorrect for the BPF socket lookup helpers: rather than finding
> the netns with id 0, it is resolving to the current netns. This renders
> the netns_id 0 inaccessible.
> 
> To fix this, adjust the API for the netns to treat all negative s32
> values as a lookup in the current netns, while any values with a
> positive value in the signed 32-bit integer space would result in a
> lookup for a socket in the netns corresponding to that id. As before, if
> the netns with that ID does not exist, no socket will be found.
> Furthermore, if any bits are set in the upper 32-bits, then no socket
> will be found.
> 
> Signed-off-by: Joe Stringer 
..
> +/* Current network namespace */
> +#define BPF_CURRENT_NETNS(-1L)

I was about to apply it, but then noticed that the name doesn't match
the rest of the names.
Could you rename it to BPF_F_CURRENT_NETNS ?

Also reword the commit log so it's less misleading.

thx!



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 02:30:45PM -0800, James Bottomley wrote:
> On Fri, 2018-11-30 at 14:26 -0800, Jarkko Sakkinen wrote:
> > On Fri, Nov 30, 2018 at 03:14:59PM -0700, Jonathan Corbet wrote:
> [...]
> > > Have you read Documentation/process/code-of-conduct-
> > > interpretation.rst? 
> > > As has been pointed out, it contains a clear answer to how things
> > > should be interpreted here.
> > 
> > Ugh, was not aware that there two documents.
> > 
> > Yeah, definitely sheds light. Why the documents could not be merged
> > to single common sense code of conduct?
> 
> The fact that we've arrived at essentially an original CoC
> reinterpreted to the point where it's effectively a new CoC has been
> the source of much debate and recrimination over the last few months
> ... you can read it in the ksummit-discuss archives, but I really think
> we don't want to reopen that can of worms.

Got you... Well I now read the 2nd amendment now through, and yeah, kind
of way I work/function anyway.

Thank you for the patience...

/Jarkko


Re: [PATCH 2/8] sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD

2018-11-30 Thread Deepa Dinamani
On Sat, Nov 24, 2018 at 7:59 PM Willem de Bruijn
 wrote:
>
> On Sat, Nov 24, 2018 at 3:58 AM Deepa Dinamani  wrote:
> >
> > SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
> > way they are currently defined, are not y2038 safe.
> > Subsequent patches in the series add new y2038 safe versions
> > of these options which provide 64 bit timestamps on all
> > architectures uniformly.
> > Hence, rename existing options with OLD tag suffixes.
>
> Why do the existing interfaces have to be renamed when new interfaces are 
> added?

Existing options need to be renamed because of the macro below:

#define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? \
SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)

SO_TIMESTAMP is now dependent on size of time_t because of the libc flag.

-Deepa


Re: [PATCH bpf-next 1/4] bpf: Add BPF_F_ANY_ALIGNMENT.

2018-11-30 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 30 Nov 2018 13:58:20 -0800

> On Thu, Nov 29, 2018 at 07:32:41PM -0800, David Miller wrote:
>> +/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
>> + * verifier will allow any alignment whatsoever.  This bypasses
>> + * what CONFIG_EFFICIENT_UNALIGNED_ACCESS would cause it to do.
> 
> I think majority of user space folks who read uapi/bpf.h have no idea
> what that kernel config does.
> Could you reword the comment here to say that this flag is only
> effective on architectures and like sparc and mips that don't
> have efficient unaligned access and ignored on x86/arm64 ?

Ok.

>> +if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
>> +(attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
>> +!capable(CAP_SYS_ADMIN))
>> +return -EPERM;
> 
> I guess we don't want to add:
> if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> (attr->prog_flags & BPF_F_ANY_ALIGNMENT))
> return -EINVAL;
> 
> so that test_verifier.c can just unconditionally pass this flag
> on all archs ?

Right.

>> @@ -247,7 +249,11 @@ int bpf_verify_program(enum bpf_prog_type type, const 
>> struct bpf_insn *insns,
>>  attr.log_level = log_level;
>>  log_buf[0] = 0;
>>  attr.kern_version = kern_version;
>> -attr.prog_flags = strict_alignment ? BPF_F_STRICT_ALIGNMENT : 0;
>> +if (strict_alignment)
>> +prog_flags |= BPF_F_STRICT_ALIGNMENT;
>> +if (any_alignment)
>> +prog_flags |= BPF_F_ANY_ALIGNMENT;
>> +attr.prog_flags = prog_flags;
> 
> instead of adding another argument may be replace 'int strict_alignment'
> with '__u32 prog_flags' ?
> and future flags won't need tweaks to bpf_verify_program() api.

Yeah the number of arguments are getting out of control, I'll do that.


INFO: task hung in vhost_net_stop_vq

2018-11-30 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:ef78e5ec9214 ia64: export node_distance function
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175c241540
kernel config:  https://syzkaller.appspot.com/x/.config?x=c94f9f0c0363db4b
dashboard link: https://syzkaller.appspot.com/bug?extid=d21e6e297322a900c128
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=141db34d40
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=108ef29340

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d21e6e297322a900c...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
INFO: task syz-executor251:6230 blocked for more than 140 seconds.
  Not tainted 4.20.0-rc4+ #132
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor251 D21080  6230   6229 0x8002
Call Trace:
 context_switch kernel/sched/core.c:2831 [inline]
 __schedule+0x8cf/0x21d0 kernel/sched/core.c:3472
 schedule+0xfe/0x460 kernel/sched/core.c:3516
 schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3574
 __mutex_lock_common kernel/locking/mutex.c:1002 [inline]
 __mutex_lock+0xaff/0x16f0 kernel/locking/mutex.c:1072
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
 vhost_net_stop_vq+0x2d/0x120 drivers/vhost/net.c:1306
 vhost_net_stop drivers/vhost/net.c:1320 [inline]
 vhost_net_release+0x5b/0x1d0 drivers/vhost/net.c:1352
 __fput+0x385/0xa30 fs/file_table.c:278
 fput+0x15/0x20 fs/file_table.c:309
 task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0xf86/0x26d0 kernel/exit.c:867
 do_group_exit+0x177/0x440 kernel/exit.c:970
 __do_sys_exit_group kernel/exit.c:981 [inline]
 __se_sys_exit_group kernel/exit.c:979 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445338
Code: 72 6e 20 69 73 20 25 64 0a 00 72 65 67 65 78 3a 20 65 6e 64 20 73 65  
61 72 63 68 2c 20 66 6f 75 6e 64 20 25 64 0a 00 23 25 33 <2e> 33 64 00 5f  
00 5f 2e 00 31 30 30 00 31 30 31 00 31 30 32 00 31

RSP: 002b:7ffd228e1cb8 EFLAGS: 0246 ORIG_RAX: 00e7
RAX: ffda RBX: 0001 RCX: 00445338
RDX: 0001 RSI: 003c RDI: 0001
RBP: 004cd650 R08: 00e7 R09: ffd0
R10: 7ffd228e1d00 R11: 0246 R12: 0001
R13: 006e1320 R14: 000a R15: 

Showing all locks held in the system:
1 lock held by khungtaskd/1019:
 #0: da98dfa9 (rcu_read_lock){}, at:  
debug_show_all_locks+0xd0/0x424 kernel/locking/lockdep.c:4379

1 lock held by rsyslogd/6077:
 #0: a7881970 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1bb/0x200  
fs/file.c:766

2 locks held by getty/6199:
 #0: 51a5383e (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: e292e313 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6200:
 #0: 0155ff5f (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 1a79dc08 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6201:
 #0: 50da3a04 (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 0068a427 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6202:
 #0: 4bd50c6d (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 7dde2138 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6203:
 #0: 54dc4f51 (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 87e81f0c (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6204:
 #0: 4f97fe78 (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 9af2a5dd (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

2 locks held by getty/6205:
 #0: cb15d6f6 (&tty->ldisc_sem){}, at:  
ldsem_down_read+0x32/0x40 drivers/tty/tty_ldsem.c:353
 #1: 392a1db6 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x335/0x1e80 drivers/tty/n_tty.c:2154

1 lock held by syz-executor251/6230:
 #0: 9c6ae67f (&vq->mutex){+.+.

Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread James Bottomley
On Fri, 2018-11-30 at 14:26 -0800, Jarkko Sakkinen wrote:
> On Fri, Nov 30, 2018 at 03:14:59PM -0700, Jonathan Corbet wrote:
[...]
> > Have you read Documentation/process/code-of-conduct-
> > interpretation.rst? 
> > As has been pointed out, it contains a clear answer to how things
> > should be interpreted here.
> 
> Ugh, was not aware that there two documents.
> 
> Yeah, definitely sheds light. Why the documents could not be merged
> to single common sense code of conduct?

The fact that we've arrived at essentially an original CoC
reinterpreted to the point where it's effectively a new CoC has been
the source of much debate and recrimination over the last few months
... you can read it in the ksummit-discuss archives, but I really think
we don't want to reopen that can of worms.

James




Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 02:26:05PM -0800, Jarkko Sakkinen wrote:
> On Fri, Nov 30, 2018 at 03:14:59PM -0700, Jonathan Corbet wrote:
> > On Fri, 30 Nov 2018 14:12:19 -0800
> > Jarkko Sakkinen  wrote:
> > 
> > > As a maintainer myself (and based on somewhat disturbed feedback from
> > > other maintainers) I can only make the conclusion that nobody knows what
> > > the responsibility part here means.
> > > 
> > > I would interpret, if I read it like at lawyer at least, that even for
> > > existing code you would need to do the changes postmorterm.
> > > 
> > > Is this wrong interpretation?  Should I conclude that I made a mistake
> > > by reading the CoC and trying to understand what it *actually* says?
> > > After this discussion, I can say that I understand it less than before.
> > 
> > Have you read Documentation/process/code-of-conduct-interpretation.rst?
> > As has been pointed out, it contains a clear answer to how things should
> > be interpreted here.
> 
> Ugh, was not aware that there two documents.
> 
> Yeah, definitely sheds light. Why the documents could not be merged to
> single common sense code of conduct?

I.e. if the latter that you pointed out tells you what you actually
should do what value does the former bring?

Just looked up archives and realized that there has been whole alot
of CoC related discussions. No wonder this is seen as waste of time.

/Jarkko


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread James Bottomley
On Fri, 2018-11-30 at 14:12 -0800, Jarkko Sakkinen wrote:
[...]
> I pasted this already to another response and this was probably the
> part that ignited me to send the patch set (was a few days ago, so
> had to revisit to find the exact paragraph):

I replied in to the other thread.

> "Maintainers have the right and responsibility to remove, edit, or
> reject comments, commits, code, wiki edits, issues, and other
> contributions that are not aligned to this Code of Conduct, or to ban
> temporarily or permanently any contributor for other behaviors that
> they deem inappropriate, threatening, offensive, or harmful."
> 
> The whole patch set is neither a joke/troll nor something I would
> necessarily want to be include myself. It does have the RFC tag.
> 
> As a maintainer myself (and based on somewhat disturbed feedback from
> other maintainers) I can only make the conclusion that nobody knows
> what the responsibility part here means.
> 
> I would interpret, if I read it like at lawyer at least, that even
> for existing code you would need to do the changes postmorterm.

That's wrong in the light of the interpretation document, yes.

> Is this wrong interpretation?  Should I conclude that I made a
> mistake by reading the CoC and trying to understand what it
> *actually* says?

You can't read it in isolation, you need to read it along with the
interpretation document.  The latter was created precisely because
there was a lot of push back on interpretation problems and ambiguities
with the original CoC and it specifically covers this case (and a lot
of others).

James


> After this discussion, I can say that I understand it less than
> before.
> 
> /Jarkko
> 



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 03:14:59PM -0700, Jonathan Corbet wrote:
> On Fri, 30 Nov 2018 14:12:19 -0800
> Jarkko Sakkinen  wrote:
> 
> > As a maintainer myself (and based on somewhat disturbed feedback from
> > other maintainers) I can only make the conclusion that nobody knows what
> > the responsibility part here means.
> > 
> > I would interpret, if I read it like at lawyer at least, that even for
> > existing code you would need to do the changes postmorterm.
> > 
> > Is this wrong interpretation?  Should I conclude that I made a mistake
> > by reading the CoC and trying to understand what it *actually* says?
> > After this discussion, I can say that I understand it less than before.
> 
> Have you read Documentation/process/code-of-conduct-interpretation.rst?
> As has been pointed out, it contains a clear answer to how things should
> be interpreted here.

Ugh, was not aware that there two documents.

Yeah, definitely sheds light. Why the documents could not be merged to
single common sense code of conduct?

/Jarkko


Re: [PATCH 2/2] net: phy: ensure autoneg is configured when resuming a phydev

2018-11-30 Thread Heiner Kallweit
On 30.11.2018 19:45, Anssi Hannula wrote:
> When a PHY_HALTED phydev is resumed by phy_start(), it is set to
> PHY_RESUMING to wait for the link to come up.
> 
> However, if the phydev was put to PHY_HALTED (by e.g. phy_stop()) before
> autonegotiation was ever started by phy_state_machine(), autonegotiation
> remains unconfigured, i.e. phy_config_aneg() is never called.
> 
phy_stop() should only be called once the PHY was started. But it's
right that we don't have full control over it because misbehaving
drivers may call phy_stop() even if the PHY hasn't been started yet.

I think phy_error() and phy_stop() should be extended to set the state
to PHY_HALTED only if the PHY is in a started state, means:
don't touch the state if it's DOWN, READY, or HALTED.
In case of phy_error() it's more a precaution, because it's used within
phylib only and AFAICS it can be triggered from a started state only.

So yes, there is a theoretical issue. But as you wrote already, I think
there's a better way to deal with it.

For checking whether PHY is in a started state you could use the helper
which is being discussed here:
https://marc.info/?l=linux-netdev&m=154360368104946&w=2


> Fix that by going to PHY_UP instead of PHY_RESUMING if autonegotiation
> has never been configured.
> 
> Signed-off-by: Anssi Hannula 
> ---
> 
> This doesn't feel as clean is I'd like to, though. Maybe there is a
> better way?
> 
> 
>  drivers/net/phy/phy.c | 11 ++-
>  include/linux/phy.h   |  2 ++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index d46858694db9..7a650cce7599 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -553,6 +553,8 @@ int phy_start_aneg(struct phy_device *phydev)
>   if (err < 0)
>   goto out_unlock;
>  
> + phydev->autoneg_configured = 1;
> +
>   if (phydev->state != PHY_HALTED) {
>   if (AUTONEG_ENABLE == phydev->autoneg) {
>   err = phy_check_link_status(phydev);
> @@ -893,7 +895,14 @@ void phy_start(struct phy_device *phydev)
>   break;
>   }
>  
> - phydev->state = PHY_RESUMING;
> + /* if autoneg/forcing was never configured, go back to PHY_UP
> +  * to make that happen
> +  */
> + if (!phydev->autoneg_configured)
> + phydev->state = PHY_UP;
> + else
> + phydev->state = PHY_RESUMING;
> +
>   break;
>   default:
>   break;
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 8f927246acdb..b306d5ed9d09 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -389,6 +389,8 @@ struct phy_device {
>   unsigned loopback_enabled:1;
>  
>   unsigned autoneg:1;
> + /* autoneg has been configured at least once */
> + unsigned autoneg_configured:1;
>   /* The most recently read link state */
>   unsigned link:1;
>  
> 



Re: [PATCH 3/8] socket: Disentangle SOCK_RCVTSTAMPNS from SOCK_RCVTSTAMP

2018-11-30 Thread Deepa Dinamani
On Sun, Nov 25, 2018 at 10:19 AM David Miller  wrote:
>
> From: Willem de Bruijn 
> Date: Sun, 25 Nov 2018 09:18:55 -0500
>
> > The existing logic is as is for a reason. There is no need to change
> > it to satisfy the main purpose of your patchset?
> >
> > It is structured as one bit to test whether a timestamp is requested
> > and another to select among two variants usec/nsec. Just add another
> > layer of branching between new/old in cases where this distinction is
> > needed.
> >
> > Please avoid code churn unless needed.
>
> +1

This patch makes it easier to add logic for 2 new socket time options.
But, if you prefer for all of the options to depend on SOCK_RCVTSTAMP
then I will drop it.

-Deepa


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jonathan Corbet
On Fri, 30 Nov 2018 14:12:19 -0800
Jarkko Sakkinen  wrote:

> As a maintainer myself (and based on somewhat disturbed feedback from
> other maintainers) I can only make the conclusion that nobody knows what
> the responsibility part here means.
> 
> I would interpret, if I read it like at lawyer at least, that even for
> existing code you would need to do the changes postmorterm.
> 
> Is this wrong interpretation?  Should I conclude that I made a mistake
> by reading the CoC and trying to understand what it *actually* says?
> After this discussion, I can say that I understand it less than before.

Have you read Documentation/process/code-of-conduct-interpretation.rst?
As has been pointed out, it contains a clear answer to how things should
be interpreted here.

Thanks,

jon


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 01:57:49PM -0800, James Bottomley wrote:
> On Fri, 2018-11-30 at 13:44 -0800, Jarkko Sakkinen wrote:
> > On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
> > > No because use of what some people consider to be bad language
> > > isn't necessarily abusive, offensive or degrading.  Our most
> > > heavily censored medium is TV and "fuck" is now considered
> > > acceptable in certain contexts on most channels in the UK and EU.
> > 
> > This makes following the CoC extremely hard to a non-native speaker
> > as it is not too explicit on what is OK and what is not. I did
> > through the whole thing with an eye glass and this what I deduced
> > from it.
> 
> OK, so something that would simply be considered in some quarters as
> bad language isn't explicitly banned.  The thing which differentiates
> simple bad language from "abusive, offensive or degrading language",
> which is called out by the CoC, is the context and the target.
> 
> So when it's a simple expletive or the code of the author or even the
> hardware is the target, I'd say it's an easy determination it's not a
> CoC violation.  If someone else's code is the target or the inventor of
> the hardware is targetted by name, I'd say it is.  Even non-native
> English speakers should be able to determine target and context,
> because that's the essence of meaning.

I pasted this already to another response and this was probably the part
that ignited me to send the patch set (was a few days ago, so had to
revisit to find the exact paragraph):

"Maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other
contributions that are not aligned to this Code of Conduct, or to ban
temporarily or permanently any contributor for other behaviors that they
deem inappropriate, threatening, offensive, or harmful."

The whole patch set is neither a joke/troll nor something I would
necessarily want to be include myself. It does have the RFC tag.

As a maintainer myself (and based on somewhat disturbed feedback from
other maintainers) I can only make the conclusion that nobody knows what
the responsibility part here means.

I would interpret, if I read it like at lawyer at least, that even for
existing code you would need to do the changes postmorterm.

Is this wrong interpretation?  Should I conclude that I made a mistake
by reading the CoC and trying to understand what it *actually* says?
After this discussion, I can say that I understand it less than before.

/Jarkko


Re: [PATCH bpf-next 0/5] tools: bpftool: fixes and small improvements

2018-11-30 Thread Alexei Starovoitov
On Fri, Nov 30, 2018 at 04:25:43PM +, Quentin Monnet wrote:
> Hi,
> Several items for bpftool are included in this set: the first three patches
> are fixes for bpftool itself and bash completion, while the last two
> slightly improve the information obtained when dumping programs or maps, on
> Daniel's suggestion. Please refer to individual commit logs for more
> details.

Applied, Thanks



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread James Bottomley
On Fri, 2018-11-30 at 13:54 -0800, Jarkko Sakkinen wrote:
> On Fri, Nov 30, 2018 at 01:48:08PM -0800, David Miller wrote:
> > From: Jarkko Sakkinen 
> > Date: Fri, 30 Nov 2018 13:44:05 -0800
> > 
> > > On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
> > > > No because use of what some people consider to be bad language
> > > > isn't
> > > > necessarily abusive, offensive or degrading.  Our most heavily
> > > > censored
> > > > medium is TV and "fuck" is now considered acceptable in certain
> > > > contexts on most channels in the UK and EU.
> > > 
> > > This makes following the CoC extremely hard to a non-native
> > > speaker as
> > > it is not too explicit on what is OK and what is not. I did
> > > through the
> > > whole thing with an eye glass and this what I deduced from it.
> > 
> > It would be helpful if you could explain what part of the language
> > is unclear wrt. explaining how CoC does not apply to existing code.
> > 
> > That part seems very explicit to me.
> 
> Well, now that I re-read it, it was this part to be exact:
> 
> "Maintainers have the right and responsibility to remove, edit, or
> reject comments, commits, code, wiki edits, issues, and other
> contributions that are not aligned to this Code of Conduct, or to ban
> temporarily or permanently any contributor for other behaviors that
> they deem inappropriate, threatening, offensive, or harmful."
> 
> How this should be interpreted?

Firstly, this is *only* about contributions going forward.  The
interpretation document says we don't have to look back into the
repository and we definitely can't remove something from git that's
already been committed.

As a Maintainer, if you feel bad language is inappropriate for your
subsystem then you say so and reject with that reason patches that come
in containing it (suggesting alternative rewordings).  However, your
determination about what is or isn't acceptable in your subsystem isn't
binding on other maintainers, who may have different standards ... this
is identical to what we do with coding, like your insistence on one
line per variable or other subsystem's insistence on reverse christmas
tree for includes ...

James


> I have not really followed the previous CoC discussions as I try to
> always use polite language so I'm sorry if this duplicate.
> 
> /Jarkko
> 



Re: [PATCH bpf-next 1/4] bpf: Add BPF_F_ANY_ALIGNMENT.

2018-11-30 Thread Alexei Starovoitov
On Thu, Nov 29, 2018 at 07:32:41PM -0800, David Miller wrote:
> 
> Often we want to write tests cases that check things like bad context
> offset accesses.  And one way to do this is to use an odd offset on,
> for example, a 32-bit load.
> 
> This unfortunately triggers the alignment checks first on platforms
> that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS.  So the test
> case see the alignment failure rather than what it was testing for.
> 
> It is often not completely possible to respect the original intention
> of the test, or even test the same exact thing, while solving the
> alignment issue.
> 
> Another option could have been to check the alignment after the
> context and other validations are performed by the verifier, but
> that is a non-trivial change to the verifier.
> 
> Signed-off-by: David S. Miller 
> ---
>  include/uapi/linux/bpf.h| 10 ++
>  kernel/bpf/syscall.c|  7 ++-
>  kernel/bpf/verifier.c   |  2 ++
>  tools/include/uapi/linux/bpf.h  | 10 ++
>  tools/lib/bpf/bpf.c | 12 +---
>  tools/lib/bpf/bpf.h |  6 +++---
>  tools/testing/selftests/bpf/test_align.c|  2 +-
>  tools/testing/selftests/bpf/test_verifier.c |  1 +
>  8 files changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 426b5c8..c9647ea 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -232,6 +232,16 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_STRICT_ALIGNMENT   (1U << 0)
>  
> +/* If BPF_F_ANY_ALIGNMENT is used in BPF_PROF_LOAD command, the
> + * verifier will allow any alignment whatsoever.  This bypasses
> + * what CONFIG_EFFICIENT_UNALIGNED_ACCESS would cause it to do.

I think majority of user space folks who read uapi/bpf.h have no idea
what that kernel config does.
Could you reword the comment here to say that this flag is only
effective on architectures and like sparc and mips that don't
have efficient unaligned access and ignored on x86/arm64 ?

> + * It is mostly used for testing when we want to validate the
> + * context and memory access aspects of the validator, but because
> + * of an unaligned access the alignment check would trigger before
> + * the one we are interested in.
> + */
> +#define BPF_F_ANY_ALIGNMENT  (1U << 1)
> +
>  /* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
>  #define BPF_PSEUDO_MAP_FD1
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cf5040f..cae65bb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1450,9 +1450,14 @@ static int bpf_prog_load(union bpf_attr *attr)
>   if (CHECK_ATTR(BPF_PROG_LOAD))
>   return -EINVAL;
>  
> - if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
> + if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT))
>   return -EINVAL;
>  
> + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
> + (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
> + !capable(CAP_SYS_ADMIN))
> + return -EPERM;

I guess we don't want to add:
if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
(attr->prog_flags & BPF_F_ANY_ALIGNMENT))
return -EINVAL;

so that test_verifier.c can just unconditionally pass this flag
on all archs ?
Kinda weird that this knob doesn't do anything on x86.
But I guess it's fine.

> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 03f9bcc..d4f76d2 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -232,9 +232,11 @@ int bpf_load_program(enum bpf_prog_type type, const 
> struct bpf_insn *insns,
>  
>  int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
>  size_t insns_cnt, int strict_alignment,
> -const char *license, __u32 kern_version,
> -char *log_buf, size_t log_buf_sz, int log_level)
> +int any_alignment, const char *license,
> +__u32 kern_version, char *log_buf, size_t log_buf_sz,
> +int log_level)
>  {
> + __u32 prog_flags = 0;
>   union bpf_attr attr;
>  
>   bzero(&attr, sizeof(attr));
> @@ -247,7 +249,11 @@ int bpf_verify_program(enum bpf_prog_type type, const 
> struct bpf_insn *insns,
>   attr.log_level = log_level;
>   log_buf[0] = 0;
>   attr.kern_version = kern_version;
> - attr.prog_flags = strict_alignment ? BPF_F_STRICT_ALIGNMENT : 0;
> + if (strict_alignment)
> + prog_flags |= BPF_F_STRICT_ALIGNMENT;
> + if (any_alignment)
> + prog_flags |= BPF_F_ANY_ALIGNMENT;
> + attr.prog_flags = prog_flags;

instead of adding another argument may be replace 'int strict_alignment'
with '__u32 prog_flags' ?
and future flags won't need tweaks to bpf_verify_program() api.



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread James Bottomley
On Fri, 2018-11-30 at 13:44 -0800, Jarkko Sakkinen wrote:
> On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
> > No because use of what some people consider to be bad language
> > isn't necessarily abusive, offensive or degrading.  Our most
> > heavily censored medium is TV and "fuck" is now considered
> > acceptable in certain contexts on most channels in the UK and EU.
> 
> This makes following the CoC extremely hard to a non-native speaker
> as it is not too explicit on what is OK and what is not. I did
> through the whole thing with an eye glass and this what I deduced
> from it.

OK, so something that would simply be considered in some quarters as
bad language isn't explicitly banned.  The thing which differentiates
simple bad language from "abusive, offensive or degrading language",
which is called out by the CoC, is the context and the target.

So when it's a simple expletive or the code of the author or even the
hardware is the target, I'd say it's an easy determination it's not a
CoC violation.  If someone else's code is the target or the inventor of
the hardware is targetted by name, I'd say it is.  Even non-native
English speakers should be able to determine target and context,
because that's the essence of meaning.

James



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 01:48:08PM -0800, David Miller wrote:
> From: Jarkko Sakkinen 
> Date: Fri, 30 Nov 2018 13:44:05 -0800
> 
> > On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
> >> No because use of what some people consider to be bad language isn't
> >> necessarily abusive, offensive or degrading.  Our most heavily censored
> >> medium is TV and "fuck" is now considered acceptable in certain
> >> contexts on most channels in the UK and EU.
> > 
> > This makes following the CoC extremely hard to a non-native speaker as
> > it is not too explicit on what is OK and what is not. I did through the
> > whole thing with an eye glass and this what I deduced from it.
> 
> It would be helpful if you could explain what part of the language
> is unclear wrt. explaining how CoC does not apply to existing code.
> 
> That part seems very explicit to me.

Well, now that I re-read it, it was this part to be exact:

"Maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other
contributions that are not aligned to this Code of Conduct, or to ban
temporarily or permanently any contributor for other behaviors that they
deem inappropriate, threatening, offensive, or harmful."

How this should be interpreted?

I have not really followed the previous CoC discussions as I try to
always use polite language so I'm sorry if this duplicate.

/Jarkko


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jens Axboe
On 11/30/18 2:47 PM, David Miller wrote:
> From: Jarkko Sakkinen 
> Date: Fri, 30 Nov 2018 13:42:33 -0800
> 
>> Can you tell how the CoC should be interpreted then?
> 
> Regardless of what I think, as others have showen the CoC explicitly
> does not apply to existing code.

And with that, can we please put an end to this thread (and patchset)?

-- 
Jens Axboe



[PATCH v2] dma-debug: Kconfig for PREALLOC_DMA_DEBUG_ENTRIES

2018-11-30 Thread Qian Cai
The amount of DMA mappings from Hisilicon HNS ethernet devices is huge,
so it could trigger "DMA-API: debugging out of memory - disabling".

hnae_get_handle [1]
  hnae_init_queue
hnae_init_ring
  hnae_alloc_buffers [2]
debug_dma_map_page
  dma_entry_alloc

[1] for (i = 0; i < handle->q_num; i++)
[2] for (i = 0; i < ring->desc_num; i++)

Also, "#define HNS_DSAF_MAX_DESC_CNT 1024"

On this Huawei TaiShan 2280 aarch64 server, it has reached the limit
already,

4 (NICs) x 16 (queues) x 1024 (port descption numbers) = 65536

Added a Kconfig entry for PREALLOC_DMA_DEBUG_ENTRIES, so make it easier
for users to deal with special cases like this.

Signed-off-by: Qian Cai 
---

Changes since v1:
* Increased the default value if has HNS_ENET suggested by Robin.

 kernel/dma/debug.c |  9 ++---
 lib/Kconfig.debug  | 10 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 231ca4628062..3752fb23f72f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -41,11 +41,6 @@
 #define HASH_FN_SHIFT   13
 #define HASH_FN_MASK(HASH_SIZE - 1)
 
-/* allow architectures to override this if absolutely required */
-#ifndef PREALLOC_DMA_DEBUG_ENTRIES
-#define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
-#endif
-
 enum {
dma_debug_single,
dma_debug_page,
@@ -132,7 +127,7 @@ static u32 min_free_entries;
 static u32 nr_total_entries;
 
 /* number of preallocated entries requested by kernel cmdline */
-static u32 nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+static u32 nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
 
 /* debugfs dentry's for the stuff above */
 static struct dentry *dma_debug_dent__read_mostly;
@@ -1063,7 +1058,7 @@ static __init int dma_debug_entries_cmdline(char *str)
if (!str)
return -EINVAL;
if (!get_option(&str, &nr_prealloc_entries))
-   nr_prealloc_entries = PREALLOC_DMA_DEBUG_ENTRIES;
+   nr_prealloc_entries = CONFIG_PREALLOC_DMA_DEBUG_ENTRIES;
return 0;
 }
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af29b8224fd..9f85a7a13647 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1659,6 +1659,16 @@ config DMA_API_DEBUG
 
  If unsure, say N.
 
+config PREALLOC_DMA_DEBUG_ENTRIES
+   int "Preallocated DMA-API debugging entries"
+   depends on DMA_API_DEBUG
+   default 131072 if HNS_ENET
+   default 65536
+   help
+ The number of preallocated entries for DMA-API debugging code. One
+ entry is required per DMA-API allocation. Increase this if the DMA-API
+ debugging code disables itself because the default is too low.
+
 config DMA_API_DEBUG_SG
bool "Debug DMA scatter-gather usage"
default y
-- 
2.17.2 (Apple Git-113)



Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread David Miller
From: Jarkko Sakkinen 
Date: Fri, 30 Nov 2018 13:44:05 -0800

> On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
>> No because use of what some people consider to be bad language isn't
>> necessarily abusive, offensive or degrading.  Our most heavily censored
>> medium is TV and "fuck" is now considered acceptable in certain
>> contexts on most channels in the UK and EU.
> 
> This makes following the CoC extremely hard to a non-native speaker as
> it is not too explicit on what is OK and what is not. I did through the
> whole thing with an eye glass and this what I deduced from it.

It would be helpful if you could explain what part of the language
is unclear wrt. explaining how CoC does not apply to existing code.

That part seems very explicit to me.


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread David Miller
From: Jarkko Sakkinen 
Date: Fri, 30 Nov 2018 13:42:33 -0800

> Can you tell how the CoC should be interpreted then?

Regardless of what I think, as others have showen the CoC explicitly
does not apply to existing code.


Re: [PATCH net-next 0/6 v3] qed*: Doorbell overflow recovery

2018-11-30 Thread David Miller
From: Ariel Elior 
Date: Wed, 28 Nov 2018 18:16:01 +0200

> Doorbell Overflow
> If sufficient CPU cores will send doorbells at a sufficiently high rate, they
> can cause an overflow in the doorbell queue block message fifo. When fill 
> level
> reaches maximum, the device stops accepting all doorbells from that PF until a
> recovery procedure has taken place.
> 
> Doorbell Overflow Recovery
> The recovery procedure basically means resending the last doorbell for every
> doorbelling entity. A doorbelling entity is anything which may send doorbells:
> L2 tx ring, rdma sq/rq/cq, light l2, vf l2 tx ring, spq, etc. This relies on
> the design assumption that all doorbells are aggregative, so last doorbell
> carries the information of all previous doorbells.
> 
> APIs
> All doorbelling entities need to register with the mechanism before sending
> doorbells. The registration entails providing the doorbell address the entity
> would be using, and a virtual address where last doorbell data can be found.
> Typically fastpath structures already have this construct.
> 
> Executing the recovery procedure
> Handling the attentions, iterating over all the registered entities and
> resending their doorbells, is all handled within qed core module.
> 
> Relevance
> All doorbelling entities in all protocols need to register with the mechanism,
> via the new APIs. Technically this is quite simple (just call the API). Some
> protocol fastpath implementation may not have the doorbell data stored 
> anywhere
> (compute it from scratch every time) and will have to add such a place.
> This is rare and is also better practice (save some cycles on the fastpath).
> 
> Performance Penalty
> No performance penalty should incur as a result of this feature. If anything
> performance can improve by avoiding recalcualtion of doorbell data everytime
> doorbell is sent (in some flows).
> 
> Add the database used to register doorbelling entities, and APIs for adding
> and deleting entries, and logic for traversing the database and doorbelling
> once on behalf of all entities.
> 
> Please consider applying to net-next.

Series applied, thanks.


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 01:01:02PM -0800, James Bottomley wrote:
> No because use of what some people consider to be bad language isn't
> necessarily abusive, offensive or degrading.  Our most heavily censored
> medium is TV and "fuck" is now considered acceptable in certain
> contexts on most channels in the UK and EU.

This makes following the CoC extremely hard to a non-native speaker as
it is not too explicit on what is OK and what is not. I did through the
whole thing with an eye glass and this what I deduced from it.

/Jarkko


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 12:35:07PM -0800, David Miller wrote:
> From: Jens Axboe 
> Date: Fri, 30 Nov 2018 13:12:26 -0700
> 
> > On 11/30/18 12:56 PM, Davidlohr Bueso wrote:
> >> On Fri, 30 Nov 2018, Kees Cook wrote:
> >> 
> >>> On Fri, Nov 30, 2018 at 11:27 AM Jarkko Sakkinen
> >>>  wrote:
> 
>  In order to comply with the CoC, replace  with a hug.
> >> 
> >> I hope this is some kind of joke. How would anyone get offended by reading
> >> technical comments? This is all beyond me...
> > 
> > Agree, this is insanity.
> 
> And even worse it is censorship.

It is not close to a cencorship, especially when I used RFC tag, which
essentially says that I'm not saying "please take this", right?

Can you tell how the CoC should be interpreted then? I read it through
on my plane trip with an eye glass.

Is cursing OK?

/Jarkko


Re: [PATCH net-next 0/2] rtnetlink: avoid a warning in rtnl_newlink()

2018-11-30 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 27 Nov 2018 22:32:29 -0800

> I've been hoping for some time that someone more competent would fix
> the stack frame size warning in rtnl_newlink(), but looks like I'll
> have to take a stab at it myself :)  That's the only warning I see
> in most of my builds.
> 
> First patch refactors away a somewhat surprising if (1) code block.
> Reindentation will most likely cause cherry-pick problems but OTOH
> rtnl_newlink() doesn't seem to be changed often, so perhaps we can
> risk it in the name of cleaner code?
> 
> Second patch fixes the warning in simplest possible way.  I was
> pondering if there is any more clever solution, but I can't see it..
> rtnl_newlink() is quite long with a lot of possible execution paths
> so doing memory allocations half way through leads to very ugly
> results.

Series applied, thanks for tackling this Jakub.

That whole "if (1) {" was probably just a construct used in order
to create an inner basic block for local variables, nothing more.


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 09:31:13PM +0100, Matthias Brugger wrote:
> Like John I don't think that the word "fuck" is something we have to ban from
> the source code, but I don't care too much. Anyway, please don't change it to
> something like heck as it might be difficult for non-english speaker to 
> understand.

I make context sensitive better thought updates based on the feedback
that Kees gave. I used RFC tag for a reason.

/Jarkko


Re: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Jarkko Sakkinen
On Fri, Nov 30, 2018 at 09:09:48PM +0100, John Paul Adrian Glaubitz wrote:
> Or just leave it as is because we're all grown up and don't freak out
> when a piece of text contains the word "fuck".
> 
> I still don't understand why people think that the word "fuck" is what
> would keep certain groups from contributing to the Linux kernel. In all
> seriousness, it doesn't.

Are you making a claim that your personal experience, and maybe your
mates, is the objective truth, or am I misunderstanding something?

/Jarkko


Re: [next] bonding: pass link-local packets to bonding master also.

2018-11-30 Thread Vincent Bernat
 ❦ 15 juillet 2018 19:12 -0700, Mahesh Bandewar :

> Commit b89f04c61efe ("bonding: deliver link-local packets with
> skb->dev set to link that packets arrived on") changed the behavior
> of how link-local-multicast packets are processed. The change in
> the behavior broke some legacy use cases where these packets are
> expected to arrive on bonding master device also.

Unfortunately, this doesn't completely restore the previous
functionality as PACKET_ORIGDEV is broken for the copy: the original
interface is lost through the call to netif_rx(). A LLDP daemon
listening to the master interface won't get the original interface like
it was able to before 4.12.

I am a bit lost of what the original patch was trying to achieve. I am
using the following test program:

#v+
#!/usr/bin/env python3

import sys
import socket
import datetime

socket.SOL_PACKET = 263
socket.ETH_P_ALL = 3
socket.PACKET_ORIGDEV = 9

interface = sys.argv[1] if len(sys.argv) > 1 else 'lag1'

s = socket.socket(socket.AF_PACKET,
  socket.SOCK_RAW,
  socket.htons(socket.ETH_P_ALL))
s.bind((interface, 0))
s.setsockopt(socket.SOL_PACKET, socket.PACKET_ORIGDEV, 1)
while True:
data, addrinfo = s.recvfrom(1500)
if addrinfo[2] == socket.PACKET_OUTGOING:
continue
print(f"{datetime.datetime.now().isoformat()}: "
  f"Received {len(data)} bytes from {addrinfo}")
#v-

If I run it with a kernel compiled with the commit before b89f04c61efe
(plus a few more cherry-pick to make it work like ea8ffc0818d8 and
72ccc471e13b), I get:

#v+
2018-11-30T22:20:40.193378: Received 221 bytes from ('eth1', 35020, 2, 1, 
b'RT3\x00\x00\x02')
2018-11-30T22:20:40.194504: Received 221 bytes from ('eth0', 35020, 2, 1, 
b'RT3\x00\x00\x01')
#v-

If I send non link-local packets, I get:

#v+
2018-11-30T22:25:57.300965: Received 98 bytes from ('eth0', 2048, 0, 1, 
b'PT3\x00\x00\x02')
#v-

I am also able to correctly receive link-local packets directly on each
interface. So, it seems everything was working as expected before
b89f04c61efe.
-- 
AWAKE! FEAR! FIRE! FOES! AWAKE!
FEAR! FIRE! FOES!
AWAKE! AWAKE!
-- J. R. R. Tolkien


Re: [PATCH net-next 00/11] nfp: update TX path to enable repr offloads

2018-11-30 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 27 Nov 2018 22:24:47 -0800

> This set starts with three micro optimizations to the TX path.
> The improvement is measurable, but below 1% of CPU utilization.
> 
> Patches 4 - 9 add basic TX offloads to representor devices, like
> checksum offload or TSO, and remove the unnecessary TX lock and
> Qdisc (our representors are software constructs on top of the PF).
> 
> The last 2 patches add more info to error messages - id of command
> which failed and exact location of incorrect TLVs, very useful for
> debugging.

Series applied, thanks Jakub.


Re: [PATCH net-next] net: phy: Also request modules for C45 IDs

2018-11-30 Thread Andrew Lunn
> @@ -606,6 +606,18 @@ struct phy_device *phy_device_create(struct mii_bus 
> *bus, int addr, int phy_id,
>* there's no driver _already_ loaded.
>*/
>   request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT, MDIO_ID_ARGS(phy_id));

This line above is for C22. If you look at phy_bus_match(), it will
perform a match on the c45_ids->device_ids[] if it is a c45 PHY, or
the phy_id if it is c22. So i think we should also have this one or
the other here, not both.

Thanks
Andrew

> + if (c45_ids) {
> + const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> + int i;
> +
> + for (i = 1; i < num_ids; i++) {
> + if (!(c45_ids->devices_in_package & (1 << i)))
> + continue;
> +
> + request_module(MDIO_MODULE_PREFIX MDIO_ID_FMT,
> +MDIO_ID_ARGS(c45_ids->device_ids[i]));
> + }
> + }
>  
>   device_initialize(&mdiodev->dev);
>  
> -- 
> 2.7.4
> 
> 


Re: [PATCH net-next] tcp: md5: add tcp_md5_needed jump label

2018-11-30 Thread David Miller
From: Eric Dumazet 
Date: Tue, 27 Nov 2018 15:03:21 -0800

> Most linux hosts never setup TCP MD5 keys. We can avoid a
> cache line miss (accessing tp->md5ig_info) on RX and TX
> using a jump label.
> 
> Signed-off-by: Eric Dumazet 

Also applied, thanks.


Re: [PATCH v3 net-next 0/4] tcp: take a bit more care of backlog stress

2018-11-30 Thread David Miller
From: Eric Dumazet 
Date: Tue, 27 Nov 2018 14:41:59 -0800

> While working on the SACK compression issue Jean-Louis Dupond
> reported, we found that his linux box was suffering very hard
> from tail drops on the socket backlog queue.
> 
> First patch hints the compiler about sack flows being the norm.
> 
> Second patch changes non-sack code in preparation of the ack
> compression.
> 
> Third patch fixes tcp_space() to take backlog into account.
> 
> Fourth patch is attempting coalescing when a new packet must
> be added to the backlog queue. Cooking bigger skbs helps
> to keep backlog list smaller and speeds its handling when
> user thread finally releases the socket lock.
> 
> v3: Neal/Yuchung feedback addressed :
>  Do not aggregate if any skb has URG bit set.
>  Do not aggregate if the skbs have different ECE/CWR bits
> 
> v2: added feedback from Neal : tcp: take care of compressed acks in 
> tcp_add_reno_sack() 
> added : tcp: hint compiler about sack flows
>   added : tcp: make tcp_space() aware of socket backlog

Series applied, thanks Eric.

I'll push this out after the build check finishes.


Re: [PATCH net 2/2] nfp: flower: prevent offload if rhashtable insert fails

2018-11-30 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 27 Nov 2018 14:04:12 -0800

> From: John Hurley 
> 
> For flow offload adds, if the rhash insert code fails, the flow will still
> have been offloaded but the reference to it in the driver freed.
> 
> Re-order the offload setup calls to ensure that a flow will only be written
> to FW if a kernel reference is held and stored in the rhashtable. Remove
> this hashtable entry if the offload fails.
> 
> Fixes: c01d0efa5136 ("nfp: flower: use rhashtable for flow caching")
> Signed-off-by: John Hurley 
> Reviewed-by: Pieter Jansen van Vuuren 
> Reviewed-by: Jakub Kicinski 
> ---
> Merge note: there will be a slight merge conflict with net-next
> here:
>  - the first argument of nfp_flower_xmit_flow() changed from 'netdev'
>to 'app';
>  - we only bump the port offload cnt if (port).
> 
> FWIW the net-next version of the patch can be found at:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux.git
> 
>  17ed95873d51 nfp: flower: prevent offload if rhashtable insert fails
>  d857fc8f472b nfp: flower: release metadata on offload failure
> 
> CC: Stephen Rothwell  # for linux-next

Applied.


Re: [PATCH net 1/2] nfp: flower: release metadata on offload failure

2018-11-30 Thread David Miller
From: Jakub Kicinski 
Date: Tue, 27 Nov 2018 14:04:11 -0800

> From: John Hurley 
> 
> Calling nfp_compile_flow_metadata both assigns a stats context and
> increments a ref counter on (or allocates) a mask id table entry. These
> are released by the nfp_modify_flow_metadata call on flow deletion,
> however, if a flow add fails after metadata is set then the flow entry
> will be deleted but the metadata assignments leaked.
> 
> Add an error path to the flow add offload function to ensure allocated
> metadata is released in the event of an offload fail.
> 
> Fixes: 81f3ddf2547d ("nfp: add control message passing capabilities to flower 
> offloads")
> Signed-off-by: John Hurley 
> Reviewed-by: Pieter Jansen van Vuuren 
> Reviewed-by: Jakub Kicinski 

Applied.


  1   2   3   >