date:20180208

[ANNOUNCE] Linux IPsec workshop 2018

2018-02-08 Thread Steffen Klassert

This is an announcement of the Linux IPsec workshop 2018. The workshop
will take place in Dresden, Germany, from 26th to 28th March 2018.

The workshop is invitation based and limited to ca. 20 - 25 IPsec
developers from user and kernel space. We almost reached the limit,
but still have a few spare places. If you think you can contribute
with a discussion topic or presentation, please send me a mail with
your proposal.

The main focus of this workshop is IPsec, but other network
security/crypto related topics are also welcome. This workshop
should bring together userspace and kernel IPsec developers, as well as
hardware vendors that develop hardware with network security offload
capabilities. The aim is to discuss the ongoing IPsec related development.
It should also be a platform to discuss new ideas and to get feedback
from different points of view.

Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-08 Thread Kalle Valo

Kai Heng Feng  writes:

> Hi Felix,
>
>> On Feb 8, 2018, at 7:02 PM, Felix Fietkau  wrote:
>>
>> On 2018-02-08 06:28, Kai-Heng Feng wrote:
>>> Without btcoex_enable, WiFi activities make both WiFi and Bluetooth
>>> unstable if there's a bluetooth connection.
>>>
>>> Enable this option when bt_ant_diversity is disabled.
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1746164
>>> Signed-off-by: Kai-Heng Feng 
>> I think this might cause regressions on devices that don't have
>> bluetooth. This probably either needs more EEPROM checks, or something
>> to selectively enable it only on affected platforms.
>
> I think it's better not to use dmi_match. This issue should affect
> more ath9k devices. And Bluetooth peripherals are more common than ever
> now, so it would be great to use BT out of the box.

Sure, but we have to make sure that we don't create regressions on
existing systems. For example, did you test this with any system which
doesn't support btcoex? (just asking, haven't tested this myself)

-- 
Kalle Valo

Re: [PATCH iproute2-next v2 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread Serhey Popovych

> To show only relevant diffs between the ip and ipv6 variants, the help
> message print routines need to be unified and improved.
> 
> Get rid of print_usage() and usage() wrappers: use single function to
> output help message. As side effect we return -1 from parse function
> instead of calling exit(2) in case of "... tunnel " is
> found.
> 
> Additionally we get pointer to @struct link_util and can directly access
> ->id information to prepare customized help message.
> 
> Split calls to fprintf() into two groups: one that contains a format string
> with specifiers (thus requiring parameters) and another one that does not.
> This helps the compiler to replace calls to fprintf() with fputs() when
> there are no format specifiers in the string. Do not use fputs() directly
> to keep code formatting nice.
> 
> After this series is applied, the following diffs:
> 
>   # diff -urN ip/link_gre{,6}.c
>   # diff -urN ip/link_vti{,6}.c
>   # diff -urN ip/link_ip{,6}tnl.c
> 
> in scope of the help print routines are reduced to the necessary minimum.
> 
> Tested minimally by compiling and executing "ip link help " and
> "ip link add type help" commands. Looks correct.
> 
> See individual patch description for more information.
> 
> Reviews, comments and suggestions are welcome.
> 
> v2
>   Dropped function prefix changes from the series: this is not a strictly
>   related change and should be done separately.

There is a v3 available; this version has checkpatch and commit
description problems.

> 
> Thanks,
> Serhii
> 
> Serhey Popovych (3):
>   vti/vti6: Unify vti_print_help()
>   gre/gre6: Unify gre_print_help()
>   iptnl/ip6tnl: Unify iptunnel_print_help()
> 
>  ip/link_gre.c|   73 
>  ip/link_gre6.c   |   74 +
>  ip/link_ip6tnl.c |   45 ++--
>  ip/link_iptnl.c  |   88 ++
>  ip/link_vti.c|   42 +++---
>  ip/link_vti6.c   |   33 +---
>  6 files changed, 165 insertions(+), 190 deletions(-)
> 





[PATCH iproute2-next v3 2/3] gre/gre6: Unify gre_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between gre and gre6 help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to gre_print_help().

Get rid of custom print_usage() and usage() functions and use
gre_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_gre.c  |   73 +--
 ip/link_gre6.c |   74 ++--
 2 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index b2573a1..e972a10 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -23,34 +23,38 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 {
fprintf(f,
-   "Usage: ... { gre | gretap | erspan } [ remote ADDR ]\n"
-   "[ local ADDR ]\n"
-   "[ [i|o]seq ]\n"
-   "[ [i|o]key KEY ]\n"
-   "[ [i|o]csum ]\n"
-   "[ ttl TTL ]\n"
-   "[ tos TOS ]\n"
-   "[ [no]pmtudisc ]\n"
-   "[ [no]ignore-df ]\n"
-   "[ dev PHYS_DEV ]\n"
-   "[ noencap ]\n"
-   "[ encap { fou | gue | none } ]\n"
-   "[ encap-sport PORT ]\n"
-   "[ encap-dport PORT ]\n"
-   "[ [no]encap-csum ]\n"
-   "[ [no]encap-csum6 ]\n"
-   "[ [no]encap-remcsum ]\n"
-   "[ external ]\n"
-   "[ fwmark MARK ]\n"
-   "[ erspan_ver version ]\n"
-   "[ erspan IDX ]\n"
-   "[ erspan_dir { ingress | egress } ]\n"
-   "[ erspan_hwid hwid ]\n"
-   "[ external ]\n"
+   "Usage: ... %-9s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   " [ local ADDR ]\n"
+   " [ [i|o]seq ]\n"
+   " [ [i|o]key KEY ]\n"
+   " [ [i|o]csum ]\n"
+   " [ ttl TTL ]\n"
+   " [ tos TOS ]\n"
+   " [ [no]pmtudisc ]\n"
+   " [ [no]ignore-df ]\n"
+   " [ dev PHYS_DEV ]\n"
+   " [ fwmark MARK ]\n"
+   " [ external ]\n"
+   " [ noencap ]\n"
+   " [ encap { fou | gue | none } ]\n"
+   " [ encap-sport PORT ]\n"
+   " [ encap-dport PORT ]\n"
+   " [ [no]encap-csum ]\n"
+   " [ [no]encap-csum6 ]\n"
+   " [ [no]encap-remcsum ]\n"
+   " [ erspan_ver version ]\n"
+   " [ erspan IDX ]\n"
+   " [ erspan_dir { ingress | egress } ]\n"
+   " [ erspan_hwid hwid ]\n"
"\n"
+   );
+   fprintf(f,
"Where: ADDR := { IP_ADDRESS | any }\n"
"   TOS  := { NUMBER | inherit }\n"
"   TTL  := { 1..255 | inherit }\n"
@@ -59,13 +63,6 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -336,8 +333,10 @@ get_failed:
NEXT_ARG();
if (get_u16(_hwid, *argv, 0))
invarg("invalid erspan hwid\n", *argv);
-   } else
-   usage();
+   } else {
+   gre_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -517,12 +516,6 @@ static void gre_print_opt(struct link_util *lu, FILE *f,

[PATCH iproute2-next v3 3/3] iptnl/ip6tnl: Unify iptunnel_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between iptnl and ip6tnl help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to iptunnel_print_help().

Get rid of custom print_usage() and usage() functions and use
iptunnel_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_ip6tnl.c |   45 ++--
 ip/link_iptnl.c  |   88 ++
 2 files changed, 66 insertions(+), 67 deletions(-)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 91d7d99..60c7451 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -29,20 +29,26 @@
 
 #define DEFAULT_TNL_HOP_LIMIT  (64)
 
-static void print_usage(FILE *f)
+static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
+FILE *f)
 {
+   const char *mode;
+
+   fprintf(f,
+   "Usage: ... %-6s [ remote ADDR ]\n",
+   lu->id
+   );
fprintf(f,
-   "Usage: ... ip6tnl [ mode { ip6ip6 | ipip6 | any } ]\n"
-   "  [ remote ADDR ]\n"
"  [ local ADDR ]\n"
-   "  [ dev PHYS_DEV ]\n"
"  [ encaplimit ELIM ]\n"
"  [ hoplimit HLIM ]\n"
"  [ tclass TCLASS ]\n"
"  [ flowlabel FLOWLABEL ]\n"
"  [ dscp inherit ]\n"
-   "  [ fwmark MARK ]\n"
"  [ [no]allow-localremote ]\n"
+   "  [ dev PHYS_DEV ]\n"
+   "  [ fwmark MARK ]\n"
+   "  [ external ]\n"
"  [ noencap ]\n"
"  [ encap { fou | gue | none } ]\n"
"  [ encap-sport PORT ]\n"
@@ -50,8 +56,14 @@ static void print_usage(FILE *f)
"  [ [no]encap-csum ]\n"
"  [ [no]encap-csum6 ]\n"
"  [ [no]encap-remcsum ]\n"
-   "  [ external ]\n"
-   "\n"
+   );
+   mode = "{ ip6ip6 | ipip6 | any }";
+   fprintf(f,
+   "  [ mode %s ]\n"
+   "\n",
+   mode
+   );
+   fprintf(f,
"Where: ADDR  := IPV6_ADDRESS\n"
"   ELIM  := { none | 0..255 }(default=%d)\n"
"   HLIM  := 0..255 (default=%d)\n"
@@ -62,13 +74,6 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
   struct nlmsghdr *n)
 {
@@ -304,8 +309,10 @@ get_failed:
encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
} else if (strcmp(*argv, "external") == 0) {
metadata = 1;
-   } else
-   usage();
+   } else {
+   ip6tunnel_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--, argv++;
}
 
@@ -456,12 +463,6 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb
IFLA_IPTUN_ENCAP_DPORT);
 }
 
-static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
-FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util ip6tnl_link_util = {
.id = "ip6tnl",
.maxattr = IFLA_IPTUN_MAX,
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 3e653b7..84117ac 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -24,49 +24,51 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f, int sit)
+static void iptunnel_print_help(struct link_util *lu, int argc, char **argv,
+   FILE *f)
 {
-   const char *type = sit ? "sit " : "ipip";
+   const char *mode;
 
fprintf(f,
-   "Usage: ... %s [ remote ADDR ]\n"
-   "[ local ADDR ]\n"
-   "[ ttl TTL ]\n"
-   "[ tos TOS ]\n"
-   "[ [no]pmtudisc ]\n"
-   "[ dev PHYS_DEV ]\n"
-   "[ 6rd-prefix ADDR ]\n"
-   "[ 6rd-relay_prefix ADDR ]\n"
-   "[ 6rd-reset ]\n"
-   "[ noencap ]\n"
-   "[ encap { fou | gue | none } ]\n"
-   "

[PATCH iproute2-next v3 1/3] vti/vti6: Unify vti_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between vti and vti6 help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to vti_print_help().

Get rid of custom print_usage() and usage() functions and use
vti_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_vti.c  |   42 ++
 ip/link_vti6.c |   34 +++---
 2 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/ip/link_vti.c b/ip/link_vti.c
index 49b87e9..f128e6b 100644
--- a/ip/link_vti.c
+++ b/ip/link_vti.c
@@ -23,29 +23,27 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-
-static void print_usage(FILE *f)
+static void vti_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 {
fprintf(f,
-   "Usage: ... vti [ remote ADDR ]\n"
-   "   [ local ADDR ]\n"
-   "   [ [i|o]key KEY ]\n"
-   "   [ dev PHYS_DEV ]\n"
-   "   [ fwmark MARK ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   "[ local ADDR ]\n"
+   "[ [i|o]key KEY ]\n"
+   "[ dev PHYS_DEV ]\n"
+   "[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IP_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   ""
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int vti_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -147,8 +145,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -208,12 +208,6 @@ static void vti_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
}
 }
 
-static void vti_print_help(struct link_util *lu, int argc, char **argv,
-   FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util vti_link_util = {
.id = "vti",
.maxattr = IFLA_VTI_MAX,
diff --git a/ip/link_vti6.c b/ip/link_vti6.c
index d1fbec5..109f3e8 100644
--- a/ip/link_vti6.c
+++ b/ip/link_vti6.c
@@ -24,28 +24,28 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void vti6_print_help(struct link_util *lu, int argc, char **argv,
+   FILE *f)
 {
fprintf(f,
-   "Usage: ... vti6 [ remote ADDR ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
"[ local ADDR ]\n"
"[ [i|o]key KEY ]\n"
"[ dev PHYS_DEV ]\n"
"[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IPV6_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   "V6"
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int vti6_parse_opt(struct link_util *lu, int argc, char **argv,
  struct nlmsghdr *n)
 {
@@ -153,8 +153,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti6_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -214,12 +216,6 @@ static void vti6_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
}
 }
 
-static void vti6_print_help(struct link_util *lu, int argc, char **argv,
-   FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util vti6_link_util = {
.id = "vti6",
.maxattr = IFLA_VTI_MAX,
-- 
1.7.10.4

[PATCH iproute2-next v3 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread Serhey Popovych

To show only relevant diffs between the ip and ipv6 variants, the help
message print routines need to be unified and improved.

Get rid of print_usage() and usage() wrappers: use single function to
output help message. As side effect we return -1 from parse function
instead of calling exit(2) in case of "... tunnel " is
found.

Additionally we get pointer to @struct link_util and can directly access
->id information to prepare customized help message.
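As a rough illustration of the two points above (help text derived from ->id, and the parse routine returning -1 instead of exiting), here is a minimal standalone sketch. It is not the code from the series: struct link_util is reduced to its id field, and tnl_print_help() and tnl_parse_opt() are hypothetical names with a simplified argument list.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for iproute2's struct link_util; only ->id is modelled. */
struct link_util {
	const char *id;
};

/* One help routine shared by several link types: the name comes from lu->id. */
static void tnl_print_help(const struct link_util *lu, FILE *f)
{
	/* %-6s left-justifies the id so the option column stays aligned. */
	fprintf(f, "Usage: ... %-6s [ remote ADDR ]\n", lu->id);
	fprintf(f, "                  [ local ADDR ]\n");
}

/* Returns 0 on success, or -1 after printing help for an unknown keyword. */
static int tnl_parse_opt(const struct link_util *lu, int argc, char **argv,
			 FILE *err)
{
	while (argc > 0) {
		if (strcmp(*argv, "remote") == 0 ||
		    strcmp(*argv, "local") == 0) {
			if (argc < 2) {	/* keyword without a value */
				tnl_print_help(lu, err);
				return -1;
			}
			argc--; argv++;	/* skip the value; a real parser stores it */
		} else {
			/* Unknown keyword: report and let the caller decide. */
			tnl_print_help(lu, err);
			return -1;
		}
		argc--; argv++;
	}
	return 0;
}
```

Returning -1 leaves the decision about the exit status to the caller, which is the side effect described above.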

Split calls to fprintf() into two groups: one that contains a format string
with specifiers (thus requiring parameters) and another one that does not.
This helps the compiler to replace calls to fprintf() with fputs() when
there are no format specifiers in the string. Do not use fputs() directly
to keep code formatting nice.
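A small sketch of the split, with hypothetical function names and a shortened help text. Whether the second call is actually turned into fputs() or fwrite() depends on the compiler and optimization level, so treat that part as an expectation rather than a guarantee; the output of both variants is the same either way.

```c
#include <stdio.h>

/* Single call: the whole help text goes through format processing. */
static void help_single(FILE *f, const char *id)
{
	fprintf(f,
		"Usage: ... %-4s [ remote ADDR ]\n"
		"                [ local ADDR ]\n"
		"\n",
		id);
}

/* Split calls: only the first line needs an argument. */
static void help_split(FILE *f, const char *id)
{
	fprintf(f,
		"Usage: ... %-4s [ remote ADDR ]\n",
		id);
	/* No specifiers here, so the compiler may emit a plain string write. */
	fprintf(f,
		"                [ local ADDR ]\n"
		"\n");
}
```

Keeping fprintf() in the second call, instead of writing fputs() by hand, keeps both calls visually uniform in the source, which matches the stated goal of keeping the formatting nice.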

After this series is applied, the following diffs:

  # diff -urN ip/link_gre{,6}.c
  # diff -urN ip/link_vti{,6}.c
  # diff -urN ip/link_ip{,6}tnl.c

in scope of the help print routines are reduced to the necessary minimum.

Tested minimally by compiling and executing "ip link help " and
"ip link add type help" commands. Looks correct.

See individual patch description for more information.

Reviews, comments and suggestions are welcome.

v3
  Address commit message format issues and checkpatch in link_vti6.c

v2
  Dropped function prefix changes from the series: this is not a strictly
  related change and should be done separately.

Thanks,
Serhii

Serhey Popovych (3):
  vti/vti6: Unify vti_print_help()
  gre/gre6: Unify gre_print_help()
  iptnl/ip6tnl: Unify iptunnel_print_help()

 ip/link_gre.c|   73 
 ip/link_gre6.c   |   74 +
 ip/link_ip6tnl.c |   45 ++--
 ip/link_iptnl.c  |   88 ++
 ip/link_vti.c|   42 +++---
 ip/link_vti6.c   |   34 ++---
 6 files changed, 166 insertions(+), 190 deletions(-)

-- 
1.7.10.4

Re: [PATCH iproute2-next 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread Serhey Popovych

David Ahern wrote:
> On 2/8/18 8:35 PM, David Ahern wrote:
>> On 2/8/18 3:50 AM, Serhey Popovych wrote:
>>> To show only relevant diffs between the ip and ipv6 variants, the help
>>> message print routines need to be unified and improved.
>>>
>>> Get rid of print_usage() and usage() wrappers: use single function to
>>> output help message. As side effect we return -1 from parse function
>>> instead of calling exit(2) in case of "... tunnel " is
>>> found.
>>>
>>> Additionally we get pointer to @struct link_util and can directly access
>>> ->id information to prepare customized help message.
>>>
>>> Split calls to fprintf() into two groups: one that contains a format string
>>> with specifiers (thus requiring parameters) and another one that does not.
>>> This helps the compiler to replace calls to fprintf() with fputs() when
>>> there are no format specifiers in the string. Do not use fputs() directly
>>> to keep code formatting nice.
>>>
>>> After this series is applied, the following diffs:
>>>
>>>   # diff -urN ip/link_gre{,6}.c
>>>   # diff -urN ip/link_vti{,6}.c
>>>   # diff -urN ip/link_ip{,6}tnl.c
>>>
>>> in scope of the help print routines are reduced to the necessary minimum.
>>>
>>> Tested minimally by compiling and executing "ip link help " and
>>> "ip link add type help" commands. Looks correct.
>>>
>>> See individual patch description for more information.
>>
>> Series applied to iproute2-next
>>
>>
> 
> 
> I take that back. Before pushing I noticed you dropped the '6' from the
> names of all the ipv6 print_help functions. Why?
> 

You are right. I probably should not do that change in this series. Will
send v2. There is no '6' in name for link_gre6.c for example so I
decided to drop it from the rest to reduce number of diffs.

Sorry for this.




[PATCH iproute2-next v2 3/3] iptnl/ip6tnl: Unify iptunnel_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between iptnl and ip6tnl help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to iptunnel_print_help().

Get rid of custom print_usage() and usage() functions and use
ip{,6}tunnel_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_ip6tnl.c |   45 ++--
 ip/link_iptnl.c  |   88 ++
 2 files changed, 66 insertions(+), 67 deletions(-)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 91d7d99..60c7451 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -29,20 +29,26 @@
 
 #define DEFAULT_TNL_HOP_LIMIT  (64)
 
-static void print_usage(FILE *f)
+static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
+FILE *f)
 {
+   const char *mode;
+
+   fprintf(f,
+   "Usage: ... %-6s [ remote ADDR ]\n",
+   lu->id
+   );
fprintf(f,
-   "Usage: ... ip6tnl [ mode { ip6ip6 | ipip6 | any } ]\n"
-   "  [ remote ADDR ]\n"
"  [ local ADDR ]\n"
-   "  [ dev PHYS_DEV ]\n"
"  [ encaplimit ELIM ]\n"
"  [ hoplimit HLIM ]\n"
"  [ tclass TCLASS ]\n"
"  [ flowlabel FLOWLABEL ]\n"
"  [ dscp inherit ]\n"
-   "  [ fwmark MARK ]\n"
"  [ [no]allow-localremote ]\n"
+   "  [ dev PHYS_DEV ]\n"
+   "  [ fwmark MARK ]\n"
+   "  [ external ]\n"
"  [ noencap ]\n"
"  [ encap { fou | gue | none } ]\n"
"  [ encap-sport PORT ]\n"
@@ -50,8 +56,14 @@ static void print_usage(FILE *f)
"  [ [no]encap-csum ]\n"
"  [ [no]encap-csum6 ]\n"
"  [ [no]encap-remcsum ]\n"
-   "  [ external ]\n"
-   "\n"
+   );
+   mode = "{ ip6ip6 | ipip6 | any }";
+   fprintf(f,
+   "  [ mode %s ]\n"
+   "\n",
+   mode
+   );
+   fprintf(f,
"Where: ADDR  := IPV6_ADDRESS\n"
"   ELIM  := { none | 0..255 }(default=%d)\n"
"   HLIM  := 0..255 (default=%d)\n"
@@ -62,13 +74,6 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
   struct nlmsghdr *n)
 {
@@ -304,8 +309,10 @@ get_failed:
encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
} else if (strcmp(*argv, "external") == 0) {
metadata = 1;
-   } else
-   usage();
+   } else {
+   ip6tunnel_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--, argv++;
}
 
@@ -456,12 +463,6 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb
IFLA_IPTUN_ENCAP_DPORT);
 }
 
-static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
-FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util ip6tnl_link_util = {
.id = "ip6tnl",
.maxattr = IFLA_IPTUN_MAX,
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 3e653b7..84117ac 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -24,49 +24,51 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f, int sit)
+static void iptunnel_print_help(struct link_util *lu, int argc, char **argv,
+   FILE *f)
 {
-   const char *type = sit ? "sit " : "ipip";
+   const char *mode;
 
fprintf(f,
-   "Usage: ... %s [ remote ADDR ]\n"
-   "[ local ADDR ]\n"
-   "[ ttl TTL ]\n"
-   "[ tos TOS ]\n"
-   "[ [no]pmtudisc ]\n"
-   "[ dev PHYS_DEV ]\n"
-   "[ 6rd-prefix ADDR ]\n"
-   "[ 6rd-relay_prefix ADDR ]\n"
-   "[ 6rd-reset ]\n"
-   "[ noencap ]\n"
-   "[ encap { fou | gue | none } ]\n"
-

[PATCH iproute2-next v2 2/3] gre/gre6: Unify gre_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between gre and gre6 help printing code.

Use @struct link_util ->id field to print correct link help: all callers
now pass this data structure to gre_print_help().

Get rid of custom print_usage() and usage() functions and use
gre_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_gre.c  |   73 +--
 ip/link_gre6.c |   74 ++--
 2 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index b2573a1..e972a10 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -23,34 +23,38 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 {
fprintf(f,
-   "Usage: ... { gre | gretap | erspan } [ remote ADDR ]\n"
-   "[ local ADDR ]\n"
-   "[ [i|o]seq ]\n"
-   "[ [i|o]key KEY ]\n"
-   "[ [i|o]csum ]\n"
-   "[ ttl TTL ]\n"
-   "[ tos TOS ]\n"
-   "[ [no]pmtudisc ]\n"
-   "[ [no]ignore-df ]\n"
-   "[ dev PHYS_DEV ]\n"
-   "[ noencap ]\n"
-   "[ encap { fou | gue | none } ]\n"
-   "[ encap-sport PORT ]\n"
-   "[ encap-dport PORT ]\n"
-   "[ [no]encap-csum ]\n"
-   "[ [no]encap-csum6 ]\n"
-   "[ [no]encap-remcsum ]\n"
-   "[ external ]\n"
-   "[ fwmark MARK ]\n"
-   "[ erspan_ver version ]\n"
-   "[ erspan IDX ]\n"
-   "[ erspan_dir { ingress | egress } ]\n"
-   "[ erspan_hwid hwid ]\n"
-   "[ external ]\n"
+   "Usage: ... %-9s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   " [ local ADDR ]\n"
+   " [ [i|o]seq ]\n"
+   " [ [i|o]key KEY ]\n"
+   " [ [i|o]csum ]\n"
+   " [ ttl TTL ]\n"
+   " [ tos TOS ]\n"
+   " [ [no]pmtudisc ]\n"
+   " [ [no]ignore-df ]\n"
+   " [ dev PHYS_DEV ]\n"
+   " [ fwmark MARK ]\n"
+   " [ external ]\n"
+   " [ noencap ]\n"
+   " [ encap { fou | gue | none } ]\n"
+   " [ encap-sport PORT ]\n"
+   " [ encap-dport PORT ]\n"
+   " [ [no]encap-csum ]\n"
+   " [ [no]encap-csum6 ]\n"
+   " [ [no]encap-remcsum ]\n"
+   " [ erspan_ver version ]\n"
+   " [ erspan IDX ]\n"
+   " [ erspan_dir { ingress | egress } ]\n"
+   " [ erspan_hwid hwid ]\n"
"\n"
+   );
+   fprintf(f,
"Where: ADDR := { IP_ADDRESS | any }\n"
"   TOS  := { NUMBER | inherit }\n"
"   TTL  := { 1..255 | inherit }\n"
@@ -59,13 +63,6 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -336,8 +333,10 @@ get_failed:
NEXT_ARG();
if (get_u16(_hwid, *argv, 0))
invarg("invalid erspan hwid\n", *argv);
-   } else
-   usage();
+   } else {
+   gre_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -517,12 +516,6 @@ static void gre_print_opt(struct link_util *lu, FILE *f,

[PATCH iproute2-next v2 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread Serhey Popovych

To show only relevant diffs between the ip and ipv6 variants, the help
message print routines need to be unified and improved.

Get rid of print_usage() and usage() wrappers: use single function to
output help message. As side effect we return -1 from parse function
instead of calling exit(2) in case of "... tunnel " is
found.

Additionally we get pointer to @struct link_util and can directly access
->id information to prepare customized help message.

Split calls to fprintf() into two groups: one that contains a format string
with specifiers (thus requiring parameters) and another one that does not.
This helps the compiler to replace calls to fprintf() with fputs() when
there are no format specifiers in the string. Do not use fputs() directly
to keep code formatting nice.

After this series is applied, the following diffs:

  # diff -urN ip/link_gre{,6}.c
  # diff -urN ip/link_vti{,6}.c
  # diff -urN ip/link_ip{,6}tnl.c

in scope of the help print routines are reduced to the necessary minimum.

Tested minimally by compiling and executing "ip link help " and
"ip link add type help" commands. Looks correct.

See individual patch description for more information.

Reviews, comments and suggestions are welcome.

v2
  Dropped function prefix changes from the series: this is not a strictly
  related change and should be done separately.

Thanks,
Serhii

Serhey Popovych (3):
  vti/vti6: Unify vti_print_help()
  gre/gre6: Unify gre_print_help()
  iptnl/ip6tnl: Unify iptunnel_print_help()

 ip/link_gre.c|   73 
 ip/link_gre6.c   |   74 +
 ip/link_ip6tnl.c |   45 ++--
 ip/link_iptnl.c  |   88 ++
 ip/link_vti.c|   42 +++---
 ip/link_vti6.c   |   33 +---
 6 files changed, 165 insertions(+), 190 deletions(-)

-- 
1.7.10.4

[PATCH iproute2-next v2 1/3] vti/vti6: Unify vti_print_help()

2018-02-08 Thread Serhey Popovych

Reduce diff lines between vti and vti6 help printing code.

Use @struct link_util ->id field to print correct link help:
all callers now pass this data structure to vti_print_help().

Get rid of custom print_usage() and usage() functions and use
vti_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_vti.c  |   42 ++
 ip/link_vti6.c |   33 ++---
 2 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/ip/link_vti.c b/ip/link_vti.c
index 49b87e9..f128e6b 100644
--- a/ip/link_vti.c
+++ b/ip/link_vti.c
@@ -23,29 +23,27 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-
-static void print_usage(FILE *f)
+static void vti_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 {
fprintf(f,
-   "Usage: ... vti [ remote ADDR ]\n"
-   "   [ local ADDR ]\n"
-   "   [ [i|o]key KEY ]\n"
-   "   [ dev PHYS_DEV ]\n"
-   "   [ fwmark MARK ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   "[ local ADDR ]\n"
+   "[ [i|o]key KEY ]\n"
+   "[ dev PHYS_DEV ]\n"
+   "[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IP_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   ""
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int vti_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -147,8 +145,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -208,12 +208,6 @@ static void vti_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
}
 }
 
-static void vti_print_help(struct link_util *lu, int argc, char **argv,
-   FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util vti_link_util = {
.id = "vti",
.maxattr = IFLA_VTI_MAX,
diff --git a/ip/link_vti6.c b/ip/link_vti6.c
index d1fbec5..18018bd 100644
--- a/ip/link_vti6.c
+++ b/ip/link_vti6.c
@@ -24,28 +24,27 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void vti6_print_help(struct link_util *lu, int argc, char **argv, FILE *f)
 {
fprintf(f,
-   "Usage: ... vti6 [ remote ADDR ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
"[ local ADDR ]\n"
"[ [i|o]key KEY ]\n"
"[ dev PHYS_DEV ]\n"
"[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IPV6_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   "V6"
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int vti6_parse_opt(struct link_util *lu, int argc, char **argv,
  struct nlmsghdr *n)
 {
@@ -153,8 +152,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti6_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -214,12 +215,6 @@ static void vti6_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
}
 }
 
-static void vti6_print_help(struct link_util *lu, int argc, char **argv,
-   FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util vti6_link_util = {
.id = "vti6",
.maxattr = IFLA_VTI_MAX,
-- 
1.7.10.4

Re: [BUG] mveta: mvneta_txq_bufs_free NULL pointer dereference

2018-02-08 Thread Sean Nyekjær

On 8 December 2017 at 18:41, Simon Guinot  wrote:
> On Sat, Dec 02, 2017 at 12:06:12PM +0100, Sean Nyekjær wrote:
>> Hi
>>
>> >> I'm not sure at all, but could you try to apply
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d63785c6b94b5d2f095f90755825f90eea791f5
>> >> and see if the problem is resolved ?
>> > I will apply the patch right away, and report back.
>> >
>>
>> The same issue reappeared yesterday, with the patch applied :-)
>
> Hi Sean,
>
> Please can you give us a little bit more of context ?
> What is your setup ? And how can we reproduce the issue ?

Hi Simon

Just a follow up.
I haven't seen this issue for a long time now :-)
I have done two things: regularly updated the kernel (4.14.x -
4.15.x) and added a heat sink to the ethernet phy and SoC.

/Sean

>
> Thanks.
>
> Simon

Re: net: phy: question about phy_is_internal for generic-phy

2018-02-08 Thread Kunihiko Hayashi

Hi Andrew,

On Thu, 8 Feb 2018 13:51:44 +0100  wrote:

> On Thu, Feb 08, 2018 at 07:09:25PM +0900, Kunihiko Hayashi wrote:
> > Hello,
> > 
> > Is there a way to specify "phy is internal" to generic phy driver,
> > that is, to make phy_is_internal() function available?
> > 
> > I found "phy-is-integrated" DT property in
> > Documentation/devicetree/bindings/net/phy.txt, however, it seems
> > that the property has no effect for the generic phy. And I think that
> > the meaning of "integrated" is slightly different from "internal".
> 
> Hi Kunihiko
> 
> Could you explain the bigger picture. Why do you need this?

There are some SoCs that have a built-in phy, and sometimes
these SoCs can choose to use built-in phy or external phy.

In our case, the MAC driver needs to set a register to choose which of
them to use. This choice depends on the board the SoC is mounted on.
And this built-in phy can be driven with the generic phy driver.

If the 'struct phy_driver' of a phy driver has the PHY_IS_INTERNAL flag
set, phy_is_internal() returns true, and the MAC driver can decide the
value of the register using it.

drivers/net/phy/phy_device.c:

    if (phydrv->flags & PHY_IS_INTERNAL)
        phydev->is_internal = true;

include/linux/phy.h:

    static inline bool phy_is_internal(struct phy_device *phydev)
    {
        return phydev->is_internal;
    }

Although I can write a new driver with PHY_IS_INTERNAL, or
add a new property to the MAC driver,
I'd like to use generic phy driver if possible.

However, it seems that the generic phy driver has no way to express
a built-in phy, with or without DT.

How can I handle such built-in phy?

Thank you,

---
Best Regards,
Kunihiko Hayashi

tg3 crashes under high load, when using 100Mbits

2018-02-08 Thread Kai Heng Feng


Hi Broadcom folks,

We are now enabling a new platform with a tg3 NIC, and unfortunately we
observed the bug [1] that dates back to 2015.
I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to
2048") but it doesn't work.


Do you have any idea how to solve the issue?

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664

Kai-Heng

Re: net: thunder: change q_len's type to handle max ring size

2018-02-08 Thread Sunil Kovvuri

On Fri, Feb 9, 2018 at 3:27 AM, Dean Nelson  wrote:
> On 02/08/2018 02:34 PM, David Miller wrote:
>>
>> From: Dean Nelson 
>> Date:
>>
>>> The Cavium thunder nicvf driver supports rx/tx rings of up to 65536
>>> entries per. The number of entries are stored in the q_len member of
>>> struct q_desc_mem. The problem is that q_len being a u16, results in
>>> 65536 becoming 0.
>>>
>>> In getting pointers to descriptors in the rings, the driver uses q_len
>>> minus 1 as a mask after incrementing the pointer, in order to go back
>>> to the beginning and not go past the end of the ring.
>>>
>>> With the q_len set to 0 the mask is no longer correct and the driver
>>> does go beyond the end of the ring, causing various ills. Usually the
>>> first thing that shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf):
>>> transmit queue 7 timed out" warning.
>>>
>>> This patch remedies the problem by changing q_len to a u32.
>>>
>>> Signed-off-by: Dean Nelson 
>>
>>
>> Applied, thanks.
>
>
> Thank you!
>
>>
>> Another way to solve this could have been to encode that length
>> as "length - 1"
>
>
> True. I had pondered that, but felt that since changing q_len's type
> didn't add any length to the structure and that it was less impactful
> from a number-of-lines of code changed perspective, I'd opt for this
> route.
>
> Cavium, if you'd prefer this goes the route that Dave just mentioned,
> please let me know and I can make a new patch against what's been
> applied?

Thanks for fixing this and i think the current patch is fine.

Thanks,
Sunil.

>
> Thanks,
> Dean
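
The truncation described in the commit message can be reproduced in plain C. The sketch below is a userspace model, not driver code: mask_from_u16(), mask_from_u32() and ring_next() are hypothetical helper names, and fixed-width stdint types stand in for the kernel's u16/u32.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap mask as computed when q_len is stored in a 16-bit field. */
static uint32_t mask_from_u16(uint32_t entries)
{
	uint16_t q_len = (uint16_t)entries;	/* 65536 truncates to 0 */

	/* q_len is promoted to int, so 0 - 1 is -1, i.e. all bits set. */
	return (uint32_t)(q_len - 1);
}

/* Wrap mask as computed when q_len is stored in a 32-bit field. */
static uint32_t mask_from_u32(uint32_t entries)
{
	uint32_t q_len = entries;

	return q_len - 1;
}

/* Advance a ring index and wrap it with the given mask. */
static uint32_t ring_next(uint32_t idx, uint32_t mask)
{
	return (idx + 1) & mask;
}
```

For a 65536-entry ring, mask_from_u32() yields 0xffff and index 65535 wraps back to 0, while mask_from_u16() yields an all-ones mask, so the index advances to 65536, one past the last valid entry.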
>

Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-08 Thread Kai Heng Feng


Hi Felix,


> On Feb 8, 2018, at 7:02 PM, Felix Fietkau  wrote:
>
> On 2018-02-08 06:28, Kai-Heng Feng wrote:
>> Without btcoex_enable, WiFi activities make both WiFi and Bluetooth
>> unstable if there's a bluetooth connection.
>>
>> Enable this option when bt_ant_diversity is disabled.
>>
>> BugLink: https://bugs.launchpad.net/bugs/1746164
>> Signed-off-by: Kai-Heng Feng
> I think this might cause regressions on devices that don't have
> bluetooth. This probably either needs more EEPROM checks, or something
> to selectively enable it only on affected platforms.

I think it’s better not to use dmi_match. This issue should affect more
ath9k.
And bluetooth peripherals are more than ever now, so it would be great to
use BT out of the box.

Can you take a look at the bug link? Maybe other things caused
the erratic behavior that I didn’t notice.

Kai-Heng

> - Felix

Re: [PATCH net V3 1/2] ptr_ring: try vmalloc() when kmalloc() fails

2018-02-08 Thread Jason Wang




On 2018-02-09 11:56, Michael S. Tsirkin wrote:
> On Fri, Feb 09, 2018 at 11:49:12AM +0800, Jason Wang wrote:
>> On 2018-02-09 03:17, Michael S. Tsirkin wrote:
>>> On Thu, Feb 08, 2018 at 02:58:40PM +0800, Jason Wang wrote:
>>>> On 2018-02-08 12:45, Michael S. Tsirkin wrote:
>>>>> On Thu, Feb 08, 2018 at 11:59:24AM +0800, Jason Wang wrote:
>>>>>> This patch switch to use kvmalloc_array() for using a vmalloc()
>>>>>> fallback to help in case kmalloc() fails.
>>>>>>
>>>>>> Reported-by:syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
>>>>>> Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
>>>>> I guess the actual patch is the one that switches tun to ptr_ring.
>>>> I think not, since the issue was large allocation.
>>>>
>>>>> In fact, I think the actual bugfix is patch 2/2. This specific one
>>>>> just makes kmalloc less likely to fail but that's
>>>>> not what syzbot reported.
>>>> Agree.
>>>>
>>>>> Then I would add this patch on top to make kmalloc less likely to fail.
>>>> Ok.
>>>>
>>>>>> Signed-off-by: Jason Wang
>>>>>> ---
>>>>>>  include/linux/ptr_ring.h | 10 +-
>>>>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>>>>> index 1883d61..2af71a7 100644
>>>>>> --- a/include/linux/ptr_ring.h
>>>>>> +++ b/include/linux/ptr_ring.h
>>>>>> @@ -466,7 +466,7 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>>>>>>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>>>>>>  {
>>>>>> -   return kcalloc(size, sizeof(void *), gfp);
>>>>>> +   return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
>>>>>>  }
>>>>>>  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
>>>>> This implies a bunch of limitations on the flags. From kvmalloc_node
>>>>> docs:
>>>>>
>>>>>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
>>>>>  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
>>>>>  * preferable to the vmalloc fallback, due to visible performance drawbacks.
>>>>>
>>>>> Fine with all the current users, but if we go this way, please add
>>>>> documentation so future users don't misuse this API.
>>>> I suspect this is somehow a overkill since this means we need sync with
>>>> mm/vmalloc changes in the future to keep it synced.
>>>>
>>>>> Alternatively, test flags and call kvmalloc or kcalloc?
>>>> Similar to the above issue, I would rather leave it as is.
>>>>
>>>> Thanks
>>> How do we prevent someone from inevitably trying to use this with
>>> GFP_ATOMIC?
>>
>> Well, we somehow can't prevent this even if there's a documentation, that's
>> why there's a BUG() in vmalloc code I think. And kvmalloc also requires
>> GFP_KERNEL otherwise another WARN().
>>
>> So looks like the WARN()/BUG() should be sufficient?
>
> Well vmalloc only triggers when you pass in a huge size.
> Let's settle for

There's a:

    BUG_ON(in_interrupt());

in __get_vm_area_node().

> /* Not all gfp_t flags (besides GFP_KERNEL) are allowed. See
>  * documentation for vmalloc for which of them are legal.
>  */

Fine with me.

>> Thanks
>>
>> Another thing is kvm
>
> ?

Sorry typo.

Thanks

Re: [PATCH net V3 1/2] ptr_ring: try vmalloc() when kmalloc() fails

2018-02-08 Thread Michael S. Tsirkin

On Fri, Feb 09, 2018 at 11:49:12AM +0800, Jason Wang wrote:
> 
> 
> On 2018-02-09 03:17, Michael S. Tsirkin wrote:
> > On Thu, Feb 08, 2018 at 02:58:40PM +0800, Jason Wang wrote:
> > > On 2018-02-08 12:45, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 08, 2018 at 11:59:24AM +0800, Jason Wang wrote:
> > > > > This patch switch to use kvmalloc_array() for using a vmalloc()
> > > > > fallback to help in case kmalloc() fails.
> > > > > 
> > > > > Reported-by:syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
> > > > > Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
> > > > I guess the actual patch is the one that switches tun to ptr_ring.
> > > I think not, since the issue was large allocation.
> > > 
> > > > In fact, I think the actual bugfix is patch 2/2. This specific one
> > > > just makes kmalloc less likely to fail but that's
> > > > not what syzbot reported.
> > > Agree.
> > > 
> > > > Then I would add this patch on top to make kmalloc less likely to fail.
> > > Ok.
> > > 
> > > > > Signed-off-by: Jason Wang
> > > > > ---
> > > > >include/linux/ptr_ring.h | 10 +-
> > > > >1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > > index 1883d61..2af71a7 100644
> > > > > --- a/include/linux/ptr_ring.h
> > > > > +++ b/include/linux/ptr_ring.h
> > > > > @@ -466,7 +466,7 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> > > > >  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
> > > > >  {
> > > > > - return kcalloc(size, sizeof(void *), gfp);
> > > > > + return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
> > > > >  }
> > > > >  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> > > > This implies a bunch of limitations on the flags. From kvmalloc_node
> > > > docs:
> > > > 
> > > >  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
> > > >  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
> > > >  * preferable to the vmalloc fallback, due to visible performance drawbacks.
> > > > 
> > > > Fine with all the current users, but if we go this way, please add
> > > > documentation so future users don't misuse this API.
> > > I suspect this is somehow a overkill since this means we need sync with
> > > mm/vmalloc changes in the future to keep it synced.
> > > 
> > > > Alternatively, test flags and call kvmalloc or kcalloc?
> > > Similar to the above issue, I would rather leave it as is.
> > > 
> > > Thanks
> > How do we prevent someone from inevitably trying to use this with
> > GFP_ATOMIC?
> > 
> 
> Well, we somehow can't prevent this even if there's a documentation, that's
> why there's a BUG() in vmalloc code I think. And kvmalloc also requires
> GFP_KERNEL otherwise another WARN().
> 
> So looks like the WARN()/BUG() should be sufficient?

Well vmalloc only triggers when you pass in a huge size.
Let's settle for

/* Not all gfp_t flags (besides GFP_KERNEL) are allowed. See
 * documentation for vmalloc for which of them are legal.
 */

> Thanks
> 
> Another thing is kvm

?

Re: [PATCH iproute2-next 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread David Ahern

On 2/8/18 8:35 PM, David Ahern wrote:
> On 2/8/18 3:50 AM, Serhey Popovych wrote:
>> To show only relevant diffs of ip and ipv6 variants help message print
>> routines needs to be unified and improved.
>>
>> Get rid of print_usage() and usage() wrappers: use single function to
>> output help message. As side effect we return -1 from parse function
>> instead of calling exit(2) in case of "... tunnel " is
>> found.
>>
>> Additionally we get pointer to @struct link_util and can directly access
>> ->id information to prepare customized help message.
>>
>> Split calls to fprintf() two group: one that contains format string with
>> specifiers (thus requiring parameters) and another one that does not.
>> This helps compiler to optimize calls to fprintf() with fputs() when no
>> format specifiers in string. Do not use fputs() directly to keep code
>> formatting nice.
>>
>> After this series applied following diffs:
>>
>>   # diff -urN ip/link_gre{,6}.c
>>   # diff -urN ip/link_vti{,6}.c
>>   # diff -urN ip/link_ip{,6}tnl.c
>>
>> in scope of help print routines reduced to necessary minimum.
>>
>> Tested minimally by compiling and executing "ip link help " and
>> "ip link add type help" commands. Looks correct.
>>
>> See individual patch description for more information.
> 
> Series applied to iproute2-next
> 
> 


I take that back. Before pushing I noticed you dropped the '6' from the
names of all the ipv6 print_help functions. Why?

Re: [PATCH net V3 2/2] ptr_ring: fail on large queue size (>64K)

2018-02-08 Thread Jason Wang

On 2018-02-09 03:09, David Miller wrote:
> From: Jason Wang
> Date: Thu,  8 Feb 2018 11:59:25 +0800
>
>> We need limit the maximum size of queue, otherwise it may cause
>> several side effects e.g slab will warn when the size exceeds
>> KMALLOC_MAX_SIZE. Using KMALLOC_MAX_SIZE still looks too so this patch
>> tries to limit it to 64K. This value could be revisited if we found a
>> real case that needs more.
>>
>> Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
>> Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
>> Signed-off-by: Jason Wang
>  ...
>> @@ -466,6 +468,8 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>>  {
>> +   if (size > PTR_RING_MAX_ALLOC)
>> +           return NULL;
>>      return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
>>  }
>
> This doesn't limit the allocation to 64K.  It limits it to 64K * sizeof(void *).

Right, will fix this.

Thanks
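
To make the review comment concrete: the posted check compares the number of entries with the limit, so the allocation in bytes can be sizeof(void *) times larger. The userspace sketch below contrasts that with one possible byte-based check. size_ok_entries(), size_ok_bytes() and alloc_bytes() are hypothetical helpers, and the byte-based variant is an assumption for illustration, not necessarily the fix that was later applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTR_RING_MAX_ALLOC 65536	/* value from the quoted patch */

/* Bytes requested for a ring with "size" pointer slots. */
static size_t alloc_bytes(unsigned int size)
{
	return (size_t)size * sizeof(void *);
}

/* Check as posted: the limit is applied to the number of entries. */
static bool size_ok_entries(unsigned int size)
{
	return size <= PTR_RING_MAX_ALLOC;
}

/* Assumed alternative: apply the limit to the allocation in bytes. */
static bool size_ok_bytes(unsigned int size)
{
	return size <= PTR_RING_MAX_ALLOC / sizeof(void *);
}
```

With 8-byte pointers, the entry-based check accepts a 512K allocation, while the byte-based variant caps the ring at 8192 entries.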

Re: [PATCH net V3 2/2] ptr_ring: fail on large queue size (>64K)

2018-02-08 Thread Jason Wang




On 2018-02-08 23:50, Michael S. Tsirkin wrote:
> On Thu, Feb 08, 2018 at 03:11:22PM +0800, Jason Wang wrote:
>> On 2018-02-08 12:52, Michael S. Tsirkin wrote:
>>> On Thu, Feb 08, 2018 at 11:59:25AM +0800, Jason Wang wrote:
>>>> We need limit the maximum size of queue, otherwise it may cause
>>>> several side effects e.g slab will warn when the size exceeds
>>>> KMALLOC_MAX_SIZE. Using KMALLOC_MAX_SIZE still looks too so this patch
>>>> tries to limit it to 64K. This value could be revisited if we found a
>>>> real case that needs more.
>>>>
>>>> Reported-by:syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
>>>> Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
>>>> Signed-off-by: Jason Wang
>>>> ---
>>>>  include/linux/ptr_ring.h | 4 
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>>> index 2af71a7..5858d48 100644
>>>> --- a/include/linux/ptr_ring.h
>>>> +++ b/include/linux/ptr_ring.h
>>>> @@ -44,6 +44,8 @@ struct ptr_ring {
>>>>      void **queue;
>>>>  };
>>> Seems like a weird location for a define. Either put defines on
>>> top of the file, or near where they are used. I prefer the
>>> second option.
>> Ok.
>>
>>>> +#define PTR_RING_MAX_ALLOC 65536
>>>> +
>>> I guess it's an arbitrary number. Seems like a sufficiently large one,
>>> but pls add a comment so readers don't wonder. And please explain what
>>> it does:
>>>
>>> /* Callers can create ptr_ring structures with userspace-supplied
>>>  * parameters. This sets a limit on the size to make that usecase
>>>  * safe. If you ever change this, make sure to audit all callers.
>>>  */
>>>
>>> Also I think we should generally use either hex 0x10000 or (1 << 16).
>> I agree the number is arbitrary, so I still prefer the KMALLOC_MAX_SIZE
>> especially consider it was used by pfifo_fast now. Try to limit it to an
>> arbitrary may break lots of exist setups. E.g just google "txqueuelen
>> 10" can give me a lots of search results.
>>
>> We can do any kind of optimization on top but not for -net now.
>>
>> Thanks
> Interesting. I have an idea for fixing this, but maybe
> for now KMALLOC_MAX_SIZE does make sense. It's unfortunate that
> this value is architecture dependent.
>
> The patch still needs code comments though, and fix the math to
> use the proper size.

Yes.

Thanks

Re: [PATCH net V3 1/2] ptr_ring: try vmalloc() when kmalloc() fails

2018-02-08 Thread Jason Wang




On 2018-02-09 03:17, Michael S. Tsirkin wrote:
> On Thu, Feb 08, 2018 at 02:58:40PM +0800, Jason Wang wrote:
>> On 2018-02-08 12:45, Michael S. Tsirkin wrote:
>>> On Thu, Feb 08, 2018 at 11:59:24AM +0800, Jason Wang wrote:
>>>> This patch switch to use kvmalloc_array() for using a vmalloc()
>>>> fallback to help in case kmalloc() fails.
>>>>
>>>> Reported-by:syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
>>>> Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
>>> I guess the actual patch is the one that switches tun to ptr_ring.
>> I think not, since the issue was large allocation.
>>
>>> In fact, I think the actual bugfix is patch 2/2. This specific one
>>> just makes kmalloc less likely to fail but that's
>>> not what syzbot reported.
>> Agree.
>>
>>> Then I would add this patch on top to make kmalloc less likely to fail.
>> Ok.
>>
>>>> Signed-off-by: Jason Wang
>>>> ---
>>>>  include/linux/ptr_ring.h | 10 +-
>>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>>>> index 1883d61..2af71a7 100644
>>>> --- a/include/linux/ptr_ring.h
>>>> +++ b/include/linux/ptr_ring.h
>>>> @@ -466,7 +466,7 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>>>>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>>>>  {
>>>> -   return kcalloc(size, sizeof(void *), gfp);
>>>> +   return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
>>>>  }
>>>>  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
>>> This implies a bunch of limitations on the flags. From kvmalloc_node
>>> docs:
>>>
>>>  * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
>>>  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
>>>  * preferable to the vmalloc fallback, due to visible performance drawbacks.
>>>
>>> Fine with all the current users, but if we go this way, please add
>>> documentation so future users don't misuse this API.
>> I suspect this is somehow a overkill since this means we need sync with
>> mm/vmalloc changes in the future to keep it synced.
>>
>>> Alternatively, test flags and call kvmalloc or kcalloc?
>> Similar to the above issue, I would rather leave it as is.
>>
>> Thanks
> How do we prevent someone from inevitably trying to use this with
> GFP_ATOMIC?

Well, we somehow can't prevent this even if there's a documentation,
that's why there's a BUG() in vmalloc code I think. And kvmalloc also
requires GFP_KERNEL otherwise another WARN().

So looks like the WARN()/BUG() should be sufficient?

Thanks

Another thing is kvm

Re: [PATCH iproute2-next 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread David Ahern

On 2/8/18 3:50 AM, Serhey Popovych wrote:
> To show only relevant diffs of ip and ipv6 variants help message print
> routines needs to be unified and improved.
> 
> Get rid of print_usage() and usage() wrappers: use single function to
> output help message. As side effect we return -1 from parse function
> instead of calling exit(2) in case of "... tunnel " is
> found.
> 
> Additionally we get pointer to @struct link_util and can directly access
> ->id information to prepare customized help message.
> 
> Split calls to fprintf() two group: one that contains format string with
> specifiers (thus requiring parameters) and another one that does not.
> This helps compiler to optimize calls to fprintf() with fputs() when no
> format specifiers in string. Do not use fputs() directly to keep code
> formatting nice.
> 
> After this series applied following diffs:
> 
>   # diff -urN ip/link_gre{,6}.c
>   # diff -urN ip/link_vti{,6}.c
>   # diff -urN ip/link_ip{,6}tnl.c
> 
> in scope of help print routines reduced to necessary minimum.
> 
> Tested minimally by compiling and executing "ip link help " and
> "ip link add type help" commands. Looks correct.
> 
> See individual patch description for more information.

Series applied to iproute2-next
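
The fprintf() grouping described in the cover letter can be pictured with the sketch below. It is not iproute2 code: print_help() here is a hypothetical stand-in, and the help text is invented for illustration. Calls whose format string contains a specifier take the link type as a parameter, and calls with a constant string are left as fprintf() so that a compiler may lower them to fputs() on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical help printer; the text is illustrative, not iproute2's. */
static void print_help(FILE *f, const char *id)
{
	assert(f != NULL && id != NULL);

	/* Group 1: format string with a specifier, so it needs a parameter. */
	fprintf(f, "Usage: ip link add NAME type %s [ OPTIONS ]\n", id);

	/* Group 2: constant strings without specifiers. A compiler is free
	 * to replace such calls with fputs() or fwrite(). */
	fprintf(f, "\n");
	fprintf(f, "Where: NAME := STRING\n");
}
```

A caller would pass the link type name, for example print_help(stderr, "gre").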

tcp_probe tracepoint only when cwnd changes

2018-02-08 Thread Md. Islam

Hi

I'm using tcp_probe tracepoint as [1]. It takes a snapshot each time
tcp_rcv_established() is called. However I need to take a snapshot
only when congestion window changes. Old tcp_probe had full=0 option
to achieve this. Is there a way to achieve this using tcp_probe
tracepoint?

Many thanks
Tamim

1. https://lkml.org/lkml/2017/12/18/126
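
Independent of whether the tracepoint itself can do this, the same effect can be had by post-processing the captured samples. The sketch below is a userspace filter under stated assumptions: struct probe_sample and keep_cwnd_changes() are hypothetical, the samples are assumed to belong to a single flow, and snd_cwnd is assumed to have been parsed from the trace output beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed trace record for one flow. */
struct probe_sample {
	uint64_t ts_ns;		/* timestamp of the sample */
	uint32_t snd_cwnd;	/* congestion window at that time */
};

/* Copy to "out" only the samples whose snd_cwnd differs from the previous
 * sample; the first sample is always kept. Returns the number kept.
 * "out" must have room for n entries. */
static size_t keep_cwnd_changes(const struct probe_sample *in, size_t n,
				struct probe_sample *out)
{
	size_t kept = 0;

	assert(n == 0 || (in != NULL && out != NULL));
	for (size_t i = 0; i < n; i++) {
		if (i == 0 || in[i].snd_cwnd != in[i - 1].snd_cwnd)
			out[kept++] = in[i];
	}
	return kept;
}
```

This keeps the tracing overhead of every tcp_rcv_established() call, so it only reduces the amount of data to analyse, not the cost of capturing it.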

Re: [bpf-next V3 PATCH 0/5] tools/libbpf improvements and selftests

2018-02-08 Thread Daniel Borkmann

On 02/08/2018 12:48 PM, Jesper Dangaard Brouer wrote:
> While playing with using libbpf for the Suricata project, we had
> issues LLVM >= 4.0.1 generating ELF files that could not be loaded
> with libbpf (tools/lib/bpf/).
> 
> During the troubleshooting phase, I wrote a test program and improved
> the debugging output in libbpf.  I turned this into a selftests
> program, and it also serves as a code example for libbpf in itself.
> 
> I discovered that there are at least three ELF load issues with
> libbpf.  I left them as TODO comments in (tools/testing/selftests/bpf)
> test_libbpf.sh. I've only fixed the load issue with eh_frames, and
> other types of relo-section that does not have exec flags.  We can
> work on the other issues later.

Applied it to bpf tree, thanks Jesper!

pull-request: bpf 2018-02-09

2018-02-08 Thread Daniel Borkmann

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Two fixes for BPF sockmap in order to break up circular map references
   from programs attached to sockmap, and detaching related sockets in
   case of socket close() event. For the latter we get rid of the
   smap_state_change() and plug into ULP infrastructure, which will later
   also be used for additional features anyway such as TX hooks. For the
   second issue, dependency chain is broken up via map release callback
   to free parse/verdict programs, all from John.

2) Fix a libbpf relocation issue that was found while implementing XDP
   support for Suricata project. Issue was that when clang was invoked
   with default target instead of bpf target, then various other e.g.
   debugging relevant sections are added to the ELF file that contained
   relocation entries pointing to non-BPF related sections which libbpf
   trips over instead of skipping them. Test cases for libbpf are added
   as well, from Jesper.

3) Various misc fixes for bpftool and one for libbpf: a small addition
   to libbpf to make sure it recognizes all standard section prefixes.
   Then, the Makefile in bpftool/Documentation is improved to explicitly
   check for rst2man being installed on the system as we otherwise risk
   installing empty man pages; the man page for bpftool-map is corrected
   and a set of missing bash completions added in order to avoid shipping
   bpftool where the completions are only partially working, from Quentin.

4) Fix applying the relocation to immediate load instructions in the
   nfp JIT which were missing a shift, from Jakub.

5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y
   gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL
   in this case; and explicitly delete the veth devices in the two tests
   test_xdp_{meta,redirect}.sh before dismantling the netnses as when
   selftests are run in batch mode, then workqueue to handle destruction
   might not have finished yet and thus veth creation in next test under
   same dev name would fail, from Yonghong.

6) Fix test_kmod.sh to check the test_bpf.ko module path before performing
   an insmod, and fallback to modprobe. Especially the latter is useful
   when having a device under test that has the modules installed instead,
   from Naresh.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit a6b88814ab541d386d793f6df260a3e4d5cccb11:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf (2018-02-04 16:46:58 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to d977ae593b2d3f9ef0df795eda93f4e6bc92b323:

  Merge branch 'bpf-libbpf-relo-fix-and-tests' (2018-02-09 00:26:28 +0100)


Daniel Borkmann (3):
  Merge branch 'bpf-sockmap-fixes'
  Merge branch 'bpf-misc-nfp-bpftool-doc-fixes'
  Merge branch 'bpf-libbpf-relo-fix-and-tests'

Jakub Kicinski (1):
  nfp: bpf: fix immed relocation for larger offsets

Jesper Dangaard Brouer (5):
  bpf: Sync kernel ABI header with tooling header for bpf_common.h
  tools/libbpf: improve the pr_debug statements to contain section numbers
  selftests/bpf: add test program for loading BPF ELF files
  selftests/bpf: add selftest that use test_libbpf_open
  tools/libbpf: handle issues with bpf ELF objects containing .eh_frames

John Fastabend (3):
  net: add a UID to use for ULP socket assignment
  bpf: sockmap, add sock close() hook to remove socks
  bpf: sockmap, fix leaking maps with attached but not detached progs

Naresh Kamboju (1):
  selftests: bpf: test_kmod.sh: check the module path before insmod

Quentin Monnet (5):
  libbpf: complete list of strings for guessing program type
  tools: bpftool: exit doc Makefile early if rst2man is not available
  tools: bpftool: make syntax for program map update explicit in man page
  tools: bpftool: add bash completion for `bpftool prog load`
  tools: bpftool: add bash completion for cgroup commands

Yonghong Song (2):
  bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
  tools/bpf: fix batch-mode test failure of test_xdp_redirect.sh

 drivers/net/ethernet/netronome/nfp/nfp_asm.c   |   2 +-
 include/net/tcp.h  |   8 +
 kernel/bpf/sockmap.c   | 187 +
 lib/test_bpf.c |  31 +++-
 net/ipv4/tcp_ulp.c |  59 ++-
 net/tls/tls_main.c |   2 +
 tools/bpf/bpftool/Documentation/Makefile   |   5 +

Re: [PATCH v2] rtlwifi: rtl8192cu: Remove variable self-assignment in rf.c

2018-02-08 Thread Guenter Roeck

On Thu, Feb 8, 2018 at 4:57 PM, Matthias Kaehlcke  wrote:
> In _rtl92c_get_txpower_writeval_by_regulatory() the variable writeVal
> is assigned to itself in an if ... else statement, apparently only to
> document that the branch condition is handled and that a previously read
> value should be returned unmodified. The self-assignment causes clang to
> raise the following warning:
>
> drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c:304:13:
>   error: explicitly assigning value of variable of type 'u32'
> (aka 'unsigned int') to itself [-Werror,-Wself-assign]
>   writeVal = writeVal;
>
> Delete the branch with the self-assignment.
>
> Signed-off-by: Matthias Kaehlcke 

Reviewed-by: Guenter Roeck 

> ---
> Changes in v2:
> - Delete the 'else if' branch entirely
>
>  drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> index 9cff6bc4049c..cf551785eb08 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> @@ -299,9 +299,6 @@ static void _rtl92c_get_txpower_writeval_by_regulatory(struct ieee80211_hw *hw,
> writeVal = 0x;
> if (rtlpriv->dm.dynamic_txhighpower_lvl == TXHIGHPWRLEVEL_BT1)
> writeVal = writeVal - 0x06060606;
> -   else if (rtlpriv->dm.dynamic_txhighpower_lvl ==
> -TXHIGHPWRLEVEL_BT2)
> -   writeVal = writeVal;
> *(p_outwriteval + rf) = writeVal;
> }
>  }
> --
> 2.16.0.rc1.238.g530d649a79-goog
>

Re: [PATCH v2] rtlwifi: rtl8192cu: Remove variable self-assignment in rf.c

2018-02-08 Thread Larry Finger


On 02/08/2018 06:57 PM, Matthias Kaehlcke wrote:
> In _rtl92c_get_txpower_writeval_by_regulatory() the variable writeVal
> is assigned to itself in an if ... else statement, apparently only to
> document that the branch condition is handled and that a previously read
> value should be returned unmodified. The self-assignment causes clang to
> raise the following warning:
>
> drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c:304:13:
>   error: explicitly assigning value of variable of type 'u32'
> (aka 'unsigned int') to itself [-Werror,-Wself-assign]
>   writeVal = writeVal;
>
> Delete the branch with the self-assignment.
>
> Signed-off-by: Matthias Kaehlcke
> ---
> Changes in v2:
> - Delete the 'else if' branch entirely
>
>  drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c | 3 ---
>  1 file changed, 3 deletions(-)

Acked-by: Larry Finger

> diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> index 9cff6bc4049c..cf551785eb08 100644
> --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
> @@ -299,9 +299,6 @@ static void _rtl92c_get_txpower_writeval_by_regulatory(struct ieee80211_hw *hw,
> writeVal = 0x;
> if (rtlpriv->dm.dynamic_txhighpower_lvl == TXHIGHPWRLEVEL_BT1)
> writeVal = writeVal - 0x06060606;
> -   else if (rtlpriv->dm.dynamic_txhighpower_lvl ==
> -TXHIGHPWRLEVEL_BT2)
> -   writeVal = writeVal;
> *(p_outwriteval + rf) = writeVal;
> }
>  }

[PATCH v2] rtlwifi: rtl8192cu: Remove variable self-assignment in rf.c

2018-02-08 Thread Matthias Kaehlcke

In _rtl92c_get_txpower_writeval_by_regulatory() the variable writeVal
is assigned to itself in an if ... else statement, apparently only to
document that the branch condition is handled and that a previously read
value should be returned unmodified. The self-assignment causes clang to
raise the following warning:

drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c:304:13:
  error: explicitly assigning value of variable of type 'u32'
(aka 'unsigned int') to itself [-Werror,-Wself-assign]
  writeVal = writeVal;

Delete the branch with the self-assignment.

Signed-off-by: Matthias Kaehlcke 
---
Changes in v2:
- Delete the 'else if' branch entirely

 drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
index 9cff6bc4049c..cf551785eb08 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/rf.c
@@ -299,9 +299,6 @@ static void _rtl92c_get_txpower_writeval_by_regulatory(struct ieee80211_hw *hw,
writeVal = 0x;
if (rtlpriv->dm.dynamic_txhighpower_lvl == TXHIGHPWRLEVEL_BT1)
writeVal = writeVal - 0x06060606;
-   else if (rtlpriv->dm.dynamic_txhighpower_lvl ==
-TXHIGHPWRLEVEL_BT2)
-   writeVal = writeVal;
*(p_outwriteval + rf) = writeVal;
}
 }
-- 
2.16.0.rc1.238.g530d649a79-goog

Re: sctp: skb_over_panic on INIT/INIT_ACK packet sending

2018-02-08 Thread Marcelo Ricardo Leitner

Hi,

On Fri, Feb 09, 2018 at 02:38:59AM +0300, Alexey Kodanev wrote:
> Hi,
> 
> Got the following panic when the received INIT packet has a lot of
> address parameters, so that the INIT_ACK chunksize exceeds
> SCTP_MAX_CHUNK_LEN: 
> 
> [  597.804948] skbuff: skb_over_panic: text:ffae06e4 len:120168 put:120156 head:7aa47635 data:d991c2de tail:0x1d640 end:0xfec0 dev:
> ...
> [  597.976970] [ cut here ]
> [  598.033408] kernel BUG at net/core/skbuff.c:104!
> [  600.314841] Call Trace:
> [  600.345829]  
> [  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
> [  600.436934]  skb_put+0x16c/0x200
> [  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
> [  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
> [  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
> [  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
> [  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
> [  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
> [  600.84]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
> [  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
> [  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
> [  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
> [  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
> [  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
> [  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
> [  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
> [  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
> [  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
> ...
> 
> Here the chunk size for INIT_ACK packet may become too big, mostly
> because of the cookielen. Tried on local machine so both addrlen
> and cookielen was big: addrlen 39960 cookielen 80168.

Makes sense.

>  
> Not sure how to fix it correctly, may be we need a check here
> or add it to sctp_make_control()?

_sctp_make_chunk seems a good place for it, before trying to allocate
the skb. Any chunk bigger than SCTP_MAX_CHUNK_LEN is bogus and should
not be allowed.

> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 793b05e..c27564c 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -475,6 +475,9 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
> if (num_ext)
> chunksize += SCTP_PAD4(sizeof(ext_param) + num_ext);
>  
> +   if (chunksize > SCTP_MAX_CHUNK_LEN)
> +   goto nomem_chunk;
> +
> /* Now allocate and fill out the chunk.  */
> retval = sctp_make_control(asoc, SCTP_CID_INIT_ACK, 0, chunksize, gfp);
> if (!retval)
> 
> And for INIT as well...
> 
> Otherwise this chunk goes to sctp_packet_transmit() -> sctp_packet_pack()
> where it panics on
> 
>   skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
> 
> nskb (head skb) was previously allocated with packet->size that looks
> like getting the chunk size from u16 chunk_hdr->length.

Huh, yes.

Thanks,
Marcelo

> 
> Thanks,
> Alexey

sctp: skb_over_panic on INIT/INIT_ACK packet sending

2018-02-08 Thread Alexey Kodanev

Hi,

Got the following panic when the received INIT packet has a lot of
address parameters, so that the INIT_ACK chunksize exceeds
SCTP_MAX_CHUNK_LEN: 

[  597.804948] skbuff: skb_over_panic: text:ffae06e4 len:120168 put:120156
   head:7aa47635 data:d991c2de tail:0x1d640 end:0xfec0 dev:
...
[  597.976970] ------------[ cut here ]------------
[  598.033408] kernel BUG at net/core/skbuff.c:104!
[  600.314841] Call Trace:
[  600.345829]  
[  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.436934]  skb_put+0x16c/0x200
[  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
[  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
[  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
[  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
[  600.84]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
[  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
[  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
[  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
[  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
[  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
...

Here the chunk size for INIT_ACK packet may become too big, mostly
because of the cookielen. Tried on local machine so both addrlen
and cookielen were big: addrlen 39960 cookielen 80168.
 
Not sure how to fix it correctly, maybe we need a check here
or add it to sctp_make_control()?

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 793b05e..c27564c 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -475,6 +475,9 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
if (num_ext)
chunksize += SCTP_PAD4(sizeof(ext_param) + num_ext);
 
+   if (chunksize > SCTP_MAX_CHUNK_LEN)
+   goto nomem_chunk;
+
/* Now allocate and fill out the chunk.  */
retval = sctp_make_control(asoc, SCTP_CID_INIT_ACK, 0, chunksize, gfp);
if (!retval)

And for INIT as well...

Otherwise this chunk goes to sctp_packet_transmit() -> sctp_packet_pack()
where it panics on

  skb_put_data(nskb, chunk->skb->data, chunk->skb->len);

nskb (head skb) was previously allocated with packet->size that looks
like getting the chunk size from u16 chunk_hdr->length.

Thanks,
Alexey
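
Editorial note: the mismatch between the real chunk size and the 16-bit
length field described above can be reproduced in plain userspace C. This
is only a sketch, not kernel code. MAX_CHUNK_LEN is assumed to mirror
SCTP_MAX_CHUNK_LEN (64 KiB minus a 4-byte word), and stored_len() and
chunk_len_fits() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors SCTP_MAX_CHUNK_LEN, i.e. 65536 - sizeof(__u32). */
#define MAX_CHUNK_LEN ((size_t)65536 - sizeof(uint32_t))

/* Hypothetical helper: what a u16 length field keeps of a larger size. */
static uint16_t stored_len(size_t chunksize)
{
	return (uint16_t)chunksize;	/* truncates modulo 65536 */
}

/* Hypothetical helper: the bound proposed in the patch above. */
static int chunk_len_fits(size_t chunksize)
{
	return chunksize <= MAX_CHUNK_LEN;
}
```

With the 120168-byte size from the panic, the 16-bit field would hold a
much smaller value than the data actually copied, which matches the
undersized head skb described in the report.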

Re: [RFC PATCH 00/24] Introducing AF_XDP support

2018-02-08 Thread Willem de Bruijn

On Wed, Feb 7, 2018 at 4:28 PM, Björn Töpel  wrote:
> 2018-02-07 16:54 GMT+01:00 Willem de Bruijn :
>>> We realized, a bit late maybe, that 24 patches is a bit of a mouthful, so
>>> let me try to make it more palatable.
>>
>> Overall, this approach looks great to me.
>>
>
> Yay! :-)
>
>> The patch set incorporates all the feedback from AF_PACKET V4.
>> At this point I don't have additional high-level interface comments.
>>
>
> I have a thought on the socket API. Now, we're registering buffer
> memory *to* the kernel, but mmap:ing the Rx/Tx rings *from* the
> kernel. I'm leaning towards removing the mmap call, in favor of
> registering the rings to kernel analogous to the XDP_MEM_REG socket
> option. We won't guarantee physically contiguous memory for the rings,
> but I think we can live with that. Thoughts?
>
>> As you point out, 24 patches and nearly 6000 changed lines is
>> quite a bit to ingest. Splitting up in smaller patch sets will help
>> give more detailed implementation feedback.
>>
>> The frame pool and device driver changes are largely independent
>> from AF_XDP and probably should be resolved first (esp. the
>> observed regression even without AF_XDP).
>>
>
> Yeah, the regression is unacceptable.
>
> Another way is starting with the patches without zero-copy first
> (i.e. the copy path), and later add the driver modifications. That
> would be the first 7 patches.
>
>> As you suggest, it would be great if the need for a separate
>> xsk_packet_array data structure can be avoided.
>>
>
> Yes, we'll address that!
>
>> Since frames from the same frame pool can be forwarded between
>> multiple device ports and thus AF_XDP sockets, that should perhaps
>> be a separate object independent from the sockets. This comment
>> hints at the awkward situation if tied to a descriptor pair:
>>
>>> +   /* Check if umem is from this socket, if so do not make
>>> +* circular references.
>>> +*/
>>
>> Since this is in principle just a large shared memory area, could
>> it reuse existing BPF map logic?
>>
>
> Hmm, care to elaborate on your thinking here?

On second thought, that is not workable. I was thinking of reusing
existing mmap support for maps, but that is limited to the perf ring
buffer.

>> More extreme, and perhaps unrealistic, is if the descriptor ring
>> could similarly be a BPF map and the Rx XDP program directly
>> writes the descriptor, instead of triggering xdp_do_xsk_redirect.
>> As we discussed before, this would avoid the need to specify a
>> descriptor format upfront.
>
> Having the XDP program write back the descriptor to the user space ring
> is really something that would be useful (writing virtio-net
> descriptors...).

Yes, that's a great use case. This ties in with Jason Wang's
presentation on XDP with tap and virtio, too.

https://www.netdevconf.org/2.2/slides/wang-vmperformance-talk.pdf

> I need to think a bit more about this. :-) Please
> share your ideas!
>
> Thanks for looking into the patches!
>
>
> Björn

Re: [PATCH net 1/1 v4] rtnetlink: require unique netns identifier

2018-02-08 Thread Christian Brauner

On Thu, Feb 8, 2018 at 8:33 PM, David Miller  wrote:
> From: Christian Brauner 
> Date: Wed,  7 Feb 2018 13:53:20 +0100
>
>> Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
>> it is possible for userspace to send us requests with three different
>> properties to identify a target network namespace. This affects at least
>> RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
>> network namespace which is confusing. For legacy reasons the kernel will
>> pick the IFLA_NET_NS_PID property first and then look for the
>> IFLA_NET_NS_FD property but there is no reason to extend this type of
>> behavior to network namespace ids. The regression potential is quite
>> minimal since the rtnetlink requests in question either won't allow
>> IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
>> support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
>>
>> Signed-off-by: Christian Brauner 
>
> Applied, thanks Christian.

Thanks for applying, David.

Re: [PATCH iproute2 v1] ip netns: allow negative nsid

2018-02-08 Thread Christian Brauner

On Thu, Feb 8, 2018 at 5:01 PM, Stephen Hemminger
 wrote:
> On Tue,  6 Feb 2018 19:39:31 +0100
> Christian Brauner  wrote:
>
>> If the kernel receives a negative nsid it will automatically assign the
>> next available nsid. In this case alloc_netid() will set min and max to
>> 0 for ird_alloc(). And when max == 0 idr_alloc() will interpret this as
>> the maxium range, i.e. specific to nsids it will try to find an id in
>> the range [0,INT_MAX). This is intentionally supported in the kernel for
>> nsids. Commit acbe9118ce8086f765ffb0da15f80c7c01a8903a regressed ip
>> netns in that respect although previously the use-case was either
>> accidentally supported or opaquely supported such that it triggered the
>> original commit. From what I can gather it went as follows before:
>> atoi() was called with a string indicating a negative value which caused
>> it to return -1 which was passed to the kernel. Let's make it less
>> opaque by introducing the keyword "auto":
>>
>> ip netns set  auto
>>
>> will cause nsid to be set to -1 and the kernel will select an available
>> nsid.
>>
>> Signed-off-by: Christian Brauner 
>
> Applied thank you.
> I did have to fix spelling and format of commit reference in
> the commit description. If you run checkpatch on patches to
> iproute you would have caught that.

Ah, sorry about that. Didn't think about applying checkpatch to iproute
patches as well. Won't happen again!

Thanks for applying.
Christian

linux-next: Signed-off-by missing for commit in the net tree

2018-02-08 Thread Stephen Rothwell

Hi all,

Commit

  55b3280d1e47 ("tipc: fix skb truesize/datasize ratio control")

is missing a Signed-off-by from its author.

-- 
Cheers,
Stephen Rothwell

Re: net: thunder: change q_len's type to handle max ring size

2018-02-08 Thread Dean Nelson

On 02/08/2018 02:34 PM, David Miller wrote:
> From: Dean Nelson
> Date:
>
>> The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries per.
>> The number of entries is stored in the q_len member of struct q_desc_mem. The
>> problem is that q_len being a u16, results in 65536 becoming 0.
>>
>> In getting pointers to descriptors in the rings, the driver uses q_len minus 1
>> as a mask after incrementing the pointer, in order to go back to the beginning
>> and not go past the end of the ring.
>>
>> With the q_len set to 0 the mask is no longer correct and the driver does go
>> beyond the end of the ring, causing various ills. Usually the first thing that
>> shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
>> warning.
>>
>> This patch remedies the problem by changing q_len to a u32.
>>
>> Signed-off-by: Dean Nelson
>
> Applied, thanks.

Thank you!

> Another way to solve this could have been to encode that length
> as "length - 1"

True. I had pondered that, but felt that since changing q_len's type
didn't add any length to the structure and that it was less impactful
from a number-of-lines of code changed perspective, I'd opt for this
route.

Cavium, if you'd prefer this goes the route that Dave just mentioned,
please let me know and I can make a new patch against what's been
applied?

Thanks,
Dean

Re: [Patch net v2] ipt_CLUSTERIP: fix a refcount bug in clusterip_config_find_get()

2018-02-08 Thread Florian Westphal

Cong Wang  wrote:
> In clusterip_config_find_get() we hold RCU read lock so it could
> run concurrently with clusterip_config_entry_put(), as a result,
> the refcnt could go back to 1 from 0, which leads to a double
> list_del()... Just replace refcount_inc() with
> refcount_inc_not_zero(), as for c->refcount.

Reviewed-by: Florian Westphal

[Patch net v2] ipt_CLUSTERIP: fix a refcount bug in clusterip_config_find_get()

2018-02-08 Thread Cong Wang

In clusterip_config_find_get() we hold RCU read lock so it could
run concurrently with clusterip_config_entry_put(), as a result,
the refcnt could go back to 1 from 0, which leads to a double
list_del()... Just replace refcount_inc() with
refcount_inc_not_zero(), as for c->refcount.

Fixes: d73f33b16883 ("netfilter: CLUSTERIP: RCU conversion")
Cc: Eric Dumazet 
Cc: Pablo Neira Ayuso 
Cc: Florian Westphal 
Signed-off-by: Cong Wang 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 1ff72b87a066..4b02ab39ebc5 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -154,8 +154,12 @@ clusterip_config_find_get(struct net *net, __be32 clusterip, int entry)
 #endif
 		if (unlikely(!refcount_inc_not_zero(&c->refcount)))
 			c = NULL;
-		else if (entry)
-			refcount_inc(&c->entries);
+		else if (entry) {
+			if (unlikely(!refcount_inc_not_zero(&c->entries))) {
+				clusterip_config_put(c);
+				c = NULL;
+			}
+		}
 	}
 	rcu_read_unlock_bh();
 
-- 
2.13.0
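
Editorial note: the "increment unless already zero" behaviour that the
patch above switches to can be sketched in userspace with C11 atomics.
This is a simplified stand-in and not the kernel's refcount_t
implementation (it has no saturation handling); ref_inc_not_zero() is a
hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the object is already on its way to being freed,
 * so the caller must not touch it.
 */
static bool ref_inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}
```

An unconditional increment would instead move a count of 0 back to 1,
which is the resurrection that leads to the double list_del() mentioned
in the commit message.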

Re: [PATCH v2 1/2] net, can, ifi: fix "write buffer full" error

2018-02-08 Thread Marek Vasut

On 02/08/2018 10:33 PM, Marc Kleine-Budde wrote:
> On 02/08/2018 08:22 PM, Marek Vasut wrote:
>> On 02/08/2018 03:46 PM, Marc Kleine-Budde wrote:
>>> On 02/08/2018 07:47 AM, Heiko Schocher wrote:
>>>> the driver reads in the ISR first the IRQpending register,
>>>> and clears after that in a write *all* bits in it.
>>>>
>>>> It could happen that the isr register raise bits between
>>>> this 2 register accesses, which leads in lost bits ...
>>>>
>>>> In case it clears "TX message sent successfully", the driver
>>>> never sends any Tx data, and buffers to userspace run over.
>>>>
>>>> Fixed this:
>>>> clear only the bits in the IRQpending register, the
>>>> driver had read.
>>>>
>>>> Signed-off-by: Heiko Schocher
>>>> Reviewed-by: Marek Vasut
>>>
>>> Applied both to linux-can.
>>
>> Can you also apply them to stable, so they get into 4.9.x etc ?
> 
> I've already added stable on Cc, so they will be picked up by the stable
> maintainers.

Thanks!

-- 
Best regards,
Marek Vasut

Re: [PATCH v2 1/2] net, can, ifi: fix "write buffer full" error

2018-02-08 Thread Marc Kleine-Budde

On 02/08/2018 08:22 PM, Marek Vasut wrote:
> On 02/08/2018 03:46 PM, Marc Kleine-Budde wrote:
>> On 02/08/2018 07:47 AM, Heiko Schocher wrote:
>>> the driver reads in the ISR first the IRQpending register,
>>> and clears after that in a write *all* bits in it.
>>>
>>> It could happen that the isr register raise bits between
>>> this 2 register accesses, which leads in lost bits ...
>>>
>>> In case it clears "TX message sent successfully", the driver
>>> never sends any Tx data, and buffers to userspace run over.
>>>
>>> Fixed this:
>>> clear only the bits in the IRQpending register, the
>>> driver had read.
>>>
>>> Signed-off-by: Heiko Schocher 
>>> Reviewed-by: Marek Vasut 
>>
>> Applied both to linux-can.
> 
> Can you also apply them to stable, so they get into 4.9.x etc ?

I've already added stable on Cc, so they will be picked up by the stable
maintainers.

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



Re: [PATCH] net: Whitelist the skbuff_head_cache "cb" field

2018-02-08 Thread David Miller

From: Kees Cook 
Date: Fri, 9 Feb 2018 08:01:12 +1100

> Cool, thanks. And just to be clear, if it's not already obvious, this
> patch needs kmem_cache_create_usercopy() which just landed in Linus's
> tree last week, in case you've not merged yet.

Understood, and 'net' has it.

Re: [PATCH] net: Whitelist the skbuff_head_cache "cb" field

2018-02-08 Thread Kees Cook

On Fri, Feb 9, 2018 at 7:16 AM, David Miller  wrote:
> From: Kees Cook 
> Date: Wed, 7 Feb 2018 17:44:38 -0800
>
>> Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
>> Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
>> result of the cmsg header calculations. This means that hardened usercopy
>> will examine the copy, even though it was technically a fixed size and
>> should be implicitly whitelisted. All the put_cmsg() calls being built
>> from values in skbuff_head_cache are coming out of the protocol-defined
>> "cb" field, so whitelist this field entirely instead of creating per-use
>> bounce buffers, for which there are concerns about performance.
>>
>> Original report was:
>  ...
>> Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
>> Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
>> Signed-off-by: Kees Cook 
>> ---
>> I tried the inlining, it was awful. Splitting put_cmsg() was awful. So,
>> instead, whitelist the "cb" field as the least bad option if bounce
>> buffers are unacceptable. Dave, do you want to take this through net, or
>> should I take it through the usercopy tree?
>
> Thanks Kees, I'll take this through my 'net' tree.

Cool, thanks. And just to be clear, if it's not already obvious, this
patch needs kmem_cache_create_usercopy() which just landed in Linus's
tree last week, in case you've not merged yet.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH net v3] net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT

2018-02-08 Thread David Miller

From: Heiner Kallweit 
Date: Thu, 8 Feb 2018 21:01:48 +0100

> This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
> long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates
> also PHY state changes and we should do what the symbol says.
> 
> Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start")
> Signed-off-by: Heiner Kallweit 

Applied and queued up for -stable, thank you.

Re: net: thunder: change q_len's type to handle max ring size

2018-02-08 Thread David Miller

From: Dean Nelson 
Date: 

> The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries per.
> The number of entries is stored in the q_len member of struct q_desc_mem. The
> problem is that q_len being a u16, results in 65536 becoming 0.
> 
> In getting pointers to descriptors in the rings, the driver uses q_len minus 1
> as a mask after incrementing the pointer, in order to go back to the beginning
> and not go past the end of the ring.
> 
> With the q_len set to 0 the mask is no longer correct and the driver does go
> beyond the end of the ring, causing various ills. Usually the first thing that
> shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
> warning.
> 
> This patch remedies the problem by changing q_len to a u32.
> 
> Signed-off-by: Dean Nelson 

Applied, thanks.

Another way to solve this could have been to encode that length
as "length - 1"
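
Editorial note: the wrap-around problem and both proposed remedies can be
illustrated with a small userspace sketch. It assumes the index is
advanced as "(idx + 1) & (q_len - 1)", which is inferred from the
description above rather than taken from the driver source; all function
names are hypothetical.

```c
#include <stdint.h>

/* u16 length: 65536 is stored as 0, so (q_len - 1) is -1, an all-ones mask. */
static uint32_t next_idx_u16(uint32_t idx, uint32_t ring_entries)
{
	uint16_t q_len = (uint16_t)ring_entries;

	return (idx + 1) & (uint32_t)(q_len - 1);
}

/* u32 length (the applied fix): the mask is ring_entries - 1 as intended. */
static uint32_t next_idx_u32(uint32_t idx, uint32_t ring_entries)
{
	uint32_t q_len = ring_entries;

	return (idx + 1) & (q_len - 1);
}

/* "length - 1" encoding: 65535 still fits in 16 bits and is the mask itself. */
static uint32_t next_idx_len_minus_1(uint32_t idx, uint32_t ring_entries)
{
	uint16_t q_mask = (uint16_t)(ring_entries - 1);

	return (idx + 1) & q_mask;
}
```

For a 65536-entry ring and idx 65535, the u16 variant yields 65536 (one
past the end) while the other two wrap back to 0.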

Re: pull-request: wireless-drivers-next 2018-02-08

2018-02-08 Thread David Miller

From: Kalle Valo 
Date: Thu, 08 Feb 2018 19:54:15 +0200

> first set of fixes for 4.16, unusually many when the merge window hasn't
> even closed yet. Especially the ssb fix is important so I hope there's
> still time to get this to 4.16-rc1. As you can see from the diffstat
> there's one PCI id addition but that has been acked by Bjorn.
> 
> Please let me know if you have any problems.

Device ID additions are always ok regardless of what state of development
we are in.

Pulled, thanks Kalle.

Re: [net 1/1] tipc: fix skb truesize/datasize ratio control

2018-02-08 Thread David Miller

From: Jon Maloy 
Date: Thu, 8 Feb 2018 17:16:25 +0100

> From: Hoang Le 
> 
> In commit d618d09a68e4 ("tipc: enforce valid ratio between skb truesize
> and contents") we introduced a test for ensuring that the condition
> truesize/datasize <= 4 is true for a received buffer. Unfortunately this
> test has two problems.
> 
> - Because of the integer arithmetics the test
>   if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
>   ratios [4 < ratio < 5], which was not the intention.
> - The buffer returned by skb_copy() inherits skb->truesize of the
>   original buffer, which doesn't help the situation at all.
> 
> In this commit, we change the ratio condition and replace skb_copy()
> with a call to skb_copy_expand() to finally get this right.
> 
> Acked-by: Jon Maloy 
> Signed-off-by: Jon Maloy 

Applied, thanks Jon.
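
Editorial note: the first problem listed in the commit message (integer
division hiding ratios between 4 and 5) can be shown with a short sketch.
The multiplication form below is one common way to avoid the truncation;
it is an illustration only and not necessarily the condition the patch
uses. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Integer division floors the ratio, so 4 < ratio < 5 evaluates to 4. */
static bool ratio_too_high_div(unsigned int truesize, unsigned int datasize)
{
	return truesize / datasize > 4;
}

/* Cross-multiplying keeps the fractional part of the comparison. */
static bool ratio_too_high_mul(unsigned int truesize, unsigned int datasize)
{
	return (unsigned long long)truesize > 4ULL * datasize;
}
```

With truesize 900 and datasize 200 (ratio 4.5) the division form reports
"fine" while the multiplication form flags the buffer.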

Re: [PATCH net] net/sched: cls_u32: fix cls_u32 on filter replace

2018-02-08 Thread David Miller

From: Ivan Vecera 
Date: Thu,  8 Feb 2018 16:10:39 +0100

> The following sequence is currently broken:
> 
>  # tc qdisc add dev foo ingress
>  # tc filter replace dev foo protocol all ingress \
>u32 match u8 0 0 action mirred egress mirror dev bar1
>  # tc filter replace dev foo protocol all ingress \
>handle 800::800 pref 49152 \
>u32 match u8 0 0 action mirred egress mirror dev bar2
>  Error: cls_u32: Key node flags do not match passed flags.
>  We have an error talking to the kernel, -1
> 
> The error comes from u32_change() when comparing new and
> existing flags. The existing ones always contain one of the
> TCA_CLS_FLAGS_{,NOT}_IN_HW flags depending on offloading state.
> These flags cannot be passed from userspace so the condition
> (n->flags != flags) in u32_change() always fails.
> 
> Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
> TCA_CLS_FLAGS_IN_HW are not taken into account.
> 
> Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status")
> Signed-off-by: Ivan Vecera 

Ugh, private kernel flags are always troublesome for this reason.

Applied and queued up for -stable.
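
Editorial note: the idea of ignoring kernel-private flags when comparing
against user-supplied flags can be sketched as follows. The bit values are
assumed to match the uapi pkt_cls.h definitions and should be treated as
illustrative; flags_match() is a hypothetical helper, not the code in
u32_change().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check the uapi header before relying on them. */
#define TCA_CLS_FLAGS_SKIP_HW   (1u << 0)
#define TCA_CLS_FLAGS_SKIP_SW   (1u << 1)
#define TCA_CLS_FLAGS_IN_HW     (1u << 2)
#define TCA_CLS_FLAGS_NOT_IN_HW (1u << 3)

/* Compare flags while masking out bits that userspace can never pass in. */
static bool flags_match(uint32_t node_flags, uint32_t user_flags)
{
	const uint32_t kernel_only = TCA_CLS_FLAGS_IN_HW |
				     TCA_CLS_FLAGS_NOT_IN_HW;

	return (node_flags & ~kernel_only) == user_flags;
}
```

A node carrying only TCA_CLS_FLAGS_NOT_IN_HW then matches a request with
no flags, which a plain "n->flags != flags" comparison rejects.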

Re: [PATCH] mpls, nospec: Sanitize array index in mpls_label_ok()

2018-02-08 Thread David Miller

From: Dan Williams 
Date: Wed, 07 Feb 2018 22:34:24 -0800

> mpls_label_ok() validates that the 'platform_label' array index from a
> userspace netlink message payload is valid. Under speculation the
> mpls_label_ok() result may not resolve in the CPU pipeline until after
> the index is used to access an array element. Sanitize the index to zero
> to prevent userspace-controlled arbitrary out-of-bounds speculation, a
> precursor for a speculative execution side channel vulnerability.
> 
> Cc: 
> Cc: "David S. Miller" 
> Cc: Eric W. Biederman 
> Signed-off-by: Dan Williams 

Applied, thank you.
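
Editorial note: "sanitize the index to zero" means replacing a branch on
the bounds check with branch-free arithmetic. The sketch below is modeled
on the generic mask trick used for this purpose and only demonstrates the
arithmetic in userspace; it gives no speculation guarantees outside the
kernel, and index_mask() and clamp_index() are hypothetical names.

```c
#include <limits.h>

/*
 * All-ones when index < size, zero otherwise, computed without a branch.
 * Relies on the conversion to long and the right shift of a negative
 * value behaving as on common two's-complement compilers (both are
 * implementation-defined in ISO C).
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

An out-of-bounds index is therefore forced to element 0 even if the CPU
has not yet resolved the preceding validity check.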

Re: [PATCH V2 net-next] rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management

2018-02-08 Thread David Miller

From: Sowmini Varadhan 
Date: Thu, 8 Feb 2018 15:19:05 -0500

> I was just checking the patchq for the fate of this patch and find
> it marked "superseded" in http://patchwork.ozlabs.org/patch/868902/
> 
> I'm intrigued, superseded by what?

My bad, I'll apply this.  I may have mis-clicked a button or something
like that.

Thanks.

Re: [PATCH V2 net-next] rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management

2018-02-08 Thread Sowmini Varadhan


I was just checking the patchq for the fate of this patch and find
it marked "superseded" in http://patchwork.ozlabs.org/patch/868902/

I'm intrigued, superseded by what?

--Sowmini

Re: [PATCH] net: Whitelist the skbuff_head_cache "cb" field

2018-02-08 Thread David Miller

From: Kees Cook 
Date: Wed, 7 Feb 2018 17:44:38 -0800

> Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
> Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
> result of the cmsg header calculations. This means that hardened usercopy
> will examine the copy, even though it was technically a fixed size and
> should be implicitly whitelisted. All the put_cmsg() calls being built
> from values in skbuff_head_cache are coming out of the protocol-defined
> "cb" field, so whitelist this field entirely instead of creating per-use
> bounce buffers, for which there are concerns about performance.
> 
> Original report was:
 ...
> Reported-by: syzbot+e2d6cfb305e9f3911...@syzkaller.appspotmail.com
> Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
> Signed-off-by: Kees Cook 
> ---
> I tried the inlining, it was awful. Splitting put_cmsg() was awful. So,
> instead, whitelist the "cb" field as the least bad option if bounce
> buffers are unacceptable. Dave, do you want to take this through net, or
> should I take it through the usercopy tree?

Thanks Kees, I'll take this through my 'net' tree.

Re: [PATCH] net: Extra '_get' in declaration of arch_get_platform_mac_address

2018-02-08 Thread David Miller

From: Mathieu Malaterre 
Date: Wed,  7 Feb 2018 20:35:00 +0100

> In commit c7f5d105495a ("net: Add eth_platform_get_mac_address() helper."),
> two declarations were added:
> 
>   int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
>   unsigned char *arch_get_platform_get_mac_address(void);
> 
> An extra '_get' was introduced in arch_get_platform_get_mac_address, remove
> it. Fix compile warning using W=1:
> 
>   CC  net/ethernet/eth.o
> net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
>  unsigned char * __weak arch_get_platform_mac_address(void)
> ^
>   AR  net/ethernet/built-in.o
> 
> Signed-off-by: Mathieu Malaterre 

I'm very surprised this warning didn't get triggered before.

That's really weird!

Applied, thank you.

Re: [PATCH net-next] ibmvnic: queue reset when CRQ gets closed during reset

2018-02-08 Thread David Miller

From: Nathan Fontenot 
Date: Wed, 07 Feb 2018 13:00:24 -0600

> While handling a driver reset we get a H_CLOSED return trying
> to send a CRQ event. When this occurs we need to queue up another
> reset attempt. Without doing this we see instances where the driver
> is left in a closed state because the reset failed and there is no
> further attempts to reset the driver.
> 
> Signed-off-by: Nathan Fontenot 

Applied.

Re: [PATCH v2 1/2] net, can, ifi: fix "write buffer full" error

2018-02-08 Thread Marek Vasut

On 02/08/2018 03:46 PM, Marc Kleine-Budde wrote:
> On 02/08/2018 07:47 AM, Heiko Schocher wrote:
>> the driver reads in the ISR first the IRQpending register,
>> and clears after that in a write *all* bits in it.
>>
>> It could happen that the isr register raise bits between
>> this 2 register accesses, which leads in lost bits ...
>>
>> In case it clears "TX message sent successfully", the driver
>> never sends any Tx data, and buffers to userspace run over.
>>
>> Fixed this:
>> clear only the bits in the IRQpending register, the
>> driver had read.
>>
>> Signed-off-by: Heiko Schocher 
>> Reviewed-by: Marek Vasut 
> 
> Applied both to linux-can.

Can you also apply them to stable, so they get into 4.9.x etc ?

Thanks!

-- 
Best regards,
Marek Vasut
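
Editorial note: the lost-interrupt race described in the quoted commit
message can be modeled in a few lines. The sketch assumes a
write-1-to-clear pending register and simulates the hardware raising a
new bit between the read and the clearing write; struct fake_dev and all
function names are hypothetical and unrelated to the actual driver code.

```c
#include <stdint.h>

/* Simulated device with a write-1-to-clear interrupt-pending register. */
struct fake_dev {
	uint32_t irq_pending;
};

static uint32_t reg_read(const struct fake_dev *d)
{
	return d->irq_pending;
}

static void reg_write_clear(struct fake_dev *d, uint32_t bits)
{
	d->irq_pending &= ~bits;
}

/* Old behaviour: clears every bit, including ones raised after the read. */
static uint32_t isr_clear_all(struct fake_dev *d, uint32_t raised_between)
{
	uint32_t seen = reg_read(d);

	d->irq_pending |= raised_between;	/* hardware raises new bits */
	reg_write_clear(d, 0xffffffffu);
	return seen;
}

/* Fixed behaviour: clears only the bits the handler actually read. */
static uint32_t isr_clear_seen(struct fake_dev *d, uint32_t raised_between)
{
	uint32_t seen = reg_read(d);

	d->irq_pending |= raised_between;	/* hardware raises new bits */
	reg_write_clear(d, seen);
	return seen;
}
```

In the first variant a bit raised in the window (for example a
"TX done" event) disappears without ever being handled; in the second it
stays pending for the next invocation.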

Re: [PATCH] atm: he: use 64-bit arithmetic instead of 32-bit

2018-02-08 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 7 Feb 2018 10:17:29 -0600

> Add suffix ULL to constants 272, 204, 136 and 68 in order to give the
> compiler complete information about the proper arithmetic to use.
> Notice that these constants are used in contexts that expect
> expressions of type unsigned long long (64 bits, unsigned).
> 
> The following expressions are currently being evaluated using 32-bit
> arithmetic:
> 
> 272 * mult
> 204 * mult
> 136 * mult
> 68 * mult
> 
> Addresses-Coverity-ID: 201058
> Signed-off-by: Gustavo A. R. Silva 

Applied, thanks.
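
Editorial note: the effect of the ULL suffix can be seen in a small
userspace sketch. It assumes mult is a 32-bit unsigned value, which is an
assumption made for illustration (the type used in the driver is not
shown in this thread); the function names are hypothetical.

```c
#include <stdint.h>

/* 272 is an int, so the product is computed in 32 bits and wraps. */
static uint64_t scaled_32(uint32_t mult)
{
	return 272 * mult;
}

/* 272ULL forces the multiplication itself to be done in 64 bits. */
static uint64_t scaled_64(uint32_t mult)
{
	return 272ULL * mult;
}
```

Widening the result type alone does not help: the truncation happens
before the value is converted to the 64-bit return type.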

Re: [Patch net] ipt_CLUSTERIP: fix a refcount bug in clusterip_config_find_get()

2018-02-08 Thread Cong Wang

On Thu, Feb 8, 2018 at 12:01 AM, Florian Westphal  wrote:
> Cong Wang  wrote:
>> In clusterip_config_find_get() we hold RCU read lock so it could
>> run concurrently with clusterip_config_entry_put(), as a result,
>> the refcnt could go back to 1 from 0, which leads to a double
>> list_del()... Just replace refcount_inc() with
>> refcount_inc_not_zero(), as for c->refcount.
>>
>> Fixes: d73f33b16883 ("netfilter: CLUSTERIP: RCU conversion")
>> Cc: Eric Dumazet 
>> Cc: Pablo Neira Ayuso 
>> Signed-off-by: Cong Wang 
>> ---
>>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> index 1ff72b87a066..4537b1686c7c 100644
>> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> @@ -154,8 +154,10 @@ clusterip_config_find_get(struct net *net, __be32 clusterip, int entry)
>>  #endif
>>   if (unlikely(!refcount_inc_not_zero(&c->refcount)))
>>   c = NULL;
>> - else if (entry)
>> - refcount_inc(&c->entries);
>> + else if (entry) {
>> + if (unlikely(!refcount_inc_not_zero(&c->entries)))
>
> this needs to call clusterip_config_put(c); too, else we leak one
> reference.
>
> Other than that this looks good.

Right, good catch! I will send v2.

Re: [PATCH net v3] net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT

2018-02-08 Thread Florian Fainelli

On 02/08/2018 12:01 PM, Heiner Kallweit wrote:
> This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
> long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates
> also PHY state changes and we should do what the symbol says.
> 
> Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start")
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Florian Fainelli 

Thanks Heiner!
-- 
Florian

[PATCH net v3] net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT

2018-02-08 Thread Heiner Kallweit

This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates
also PHY state changes and we should do what the symbol says.

Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start")
Signed-off-by: Heiner Kallweit 
---
v2:
- use phy_interrupt_is_valid() instead of checking for irq > 0
v3:
- added "Fixes" tag
- fix is a candidate for stable, v4.9+
---
 drivers/net/phy/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3313a129..50ed35a45 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -822,7 +822,7 @@ void phy_start(struct phy_device *phydev)
phy_resume(phydev);
 
/* make sure interrupts are re-enabled for the PHY */
-   if (phydev->irq != PHY_POLL) {
+   if (phy_interrupt_is_valid(phydev)) {
err = phy_enable_interrupts(phydev);
if (err < 0)
break;
-- 
2.16.1
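
Editorial note: the difference between the old and new condition can be
sketched in isolation. PHY_POLL is assumed to be -1 and
PHY_IGNORE_INTERRUPT is -2 as stated in the commit message; old_check()
and irq_is_valid() are hypothetical stand-ins for the two conditions in
the diff above, not the phylib implementation.

```c
#include <stdbool.h>

#define PHY_POLL             -1	/* assumed value */
#define PHY_IGNORE_INTERRUPT -2	/* value given in the commit message */

/* Old condition: treats PHY_IGNORE_INTERRUPT like a real interrupt line. */
static bool old_check(int irq)
{
	return irq != PHY_POLL;
}

/* New condition: only a real IRQ number counts as a valid PHY interrupt. */
static bool irq_is_valid(int irq)
{
	return irq != PHY_POLL && irq != PHY_IGNORE_INTERRUPT;
}
```

The two checks only disagree for PHY_IGNORE_INTERRUPT, which is exactly
the case the patch addresses.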

Linux Plumbers Networking Track CFP

2018-02-08 Thread David Miller


This is a call for proposals for the networking track at the Linux
Plumbers Conference in Vancouver, which will be happening on November
13th and November 14th.

We are seeking talks of 40 minutes in length, accompanied by papers
of 2 to 10 pages in length.

Please submit your proposals to the LPC Networking Technical Committee
at:

lpc-net...@vger.kernel.org

Proposals must be submitted by July 11th, and submitters will be
notified of acceptance by August 15th.

The format of the submission and other details can be found at:

   http://vger.kernel.org/lpc-networking.html

We are looking forward to seeing everyone in November!

Re: [PATCH v3 1/1] tcp: Honor the eor bit in tcp_mtu_probe

2018-02-08 Thread David Miller

From: Ilya Lesokhin 
Date: Wed,  7 Feb 2018 15:13:11 +0200

> +static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len)
> +{
> + struct sk_buff *skb, *next;
> +
> + skb = tcp_send_head(sk);
> + tcp_for_write_queue_from_safe(skb, next, sk)
> + {

Please format tcp_for_write_queue_from_safe() uses like a real for loop,
meaning that the opening curly brace goes on the same line as the
tcp_for_write_queue_from_safe() statement.

Thank you.

[RFC PATCH bpf-next 2/2] bpf/verifier: update selftests

2018-02-08 Thread Edward Cree

Error messages for some bad programs have changed, partly because we now
 check for loops / out-of-bounds jumps before checking subprogs.

Problematic selftests:
513 calls: wrong recursive calls
 This is now rejected with 'unreachable insn 1'.  I'm not entirely sure what
 it was meant to do/test, since all of the JMP|CALLs are also unreachable.
546 calls: ld_abs with changing ctx data in callee
 Rejected with R1 !read_ok.  It was testing for the "can't mix LD_ABS with
 function calls", which has now changed to "can't use LD_ABS in functions
 other than main()".  I'm still not 100% sure that's right though.

Signed-off-by: Edward Cree 
---
 tools/testing/selftests/bpf/test_verifier.c | 46 ++---
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 697bd83de295..9c7531887ee3 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -644,7 +644,7 @@ static struct bpf_test tests[] = {
.insns = {
BPF_ALU64_REG(BPF_MOV, BPF_REG_0, BPF_REG_2),
},
-   .errstr = "not an exit",
+   .errstr = "jump out of range",
.result = REJECT,
},
{
@@ -9288,7 +9288,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "last insn is not an exit or jmp",
+   .errstr = "insn 1 was in subprog 1, now 0",
.result = REJECT,
},
{
@@ -9354,7 +9354,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "jump out of range",
+   .errstr = "insn 5 was in subprog 1, now 0",
.result = REJECT,
},
{
@@ -9633,7 +9633,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
-   .errstr = "jump out of range from insn 1 to 4",
+   .errstr = "insn 5 was in subprog 1, now 0",
.result = REJECT,
},
{
@@ -9649,13 +9649,12 @@ static struct bpf_test tests[] = {
BPF_ALU64_REG(BPF_ADD, BPF_REG_7, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_0, BPF_REG_7),
BPF_EXIT_INSN(),
-   BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
-   offsetof(struct __sk_buff, len)),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 8),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, -3),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "jump out of range from insn 11 to 9",
+   .errstr = "insn 9 was in subprog 1, now 2",
.result = REJECT,
},
{
@@ -9707,7 +9706,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "invalid destination",
+   .errstr = "jump out of range from insn 2 to -1",
.result = REJECT,
},
{
@@ -9719,7 +9718,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "invalid destination",
+   .errstr = "jump out of range from insn 2 to -2147483646",
.result = REJECT,
},
{
@@ -9732,7 +9731,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "jump out of range",
+   .errstr = "insn 1 was in subprog 0, now 1",
.result = REJECT,
},
{
@@ -9745,7 +9744,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "jump out of range",
+   .errstr = "insn 4 was in subprog 1, now 0",
.result = REJECT,
},
{
@@ -9759,7 +9758,7 @@ static struct bpf_test tests[] = {
BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, -2),
},
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
-   .errstr = "not an exit",
+   .errstr = "jump out of range from insn 5 to 6",
.result = REJECT,
},
{
@@ -9773,7 +9772,7 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),

Re: [PATCH net 1/1 v4] rtnetlink: require unique netns identifier

2018-02-08 Thread David Miller

From: Christian Brauner 
Date: Wed,  7 Feb 2018 13:53:20 +0100

> Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
> it is possible for userspace to send us requests with three different
> properties to identify a target network namespace. This affects at least
> RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
> network namespace which is confusing. For legacy reasons the kernel will
> pick the IFLA_NET_NS_PID property first and then look for the
> IFLA_NET_NS_FD property but there is no reason to extend this type of
> behavior to network namespace ids. The regression potential is quite
> minimal since the rtnetlink requests in question either won't allow
> IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
> support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
> 
> Signed-off-by: Christian Brauner 

Applied, thanks Christian.
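
The patch body is not quoted above, so the following is only a sketch of the rule the commit message describes: a request may carry at most one of the three namespace identifiers. The struct and function names are hypothetical and the error code is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical container for the three optional netns attributes. */
struct netns_attrs {
	const void *pid;	/* IFLA_NET_NS_PID, NULL when absent */
	const void *fd;		/* IFLA_NET_NS_FD, NULL when absent */
	const void *nsid;	/* IFLA_IF_NETNSID, NULL when absent */
};

/* Return 0 when at most one identifier is present, -EINVAL otherwise. */
static int ensure_unique_netns(const struct netns_attrs *a)
{
	int present = (a->pid != NULL) + (a->fd != NULL) + (a->nsid != NULL);

	return present > 1 ? -EINVAL : 0;
}
```

Rejecting ambiguous requests up front avoids having to define a precedence order among the three properties.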

[RFC PATCH bpf-next 1/2] bpf/verifier: validate func_calls by marking at do_check() time

2018-02-08 Thread Edward Cree

Removes a couple of passes from the verifier, one to check subprogs don't
 overlap etc., and one to compute max stack depth (which now is done by
 topologically sorting the call graph).

Signed-off-by: Edward Cree 
---
 include/linux/bpf_verifier.h |  24 ++-
 kernel/bpf/verifier.c| 425 +++
 2 files changed, 242 insertions(+), 207 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 6b66cd1aa0b9..0387e0c61161 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -146,6 +146,8 @@ struct bpf_insn_aux_data {
s32 call_imm;   /* saved imm field of call insn */
};
int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
+   u16 subprogno; /* subprog in which this insn resides, valid iff @seen */
+   u16 subprog_off; /* insn_idx within subprog, computed in jit_subprogs */
bool seen; /* this insn was processed by the verifier */
 };
 
@@ -168,6 +170,15 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifer_log *log)
 
 #define BPF_MAX_SUBPROGS 256
 
+struct bpf_subprog_info {
+   /* which other subprogs does this one directly call? */
+   DECLARE_BITMAP(callees, BPF_MAX_SUBPROGS);
+   u32 start; /* insn idx of function entry point */
+   u16 stack_depth; /* max. stack depth used by this function */
+   u16 total_stack_depth; /* max. stack depth used by entire call chain */
+   u16 len; /* #insns in this subprog */
+};
+
 /* single container for all structs
  * one verifier_env per bpf_check() call
  */
@@ -186,20 +197,23 @@ struct bpf_verifier_env {
bool seen_direct_write;
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
struct bpf_verifer_log log;
-   u32 subprog_starts[BPF_MAX_SUBPROGS];
-   /* computes the stack depth of each bpf function */
-   u16 subprog_stack_depth[BPF_MAX_SUBPROGS + 1];
+   struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS];
u32 subprog_cnt;
 };
 
 __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
   const char *fmt, ...);
 
-static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+static inline struct bpf_func_state *cur_frame(struct bpf_verifier_env *env)
 {
struct bpf_verifier_state *cur = env->cur_state;
 
-   return cur->frame[cur->curframe]->regs;
+   return cur->frame[cur->curframe];
+}
+
+static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+{
+   return cur_frame(env)->regs;
 }
 
 int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..2dc69fb3bfbe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -728,111 +728,49 @@ enum reg_arg_type {
DST_OP_NO_MARK  /* same as above, check only, don't mark */
 };
 
-static int cmp_subprogs(const void *a, const void *b)
+static int find_subprog(struct bpf_verifier_env *env, int insn_idx)
 {
-   return *(int *)a - *(int *)b;
-}
-
-static int find_subprog(struct bpf_verifier_env *env, int off)
-{
-   u32 *p;
+   struct bpf_insn_aux_data *aux;
+   int insn_cnt = env->prog->len;
+   u32 subprogno;
 
-   p = bsearch(&off, env->subprog_starts, env->subprog_cnt,
-   sizeof(env->subprog_starts[0]), cmp_subprogs);
-   if (!p)
+   if (insn_idx >= insn_cnt || insn_idx < 0) {
+   verbose(env, "find_subprog of invalid insn_idx %d\n", insn_idx);
+   return -EINVAL;
+   }
+   aux = &env->insn_aux_data[insn_idx];
+   if (!aux->seen) /* haven't visited this line yet */
return -ENOENT;
-   return p - env->subprog_starts;
-
+   subprogno = aux->subprogno;
+   /* validate that we are at start of subprog */
+   if (env->subprog_info[subprogno].start != insn_idx) {
+   verbose(env, "insn_idx %d is in subprog %u but that starts at %d\n",
+   insn_idx, subprogno, env->subprog_info[subprogno].start);
+   return -EINVAL;
+   }
+   return subprogno;
 }
 
 static int add_subprog(struct bpf_verifier_env *env, int off)
 {
int insn_cnt = env->prog->len;
+   struct bpf_subprog_info *info;
int ret;
 
if (off >= insn_cnt || off < 0) {
verbose(env, "call to invalid destination\n");
return -EINVAL;
}
-   ret = find_subprog(env, off);
-   if (ret >= 0)
-   return 0;
if (env->subprog_cnt >= BPF_MAX_SUBPROGS) {
verbose(env, "too many subprograms\n");
return -E2BIG;
}
-   env->subprog_starts[env->subprog_cnt++] = off;
-   sort(env->subprog_starts, env->subprog_cnt,
-sizeof(env->subprog_starts[0]), cmp_subprogs, NULL);
-

[RFC PATCH bpf-next 0/2] bpf/verifier: simplify subprog tracking

2018-02-08 Thread Edward Cree

By storing subprog boundaries as a subprogno mark on each insn, rather than
 a start (and implicit end) for each subprog, we collect a number of gains:
* More efficient determination of which subprog contains a given insn, and
  thus of find_subprog (which subprog begins at a given insn).
* Number of verifier passes is reduced, since most of the work is done in
  the main insn walk (do_check()).
* Subprogs no longer have to be contiguous; so long as they don't overlap
  and there are no unreachable insns, verifier is happy.  (This does require
  a small amount of care at jit_subprogs() time to fix up jump offsets, so
  we could instead disallow this if people prefer.)

Some other changes were also included to support this:
* Per-subprog info is stored in env->subprog_info, an array of structs,
  rather than several arrays with a common index.
* Call graph is now stored in the new bpf_subprog_info struct; used here for
  check_max_stack_depth() but may have other uses too.
* LD_ABS and LD_IND were previously disallowed in programs that also contain
  subprog calls.  Now they are only disallowed in callees, i.e. main() can
  always use them even if it also uses subprog calls.  AFAICT this is safe
  (main()'s r1 arg is still known to be ctx, so prologue can do its stuff).
  But again it can be disallowed if necessary.

Most tests in test_verifier pass (a few had to be changed to expect different
 failure messages), but there are a couple I wasn't quite sure what to do
 with - see comment on patch #2.

Edward Cree (2):
  bpf/verifier: validate func_calls by marking at do_check() time
  bpf/verifier: update selftests

 include/linux/bpf_verifier.h|  24 +-
 kernel/bpf/verifier.c   | 425 +++-
 tools/testing/selftests/bpf/test_verifier.c |  46 +--
 3 files changed, 271 insertions(+), 224 deletions(-)
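
As a rough illustration of how a stored call graph can yield the maximum stack depth, here is a simplified userspace sketch. It is not the code from the patch: struct subprog, MAX_SUBPROGS and total_depth() are hypothetical, a byte array replaces the bitmap, and the call graph is assumed to be acyclic.

```c
#include <stdint.h>

#define MAX_SUBPROGS 8	/* small illustrative limit, not BPF_MAX_SUBPROGS */

/* Hypothetical, simplified per-subprog record. */
struct subprog {
	uint8_t callees[MAX_SUBPROGS];	/* callees[j] != 0: calls subprog j */
	uint16_t stack_depth;		/* stack used by this function alone */
	uint16_t total_stack_depth;	/* deepest call chain rooted here */
	uint8_t done;			/* total_stack_depth already computed */
};

/*
 * Compute the deepest stack use of any call chain starting at subprog idx.
 * With an acyclic call graph, this post-order walk finishes callees before
 * their callers, i.e. it processes subprogs in reverse topological order.
 */
static uint16_t total_depth(struct subprog *sp, int cnt, int idx)
{
	uint16_t deepest = 0;
	int j;

	if (sp[idx].done)
		return sp[idx].total_stack_depth;
	for (j = 0; j < cnt; j++) {
		uint16_t d;

		if (!sp[idx].callees[j])
			continue;
		d = total_depth(sp, cnt, j);
		if (d > deepest)
			deepest = d;
	}
	sp[idx].total_stack_depth = sp[idx].stack_depth + deepest;
	sp[idx].done = 1;
	return sp[idx].total_stack_depth;
}
```

The memoised result per subprog means each function's depth is computed once, however many callers it has.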

net: thunder: change q_len's type to handle max ring size

2018-02-08 Thread Dean Nelson

The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries each.
The number of entries is stored in the q_len member of struct q_desc_mem. The
problem is that, q_len being a u16, 65536 becomes 0.

In getting pointers to descriptors in the rings, the driver uses q_len minus 1
as a mask after incrementing the pointer, in order to go back to the beginning
and not go past the end of the ring.

With the q_len set to 0 the mask is no longer correct and the driver does go
beyond the end of the ring, causing various ills. Usually the first thing that
shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
warning.

This patch remedies the problem by changing q_len to a u32.

Signed-off-by: Dean Nelson 
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index 7d1e4e2aaad0..ce1eed7a6d63 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -213,7 +213,7 @@ struct rx_tx_queue_stats {
 struct q_desc_mem {
dma_addr_t  dma;
u64 size;
-   u16 q_len;
+   u32 q_len;
dma_addr_t  phys_base;
void*base;
void*unalign_base;
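
The wrap-around described in the commit message can be reproduced in plain C. This is a userspace sketch with hypothetical helper names; the masking expression is an assumption modelled on the description ("q_len minus 1 as a mask after incrementing"), not copied from the driver.

```c
#include <stdint.h>

/* Store a ring length in a 16-bit field, as the old q_len did. */
static uint16_t store_len_u16(uint32_t entries)
{
	return (uint16_t)entries;	/* 65536 wraps to 0 */
}

/*
 * Advance an index with a 16-bit length. q_len is promoted to int, so
 * 0 - 1 is -1 and the resulting mask is all ones: the index never wraps.
 */
static uint32_t next_idx_u16(uint32_t idx, uint16_t q_len)
{
	return (idx + 1) & (uint32_t)(q_len - 1);
}

/* Advance an index with a 32-bit length: 65536 - 1 gives the mask 0xffff. */
static uint32_t next_idx_u32(uint32_t idx, uint32_t q_len)
{
	return (idx + 1) & (q_len - 1);
}
```

With the 16-bit field, the last slot of a 65536-entry ring is followed by index 65536 instead of 0, which matches the "goes beyond the end of the ring" symptom.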

Re: [PATCH net V3 1/2] ptr_ring: try vmalloc() when kmalloc() fails

2018-02-08 Thread Michael S. Tsirkin

On Thu, Feb 08, 2018 at 02:58:40PM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月08日 12:45, Michael S. Tsirkin wrote:
> > On Thu, Feb 08, 2018 at 11:59:24AM +0800, Jason Wang wrote:
> > > This patch switches to kvmalloc_array(), which falls back to
> > > vmalloc() in case kmalloc() fails.
> > > 
> > > Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
> > > Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
> > I guess the actual patch is the one that switches tun to ptr_ring.
> 
> I think not, since the issue was large allocation.
> 
> > 
> > In fact, I think the actual bugfix is patch 2/2. This specific one
> > just makes kmalloc less likely to fail but that's
> > not what syzbot reported.
> 
> Agree.
> 
> > 
> > Then I would add this patch on top to make kmalloc less likely to fail.
> 
> Ok.
> 
> > > Signed-off-by: Jason Wang 
> > 
> > 
> > > ---
> > >   include/linux/ptr_ring.h | 10 +-
> > >   1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > index 1883d61..2af71a7 100644
> > > --- a/include/linux/ptr_ring.h
> > > +++ b/include/linux/ptr_ring.h
> > > @@ -466,7 +466,7 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> > >   static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
> > >   {
> > > - return kcalloc(size, sizeof(void *), gfp);
> > > + return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
> > >   }
> > >   static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> > This implies a bunch of limitations on the flags. From kvmalloc_node
> > docs:
> > 
> >   * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
> >   * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
> >   * preferable to the vmalloc fallback, due to visible performance drawbacks.
> > 
> > Fine with all the current users, but if we go this way, please add
> > documentation so future users don't misuse this API.
> 
> I suspect this is somewhat overkill, since it means we would need to track
> mm/vmalloc changes in the future to keep the documentation in sync.
> 
> > Alternatively, test flags and call kvmalloc or kcalloc?
> 
> Similar to the above issue, I would rather leave it as is.
> 
> Thanks

How do we prevent someone from inevitably trying to use this with
GFP_ATOMIC?

> > 
> > 
> > > @@ -601,7 +601,7 @@ static inline int ptr_ring_resize(struct ptr_ring *r, 
> > > int size, gfp_t gfp,
> > >   spin_unlock(&(r)->producer_lock);
> > >   spin_unlock_irqrestore(&(r)->consumer_lock, flags);
> > > - kfree(old);
> > > + kvfree(old);
> > >   return 0;
> > >   }
> > > @@ -641,7 +641,7 @@ static inline int ptr_ring_resize_multiple(struct 
> > > ptr_ring **rings,
> > >   }
> > >   for (i = 0; i < nrings; ++i)
> > > - kfree(queues[i]);
> > > + kvfree(queues[i]);
> > >   kfree(queues);
> > > @@ -649,7 +649,7 @@ static inline int ptr_ring_resize_multiple(struct 
> > > ptr_ring **rings,
> > >   nomem:
> > >   while (--i >= 0)
> > > - kfree(queues[i]);
> > > + kvfree(queues[i]);
> > >   kfree(queues);
> > > @@ -664,7 +664,7 @@ static inline void ptr_ring_cleanup(struct ptr_ring 
> > > *r, void (*destroy)(void *))
> > >   if (destroy)
> > >   while ((ptr = ptr_ring_consume(r)))
> > >   destroy(ptr);
> > > - kfree(r->queue);
> > > + kvfree(r->queue);
> > >   }
> > >   #endif /* _LINUX_PTR_RING_H  */
> > > -- 
> > > 2.7.4

Re: [V9fs-developer] [PATCH] 9p/trans_virtio: discard zero-length reply

2018-02-08 Thread Michael S. Tsirkin

OK, I've queued it.

On Thu, Feb 08, 2018 at 06:52:32PM +0100, Greg Kurz wrote:
> Ping ?
> 
> Michael,
> 
> Since this is virtio code and you have acked the QEMU part of the fix already,
> would you be kind enough to take this through your tree ?
> 
> Cheers,
> 
> --
> Greg
> 
> On Mon, 22 Jan 2018 22:02:05 +0100
> Greg Kurz  wrote:
> 
> > When a 9p request is successfully flushed, the server is expected to just
> > mark it as used without sending a 9p reply (ie, without writing data into
> > the buffer). In this case, virtqueue_get_buf() will return len == 0 and
> > we must not report a REQ_STATUS_RCVD status to the client, otherwise the
> > client will erroneously assume the request has not been flushed.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  net/9p/trans_virtio.c |3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> > index 0845aad4ba51..ca08c72ef4de 100644
> > --- a/net/9p/trans_virtio.c
> > +++ b/net/9p/trans_virtio.c
> > @@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq)
> > spin_unlock_irqrestore(&chan->lock, flags);
> > /* Wakeup if anyone waiting for VirtIO ring space. */
> > wake_up(chan->vc_wq);
> > -   p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> > +   if (len)
> > +   p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> > }
> >  }
> >  
> > 
> > 
> > --
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > ___
> > V9fs-developer mailing list
> > v9fs-develo...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/v9fs-developer

Re: [PATCH net] tuntap: add missing xdp flush

2018-02-08 Thread David Miller

From: Jason Wang 
Date: Wed,  7 Feb 2018 17:14:46 +0800

> When using devmap to redirect packets between interfaces,
> xdp_do_flush() is usually required to flush any batched
> packets. Unfortunately, this is missing in the current tuntap
> implementation.
> 
> Most hardware drivers run XDP inside the NAPI loop and call
> xdp_do_flush() at the end of each round of polling. TAP instead runs XDP
> in process context, e.g. in tun_get_user(). So fix this by counting the
> pending redirected packets and flushing when the count exceeds
> NAPI_POLL_WEIGHT or when MSG_MORE is cleared by the sendmsg() caller.
> 
> With this fix, xdp_redirect_map works again between two TAPs.
> 
> Fixes: 761876c857cb ("tap: XDP support")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks Jason.
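
The batching rule from the commit message can be sketched in userspace as follows. The struct, the function name and the POLL_WEIGHT constant are hypothetical (the value 64 is assumed to mirror NAPI_POLL_WEIGHT), and a counter stands in for the real flush call.

```c
#include <stdbool.h>

#define POLL_WEIGHT 64	/* assumed to mirror NAPI_POLL_WEIGHT */

/* Hypothetical per-queue state; not the tun driver's actual structure. */
struct redirect_batch {
	int pending;	/* redirected packets not yet flushed */
	int flushes;	/* how many times the flush ran */
};

/*
 * Account for one redirected packet. Flush when the batch has grown past
 * POLL_WEIGHT or when the sender signalled that no more data follows.
 */
static void redirected(struct redirect_batch *b, bool more)
{
	b->pending++;
	if (!more || b->pending > POLL_WEIGHT) {
		b->flushes++;	/* stands in for xdp_do_flush() */
		b->pending = 0;
	}
}
```

Tying the flush to MSG_MORE keeps latency low for single sends, while the weight limit bounds how many packets can sit in a batch during a long burst.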

Re: [PATCH net V3 2/2] ptr_ring: fail on large queue size (>64K)

2018-02-08 Thread David Miller

From: Jason Wang 
Date: Thu,  8 Feb 2018 11:59:25 +0800

> We need to limit the maximum size of the queue, otherwise it may cause
> several side effects, e.g. slab will warn when the size exceeds
> KMALLOC_MAX_SIZE. Using KMALLOC_MAX_SIZE still looks too large, so this patch
> tries to limit it to 64K. This value could be revisited if we find a
> real case that needs more.
> 
> Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
> Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
> Signed-off-by: Jason Wang 
 ...
> @@ -466,6 +468,8 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>  
>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>  {
> + if (size > PTR_RING_MAX_ALLOC)
> + return NULL;
>   return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
>  }

This doesn't limit the allocation to 64K.  It limits it to 64K * sizeof(void *).
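
The difference between capping entries and capping bytes can be made concrete with a small sketch. The value of MAX_ENTRIES is an assumption (the definition of PTR_RING_MAX_ALLOC is not quoted above), and both helper names are hypothetical.

```c
#include <stddef.h>

#define MAX_ENTRIES 65536u	/* assumed value of PTR_RING_MAX_ALLOC */

/* The check as quoted: caps the entry count, so bytes scale with pointer size. */
static size_t ring_bytes_entry_limit(size_t entries)
{
	if (entries > MAX_ENTRIES)
		return 0;	/* rejected */
	return entries * sizeof(void *);
}

/* Alternative: cap the byte size, so fewer entries fit with wider pointers. */
static size_t ring_bytes_byte_limit(size_t entries)
{
	if (entries > MAX_ENTRIES / sizeof(void *))
		return 0;	/* rejected */
	return entries * sizeof(void *);
}
```

On a 64-bit build the first variant still permits a 512 KiB allocation, which is the point of the review comment.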

Re: [PATCH net] netlink: ensure to loop over all netns in genlmsg_multicast_allns()

2018-02-08 Thread David Miller

From: Nicolas Dichtel 
Date: Tue,  6 Feb 2018 14:48:32 +0100

> Nowadays, nlmsg_multicast() returns only 0 or -ESRCH, but this was not the
> case when commit 134e63756d5f was pushed.
> However, there was no reason to stop the loop if a netns does not have
> listeners.
> Return -ESRCH only if there were no listeners in any netns.
> 
> To avoid having the same problem in the future, I did not assume
> that nlmsg_multicast() returns only 0 or -ESRCH.
> 
> Fixes: 134e63756d5f ("genetlink: make netns aware")
> CC: Johannes Berg 
> Signed-off-by: Nicolas Dichtel 

Ok, that indeed preserves the original behavior.  Given this has
been this way since 2.6.32 I wonder if fixing this might break
something.

Regardless, applied and queued up for -stable, thanks.
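
The loop behaviour described in the commit message can be sketched in userspace like this. The send_fn callback and the three sample senders are hypothetical; aborting on errors other than -ESRCH is an assumption drawn from the last paragraph of the commit message.

```c
#include <errno.h>

/* Hypothetical per-namespace send; returns 0, -ESRCH, or another error. */
typedef int (*send_fn)(int netns);

/*
 * Try every namespace. -ESRCH from one namespace does not stop the loop;
 * any other error aborts. Report -ESRCH only if nobody received the message.
 */
static int multicast_all(send_fn send, int nr_netns)
{
	int delivered = 0;
	int i, err;

	for (i = 0; i < nr_netns; i++) {
		err = send(i);
		if (!err)
			delivered = 1;
		else if (err != -ESRCH)
			return err;
	}
	return delivered ? 0 : -ESRCH;
}

/* Sample senders used to exercise the loop. */
static int only_ns2_listens(int netns) { return netns == 2 ? 0 : -ESRCH; }
static int nobody_listens(int netns) { (void)netns; return -ESRCH; }
static int ns1_fails(int netns) { return netns == 1 ? -ENOBUFS : 0; }
```

With only_ns2_listens the loop keeps going past the first two namespaces and reports success, which is the behaviour the fix restores.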

Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout

2018-02-08 Thread David Miller

From: Grygorii Strashko 
Date: Thu, 8 Feb 2018 10:04:31 -0600

> Could this be marked as stable material 4.9+?

Sure, queued up.

Re: [PATCH net] rxrpc: Don't put crypto buffers on the stack

2018-02-08 Thread David Miller

From: David Howells 
Date: Thu, 08 Feb 2018 15:59:07 +

> Don't put buffers of data to be handed to crypto on the stack as this may
> cause an assertion failure in the kernel (see below).  Fix this by using a
> kmalloc'd buffer instead.
 ...
> Reported-by: Jonathan Billings 
> Reported-by: Marc Dionne 
> Signed-off-by: David Howells 
> Tested-by: Jonathan Billings 

Applied, thanks David.

Re: [V9fs-developer] [PATCH] 9p/trans_virtio: discard zero-length reply

2018-02-08 Thread Greg Kurz

Ping ?

Michael,

Since this is virtio code and you have acked the QEMU part of the fix already,
would you be kind enough to take this through your tree ?

Cheers,

--
Greg

On Mon, 22 Jan 2018 22:02:05 +0100
Greg Kurz  wrote:

> When a 9p request is successfully flushed, the server is expected to just
> mark it as used without sending a 9p reply (ie, without writing data into
> the buffer). In this case, virtqueue_get_buf() will return len == 0 and
> we must not report a REQ_STATUS_RCVD status to the client, otherwise the
> client will erroneously assume the request has not been flushed.
> 
> Signed-off-by: Greg Kurz 
> ---
>  net/9p/trans_virtio.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 0845aad4ba51..ca08c72ef4de 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -171,7 +171,8 @@ static void req_done(struct virtqueue *vq)
>   spin_unlock_irqrestore(&chan->lock, flags);
>   /* Wakeup if anyone waiting for VirtIO ring space. */
>   wake_up(chan->vc_wq);
> - p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> + if (len)
> + p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
>   }
>  }
>  

Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-08 Thread Naveen N. Rao


Alexei Starovoitov wrote:

On 2/8/18 4:03 AM, Sandipan Das wrote:

The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may need as many as 64 bits, so it cannot be
accommodated in the imm field without truncation.

To resolve this, we additionally use the aux data within each
bpf_prog associated with the caller functions to store the
addresses of their respective callees.

Signed-off-by: Sandipan Das 
---
 kernel/bpf/verifier.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..52088b4ca02f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 * run last pass of JIT
 */
for (i = 0; i <= env->subprog_cnt; i++) {
+   u32 flen = func[i]->len, callee_cnt = 0;
+   struct bpf_prog **callee;
+
+   /* for now assume that the maximum number of bpf function
+* calls that can be made by a caller must be at most the
+* number of bpf instructions in that function
+*/
+   callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
+   if (!callee) {
+   err = -ENOMEM;
+   goto out_free;
+   }
+
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
+
+   /* the offset to the callee from __bpf_call_base
+* may be larger than what the 32 bit integer imm
+* can accommodate which will truncate the higher
+* order bits
+*
+* to avoid this, we additionally utilize the aux
+* data of each caller function for storing the
+* addresses of every callee associated with it
+*/
+   callee[callee_cnt++] = func[subprog];


> can you share typical /proc/kallsyms ?
> Are you saying that kernel and kernel modules are allocated from
> address spaces that are always more than 32-bit apart?

Yes. On ppc64, kernel text is linearly mapped from 0xc000, 
while vmalloc'ed area starts from 0xd000 (for radix, this is
different, but still beyond a 32-bit offset).

> That would mean that all kernel calls into modules are far calls
> and the other way around from .ko into kernel?
> Performance is probably suffering because every call needs to be built
> with full 64-bit offset. No ?

Possibly, and I think Michael can give a better perspective, but I think
this is due to our ABI. For inter-module calls, we need to set up the TOC
pointer (or the address of the function being called with ABIv2), which 
would require us to load a full address regardless.


- Naveen
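
Whether a call offset fits in the signed 32-bit imm field can be checked with a few lines of C. This is a userspace sketch with hypothetical helper names; any addresses fed to it are illustrative assumptions, not values taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a 64-bit offset survives being stored in a signed 32-bit imm. */
static bool fits_in_imm(int64_t off)
{
	return off >= INT32_MIN && off <= INT32_MAX;
}

/*
 * Offset of callee relative to base, both given as raw addresses.
 * Assumes the difference is below 2^63 so the conversion is exact.
 */
static int64_t call_offset(uint64_t callee, uint64_t base)
{
	return (int64_t)(callee - base);
}
```

For example, a callee 1 MiB above the base fits, while one whose address differs from the base in its top nibble does not, which is the situation that motivates storing the callee address elsewhere.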

pull-request: wireless-drivers-next 2018-02-08

2018-02-08 Thread Kalle Valo

Hi Dave,

first set of fixes for 4.16, unusually many when the merge window hasn't
even closed yet. Especially the ssb fix is important so I hope there's
still time to get this to 4.16-rc1. As you can see from the diffstat
there's one PCI id addition but that has been acked by Bjorn.

Please let me know if you have any problems.

Kalle

The following changes since commit f813614f531114db796ad66ced75c5dc8db7aa3a:

  ibmvnic: Wait for device response when changing MAC (2018-01-29 18:03:24 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2018-02-08

for you to fetch changes up to 99ffd198f07f46f3a8e64399249a8333c09063df:

  Merge ath-current from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git (2018-02-08 19:28:49 +0200)


wireless-drivers-next patches for 4.16

The most important one here is the ssb fix: the bug has been reported
frequently by users and the fix just missed the final v4.15. There are
also numerous other fixes: mt76 had multiple problems with aggregation, and
a long-standing unaligned access bug in rtlwifi is finally fixed.

Major changes:

ath10k

* correct firmware RAM dump length for QCA6174/QCA9377

* add new QCA988X device id

* fix a kernel panic during pci probe

* revert a recent commit which broke ath10k firmware metadata parsing

ath9k

* fix a noise floor regression introduced during the merge window

* add new device id

rtlwifi

* fix unaligned access seen on ARM architecture

mt76

* various aggregation fixes which fix connection stalls

ssb

* fix b43 and b44 on non-MIPS which broke in v4.15-rc9


Felix Fietkau (4):
  mt76: implement AP_LINK_PS
  mt76: implement processing of BlockAckReq frames
  mt76: avoid re-queueing A-MPDU rx reorder work if no frames are pending
  mt76: do not set status->aggr for NULL data frames

Kalle Valo (2):
  Merge git://git.kernel.org/.../kvalo/wireless-drivers.git
  Merge ath-current from git://git.kernel.org/.../kvalo/ath.git

Larry Finger (1):
  rtlwifi: rtl8821ae: Fix connection lost problem correctly

Oleksij Rempel (1):
  ath9k_htc: add Altai WA1011N-GU

Ryan Hsu (1):
  Revert "ath10k: add sanity check to ie_len before parsing fw/board ie"

Sven Joachim (1):
  ssb: Do not disable PCI host on non-Mips

Tobias Schramm (2):
  PCI: Add Ubiquiti Networks vendor ID
  ath10k: add support for Ubiquiti rebranded QCA988X v2

Wojciech Dubowik (1):
  ath9k: Fix get channel default noise floor

Yu Wang (2):
  ath10k: correct the length of DRAM dump for QCA6174 hw3.x/QCA9377 hw1.1
  ath10k: fix kernel panic issue during pci probe

 drivers/net/wireless/ath/ath10k/core.c | 43 +++---
 drivers/net/wireless/ath/ath10k/coredump.c |  3 +-
 drivers/net/wireless/ath/ath10k/debug.c| 12 -
 drivers/net/wireless/ath/ath10k/hw.h   |  1 +
 drivers/net/wireless/ath/ath10k/pci.c  |  6 +++
 drivers/net/wireless/ath/ath9k/calib.c |  2 +-
 drivers/net/wireless/ath/ath9k/hif_usb.c   |  1 +
 drivers/net/wireless/mediatek/mt76/agg-rx.c| 40 -
 drivers/net/wireless/mediatek/mt76/mac80211.c  | 52 +-
 drivers/net/wireless/mediatek/mt76/mt76.h  | 10 +
 drivers/net/wireless/mediatek/mt76/mt76x2.h|  2 +
 drivers/net/wireless/mediatek/mt76/mt76x2_init.c   |  1 +
 drivers/net/wireless/mediatek/mt76/mt76x2_mac.c|  2 +-
 drivers/net/wireless/mediatek/mt76/mt76x2_main.c   | 28 ++--
 .../net/wireless/realtek/rtlwifi/rtl8821ae/hw.c|  5 ++-
 drivers/net/wireless/realtek/rtlwifi/wifi.h|  1 +
 drivers/ssb/Kconfig|  2 +-
 include/linux/pci_ids.h|  2 +
 18 files changed, 181 insertions(+), 32 deletions(-)

[PATCH] net/9p: avoid -ERESTARTSYS leak to userspace

2018-02-08 Thread Greg Kurz

If it was interrupted by a signal, the 9p client may need to send some
more requests to the server for cleanup before returning to userspace.

To prevent such a last-minute request from being interrupted right away, the
client remembers whether a signal is pending, clears TIF_SIGPENDING, handles
the request and calls recalc_sigpending() before returning.

Unfortunately, if the transmission of this cleanup request fails for any
reason, the transport returns an error and the client propagates it right
away, without calling recalc_sigpending().

This ends up with -ERESTARTSYS from the initially interrupted request
crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
request. The specific signal handling code, which is responsible for
converting -ERESTARTSYS to -EINTR is not called, and userspace receives
the confusing errno value:

open: Unknown error 512 (512)

This is really hard to hit in real life. I discovered the issue while
working on hot-unplug of a virtio-9p-pci device with an instrumented
QEMU that allows controlling request completion.

Both p9_client_zc_rpc() and p9_client_rpc() actually have this buggy
error path. Their code flow is a bit obscure, and the best thing to do
would probably be a full rewrite, to really ensure that clearing
TIF_SIGPENDING and then returning -ERESTARTSYS can never happen.

But given the general lack of interest in the 9p code, I won't risk
breaking more things. So this patch simply fixes the buggy paths in both
functions with a trivial label+goto.

Thanks to Laurent Dufour for his help and suggestions on how to find
the root cause and how to fix it.

Signed-off-by: Greg Kurz 
---
 net/9p/client.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 4c8cf9c1631a..5154eaf19fff 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -769,7 +769,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
if (err < 0) {
if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
-   goto reterr;
+   goto recalc_sigpending;
}
 again:
/* Wait for the response */
@@ -804,6 +804,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(&current->sighand->siglock, flags);
recalc_sigpending();
@@ -867,7 +868,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
if (err == -EIO)
c->status = Disconnected;
if (err != -ERESTARTSYS)
-   goto reterr;
+   goto recalc_sigpending;
}
if (req->status == REQ_STATUS_ERROR) {
p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
@@ -885,6 +886,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
if (req->status == REQ_STATUS_RCVD)
err = 0;
}
+recalc_sigpending:
if (sigpending) {
spin_lock_irqsave(&current->sighand->siglock, flags);
recalc_sigpending();
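
The label+goto shape of the fix can be modelled in userspace as below. Everything here is a hypothetical simplification: struct task, recalc() and cleanup_rpc() stand in for the thread flag, recalc_sigpending() and the RPC functions, and the transport result is passed in as a parameter.

```c
#include <stdbool.h>

/* Hypothetical task state; sig_flag stands in for TIF_SIGPENDING. */
struct task {
	bool sig_flag;		/* what the signal delivery code looks at */
	bool signal_queued;	/* a signal really is waiting */
};

/* Stand-in for recalc_sigpending(): re-derive the flag from the queue. */
static void recalc(struct task *t)
{
	t->sig_flag = t->signal_queued;
}

/*
 * Send a cleanup request. The flag is cleared so the request is not
 * interrupted at once; every exit path, including a transport failure,
 * goes through the label that restores it.
 */
static int cleanup_rpc(struct task *t, int transport_err)
{
	bool sigpending = t->sig_flag;
	int err;

	t->sig_flag = false;
	err = transport_err;	/* result of handing the request over */
	if (err < 0)
		goto recalc_sigpending;	/* the old code skipped the restore */
	/* ... wait for the reply here ... */
	err = 0;
recalc_sigpending:
	if (sigpending)
		recalc(t);
	return err;
}
```

Routing the failure through the same label keeps the clear/restore pair balanced without restructuring the rest of the function.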

Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-08 Thread Alexei Starovoitov


On 2/8/18 4:03 AM, Sandipan Das wrote:

The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may be as large as 64 bits because of which this
cannot be accommodated in the imm field without truncation.

To resolve this, we additionally use the aux data within each
bpf_prog associated with the caller functions to store the
addresses of their respective callees.

Signed-off-by: Sandipan Das 
---
 kernel/bpf/verifier.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..52088b4ca02f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 * run last pass of JIT
 */
for (i = 0; i <= env->subprog_cnt; i++) {
+   u32 flen = func[i]->len, callee_cnt = 0;
+   struct bpf_prog **callee;
+
+   /* for now assume that the maximum number of bpf function
+* calls that can be made by a caller must be at most the
+* number of bpf instructions in that function
+*/
+   callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
+   if (!callee) {
+   err = -ENOMEM;
+   goto out_free;
+   }
+
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
+
+   /* the offset to the callee from __bpf_call_base
+* may be larger than what the 32 bit integer imm
+* can accommodate which will truncate the higher
+* order bits
+*
+* to avoid this, we additionally utilize the aux
+* data of each caller function for storing the
+* addresses of every callee associated with it
+*/
+   callee[callee_cnt++] = func[subprog];


Can you share a typical /proc/kallsyms?
Are you saying that the kernel and kernel modules are allocated from
address spaces that are always more than 32 bits apart?
That would mean that all kernel calls into modules are far calls,
and the other way around, from a .ko into the kernel?
Performance is probably suffering because every call needs to be built
with a full 64-bit offset, no?
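
To make the truncation concern concrete, here is a small user-space sketch.
It is an illustration only: the helper names and the addresses used in the
usage note are hypothetical, and nothing here is taken from the verifier.
It checks whether the distance between a callee and a base address still
fits in a signed 32-bit field such as bpf_insn.imm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Distance from base to callee as a signed 64-bit value. The conversion
 * from uint64_t relies on two's complement wrap-around, which is what
 * common compilers implement.
 */
int64_t call_offset(uint64_t callee, uint64_t base)
{
	return (int64_t)(callee - base);
}

/* True if the offset can be stored in a signed 32-bit field unchanged. */
bool offset_fits_imm(int64_t offset)
{
	return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Narrow to 32 bits; only valid when offset_fits_imm() holds. */
int32_t to_imm(int64_t offset)
{
	assert(offset_fits_imm(offset));
	return (int32_t)offset;
}
```

For two made-up addresses that are 0x1000000000000000 apart,
offset_fits_imm() returns false, so a plain store into a 32-bit imm would
drop the high-order bits; for addresses 0x800 apart it returns true and
to_imm() is safe to call.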

Re: USB rndis_host - slow download transfers, RX errors

2018-02-08 Thread Tomasz Janowski

On Thursday, February 8, 2018 5:37:25 PM EST Greg KH wrote:
> On Thu, Feb 08, 2018 at 10:53:20AM -0500, Tomasz Janowski wrote:
> > On Thursday, February 8, 2018 3:43:05 PM EST Greg KH wrote:
> > > On Thu, Feb 08, 2018 at 02:16:08PM +, Tomasz Janowski, Ph.D. wrote:
> > > > Dear USB developers,
> > > > 
> > > > Based on my google research, the problem I experience seems to happen
> > > > with some newer smartphones. My test case is Samsung Galaxy S8
> > > > (SM-950U1). I am trying to use USB tethering and everything seems to
> > > > work as expected (modules are loaded, Ethernet devices are up and
> > > > running, dhcp works fine). I can connect to the external world using
> > > > both LTE or wireless network on the phone.
> > > > 
> > > > Now, the problem is that the download speeds are terrible, around 64
> > > > KB/s, while uploads are fast, on the order of 15 MB/s. These speeds do
> > > > not depend on the wireless service provider: the results are similar
> > > > when I tether wi-fi. The USB Ethernet interface on the Linux host
> > > > reports a lot of receive errors (attached: device_state.txt), while
> > > > kernel reports bad rndis messages (attached: kernel.log.txt).
> > > > 
> > > > Windows 10 works great with the same hardware (same PC and same
> > > > phone), with uploads and downloads in the order of 150 Mbit/s, which
> > > > is probably as fast as my wireless network can do. But some people
> > > > reported issues with older Windows drivers too. Is it possible that
> > > > some newer version of RNDIS protocol is around and Linux hasn't
> > > > updated its RNDIS module yet?
> > > 
> > > Hey, I was _just_ talking to someone at Google about this same issue
> > > yesterday, you beat him sending this same type of report to the mailing
> > > list, nice job :)
> > > 
> > > Yes, this is not good, and we should work to resolve this, but first,
> > > what kernel version are you using?  I think some fixes for the rndis
> > > driver went in recently to 4.15, but it would be good to verify that
> > > this isn't already resolved.
> > 
> > The error messages which I have attached were produced by a precompiled
> > Debian kernel: "Linux version 4.14.0-0.bpo.3-amd64
> > (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian
> > 6.3.0-18)) #1 SMP Debian 4.14.13-1~bpo9+1 (2018-01-14)".
> > 
> > But I have downloaded the most recent version of the kernel from the
> > official git repository (last commit: Jan 31, 2018) and it had exactly
> > the same problem. Unless a patch was submitted within the last week, the
> > issue is still there.
> > 
> > Should I get the version as of today and test it again?
> 
> If you find a 4.15 tree, that would be great to test, but odds are, the
> issues are still there.
> 
> I'll try to carve out some time to look at this tomorrow, as I have a
> bunch of Android devices to test with, and there's no good reason why
> Windows should be slower than Linux for stuff like this.  We should be
> able to go as fast as the device lets us.  Most likely we are doing
> something "stupid" in the rndis driver somewhere :)
> 
Thanks a lot!

I have tested with a kernel downloaded from:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
The version was 4.15.0+, so I guess that was the cutting-edge kernel as of
01/31/2018.
Just to clarify, Windows is faster than Linux. 64 KB/s in Linux makes USB
tethering not so useful, and if the "PC" is a laptop, one can live with a
wi-fi hotspot, which works fine. A desktop PC must use USB. I also thought
that USB has a greater potential to deliver better throughput than a wifi
hotspot.

I have tested another phone, Galaxy J7 Pro (2017 version). That phone uses
different hardware, a different USB connector, and an older kernel version.
J7 works fine with the current Linux kernel, so reproducing the problem
requires as recent an Android version (and possibly hardware) as possible.

Thanks!
Tomasz

Re: [Patch net] ipt_CLUSTERIP: fix a race condition of proc file creation

2018-02-08 Thread Pablo Neira Ayuso

On Wed, Feb 07, 2018 at 09:59:17PM -0800, Cong Wang wrote:
> There is a race condition between clusterip_config_entry_put()
> and clusterip_config_init(), after we release the spinlock in
> clusterip_config_entry_put(), a new proc file with the same IP could
> be created immediately since it is already removed from the configs
> list, therefore it triggers this warning:
> 
> ------------[ cut here ]------------
> proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370 
> fs/proc/generic.c:329
> Kernel panic - not syncing: panic_on_warn set ...
> 
> As a quick fix, just move the proc_remove() inside the spinlock.

Applied, thanks.

Re: [PATCH v4 3/3] qemu: add linkspeed and duplex settings to virtio-net

2018-02-08 Thread Michael S. Tsirkin

On Fri, Jan 05, 2018 at 05:44:55PM -0500, Jason Baron wrote:
> Although linkspeed and duplex can be set in a linux guest via 'ethtool -s',
> this requires custom ethtool commands for virtio-net by default.
> 
> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
> the hypervisor to export a linkspeed and duplex setting. The user can
> subsequently overwrite it later if desired via: 'ethtool -s'.
> 
> Linkspeed and duplex settings can be set as:
> '-device virtio-net,speed=1,duplex=full'
> 
> where speed is [-1...INT_MAX], and duplex is ["half"|"full"].
> 
> Signed-off-by: Jason Baron 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: virtio-...@lists.oasis-open.org
> ---
>  hw/net/virtio-net.c | 32 
> +
>  include/hw/virtio/virtio-net.h  |  3 +++
>  include/standard-headers/linux/virtio_net.h | 13 
>  3 files changed, 48 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 54823af..cd63659 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -40,6 +40,12 @@
>  #define VIRTIO_NET_RX_QUEUE_MIN_SIZE VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE
>  #define VIRTIO_NET_TX_QUEUE_MIN_SIZE VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE
>  
> +/* duplex and speed */
> +#define DUPLEX_UNKNOWN  0xff
> +#define DUPLEX_HALF 0x00
> +#define DUPLEX_FULL 0x01
> +#define SPEED_UNKNOWN   -1
> +
>  /*
>   * Calculate the number of bytes up to and including the given 'field' of
>   * 'container'.

Please import ethtool.h with these macros from Linux using the
scripts/update-linux-headers.sh machinery.

> @@ -61,6 +67,8 @@ static VirtIOFeature feature_sizes[] = {
>   .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
>  {.flags = 1ULL << VIRTIO_NET_F_MTU,
>   .end = endof(struct virtio_net_config, mtu)},
> +{.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
> + .end = endof(struct virtio_net_config, duplex)},
>  {}
>  };
>  
> @@ -89,6 +97,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
>  virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues);
>  virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);
>  memcpy(netcfg.mac, n->mac, ETH_ALEN);
> +virtio_stl_p(vdev, &netcfg.speed, n->net_conf.speed);
> +netcfg.duplex = n->net_conf.duplex;
>  memcpy(config, &netcfg, n->config_size);
>  }
>  
> @@ -1941,6 +1951,26 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
>  }
>  
> +if (n->net_conf.duplex_str) {
> +if (strncmp(n->net_conf.duplex_str, "half", 5) == 0) {
> +n->net_conf.duplex = DUPLEX_HALF;
> +} else if (strncmp(n->net_conf.duplex_str, "full", 5) == 0) {
> +n->net_conf.duplex = DUPLEX_FULL;
> +} else {
> +error_setg(errp, "'duplex' must be 'half' or 'full'");
> +}
> +n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
> +} else {
> +n->net_conf.duplex = DUPLEX_UNKNOWN;
> +}
> +
> +if (n->net_conf.speed < SPEED_UNKNOWN) {
> +error_setg(errp, "'speed' must be between -1 (SPEED_UNKOWN) and "
> +   "INT_MAX");
> +} else if (n->net_conf.speed >= 0) {
> +n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
> +}
> +
>  virtio_net_set_config_size(n, n->host_features);
>  virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
>  
> @@ -2161,6 +2191,8 @@ static Property virtio_net_properties[] = {
>  DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
>  DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
>   true),
> +DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> +DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index e7634c9..02484dc 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -38,6 +38,9 @@ typedef struct virtio_net_conf
>  uint16_t rx_queue_size;
>  uint16_t tx_queue_size;
>  uint16_t mtu;
> +int32_t speed;
> +char *duplex_str;
> +uint8_t duplex;
>  } virtio_net_conf;
>  
>  /* Maximum packet size we can receive from tap device: header + 64k */
> diff --git a/include/standard-headers/linux/virtio_net.h 
> b/include/standard-headers/linux/virtio_net.h
> index 30ff249..17c8531 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -57,6 +57,8 @@
>* Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */
>  
> +#define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */
> +
>  #ifndef
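
The option validation in the realize hunk above can be tried out in
isolation. The sketch below is one possible stand-alone version, not QEMU
code: parse_duplex() and speed_is_valid() are hypothetical helpers, and the
macro values simply mirror the ones defined in the quoted patch. Unlike the
quoted hunk, the helper reports failure through its return value so a
caller can stop early; that is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the macros in the quoted patch. */
#define DUPLEX_HALF     0x00
#define DUPLEX_FULL     0x01
#define DUPLEX_UNKNOWN  0xff
#define SPEED_UNKNOWN   (-1)

/*
 * Hypothetical helper: translate the "duplex" property string.
 * Returns 0 and sets *duplex on success, -1 for an unrecognized string.
 * A NULL string means the property was not set at all.
 */
int parse_duplex(const char *str, uint8_t *duplex)
{
	assert(duplex != NULL);

	if (str == NULL) {
		*duplex = DUPLEX_UNKNOWN;
		return 0;
	}
	if (strcmp(str, "half") == 0) {
		*duplex = DUPLEX_HALF;
		return 0;
	}
	if (strcmp(str, "full") == 0) {
		*duplex = DUPLEX_FULL;
		return 0;
	}
	return -1;	/* caller should report "'duplex' must be 'half' or 'full'" */
}

/* Hypothetical helper: speed must lie in [-1, INT32_MAX]. */
int speed_is_valid(int32_t speed)
{
	return speed >= SPEED_UNKNOWN;
}
```

A caller would check both helpers before advertising the feature bit, for
example rejecting "fullx" as a duplex string or -2 as a speed.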

Re: USB rndis_host - slow download transfers, RX errors

2018-02-08 Thread Greg KH

On Thu, Feb 08, 2018 at 10:53:20AM -0500, Tomasz Janowski wrote:
> On Thursday, February 8, 2018 3:43:05 PM EST Greg KH wrote:
> > On Thu, Feb 08, 2018 at 02:16:08PM +, Tomasz Janowski, Ph.D. wrote:
> > > Dear USB developers,
> > > 
> > > Based on my google research, the problem I experience seems to happen
> > > with some newer smartphones. My test case is Samsung Galaxy S8 (SM-950U1).
> > > I am trying to use USB tethering and everything seems to work as expected
> > > (modules are loaded, Ethernet devices are up and running, dhcp works
> > > fine). I can connect to the external world using both LTE or wireless
> > > network on the phone.
> > > 
> > > Now, the problem is that the download speeds are terrible, around 64 KB/s,
> > > > while uploads are fast, on the order of 15 MB/s. These speeds do not depend
> > > on the wireless service provider: the results are similar when I tether
> > > wi-fi. The USB Ethernet interface on the Linux host reports a lot of
> > > receive errors (attached: device_state.txt), while kernel reports bad
> > > rndis messages (attached: kernel.log.txt).
> > > 
> > > Windows 10 works great with the same hardware (same PC and same phone),
> > > with uploads and downloads in the order of 150 Mbit/s, which is probably
> > > as fast as my wireless network can do. But some people reported issues
> > > > with older Windows drivers too. Is it possible that some newer version of
> > > > RNDIS protocol is around and Linux hasn't updated its RNDIS module yet?
> > 
> > Hey, I was _just_ talking to someone at Google about this same issue
> > yesterday, you beat him sending this same type of report to the mailing
> > list, nice job :)
> > 
> > Yes, this is not good, and we should work to resolve this, but first,
> > what kernel version are you using?  I think some fixes for the rndis
> > driver went in recently to 4.15, but it would be good to verify that
> > this isn't already resolved.
> 
> The error messages which I have attached were produced by a precompiled
> Debian kernel: "Linux version 4.14.0-0.bpo.3-amd64
> (debian-ker...@lists.debian.org) (gcc version 6.3.0 20170516 (Debian
> 6.3.0-18)) #1 SMP Debian 4.14.13-1~bpo9+1 (2018-01-14)".
> 
> But I have downloaded the most recent version of the kernel from the
> official git repository (last commit: Jan 31, 2018) and it had exactly
> the same problem. Unless a patch was submitted within the last week, the
> issue is still there.
> 
> Should I get the version as of today and test it again?

If you find a 4.15 tree, that would be great to test, but odds are, the
issues are still there.

I'll try to carve out some time to look at this tomorrow, as I have a
bunch of Android devices to test with, and there's no good reason why
Windows should be slower than Linux for stuff like this.  We should be
able to go as fast as the device lets us.  Most likely we are doing
something "stupid" in the rndis driver somewhere :)

thanks,

greg k-h

[PATCH iproute2-next 3/4] json: fix newline at end of array

2018-02-08 Thread Stephen Hemminger

From: Stephen Hemminger 

The json print library was toggling pretty print at the end of
an array to work around a bug in the underlying json_writer.
Instead, just fix json_writer to pretty print arrays correctly.

Signed-off-by: Stephen Hemminger 
---
 lib/json_print.c  | 2 --
 lib/json_writer.c | 5 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/json_print.c b/lib/json_print.c
index e3da1bdfd5b0..b507b14ba27f 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -89,9 +89,7 @@ void open_json_array(enum output_type type, const char *str)
 void close_json_array(enum output_type type, const char *str)
 {
if (_IS_JSON_CONTEXT(type)) {
-   jsonw_pretty(_jw, false);
jsonw_end_array(_jw);
-   jsonw_pretty(_jw, true);
} else if (_IS_FP_CONTEXT(type)) {
printf("%s", str);
}
diff --git a/lib/json_writer.c b/lib/json_writer.c
index f3eeaf7bc479..0d910dc068b5 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -180,10 +180,15 @@ void jsonw_end_object(json_writer_t *self)
 void jsonw_start_array(json_writer_t *self)
 {
jsonw_begin(self, '[');
+   if (self->pretty)
+   putc(' ', self->out);
 }
 
 void jsonw_end_array(json_writer_t *self)
 {
+   if (self->pretty && self->sep)
+   putc(' ', self->out);
+   self->sep = '\0';
jsonw_end(self, ']');
 }
 
-- 
2.15.1

[PATCH iproute2-next 0/4] JSON (and color) support for iproute

2018-02-08 Thread Stephen Hemminger

From: Stephen Hemminger 

This set of patches adds JSON output to route printing.
Tested for the simple cases, but there are many variations,
such as lw tunnels, which have not been tested.

The color formatting may need some additional tweaks. It looks
like for some tags the tag is also showing up in color.
This should be fixed in print_color_string rather than having
to do special case handling in so many places.

This patchset also changes the default JSON output to be compact
(since the purpose of JSON is to make output machine readable),
with optional pretty-print formatting via the -p flag.

Stephen Hemminger (4):
  json: make pretty printing optional
  man: add documentation for json and pretty flags
  json: fix newline at end of array
  iproute: implement JSON and color output

 include/json_print.h  |   2 +
 include/utils.h   |   5 +
 ip/ip.c   |   7 +-
 ip/iproute.c  | 376 +++---
 ip/iproute_lwtunnel.c | 129 ++---
 lib/json_print.c  |   5 +-
 lib/json_writer.c |   5 +
 man/man8/ip.8 |  18 ++-
 man/man8/tc.8 |   3 +-
 tc/tc.c   |   3 +
 10 files changed, 381 insertions(+), 172 deletions(-)

-- 
2.15.1

[PATCH iproute2-next 4/4] iproute: implement JSON and color output

2018-02-08 Thread Stephen Hemminger

From: Stephen Hemminger 

Add JSON and color output formatting to ip route command.
Similar to existing address and link output.

Signed-off-by: Stephen Hemminger 
---
 include/utils.h   |   5 +
 ip/iproute.c  | 376 +++---
 ip/iproute_lwtunnel.c | 129 ++---
 3 files changed, 348 insertions(+), 162 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 8b8ee2e55ab8..4dc514d66ad1 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -23,6 +23,7 @@ extern int resolve_hosts;
 extern int oneline;
 extern int brief;
 extern int json;
+extern int pretty;
 extern int timestamp;
 extern int timestamp_short;
 extern const char * _SL_;
@@ -155,6 +156,10 @@ int af_byte_len(int af);
 
 const char *format_host_r(int af, int len, const void *addr,
   char *buf, int buflen);
+#define format_host_rta_r(af, rta, buf, buflen)\
+   format_host_r(af, RTA_PAYLOAD(rta), RTA_DATA(rta), \
+ buf, buflen)
+
 const char *format_host(int af, int lne, const void *addr);
 #define format_host_rta(af, rta) \
format_host(af, RTA_PAYLOAD(rta), RTA_DATA(rta))
diff --git a/ip/iproute.c b/ip/iproute.c
index 3c56240f1291..e4809a4383d9 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -339,72 +339,95 @@ static void print_rtax_features(FILE *fp, unsigned int features)
unsigned int of = features;
 
if (features & RTAX_FEATURE_ECN) {
-   fprintf(fp, "ecn ");
+   print_null(PRINT_ANY, "ecn", "ecn ", NULL);
features &= ~RTAX_FEATURE_ECN;
}
 
if (features)
-   fprintf(fp, "0x%x ", of);
+   print_0xhex(PRINT_ANY,
+   "features", "0x%x ", of);
 }
 
 static void print_rt_flags(FILE *fp, unsigned int flags)
 {
+   open_json_array(PRINT_JSON,
+   is_json_context() ?  "flags" : "");
+
if (flags & RTNH_F_DEAD)
-   fprintf(fp, "dead ");
+   print_string(PRINT_ANY, NULL, "%s ", "dead");
if (flags & RTNH_F_ONLINK)
-   fprintf(fp, "onlink ");
+   print_string(PRINT_ANY, NULL, "%s ", "onlink");
if (flags & RTNH_F_PERVASIVE)
-   fprintf(fp, "pervasive ");
+   print_string(PRINT_ANY, NULL, "%s ", "pervasive");
if (flags & RTNH_F_OFFLOAD)
-   fprintf(fp, "offload ");
+   print_string(PRINT_ANY, NULL, "%s ", "offload");
+   if (flags & RTM_F_NOTIFY)
+   print_string(PRINT_ANY, NULL, "%s ", "notify");
if (flags & RTNH_F_LINKDOWN)
-   fprintf(fp, "linkdown ");
+   print_string(PRINT_ANY, NULL, "%s ", "linkdown");
if (flags & RTNH_F_UNRESOLVED)
-   fprintf(fp, "unresolved ");
+   print_string(PRINT_ANY, NULL, "%s ", "unresolved");
+
+   close_json_array(PRINT_JSON, NULL);
 }
 
 static void print_rt_pref(FILE *fp, unsigned int pref)
 {
-   fprintf(fp, "pref ");
 
switch (pref) {
case ICMPV6_ROUTER_PREF_LOW:
-   fprintf(fp, "low");
+   print_string(PRINT_ANY,
+"pref", "pref %s", "low");
break;
case ICMPV6_ROUTER_PREF_MEDIUM:
-   fprintf(fp, "medium");
+   print_string(PRINT_ANY,
+"pref", "pref %s", "medium");
break;
case ICMPV6_ROUTER_PREF_HIGH:
-   fprintf(fp, "high");
+   print_string(PRINT_ANY,
+"pref", "pref %s", "high");
break;
default:
-   fprintf(fp, "%u", pref);
+   print_uint(PRINT_ANY,
+  "pref", "%u", pref);
}
 }
 
 static void print_rta_if(FILE *fp, const struct rtattr *rta,
-const char *prefix)
+   const char *prefix)
 {
const char *ifname = ll_index_to_name(rta_getattr_u32(rta));
 
-   fprintf(fp, "%s %s ", prefix, ifname);
+   if (is_json_context())
+   print_string(PRINT_JSON, prefix, NULL, ifname);
+   else {
+   fprintf(fp, "%s ", prefix);
+   color_fprintf(fp, COLOR_IFNAME, "%s ", ifname);
+   }
 }
 
 static void print_cache_flags(FILE *fp, __u32 flags)
 {
+   json_writer_t *jw = get_json_writer();
flags &= ~0xFFFF;
 
-   fprintf(fp, "%scache ", _SL_);
-
-   if (flags == 0)
-   return;
-
-   putc('<', fp);
+   if (jw) {
+   jsonw_name(jw, "cache");
+   jsonw_start_array(jw);
+   } else {
+   fprintf(fp, "%scache ", _SL_);
+   if (flags == 0)
+   return;
+   putc('<', fp);
+   }
 
 #define PRTFL(fl, flname)  \
if (flags

[PATCH iproute2-next 2/4] man: add documentation for json and pretty flags

2018-02-08 Thread Stephen Hemminger

From: Stephen Hemminger 

Add description for -json and -pretty options.

Signed-off-by: Stephen Hemminger 
---
 ip/ip.c   |  4 ++--
 man/man8/ip.8 | 18 ++
 man/man8/tc.8 |  3 ++-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/ip/ip.c b/ip/ip.c
index a6611292808d..233a9d772492 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -54,12 +54,12 @@ static void usage(void)
 "   netns | l2tp | fou | macsec | tcp_metrics | token | 
netconf | ila |\n"
 "   vrf | sr }\n"
 "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
-"-h[uman-readable] | -iec |\n"
+"-h[uman-readable] | -iec | -j[son] | -p[retty] |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
 "-4 | -6 | -I | -D | -B | -0 |\n"
 "-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |\n"
-"-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
+"-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
exit(-1);
 }
 
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 7f26582db795..0087d18b7470 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -48,9 +48,10 @@ ip \- show / manipulate routing, network devices, interfaces 
and tunnels
 \fB\-ts\fR[\fIhort\fR] |
 \fB\-n\fR[\fIetns\fR] name |
 \fB\-a\fR[\fIll\fR] |
-\fB\-c\fR[\fIolor\fR]
-\fB\-br\fR[\fIief\fR] }
-
+\fB\-c\fR[\fIolor\fR] |
+\fB\-br\fR[\fIief\fR] |
+\fB\-j\fR[\fIson\fR] |
+\fB\-p\fR[\fIretty\fR] }
 
 .SH OPTIONS
 
@@ -208,10 +209,19 @@ Set the netlink socket receive buffer size, defaults to 
1MB.
 print human readable rates in IEC units (e.g. 1Ki = 1024).
 
 .TP
-.BR "\-br" , "\-brief"
+.BR "\-br" , " \-brief"
 Print only basic information in a tabular format for better readability. This 
option is currently only supported by
 .BR "ip addr show " and " ip link show " commands.
 
+.TP
+.BR "\-j", " \-json"
+Output results in JavaScript Object Notation (JSON).
+
+.TP
+.BR "\-p", " \-pretty"
+The default JSON format is compact and more efficient to parse but hard for most users to read.
+This flag adds indentation for readability.
+
 .SH IP - COMMAND SYNTAX
 
 .SS
diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 5ffea373d18b..a58f46542340 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -670,7 +670,8 @@ output raw hex values for handles.
 
 .TP
 .BR "\-p", " \-pretty"
-decode filter offset and mask values to equivalent filter commands based on TCP/IP.
+for u32 filter, decode offset and mask values to equivalent filter commands based on TCP/IP.
+In JSON output, add whitespace to improve readability.
 
 .TP
 .BR "\-iec"
-- 
2.15.1

[PATCH iproute2-next 1/4] json: make pretty printing optional

2018-02-08 Thread Stephen Hemminger

From: Stephen Hemminger 

Since JSON is intended for programmatic consumption, it makes
sense for the default output format to be as concise as possible.

For programmers and other uses, it is helpful to keep the pretty
whitespace format; therefore enable it with the -p flag.

Signed-off-by: Stephen Hemminger 
---
 include/json_print.h | 2 ++
 ip/ip.c  | 3 +++
 lib/json_print.c | 3 ++-
 tc/tc.c  | 3 +++
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830adbd6..45a817ce6b9a 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -15,6 +15,8 @@
 #include "json_writer.h"
 #include "color.h"
 
+extern int show_pretty;
+
 json_writer_t *get_json_writer(void);
 
 /*
diff --git a/ip/ip.c b/ip/ip.c
index b15e6b66b3f6..a6611292808d 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -31,6 +31,7 @@ int show_stats;
 int show_details;
 int oneline;
 int brief;
+int show_pretty;
 int json;
 int timestamp;
 const char *_SL_;
@@ -259,6 +260,8 @@ int main(int argc, char **argv)
++brief;
} else if (matches(opt, "-json") == 0) {
++json;
+   } else if (matches(opt, "-pretty") == 0) {
+   ++show_pretty;
} else if (matches(opt, "-rcvbuf") == 0) {
unsigned int size;
 
diff --git a/lib/json_print.c b/lib/json_print.c
index 6518ba98f5bf..e3da1bdfd5b0 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -28,7 +28,8 @@ void new_json_obj(int json)
perror("json object");
exit(1);
}
-   jsonw_pretty(_jw, true);
+   if (show_pretty)
+   jsonw_pretty(_jw, true);
jsonw_start_array(_jw);
}
 }
diff --git a/tc/tc.c b/tc/tc.c
index 63e64fece87d..aba5c101739c 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -42,6 +42,7 @@ int use_iec;
 int force;
 bool use_names;
 int json;
+int pretty;
 
 static char *conf_file;
 
@@ -484,6 +485,8 @@ int main(int argc, char **argv)
++timestamp_short;
} else if (matches(argv[1], "-json") == 0) {
++json;
+   } else if (matches(argv[1], "-pretty") == 0) {
+   ++pretty;
} else {
fprintf(stderr, "Option \"%s\" is unknown, try \"tc 
-help\".\n", argv[1]);
return -1;
-- 
2.15.1

Re: [PATCH RFC 2/4] netlink: add generic object description infrastructure

2018-02-08 Thread Pablo Neira Ayuso

Hi Randy,

On Wed, Feb 07, 2018 at 05:28:20PM -0800, Randy Dunlap wrote:
[...]
> > diff --git a/include/net/nldesc.h b/include/net/nldesc.h
> > new file mode 100644
> > index 000000000000..19306a648f10
> > --- /dev/null
> > +++ b/include/net/nldesc.h
> > @@ -0,0 +1,160 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __NET_NLDESC_H
> > +#define __NET_NLDESC_H
> > +
> > +#include 
> > +
> > +struct nl_desc_cmd;
> > +struct nl_desc_obj;
> > +
> > +struct nl_desc_cmds {
> > +   int max;
> > +   const struct nl_desc_cmd*table;
> > +};
> > +
> > +struct nl_desc_objs {
> > +   int max;
> > +   const struct nl_desc_obj**table;
> > +};
> > +
> > +struct nl_desc_req {
> > +   u32 bus;
> > +};
> > +
> > +struct net;
> > +struct sk_buff;
> > +struct nlmsghdr;
> > +struct nlattr;
> > +
> 
> > +
> > +/**
> > + * struct nl_desc_obj - netlink object description
> > + * @id: unique ID to identify this netlink object
> > + * @max: number of attributes to describe this object
> 
>   @attr_max:

Thanks for spotting this.

> > + * @attrs: array of attribute descriptions
> > + */
> > +struct nl_desc_obj {
> > +   u16 id;
> > +   u16 attr_max;
> > +   const struct nl_desc_attr   *attrs;
> > +};
> 
> 
> Is there a test program for this?

I'm attaching what I have used to test this. These files print the
netlink bus description.

> Maybe add it to tools/testing/ ?

Yes, I can place it there, no problem. This userspace code depends on
libmnl though.

I was planning to add infrastructure to libmnl: a couple of helper
functions that allow us to populate the nl_desc cache and to look up
the presence of commands/attributes.

People that don't like libmnl for whatever reason can add similar code
to their libraries too, of course.

Thanks!
From 7826d6aa47d20bc09f7c8e33a457a5a338a8db55 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso 
Date: Tue, 16 Jan 2018 00:05:37 +0100
Subject: [PATCH libmnl] examples: add netlink bus description

Add nft-dump-desc-cmds.c and nft-dump-desc-objs.c to dump command and
object descriptions.
---
 examples/Makefile.am  |  11 ++
 examples/nft-dump-desc-cmds.c | 177 
 examples/nft-dump-desc-objs.c | 263 ++
 3 files changed, 451 insertions(+)
 create mode 100644 examples/nft-dump-desc-cmds.c
 create mode 100644 examples/nft-dump-desc-objs.c

diff --git a/examples/Makefile.am b/examples/Makefile.am
index e5cb052b315c..a8d4ba50f5ad 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -1 +1,12 @@
+include $(top_srcdir)/Make_global.am
+
 SUBDIRS = genl kobject netfilter rtnl
+
+check_PROGRAMS = nft-dump-desc-cmds \
+ nft-dump-desc-objs
+
+nft_dump_desc_cmds_SOURCES = nft-dump-desc-cmds.c
+nft_dump_desc_cmds_LDADD = ../src/libmnl.la
+
+nft_dump_desc_objs_SOURCES = nft-dump-desc-objs.c
+nft_dump_desc_objs_LDADD = ../src/libmnl.la
diff --git a/examples/nft-dump-desc-cmds.c b/examples/nft-dump-desc-cmds.c
new file mode 100644
index 000000000000..cfb5276e911f
--- /dev/null
+++ b/examples/nft-dump-desc-cmds.c
@@ -0,0 +1,177 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+struct nl_desc_cmd;
+struct nl_desc_attr;
+
+struct nl_desc {
+	uint32_t			num_cmds;
+	struct nl_desc_cmd		*cmds;
+};
+
+struct nl_desc_cmd {
+	uint32_t			id;
+	uint32_t			obj_id;
+};
+
+static struct nl_desc nl_desc;
+
+static int nla_desc_attr_cb(const struct nlattr *attr, void *data)
+{
+	const struct nlattr **tb = data;
+	int type = mnl_attr_get_type(attr);
+
+	if (mnl_attr_type_valid(attr, NLA_DESC_CMD_MAX) < 0)
+		return MNL_CB_OK;
+
+	switch (type) {
+	case NLA_DESC_CMD_ID:
+	case NLA_DESC_CMD_OBJ:
+		if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0) {
+			perror("mnl_attr_validate");
+			return MNL_CB_ERROR;
+		}
+		break;
+	}
+	tb[type] = attr;
+	return MNL_CB_OK;
+}
+
+static void print_desc_cmd(const struct nlattr *nest, struct nl_desc_cmd *cmd)
+{
+	struct nlattr *tb[NLA_DESC_CMD_MAX + 1] = {};
+
+	mnl_attr_parse_nested(nest, nla_desc_attr_cb, tb);
+	if (tb[NLA_DESC_CMD_ID])
+		cmd->id = mnl_attr_get_u32(tb[NLA_DESC_CMD_ID]);
+	if (tb[NLA_DESC_CMD_OBJ])
+		cmd->obj_id = mnl_attr_get_u32(tb[NLA_DESC_CMD_OBJ]);
+}
+
+static void print_desc_cmds(const struct nlattr *nest, struct nl_desc_cmd *cmds)
+{
+	struct nlattr *pos;
+	int j = 1;
+
+	mnl_attr_for_each_nested(pos, nest)
+		print_desc_cmd(pos, &cmds[j++]);
+}
+
+static int nla_desc_cmds_cb(const struct nlattr *attr, void *data)
+{
+	const struct nlattr **tb = data;
+	int type = mnl_attr_get_type(attr);
+
+	if (mnl_attr_type_valid(attr, NLA_DESC_OBJ_MAX) < 0)
+		return MNL_CB_OK;
+
+	switch(type) {
+	case NLA_DESC_NUM_OBJS:
+		if (mnl_attr_validate(attr, MNL_TYPE_U32) < 0) {
+			perror("mnl_attr_validate");
+			return

[net 1/1] tipc: fix skb truesize/datasize ratio control

2018-02-08 Thread Jon Maloy

From: Hoang Le 

In commit d618d09a68e4 ("tipc: enforce valid ratio between skb truesize
and contents") we introduced a test for ensuring that the condition
truesize/datasize <= 4 is true for a received buffer. Unfortunately this
test has two problems.

- Because of the integer arithmetic, the test
  if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
  ratios in the range 4 < ratio < 5, which was not the intention.
- The buffer returned by skb_copy() inherits skb->truesize of the
  original buffer, which doesn't help the situation at all.

In this commit, we change the ratio condition and replace skb_copy()
with a call to skb_copy_expand() to finally get this right.

Acked-by: Jon Maloy 
Signed-off-by: Jon Maloy 
---
 net/tipc/msg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 55d8ba9..4e1c6f6 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -208,8 +208,8 @@ bool tipc_msg_validate(struct sk_buff **_skb)
int msz, hsz;
 
/* Ensure that flow control ratio condition is satisfied */
-   if (unlikely(skb->truesize / buf_roundup_len(skb) > 4)) {
-   skb = skb_copy(skb, GFP_ATOMIC);
+   if (unlikely(skb->truesize / buf_roundup_len(skb) >= 4)) {
+   skb = skb_copy_expand(skb, BUF_HEADROOM, 0, GFP_ATOMIC);
if (!skb)
return false;
kfree_skb(*_skb);
-- 
2.1.4

[PULL] virtio, vhost: fixes, cleanups, features

2018-02-08 Thread Michael S. Tsirkin

The following changes since commit d8a5b80568a9cb66810e75b182018e9edb68e8ff:

  Linux 4.15 (2018-01-28 13:20:33 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to d25cc43c6775bff6b8e3dad97c747954b805e421:

  vhost: don't hold onto file pointer for VHOST_SET_LOG_FD (2018-02-01 16:26:47 +0200)


virtio, vhost: fixes, cleanups, features

This includes the disk/cache memory stats for the virtio balloon,
as well as multiple fixes and cleanups.

Signed-off-by: Michael S. Tsirkin 


Arvind Yadav (1):
  virtio: virtio_mmio: make of_device_ids const.

Eric Biggers (3):
  vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL
  vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR
  vhost: don't hold onto file pointer for VHOST_SET_LOG_FD

Markus Elfring (1):
  vhost/scsi: Improve a size determination in four functions

Michael S. Tsirkin (2):
  virtio/ringtest: fix up need_event math
  virtio/ringtest: virtio_ring: fix up need_event math

Peter Malone (1):
  ringtest: ring.c malloc & memset to calloc

Stefan Hajnoczi (1):
  virtio_blk: print capacity at probe time

Tomáš Golembiovský (1):
  virtio_balloon: include disk/file caches memory statistics

Tonghao Zhang (1):
  vhost: Remove the unused variable.

Vasyl Gomonovych (2):
  virtio-mmio: Use PTR_ERR_OR_ZERO()
  firmware: Use PTR_ERR_OR_ZERO()

Vincent Legoll (1):
  virtio: make VIRTIO a menuconfig to ease disabling it all

weiping zhang (3):
  virtio: split device_register into device_initialize and device_add
  virtio_pci: don't kfree device on register failure
  virtio_vop: don't kfree device on register failure

夷则(Caspar) (1):
  vhost: remove unused lock check flag in vhost_dev_cleanup()

 drivers/block/virtio_blk.c  | 32 
 drivers/firmware/qemu_fw_cfg.c  |  4 +-
 drivers/misc/mic/vop/vop_main.c | 20 ++
 drivers/vhost/net.c |  2 +-
 drivers/vhost/scsi.c| 11 +++---
 drivers/vhost/test.c|  2 +-
 drivers/vhost/vhost.c   | 68 -
 drivers/vhost/vhost.h   |  9 +
 drivers/vhost/vsock.c   |  2 +-
 drivers/virtio/Kconfig  |  8 +++-
 drivers/virtio/virtio.c | 18 +++--
 drivers/virtio/virtio_balloon.c |  4 ++
 drivers/virtio/virtio_mmio.c|  6 +--
 drivers/virtio/virtio_pci_common.c  |  8 +++-
 include/uapi/linux/virtio_balloon.h |  3 +-
 tools/virtio/ringtest/ring.c| 30 ---
 tools/virtio/ringtest/virtio_ring_0_9.c | 24 +++-
 17 files changed, 120 insertions(+), 131 deletions(-)

[PATCH iproute2-next] ip: Use print_0xhex() where appropriate

2018-02-08 Thread Serhey Popovych

In gre/gre6 for non-JSON output 0x%x format is used: use print_0xhex()
to get the same value for JSON.

Get rid of custom _print_hex() in bridge slave code: print_0xhex() can
be used perfectly.

Break long print_uint() with long argument list to fit into 80 columns.
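
As a rough illustration of why one helper can serve both output modes, here
is a standalone sketch. It mirrors the logic of the removed _print_hex()
shown in the diff below, but takes a full format string the way the
print_0xhex() calls do. The function name format_0xhex(), the json flag and
the buffer-based interface are assumptions made for this example; they are
not the iproute2 API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a single helper covering JSON and plain output. */
static int format_0xhex(char *out, size_t len, int json,
			const char *key, const char *fmt, unsigned int val)
{
	char hex[32];

	if (json) {
		/* JSON: emit the value as a "0x..." string under key. */
		snprintf(hex, sizeof(hex), "0x%x", val);
		return snprintf(out, len, "\"%s\":\"%s\"", key, hex);
	}
	/* Plain text: fmt carries both the label and the 0x%x conversion. */
	return snprintf(out, len, fmt, val);
}
```

Both branches render the value with the same 0x%x conversion, which is the
consistency the commit message asks for.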

Signed-off-by: Serhey Popovych 
---
 ip/iplink_bridge_slave.c |   24 +---
 ip/link_gre.c|6 --
 ip/link_gre6.c   |6 --
 3 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c
index be0fb4f..3fbfb87 100644
--- a/ip/iplink_bridge_slave.c
+++ b/ip/iplink_bridge_slave.c
@@ -81,21 +81,6 @@ static void _print_onoff(FILE *f, char *json_flag, char *flag, __u8 val)
fprintf(f, "%s %s ", flag, val ? "on" : "off");
 }
 
-static void _print_hex(FILE *f,
-  const char *json_attr,
-  const char *attr,
-  __u16 val)
-{
-   if (is_json_context()) {
-   SPRINT_BUF(b1);
-
-   snprintf(b1, sizeof(b1), "0x%x", val);
-   print_string(PRINT_JSON, json_attr, NULL, b1);
-   } else {
-   fprintf(f, "%s 0x%x ", attr, val);
-   }
-}
-
 static void _print_timer(FILE *f, const char *attr, struct rtattr *timer)
 {
struct timeval tv;
@@ -181,11 +166,11 @@ static void bridge_slave_print_opt(struct link_util *lu, FILE *f,
 rta_getattr_u8(tb[IFLA_BRPORT_UNICAST_FLOOD]));
 
if (tb[IFLA_BRPORT_ID])
-   _print_hex(f, "id", "port_id",
-  rta_getattr_u16(tb[IFLA_BRPORT_ID]));
+   print_0xhex(PRINT_ANY, "id", "port_id 0x%x ",
+   rta_getattr_u16(tb[IFLA_BRPORT_ID]));
 
if (tb[IFLA_BRPORT_NO])
-   _print_hex(f, "no", "port_no",
+   print_0xhex(PRINT_ANY, "no", "port_no 0x%x ",
   rta_getattr_u16(tb[IFLA_BRPORT_NO]));
 
if (tb[IFLA_BRPORT_DESIGNATED_PORT])
@@ -279,7 +264,8 @@ static void bridge_slave_print_opt(struct link_util *lu, FILE *f,
__u16 fwd_mask;
 
fwd_mask = rta_getattr_u16(tb[IFLA_BRPORT_GROUP_FWD_MASK]);
-   _print_hex(f, "group_fwd_mask", "group_fwd_mask", fwd_mask);
+   print_0xhex(PRINT_ANY, "group_fwd_mask",
+   "group_fwd_mask 0x%x ", fwd_mask);
_bitmask2str(fwd_mask, convbuf, sizeof(convbuf), fwd_mask_tbl);
print_string(PRINT_ANY, "group_fwd_mask_str",
 "group_fwd_mask_str %s ", convbuf);
diff --git a/ip/link_gre.c b/ip/link_gre.c
index b2573a1..e6a3174 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -490,7 +490,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_VER]) {
__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
 
-   print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+   print_uint(PRINT_ANY,
+  "erspan_ver", "erspan_ver %u ", erspan_ver);
}
 
if (tb[IFLA_GRE_ERSPAN_DIR]) {
@@ -507,7 +508,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_HWID]) {
__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
 
-   print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
+   print_0xhex(PRINT_ANY,
+   "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
}
 
tnl_print_encap(tb,
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 880b75d..86dcc96 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -557,7 +557,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_VER]) {
__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
 
-   print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+   print_uint(PRINT_ANY,
+  "erspan_ver", "erspan_ver %u ", erspan_ver);
}
 
if (tb[IFLA_GRE_ERSPAN_DIR]) {
@@ -574,7 +575,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GRE_ERSPAN_HWID]) {
__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
 
-   print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
+   print_0xhex(PRINT_ANY,
+   "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
}
 
tnl_print_encap(tb,
-- 
1.7.10.4

Re: [PATCH] net: ethernet: ti: cpsw: fix net watchdog timeout

2018-02-08 Thread Grygorii Strashko

On 02/07/2018 08:57 PM, David Miller wrote:

From: Grygorii Strashko 
Date: Tue, 6 Feb 2018 19:17:06 -0600

It was discovered that a simple program which indefinitely sends 200b UDP
packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers a network
watchdog timeout in the TI CPSW driver (<6 hours run). The network watchdog
timeout is triggered due to a race between cpsw_ndo_start_xmit() and
cpsw_tx_handler() [NAPI]:

cpsw_ndo_start_xmit()
if (unlikely(!cpdma_check_free_tx_desc(txch))) {
txq = netdev_get_tx_queue(ndev, q_idx);
netif_tx_stop_queue(txq);

^^ as per [1], a barrier has to be used after set_bit(), otherwise the new
value might not be visible to other CPUs
}

cpsw_tx_handler()
if (unlikely(netif_tx_queue_stopped(txq)))
netif_tx_wake_queue(txq);

and when it happens, the ndev TX queue stays disabled forever while the
driver's HW TX queue is empty.

Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue()
calls and double check for free TX descriptors after stopping ndev TX queue
- if there are free TX descriptors wake up ndev TX queue.

[1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html
Signed-off-by: Grygorii Strashko 
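
The stop, barrier, re-check sequence described in the commit message above
can be sketched in userspace with C11 atomics. This is only a model under
stated assumptions: struct txq_model and both functions are hypothetical,
and atomic_thread_fence() merely stands in for smp_mb__after_atomic(); it is
not the cpsw driver code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state; not the cpsw structures. */
struct txq_model {
	atomic_bool stopped;	/* models the ndev TX queue "stopped" bit */
	atomic_int free_desc;	/* models free hardware TX descriptors */
};

/* Transmit path: stop the queue when descriptors run out, then re-check. */
static void xmit_path_check(struct txq_model *q)
{
	if (atomic_load(&q->free_desc) == 0) {
		atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
		/* Full fence, standing in for smp_mb__after_atomic(). */
		atomic_thread_fence(memory_order_seq_cst);
		/* Completion may have freed descriptors in the meantime. */
		if (atomic_load(&q->free_desc) > 0)
			atomic_store(&q->stopped, false);
	}
}

/* Completion path: free one descriptor and wake the queue if stopped. */
static void tx_complete(struct txq_model *q)
{
	atomic_fetch_add(&q->free_desc, 1);
	if (atomic_load(&q->stopped))
		atomic_store(&q->stopped, false);
}
```

The re-check after the fence is what closes the window: if the completion
path ran between the first descriptor check and the stop, the transmit path
notices the free descriptor itself and wakes the queue.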

Applied, thanks.

Thank you David.

Could this be marked as stable material 4.9+?

--
regards,
-grygorii

Re: [PATCH iproute2 v1] ip netns: allow negative nsid

2018-02-08 Thread Stephen Hemminger

On Tue,  6 Feb 2018 19:39:31 +0100
Christian Brauner  wrote:

> If the kernel receives a negative nsid it will automatically assign the
> next available nsid. In this case alloc_netid() will set min and max to
> 0 for idr_alloc(). And when max == 0 idr_alloc() will interpret this as
> the maximum range, i.e. specific to nsids it will try to find an id in
> the range [0,INT_MAX). This is intentionally supported in the kernel for
> nsids. Commit acbe9118ce8086f765ffb0da15f80c7c01a8903a regressed ip
> netns in that respect although previously the use-case was either
> accidentally supported or opaquely supported such that it triggered the
> original commit. From what I can gather it went as follows before:
> atoi() was called with a string indicating a negative value which caused
> it to return -1 which was passed to the kernel. Let's make it less
> opaque by introducing the keyword "auto":
> 
> ip netns set  auto
> 
> will cause nsid to be set to -1 and the kernel will select an available
> nsid.
> 
> Signed-off-by: Christian Brauner 

Applied thank you.
I did have to fix spelling and format of commit reference in
the commit description. If you run checkpatch on patches to
iproute you would have caught that.
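
For illustration, the "auto" keyword from the quoted commit message could be
parsed along these lines. This is a hedged sketch, not the code that was
applied: parse_nsid() is a hypothetical helper, and rejecting explicit
negative numbers is an assumption of this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "auto" maps to -1, anything else must be >= 0. */
static int parse_nsid(const char *arg, int *nsid)
{
	char *end;
	long val;

	if (strcmp(arg, "auto") == 0) {
		*nsid = -1;	/* ask the kernel to pick a free nsid */
		return 0;
	}

	errno = 0;
	val = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || val < 0 || val > INT_MAX)
		return -1;	/* reject junk and explicit negatives */

	*nsid = (int)val;
	return 0;
}
```

Making the keyword explicit avoids relying on what atoi() happens to return
for a negative string.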

[PATCH net] rxrpc: Don't put crypto buffers on the stack

2018-02-08 Thread David Howells

Don't put buffers of data to be handed to crypto on the stack as this may
cause an assertion failure in the kernel (see below).  Fix this by using a
kmalloc'd buffer instead.

kernel BUG at ./include/linux/scatterlist.h:147!
...
RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
RSP: 0018:be2fc06cfca8 EFLAGS: 00010246
RAX:  RBX: 989277d59900 RCX: 0028
RDX: 259dc06cfd88 RSI: 0025 RDI: be30406cfd88
RBP: be2fc06cfd60 R08: be2fc06cfd08 R09: be2fc06cfd08
R10:  R11:  R12: 17c5f80d9f95
R13: be2fc06cfd88 R14: 98927a3f7aa0 R15: be2fc06cfd08
FS:  () GS:98927fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 55b1ff28f0f8 CR3: 1b412003 CR4: 003606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
 rxrpc_process_connection+0xd1/0x690 [rxrpc]
 ? process_one_work+0x1c3/0x680
 ? __lock_is_held+0x59/0xa0
 process_one_work+0x249/0x680
 worker_thread+0x3a/0x390
 ? process_one_work+0x680/0x680
 kthread+0x121/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x3a/0x50

Reported-by: Jonathan Billings 
Reported-by: Marc Dionne 
Signed-off-by: David Howells 
Tested-by: Jonathan Billings 
---

 net/rxrpc/conn_event.c |1 +
 net/rxrpc/rxkad.c  |   92 +++-
 2 files changed, 52 insertions(+), 41 deletions(-)

diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 4ca11be6be3c..b1dfae107431 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -460,6 +460,7 @@ void rxrpc_process_connection(struct work_struct *work)
case -EKEYEXPIRED:
case -EKEYREJECTED:
goto protocol_error;
+   case -ENOMEM:
case -EAGAIN:
goto requeue_and_leave;
case -ECONNABORTED:
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index c38b3a1de56c..77cb23c7bd0a 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -773,8 +773,7 @@ static int rxkad_respond_to_challenge(struct rxrpc_connection *conn,
 {
const struct rxrpc_key_token *token;
struct rxkad_challenge challenge;
-   struct rxkad_response resp
-   __attribute__((aligned(8))); /* must be aligned for crypto */
+   struct rxkad_response *resp;
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
const char *eproto;
u32 version, nonce, min_level, abort_code;
@@ -818,26 +817,29 @@ static int rxkad_respond_to_challenge(struct rxrpc_connection *conn,
token = conn->params.key->payload.data[0];
 
/* build the response packet */
-   memset(&resp, 0, sizeof(resp));
-
-   resp.version= htonl(RXKAD_VERSION);
-   resp.encrypted.epoch= htonl(conn->proto.epoch);
-   resp.encrypted.cid  = htonl(conn->proto.cid);
-   resp.encrypted.securityIndex= htonl(conn->security_ix);
-   resp.encrypted.inc_nonce= htonl(nonce + 1);
-   resp.encrypted.level= htonl(conn->params.security_level);
-   resp.kvno   = htonl(token->kad->kvno);
-   resp.ticket_len = htonl(token->kad->ticket_len);
-
-   resp.encrypted.call_id[0] = htonl(conn->channels[0].call_counter);
-   resp.encrypted.call_id[1] = htonl(conn->channels[1].call_counter);
-   resp.encrypted.call_id[2] = htonl(conn->channels[2].call_counter);
-   resp.encrypted.call_id[3] = htonl(conn->channels[3].call_counter);
+   resp = kzalloc(sizeof(struct rxkad_response), GFP_NOFS);
+   if (!resp)
+   return -ENOMEM;
+
+   resp->version   = htonl(RXKAD_VERSION);
+   resp->encrypted.epoch   = htonl(conn->proto.epoch);
+   resp->encrypted.cid = htonl(conn->proto.cid);
+   resp->encrypted.securityIndex   = htonl(conn->security_ix);
+   resp->encrypted.inc_nonce   = htonl(nonce + 1);
+   resp->encrypted.level   = htonl(conn->params.security_level);
+   resp->kvno  = htonl(token->kad->kvno);
+   resp->ticket_len= htonl(token->kad->ticket_len);
+   resp->encrypted.call_id[0]  = htonl(conn->channels[0].call_counter);
+   resp->encrypted.call_id[1]  = htonl(conn->channels[1].call_counter);
+   resp->encrypted.call_id[2]  = htonl(conn->channels[2].call_counter);
+   resp->encrypted.call_id[3]  = htonl(conn->channels[3].call_counter);
 
/* calculate the response checksum and then do the encryption */
-   rxkad_calc_response_checksum(&resp);
-

Re: USB rndis_host - slow download transfers, RX errors

2018-02-08 Thread Tomasz Janowski

On Thursday, February 8, 2018 3:43:05 PM EST Greg KH wrote:
> On Thu, Feb 08, 2018 at 02:16:08PM +, Tomasz Janowski, Ph.D. wrote:
> > Dear USB developers,
> > 
> > Based on my google research, the problem I experience seems to happen
> > with some newer smartphones. My test case is Samsung Galaxy S8 (SM-950U1).
> > I am trying to use USB tethering and everything seems to work as expected
> > (modules are loaded, Ethernet devices are up and running, dhcp works
> > fine). I can connect to the external world using both LTE or wireless
> > network on the phone.
> > 
> > Now, the problem is that the download speeds are terrible, around 64 KB/s,
> > while uploads are fast, on the order of 15 MB/s. These speeds do not depend
> > on the wireless service provider: the results are similar when I tether
> > wi-fi. The USB Ethernet interface on the Linux host reports a lot of
> > receive errors (attached: device_state.txt), while kernel reports bad
> > rndis messages (attached: kernel.log.txt).
> > 
> > Windows 10 works great with the same hardware (same PC and same phone),
> > with uploads and downloads in the order of 150 Mbit/s, which is probably
> > as fast as my wireless network can do. But some people reported issues
> > with older Windows drivers too. Is it possible that some newer version of
> > the RNDIS protocol is around and Linux hasn't updated its RNDIS module yet?
> 
> Hey, I was _just_ talking to someone at Google about this same issue
> yesterday, you beat him sending this same type of report to the mailing
> list, nice job :)
> 
> Yes, this is not good, and we should work to resolve this, but first,
> what kernel version are you using?  I think some fixes for the rndis
> driver went in recently to 4.15, but it would be good to verify that
> this isn't already resolved.

The error messages which I have attached were produced by a precompiled Debian 
kernel: "Linux version 4.14.0-0.bpo.3-amd64 (debian-ker...@lists.debian.org) 
(gcc version 6.3.0 20170516 (Debian 6.3.0-18)) #1 SMP Debian 4.14.13-1~bpo9+1 
(2018-01-14)".

But I have downloaded the most recent version of the kernel from the official 
git repository (last commit: Jan 31, 2018) and it had exactly the same 
problem. Unless a patch was submitted within the last week, the issue is still 
there.

Should I get the version as of today and test it again?

Thanks!
Tomasz

Re: [PATCH net V3 2/2] ptr_ring: fail on large queue size (>64K)

2018-02-08 Thread Michael S. Tsirkin

On Thu, Feb 08, 2018 at 03:11:22PM +0800, Jason Wang wrote:
> 
> 
> On 2018年02月08日 12:52, Michael S. Tsirkin wrote:
> > On Thu, Feb 08, 2018 at 11:59:25AM +0800, Jason Wang wrote:
> > > We need to limit the maximum size of the queue, otherwise it may cause
> > > several side effects, e.g. slab will warn when the size exceeds
> > > KMALLOC_MAX_SIZE. Using KMALLOC_MAX_SIZE still looks too large, so this patch
> > > tries to limit it to 64K. This value could be revisited if we find a
> > > real case that needs more.
> > > 
> > > Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
> > > Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   include/linux/ptr_ring.h | 4 
> > >   1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > index 2af71a7..5858d48 100644
> > > --- a/include/linux/ptr_ring.h
> > > +++ b/include/linux/ptr_ring.h
> > > @@ -44,6 +44,8 @@ struct ptr_ring {
> > >   void **queue;
> > >   };
> > Seems like a weird location for a define. Either put defines on
> > top of the file, or near where they are used. I prefer the
> > second option.
> 
> Ok.
> 
> > 
> > > +#define PTR_RING_MAX_ALLOC 65536
> > > +
> > I guess it's an arbitrary number. Seems like a sufficiently large one,
> > but pls add a comment so readers don't wonder. And please explain what
> > it does:
> > 
> > /* Callers can create ptr_ring structures with userspace-supplied
> >   * parameters. This sets a limit on the size to make that usecase
> >   * safe. If you ever change this, make sure to audit all callers.
> >   */
> > 
> > > Also I think we should generally use either hex 0x10000 or (1 << 16).
> 
> I agree the number is arbitrary, so I still prefer KMALLOC_MAX_SIZE,
> especially considering it is used by pfifo_fast now. Trying to limit it to an
> arbitrary value may break lots of existing setups. E.g. just googling "txqueuelen
> 10" gives me lots of search results.
> 
> We can do any kind of optimization on top but not for -net now.
> 
> Thanks

Interesting. I have an idea for fixing this, but maybe
for now KMALLOC_MAX_SIZE does make sense. It's unfortunate that
this value is architecture dependent.

The patch still needs code comments though, and fix the math to
use the proper size.


> > 
> > >   /* Note: callers invoking this in a loop must use a compiler barrier,
> > >* for example cpu_relax().
> > >*
> > > @@ -466,6 +468,8 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> > >   static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
> > >   {
> > > + if (size > PTR_RING_MAX_ALLOC)
> > > + return NULL;
> > >   return kvmalloc_array(size, sizeof(void *), gfp | __GFP_ZERO);
> > >   }
> > > -- 
> > > 2.7.4

Re: [PATCH net] ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE

2018-02-08 Thread Michael S. Tsirkin

On Wed, Feb 07, 2018 at 04:08:25PM +0800, Jason Wang wrote:
> To avoid slab to warn about exceeded size, fail early if queue
> occupies more than KMALLOC_MAX_SIZE.
> 
> Reported-by: syzbot+e4d4f9ddd42955397...@syzkaller.appspotmail.com
> Signed-off-by: Jason Wang 
> ---
>  include/linux/ptr_ring.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index 1883d61..4b862da 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -466,6 +466,8 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>  
>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>  {
> + if (size > KMALLOC_MAX_SIZE)
> + return NULL;
>   return kcalloc(size, sizeof(void *), gfp);
>  }

I guess this approach does begin to make more sense
at least as a temporary stop-gap.

But does this actually prevent the crash in all cases?

size is in void* entry units, KMALLOC_MAX_SIZE is in bytes.
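
To make the unit mismatch concrete, here is a small standalone sketch.
EXAMPLE_MAX_BYTES is a hypothetical limit standing in for KMALLOC_MAX_SIZE
(whose real value is architecture dependent), and both helpers are made up
for illustration; neither is the ptr_ring code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical byte limit standing in for KMALLOC_MAX_SIZE. */
#define EXAMPLE_MAX_BYTES 4096u

/* Check as in the quoted patch: compares an entry count against bytes. */
static bool size_ok_entries(unsigned int size)
{
	return size <= EXAMPLE_MAX_BYTES;
}

/* Unit-consistent check: compare entries against bytes per entry size. */
static bool size_ok_bytes(unsigned int size)
{
	return size <= EXAMPLE_MAX_BYTES / sizeof(void *);
}
```

A request for 4096 entries passes the first check even though it needs
4096 * sizeof(void *) bytes, which is above the limit; the second check
rejects it.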


>  
> -- 
> 2.7.4

Re: USB rndis_host - slow download transfers, RX errors

2018-02-08 Thread Greg KH

On Thu, Feb 08, 2018 at 02:16:08PM +, Tomasz Janowski, Ph.D. wrote:
> Dear USB developers,
> 
> Based on my google research, the problem I experience seems to happen
> with some newer smartphones. My test case is Samsung Galaxy S8 (SM-950U1).
> I am trying to use USB tethering and everything seems to work as expected
> (modules are loaded, Ethernet devices are up and running, dhcp works
> fine). I can connect to the external world using both LTE or wireless
> network on the phone.
> 
> Now, the problem is that the download speeds are terrible, around 64 KB/s,
> while uploads are fast, on the order of 15 MB/s. These speeds do not depend
> on the wireless service provider: the results are similar when I tether
> wi-fi. The USB Ethernet interface on the Linux host reports a lot of
> receive errors (attached: device_state.txt), while kernel reports bad
> rndis messages (attached: kernel.log.txt).
> 
> Windows 10 works great with the same hardware (same PC and same phone),
> with uploads and downloads in the order of 150 Mbit/s, which is probably
> as fast as my wireless network can do. But some people reported issues
> with older Windows drivers too. Is it possible that some newer version of
> the RNDIS protocol is around and Linux hasn't updated its RNDIS module yet?

Hey, I was _just_ talking to someone at Google about this same issue
yesterday, you beat him sending this same type of report to the mailing
list, nice job :)

Yes, this is not good, and we should work to resolve this, but first,
what kernel version are you using?  I think some fixes for the rndis
driver went in recently to 4.15, but it would be good to verify that
this isn't already resolved.

thanks,

greg k-h

[PATCH net] net/sched: cls_u32: fix cls_u32 on filter replace

2018-02-08 Thread Ivan Vecera

The following sequence is currently broken:

 # tc qdisc add dev foo ingress
 # tc filter replace dev foo protocol all ingress \
   u32 match u8 0 0 action mirred egress mirror dev bar1
 # tc filter replace dev foo protocol all ingress \
   handle 800::800 pref 49152 \
   u32 match u8 0 0 action mirred egress mirror dev bar2
 Error: cls_u32: Key node flags do not match passed flags.
 We have an error talking to the kernel, -1

The error comes from u32_change() when comparing new and
existing flags. The existing ones always contain one of the
TCA_CLS_FLAGS_{,NOT}_IN_HW flags, depending on offloading state.
These flags cannot be passed from userspace so the condition
(n->flags != flags) in u32_change() always fails.

Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
TCA_CLS_FLAGS_IN_HW are not taken into account.
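
The masked comparison can be tried out in isolation. In the sketch below the
EX_FLAGS_* names and their bit values are illustrative placeholders for the
TCA_CLS_FLAGS_* definitions in the kernel UAPI headers, and flags_mismatch()
is a hypothetical helper, not a function from cls_u32.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real ones live in the kernel UAPI headers. */
#define EX_FLAGS_SKIP_HW	(1u << 0)
#define EX_FLAGS_SKIP_SW	(1u << 1)
#define EX_FLAGS_IN_HW		(1u << 2)
#define EX_FLAGS_NOT_IN_HW	(1u << 3)

/* True when the flags differ in any bit other than the offload-status bits. */
static bool flags_mismatch(uint32_t node_flags, uint32_t user_flags)
{
	return ((node_flags ^ user_flags) &
		~(EX_FLAGS_IN_HW | EX_FLAGS_NOT_IN_HW)) != 0;
}
```

The XOR isolates the differing bits and the mask drops the two status bits,
so a node carrying only EX_FLAGS_NOT_IN_HW compares equal to user flags of 0,
while a genuine difference such as SKIP_HW versus SKIP_SW is still reported.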

Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status")
Signed-off-by: Ivan Vecera 
---
 net/sched/cls_u32.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6311a548046b..c75e68e839c7 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -955,7 +955,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
return -EINVAL;
}
 
-   if (n->flags != flags) {
+   if ((n->flags ^ flags) &
+   ~(TCA_CLS_FLAGS_IN_HW | TCA_CLS_FLAGS_NOT_IN_HW)) {
NL_SET_ERR_MSG_MOD(extack, "Key node flags do not match passed flags");
return -EINVAL;
}
-- 
2.13.6

Re: [PATCH net 0/5] nfp: fix disabling TC offloads in flower, max TSO segs and module version

2018-02-08 Thread David Miller

From: Jakub Kicinski 
Date: Wed,  7 Feb 2018 20:55:21 -0800

> This set corrects the way nfp deals with the NETIF_F_HW_TC flag.
> It has slipped the review that flower offload does not currently
> refuse disabling this flag when filter offload is active.
> 
> nfp's flower offload does not actually keep track of how many filters
> for each port are offloaded.  The accounting of the number of filters
> is added to the nfp core structures, and BPF moved to use these
> structures as well.
> 
> If users are allowed to disable TC offloads while filters are active,
> not only is it incorrect behaviour, but actually the NFP will never
> be told to remove the flows, leading to use-after-free when stats
> arrive.
> 
> Fourth patch makes sure we declare the max number of TSO segments.
> FW should drop longer packets cleanly (otherwise this would be a
> security problem for untrusted VFs) but dropping longer TSO frames
> is not nice and driver should prevent them from being generated.
> 
> Last small addition populates MODULE_VERSION with kernel version.

Series applied, thanks Jakub.
