Re: rename the command ip

2021-03-01 Thread Phil Sutter
On Mon, Mar 01, 2021 at 08:20:33AM +0200, Leon Romanovsky wrote:
> On Sun, Feb 28, 2021 at 10:39:14PM +0100, Phil Sutter wrote:
> > William,
> >
> > [Cc'ing netdev list as that's the place to discuss iproute2
> > development.]
> 
> <...>
> 
> > > IMPORTANT NOTICE: The contents of this email and any attachments are 
> > > confidential and may also be privileged. If you are not the intended 
> > > recipient, please notify the sender immediately and do not disclose the 
> > > contents to any other person, use it for any purpose, or store or copy 
> > > the information in any medium. Thank you.
> >
> > Luckily I'm the intended recipient so I may choose to disclose the
> > content (e.g. to a mailing list).
> 
> It is OK, he sent the same letter to many of us.

At least he expressed his appreciation to each of us individually. :D

Cheers, Phil


Re: rename the command ip

2021-02-28 Thread Phil Sutter
William,

[Cc'ing netdev list as that's the place to discuss iproute2
development.]

On Fri, Feb 26, 2021 at 12:04:12PM -0600, William Chen wrote:
> I see your excellent contributions to iproute2. I hope that you are well.

Thanks!

> But I have to say the command name "ip" is not good. It renders the command 
> ungoogleable. Why not give it a more googleable name in the first place? 
> Maybe just iproute2? If it is too long, then maybe iprt2?

Well, first of all, this is not my decision to make. Stephen Hemminger
maintains the project, he's the one to ask about such a thing. Apart
from that, renaming a tool because (some)one fails to google it is a
stupid idea: Existing resources won't change spontaneously, also people
will continue to use the old name as they are used to it so you won't
find anything in the future, either.

> The `ip route show` result is not a column-wise format making it hard to see 
> the alignment. In the aspect of output representation, it is worse than both 
> route and netstat.
> 
> $ ip route show
> default via 192.168.1.1 dev eth0 proto dhcp metric 100
> 10.6.0.0/17 dev tun0 proto kernel scope link src 10.6.49.100
> 10.10.0.0/16 via 10.6.0.1 dev tun0 metric 1000
> 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.104 metric 100

You could implement 'brief' output support for ip-route, it is available
for ip-addr and ip-link:

| % ip -br a s
| lo   UNKNOWN127.0.0.1/8 ::1/128 
| e1000UP fe80::215:17ff:fe0b:bf49/64 
| enp34s0  DOWN   fe80::2d8:61ff:fea7:d2fa/64 
| % ip -br l sh
| lo   UNKNOWN00:00:00:00:00:00  
| e1000UP 00:15:17:0b:bf:49 
 
| enp34s0  DOWN   00:d8:61:a7:d2:fa 
 


> Anyway, the `ip route show` result seems to be different from `route` and 
> `netstat`. Where are, Flags, MSS, Window, Ref and Use?
> 
> $ route -n
> Kernel IP routing table
> Destination Gateway Genmask Flags Metric RefUse Iface
> 0.0.0.0 192.168.1.1 0.0.0.0 UG10000 eth0
> 10.6.0.00.0.0.0 255.255.128.0   U 0  00 tun0
> 10.10.0.0   10.6.0.1255.255.0.0 UG1000   00 tun0
> 192.168.1.0 0.0.0.0 255.255.255.0   U 10000 eth0
> $ netstat -rn
> Kernel IP routing table
> Destination Gateway Genmask Flags   MSS Window  irtt Iface
> 0.0.0.0 192.168.1.1 0.0.0.0 UG0 0  0 eth0
> 10.6.0.00.0.0.0 255.255.128.0   U 0 0  0 tun0
> 10.10.0.0   10.6.0.1255.255.0.0 UG0 0  0 tun0
> 192.168.1.0 0.0.0.0 255.255.255.0   U 0 0  0 eth0

Call 'ip -d r s' to see the route type ('U' flag above). 'G' flag is
redundant, assume it's present if a route contains 'via '. MSS,
Window, Ref and Use are all zero in your output, are they still
relevant?

> The output of `ip route show` is also not of a fixed number of fields. How to 
> interpret what are of a field and what are not of a field?
> 
> $ ip route show | exec awk -e '{ print NF }'
> 9
> 9
> 7
> 11
> 
> Is there a way to make the output maybe in TSV format (or at least in a 
> column-wise format) so that the command `column` can be used to make the 
> output easier to read?

Have a look at '-json' option if you want to parse the output.

> Your help is much appreciated! I look forward to hearing from you. Thanks.
> 
> 
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.

Luckily I'm the intended recipient so I may choose to disclose the
content (e.g. to a mailing list).

Cheers, Phil


[PATCH] selftests: tc-testing: u32: Add tests covering sample option

2021-02-08 Thread Phil Sutter
Kernel's key folding basically consists of shifting away least
significant zero bits in mask and masking the resulting value with
(divisor - 1). Test for u32's 'sample' option to behave identical.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Phil Sutter 
---
These tests require my iproute2 patch 'tc: u32: Fix key folding in
sample option' in order to pass.
---
 .../tc-testing/tc-tests/filters/u32.json  | 46 +++
 1 file changed, 46 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json 
b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
index e09d3c0e307f6..bd64a4bf11abf 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/u32.json
@@ -201,5 +201,51 @@
 "teardown": [
 "$TC qdisc del dev $DEV1 ingress"
 ]
+},
+{
+"id": "0692",
+"name": "Test u32 sample option, divisor 256",
+"category": [
+"filter",
+"u32"
+],
+"plugins": {
+"requires": "nsPlugin"
+},
+"setup": [
+"$TC qdisc add dev $DEV1 ingress",
+"$TC filter add dev $DEV1 ingress prio 99 handle 1: u32 divisor 
256"
+],
+"cmdUnderTest": "bash -c \"for mask in ff  ff  ff00ff 
ffff 00ff; do $TC filter add dev $DEV1 ingress prio 99 u32 ht 1: sample 
u32 0x10203040 \\$mask match u8 0 0 classid 1:1; done\"",
+"expExitCode": "0",
+"verifyCmd": "$TC filter show dev $DEV1 ingress",
+"matchPattern": "filter protocol all pref 99 u32( (chain|fh|order) 
[0-9:]+){3} key ht 1 bkt 40 flowid 1:1",
+"matchCount": "7",
+"teardown": [
+"$TC qdisc del dev $DEV1 ingress"
+]
+},
+{
+"id": "2478",
+"name": "Test u32 sample option, divisor 16",
+"category": [
+"filter",
+"u32"
+],
+"plugins": {
+"requires": "nsPlugin"
+},
+"setup": [
+"$TC qdisc add dev $DEV1 ingress",
+"$TC filter add dev $DEV1 ingress prio 99 handle 1: u32 divisor 
256"
+],
+"cmdUnderTest": "bash -c \"for mask in 70 f0 ff0 fff0 ff00f0; do $TC 
filter add dev $DEV1 ingress prio 99 u32 ht 1: sample u32 0x10203040 \\$mask 
match u8 0 0 classid 1:1; done\"",
+"expExitCode": "0",
+"verifyCmd": "$TC filter show dev $DEV1 ingress",
+"matchPattern": "filter protocol all pref 99 u32( (chain|fh|order) 
[0-9:]+){3} key ht 1 bkt 4 flowid 1:1",
+"matchCount": "5",
+"teardown": [
+"$TC qdisc del dev $DEV1 ingress"
+]
 }
 ]
-- 
2.28.0



Re: [iproute PATCH] tc: u32: Fix key folding in sample option

2021-02-04 Thread Phil Sutter
On Thu, Feb 04, 2021 at 10:28:26AM -0500, Jamal Hadi Salim wrote:
> On 2021-02-04 9:50 a.m., Phil Sutter wrote:
> > On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
> >> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
> >>> Jamal,
> >>>
> >>> On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> >>>> I couldnt tell by inspection if what used to work before continues to.
> >>>> In particular the kernel version does consider the divisor when folding.
> >>>
> >>> That's correct. And so does tc. What's the matter?
> >>>
> >>
> >> tc assumes 256 when undefined. Maybe man page needs to be
> >> updated to state we need divisor specified otherwise default
> >> is 256.
> > 
> > tc-u32.8 mentions the default in 'sample' option description. Specifying
> > divisor is mandatory when creating a hash table, so that path is
> > covered, too. I still don't get how this is related to my patch, though.
> > 
> 
> It is a general comment related to this code (that you are modifying).
> You mentioned divisor in your earlier email as part of the syntax for
> sample. So it made me wonder:
> Does the bucket placement assume a specific number of buckets in a
> table? Example if i had a hash table with 4 buckets, would the sample
> then pick the correct bucket? Would it be also correct for 32 buckets,
> etc. Or it didnt matter before and it doesnt matter now.

My patch doesn't change how divisor is applied. And yes, with a smaller
than 256 buckets hash table, specifying the divisor along with sample is
necessary.

> >>>> Two examples that currently work, if you can try them:
> >>>
> >>> Both lack information about the used hashkey and divisor.
> >>>
> >>>> Most used scheme:
> >>>> ---
> >>>> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> >>>> ht 2:: \
> >>>> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> >>>> action ok
> >>>> 
> >>>
> >>> htid before: 0x201000
> >>> htid after: 0x201000
> >>>
> >>
> >> Ok, this is the most common use-case. So we are good.
> > 
> > Whatever.
> 
> Meaning?

Things are always fine if we only consider the positive results.

Cheers, Phil


Re: [iproute PATCH] tc: u32: Fix key folding in sample option

2021-02-04 Thread Phil Sutter
On Thu, Feb 04, 2021 at 09:34:01AM -0500, Jamal Hadi Salim wrote:
> On 2021-02-04 9:04 a.m., Phil Sutter wrote:
> > Jamal,
> > 
> > On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> >> I couldnt tell by inspection if what used to work before continues to.
> >> In particular the kernel version does consider the divisor when folding.
> > 
> > That's correct. And so does tc. What's the matter?
> > 
> 
> tc assumes 256 when undefined. Maybe man page needs to be
> updated to state we need divisor specified otherwise default
> is 256.

tc-u32.8 mentions the default in 'sample' option description. Specifying
divisor is mandatory when creating a hash table, so that path is
covered, too. I still don't get how this is related to my patch, though.

> >> Two examples that currently work, if you can try them:
> > 
> > Both lack information about the used hashkey and divisor.
> > 
> >> Most used scheme:
> >> ---
> >> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> >> ht 2:: \
> >> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> >> action ok
> >> 
> > 
> > htid before: 0x201000
> > htid after: 0x201000
> > 
> 
> Ok, this is the most common use-case. So we are good.

Whatever.

Thanks, Phil


Re: [iproute PATCH] tc: u32: Fix key folding in sample option

2021-02-04 Thread Phil Sutter
Jamal,

On Thu, Feb 04, 2021 at 08:19:55AM -0500, Jamal Hadi Salim wrote:
> I couldnt tell by inspection if what used to work before continues to.
> In particular the kernel version does consider the divisor when folding.

That's correct. And so does tc. What's the matter?

> Two examples that currently work, if you can try them:

Both lack information about the used hashkey and divisor.

> Most used scheme:
> ---
> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> ht 2:: \
> sample ip protocol 1 0xff match ip src 1.2.3.4/32 flowid 1:10 \
> action ok
> 

htid before: 0x201000
htid after: 0x201000

> 
> and this i also found in one of my scripts:
> 
> tc filter add dev $DEV parent 999:0  protocol ip prio 10 u32 \
> ht 2:: \
> sample u32 0x0806 0x at 12 \
> match u32 0x0800 0xff00 at 12 flowid 1:10 \
> action ok
> 

htid before: 0x20e000 (0x8 ^ 0x6 = 0xe)
htid after: 0x206000

Are you sure this still works with current kernel and iproute2
(excluding my patch)? What divisor and hashkey is used?

> Probably a simple meaning of "working" is:
> the values before and after (your changes) are consistent.
> 
> If also you will do us a kindness and add maybe a testcase in tdc?
> This way next person wanting to fix it can run the tests first before
> posting a patch.

What is "tdc"?

Cheers, Phil


[iproute PATCH] tc: u32: Fix key folding in sample option

2021-02-02 Thread Phil Sutter
In between Linux kernel 2.4 and 2.6, key folding for hash tables changed
in kernel space. When iproute2 dropped support for the older algorithm,
the wrong code was removed and kernel 2.4 folding method remained in
place. To get things functional for recent kernels again, restoring the
old code alone was not sufficient - additional byteorder fixes were
needed.

While being at it, make use of ffs() and thereby align the code with how
kernel determines the shift width.

Fixes: 267480f55383c ("Backout the 2.4 utsname hash patch.")
Signed-off-by: Phil Sutter 
---
Initially I considered changing the kernel's key folding instead as the
old method didn't just ignore key bits beyond the first byte. Yet I am
not sure if this would cause problems with hardware offloading. And
given the fact that this simplified key folding is in place since the
dawn of 2.6, it is probably not such a big problem anyway.
---
 tc/f_u32.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tc/f_u32.c b/tc/f_u32.c
index 2ed5254a40d5f..a5747f671e1ea 100644
--- a/tc/f_u32.c
+++ b/tc/f_u32.c
@@ -978,6 +978,13 @@ show_k:
goto show_k;
 }
 
+static __u32 u32_hash_fold(struct tc_u32_key *key)
+{
+   __u8 fshift = key->mask ? ffs(ntohl(key->mask)) - 1 : 0;
+
+   return ntohl(key->val & key->mask) >> fshift;
+}
+
 static int u32_parse_opt(struct filter_util *qu, char *handle,
 int argc, char **argv, struct nlmsghdr *n)
 {
@@ -1110,9 +1117,7 @@ static int u32_parse_opt(struct filter_util *qu, char 
*handle,
}
NEXT_ARG();
}
-   hash = sel2.keys[0].val & sel2.keys[0].mask;
-   hash ^= hash >> 16;
-   hash ^= hash >> 8;
+   hash = u32_hash_fold(&sel2.keys[0]);
htid = ((hash % divisor) << 12) | (htid & 0xFFF0);
sample_ok = 1;
continue;
-- 
2.28.0



Re: tc: u32: Wrong sample hash calculation

2021-01-22 Thread Phil Sutter
Jamal,

On Fri, Jan 22, 2021 at 06:25:22AM -0500, Jamal Hadi Salim wrote:
[...]
> My gut feel is user space is the right/easier spot to fix this
> as long as it doesnt break the working setup of 8b.

One last attempt at clarifying the situation:

Back in 2004, your commit 4e54c4816bf ("[NET]: Add tc extensions
infrastructure.")[1] was applied which commented out the old hash
folding and introduced the shift/cutoff we have today:

|  @@ -90,10 +101,12 @@ static struct tc_u_common *u32_list;
|  
|  static __inline__ unsigned u32_hash_fold(u32 key, struct tc_u32_sel *sel)
|  {
| - unsigned h = key & sel->hmask;
| + unsigned h = (key & sel->hmask)>>sel->fshift;
|  
| + /*
|   h ^= h>>16;
|   h ^= h>>8;
| + */
|   return h;
|  }

In a later commit, the new code was made compile-time selected via '#ifdef
fix_u32_bug'. In that same commit, I don't see a related #define though.

Do you remember why this was changed? Seems like the old code was
problematic somehow.

Cheers, Phil

[1] 
https://github.com/laijs/linux-kernel-ancient-history/commit/4e54c4816bfe51c145382d272b19c2ae41e9e36f#


Re: tc: u32: Wrong sample hash calculation

2021-01-22 Thread Phil Sutter
Hi Jamal,

On Fri, Jan 22, 2021 at 06:25:22AM -0500, Jamal Hadi Salim wrote:
[...]
> Is this always true though for all scenarios of key > 8b?

Key size reduction algorithms simply differ, and before applying the
divisor the key is reduced to an eight bit value. If the higher bytes
are zero, the result is identical. So for some keys the differences are
irrelevant.

> And is there a pattern that can be deduced?

Something like: 'broken = !!(key >> 8)'? ;)

> My gut feel is user space is the right/easier spot to fix this
> as long as it doesnt break the working setup of 8b.

My motivation to write the initial email was that I don't like the
kernel's key folding as it's basically just cutting off the extra bits.
I am aware that fixing user space is easier, but better distribution of
entries might be worth the extra effort. Given that you didn't point out
any implications with changing u32's key folding in kernel space, I'll
just give it a try.

Thanks, Phil


Re: tc: u32: Wrong sample hash calculation

2021-01-20 Thread Phil Sutter
Hi Jamal,

On Wed, Jan 20, 2021 at 08:55:11AM -0500, Jamal Hadi Salim wrote:
> On 2021-01-18 6:29 a.m., Phil Sutter wrote:
> > Hi!
> > 
> > Playing with u32 filter's hash table I noticed it is not possible to use
> > 'sample' option with keys larger than 8bits to calculate the hash
> > bucket. 
> 
> 
> I have mostly used something like: ht 2:: sample ip protocol 1 0xff
> Hoping this is continuing to work.

This should read 'sample ip protocol 1 divisor 0xff', right?

> I feel i am missing something basic in the rest of your email:
> Sample is a user space concept i.e it is used to instruct the
> kernel what table/bucket to insert the node into. This computation
> is done in user space. The kernel should just walk the nodes in
> the bucket and match.

Correct, but the kernel has to find the right bucket first. This is
where its key hashing comes into place.

> Reminder: you can only have 256 buckets (8 bit representation).
> Could that be the contributing factor?

It is. Any key smaller than 256B is unaffected as no folding is done in
either kernel or user space.

> Here's an example of something which is not 8 bit that i found in
> an old script that should work (but I didnt test in current kernels).
> ht 2:: sample u32 0x0800 0xff00 at 12
> We are still going to extract only 8 bits for the bucket.

Yes. The resulting key is 8Bit as the low zeroes are automatically
shifted away.

> Can you provide an example of what wouldnt work?

Sure, sorry for not including it in the original email. Let's apply
actions to some packets based on source IP address. To efficiently
support arbitrary numbers, we use a hash table with 256 buckets:

# tc qd add dev test0 ingress
# tc filter add dev test0 parent : prio 99 handle 1: u32 divisor 256
# tc filter add dev test0 parent : prio 1 protocol ip u32 \
hashkey mask 0x at 12 link 1: match u8 0 0

So with the above in place, the kernel uses 32bits at offset 12 as a key
to determine the bucket to jump to. This is done by just extracting the
lowest 8bits in host byteorder, i.e. the last octet of the packet's
source address.

Users don't know the above (and shouldn't need to), so they use sample
to have the bucket determined automatically:

# tc filter add dev test0 parent : prio 99 u32 \
match ip src 10.0.0.2 \
ht 1: sample ip src 10.0.0.2 divisor 256 \
action drop

iproute2 calculates bucket 8 (= 10 ^ 2), while the kernel will check
bucket 2. So the above filter will never match.

Cheers, Phil


tc: u32: Wrong sample hash calculation

2021-01-18 Thread Phil Sutter
Hi!

Playing with u32 filter's hash table I noticed it is not possible to use
'sample' option with keys larger than 8bits to calculate the hash
bucket. Turns out key hashing in kernel and iproute2 differ:

* net/sched/cls_u32.c (kernel) basically does:

hash = ntohl(key & mask);
hash >>= ffs(ntohl(mask)) - 1;
hash &= 0xff;
hash %= divisor;

* while tc/f_u32.c (iproute2) does:

hash = key & mask;
hash ^= hash >> 16;
hash ^= hash >> 8;
hash %= divisor;

In iproute2, the code changed in 2006 with commit 267480f55383c
("Backout the 2.4 utsname hash patch."), here's the relevant diff:

  hash = sel2.sel.keys[0].val&sel2.sel.keys[0].mask;
- uname(&utsname);
- if (strncmp(utsname.release, "2.4.", 4) == 0) {
- hash ^= hash>>16;
- hash ^= hash>>8;
- }
- else {
- __u32 mask = sel2.sel.keys[0].mask;
- while (mask && !(mask & 1)) {
- mask >>= 1;
- hash >>= 1;
- }
- hash &= 0xFF;
- }
+ hash ^= hash>>16;
+ hash ^= hash>>8;
  htid = ((hash%divisor)<<12)|(htid&0xFFF0);

The old code would work if key and mask weren't in network byteorder. I
guess that also changed since then.

I would simply send a patch to fix iproute2, but I don't like the
kernel's hash "folding" as it ignores any bits beyond the first eight.
So I would prefer to "fix" the kernel instead but would like to hear
your opinions as that change has a much larger scope than just
iproute2's 'sample' option.

Thanks, Phil


[ANNOUNCE] iptables 1.8.7 release

2021-01-15 Thread Phil Sutter
Hi!

The Netfilter project proudly presents:

iptables 1.8.7

This release contains the following fixes and enhancements:

iptables-nft:
- Improved performance when matching on IP/MAC address prefixes if the
  prefix is byte-aligned. In ideal cases, this doubles packet processing
  performance.
  *NOTE*: Older iptables versions will not recognize the mask and thus
  omit them when listing the ruleset.
- Dump user-defined chains in lexical order. This way ruleset dumps
  become stable and easily comparable.
- Avoid pointless table/chain creation. For instance, 'iptables-nft -L'
  no longer creates missing base-chains.

ebtables-nft:
- Renaming user-defined chains was entirely broken.

extensions:
- Code for printing and parsing of MAC addresses was consolidated
  internally, slightly reducing binary size. As a noticeable
  side-effect, all MAC addresses are now printed in lower-case (affects
  'mac'-extension).
- Fixed DCCP extension's match on 'INVALID' type, a meta-type which
  should match any type value in the range from ten to fifteen. In the
  past it matched on type value 10 only.

xtables-monitor:
- Don't print unrelated rules in the same chain when tracing.
- Flush output buffer after each rule when tracing to improve experience
  when redirecting output.
- Print the table's family when tracing instead of whatever the user
  specified on command line.
- Print the traced packet before the rule it traverses, not vice-versa.
- Recognize loopback interface and print "LOOPBACK" for link layer
  header info instead of "LL=0x304".

xtables-translate:
- Correctly translate DCCP type matches (including 'INVALID').

See the attached changelog for more details.

You can download it from:

http://www.netfilter.org/projects/iptables/downloads.html#iptables-1.8.7

To build the code, libnftnl 1.1.6 is required:

* http://netfilter.org/projects/libnftnl/downloads.html#libnftnl-1.1.6

In case of bugs and feature requests, file them via:

* https://bugzilla.netfilter.org

Happy firewalling!
Florian Westphal (4):
  xtables-monitor: fix rule printing
  xtables-monitor: fix packet family protocol
  xtables-monitor: print packet first
  xtables-monitor: 'LL=0x304' is not very convenient, print LOOPBACK instead.

Pablo Neira Ayuso (1):
  tests: shell: update format of registers in bitwise payloads.

Phil Sutter (21):
  nft: Optimize class-based IP prefix matches
  ebtables: Optimize masked MAC address matches
  tests/shell: Add test for bitwise avoidance fixes
  ebtables: Fix for broken chain renaming
  iptables-test.py: Accept multiple test files on commandline
  iptables-test.py: Try to unshare netns by default
  libxtables: Extend MAC address printing/parsing support
  xtables-arp: Don't use ARPT_INV_*
  xshared: Merge some command option-related code
  tests/shell: Test for fixed extension registration
  extensions: dccp: Fix for DCCP type 'INVALID'
  nft: Fix selective chain compatibility checks
  nft: cache: Introduce nft_cache_add_chain()
  nft: Implement nft_chain_foreach()
  nft: cache: Move nft_chain_find() over
  nft: Introduce struct nft_chain
  nft: Introduce a dedicated base chain array
  nft: cache: Sort custom chains by name
  tests: shell: Drop any dump sorting in place
  nft: Avoid pointless table/chain creation
  tests/shell: Fix nft-only/0009-needless-bitwise_0


Re: [PATCH iproute2 2/2] lib/fs: Fix single return points for get_cgroup2_*

2020-12-18 Thread Phil Sutter
On Fri, Dec 18, 2020 at 08:09:23PM +0100, Andrea Claudi wrote:
> Functions get_cgroup2_id() and get_cgroup2_path() uncorrectly performs
> cleanup on the single return point. Both of them may get to use close()
> with a negative argument, if open() fails.
> 
> Fix this adding proper labels and gotos to make sure we clean up only
> resources we are effectively used before.

Since free(NULL) is OK according to POSIX, the fds are initialized to -1
and open() returns -1 on error, you may simplify these
changes down to making the close() calls conditional:

| if (fd >= 0)
|   close(fd);

Cheers, Phil


Re: [PATCH iproute2 1/2] lib/fs: avoid double call to mkdir on make_path()

2020-12-18 Thread Phil Sutter
Hi Andrea,

On Fri, Dec 18, 2020 at 08:09:22PM +0100, Andrea Claudi wrote:
> make_path() function calls mkdir two times in a row. The first one it
> stores mkdir return code, and then it calls it again to check for errno.

To me it rather seems like I rebased the original commit into a mess. Or
I got really confused by the covscan error message this is based upon.
Either way, I don't see why this would not be a bug. :)

> This seems unnecessary, as we can use the return code from the first
> call and check for errno if not 0.
> 

Fixes: ac3415f5c1b1d ("lib/fs: Fix and simplify make_path()")
> Signed-off-by: Andrea Claudi 
Acked-by: Phil Sutter 

Thanks, Phil


Re: [PATCH v2] xfrm: interface: Don't hide plain packets from netfilter

2020-12-10 Thread Phil Sutter
Hi Nicolas,

On Thu, Dec 10, 2020 at 02:18:45PM +0100, Nicolas Dichtel wrote:
> Le 10/12/2020 à 12:48, Eyal Birger a écrit :
> > On Thu, Dec 10, 2020 at 1:10 PM Nicolas Dichtel
> >  wrote:
> [snip]
> > I also think they should be consistent. But it'd still be confusing to me
> > to get an OUTPUT hook on the inner packet in the forwarding case.
> I re-read the whole thread and I agree with you. There is no reason to pass 
> the
> inner packet through the OUTPUT hook (my comment about the consistency with ip
> tunnels is still valid ;-)).
> Sorry for the confusion.
> 
> Phil, with nftables, you can match the 'kind' of the interface, that should be
> enough to match packets, isn't it?

Yes, sure. Also, the inner packet passes POSTROUTING hook with ipsec
context present, it's just not visible in OUTPUT. Of course the broader
question is what do people use ipsec context matches for. If it's really
just to ensure traffic is encrypted, xfrm_interface alone is sufficient.

Originally this was reported as "ipsec match stops working if
xfrm_interface is used" and I suspected it's a bug in the driver.
Knowing the behaviour is expected (and at least consistent with vti),
the case is closed from my side. :)

Thanks, Phil


Re: [PATCH v2] xfrm: interface: Don't hide plain packets from netfilter

2020-12-08 Thread Phil Sutter
Hi Eyal,

On Tue, Dec 08, 2020 at 04:47:02PM +0200, Eyal Birger wrote:
> On Mon, Dec 7, 2020 at 4:07 PM Phil Sutter  wrote:
> >
> > With an IPsec tunnel without dedicated interface, netfilter sees locally
> > generated packets twice as they exit the physical interface: Once as "the
> > inner packet" with IPsec context attached and once as the encrypted
> > (ESP) packet.
> >
> > With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
> > hook anymore, making it impossible to match on both inner header values
> > and associated IPsec data from that hook.
> >
> 
> Why wouldn't locally generated traffic not traverse the
> NF_INET_LOCAL_OUT hook via e.g. __ip_local_out() when xmitted on an xfrmi?
> I would expect it to appear in netfilter, but without the IPsec
> context, as it's not
> there yet.

Yes, that's right. Having an iptables rule with LOG target in OUTPUT
chain, a packet sent from the local host is logged multiple times:

| IN= OUT=xfrm SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 
TTL=64 ID=21840 DF
| PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
| IN= OUT=eth0 SRC=192.168.111.1 DST=192.168.111.2 LEN=84 TOS=0x00 PREC=0x00 
TTL=64 ID=21840 DF PROTO=ICMP TYPE=8 CODE=0 ID=56857 SEQ=1
| IN= OUT=eth0 SRC=192.168.1.1 DST=192.168.1.2 LEN=140 TOS=0x00 PREC=0x00 
TTL=64 ID=0 DF PROTO=ESP SPI=0x1000

First when being sent to xfrm interface, then two times between xfrm and
eth0, the second time as ESP packet. This is with my patch applied.
Without it, the second log entry is missing. I'm arguing the above is
consistent to IPsec without xfrm interface:

| IN= OUT=eth1 SRC=192.168.112.1 DST=192.168.112.2 LEN=84 TOS=0x00 PREC=0x00 
TTL=64 ID=49341 DF PROTO=ICMP TYPE=8 CODE=0 ID=44114 SEQ=1
| IN= OUT=eth1 SRC=192.168.2.1 DST=192.168.2.2 LEN=140 TOS=0x00 PREC=0x00 
TTL=64 ID=37109 DF PROTO=ESP SPI=0x1000

The packet appears twice being sent to eth1, the second time as ESP
packet. I understand xfrm interface as a collector of to-be-xfrmed
packets, dropping those which do not match a policy.

> > Fix this by looping packets transmitted from xfrm_interface through
> > NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
> > behaviour consistent again from netfilter's point of view.
> 
> When an XFRM interface is used when forwarding, why would it be correct
> for NF_INET_LOCAL_OUT to observe the inner packet?

A valid question, indeed. One could interpret packets being forwarded by
those tunneling devices emit the packets one feeds them from the local
host. I just checked and ip_vti behaves identical to xfrm_interface
prior to my patch, so maybe my patch is crap and the inability to match
on ipsec context data when using any of those devices is just by design.

Thanks, Phil


Re: [PATCH v2] xfrm: interface: Don't hide plain packets from netfilter

2020-12-08 Thread Phil Sutter
Hi Nicolas,

On Tue, Dec 08, 2020 at 10:02:16AM +0100, Nicolas Dichtel wrote:
> Le 07/12/2020 à 14:43, Phil Sutter a écrit :
[...]
> > diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
> > index aa4cdcf69d471..24af61c95b4d4 100644
> > --- a/net/xfrm/xfrm_interface.c
> > +++ b/net/xfrm/xfrm_interface.c
> > @@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device 
> > *dev, struct flowi *fl)
> > skb_dst_set(skb, dst);
> > skb->dev = tdev;
> >  
> > -   err = dst_output(xi->net, skb->sk, skb);
> > +   err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
> skb->protocol must be correctly set, maybe better to use it instead of
> skb_dst(skb)->ops->family?

skb->protocol holds ETH_P_* values in network byte order, NF_HOOK()
expects an NFPROTO_* value, so this would at least not be a simple
replacement. Actually I copied the code from xfrm_output_resume() in
xfrm_output.c, where skb_dst(skb)->ops is dereferenced without checking
as well. Do you think this is risky?

> > + skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
> And here, tdev instead of skb_dst(skb)->dev ?

Well yes, tdev was set to dst->dev earlier. Likewise I could use dst
directly instead of skb_dst(skb) to simplify the call a bit further.
OTOH I like how in the version above it is clear that skb's dst should
be used, irrespective of the code above (and any later changes that may
receive). No strong opinion though, so if your version is regarded the
preferred one, I'm fine with that.

Thanks, Phil


[PATCH v2] xfrm: interface: Don't hide plain packets from netfilter

2020-12-07 Thread Phil Sutter
With an IPsec tunnel without dedicated interface, netfilter sees locally
generated packets twice as they exit the physical interface: Once as "the
inner packet" with IPsec context attached and once as the encrypted
(ESP) packet.

With xfrm_interface, the inner packet did not traverse NF_INET_LOCAL_OUT
hook anymore, making it impossible to match on both inner header values
and associated IPsec data from that hook.

Fix this by looping packets transmitted from xfrm_interface through
NF_INET_LOCAL_OUT before passing them on to dst_output(), which makes
behaviour consistent again from netfilter's point of view.

Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Phil Sutter 
---
Changes since v1:
- Extend recipients list, no code changes.
---
 net/xfrm/xfrm_interface.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index aa4cdcf69d471..24af61c95b4d4 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -317,7 +317,8 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, 
struct flowi *fl)
skb_dst_set(skb, dst);
skb->dev = tdev;
 
-   err = dst_output(xi->net, skb->sk, skb);
+   err = NF_HOOK(skb_dst(skb)->ops->family, NF_INET_LOCAL_OUT, xi->net,
+ skb->sk, skb, NULL, skb_dst(skb)->dev, dst_output);
if (net_xmit_eval(err) == 0) {
struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
 
-- 
2.28.0



[ANNOUNCE] iptables 1.8.6 release

2020-10-31 Thread Phil Sutter
Hi!

The Netfilter project proudly presents:

iptables 1.8.6

This release contains the following fixes and enhancements:

iptables-nft:
- Fix ip6tables error messages, they were incorrectly prefixed
  'iptables:'.
- Fix for pointless 'bitwise' expression being added to each IP address
  match, needlessly slowing down run-time performance (by 50% in worst
  cases).

iptables-nft-restore:
- Correctly print the flushed chains in verbose mode, like legacy
  restore does.
- Restoring multiple tables could fail if a ruleset flush happened in
  parallel (e.g. via 'nft flush ruleset').
- Fix for bogus error messages if a refreshed transaction fails.
- Support basechain policy value of '-' (indicating to not change the
  chain's policy).
- Fix for spurious errors in concurrent restore calls with '--noflush'.

iptables-legacy:
- Allow to configure lock file location via XTABLES_LOCKFILE environment
  variable.

xtables-monitor:
- Fix printing of IP addresses in ip6tables rules.

xtables-translate:
- Exit gracefully when called with '--help'.
- Fix some memory leaks.
- Add support for conntrack '--ctstate' match.
- Fix translation of ICMP type 'any' match.

libxtables:
- Fix for lower extension revisions not supported by the kernel anymore
  being retried each time the extension is used in a rule. This
  significantly improves performance when restoring large rulesets which
  extensively use e.g. conntrack match.

tests:
- Add help text to tests/shell/run-tests.sh.
- Test ip6tables error messages also, not just return codes.

General:
- Rejecting packets with ctstate INVALID might close good connections if
  packet reordering happened. Document this and suggest to use DROP
  target instead.
- Fix for iptables-apply script not being installed by 'make install'.
- Fix 'make uninstall', it was completely broken.
- Fix compiler warnings when building with NO_SHARED_LIBS.
- Extend 'make clean' to remove some generated man pages left in place.
- Fix for gcc-10 zero-length array warnings.

See the attached changelog for more details.

You can download it from:

http://www.netfilter.org/projects/iptables/downloads.html#iptables-1.8.6

To build the code, libnftnl 1.1.6 is required:

* http://netfilter.org/projects/libnftnl/downloads.html#libnftnl-1.1.6

In case of bugs and feature requests, file them via:

* https://bugzilla.netfilter.org

Happy firewalling!
Arturo Borrero Gonzalez (1):
  xtables-translate: don't fail if help was requested

Giuseppe Scrivano (1):
  iptables: accept lock file name at runtime

Jan Engelhardt (2):
  doc: document danger of applying REJECT to INVALID CTs
  build: resolve iptables-apply not getting installed

Maciej Żenczykowski (1):
  libxtables: compiler warning fixes for NO_SHARED_LIBS

Pablo Neira Ayuso (3):
  extensions: libxt_conntrack: provide translation for DNAT and SNAT
--ctstate
  iptables: replace libnftnl table list by linux list
  iptables-nft: fix basechain policy configuration

Phil Sutter (31):
  xtables-restore: Fix verbose mode table flushing
  build: Fix for failing 'make uninstall'
  xtables-translate: Use proper clear_cs function
  tests: shell: Add help output to run-tests.sh
  nft: Make table creation purely implicit
  nft: Be lazy when flushing
  nft: cache: Drop duplicate chain check
  nft: Drop pointless nft_xt_builtin_init() call
  nft: Turn nft_chain_save() into a foreach-callback
  nft: Use nft_chain_find() in two more places
  nft: Reorder enum nft_table_type
  nft: Eliminate table list from cache
  nft: Fix command name in ip6tables error message
  tests: shell: Merge and extend return codes test
  xtables-monitor: Fix ip6tables rule printing
  nft: Fix for ruleset flush while restoring
  Makefile: Add missing man pages to CLEANFILES
  nft: cache: Check consistency with NFT_CL_FAKE, too
  nft: Extend use of nftnl_chain_list_foreach()
  nft: Fold nftnl_rule_list_chain_save() into caller
  nft: Use nft_chain_find() in nft_chain_builtin_init()
  nft: Fix for broken address mask match detection
  extensions: libipt_icmp: Fix translation of type 'any'
  libxtables: Make sure extensions register in revision order
  libxtables: Simplify pending extension registration
  libxtables: Register multiple extensions in ascending order
  nft: Make batch_add_chain() return the added batch object
  nft: Fix error reporting for refreshed transactions
  libiptc: Avoid gcc-10 zero-length array warning
  nft: Fix for concurrent noflush restore calls
  tests: shell: Improve concurrent noflush restore test a bit


[iproute PATCH] ip link: Fix indenting in help text

2020-08-29 Thread Phil Sutter
Indenting of 'ip link set' options below 'link-netns' was wrong, they
should be on the same level as the above.

While being at it, fix closing brackets in vf-specific options. Also
write node/port_guid parameters in upper-case without curly braces: They
are supposed to be replaced by values, not put literally.

Fixes: 8589eb4efdf2a ("treewide: refactor help messages")
Fixes: 5a3ec4ba64783 ("iplink: Update usage in help message")
Signed-off-by: Phil Sutter 
---
 ip/iplink.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 7d4b244d1d266..5ec33a98b96e9 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -86,26 +86,26 @@ void iplink_usage(void)
"   [ mtu MTU ]\n"
"   [ netns { PID | NAME } ]\n"
"   [ link-netns NAME | link-netnsid ID ]\n"
-   "   [ alias NAME ]\n"
-   "   [ vf NUM [ mac LLADDR ]\n"
-   "[ vlan VLANID [ qos VLAN-QOS ] 
[ proto VLAN-PROTO ] ]\n"
-   "[ rate TXRATE ]\n"
-   "[ max_tx_rate TXRATE ]\n"
-   "[ min_tx_rate TXRATE ]\n"
-   "[ spoofchk { on | off} ]\n"
-   "[ query_rss { on | off} ]\n"
-   "[ state { auto | enable | 
disable} ] ]\n"
-   "[ trust { on | off} ] ]\n"
-   "[ node_guid { eui64 } ]\n"
-   "[ port_guid { eui64 } ]\n"
-   "   [ { xdp | xdpgeneric | xdpdrv | 
xdpoffload } { off |\n"
-   " object FILE [ section NAME ] 
[ verbose ] |\n"
-   " pinned FILE } ]\n"
-   "   [ master DEVICE ][ vrf NAME ]\n"
-   "   [ nomaster ]\n"
-   "   [ addrgenmode { eui64 | none | 
stable_secret | random } ]\n"
-   "   [ protodown { on | off } ]\n"
-   "   [ gso_max_size BYTES ] | [ gso_max_segs 
PACKETS ]\n"
+   "   [ alias NAME ]\n"
+   "   [ vf NUM [ mac LLADDR ]\n"
+   "[ vlan VLANID [ qos VLAN-QOS ] [ proto 
VLAN-PROTO ] ]\n"
+   "[ rate TXRATE ]\n"
+   "[ max_tx_rate TXRATE ]\n"
+   "[ min_tx_rate TXRATE ]\n"
+   "[ spoofchk { on | off} ]\n"
+   "[ query_rss { on | off} ]\n"
+   "[ state { auto | enable | disable} ]\n"
+   "[ trust { on | off} ]\n"
+   "[ node_guid EUI64 ]\n"
+   "[ port_guid EUI64 ] ]\n"
+   "   [ { xdp | xdpgeneric | xdpdrv | xdpoffload } { 
off |\n"
+   " object FILE [ section NAME ] [ 
verbose ] |\n"
+   " pinned FILE } ]\n"
+   "   [ master DEVICE ][ vrf NAME ]\n"
+   "   [ nomaster ]\n"
+   "   [ addrgenmode { eui64 | none | stable_secret | 
random } ]\n"
+   "   [ protodown { on | off } ]\n"
+   "   [ gso_max_size BYTES ] | [ gso_max_segs PACKETS 
]\n"
"\n"
"   ip link show [ DEVICE | group GROUP ] [up] [master DEV] 
[vrf NAME] [type TYPE]\n"
"\n"
-- 
2.27.0



Re: netfilter: does the API break or something else ?

2020-05-14 Thread Phil Sutter
Hi,

On Wed, May 13, 2020 at 11:20:35PM +0800, Xiubo Li wrote:
> Recently I hit one netfilter issue, it seems the API breaks or something 
> else.

Just for the record, this was caused by a misconfigured kernel.

Cheers, Phil


Re: [net-next PATCH] net: rtnetlink: Enslave device before bringing it up

2019-06-02 Thread Phil Sutter
Hi David,

On Fri, May 31, 2019 at 02:26:15PM -0700, David Miller wrote:
> From: Phil Sutter 
> Date: Wed, 29 May 2019 15:51:20 +0200
> 
> > Unlike with bridges, one can't add an interface to a bond and set it up
> > at the same time:
> > 
> > | # ip link set dummy0 down
> > | # ip link set dummy0 master bond0 up
> > | Error: Device can not be enslaved while up.
> > 
> > Of all drivers with ndo_add_slave callback, bond and team decline if
> > IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
> > immediately up again) and the others just don't care.
> > 
> > Support the common notion of setting the interface up after enslaving it
> > by sorting the operations accordingly.
> > 
> > Signed-off-by: Phil Sutter 
> 
> What about other flags like IFF_PROMISCUITY?

Crap, that's the crux: Upon enslaving, team driver propagates
IFF_PROMISC and IFF_ALLMULTI flags from master to slave. With my change,
these propagations roll back by accident. So please disregard this
patch, I'll have to find a different solution.

Thanks, Phil


Re: [net-next PATCH] net: rtnetlink: Enslave device before bringing it up

2019-05-29 Thread Phil Sutter
On Wed, May 29, 2019 at 09:41:07AM -0600, David Ahern wrote:
> On 5/29/19 7:51 AM, Phil Sutter wrote:
> > Unlike with bridges, one can't add an interface to a bond and set it up
> > at the same time:
> > 
> > | # ip link set dummy0 down
> > | # ip link set dummy0 master bond0 up
> > | Error: Device can not be enslaved while up.
> > 
> > Of all drivers with ndo_add_slave callback, bond and team decline if
> > IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
> > immediately up again) and the others just don't care.
> > 
> > Support the common notion of setting the interface up after enslaving it
> > by sorting the operations accordingly.
> > 
> > Signed-off-by: Phil Sutter 
> > ---
> >  net/core/rtnetlink.c | 14 +++---
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> 
> I agree with the intent - enslave before up.
> 
> Not sure how likely this is, but it does break the case:
> ip li set dummy0 master bond0 down

That's right. I could allow for that by ordering the enslave and state
change dynamically, but doubt it's worth the effort. Instead of the
above I'd rather expect 'ip l s d0 nomaster down' to be a more common
use-case (which is not affected by my patch).

Thanks, Phil


[net-next PATCH] net: rtnetlink: Enslave device before bringing it up

2019-05-29 Thread Phil Sutter
Unlike with bridges, one can't add an interface to a bond and set it up
at the same time:

| # ip link set dummy0 down
| # ip link set dummy0 master bond0 up
| Error: Device can not be enslaved while up.

Of all drivers with ndo_add_slave callback, bond and team decline if
IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and
immediately up again) and the others just don't care.

Support the common notion of setting the interface up after enslaving it
by sorting the operations accordingly.

Signed-off-by: Phil Sutter 
---
 net/core/rtnetlink.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index adcc045952c2f..d81acace02ce5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2488,13 +2488,6 @@ static int do_setlink(const struct sk_buff *skb,
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
}
 
-   if (ifm->ifi_flags || ifm->ifi_change) {
-   err = dev_change_flags(dev, rtnl_dev_combine_flags(dev, ifm),
-  extack);
-   if (err < 0)
-   goto errout;
-   }
-
if (tb[IFLA_MASTER]) {
err = do_set_master(dev, nla_get_u32(tb[IFLA_MASTER]), extack);
if (err)
@@ -2502,6 +2495,13 @@ static int do_setlink(const struct sk_buff *skb,
status |= DO_SETLINK_MODIFIED;
}
 
+   if (ifm->ifi_flags || ifm->ifi_change) {
+   err = dev_change_flags(dev, rtnl_dev_combine_flags(dev, ifm),
+  extack);
+   if (err < 0)
+   goto errout;
+   }
+
if (tb[IFLA_CARRIER]) {
err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER]));
if (err)
-- 
2.21.0



Re: [PATCH iproute2 net-next] ip: add a new parameter -Numeric

2019-05-20 Thread Phil Sutter
Hi,

On Mon, May 20, 2019 at 03:56:48PM +0800, Hangbin Liu wrote:
> When calles rtnl_dsfield_n2a(), we get the dsfield name from
> /etc/iproute2/rt_dsfield. But different distribution may have
> different names. So add a new parameter '-Numeric' to only show
> the dsfield number.
> 
> This parameter is only used for tos value at present. We could enable
> this for other fields if needed in the future.

Rationale is to ensure expected output irrespective of host
configuration, especially in test scripts. Concrete example motivating
this patch was net/fib_rule_tests.sh in kernel self-tests.

> Suggested-by: Phil Sutter 
> Signed-off-by: Hangbin Liu 

Acked-by: Phil Sutter 

Thanks for doing this, Hangbin!

Phil


[iproute PATCH v2] ip-xfrm: Respect family in deleteall and list commands

2019-05-06 Thread Phil Sutter
Allow to limit 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.

Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Fix code indenting in first chunk.
---
 ip/xfrm_policy.c   | 6 +-
 ip/xfrm_state.c| 6 +-
 man/man8/ip-xfrm.8 | 6 +++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 4a63e9ab602d7..9898435055d6f 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -410,6 +410,10 @@ static int xfrm_policy_filter_match(struct 
xfrm_userpolicy_info *xpinfo,
if (!filter.use)
return 1;
 
+   if (filter.xpinfo.sel.family != AF_UNSPEC &&
+   filter.xpinfo.sel.family != xpinfo->sel.family)
+   return 0;
+
if ((xpinfo->dir^filter.xpinfo.dir)&filter.dir_mask)
return 0;
 
@@ -780,7 +784,7 @@ static int xfrm_policy_list_or_deleteall(int argc, char 
**argv, int deleteall)
char *selp = NULL;
struct rtnl_handle rth;
 
-   if (argc > 0)
+   if (argc > 0 || preferred_family != AF_UNSPEC)
filter.use = 1;
filter.xpinfo.sel.family = preferred_family;
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 9360143756016..f27270709d2fb 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -898,6 +898,10 @@ static int xfrm_state_filter_match(struct xfrm_usersa_info 
*xsinfo)
if (!filter.use)
return 1;
 
+   if (filter.xsinfo.family != AF_UNSPEC &&
+   filter.xsinfo.family != xsinfo->family)
+   return 0;
+
if (filter.id_src_mask)
if (xfrm_addr_match(&xsinfo->saddr, &filter.xsinfo.saddr,
filter.id_src_mask))
@@ -1170,7 +1174,7 @@ static int xfrm_state_list_or_deleteall(int argc, char 
**argv, int deleteall)
struct rtnl_handle rth;
bool nokeys = false;
 
-   if (argc > 0)
+   if (argc > 0 || preferred_family != AF_UNSPEC)
filter.use = 1;
filter.xsinfo.family = preferred_family;
 
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 9547808539a08..cfce1e40b7f7d 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -89,7 +89,7 @@ ip-xfrm \- transform configuration
 .IR MASK " ] ]"
 
 .ti -8
-.BR "ip xfrm state " deleteall " ["
+.BR ip " [ " -4 " | " -6 " ] " "xfrm state deleteall" " ["
 .IR ID " ]"
 .RB "[ " mode
 .IR MODE " ]"
@@ -99,7 +99,7 @@ ip-xfrm \- transform configuration
 .IR FLAG-LIST " ]"
 
 .ti -8
-.BR "ip xfrm state " list " ["
+.BR ip " [ " -4 " | " -6 " ] " "xfrm state list" " ["
 .IR ID " ]"
 .RB "[ " nokeys " ]"
 .RB "[ " mode
@@ -257,7 +257,7 @@ ip-xfrm \- transform configuration
 .IR PTYPE " ]"
 
 .ti -8
-.BR "ip xfrm policy" " { " deleteall " | " list " }"
+.BR ip " [ " -4 " | " -6 " ] " "xfrm policy" " { " deleteall " | " list " }"
 .RB "[ " nosock " ]"
 .RI "[ " SELECTOR " ]"
 .RB "[ " dir
-- 
2.21.0



[iproute PATCH] ip-xfrm: Respect family in deleteall and list commands

2019-04-29 Thread Phil Sutter
Allow to limit 'ip xfrm {state|policy} list' output to a certain address
family and to delete all states/policies by family.

Although preferred_family was already set in filters, the filter
function ignored it. To enable filtering despite the lack of other
selectors, filter.use has to be set if family is not AF_UNSPEC.

Signed-off-by: Phil Sutter 
---
 ip/xfrm_policy.c   | 6 +-
 ip/xfrm_state.c| 6 +-
 man/man8/ip-xfrm.8 | 6 +++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 4a63e9ab602d7..c6dfe836c5374 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -410,6 +410,10 @@ static int xfrm_policy_filter_match(struct 
xfrm_userpolicy_info *xpinfo,
if (!filter.use)
return 1;
 
+   if (filter.xpinfo.sel.family != AF_UNSPEC &&
+   filter.xpinfo.sel.family != xpinfo->sel.family)
+   return 0;
+
if ((xpinfo->dir^filter.xpinfo.dir)&filter.dir_mask)
return 0;
 
@@ -780,7 +784,7 @@ static int xfrm_policy_list_or_deleteall(int argc, char 
**argv, int deleteall)
char *selp = NULL;
struct rtnl_handle rth;
 
-   if (argc > 0)
+   if (argc > 0 || preferred_family != AF_UNSPEC)
filter.use = 1;
filter.xpinfo.sel.family = preferred_family;
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 9360143756016..f27270709d2fb 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -898,6 +898,10 @@ static int xfrm_state_filter_match(struct xfrm_usersa_info 
*xsinfo)
if (!filter.use)
return 1;
 
+   if (filter.xsinfo.family != AF_UNSPEC &&
+   filter.xsinfo.family != xsinfo->family)
+   return 0;
+
if (filter.id_src_mask)
if (xfrm_addr_match(&xsinfo->saddr, &filter.xsinfo.saddr,
filter.id_src_mask))
@@ -1170,7 +1174,7 @@ static int xfrm_state_list_or_deleteall(int argc, char 
**argv, int deleteall)
struct rtnl_handle rth;
bool nokeys = false;
 
-   if (argc > 0)
+   if (argc > 0 || preferred_family != AF_UNSPEC)
filter.use = 1;
filter.xsinfo.family = preferred_family;
 
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 9547808539a08..cfce1e40b7f7d 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -89,7 +89,7 @@ ip-xfrm \- transform configuration
 .IR MASK " ] ]"
 
 .ti -8
-.BR "ip xfrm state " deleteall " ["
+.BR ip " [ " -4 " | " -6 " ] " "xfrm state deleteall" " ["
 .IR ID " ]"
 .RB "[ " mode
 .IR MODE " ]"
@@ -99,7 +99,7 @@ ip-xfrm \- transform configuration
 .IR FLAG-LIST " ]"
 
 .ti -8
-.BR "ip xfrm state " list " ["
+.BR ip " [ " -4 " | " -6 " ] " "xfrm state list" " ["
 .IR ID " ]"
 .RB "[ " nokeys " ]"
 .RB "[ " mode
@@ -257,7 +257,7 @@ ip-xfrm \- transform configuration
 .IR PTYPE " ]"
 
 .ti -8
-.BR "ip xfrm policy" " { " deleteall " | " list " }"
+.BR ip " [ " -4 " | " -6 " ] " "xfrm policy" " { " deleteall " | " list " }"
 .RB "[ " nosock " ]"
 .RI "[ " SELECTOR " ]"
 .RB "[ " dir
-- 
2.21.0



[iproute PATCH] ip-address: Use correct max attribute value in print_vf_stats64()

2019-02-21 Thread Phil Sutter
IFLA_VF_MAX is larger than the highest valid index in vf array.

Fixes: a1b99717c7cd7 ("Add displaying VF traffic statistics")
Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index bc30d326ca0a3..139afe9d572e6 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -547,7 +547,7 @@ static void print_vf_stats64(FILE *fp, struct rtattr 
*vfstats)
return;
}
 
-   parse_rtattr_nested(vf, IFLA_VF_MAX, vfstats);
+   parse_rtattr_nested(vf, IFLA_VF_STATS_MAX, vfstats);
 
if (is_json_context()) {
open_json_object("stats");
-- 
2.20.1



Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()

2019-02-14 Thread Phil Sutter
On Thu, Feb 14, 2019 at 10:34:06AM -0700, David Ahern wrote:
> On 2/14/19 6:49 AM, Michal Kubecek wrote:
> > On Tue, Feb 12, 2019 at 07:04:17PM -0700, David Ahern wrote:
> >>
> >> Do we know of any single message sizes > 32k? 2d34851cd341 cites
> >> increasing VF's but at some point there is a limit. If not, the whole
> >> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> >> each recvmsg.
> > 
> > IFLA_VF_LIST is by far the biggest thing I have seen so far. I don't
> > remember exact numbers but the issue with 32KB buffer (for the whole
> > RTM_NELINK message) was encountered by some of our customers with NICs
> > having 120 or 128 VFs.
> > 
> > There is a bigger issue with IFLA_VFINFO_LIST, though, as it's an
> > attribute so that netlink limits its size to 64 KB. IIRC with current
> > size of IFLA_VF_INFO this would be reached with 270-280 VFs (I'm sure
> > the number was higer than 256 but not too much higher.)

Using netdevsim, 'ip link show' becomes unusable after enabling more
than 244 VFs. I guess it depends on how much info per VF is available.

> > This would mean unless we let something else grow too much, the whole
> > message shouldn't get much bigger than 64 KB. And if we can find some
> > other solution (e.g. passing VF information in separate messages if
> > client declares support), even 32 KB would be more than enough.
> 
> That's what I was asking, thanks. So 32kB today is sufficient, 64kB has
> future buffer. So this whole PEEK and allocate the message size is
> overkill. It could just as easily been bumped from 32kB to 64kB in the
> original patch and been good for a while.

Yes, I think the real problem is how VF-related messages are structured
currently.

Cheers, Phil


Re: [PATCH iproute2 net-next v2 3/4] ss: Buffer raw fields first, then render them as a table

2019-02-13 Thread Phil Sutter
Hi Stephen,

On Wed, Feb 13, 2019 at 01:55:34PM -0800, Stephen Hemminger wrote:
> On Wed, 13 Feb 2019 22:17:16 +0100
> Stefano Brivio  wrote:
> 
> > On Wed, 13 Feb 2019 09:31:03 -0800
> > Eric Dumazet  wrote:
> > 
> > > On 02/13/2019 12:37 AM, Stefano Brivio wrote:  
> > > > On Tue, 12 Feb 2019 16:42:04 -0800
> > > > Eric Dumazet  wrote:
> > > > 
> > > >> I do not get it.
> > > >>
> > > >> "ss -emoi " uses almost 1KB per socket.
> > > >>
> > > >> 10,000,000 sockets -> we need about 10GB of memory  ???
> > > >>
> > > >> This is a serious regression.
> > > > 
> > > > I guess this is rather subjective: the worst case I considered back then
> > > > was the output of 'ss -tei0' (less than 500 bytes) for one million
> > > > sockets, which gives 500M of memory, which should in turn be fine on a
> > > > machine handling one million sockets.
> > > > 
> > > > Now, if 'ss -emoi' on 10 million sockets is an actual use case (out of
> > > > curiosity: how are you going to process that output? Would JSON help?),
> > > > I see two easy options to solve this:
> > > 
> > > 
> > > ss -temoi | parser (written in shell or awk or whatever...)
> > > 
> > > This is a use case, I just got bitten because using ss command
> > > actually OOM my container, while trying to debug a busy GFE.
> > > 
> > > The host itself can have 10,000,000 TCP sockets, but usually sysadmin 
> > > shells
> > > run in a container with no more than 500 MB available. 
> > > 
> > > Otherwise, it would be too easy for a buggy program to OOM the whole 
> > > machine
> > > and have angry customers.
> > >   
> > > > 
> > > > 1. flush the output every time we reach a given buffer size (1M
> > > >perhaps). This might make the resulting blocks slightly unaligned,
> > > >with occasional loss of readability on lines occurring every 1k to
> > > >10k sockets approximately, even though after 1k sockets column sizes
> > > >won't change much (it looks anyway better than the original), and I
> > > >don't expect anybody to actually scroll that output
> > > > 
> > > > 2. add a switch for unbuffered output, but then you need to remember to
> > > >pass it manually, and the whole output would be as bad as the
> > > >original in case you need the switch.
> > > > 
> > > > I'd rather go with 1., it's easy to implement (we already have partial
> > > > flushing with '--events') and it looks like a good compromise on
> > > > usability. Thoughts?
> > > > 
> > > 
> > > 1 seems fine, but a switch for 'please do not try to format' would be 
> > > fine.
> > > 
> > > I wonder why we try to 'format' when stdout is a pipe or a regular file . 
> > >  
> > 
> > On a second thought: what about | less, or | grep [ports],
> > or > readable.log? I guess those might also be rather common use cases,
> > what do you think?
> > 
> > I'm tempted to skip this for the moment and just go with option 1.
> > 
> 
> What I would favor:
>   * use big enough columns that for the common case everything lines up 
> fine
>   * if column is to wide just print that element wider (which is what 
> print %Ns does)

This is pretty much the situation Stefano attempted to improve, minus
scaling the columns to max terminal width. ss output formatting being
quirky and unreadable with either small or large terminals was the
number one reason I heard so far why people prefer netstat.

> and
>   * add json output for programs that want to parse
>   * use print_uint etc for that

For Eric's use-case, skipping any buffering and tabular output if stdout
is not a TTY suffices. In fact, iproute2 does this already for colored
output (see check_enable_color() for reference).

Adding JSON output support everywhere is a nice feature when it comes to
scripting, but it won't help console users. Unless you expect CLI
frontends to come turning that JSON into human-readable output.

IMHO, JSON output wouldn't even help in this case - unless Eric indeed
prefers to write/use a JSON parser for his analysis instead of something
along 'ss | grep'.

> The buffering patch (in iproute2-next) can/will be reverted.

It's not fair to claim that despite Stefano's commitment to fix the
reported issues. His ss output rewrite is there since v4.15.0 and
according to git history it needed only two fixes so far. I've had
one-liners which required more follow-ups than that! Also, we're still
discovering issues introduced by all the jsonify patches. Allowing for
people to get things right not the first time but after a few tries is
important. If you want to revert something, start with features which
have a fundamental design issue in the exact situation they tried to
improve, like the MSG_PEEK | MSG_TRUNC thing Hangbin and me wrote.

Thanks, Phil


Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()

2019-02-13 Thread Phil Sutter
Hi Eric,

On Tue, Feb 12, 2019 at 05:58:41PM -0800, Eric Dumazet wrote:
> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.

Wouldn't it be better if the kernel recognized MSG_TRUNC and allocated a
buffer large enough to hold the full message in that case? I have no
idea how hard that would be to implement, but calling recvmsg() with
MSG_TRUNC set and not getting the full message length in return is not
quite what one expects after reading recvmsg(2).

Cheers, Phil


Re: [iproute PATCH] man: ip-link: Describe promisc mode

2019-02-12 Thread Phil Sutter
Hi,

On Mon, Feb 11, 2019 at 11:36:10AM -0800, Stephen Hemminger wrote:
> On Mon, 11 Feb 2019 10:17:06 +0100
> Phil Sutter  wrote:
[...]
> > diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
> > index 73d37c190fffa..5c327f01b6b45 100644
> > --- a/man/man8/ip-link.8.in
> > +++ b/man/man8/ip-link.8.in
> > @@ -1780,6 +1780,14 @@ flag on the device. Indicates that address can 
> > change when interface goes down (
> >  .B NOT
> >  used by the Linux).
> >  
> > +.TP
> > +.BR "promisc on " or " promisc off"
> > +change the
> > +.B PROMISC
> > +flag on the device. This requests receipt of all packets arriving at the 
> > NIC
> > +irrespective of their destination MAC address. It is typically used by 
> > traffic
> > +sniffers and also set by Linux bridges for their ports.
> 
> This added sentence is confusing. The Linux bridge enables it by default,
> and if a sniffer wants to enable it then it is best done from the application.
> In either case the user should not need to directly set this through ip 
> commands.

Well, "used by traffic sniffers" does not imply they don't set it by
themselves (at least not in the German accent I'm reading it :). And
there probably are ones that don't.

> Yes, there are a lots of incorrect web pages out there that say you need to
> set an interface into promiscious mode (with ifconfig) before adding it to a 
> bridge.
> That might have been true 20 years ago, but hasn't been needed since Linux 2.4

In this case ip-link.8 would become a resource pointing out that bridges
do that by themselves nowadays.

> Bottom line, adding this to the documentation is not going to be helpful.

OK, so I'll send a v2 with that last sentence removed?

Thanks, Phil


[iproute PATCH] man: ip-link: Describe promisc mode

2019-02-11 Thread Phil Sutter
Briefly explain what it is and where it's typically used.

Signed-off-by: Phil Sutter 
---
 man/man8/ip-link.8.in | 8 
 1 file changed, 8 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 73d37c190fffa..5c327f01b6b45 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1780,6 +1780,14 @@ flag on the device. Indicates that address can change 
when interface goes down (
 .B NOT
 used by the Linux).
 
+.TP
+.BR "promisc on " or " promisc off"
+change the
+.B PROMISC
+flag on the device. This requests receipt of all packets arriving at the NIC
+irrespective of their destination MAC address. It is typically used by traffic
+sniffers and also set by Linux bridges for their ports.
+
 .TP
 .BI name " NAME"
 change the name of the device. This operation is not
-- 
2.20.1



Re: How to set promiscuous mode

2019-02-11 Thread Phil Sutter
Hi,

On Sat, Feb 09, 2019 at 03:22:33PM -0500, William Flanagan wrote:
> Working with iproute2 for a task with Wireshark.  I don't see the 
> command in 'ip' to put an Ethernet port into promiscuous mode.  A reply 
> from the openSuse forum (below) tells me how.
> 
> I'm wondering if this should be in the 'MAN ip' page.

Please have a look at ip-link(8), its synopsis section at least lists
'promisc' option for 'ip link set' command. I'll follow-up with a patch
adding a little description, though.

Cheers, Phil


Re: [iproute PATCH] ip-link: Fix listing of alias interfaces

2019-02-08 Thread Phil Sutter
Hi Stephen,

On Fri, Feb 08, 2019 at 09:56:47AM -0800, Stephen Hemminger wrote:
> On Thu,  7 Feb 2019 14:05:27 +0100
> Phil Sutter  wrote:
> 
> > Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
> > links in the old alias interface notation:
> > 
> > | % ip link show eth0:1
> > | RTNETLINK answers: No such device
> > | Cannot send link get request: No such device
> > 
> > Prior to the above commit, link lookup was performed via ifindex
> > returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
> > which on kernel side causes the colon-suffix to be dropped before doing
> > the interface lookup. Netlink API though doesn't care about that at all.
> > To keep things backward compatible, mimick ioctl API behaviour and drop
> > the colon-suffix prior to sending the RTM_GETLINK request.
> > 
> > Fixes: 50b9950dd9011 ("link dump filter")
> > Signed-off-by: Phil Sutter 
> 
> It is a regression, but the original code was kind of broken.
> iproute2 doesn't need or want the old style alias interface notation.

Thanks for your clarification!

Cheers, Phil


Re: [iproute PATCH] ip-link: Fix listing of alias interfaces

2019-02-08 Thread Phil Sutter
On Thu, Feb 07, 2019 at 04:24:36PM -0800, Stephen Hemminger wrote:
> On Thu,  7 Feb 2019 14:05:27 +0100
> Phil Sutter  wrote:
> 
> > Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
> > links in the old alias interface notation:
> > 
> > | % ip link show eth0:1
> > | RTNETLINK answers: No such device
> > | Cannot send link get request: No such device
> > 
> > Prior to the above commit, link lookup was performed via ifindex
> > returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
> > which on kernel side causes the colon-suffix to be dropped before doing
> > the interface lookup. Netlink API though doesn't care about that at all.
> > To keep things backward compatible, mimick ioctl API behaviour and drop
> > the colon-suffix prior to sending the RTM_GETLINK request.
> > 
> > Fixes: 50b9950dd9011 ("link dump filter")
> > Signed-off-by: Phil Sutter 
> 
> What about mistaken usage where the text after the colon is not a number,
> or has additional colon?

That's completely ignored in ioctl-case as well. See dev_ioctl() in
kernel sources:

| colon = strchr(ifr->ifr_name, ':');
| if (colon)
| *colon = 0;

If you pass 'group 0' to link show command, ioctl code path is taken. It
allows (and drops) arbitrary input after the colon (as long as the total
name doesn't exceed 15 characters).

Cheers, Phil


Re: [net-next PATCH] net: rtnetlink: Support alias interfaces with RTM_GETLINK

2019-02-07 Thread Phil Sutter
On Thu, Feb 07, 2019 at 02:21:56PM +0100, Michal Kubecek wrote:
> On Thu, Feb 07, 2019 at 02:02:15PM +0100, Phil Sutter wrote:
> > On Thu, Feb 07, 2019 at 01:39:39PM +0100, Michal Kubecek wrote:
> > 
> > > But I still don't think it would be a good idea. It's bad enough that
> > > (as I just learned to my surprise) "ip link del dummy1:0" deletes dummy1
> > > without any complaint because ip uses SIOCGIFINDEX ioctl for ifindex
> > > lookup.
> > 
> > I'm struggling a bit with all this. The original problem is iproute2
> > commit 50b9950dd9011 ("link dump filter") which changed 'ip link show'
> > to not use if_indextoname() when given just an interface name. So lookup
> > happens by name (via RTM_GETLINK) and consequently 'ip link show eth0:1'
> > doesn't give link stats of eth0 anymore.
> 
> I would rather consider it a bug that it ever did. It's quite harmless
> with "show" but with "set" or "delete", the effect can be quite
> disastrous.

Yes, you're right. And unless maintainers care about this compatibility,
I guess these monsters will slowly disappear as things progress towards
netlink.

> We want to preserve backward compatibility in general but iproute2
> commit 50b9950dd901 is 4.5 years old and nobody seems to have complained
> about the change in behaviour until now.

Well, enterprise distributions deliberately try to prevent those changes
from hitting their customers. And (perfect match), these users are the
least tolerating ones. :)

> > Given that iproute2 is supposed to be backwards compatible, the only
> > valid option I see is to make sure netlink API calls like the above
> > behave identical to the ioctl ones they replace. Which means allowing
> > for 'ip link show eth0:42' even if there's no address with that label
> > assigned to eth0 as well as your example above.
> 
> One reason why I don't like this idea is that iproute2 is not the only
> user of rtnetlink interface. There is wicked and glibc for sure, most
> likely also NetworkManager (don't remember) and systemd-networkd (didn't
> check) and certainly many others I never heard of. Changing the logic
> in kernel rtnetlink implementation would affect all of them.
> 
> If we want to restore the old behaviour of ip (which I'm not convinced
> of), it would make more sense to me to strip the :* suffix in iproute2.

I just sent a patch, let's see what userspace has to say about it.

Thanks, Phil


[iproute PATCH] ip-link: Fix listing of alias interfaces

2019-02-07 Thread Phil Sutter
Commit 50b9950dd9011 ("link dump filter") accidentally broke listing of
links in the old alias interface notation:

| % ip link show eth0:1
| RTNETLINK answers: No such device
| Cannot send link get request: No such device

Prior to the above commit, link lookup was performed via ifindex
returned by if_nametoindex(). The latter uses SIOCGIFINDEX ioctl call
which on kernel side causes the colon-suffix to be dropped before doing
the interface lookup. Netlink API though doesn't care about that at all.
To keep things backward compatible, mimick ioctl API behaviour and drop
the colon-suffix prior to sending the RTM_GETLINK request.

Fixes: 50b9950dd9011 ("link dump filter")
Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 2bc33f3a3b3f2..bc30d326ca0a3 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1974,6 +1974,7 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
 * the link device
 */
if (filter_dev && filter.group == -1 && do_link == 1) {
+   *strchrnul(filter_dev, ':') = '\0';
if (iplink_get(filter_dev, RTEXT_FILTER_VF) < 0) {
perror("Cannot send link get request");
delete_json_obj();
-- 
2.20.1



Re: [net-next PATCH] net: rtnetlink: Support alias interfaces with RTM_GETLINK

2019-02-07 Thread Phil Sutter
Hi,

On Thu, Feb 07, 2019 at 01:39:39PM +0100, Michal Kubecek wrote:
> On Thu, Feb 07, 2019 at 01:27:28PM +0100, Phil Sutter wrote:
> > On Thu, Feb 07, 2019 at 11:24:38AM +0100, Phil Sutter wrote:
> > > Align interface name handling regarding alias interfaces in
> > > rtnl_getlink() with dev_ioctl() treating SIOCGIFINDEX ioctl calls. The
> > > latter function strips any colon suffix before doing the interface
> > > lookup, do the same for RTM_GETLINK requests.
> > 
> > After a second thought, I'll self-NACK this one: Given that netlink API
> > is completely unrelated to ioctl one, there is no inherent need to do
> > things the same way. Looking at RTM_NEWLINK handler, it seems possible
> > to create interface names containing a colon via netlink.
> 
> Not since commit a4176a939186 ("net: reject creation of netdev names
> with colons") which disallowed using such names in general.

In my typical afterthought I tried 'ip link add d0:1 type dummy' and was
surprised to get EINVAL from kernel side. Just discovered that line in
dev_valid_name(), too.

> But I still don't think it would be a good idea. It's bad enough that
> (as I just learned to my surprise) "ip link del dummy1:0" deletes dummy1
> without any complaint because ip uses SIOCGIFINDEX ioctl for ifindex
> lookup.

I'm struggling a bit with all this. The original problem is iproute2
commit 50b9950dd9011 ("link dump filter") which changed 'ip link show'
to not use if_indextoname() when given just an interface name. So lookup
happens by name (via RTM_GETLINK) and consequently 'ip link show eth0:1'
doesn't give link stats of eth0 anymore.

Given that iproute2 is supposed to be backwards compatible, the only
valid option I see is to make sure netlink API calls like the above
behave identical to the ioctl ones they replace. Which means allowing
for 'ip link show eth0:42' even if there's no address with that label
assigned to eth0 as well as your example above.

Cheers, Phil


Re: [net-next PATCH] net: rtnetlink: Support alias interfaces with RTM_GETLINK

2019-02-07 Thread Phil Sutter
On Thu, Feb 07, 2019 at 11:24:38AM +0100, Phil Sutter wrote:
> Align interface name handling regarding alias interfaces in
> rtnl_getlink() with dev_ioctl() treating SIOCGIFINDEX ioctl calls. The
> latter function strips any colon suffix before doing the interface
> lookup, do the same for RTM_GETLINK requests.

After a second thought, I'll self-NACK this one: Given that netlink API
is completely unrelated to ioctl one, there is no inherent need to do
things the same way. Looking at RTM_NEWLINK handler, it seems possible
to create interface names containing a colon via netlink. Of course
those interfaces are not accessible via ioctl() then, but so are
secondary interface addresses which don't have a properly chosen label.

Sorry for the noise, Phil


[net-next PATCH] net: rtnetlink: Support alias interfaces with RTM_GETLINK

2019-02-07 Thread Phil Sutter
Align interface name handling regarding alias interfaces in
rtnl_getlink() with dev_ioctl() treating SIOCGIFINDEX ioctl calls. The
latter function strips any colon suffix before doing the interface
lookup, do the same for RTM_GETLINK requests.

Signed-off-by: Phil Sutter 
---
 net/core/rtnetlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a51cab95ba64c..aaee4df0ff00c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3312,8 +3312,10 @@ static int rtnl_getlink(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return PTR_ERR(tgt_net);
}
 
-   if (tb[IFLA_IFNAME])
+   if (tb[IFLA_IFNAME]) {
nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+   *strchrnul(ifname, ':') = '\0';
+   }
 
if (tb[IFLA_EXT_MASK])
ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
-- 
2.20.1



Re: [PATCH] test_rhashtable: remove semaphore usage

2018-12-10 Thread Phil Sutter
Hi,

On Tue, Dec 11, 2018 at 01:45:52PM +0800, Herbert Xu wrote:
> On Mon, Dec 10, 2018 at 10:17:20PM +0100, Arnd Bergmann wrote:
> > This is one of only two files that initialize a semaphore to a negative
> > value. We don't really need the two semaphores here at all, but can do
> > the same thing in more conventional and more effient way, by using a
> > single waitqueue and an atomic thread counter.
> > 
> > This gets us a little bit closer to eliminating classic semaphores from
> > the kernel. It also fixes a corner case where we fail to continue after
> > one of the threads fails to start up.
> > 
> > An alternative would be to use a split kthread_create()+wake_up_process()
> > and completely eliminate the separate synchronization.
> > 
> > Signed-off-by: Arnd Bergmann 
> > ---
> > This is part of a longer, untested, series to remove semaphores
> > from the kernel, please review as such before applying.
> > ---
> >  lib/test_rhashtable.c | 28 
> >  1 file changed, 16 insertions(+), 12 deletions(-)
> 
> This was created by Phil Sutter so I am adding him to the cc list.

Thanks, I would have missed it otherwise. Just a minor nit:

> > diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
> > index 18de5ff1255b..12bdea4f6c20 100644
> > --- a/lib/test_rhashtable.c
> > +++ b/lib/test_rhashtable.c
[...]
> > @@ -635,8 +636,9 @@ static int threadfunc(void *data)
> > int i, step, err = 0, insert_retries = 0;
> > struct thread_data *tdata = data;
> >  
> > -   up(&prestart_sem);
> > -   if (down_interruptible(&startup_sem))
> > +   if (atomic_dec_and_test(&startup_count))
> > +   wake_up(&startup_wait);
> > +   if (wait_event_interruptible(startup_wait, atomic_read(&startup_count) 
> > == -1))
> > pr_err("  thread[%d]: down_interruptible failed\n", tdata->id);

The error message should probably be adjusted as well. Apart from that:

Acked-by: Phil Sutter 

Thanks, Phil


Re: [PATCH RFC 15/15] net: replace **** with a hug

2018-12-03 Thread Phil Sutter
On Fri, Nov 30, 2018 at 11:27:24AM -0800, Jarkko Sakkinen wrote:
> In order to comply with the CoC, replace  with a hug.

This is incorrect: The patch doesn't replace '', but 'fuck' instead.

> - /* Fuck, we are miserable poor guys... */
> + /* Hug, we are miserable poor guys... */

I fail to see the benefit from doing that. If I wrote: "Hug you you
hugging hug", you would still find it offensive as long as you know what
"hug" is supposed to translate into. And if you don't, you'll soon find
out.

A more constructive approach to replacing "fuck this and that" than
simply applying 's/fuck/whatever/' would be to provide details what
exactly is so bad about this and that. Of course that's not possible
without considerable effort, unlike your approach which doesn't alter
what is written but merely which words are used for it.

Cheers, Phil


Re: [iproute PATCH] man: ip-route.8: Fix ENCAP references in synopsis

2018-11-30 Thread Phil Sutter
Hi Simon,

On Fri, Nov 30, 2018 at 03:39:05PM +0100, Simon Horman wrote:
> On Wed, Nov 28, 2018 at 12:12:32PM +0100, Phil Sutter wrote:
> > The different encapsulation types are described in ENCAP_*
> > non-terminals, but ENCAP definition lists them without the ENCAP_
> > prefix. Fix this for consistency.
> > 
> > Signed-off-by: Phil Sutter 
> > ---
> >  man/man8/ip-route.8.in | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
> > index 11dd4ca7abf68..26dfe0b06c86b 100644
> > --- a/man/man8/ip-route.8.in
> > +++ b/man/man8/ip-route.8.in
> > @@ -187,7 +187,8 @@ throw " | " unreachable " | " prohibit " | " blackhole 
> > " | " nat " ]"
> >  
> >  .ti -8
> >  .IR ENCAP " := [ "
> > -.IR MPLS " | " IP " | " BPF " | " SEG6 " | " SEG6LOCAL " ] "
> > +.IR ENCAP_MPLS " | " ENCAP_IP " | " ENCAP_BPF " | "
> > +.IR ENCAP_SEG6 " | " ENCAP_SEG6LOCAL " ] "
> >  
> >  .ti -8
> >  .IR ENCAP_MPLS " := "
> 
> This looks good but do we have another inconsistency with regards
> to ENCAP and ENCAPTYPE ENCAPHDR (further down).
> 
> Glancing over the file the following also seem inconsistent:
> 
> * NH / NEXTHOP
> * SCOPE / SCOPE_VAL

Well, strictly speaking yes. If I want to know more about NH listed in
synopsis I can't find it further down. Though its description in
'nexthop' is incomplete, as well.

My personal focus is that non-terminals in synopsis are defined in there
as well, otherwise I can't parse and the thrown exception is hard to
clean off my screen. ;)

My assumption (actually what *I* do) is to search for terminals in order
to get more info. For instance, I would search for 'nexthop' or 'encap'
not 'NH'.

I guess the broader question is about the scope of non-terminals in
synopsis and description sections - I don't necessarily consider them
related.

Of course feel free to fix what you don't like, I'll review and (N)ACK
if you put me in Cc. :)

Cheers, Phil


[iproute PATCH] ssfilter: Fix for inverted last expression

2018-11-29 Thread Phil Sutter
When fixing for shift/reduce conflicts, possibility to invert the last
expression by prefixing with '!' or 'not' was accidentally removed.

Fix this by allowing for expr to be an inverted expr so that any
reference to it in exprlist accepts the inverted prefix.

Reported-by: Eric Dumazet 
Fixes: b2038cc0b2403 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter 
---
 misc/ssfilter.y | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/misc/ssfilter.y b/misc/ssfilter.y
index 0413dddaa7584..a901ae753a284 100644
--- a/misc/ssfilter.y
+++ b/misc/ssfilter.y
@@ -54,10 +54,6 @@ null:   /* NOTHING */ { $$ = NULL; }
 ;
 
 exprlist: expr
-| '!' expr
-{
-$$ = alloc_node(SSF_NOT, $2);
-}
 | exprlist '|' expr
 {
 $$ = alloc_node(SSF_OR, $1);
@@ -83,6 +79,10 @@ expr:'(' exprlist ')'
{
$$ = $2;
}
+   | '!' expr
+   {
+   $$ = alloc_node(SSF_NOT, $2);
+   }
| DCOND eq HOSTCOND
 {
$$ = alloc_node(SSF_DCOND, $3);
-- 
2.19.0



[iproute PATCH] man: ip-route.8: Fix ENCAP references in synopsis

2018-11-28 Thread Phil Sutter
The different encapsulation types are described in ENCAP_*
non-terminals, but ENCAP definition lists them without the ENCAP_
prefix. Fix this for consistency.

Signed-off-by: Phil Sutter 
---
 man/man8/ip-route.8.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index 11dd4ca7abf68..26dfe0b06c86b 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -187,7 +187,8 @@ throw " | " unreachable " | " prohibit " | " blackhole " | 
" nat " ]"
 
 .ti -8
 .IR ENCAP " := [ "
-.IR MPLS " | " IP " | " BPF " | " SEG6 " | " SEG6LOCAL " ] "
+.IR ENCAP_MPLS " | " ENCAP_IP " | " ENCAP_BPF " | "
+.IR ENCAP_SEG6 " | " ENCAP_SEG6LOCAL " ] "
 
 .ti -8
 .IR ENCAP_MPLS " := "
-- 
2.19.0



Re: iproute2 compile and linking errors on Fedora 19

2018-11-27 Thread Phil Sutter
Hi,

On Mon, Nov 26, 2018 at 04:06:27PM -0800, Stephen Hemminger wrote:
> On Tue, 31 Oct 2017 16:28:20 -0700
> Cong Wang  wrote:
> 
> > On Tue, Oct 31, 2017 at 2:10 PM, Stephen Hemminger
> >  wrote:
> > >
> > > IPPROTO_MH comes from include/uapi/linux/in6.h
> > > Maybe it is trying to use old kernel headers from libc.  

That header defines it only if __UAPI_DEF_IPPROTO_V6 is defined to a
non-zero value. When compiling upstream iproute2 on my system, this is
not the case. I also see why:

- ip/xfrm_policy.c includes 
  - netdb.h includes 
- netinet/in.h defines _NETINET_IN_H
- ip/xfrm_policy.c then includes "xfrm.h"
  - ip/xfrm.h includes 
- linux/xfrm.h includes 
  - linux/in6.h includes 
- linux/libc-compat.h defines __UAPI_DEF_IPPROTO_V6 to 0 if
  _NETINET_IN_H is defined.

Note that I didn't follow all includes so the above is not necessarily
what exactly happens. But either way, ip/xfrm_policy.c doesn't get
IPPROTO_MH define from include/uapi/linux/in6.h.

Cheers, Phil


[iproute PATCH] man: rdma: Add reference to rdma-resource.8

2018-11-26 Thread Phil Sutter
All rdma-related man pages list each other in SEE ALSO section, only
rdma-resource.8 is missing. Add it for the sake of consistency.

Signed-off-by: Phil Sutter 
---
 man/man8/rdma-dev.8  | 1 +
 man/man8/rdma-link.8 | 1 +
 man/man8/rdma.8  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/man/man8/rdma-dev.8 b/man/man8/rdma-dev.8
index 461681b60f54d..b7abfe1088c2f 100644
--- a/man/man8/rdma-dev.8
+++ b/man/man8/rdma-dev.8
@@ -49,6 +49,7 @@ Shows the state of specified RDMA device.
 .SH SEE ALSO
 .BR rdma (8),
 .BR rdma-link (8),
+.BR rdma-resource (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/rdma-link.8 b/man/man8/rdma-link.8
index 97dd8bb994d24..bddf34746e8b2 100644
--- a/man/man8/rdma-link.8
+++ b/man/man8/rdma-link.8
@@ -49,6 +49,7 @@ Shows the state of specified rdma link.
 .SH SEE ALSO
 .BR rdma (8),
 .BR rdma-dev (8),
+.BR rdma-resource (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/rdma.8 b/man/man8/rdma.8
index 12aa149bbaf3e..b2b5aef866ab0 100644
--- a/man/man8/rdma.8
+++ b/man/man8/rdma.8
@@ -106,6 +106,7 @@ Exit status is 0 if command was successful or a positive 
integer upon failure.
 .SH SEE ALSO
 .BR rdma-dev (8),
 .BR rdma-link (8),
+.BR rdma-resource (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.19.0



[iproute PATCH v2] ip-address: Fix filtering by negated address flags

2018-11-15 Thread Phil Sutter
When disabling a flag, one needs to AND with the inverse not the flag
itself. Otherwise specifying for instance 'home -nodad' will effectively
clear the flags variable.

While being at it, simplify the code a bit by merging common parts of
negated and non-negated case branches. Also allow for the "special
cases" to be inverted, too.

Fixes: f73ac674d0abf ("ip: change flag names to an array")
Signed-off-by: Phil Sutter 
---
Changes since v1:
- Improve "special cases" handling per Stephen's suggestion.
- Update man page accordingly.
---
 ip/ipaddress.c   | 43 +++-
 man/man8/ip-address.8.in | 39 +++-
 2 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index cd8cc76a3473f..7212f082b31b4 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1211,38 +1211,35 @@ static void print_ifa_flags(FILE *fp, const struct 
ifaddrmsg *ifa,
 
 static int get_filter(const char *arg)
 {
+   bool inv = false;
unsigned int i;
 
+   if (arg[0] == '-') {
+   inv = true;
+   arg++;
+   }
+
/* Special cases */
if (strcmp(arg, "dynamic") == 0) {
-   filter.flags &= ~IFA_F_PERMANENT;
-   filter.flagmask |= IFA_F_PERMANENT;
+   inv = !inv;
+   arg = "permanent";
} else if (strcmp(arg, "primary") == 0) {
-   filter.flags &= ~IFA_F_SECONDARY;
-   filter.flagmask |= IFA_F_SECONDARY;
-   } else if (*arg == '-') {
-   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
-   if (strcmp(arg + 1, ifa_flag_names[i].name))
-   continue;
+   inv = !inv;
+   arg = "secondary";
+   }
 
-   filter.flags &= ifa_flag_names[i].value;
-   filter.flagmask |= ifa_flag_names[i].value;
-   return 0;
-   }
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
+   if (strcmp(arg, ifa_flag_names[i].name))
+   continue;
 
-   return -1;
-   } else {
-   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
-   if (strcmp(arg, ifa_flag_names[i].name))
-   continue;
+   if (inv)
+   filter.flags &= ~ifa_flag_names[i].value;
+   else
filter.flags |= ifa_flag_names[i].value;
-   filter.flagmask |= ifa_flag_names[i].value;
-   return 0;
-   }
-   return -1;
+   filter.flagmask |= ifa_flag_names[i].value;
+   return 0;
}
-
-   return 0;
+   return -1;
 }
 
 static int ifa_label_match_rta(int ifindex, const struct rtattr *rta)
diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
index c3861b3725ccb..2a553190a37e9 100644
--- a/man/man8/ip-address.8.in
+++ b/man/man8/ip-address.8.in
@@ -76,10 +76,15 @@ ip-address \- protocol address management
 .IR FLAG-LIST " := [ "  FLAG-LIST " ] " FLAG
 
 .ti -8
-.IR FLAG " := "
-.RB "[ " permanent " | " dynamic " | " secondary " | " primary " |"
-.RB [ - ] tentative " | [" - ] deprecated " | [" - ] dadfailed " |"
-.BR temporary " |"
+.IR FLAG " := ["
+.RB [ - ] permanent " |"
+.RB [ - ] dynamic " |"
+.RB [ - ] secondary " |"
+.RB [ - ] primary " |"
+.RB [ - ] tentative " |"
+.RB [ - ] deprecated " |"
+.RB [ - ] dadfailed " |"
+.RB [ - ] temporary " |"
 .IR CONFFLAG-LIST " ]"
 
 .ti -8
@@ -334,7 +339,9 @@ only list running interfaces.
 .BR dynamic " and " permanent
 (IPv6 only) only list addresses installed due to stateless
 address configuration or only list permanent (not dynamic)
-addresses.
+addresses. These two flags are inverses of each other, so
+.BR -dynamic " is equal to " permanent " and "
+.BR -permanent " is equal to " dynamic .
 
 .TP
 .B tentative
@@ -365,12 +372,26 @@ address detection.
 address detection.
 
 .TP
-.B temporary
-(IPv6 only) only list temporary addresses.
+.BR temporary " or " secondary
+List temporary IPv6 or secondary IPv4 addresses only. The Linux kernel shares a
+single bit for those, so they are actually aliases for each other although the
+meaning differs depending on address family.
 
 .TP
-.BR primary " and " secondary
-only list primary (or secondary) addresses.
+.BR -temporary " or " -secondary
+These flags are aliases for
+.BR primary .
+
+.TP
+.B primary
+List only primary addresses, in IPv6 exclude temporary ones. This flag is the
+inverse of
+.BR temporary " and " secondary .
+
+.TP
+.B -primary
+This is an alias for
+.BR temporary " or " secondary .
 
 .SS ip address flush - flush protocol addresses
 This command flushes the protocol addresses selected by some criteria.
-- 
2.19.0



Re: [iproute PATCH] ip-address: Fix filtering by negated address flags

2018-11-15 Thread Phil Sutter
Hi Stephen,

On Wed, Nov 14, 2018 at 11:29:03AM -0800, Stephen Hemminger wrote:
[...]
> I was thinking something like this which simplifies the logic.
> 
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index cd8cc76a3473..3f1510383071 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -1212,37 +1212,34 @@ static void print_ifa_flags(FILE *fp, const struct 
> ifaddrmsg *ifa,
>  static int get_filter(const char *arg)
>  {
>   unsigned int i;
> + bool inv = false;
>  
>   /* Special cases */
>   if (strcmp(arg, "dynamic") == 0) {
> - filter.flags &= ~IFA_F_PERMANENT;
> - filter.flagmask |= IFA_F_PERMANENT;
> + arg = "-permanent";

I like this idea, also because it's much easier to get how 'dynamic' and
'primary' are special. I'll respin then.

Thanks, Phil


Re: [iproute PATCH] ip-address: Fix filtering by negated address flags

2018-11-14 Thread Phil Sutter
Hi Stephen,

On Tue, Nov 13, 2018 at 02:47:59PM -0800, Stephen Hemminger wrote:
> On Tue, 13 Nov 2018 16:12:01 +0100
> Phil Sutter  wrote:
> 
> > +   if (arg[0] == '-') {
> > +   inv = true;
> > +   arg++;
> > +   }
> The inverse logic needs to be moved into the loop handling filter names.
> 
> Otherwise, you get weirdness like "-dynamic" being accepted and not
> doing what was expected.

I intentionally moved it there to allow for '-dynamic' and '-primary'
as well. IMO this is consistent: 'dynamic' is the inverse of 'permanent'
and 'primary' the inverse of 'secondary' but currently only '-permanent'
and '-secondary' are allowed. With my patch applied, one may specify not
only '-permanent' to get the same effect as 'dynamic' but also
'-dynamic' to get the same effect as 'permanent'. Likewise for the other
two. Did I miss something?

> Also, please make sure the man page matches the code.

Oh, right. Given the above is fine with you, I will add the man page
change in v2.

Thanks, Phil


[iproute PATCH v2] man: ip-route.8: Document nexthop limit

2018-11-13 Thread Phil Sutter
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Reviewed text.
- 'route append' trick works with IPv6 only.
---
 man/man8/ip-route.8.in | 9 +
 1 file changed, 9 insertions(+)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index a33ce1f0f4006..11dd4ca7abf68 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -589,6 +589,15 @@ argument lists:
 route reflecting its relative bandwidth or quality.
 .in -8
 
+The internal buffer used in iproute2 limits the maximum number of nexthops that
+may be specified in one go. If only
+.I ADDRESS
+is given, the current buffer size allows for 144 IPv6 nexthops and 253 IPv4
+ones. For IPv4, this effectively limits the number of nexthops possible per
+route. With IPv6, further nexthops may be appended to the same route via
+.B "ip route append"
+command.
+
 .TP
 .BI scope " SCOPE_VAL"
 the scope of the destinations covered by the route prefix.
-- 
2.19.0



[iproute PATCH] ip-address: Fix filtering by negated address flags

2018-11-13 Thread Phil Sutter
When disabling a flag, one needs to AND with the inverse not the flag
itself. Otherwise specifying for instance 'home -nodad' will effectively
clear the flags variable.

While being at it, simplify the code a bit by merging common parts of
negated and non-negated case branches. Also allow for the "special
cases" to be inverted, too.

Fixes: f73ac674d0abf ("ip: change flag names to an array")
Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index cd8cc76a3473f..7b98877085878 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1211,31 +1211,35 @@ static void print_ifa_flags(FILE *fp, const struct 
ifaddrmsg *ifa,
 
 static int get_filter(const char *arg)
 {
+   bool inv = false;
unsigned int i;
 
+   if (arg[0] == '-') {
+   inv = true;
+   arg++;
+   }
/* Special cases */
if (strcmp(arg, "dynamic") == 0) {
-   filter.flags &= ~IFA_F_PERMANENT;
+   if (inv)
+   filter.flags |= IFA_F_PERMANENT;
+   else
+   filter.flags &= ~IFA_F_PERMANENT;
filter.flagmask |= IFA_F_PERMANENT;
} else if (strcmp(arg, "primary") == 0) {
-   filter.flags &= ~IFA_F_SECONDARY;
+   if (inv)
+   filter.flags |= IFA_F_SECONDARY;
+   else
+   filter.flags &= ~IFA_F_SECONDARY;
filter.flagmask |= IFA_F_SECONDARY;
-   } else if (*arg == '-') {
-   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
-   if (strcmp(arg + 1, ifa_flag_names[i].name))
-   continue;
-
-   filter.flags &= ifa_flag_names[i].value;
-   filter.flagmask |= ifa_flag_names[i].value;
-   return 0;
-   }
-
-   return -1;
} else {
for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
if (strcmp(arg, ifa_flag_names[i].name))
continue;
-   filter.flags |= ifa_flag_names[i].value;
+
+   if (inv)
+   filter.flags &= ~ifa_flag_names[i].value;
+   else
+   filter.flags |= ifa_flag_names[i].value;
filter.flagmask |= ifa_flag_names[i].value;
return 0;
}
-- 
2.19.0



[iproute PATCH] ip-route: Fix nexthop encap parsing

2018-11-13 Thread Phil Sutter
When parsing nexthop parameters, a buffer of 4k bytes is provided. Yet,
in lwt_parse_encap() and some functions called by it, buffer size was
assumed to be 1k despite the actual size was provided. This led to
spurious buffer size errors if the buffer was filled by previous nexthop
parameters to exceed that 1k boundary.

Fixes: 1e5293056a02c ("lwtunnel: Add encapsulation support to ip route")
Fixes: 5866bddd9aa9e ("ila: Add support for ILA lwtunnels")
Fixes: ed67f83806538 ("ila: Support for checksum neutral translation")
Fixes: 86905c8f057c0 ("ila: support for configuring identifier and hook types")
Fixes: b15f440e78373 ("lwt: BPF support for LWT")
Signed-off-by: Phil Sutter 
---
 ip/iproute_lwtunnel.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 8f49701509d20..85ab13cb31746 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -860,7 +860,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
 
argc--; argv++;
 
-   if (rta_addattr64(rta, 1024, ILA_ATTR_LOCATOR, locator))
+   if (rta_addattr64(rta, len, ILA_ATTR_LOCATOR, locator))
return -1;
 
while (argc > 0) {
@@ -874,7 +874,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
invarg("\"csum-mode\" value is invalid\n",
   *argv);
 
-   ret = rta_addattr8(rta, 1024, ILA_ATTR_CSUM_MODE,
+   ret = rta_addattr8(rta, len, ILA_ATTR_CSUM_MODE,
   (__u8)csum_mode);
 
argc--; argv++;
@@ -888,7 +888,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
invarg("\"ident-type\" value is invalid\n",
   *argv);
 
-   ret = rta_addattr8(rta, 1024, ILA_ATTR_IDENT_TYPE,
+   ret = rta_addattr8(rta, len, ILA_ATTR_IDENT_TYPE,
   (__u8)ident_type);
 
argc--; argv++;
@@ -902,7 +902,7 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
invarg("\"hook-type\" value is invalid\n",
   *argv);
 
-   ret = rta_addattr8(rta, 1024, ILA_ATTR_HOOK_TYPE,
+   ret = rta_addattr8(rta, len, ILA_ATTR_HOOK_TYPE,
   (__u8)hook_type);
 
argc--; argv++;
@@ -1034,7 +1034,7 @@ static int parse_encap_bpf(struct rtattr *rta, size_t 
len, int *argcp,
if (get_unsigned(&headroom, *argv, 0) || headroom == 0)
invarg("headroom is invalid\n", *argv);
if (!headroom_set)
-   rta_addattr32(rta, 1024, LWT_BPF_XMIT_HEADROOM,
+   rta_addattr32(rta, len, LWT_BPF_XMIT_HEADROOM,
  headroom);
headroom_set = 1;
} else if (strcmp(*argv, "help") == 0) {
@@ -1075,7 +1075,7 @@ int lwt_parse_encap(struct rtattr *rta, size_t len, int 
*argcp, char ***argvp)
exit(-1);
}
 
-   nest = rta_nest(rta, 1024, RTA_ENCAP);
+   nest = rta_nest(rta, len, RTA_ENCAP);
switch (type) {
case LWTUNNEL_ENCAP_MPLS:
ret = parse_encap_mpls(rta, len, &argc, &argv);
@@ -1108,7 +1108,7 @@ int lwt_parse_encap(struct rtattr *rta, size_t len, int 
*argcp, char ***argvp)
 
rta_nest_end(rta, nest);
 
-   ret = rta_addattr16(rta, 1024, RTA_ENCAP_TYPE, type);
+   ret = rta_addattr16(rta, len, RTA_ENCAP_TYPE, type);
 
*argcp = argc;
*argvp = argv;
-- 
2.19.0



Re: [iproute PATCH] man: ip-route.8: Document nexthop limit

2018-11-13 Thread Phil Sutter
Hi David,

On Mon, Nov 12, 2018 at 04:37:48PM -0800, David Ahern wrote:
> On 11/12/18 2:21 PM, Phil Sutter wrote:
> > diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
> > index a33ce1f0f4006..383178c11331e 100644
> > --- a/man/man8/ip-route.8.in
> > +++ b/man/man8/ip-route.8.in
> > @@ -589,6 +589,13 @@ argument lists:
> >  route reflecting its relative bandwidth or quality.
> >  .in -8
> >  
> > +The internal buffer used in iproute2 limits the maximum number of nexthops 
> > to
> > +be specified in one go. If only a gateway address is given, the current 
> > buffer
> > +size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, 
> > they
> > +may be added to the existing route using
> > +.B "ip route append"
> > +command.
> > +
> 
> That is not true for IPv4. 'ip ro append' adds a new route after the
> existing route - an entry that can not be hit unless all of the nexthops
> in the first route are down. 'ip ro prepend' adds a new entry before the
> existing one meaning it takes precedence over the existing entries.

Oh, thanks for clarifying. I'll follow-up with a fixed version.

> For IPv6, 'append' and 'prepend' both add new nexthops to the existing
> entry.

'ip route prepend' is not even documented. :(

Thanks, Phil


[iproute PATCH] man: ip-route.8: Document nexthop limit

2018-11-12 Thread Phil Sutter
Add a note to 'nexthop' description stating the maximum number of
nexthops per command and pointing at 'append' command as a workaround.

Signed-off-by: Phil Sutter 
---
 man/man8/ip-route.8.in | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index a33ce1f0f4006..383178c11331e 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -589,6 +589,13 @@ argument lists:
 route reflecting its relative bandwidth or quality.
 .in -8
 
+The internal buffer used in iproute2 limits the maximum number of nexthops to
+be specified in one go. If only a gateway address is given, the current buffer
+size allows for 144 IPv6 nexthops and 253 IPv4 ones. If more are required, they
+may be added to the existing route using
+.B "ip route append"
+command.
+
 .TP
 .BI scope " SCOPE_VAL"
 the scope of the destinations covered by the route prefix.
-- 
2.19.0



Re: [PATCH iproute2] testsuite: ss: Fix spacing in expected output for ssfilter.t

2018-11-11 Thread Phil Sutter
Hi Stefano,

On Sun, Nov 11, 2018 at 12:50:39PM +0100, Stefano Brivio wrote:
> On Sat, 10 Nov 2018 22:48:44 +0100
> Phil Sutter  wrote:
> 
> > On Sat, Nov 10, 2018 at 10:21:59AM +0100, Stefano Brivio wrote:
> >
> > > @@ -12,37 +12,37 @@ export TCPDIAG_FILE="$(dirname $0)/ss1.dump"
> > >  ts_log "[Testing ssfilter]"
> > >  
> > >  ts_ss "$0" "Match dport = 22" -Htna dport = 22
> > > -test_on "ESTAB0   0 10.0.0.1:36266   
> > > 10.0.0.1:22"
> > > +test_on "ESTAB 0   010.0.0.1:36266   
> > > 10.0.0.1:22"  
> > 
> > How about using a regular expression ('test_on' calls grep with '-E')?
> > E.g. this instead of the above:
> > 
> > | test_on "ESTAB *0 *0 *10.0.0.1:36266 *10.0.0.1:22"
> 
> I also thought about something similar (perhaps uglier: piping the
> output through tr -s ' ' in ts_ss()).
> 
> But then I thought we might like to use this test to also check that we
> don't accidentally modify spacing, so I'd rather leave it as it is,
> with this patch on top.

Fair enough, no objections from my side.

Thanks, Phil


Re: [PATCH iproute2] testsuite: ss: Fix spacing in expected output for ssfilter.t

2018-11-10 Thread Phil Sutter
Hi Stefano,

On Sat, Nov 10, 2018 at 10:21:59AM +0100, Stefano Brivio wrote:
> Since commit 00240899ec0b ("ss: Actually print left delimiter for
> columns") changes spacing in ss output, we also need to adjust for that in
> the ss filter test.
> 
> Fixes: 00240899ec0b ("ss: Actually print left delimiter for columns")
> Signed-off-by: Stefano Brivio 
> ---
>  testsuite/tests/ss/ssfilter.t | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/testsuite/tests/ss/ssfilter.t b/testsuite/tests/ss/ssfilter.t
> index e74f1765cb72..3091054f2892 100755
> --- a/testsuite/tests/ss/ssfilter.t
> +++ b/testsuite/tests/ss/ssfilter.t
> @@ -12,37 +12,37 @@ export TCPDIAG_FILE="$(dirname $0)/ss1.dump"
>  ts_log "[Testing ssfilter]"
>  
>  ts_ss "$0" "Match dport = 22" -Htna dport = 22
> -test_on "ESTAB0   0 10.0.0.1:36266   
> 10.0.0.1:22"
> +test_on "ESTAB 0   010.0.0.1:36266   
> 10.0.0.1:22"

How about using a regular expression ('test_on' calls grep with '-E')?
E.g. this instead of the above:

| test_on "ESTAB *0 *0 *10.0.0.1:36266 *10.0.0.1:22"

Note that I didn't test this change, just made sure 'grep -E' recognizes
' *' as expected.

Cheers, Phil


Re: [PATCH iproute2] bridge: fdb: remove redundant dev string in show output

2018-11-08 Thread Phil Sutter
Hi Roopa,

On Wed, Nov 07, 2018 at 03:14:09PM -0800, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> After commit 4abb8c723a64 ("bridge: fdb: Fix for missing
> keywords in non-JSON output"), I am seeing a double print for dev
> in bridge fdb show. eg:
> "44:38:39:00:6a:82 dev dev bridge vlan 1 master bridge permanent"
> 
> this patch removes the redundant print.
> 
> Fixes: 4abb8c723a64 ("bridge: fdb: Fix for missing keywords in non-JSON 
> output")

Oh, stupid mistake. :(

Thanks for the fix!

> CC: Phil Sutter 
> Signed-off-by: Roopa Prabhu 

Acked-by: Phil Sutter 


[iproute PATCH v2] tc: htb: Print default value in hex

2018-10-23 Thread Phil Sutter
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

As requested, also change JSON output to print the value as hex string.

Fixes: f354fa6aa5ff0 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter 
---
Changes since v1:
- Use print_0xhex().
---
 tc/q_htb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/q_htb.c b/tc/q_htb.c
index c8b2941d945b7..5fb11d28c5c3a 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -332,7 +332,7 @@ static int htb_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
if (RTA_PAYLOAD(tb[TCA_HTB_INIT])  < sizeof(*gopt)) return -1;
 
print_int(PRINT_ANY, "r2q", "r2q %d", gopt->rate2quantum);
-   print_uint(PRINT_ANY, "default", " default %u", gopt->defcls);
+   print_0xhex(PRINT_ANY, "default", " default %x", gopt->defcls);
print_uint(PRINT_ANY, "direct_packets_stat",
   " direct_packets_stat %u", gopt->direct_pkts);
if (show_details) {
-- 
2.19.0



Re: [iproute PATCH] tc: htb: Print default value in hex

2018-10-22 Thread Phil Sutter
On Mon, Oct 22, 2018 at 09:56:23AM -0700, Stephen Hemminger wrote:
> On Fri, 19 Oct 2018 17:42:55 +0200
> Phil Sutter  wrote:
> 
> > Value of 'default' is assumed to be hexadecimal when parsing, so
> > consequently it should be printed in hex as well. This is a regression
> > introduced when adding JSON output.
> > 
> > Fixes: f354fa6aa5ff0 ("tc: jsonify htb qdisc")
> > Signed-off-by: Phil Sutter 
> > ---
> >  tc/q_htb.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tc/q_htb.c b/tc/q_htb.c
> > index c8b2941d945b7..c69485db8ef7d 100644
> > --- a/tc/q_htb.c
> > +++ b/tc/q_htb.c
> > @@ -332,7 +332,7 @@ static int htb_print_opt(struct qdisc_util *qu, FILE 
> > *f, struct rtattr *opt)
> > if (RTA_PAYLOAD(tb[TCA_HTB_INIT])  < sizeof(*gopt)) return -1;
> >  
> > print_int(PRINT_ANY, "r2q", "r2q %d", gopt->rate2quantum);
> > -   print_uint(PRINT_ANY, "default", " default %u", gopt->defcls);
> > +   print_uint(PRINT_ANY, "default", " default %x", gopt->defcls);
> > print_uint(PRINT_ANY, "direct_packets_stat",
> >" direct_packets_stat %u", gopt->direct_pkts);
> > if (show_details) {
> 
> The output should be in hex. Because you used print_uint it will still be in 
> decimal
> in the JSON output.  The correct fix here is to use print_0xhex instead.
> 
> Please fix that, test it, and resubmit.

I did test, but not JSON output. It is not a regression in the latter
because it has been like this since JSON output was added to HTB.
Assuming the output is programmatically parsed, it shouldn't matter too
much as well.

Grepping through the code I found three more cases where print_uint()
was used for data printed in %x. Should I fix those as well?

Thanks, Phil


[iproute PATCH] tc: htb: Print default value in hex

2018-10-19 Thread Phil Sutter
Value of 'default' is assumed to be hexadecimal when parsing, so
consequently it should be printed in hex as well. This is a regression
introduced when adding JSON output.

Fixes: f354fa6aa5ff0 ("tc: jsonify htb qdisc")
Signed-off-by: Phil Sutter 
---
 tc/q_htb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/q_htb.c b/tc/q_htb.c
index c8b2941d945b7..c69485db8ef7d 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -332,7 +332,7 @@ static int htb_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
if (RTA_PAYLOAD(tb[TCA_HTB_INIT])  < sizeof(*gopt)) return -1;
 
print_int(PRINT_ANY, "r2q", "r2q %d", gopt->rate2quantum);
-   print_uint(PRINT_ANY, "default", " default %u", gopt->defcls);
+   print_uint(PRINT_ANY, "default", " default %x", gopt->defcls);
print_uint(PRINT_ANY, "direct_packets_stat",
   " direct_packets_stat %u", gopt->direct_pkts);
if (show_details) {
-- 
2.19.0



Re: [iproute PATCH] rdma: Fix for ineffective check in add_filter()

2018-10-18 Thread Phil Sutter
Hi,

On Thu, Oct 18, 2018 at 09:27:47AM -0600, David Ahern wrote:
> On 10/18/18 5:41 AM, Phil Sutter wrote:
> > With 'name' field defined as array in struct filters, it will always
> > contain a value irrespective of whether a name was assigned or not.
> > 
> > Fix this by turning the field into a const char pointer.
> > 
> > Fixes: 8cd644095842a ("devlink: Add support for devlink resource 
> > abstraction")
> 
> Stale paste buffer? Seems like the correct tag is
> Fixes: 1174be72d1b4c ("rdma: Add filtering infrastructure")

Oh, indeed. Also the wrong person in Cc: Arkadi is innocent, culprit
patch came from Leon.

Stephen, should I respin or will you correct the Fixes tag?

Thanks, Phil


[iproute PATCH] tc: Remove pointless assignments in batch()

2018-10-18 Thread Phil Sutter
All these assignments are later overwritten without reading in between,
so just drop them.

Fixes: 485d0c6001c4a ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter 
---
 tc/tc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index c493d5e92e0dd..eacd5c08573d4 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -325,11 +325,11 @@ static int batch(const char *name)
struct batch_buf *head = NULL, *tail = NULL, *buf_pool = NULL;
char *largv[100], *largv_next[100];
char *line, *line_next = NULL;
-   bool bs_enabled_next = false;
bool bs_enabled = false;
bool lastline = false;
int largc, largc_next;
bool bs_enabled_saved;
+   bool bs_enabled_next;
int batchsize = 0;
size_t len = 0;
int ret = 0;
@@ -358,7 +358,6 @@ static int batch(const char *name)
goto Exit;
largc = makeargs(line, largv, 100);
bs_enabled = batchsize_enabled(largc, largv);
-   bs_enabled_saved = bs_enabled;
do {
if (getcmdline(&line_next, &len, stdin) == -1)
lastline = true;
@@ -394,7 +393,6 @@ static int batch(const char *name)
len = 0;
bs_enabled_saved = bs_enabled;
bs_enabled = bs_enabled_next;
-   bs_enabled_next = false;
 
if (largc == 0) {
largc = largc_next;
-- 
2.19.0



[iproute PATCH] tipc: Drop unused variable 'genl'

2018-10-18 Thread Phil Sutter
Although initialized by call to libmnl, the variable is used only in a
call to sizeof(). Drop it and call sizeof with its type instead.

Fixes: f043759dd4928 ("tipc: add new TIPC configuration tool")
Signed-off-by: Phil Sutter 
---
 tipc/node.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tipc/node.c b/tipc/node.c
index 0fa1064c72a17..2fec6753c974d 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -26,13 +26,12 @@
 
 static int node_list_cb(const struct nlmsghdr *nlh, void *data)
 {
-   struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
struct nlattr *info[TIPC_NLA_MAX + 1] = {};
struct nlattr *attrs[TIPC_NLA_NODE_MAX + 1] = {};
char str[33] = {};
uint32_t addr;
 
-   mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+   mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
if (!info[TIPC_NLA_NODE])
return MNL_CB_ERROR;
 
@@ -160,7 +159,6 @@ static int cmd_node_set_nodeid(struct nlmsghdr *nlh, const 
struct cmd *cmd,
 
 static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 {
-   struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
struct nlattr *info[TIPC_NLA_MAX + 1] = {};
struct nlattr *attrs[TIPC_NLA_NET_MAX + 1] = {};
char str[33] = {0,};
@@ -168,7 +166,7 @@ static int nodeid_get_cb(const struct nlmsghdr *nlh, void 
*data)
uint64_t *w0 = (uint64_t *) &id[0];
uint64_t *w1 = (uint64_t *) &id[8];
 
-   mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+   mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
if (!info[TIPC_NLA_NET])
return MNL_CB_ERROR;
 
@@ -207,11 +205,10 @@ static int cmd_node_get_nodeid(struct nlmsghdr *nlh, 
const struct cmd *cmd,
 
 static int netid_get_cb(const struct nlmsghdr *nlh, void *data)
 {
-   struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
struct nlattr *info[TIPC_NLA_MAX + 1] = {};
struct nlattr *attrs[TIPC_NLA_NET_MAX + 1] = {};
 
-   mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+   mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
if (!info[TIPC_NLA_NET])
return MNL_CB_ERROR;
 
-- 
2.19.0



[iproute PATCH] ip-route: Fix parse_encap_seg6() srh parsing

2018-10-18 Thread Phil Sutter
In case caller did not specify 'segs' parameter, parse_srh() would read
garbage while iterating over 'segbuf'. Avoid this by initializing
'segbuf' to an empty string.

Fixes: e8493916a8ede ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Signed-off-by: Phil Sutter 
---
 ip/iproute_lwtunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 85045d4fff742..4ebfaa7cd6826 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -494,7 +494,7 @@ static int parse_encap_seg6(struct rtattr *rta, size_t len, 
int *argcp,
struct seg6_iptunnel_encap *tuninfo;
struct ipv6_sr_hdr *srh;
char **argv = *argvp;
-   char segbuf[1024];
+   char segbuf[1024] = "";
int argc = *argcp;
int encap = -1;
__u32 hmac = 0;
-- 
2.19.0



[iproute PATCH] rdma: Don't pass garbage to rd_check_is_filtered()

2018-10-18 Thread Phil Sutter
Variables 'src_port' and 'dst_port' are initialized only if attributes
RDMA_NLDEV_ATTR_RES_SRC_ADDR or RDMA_NLDEV_ATTR_RES_DST_ADDR are
present. Make sure to pass them over to rd_check_is_filtered() only if
that is the case.

Fixes: 9a362cc71a455 ("rdma: Add CM_ID resource tracking information")
Signed-off-by: Phil Sutter 
---
 rdma/res.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/rdma/res.c b/rdma/res.c
index 074b9929a38b2..0d8c1c388c4ca 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -621,6 +621,8 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, 
void *data)
if (rd_check_is_string_filtered(rd, "src-addr",
src_addr_str))
continue;
+   if (rd_check_is_filtered(rd, "src-port", src_port))
+   continue;
}
 
if (nla_line[RDMA_NLDEV_ATTR_RES_DST_ADDR]) {
@@ -630,14 +632,10 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, 
void *data)
if (rd_check_is_string_filtered(rd, "dst-addr",
dst_addr_str))
continue;
+   if (rd_check_is_filtered(rd, "dst-port", dst_port))
+   continue;
}
 
-   if (rd_check_is_filtered(rd, "src-port", src_port))
-   continue;
-
-   if (rd_check_is_filtered(rd, "dst-port", dst_port))
-   continue;
-
if (nla_line[RDMA_NLDEV_ATTR_RES_PID]) {
pid = mnl_attr_get_u32(
nla_line[RDMA_NLDEV_ATTR_RES_PID]);
-- 
2.19.0



[iproute PATCH] ip-route: Fix for memleak in error path

2018-10-18 Thread Phil Sutter
If call to rta_addattr_l() failed, parse_encap_seg6() would leak memory.
Fix this by making sure calls to free() are not skipped.

Fixes: bd59e5b1517b0 ("ip-route: Fix segfault with many nexthops")
Signed-off-by: Phil Sutter 
---
 ip/iproute_lwtunnel.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 969a4763df71d..85045d4fff742 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -498,6 +498,7 @@ static int parse_encap_seg6(struct rtattr *rta, size_t len, 
int *argcp,
int argc = *argcp;
int encap = -1;
__u32 hmac = 0;
+   int ret = 0;
int srhlen;
 
while (argc > 0) {
@@ -539,16 +540,19 @@ static int parse_encap_seg6(struct rtattr *rta, size_t 
len, int *argcp,
memcpy(tuninfo->srh, srh, srhlen);
 
if (rta_addattr_l(rta, len, SEG6_IPTUNNEL_SRH, tuninfo,
- sizeof(*tuninfo) + srhlen))
-   return -1;
-
-   free(tuninfo);
-   free(srh);
+ sizeof(*tuninfo) + srhlen)) {
+   ret = -1;
+   goto out;
+   }
 
*argcp = argc + 1;
*argvp = argv - 1;
 
-   return 0;
+out:
+   free(tuninfo);
+   free(srh);
+
+   return ret;
 }
 
 struct lwt_x {
-- 
2.19.0



[iproute PATCH] rdma: Fix for ineffective check in add_filter()

2018-10-18 Thread Phil Sutter
With 'name' field defined as array in struct filters, it will always
contain a value irrespective of whether a name was assigned or not.

Fix this by turning the field into a const char pointer.

Fixes: 8cd644095842a ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Phil Sutter 
---
 rdma/rdma.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rdma/rdma.h b/rdma/rdma.h
index d4b7ba1918b13..c3b7530b6cc71 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -34,7 +34,7 @@
 
 #define MAX_NUMBER_OF_FILTERS 64
 struct filters {
-   char name[32];
+   const char *name;
bool is_number;
 };
 
-- 
2.19.0



[iproute PATCH] devlink: Fix error reporting in cmd_resource_set()

2018-10-18 Thread Phil Sutter
resource_path_parse() returns either zero or a negative error code,
hence the negated value must be passed to strerror().

Fixes: 8cd644095842a ("devlink: Add support for devlink resource abstraction")
Signed-off-by: Phil Sutter 
---
 devlink/devlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 519ee2577cc4c..8bb254ea1b0b8 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -5127,7 +5127,7 @@ static int cmd_resource_set(struct dl *dl)
  &dl->opts.resource_id,
  &dl->opts.resource_id_valid);
if (err) {
-   pr_err("error parsing resource path %s\n", strerror(err));
+   pr_err("error parsing resource path %s\n", strerror(-err));
goto out;
}
 
-- 
2.19.0



[net PATCH] net: sched: Fix for duplicate class dump

2018-10-18 Thread Phil Sutter
When dumping classes by parent, kernel would return classes twice:

| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:

This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.

Fixes: cb395b2010879 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter 
---
 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6684641ea3448..3dc0acf542454 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -2059,7 +2059,8 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct 
sk_buff *skb,
 
if (tcm->tcm_parent) {
q = qdisc_match_from_root(root, TC_H_MAJ(tcm->tcm_parent));
-   if (q && tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
+   if (q && q != root &&
+   tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
return 0;
}
-- 
2.19.0



[iproute PATCH] ip-addrlabel: Fix printing of label value

2018-10-15 Thread Phil Sutter
Passing the return value of RTA_DATA() to rta_getattr_u32() is wrong
since that function will call RTA_DATA() by itself already.

Fixes: a7ad1c8a6845d ("ipaddrlabel: add json support")
Signed-off-by: Phil Sutter 
---
 ip/ipaddrlabel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddrlabel.c b/ip/ipaddrlabel.c
index 2f79c56dcead2..8abe5722bafd1 100644
--- a/ip/ipaddrlabel.c
+++ b/ip/ipaddrlabel.c
@@ -95,7 +95,7 @@ int print_addrlabel(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg
}
 
if (tb[IFAL_LABEL] && RTA_PAYLOAD(tb[IFAL_LABEL]) == sizeof(uint32_t)) {
-   uint32_t label = rta_getattr_u32(RTA_DATA(tb[IFAL_LABEL]));
+   uint32_t label = rta_getattr_u32(tb[IFAL_LABEL]);
 
print_uint(PRINT_ANY,
   "label", "label %u ", label);
-- 
2.19.0



[iproute PATCH] bridge: fdb: Fix for missing keywords in non-JSON output

2018-10-09 Thread Phil Sutter
While migrating to JSON print library, some keywords were dropped from
standard output by accident. Add them back to unbreak output parsers.

Fixes: c7c1a1ef51aea ("bridge: colorize output and use JSON print library")
Signed-off-by: Phil Sutter 
---
 bridge/fdb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index 4dbc894ceab93..6487fac579c20 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -182,7 +182,7 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (!is_json_context())
fprintf(fp, "dev ");
print_color_string(PRINT_ANY, COLOR_IFNAME,
-  "ifname", "%s ",
+  "ifname", "dev %s ",
   ll_index_to_name(r->ndm_ifindex));
}
 
@@ -199,7 +199,7 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
print_color_string(PRINT_ANY,
   ifa_family_color(family),
-   "dst", "%s ", dst);
+   "dst", "dst %s ", dst);
}
 
if (vid)
@@ -246,7 +246,7 @@ int print_fdb(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
 
if (tb[NDA_MASTER])
-   print_string(PRINT_ANY, "master", "%s ",
+   print_string(PRINT_ANY, "master", "master %s ",
 ll_index_to_name(rta_getattr_u32(tb[NDA_MASTER])));
 
print_string(PRINT_ANY, "state", "%s\n",
-- 
2.19.0



Re: re iproute2 - don't return error on success fix

2018-10-08 Thread Phil Sutter
Hi Or,

On Tue, Oct 02, 2018 at 03:21:18PM +0300, Or Gerlitz wrote:
> On Thu, Sep 27, 2018 at 3:53 PM Phil Sutter  wrote:
[...]
> > Hmm, I can't reproduce this. My HEAD is at the commit you mentioned:
> >
> > | % sudo ./tc/tc filter add dev d0 protocol ip parent : flower skip_sw 
> > ip_flags nofirstfrag action drop
> > | RTNETLINK answers: Operation not supported
> > | We have an error talking to the kernel, -1
> > | % echo $?
> > | 2
> >
> > Are you sure you tested the right binary?
> 
> double checked now, seems broken on my end:
> 
> $ git log --oneline -3
> b45e300 libnetlink: don't return error on success
> 5dc2204 testsuite: add libmnl
> 8804a8c Makefile: Add check target
> 
> $./tc/tc filter add dev enp33s0f0 protocol ip parent : flower
> skip_sw ip_flags frag action drop && echo "success" || echo "failed"
> success
> 
> $ ./tc/tc filter add dev enp33s0f0 protocol ip parent : flower
> skip_sw ip_flags firstfrag action drop && echo "success" || echo
> "failed"
> RTNETLINK answers: Operation not supported
> success

Interestingly, your output lacks the "We have an error" message mine
shows. So in your case, the call to rtnl_talk_iov() from
tc_filter_modify() does not return < 0.

In my case, commands always fail with 'skip_sw' since I only test on a
dummy interface. So maybe we hit different error paths? Could you please
check what happens for you in __rtnl_talk_iov()? I guess the "RTNETLINK
answers:" message should come from rtnl_talk_error(). In my case it does
at least, and I haven't found an alternative place where that message
could come from.

Cheers, Phil


Re: [PATCH net-next] geneve: fix ttl inherit type

2018-10-01 Thread Phil Sutter
Hangbin,

On Sat, Sep 29, 2018 at 05:16:05PM +0800, Hangbin Liu wrote:
[...]
> > Is it desirable to switch to a flag? If I read geneve_changelink() and
> > geneve_nl2info() correctly, it allows you to set the ttl_inherit flag
> > for an existing device but doesn't allow you to clear it. With NLA_U8,
> > you could distinguish three cases: set the flag (non-zero value), clear
> > the flag (zero value) and preserve current state (attribute not
> > present).
> 
> I re-read geneve_changelink() and I agree with you. Since we can change ttl
> number, we should also be able to set/unset ttl inherit.
> 
> Phil, what do you think?

All fine with me. I'm not familiar with either of VXLAN or Geneve,
you're the experts here. I was just the random guy wondering why things
are done one way and not the other. :)

Thanks for your diligent efforts at clearing up the mysteries!

Cheers, Phil


Re: [PATCH net-next] geneve: fix ttl inherit type

2018-09-28 Thread Phil Sutter
On Fri, Sep 28, 2018 at 09:09:58AM +0800, Hangbin Liu wrote:
> Phil pointed out that there is a mismatch between vxlan and geneve ttl
> inherit. We should define it as a flag and use nla_put_flag to export this
> opiton.

Same typo here. :)

Apart from that, LGTM!

Thanks, Phil


Re: [PATCH net] vxlan: use nla_put_flag for ttl inherit

2018-09-28 Thread Phil Sutter
On Fri, Sep 28, 2018 at 09:08:26AM +0800, Hangbin Liu wrote:
> Phil pointed out that there is a mismatch between vxlan and geneve ttl 
> inherit.
> We should define it as a flag and use nla_put_flag to export this opiton.

s/opiton/option/

Apart from that, LGTM!

Thanks, Phil


Re: [PATCH iproute2] vxlan: show correct ttl inherit info

2018-09-27 Thread Phil Sutter
Hi Hangbin,

On Thu, Sep 27, 2018 at 10:07:51PM +0800, Hangbin Liu wrote:
> On Thu, Sep 27, 2018 at 11:27:45AM +0200, Phil Sutter wrote:
> > On Thu, Sep 27, 2018 at 03:28:36PM +0800, Hangbin Liu wrote:
> > > We should only show ttl inherit when IFLA_VXLAN_TTL_INHERIT supplied.
> > > Otherwise show the ttl number, or auto when it is 0.
> > > 
> > > Signed-off-by: Hangbin Liu 
> > > ---
> > >  ip/iplink_vxlan.c | 16 ++--
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
> > > index 831f39a..7fc0e2b 100644
> > > --- a/ip/iplink_vxlan.c
> > > +++ b/ip/iplink_vxlan.c
> > > @@ -145,7 +145,7 @@ static int vxlan_parse_opt(struct link_util *lu, int 
> > > argc, char **argv,
> > >   NEXT_ARG();
> > >   check_duparg(&attrs, IFLA_VXLAN_TTL, "ttl", *argv);
> > >   if (strcmp(*argv, "inherit") == 0) {
> > > - addattr_l(n, 1024, IFLA_VXLAN_TTL_INHERIT, 
> > > NULL, 0);
> > > + addattr(n, 1024, IFLA_VXLAN_TTL_INHERIT);
> > 
> > So for VXLAN, the attribute is just added but with a zero value. Looking
> > at respective kernel code, this seems fine. Now I wonder why for Geneve,
> > you set the value to 1 and when displaying explicitly check whether the
> > attribute is there *and* non-zero. OK, looks like Geneve driver always
> > exports IFLA_GENEVE_TTL_INHERIT. Oddly, I can't find where VXLAN driver
> > in kernel does export the IFLA_VXLAN_TTL_INHERIT attribute. Am I missing
> > something?
> 
> Hi Phil,
> 
> The vxlan ttl inherit exportation is just fixed by
> net commit 8fd78069874 ("vxlan: fill ttl inherit info") yesterday.

Ah, thanks!

> > Do you know why handling of the attributes in both drivers differ?
> 
> That's because I set IFLA_VXLAN_TTL_INHERIT type to NLA_FLAG. But when fix
> the geneve issue, I forgot this and set IFLA_GENEVE_TTL_INHERIT type to
> NLA_U8... :(
> 
> I noticed this when post the iproute2 patch. But I thought this should have
> no much influence, so I just leave it as it is.
> 
> If you think we'd better have a same behaver. We can drop the geneve iproute2
> patch and post a fix to net kernel tree.

I guess it doesn't make much difference, but NLA_FLAG would be more
appropriate. If you want to spend the extra effort to fix it, I guess
now's the time - later it can't be changed anymore due to compatibility
issues (I guess).

Thanks, Phil


Re: re iproute2 - don't return error on success fix

2018-09-27 Thread Phil Sutter
Hi,

On Thu, Sep 27, 2018 at 03:22:41PM +0300, Or Gerlitz wrote:
> Something is still broken also after commit b45e300 "libnetlink: don't
> return error on success" - when error is returned, the error code is
> success..
> 
> $ tc filter add dev enp33s0f0 protocol ip parent : flower skip_sw
> ip_flags nofirstfrag action drop && echo "success" || echo "failed"
> 
> RTNETLINK answers: Operation not supported
> success

Hmm, I can't reproduce this. My HEAD is at the commit you mentioned:

| % sudo ./tc/tc filter add dev d0 protocol ip parent : flower skip_sw 
ip_flags nofirstfrag action drop
| RTNETLINK answers: Operation not supported
| We have an error talking to the kernel, -1
| % echo $?
| 2

Are you sure you tested the right binary?

Cheers, Phil


Re: [PATCH iproute2] vxlan: show correct ttl inherit info

2018-09-27 Thread Phil Sutter
Hi Hangbin,

On Thu, Sep 27, 2018 at 03:28:36PM +0800, Hangbin Liu wrote:
> We should only show ttl inherit when IFLA_VXLAN_TTL_INHERIT supplied.
> Otherwise show the ttl number, or auto when it is 0.
> 
> Signed-off-by: Hangbin Liu 
> ---
>  ip/iplink_vxlan.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
> index 831f39a..7fc0e2b 100644
> --- a/ip/iplink_vxlan.c
> +++ b/ip/iplink_vxlan.c
> @@ -145,7 +145,7 @@ static int vxlan_parse_opt(struct link_util *lu, int 
> argc, char **argv,
>   NEXT_ARG();
>   check_duparg(&attrs, IFLA_VXLAN_TTL, "ttl", *argv);
>   if (strcmp(*argv, "inherit") == 0) {
> - addattr_l(n, 1024, IFLA_VXLAN_TTL_INHERIT, 
> NULL, 0);
> + addattr(n, 1024, IFLA_VXLAN_TTL_INHERIT);

So for VXLAN, the attribute is just added but with a zero value. Looking
at respective kernel code, this seems fine. Now I wonder why for Geneve,
you set the value to 1 and when displaying explicitly check whether the
attribute is there *and* non-zero. OK, looks like Geneve driver always
exports IFLA_GENEVE_TTL_INHERIT. Oddly, I can't find where VXLAN driver
in kernel does export the IFLA_VXLAN_TTL_INHERIT attribute. Am I missing
something?
Do you know why handling of the attributes in both drivers differ?

Cheers, Phil


Re: [PATCH iproute2-next] geneve: fix ttl inherit behavior

2018-09-27 Thread Phil Sutter
On Thu, Sep 27, 2018 at 03:27:37PM +0800, Hangbin Liu wrote:
> Currently when we add geneve with "ttl inherit", we set ttl to 0, which
> is actually use whatever default value instead of inherit the inner
> protocol's ttl value.
> 
> To respect compatibility with old behavior and make a difference between
> ttl inherit and ttl == 0, we add an attribute IFLA_GENEVE_TTL_INHERIT in
> kernel commit 52d0d404d39dd ("geneve: add ttl inherit support").
> 
> Now let's use "ttl inherit" to inherit the inner protocol's ttl, and use
> "ttl auto" to means "use whatever default value", the same behavior with
> ttl == 0.
> 
> Reported-by: Jianlin Shi 
> Signed-off-by: Hangbin Liu 

Acked-by: Phil Sutter 


Re: [PATCH iproute2 v2 0/3] testsuite: make alltests fixes

2018-09-20 Thread Phil Sutter
Hi Petr,

On Thu, Sep 20, 2018 at 01:36:21AM +0200, Petr Vorel wrote:
> here are simply fixes to restore 'make alltests'.
> Currently it does not run.

Yeah, that testsuite definitely deserves some love.

Just one nit: The one-line summary in Fixes: tags should be enclosed in
quotes and parens, e.g.:

| Fixes: deadbeef ("foo bar")

Cheers, Phil


Re: [PATCH iproute2-next] iplink: add ipvtap support

2018-09-19 Thread Phil Sutter
On Wed, Sep 19, 2018 at 11:03:29AM +0800, Hangbin Liu wrote:
> IPVLAN and IPVTAP are using the same functions and parameters. So we can
> just add a new link_util with id ipvtap. Others are the same.
> 
> Signed-off-by: Hangbin Liu 

Acked-by: Phil Sutter 


Re: [PATCHv2 iproute2] bridge/mdb: fix missing new line when show bridge mdb

2018-09-11 Thread Phil Sutter
Hi Hangbin,

On Tue, Sep 11, 2018 at 09:26:35AM +0800, Hangbin Liu wrote:
[...]
> + if (!is_json_context() && !show_stats)
> + print_string(PRINT_FP, NULL, "\n", NULL);

There is no need to check for !is_json_context() here. You give a type
of PRINT_FP which won't lead to output if JSON is active. For reference,
check print_color_string() in lib/json_print.c and _IS_JSON_CONTEXT
macro.

Cheers, Phil


[iproute PATCH v2] ip-route: Fix segfault with many nexthops

2018-09-06 Thread Phil Sutter
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:

| for i in `seq 37`; do
|   nhs="nexthop via ::$i "$nhs
| done
| ip -6 route add ::/64 $nhs

The related code was broken in multiple ways:

* parse_one_nh() assumed that rta points to 4kB of storage but caller
  provided just 1kB. Fixed by passing 'len' parameter with the correct
  value.

* Error checking of rta_addattr*() calls in parse_one_nh() and called
  functions was completely absent, so with above fix in place output
  flood would occur due to parser looping forever.

While being at it, increase message buffer sizes to 4k. This allows for
at most 144 nexthops.

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Remove accidentally added 'return 0' line from parse_nexthops().
- Increase buffer sizes.
---
 ip/iproute.c  |  43 ++---
 ip/iproute_lwtunnel.c | 108 +-
 2 files changed, 91 insertions(+), 60 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 30833414a3f7f..398322fd1f4ff 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -941,7 +941,7 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 }
 
 static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
-   struct rtattr *rta, struct rtnexthop *rtnh,
+   struct rtattr *rta, size_t len, struct rtnexthop *rtnh,
int *argcp, char ***argvp)
 {
int argc = *argcp;
@@ -962,11 +962,16 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
if (r->rtm_family == AF_UNSPEC)
r->rtm_family = addr.family;
if (addr.family == r->rtm_family) {
-   rta_addattr_l(rta, 4096, RTA_GATEWAY, 
&addr.data, addr.bytelen);
-   rtnh->rtnh_len += sizeof(struct rtattr) + 
addr.bytelen;
+   if (rta_addattr_l(rta, len, RTA_GATEWAY,
+ &addr.data, addr.bytelen))
+   return -1;
+   rtnh->rtnh_len += sizeof(struct rtattr)
+ + addr.bytelen;
} else {
-   rta_addattr_l(rta, 4096, RTA_VIA, &addr.family, 
addr.bytelen+2);
-   rtnh->rtnh_len += RTA_SPACE(addr.bytelen+2);
+   if (rta_addattr_l(rta, len, RTA_VIA,
+ &addr.family, addr.bytelen + 
2))
+   return -1;
+   rtnh->rtnh_len += RTA_SPACE(addr.bytelen + 2);
}
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
@@ -988,13 +993,15 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
NEXT_ARG();
if (get_rt_realms_or_raw(&realm, *argv))
invarg("\"realm\" value is invalid\n", *argv);
-   rta_addattr32(rta, 4096, RTA_FLOW, realm);
+   if (rta_addattr32(rta, len, RTA_FLOW, realm))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + 4;
} else if (strcmp(*argv, "encap") == 0) {
-   int len = rta->rta_len;
+   int old_len = rta->rta_len;
 
-   lwt_parse_encap(rta, 4096, &argc, &argv);
-   rtnh->rtnh_len += rta->rta_len - len;
+   if (lwt_parse_encap(rta, len, &argc, &argv))
+   return -1;
+   rtnh->rtnh_len += rta->rta_len - old_len;
} else if (strcmp(*argv, "as") == 0) {
inet_prefix addr;
 
@@ -1002,8 +1009,9 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
get_addr(&addr, *argv, r->rtm_family);
-   rta_addattr_l(rta, 4096, RTA_NEWDST, &addr.data,
- addr.bytelen);
+   if (rta_addattr_l(rta, len, RTA_NEWDST,
+ &addr.data, addr.bytelen))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
} else
break;
@@ -1016,7 +1024,7 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
 static int parse_nexthops(struct 

Re: [iproute PATCH] ip-route: Fix segfault with many nexthops

2018-09-06 Thread Phil Sutter
Hi,

On Tue, Sep 04, 2018 at 07:15:44PM +0200, Phil Sutter wrote:
[...]
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 30833414a3f7f..9e5ae48c0715c 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
[...]
> @@ -1036,15 +1044,18 @@ static int parse_nexthops(struct nlmsghdr *n, struct 
> rtmsg *r,
>   memset(rtnh, 0, sizeof(*rtnh));
>   rtnh->rtnh_len = sizeof(*rtnh);
>   rta->rta_len += rtnh->rtnh_len;
> - if (parse_one_nh(n, r, rta, rtnh, &argc, &argv)) {
> + if (parse_one_nh(n, r, rta, 1024, rtnh, &argc, &argv)) {
>   fprintf(stderr, "Error: cannot parse nexthop\n");
>   exit(-1);
>   }
>   rtnh = RTNH_NEXT(rtnh);
>   }
>  
> + return 0;
> +

This line got added by accident, I'll respin.

Sorry, Phil


[iproute PATCH] ip-route: Fix segfault with many nexthops

2018-09-04 Thread Phil Sutter
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:

| for i in `seq 37`; do
|   nhs="nexthop via ::$i "$nhs
| done
| ip -6 route add ::/64 $nhs

The related code was broken in multiple ways:

* parse_one_nh() assumed that rta points to 4kB of storage but caller
  provided just 1kB. Fixed by passing 'len' parameter with the correct
  value.

* Error checking of rta_addattr*() calls in parse_one_nh() and called
  functions was completely absent, so with above fix in place output
  flood would occur due to parser looping forever.

Note that it is still not possible to add a route with more than 36
nexthops due to stack buffer sizes, this patch merely fixes error path.

Signed-off-by: Phil Sutter 
---
 ip/iproute.c  |  41 ++--
 ip/iproute_lwtunnel.c | 108 +-
 2 files changed, 91 insertions(+), 58 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 30833414a3f7f..9e5ae48c0715c 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -941,7 +941,7 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 }
 
 static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
-   struct rtattr *rta, struct rtnexthop *rtnh,
+   struct rtattr *rta, size_t len, struct rtnexthop *rtnh,
int *argcp, char ***argvp)
 {
int argc = *argcp;
@@ -962,11 +962,16 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
if (r->rtm_family == AF_UNSPEC)
r->rtm_family = addr.family;
if (addr.family == r->rtm_family) {
-   rta_addattr_l(rta, 4096, RTA_GATEWAY, 
&addr.data, addr.bytelen);
-   rtnh->rtnh_len += sizeof(struct rtattr) + 
addr.bytelen;
+   if (rta_addattr_l(rta, len, RTA_GATEWAY,
+ &addr.data, addr.bytelen))
+   return -1;
+   rtnh->rtnh_len += sizeof(struct rtattr)
+ + addr.bytelen;
} else {
-   rta_addattr_l(rta, 4096, RTA_VIA, &addr.family, 
addr.bytelen+2);
-   rtnh->rtnh_len += RTA_SPACE(addr.bytelen+2);
+   if (rta_addattr_l(rta, len, RTA_VIA,
+ &addr.family, addr.bytelen + 
2))
+   return -1;
+   rtnh->rtnh_len += RTA_SPACE(addr.bytelen + 2);
}
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
@@ -988,13 +993,15 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
NEXT_ARG();
if (get_rt_realms_or_raw(&realm, *argv))
invarg("\"realm\" value is invalid\n", *argv);
-   rta_addattr32(rta, 4096, RTA_FLOW, realm);
+   if (rta_addattr32(rta, len, RTA_FLOW, realm))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + 4;
} else if (strcmp(*argv, "encap") == 0) {
-   int len = rta->rta_len;
+   int old_len = rta->rta_len;
 
-   lwt_parse_encap(rta, 4096, &argc, &argv);
-   rtnh->rtnh_len += rta->rta_len - len;
+   if (lwt_parse_encap(rta, len, &argc, &argv))
+   return -1;
+   rtnh->rtnh_len += rta->rta_len - old_len;
} else if (strcmp(*argv, "as") == 0) {
inet_prefix addr;
 
@@ -1002,8 +1009,9 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg 
*r,
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
get_addr(&addr, *argv, r->rtm_family);
-   rta_addattr_l(rta, 4096, RTA_NEWDST, &addr.data,
- addr.bytelen);
+   if (rta_addattr_l(rta, len, RTA_NEWDST,
+ &addr.data, addr.bytelen))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
} else
break;
@@ -1036,15 +1044,18 @@ static int parse_nexthops(struct nlmsghdr *n, struct 
rtmsg *r,
memset(rtnh, 0, sizeof(*rtnh));
rtnh->rtnh_len = sizeof(*rtnh);

[iproute PATCH] iprule: Fix for incorrect space between dst and prefix

2018-08-29 Thread Phil Sutter
This was added by accident when introducing JSON support.

Fixes: 0dd4ccc56c0e3 ("iprule: add json support")
Signed-off-by: Phil Sutter 
---
 ip/iprule.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 8b9421431c26a..744d6d88e3433 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -239,7 +239,7 @@ int print_rule(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
print_string(PRINT_FP, NULL, "to ", NULL);
print_color_string(PRINT_ANY, ifa_family_color(frh->family),
-  "dst", "%s ", dst);
+  "dst", "%s", dst);
if (frh->dst_len != host_len)
print_uint(PRINT_ANY, "dstlen", "/%u ", frh->dst_len);
else
-- 
2.18.0



[iproute PATCH 2/2] lib: Make check_enable_color() return boolean

2018-08-17 Thread Phil Sutter
As suggested, turn return code into true/false although it's not checked
anywhere yet.

Fixes: 4d829626a ("Merge common code for conditionally colored output")
Signed-off-by: Phil Sutter 
---
 include/color.h | 2 +-
 lib/color.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/color.h b/include/color.h
index a22a00c2277e0..e30f28c51c844 100644
--- a/include/color.h
+++ b/include/color.h
@@ -21,7 +21,7 @@ enum color_opt {
 };
 
 void enable_color(void);
-int check_enable_color(int color, int json);
+bool check_enable_color(int color, int json);
 bool matches_color(const char *arg, int *val);
 void set_color_palette(void);
 int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
diff --git a/lib/color.c b/lib/color.c
index 9c9023587748f..eaf69e74d673a 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -79,16 +79,16 @@ void enable_color(void)
set_color_palette();
 }
 
-int check_enable_color(int color, int json)
+bool check_enable_color(int color, int json)
 {
if (json || color == COLOR_OPT_NEVER)
-   return 1;
+   return false;
 
if (color == COLOR_OPT_ALWAYS || isatty(fileno(stdout))) {
enable_color();
-   return 0;
+   return true;
}
-   return 1;
+   return false;
 }
 
 bool matches_color(const char *arg, int *val)
-- 
2.18.0



[iproute PATCH v5 1/2] Make colored output configurable

2018-08-17 Thread Phil Sutter
Allow for -color={never,auto,always} to have colored output disabled,
enabled only if stdout is a terminal or enabled regardless of stdout
state.

Signed-off-by: Phil Sutter 
---
Changes since v1:
- Allow to override isatty() check by specifying '-color' flag more than
  once.
- Document new behaviour in man pages.

Changes since v2:
- Implement new -color=foo syntax.
- Update commit message and man page texts accordingly.

Changes since v3:
- Fix typo in tc/tc.c causing compile error.

Changes since v4:
- Make matches_color() return boolean.
---
 bridge/bridge.c   |  3 +--
 include/color.h   |  9 +
 ip/ip.c   |  3 +--
 lib/color.c   | 33 -
 man/man8/bridge.8 | 13 +++--
 man/man8/ip.8 | 13 +++--
 man/man8/tc.8 | 13 +++--
 tc/tc.c   |  3 +--
 8 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index b3cab717ead30..663a35b2b2e46 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -173,8 +173,7 @@ main(int argc, char **argv)
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
-   } else if (matches(opt, "-color") == 0) {
-   ++color;
+   } else if (matches_color(opt, &color)) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
} else if (matches(opt, "-force") == 0) {
diff --git a/include/color.h b/include/color.h
index 4f2c918db7e43..a22a00c2277e0 100644
--- a/include/color.h
+++ b/include/color.h
@@ -2,6 +2,8 @@
 #ifndef __COLOR_H__
 #define __COLOR_H__ 1
 
+#include 
+
 enum color_attr {
COLOR_IFNAME,
COLOR_MAC,
@@ -12,8 +14,15 @@ enum color_attr {
COLOR_NONE
 };
 
+enum color_opt {
+   COLOR_OPT_NEVER = 0,
+   COLOR_OPT_AUTO = 1,
+   COLOR_OPT_ALWAYS = 2
+};
+
 void enable_color(void);
 int check_enable_color(int color, int json);
+bool matches_color(const char *arg, int *val);
 void set_color_palette(void);
 int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
 enum color_attr ifa_family_color(__u8 ifa_family);
diff --git a/ip/ip.c b/ip/ip.c
index 72e858eed50d5..58c643df8a366 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -283,8 +283,7 @@ int main(int argc, char **argv)
exit(-1);
}
rcvbuf = size;
-   } else if (matches(opt, "-color") == 0) {
-   ++color;
+   } else if (matches_color(opt, &color)) {
} else if (matches(opt, "-help") == 0) {
usage();
} else if (matches(opt, "-netns") == 0) {
diff --git a/lib/color.c b/lib/color.c
index edf96e5c6ecd7..9c9023587748f 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -3,11 +3,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "color.h"
+#include "utils.h"
 
 enum color {
C_RED,
@@ -79,13 +81,42 @@ void enable_color(void)
 
 int check_enable_color(int color, int json)
 {
-   if (color && !json) {
+   if (json || color == COLOR_OPT_NEVER)
+   return 1;
+
+   if (color == COLOR_OPT_ALWAYS || isatty(fileno(stdout))) {
enable_color();
return 0;
}
return 1;
 }
 
+bool matches_color(const char *arg, int *val)
+{
+   char *dup, *p;
+
+   if (!val)
+   return false;
+
+   dup = strdupa(arg);
+   p = strchrnul(dup, '=');
+   if (*p)
+   *(p++) = '\0';
+
+   if (matches(dup, "-color"))
+   return false;
+
+   if (*p == '\0' || !strcmp(p, "always"))
+   *val = COLOR_OPT_ALWAYS;
+   else if (!strcmp(p, "auto"))
+   *val = COLOR_OPT_AUTO;
+   else if (!strcmp(p, "never"))
+   *val = COLOR_OPT_NEVER;
+   else
+   return false;
+   return true;
+}
+
 void set_color_palette(void)
 {
char *p = getenv("COLORFGBG");
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 1d10cb2b6a72c..53cd3d0a3d933 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -172,8 +172,17 @@ If there were any errors during execution of the commands, 
the application
 return code will be non zero.
 
 .TP
-.BR "\-c" , " -color"
-Use color output.
+.BR \-c [ color ][ = { always | auto | never }
+Configure color output. If parameter is omitted or
+.BR always ,
+color output is enabled regardless of stdout state. If parameter is
+.BR auto ,
+stdout is checked to be a terminal before enabling color output. If parameter 
is
+.BR never ,
+color output is disabled. If specified multiple times, the l

Re: [iproute PATCH 07/10] ip: Add missing -M flag to help text

2018-08-17 Thread Phil Sutter
Hi Stephen,
On Thu, Aug 16, 2018 at 12:27:59PM +0200, Phil Sutter wrote:
> Signed-off-by: Phil Sutter 
> ---
>  ip/ip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Seems like this patch wasn't applied. Did you perhaps drop it by
accident?

Cheers, Phil


Re: [iproute PATCH v4] Make colored output configurable

2018-08-16 Thread Phil Sutter
On Thu, Aug 16, 2018 at 07:06:07AM -0600, David Ahern wrote:
> On 8/16/18 3:37 AM, Phil Sutter wrote:
> > Allow for -color={never,auto,always} to have colored output disabled,
> > enabled only if stdout is a terminal or enabled regardless of stdout
> > state.
> > 
> > Signed-off-by: Phil Sutter 
> > ---
> > Changes since v1:
> > - Allow to override isatty() check by specifying '-color' flag more than
> >   once.
> > - Document new behaviour in man pages.
> > 
> > Changes since v2:
> > - Implement new -color=foo syntax.
> > - Update commit message and man page texts accordingly.
> > 
> > Changes since v3:
> > - Fix typo in tc/tc.c causing compile error.
> > ---
> >  bridge/bridge.c   |  3 +--
> >  include/color.h   |  7 +++
> >  ip/ip.c   |  3 +--
> >  lib/color.c   | 33 -
> >  man/man8/bridge.8 | 13 +++--
> >  man/man8/ip.8 | 13 +++--
> >  man/man8/tc.8 | 13 +++--
> >  tc/tc.c   |  3 +--
> >  8 files changed, 75 insertions(+), 13 deletions(-)
> > 
> > diff --git a/bridge/bridge.c b/bridge/bridge.c
> > index 451d684e0bcfd..e35e5bdf7fb30 100644
> > --- a/bridge/bridge.c
> > +++ b/bridge/bridge.c
> > @@ -173,8 +173,7 @@ main(int argc, char **argv)
> > NEXT_ARG();
> > if (netns_switch(argv[1]))
> > exit(-1);
> > -   } else if (matches(opt, "-color") == 0) {
> > -   ++color;
> > +   } else if (matches_color(opt, &color) == 0) {
> > } else if (matches(opt, "-compressvlans") == 0) {
> > ++compress_vlans;
> > } else if (matches(opt, "-force") == 0) {
> > diff --git a/include/color.h b/include/color.h
> > index 4f2c918db7e43..42038dc2e7f87 100644
> > --- a/include/color.h
> > +++ b/include/color.h
> > @@ -12,8 +12,15 @@ enum color_attr {
> > COLOR_NONE
> >  };
> >  
> > +enum color_opt {
> > +   COLOR_OPT_NEVER = 0,
> > +   COLOR_OPT_AUTO = 1,
> > +   COLOR_OPT_ALWAYS = 2
> > +};
> 
> The order of AUTO and ALWAYS is backwards. Existing users who do
> something like:

Ordering color_opt this way felt natural since the default (0)
automatically matches the existing default (no colors).

> ip -c addr list | less -R
> or
> ip -c addr list > /tmp/addr
> less -R /tmp/addr
> 
> should not be affected by this change. That is an existing command that
> works and should continue to work the same after this change.
> 
> Users who add -c but don't want the codes if stdout is not a tty are the
> ones who should be doing something new - be it adding another -c or
> using -c=auto.

My code is correct in that regard: Plain -c[olor] is equivalent to
-color=always.

Cheers, Phil


[iproute PATCH 04/10] man: devlink.8: Document -verbose option

2018-08-16 Thread Phil Sutter
This was the only bit missing in comparison to devlink help text.

Signed-off-by: Phil Sutter 
---
 man/man8/devlink.8 | 4 
 1 file changed, 4 insertions(+)

diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index ac61b6add7fba..360031f77c165 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -47,6 +47,10 @@ Generate JSON output.
 .BR "\-p" , " --pretty"
 When combined with -j generate a pretty JSON output.
 
+.TP
+.BR "\-v" , " --verbose"
+Turn on verbose output.
+
 .SS
 .I OBJECT
 
-- 
2.18.0



[iproute PATCH 00/10] Review help texts and man pages

2018-08-16 Thread Phil Sutter
This series fixes a number of issues identified by an automated scan
over man pages and help texts.

Phil Sutter (10):
  man: bridge.8: Document -oneline option
  bridge: trivial: Make help text consistent
  devlink: trivial: Make help text consistent
  man: devlink.8: Document -verbose option
  genl: Fix help text
  man: ifstat.8: Document --json and --pretty options
  ip: Add missing -M flag to help text
  man: rtacct.8: Fix nstat options
  rtmon: List options in help text
  man: ss.8: Describe --events option

 bridge/bridge.c|  2 +-
 devlink/devlink.c  |  2 +-
 genl/genl.c|  4 ++--
 ip/ip.c|  2 +-
 ip/rtmon.c |  4 +++-
 man/man8/bridge.8  | 15 ++-
 man/man8/devlink.8 |  4 
 man/man8/ifstat.8  |  8 
 man/man8/rtacct.8  | 14 +-
 man/man8/ss.8  |  3 +++
 10 files changed, 46 insertions(+), 12 deletions(-)

-- 
2.18.0



[iproute PATCH 02/10] bridge: trivial: Make help text consistent

2018-08-16 Thread Phil Sutter
Change curly braces into brackets for -json option in help text to be
consistent with the rest.

Signed-off-by: Phil Sutter 
---
 bridge/bridge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index e35e5bdf7fb30..04d84163e68c4 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -42,7 +42,7 @@ static void usage(void)
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
-"   -c[ompressvlans] -color -p[retty] -j{son} }\n");
+"   -c[ompressvlans] -color -p[retty] -j[son] }\n");
exit(-1);
 }
 
-- 
2.18.0



[iproute PATCH 05/10] genl: Fix help text

2018-08-16 Thread Phil Sutter
The '| help' part was misleading: In fact, 'genl help' does not work but
'genl  help' does. Fix the help text to make that clear.

In addition to that, list -Version and -help flags as well.

Signed-off-by: Phil Sutter 
---
 genl/genl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/genl/genl.c b/genl/genl.c
index 20ecb8b63ef62..1940a23c5a99c 100644
--- a/genl/genl.c
+++ b/genl/genl.c
@@ -98,9 +98,9 @@ static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
 {
-   fprintf(stderr, "Usage: genl [ OPTIONS ] OBJECT | help }\n"
+   fprintf(stderr, "Usage: genl [ OPTIONS ] OBJECT [help] }\n"
"where  OBJECT := { ctrl etc }\n"
-   "   OPTIONS := { -s[tatistics] | -d[etails] | 
-r[aw] }\n");
+   "   OPTIONS := { -s[tatistics] | -d[etails] | 
-r[aw] | -V[ersion] | -h[elp] }\n");
exit(-1);
 }
 
-- 
2.18.0



[iproute PATCH 10/10] man: ss.8: Describe --events option

2018-08-16 Thread Phil Sutter
Signed-off-by: Phil Sutter 
---
 man/man8/ss.8 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/man/man8/ss.8 b/man/man8/ss.8
index 28033d8f01dda..7a6572b173648 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -242,6 +242,9 @@ Print summary statistics. This option does not parse socket 
lists obtaining
 summary from various sources. It is useful when amount of sockets is so huge
 that parsing /proc/net/tcp is painful.
 .TP
+.B \-E, \-\-events
+Continually display sockets as they are destroyed
+.TP
 .B \-Z, \-\-context
 As the
 .B \-p
-- 
2.18.0



[iproute PATCH 09/10] rtmon: List options in help text

2018-08-16 Thread Phil Sutter
Signed-off-by: Phil Sutter 
---
 ip/rtmon.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ip/rtmon.c b/ip/rtmon.c
index acc11df49b423..0e795f740e627 100644
--- a/ip/rtmon.c
+++ b/ip/rtmon.c
@@ -63,7 +63,9 @@ static int dump_msg2(const struct sockaddr_nl *who,
 
 static void usage(void)
 {
-   fprintf(stderr, "Usage: rtmon file FILE [ all | LISTofOBJECTS]\n");
+   fprintf(stderr, "Usage: rtmon [ OPTIONS ] file FILE [ all | 
LISTofOBJECTS ]\n");
+   fprintf(stderr, "OPTIONS := { -f[amily] { inet | inet6 | link | help } 
|\n"
+   " -4 | -6 | -0 | -V[ersion] }\n");
fprintf(stderr, "LISTofOBJECTS := [ link ] [ address ] [ route ]\n");
exit(-1);
 }
-- 
2.18.0



[iproute PATCH 03/10] devlink: trivial: Make help text consistent

2018-08-16 Thread Phil Sutter
Typically the part of the flag in brackets completes the leading part
instead of repeating it.

Signed-off-by: Phil Sutter 
---
 devlink/devlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 784bb84bf7ad6..519ee2577cc4c 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -5426,7 +5426,7 @@ static void help(void)
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
   "   devlink [ -f[orce] ] -b[atch] filename\n"
   "where  OBJECT := { dev | port | sb | monitor | dpipe | resource 
| region }\n"
-  "   OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | 
-p[pretty] | -v[verbose] }\n");
+  "   OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | 
-p[retty] | -v[erbose] }\n");
 }
 
 static int dl_cmd(struct dl *dl, int argc, char **argv)
-- 
2.18.0



[iproute PATCH 07/10] ip: Add missing -M flag to help text

2018-08-16 Thread Phil Sutter
Signed-off-by: Phil Sutter 
---
 ip/ip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ip.c b/ip/ip.c
index 893c3c43ef99a..93382d671f467 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -55,7 +55,7 @@ static void usage(void)
 "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "-h[uman-readable] | -iec | -j[son] | -p[retty] |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
-"-4 | -6 | -I | -D | -B | -0 |\n"
+"-4 | -6 | -I | -D | -M | -B | -0 |\n"
 "-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |\n"
 "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
-- 
2.18.0



  1   2   3   4   5   6   7   8   9   10   >