Re: [PATCH v2 net 2/2] net: core: generic XDP support for stacked device

2019-05-21 Thread Stephen Hemminger
On Tue, 21 May 2019 08:15:36 +0200
Jiri Pirko  wrote:

> + if (static_branch_unlikely(&generic_xdp_needed_key)) {
> + int ret2;
> +
> + preempt_disable();
> + rcu_read_lock();
> + ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
> + rcu_read_unlock();
> + preempt_enable();
> +
> + if (ret2 != XDP_PASS)
> + return NET_RX_DROP;
> + }
> +

rcu_read_lock is already held by callers of __netif_receive_skb_core


[PATCH iproute2] man: fix macaddr section of ip-link

2019-05-21 Thread Stephen Hemminger
The formatting of setting mac address was confusing.
Break lines and fix highlighting.

Signed-off-by: Stephen Hemminger 
---
 man/man8/ip-link.8.in | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index d035a5c92ed5..883d88077d66 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -154,10 +154,15 @@ ip-link \- network device configuration
 .br
 .RB "[ " addrgenmode " { " eui64 " | " none " | " stable_secret " | " random " 
} ]"
 .br
-.RB "[ " macaddr " { " flush " | { " add " | " del " } "
-.IR MACADDR " | set [ "
-.IR MACADDR " [ "
-.IR MACADDR " [ ... ] ] ] } ]"
+.RB "[ " macaddr
+.RI "[ " MACADDR " ]"
+.br
+.in +10
+.RB "[ { " flush " | " add " | " del " } "
+.IR MACADDR " ]"
+.br
+.RB "[ " set
+.IR MACADDR " ] ]"
 .br
 
 .ti -8
-- 
2.20.1



Re: [PATCH iproute2 v2] m_mirred: don't bail if the control action is missing

2019-05-22 Thread Stephen Hemminger
On Mon, 20 May 2019 11:56:52 +0200
Paolo Abeni  wrote:

> The mirred act admits an optional control action, defaulting
> to TC_ACT_PIPE. The parsing code currently emits an error message
> if the control action is not provided on the command line, even
> if the command itself completes with no error.
> 
> This change shuts down the error message, using the appropriate
> parsing helper.
> 
> Fixes: e67aba559581 ("tc: actions: add helpers to parse and print control 
> actions")
> Signed-off-by: Paolo Abeni 

Applied


Re: [PATCH net-next] hv_sock: perf: Allow the socket buffer size options to influence the actual socket buffers

2019-05-22 Thread Stephen Hemminger
On Wed, 22 May 2019 22:56:07 +
Sunil Muthuswamy  wrote:

> Currently, the hv_sock buffer size is static and can't scale to the
> bandwidth requirements of the application. This change allows the
> applications to influence the socket buffer sizes using the SO_SNDBUF and
> the SO_RCVBUF socket options.
> 
> Few interesting points to note:
> 1. Since the VMBUS does not allow a resize operation of the ring size, the
> socket buffer size option should be set prior to establishing the
> connection for it to take effect.
> 2. Setting the socket option comes with the cost of that much memory being
> reserved/allocated by the kernel, for the lifetime of the connection.
> 
> Perf data:
> Total Data Transfer: 1GB
> Single threaded reader/writer
> Results below are summarized over 10 iterations.
> 
> Linux hvsocket writer + Windows hvsocket reader:
> |-|
> |Packet size ->   |  128B   |   1KB   |   4KB   | 
>64KB |
> |-|
> |SO_SNDBUF size | | Throughput in MB/s (min/max/avg/median):  
> |
> |   v |   
> |
> |-|
> |  Default| 109/118/114/116 | 636/774/701/700 | 435/507/480/476 |   
> 410/491/462/470   |
> |  16KB   | 110/116/112/111 | 575/705/662/671 | 749/900/854/869 |   
> 592/824/692/676   |
> |  32KB   | 108/120/115/115 | 703/823/767/772 | 718/878/850/866 | 
> 1593/2124/2000/2085 |
> |  64KB   | 108/119/114/114 | 592/732/683/688 | 805/934/903/911 | 
> 1784/1943/1862/1843 |
> |-|
> 
> Windows hvsocket writer + Linux hvsocket reader:
> |-|
> |Packet size ->   | 128B|  1KB|  4KB| 
>64KB |
> |-|
> |SO_RCVBUF size | |   Throughput in MB/s (min/max/avg/median):
> |
> |   v |   
> |
> |-|
> |  Default| 69/82/75/73 | 313/343/333/336 |   418/477/446/445   |   
> 659/701/676/678   |
> |  16KB   | 69/83/76/77 | 350/401/375/382 |   506/548/517/516   |   
> 602/624/615/615   |
> |  32KB   | 62/83/73/73 | 471/529/496/494 |   830/1046/935/939  | 
> 944/1180/1070/1100  |
> |  64KB   | 64/70/68/69 | 467/533/501/497 | 1260/1590/1430/1431 | 
> 1605/1819/1670/1660 |
> |-|
> 
> Signed-off-by: Sunil Muthuswamy 

It looks like Exchange mangled you patch. It doesn't apply clean.


>  



Re: [PATCH iproute2] ip route: Set rtm_dst_len to 32 for all ip route get requests

2019-05-22 Thread Stephen Hemminger
On Fri, 17 May 2019 10:59:13 -0700
David Ahern  wrote:

> index 2b3dcc5dbd53..d980b86ffd42 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -2035,7 +2035,11 @@ static int iproute_get(int argc, char **argv)
>   if (addr.bytelen)
>   addattr_l(&req.n, sizeof(req),
> RTA_DST, &addr.data, addr.bytelen);
> - req.r.rtm_dst_len = addr.bitlen;
> + /* kernel ignores prefix length on 'route get'
> +  * requests; to allow ip to work with strict mode
> +  * but not break existing users, just set to 32
> +  */
> + req.r.rtm_dst_len = 32;
>   address_found = true;
>   }
>   argc--; argv++;

Why not warn user that any prefix length (ie not 32) is ignored,
then do what you propose.


[PATCH v3 0/2] XDP generic fixes

2019-05-23 Thread Stephen Hemminger
This set of patches came about while investigating XDP
generic on Azure. The split brain nature of the accelerated
networking exposed issues with the stack device model.

Stephen Hemminger (2):
  netvsc: unshare skb in VF rx handler
  net: core: support XDP generic on stacked devices.

 drivers/net/hyperv/netvsc_drv.c |  6 
 net/core/dev.c  | 56 +++--
 2 files changed, 17 insertions(+), 45 deletions(-)

-- 
2.20.1



[PATCH v3 2/2] net: core: support XDP generic on stacked devices.

2019-05-23 Thread Stephen Hemminger
When a device is stacked like (team, bonding, failsafe or netvsc) the
XDP generic program for the parent device was not called.

Move the call to XDP generic inside __netif_receive_skb_core where
it can be done multiple times for stacked case.

Suggested-by: Jiri Pirko 
Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Signed-off-by: Stephen Hemminger 
---
v1 - call xdp_generic in netvsc handler
v2 - do xdp_generic in generic rx handler processing
v3 - move xdp_generic call inside the another pass loop

 net/core/dev.c | 56 ++
 1 file changed, 11 insertions(+), 45 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b6b8505cfb3e..696776e14d00 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4502,23 +4502,6 @@ static int netif_rx_internal(struct sk_buff *skb)
 
trace_netif_rx(skb);
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   int ret;
-
-   preempt_disable();
-   rcu_read_lock();
-   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
-   rcu_read_unlock();
-   preempt_enable();
-
-   /* Consider XDP consuming the packet a success from
-* the netdev point of view we do not want to count
-* this as an error.
-*/
-   if (ret != XDP_PASS)
-   return NET_RX_SUCCESS;
-   }
-
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -4858,6 +4841,17 @@ static int __netif_receive_skb_core(struct sk_buff *skb, 
bool pfmemalloc,
 
__this_cpu_inc(softnet_data.processed);
 
+   if (static_branch_unlikely(&generic_xdp_needed_key)) {
+   int ret2;
+
+   preempt_disable();
+   ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+   preempt_enable();
+
+   if (ret2 != XDP_PASS)
+   return NET_RX_DROP;
+   }
+
if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
skb = skb_vlan_untag(skb);
@@ -5178,19 +5172,6 @@ static int netif_receive_skb_internal(struct sk_buff 
*skb)
if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   int ret;
-
-   preempt_disable();
-   rcu_read_lock();
-   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
-   rcu_read_unlock();
-   preempt_enable();
-
-   if (ret != XDP_PASS)
-   return NET_RX_DROP;
-   }
-
rcu_read_lock();
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
@@ -5224,21 +5205,6 @@ static void netif_receive_skb_list_internal(struct 
list_head *head)
}
list_splice_init(&sublist, head);
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   preempt_disable();
-   rcu_read_lock();
-   list_for_each_entry_safe(skb, next, head, list) {
-   xdp_prog = rcu_dereference(skb->dev->xdp_prog);
-   skb_list_del_init(skb);
-   if (do_xdp_generic(xdp_prog, skb) == XDP_PASS)
-   list_add_tail(&skb->list, &sublist);
-   }
-   rcu_read_unlock();
-   preempt_enable();
-   /* Put passed packets back on main list */
-   list_splice_init(&sublist, head);
-   }
-
rcu_read_lock();
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
-- 
2.20.1



[PATCH v3 1/2] netvsc: unshare skb in VF rx handler

2019-05-23 Thread Stephen Hemminger
The netvsc VF skb handler should make sure that skb is not
shared. Similar logic already exists in bonding and team device
drivers.

This is not an issue in practice because the VF devicex
does not send up shared skb's. But the netvsc driver
should do the right thing if it did.

Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 06393b215102..9873b8679f81 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2000,6 +2000,12 @@ static rx_handler_result_t netvsc_vf_handle_frame(struct 
sk_buff **pskb)
struct netvsc_vf_pcpu_stats *pcpu_stats
 = this_cpu_ptr(ndev_ctx->vf_stats);
 
+   skb = skb_share_check(skb, GFP_ATOMIC);
+   if (unlikely(!skb))
+   return RX_HANDLER_CONSUMED;
+
+   *pskb = skb;
+
skb->dev = ndev;
 
u64_stats_update_begin(&pcpu_stats->syncp);
-- 
2.20.1



Re: [PATCH v3 2/2] net: core: support XDP generic on stacked devices.

2019-05-23 Thread Stephen Hemminger
On Thu, 23 May 2019 19:19:40 +
Saeed Mahameed  wrote:

> On Thu, 2019-05-23 at 10:54 -0700, Stephen Hemminger wrote:
> > When a device is stacked like (team, bonding, failsafe or netvsc) the
> > XDP generic program for the parent device was not called.
> > 
> > Move the call to XDP generic inside __netif_receive_skb_core where
> > it can be done multiple times for stacked case.
> > 
> > Suggested-by: Jiri Pirko 
> > Fixes: d445516966dc ("net: xdp: support xdp generic on virtual
> > devices")
> > Signed-off-by: Stephen Hemminger 
> > ---
> > v1 - call xdp_generic in netvsc handler
> > v2 - do xdp_generic in generic rx handler processing
> > v3 - move xdp_generic call inside the another pass loop
> > 
> >  net/core/dev.c | 56 ++
> > 
> >  1 file changed, 11 insertions(+), 45 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b6b8505cfb3e..696776e14d00 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4502,23 +4502,6 @@ static int netif_rx_internal(struct sk_buff
> > *skb)
> >  
> > trace_netif_rx(skb);
> >  
> > -   if (static_branch_unlikely(&generic_xdp_needed_key)) {
> > -   int ret;
> > -
> > -   preempt_disable();
> > -   rcu_read_lock();
> > -   ret = do_xdp_generic(rcu_dereference(skb->dev-  
> > >xdp_prog), skb);  
> > -   rcu_read_unlock();
> > -   preempt_enable();
> > -
> > -   /* Consider XDP consuming the packet a success from
> > -* the netdev point of view we do not want to count
> > -* this as an error.
> > -*/
> > -   if (ret != XDP_PASS)
> > -   return NET_RX_SUCCESS;
> > -   }
> > -  
> 
> Adding Jesper, 
> 
> There is a small behavioral change due to this patch, 
> the XDP program after this patch will run on the RPS CPU, if
> configured, which could cause some behavioral changes in
> xdp_redirect_cpu: bpf_redirect_map(cpu_map).
> 
> Maybe this is acceptable, but it should be documented, as the current
> assumption dictates: XDP program runs on the core where the XDP
> frame/SKB was first seen.

Or maybe XDP should just force off RPS (like it does gro)


Re: [RFC net-next 1/1] net: sched: protect against loops in TC filter hooks

2019-05-24 Thread Stephen Hemminger
On Fri, 24 May 2019 17:05:46 +0100
John Hurley  wrote:

> TC hooks allow the application of filters and actions to packets at both
> ingress and egress of the network stack. It is possible, with poor
> configuration, that this can produce loops whereby an ingress hook calls
> a mirred egress action that has an egress hook that redirects back to
> the first ingress etc. The TC core classifier protects against loops when
> doing reclassifies but, as yet, there is no protection against a packet
> looping between multiple hooks. This can lead to stack overflow panics.
> 
> Add a per cpu counter that tracks recursion of packets through TC hooks.
> The packet will be dropped if a recursive limit is passed and the counter
> reset for the next packet.
> 
> Signed-off-by: John Hurley 
> Reviewed-by: Simon Horman 
> ---
>  net/core/dev.c | 62 
> +++---
>  1 file changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b6b8505..a6d9ed7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -154,6 +154,9 @@
>  /* This should be increased if a protocol with a bigger head is added. */
>  #define GRO_MAX_HEAD (MAX_HEADER + 128)
>  
> +#define SCH_RECURSION_LIMIT  4
> +static DEFINE_PER_CPU(int, sch_recursion_level);

Maybe use unsigned instead of int?

> +
>  static DEFINE_SPINLOCK(ptype_lock);
>  static DEFINE_SPINLOCK(offload_lock);
>  struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
> @@ -3598,16 +3601,42 @@ int dev_loopback_xmit(struct net *net, struct sock 
> *sk, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(dev_loopback_xmit);
>  
> +static inline int sch_check_inc_recur_level(void)
> +{
> + int rec_level = __this_cpu_inc_return(sch_recursion_level);
> +
> + if (rec_level >= SCH_RECURSION_LIMIT) {
unlikely here?

> + net_warn_ratelimited("Recursion limit reached on TC datapath, 
> probable configuration error\n");

It would be good to know which device this was on.

> + return -ELOOP;
> + }
> +
> + return 0;
> +}
> +
> +static inline void sch_dec_recur_level(void)
> +{
> + __this_cpu_dec(sch_recursion_level);

Decrement of past 0 is an error that should be trapped and logged.

> +}
> +
>  #ifdef CONFIG_NET_EGRESS
>  static struct sk_buff *
>  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>  {
>   struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
>   struct tcf_result cl_res;
> + int err;
>  
>   if (!miniq)
>   return skb;
>  
> + err = sch_check_inc_recur_level();
> + if (err) {
> + sch_dec_recur_level();

You should have sch_check_inc_recur_level do the unwind on error.
That would simplify the error paths.

> + *ret = NET_XMIT_DROP;
> + consume_skb(skb);
> + return NULL;
> + }
> +



Fw: [Bug 203703] New: 5.1 regression makes r8169 Ethernet connection inoperable if fq_codel qdisc is used

2019-05-26 Thread Stephen Hemminger



Begin forwarded message:

Date: Sat, 25 May 2019 04:55:19 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 203703] New: 5.1 regression makes r8169 Ethernet connection 
inoperable if fq_codel qdisc is used


https://bugzilla.kernel.org/show_bug.cgi?id=203703

Bug ID: 203703
   Summary: 5.1 regression makes r8169 Ethernet connection
inoperable if fq_codel qdisc is used
   Product: Networking
   Version: 2.5
Kernel Version: 5.1.4
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: virtuous...@gmail.com
Regression: No

Created attachment 282937
  --> https://bugzilla.kernel.org/attachment.cgi?id=282937&action=edit  
kernel config

After updating from 5.0.x to 5.1.x my network started halting less than hour
after boot with "network unreachable" messages for any connection attempt. With
these lines in kernel log:
[34441.731088] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[34441.731126] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x21a/0x220
[34441.731128] Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event
snd_seq af_packet ts_bm xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp
xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat
ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi
ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter rfcomm
bnep zram msr it87 hwmon_vid snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib
rc_avermedia btusb snd_rawmidi snd_hda_codec_realtek btrtl
snd_hda_codec_generic snd_seq_device btbcm ledtrig_audio tuner_simple
tuner_types ath9k btintel tuner tda7432 ath9k_common ath9k_hw bluetooth tvaudio
msp3400 ath amd64_edac_mod bttv snd_hda_intel edac_mce_amd kvm_amd
snd_hda_codec snd_hda_core mac80211 snd_hwdep tea575x kvm tveeprom
videobuf_dma_sg videobuf_core snd_pcm_oss
[34441.731180]  rc_core snd_mixer_oss v4l2_common videodev irqbypass mxm_wmi
wmi_bmof amdgpu media pcspkr k10temp snd_pcm cfg80211 r8169 fam15h_power
realtek sp5100_tco i2c_piix4 chash libphy gpu_sched ttm rfkill mac_hid
hid_generic usbhid uas usb_storage sd_mod ohci_pci serio_raw ohci_hcd xhci_pci
ehci_pci ehci_hcd xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core
ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd
dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
[34441.731218] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G  IO 
5.1.4-1320.g0739fa4-HSF #1 openSUSE Tumbleweed (unreleased)
[34441.731220] Hardware name: Gigabyte Technology Co., Ltd.
GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
[34441.731224] RIP: 0010:dev_watchdog+0x21a/0x220
[34441.731227] Code: 49 63 4c 24 e0 eb 8c 4c 89 ef c6 05 a7 8d 0e 01 01 e8 9a
dd fa ff 89 d9 4c 89 ee 48 c7 c7 c0 51 9b 9a 48 89 c2 e8 1a e2 44 ff <0f> 0b eb
be 66 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48 c7 07
[34441.731230] RSP: 0018:8c1cede03e40 EFLAGS: 00010286
[34441.731233] RAX:  RBX:  RCX:

[34441.731235] RDX: 0007 RSI: 8c19c5d14dc8 RDI:
0001
[34441.731238] RBP: 8c1cdcd8e4a0 R08: 0103 R09:

[34441.731240] R10:  R11:  R12:
8c1cdcd8e508
[34441.731243] R13: 8c1cdcd8e000 R14: 0001 R15:
8c1cdc31d080
[34441.731246] FS:  () GS:8c1cede0()
knlGS:
[34441.731248] CS:  0010 DS:  ES:  CR0: 80050033
[34441.731251] CR2: 7fec8043aa48 CR3: 0004067d4000 CR4:
000406e0
[34441.731253] Call Trace:
[34441.731256]  
[34441.731263]  ? qdisc_put_unlocked+0x30/0x30
[34441.731269]  call_timer_fn+0xaa/0x300
[34441.731279]  ? qdisc_put_unlocked+0x30/0x30
[34441.731283]  run_timer_softirq+0x1df/0x530
[34441.731291]  ? read_hpet+0x124/0x140
[34441.731302]  __do_softirq+0xf3/0x4c5
[34441.731315]  irq_exit+0xef/0x100
[34441.731319]  smp_apic_timer_interrupt+0xb5/0x270
[34441.731324]  apic_timer_interrupt+0xf/0x20
[34441.731327]  
[34441.731331] RIP: 0010:native_safe_halt+0xe/0x10
[34441.731334] Code: f0 80 48 02 20 48 8b 00 a8 08 75 c3 e9 7c ff ff ff 90 90
90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 86 40 52 00 fb f4  90 e9
07 00 00 00 0f 00 2d 76 40 52 00 f4 c3 90 90 0f 1f 44 00
[34441.731337] RSP: 0018:b835c196beb0 EFLAGS: 0206 ORIG_RAX:
ff13
[34441.731340] RAX: 8c19c5d14440 RBX: 0001 RCX:

[34441.731342] RDX: 8c19c5d14440 RSI: 0006 RDI:
8c19c5d14440
[34441.731344] RBP: 9ae3

Re: netdev_alloc_skb is failing for 16k length

2019-05-27 Thread Stephen Hemminger
On Mon, 27 May 2019 12:21:51 +0530
Arun Kumar Neelakantam  wrote:

> Hi team,
> 
> we are using "skb = netdev_alloc_skb(NULL, len);" which is getting 
> failed sometimes for len = 16k.
> 
> I suspect mostly system memory got fragmented and hence atomic memory 
> allocation for 16k is failing, can you please suggest best way to handle 
> this failure case.
> 
> Thanks
> 
> Arun N
> 

If you are handling big frames, then put the data in page size chunks
and use build_skb.


Fw: [Bug 203743] New: Networking goes down when running Docker and receiving fragmented IPv4 packets

2019-05-28 Thread Stephen Hemminger



Begin forwarded message:

Date: Tue, 28 May 2019 15:49:52 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 203743] New: Networking goes down when running Docker and 
receiving fragmented IPv4 packets


https://bugzilla.kernel.org/show_bug.cgi?id=203743

Bug ID: 203743
   Summary: Networking goes down when running Docker and receiving
fragmented IPv4 packets
   Product: Networking
   Version: 2.5
Kernel Version: 4.15.0-1032-aws
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: IPV4
  Assignee: step...@networkplumber.org
  Reporter: camden.full...@boxcast.com
Regression: No

We are experiencing an issue where our EC2 instances fail instance status
checks and completely lose networking because of cellular UDP traffic going to
an instance.

It seems that when sending UDP traffic using T-Mobile cellular the packets are
fragmented and causes the system to completely lose networking because of this.
I have attached the source code for the iOS app that can reliabily reproduce
this issue as well as the server code to receive the traffic. The packet
capture of the traffic is attached as well. Also important to note that the
system only drops networking when Docker is running, but the fragmentation also
happens no matter if Docker is installed or not.

It's also worth pointing out that when sending the traffic over Cellular to a
local network at our office that the traffic is not fragmented. This makes me
think that there is an issue with networking between T-Mobile and AWS.

Base AWS AMI: ami-0a313d6098716f372
Instance Types: g3.4xlarge or c5.2xlarge
Docker GitHub Issue: https://github.com/docker/for-linux/issues/672
iOS app: https://github.com/docker/for-linux/files/3192116/LockUpDemo.zip
Server app:https://github.com/docker/for-linux/files/3192118/main.c.zip
tcpdump capture:
https://github.com/docker/for-linux/files/3192155/capture.pcap.zip

-- 
You are receiving this mail because:
You are the assignee for the bug.


[PATCH PATCH v4 2/2] net: core: support XDP generic on stacked devices.

2019-05-28 Thread Stephen Hemminger
When a device is stacked like (team, bonding, failsafe or netvsc) the
XDP generic program for the parent device was not called.

Move the call to XDP generic inside __netif_receive_skb_core where
it can be done multiple times for stacked case.

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Signed-off-by: Stephen Hemminger 
---
v1 - call xdp_generic in netvsc handler
v2 - do xdp_generic in generic rx handler processing
v3 - move xdp_generic call inside the another pass loop
v4 - reset skb mac_len after xdp is called

 net/core/dev.c | 58 +++---
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b6b8505cfb3e..cc2a4e257324 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4502,23 +4502,6 @@ static int netif_rx_internal(struct sk_buff *skb)
 
trace_netif_rx(skb);
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   int ret;
-
-   preempt_disable();
-   rcu_read_lock();
-   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
-   rcu_read_unlock();
-   preempt_enable();
-
-   /* Consider XDP consuming the packet a success from
-* the netdev point of view we do not want to count
-* this as an error.
-*/
-   if (ret != XDP_PASS)
-   return NET_RX_SUCCESS;
-   }
-
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -4858,6 +4841,18 @@ static int __netif_receive_skb_core(struct sk_buff *skb, 
bool pfmemalloc,
 
__this_cpu_inc(softnet_data.processed);
 
+   if (static_branch_unlikely(&generic_xdp_needed_key)) {
+   int ret2;
+
+   preempt_disable();
+   ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+   preempt_enable();
+
+   if (ret2 != XDP_PASS)
+   return NET_RX_DROP;
+   skb_reset_mac_len(skb);
+   }
+
if (skb->protocol == cpu_to_be16(ETH_P_8021Q) ||
skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
skb = skb_vlan_untag(skb);
@@ -5178,19 +5173,6 @@ static int netif_receive_skb_internal(struct sk_buff 
*skb)
if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   int ret;
-
-   preempt_disable();
-   rcu_read_lock();
-   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
-   rcu_read_unlock();
-   preempt_enable();
-
-   if (ret != XDP_PASS)
-   return NET_RX_DROP;
-   }
-
rcu_read_lock();
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
@@ -5211,7 +5193,6 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
 
 static void netif_receive_skb_list_internal(struct list_head *head)
 {
-   struct bpf_prog *xdp_prog = NULL;
struct sk_buff *skb, *next;
struct list_head sublist;
 
@@ -5224,21 +5205,6 @@ static void netif_receive_skb_list_internal(struct 
list_head *head)
}
list_splice_init(&sublist, head);
 
-   if (static_branch_unlikely(&generic_xdp_needed_key)) {
-   preempt_disable();
-   rcu_read_lock();
-   list_for_each_entry_safe(skb, next, head, list) {
-   xdp_prog = rcu_dereference(skb->dev->xdp_prog);
-   skb_list_del_init(skb);
-   if (do_xdp_generic(xdp_prog, skb) == XDP_PASS)
-   list_add_tail(&skb->list, &sublist);
-   }
-   rcu_read_unlock();
-   preempt_enable();
-   /* Put passed packets back on main list */
-   list_splice_init(&sublist, head);
-   }
-
rcu_read_lock();
 #ifdef CONFIG_RPS
if (static_branch_unlikely(&rps_needed)) {
-- 
2.20.1



[PATCH PATCH v4 0/2] XDP generic fixes

2019-05-28 Thread Stephen Hemminger
This set of patches came about while investigating XDP
generic on Azure. The split brain nature of the accelerated
networking exposed issues with the stack device model.

Stephen Hemminger (2):
  netvsc: unshare skb in VF rx handler
  net: core: support XDP generic on stacked devices.

 drivers/net/hyperv/netvsc_drv.c |  6 
 net/core/dev.c  | 58 +++--
 2 files changed, 18 insertions(+), 46 deletions(-)

-- 
2.20.1



[PATCH PATCH v4 1/2] netvsc: unshare skb in VF rx handler

2019-05-28 Thread Stephen Hemminger
The netvsc VF skb handler should make sure that skb is not
shared. Similar logic already exists in bonding and team device
drivers.

This is not an issue in practice because the VF devicex
does not send up shared skb's. But the netvsc driver
should do the right thing if it did.

Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 06393b215102..9873b8679f81 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2000,6 +2000,12 @@ static rx_handler_result_t netvsc_vf_handle_frame(struct 
sk_buff **pskb)
struct netvsc_vf_pcpu_stats *pcpu_stats
 = this_cpu_ptr(ndev_ctx->vf_stats);
 
+   skb = skb_share_check(skb, GFP_ATOMIC);
+   if (unlikely(!skb))
+   return RX_HANDLER_CONSUMED;
+
+   *pskb = skb;
+
skb->dev = ndev;
 
u64_stats_update_begin(&pcpu_stats->syncp);
-- 
2.20.1



Re: [PATCH iproute2] lib: suppress error msg when filling the cache

2019-05-28 Thread Stephen Hemminger
On Fri, 24 May 2019 10:08:28 -0600
David Ahern  wrote:

> On 5/24/19 2:59 AM, Nicolas Dichtel wrote:
> > Before the patch:
> > $ ip netns add foo
> > $ ip link add name veth1 address 2a:a5:5c:b9:52:89 type veth peer name 
> > veth2 address 2a:a5:5c:b9:53:90 netns foo
> > RTNETLINK answers: No such device
> > RTNETLINK answers: No such device
> > 
> > But the command was successful. This may break script. Let's remove those
> > error messages.
> > 
> > Fixes: 55870dfe7f8b ("Improve batch and dump times by caching link lookups")
> > Reported-by: Philippe Guibert 
> > Signed-off-by: Nicolas Dichtel 
> > ---
> >  lib/ll_map.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/ll_map.c b/lib/ll_map.c
> > index 2d7b65dcb8f7..e0ed54bf77c9 100644
> > --- a/lib/ll_map.c
> > +++ b/lib/ll_map.c
> > @@ -177,7 +177,7 @@ static int ll_link_get(const char *name, int index)
> > addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name,
> >   strlen(name) + 1);
> >  
> > -   if (rtnl_talk(&rth, &req.n, &answer) < 0)
> > +   if (rtnl_talk_suppress_rtnl_errmsg(&rth, &req.n, &answer) < 0)
> > goto out;
> >  
> > /* add entry to cache */
> >   
> 
> In general, ll_link_get suppressing the error message seems like the
> right thing to do.
> 
> For the example above, seems like nl_get_ll_addr_len is the cause of the
> error messages, and it should not be called for this use case (NEWLINK
> with NLM_F_CREATE set)

Agree. I merged the patch to ll_link_get but send another to avoid
the cause of unnecessary query.


Re: [PATCH] tc: flower: fix port value truncation

2019-05-28 Thread Stephen Hemminger
On Mon, 27 May 2019 23:03:49 +0200
Lukasz Czapnik  wrote:

> sscanf truncates read port values silently without any error. As sscanf
> man says:
> (...) sscanf() conform to C89 and C99 and POSIX.1-2001. These standards
> do not specify the ERANGE error.
> 
> Replace sscanf with safer get_be16 that returns error when value is out
> of range.
> 
> Example:
> tc filter add dev eth0 protocol ip parent : prio 1 flower ip_proto
> tcp dst_port 7 hw_tc 1
> 
> Would result in filter for port 4464 without any warning.
> 
> Fixes: 8930840e678b ("tc: flower: Classify packets based port ranges")
> Signed-off-by: Lukasz Czapnik 

Looks good, applied.




Re: [RFC PATCH iproute2-next 1/1] tc: add support for act ctinfo

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 16:43:20 +
Kevin 'ldir' Darbyshire-Bryant  wrote:

> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 51a0496f..b0c6a49a 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -105,7 +105,8 @@ enum tca_id {
>   TCA_ID_IFE = TCA_ACT_IFE,
>   TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
>   /* other actions go here */
> - __TCA_ID_MAX = 255
> + TCA_ID_CTINFO,
> + __TCA_ID_MAX=255
>  };

This version of the file does not match upstream (the whitespace is different).


Re: [RFC PATCH iproute2-next 1/1] tc: add support for act ctinfo

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 16:43:20 +
Kevin 'ldir' Darbyshire-Bryant  wrote:

> ctinfo is an action restoring data stored in conntrack marks to various
> fields.  At present it has two independent modes of operation,
> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
> marks into packet skb marks.
> 
> It understands a number of parameters specific to this action in
> additional to the usual action syntax.  Each operating mode is
> independent of the other so all options are err, optional, however not
> specifying at least one mode is a bit pointless.
> 
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] 
> [CONTROL] [index ]\n"
> 
> DSCP mode
> 
> dscp enables copying of a DSCP store in the conntrack mark into the
> ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
> in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
> bits long, e.g. 0xfc00 would restore the DSCP from the upper 6 bits
> of the conntrack mark.
> 
> The DSCP copying may be optionally controlled by a statemask.  The
> statemask is a 32bit field, usually with a single bit set and must not
> overlap the dscp mask.  The DSCP restore operation will only take place
> if the corresponding bit/s in conntrack mark yield a non zero result.
> 
> eg. dscp 0xfc00/0x0100 would retrieve the DSCP from the top 6
> bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
> example.
> 
> CPMARK mode
> 
> cpmark enables copying of the conntrack mark to the packet skb mark.  In
> this mode it is completely equivalent to the existing act_connmark.
> Additional functionality is provided by the optional mask parameter,
> whereby the stored conntrack mark is logically anded with the cpmark
> mask before being stored into skb mark.  This allows shared usage of the
> conntrack mark between applications.
> 
> eg. cpmark 0x00ff would restore only the lower 24 bits of the
> conntrack mark, thus may be useful in the event that the upper 8 bits
> are used by the DSCP function.
> 
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE] 
> [CONTROL] [index ]
> where :
>   dscp MASK is the bitmask to restore DSCP
>STATEMASK is the bitmask to determine conditional restoring
>   cpmark MASK mask applied to restored packet mark
>   ZONE is the conntrack zone
>   CONTROL := reclassify | pipe | drop | continue | ok | goto chain 
> 
> 
> Signed-off-by: Kevin Darbyshire-Bryant 


Your patch is mangled by your mail system, it has DOS line endings etc.

Run checkpatch please.






Re: [PATCH iproute2-next 1/9] libnetlink: Set NLA_F_NESTED in rta_nest

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 20:17:38 -0700
David Ahern  wrote:

> From: David Ahern 
> 
> Kernel now requires NLA_F_NESTED to be set on new nested
> attributes. Set NLA_F_NESTED in rta_nest.
> 
> Signed-off-by: David Ahern 
> ---
>  lib/libnetlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 0d48a3d43cf0..6ae51a9dba14 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -1336,6 +1336,7 @@ struct rtattr *rta_nest(struct rtattr *rta, int maxlen, 
> int type)
>   struct rtattr *nest = RTA_TAIL(rta);
>  
>   rta_addattr_l(rta, maxlen, type, NULL, 0);
> + nest->rta_type |= NLA_F_NESTED;
>  
>   return nest;
>  }

I assume older kernels ignore the attribute?

Also, how is this opt-in for running iproute2-next on old kernels?


Re: [PATCH iproute2-next 7/9] Add support for nexthop objects

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 20:17:44 -0700
David Ahern  wrote:

> +static struct
> +{

kernel style is:

static struct {
...


Re: [PATCH iproute2-next 7/9] Add support for nexthop objects

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 20:17:44 -0700
David Ahern  wrote:

> +
> +struct rtnl_handle rth_del = { .fd = -1 };
> +

can this be static?


Re: [PATCH iproute2-next 7/9] Add support for nexthop objects

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 20:17:44 -0700
David Ahern  wrote:

> +
> +static void print_nh_gateway(FILE *fp, const struct nhmsg *nhm,
> +   const struct rtattr *rta)
> +{
> + const char *gateway = format_host_rta(nhm->nh_family, rta);
> +
> + if (is_json_context())
> + print_string(PRINT_JSON, "gateway", NULL, gateway);
> + else {
> + fprintf(fp, "via ");

I was trying to get rid of all use of /fprintf(fp, / since it was
indication of non-json code and fp is always stdout.

Maybe
print_string(PRINT_FP, NULL, "via ", NULL);
print_color_string(PRINT_ANY, ifa_family_color(nhm->nh_family),
"gateway", "%s ", format_host_rta(nhm->nh_family, rta));


   


Re: [PATCH iproute2] iplink: don't try to get ll addr len when creating an iface

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 16:42:10 +0200
Nicolas Dichtel  wrote:

> It will obviously fail. This is a follow up of the commit
> 757837230a65 ("lib: suppress error msg when filling the cache").
> 
> Suggested-by: David Ahern 
> Signed-off-by: Nicolas Dichtel 

Looks good, thanks for following up.
Applied.


Re: [PATCH iproute2] bridge: mdb: restore text output format

2019-05-30 Thread Stephen Hemminger
On Wed, 29 May 2019 20:52:42 +0300
Nikolay Aleksandrov  wrote:

> While I fixed the mdb json output, I did overlook the text output.
> This patch returns the original text output format:
>  dev  port  grp
> Example (old format, restored by this patch):
>  dev br0 port eth8 grp 239.1.1.11 temp
> 
> Example (changed format after the commit below):
>  23: br0  eth8  239.1.1.11  temp
> 
> We had some reports of failing scripts which were parsing the output.
> Also the old format matches the bridge mdb command syntax which makes
> it easier to build commands out of the output.
> 
> Fixes: c7c1a1ef51ae ("bridge: colorize output and use JSON print library")
> Signed-off-by: Nikolay Aleksandrov 

Looks good, thanks for fixing.
Applied.


Re: [PATCH net v2 1/3] net/sched: act_csum: pull all VLAN headers before checksumming

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 20:03:41 +0200
Davide Caratti  wrote:

>  
> +static inline int tc_skb_pull_vlans(struct sk_buff *skb,
> + unsigned int *hdr_count,
> + __be16 *proto)
> +{
> + if (skb_vlan_tag_present(skb))
> + *proto = skb->protocol;
> +
> + while (eth_type_vlan(*proto)) {
> + struct vlan_hdr *vlan;
> +
> + if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
> + return -ENOMEM;
> +
> + vlan = (struct vlan_hdr *)skb->data;
> + *proto = vlan->h_vlan_encapsulated_proto;
> + skb_pull(skb, VLAN_HLEN);
> + skb_reset_network_header(skb);
> + (*hdr_count)++;
> + }
> + return 0;
> +}

Does this really need to be an inline, or could it just be
part of the sched_api?


Re: [RFC PATCH iproute2-next 1/1] tc: add support for act ctinfo

2019-05-30 Thread Stephen Hemminger
On Thu, 30 May 2019 16:43:20 +
Kevin 'ldir' Darbyshire-Bryant  wrote:

Please don't use HTML encoded mail. I.e not exchange.

> +
> + if (argc) {
> + if (matches(*argv, "dscp") == 0) {
> + NEXT_ARG();
> + char *slash;
> + if ((slash = strchr(*argv, '/')))
> + *slash = '\0';

Don't mix assignment and conditional on same line


Fw: [Bug 203769] New: Regression: Valid network link no longer detected

2019-06-01 Thread Stephen Hemminger
Details are a bit scarce. 

The referenced commit is.

commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841
Author: Maxim Mikityanskiy 
Date:   Tue May 21 06:40:04 2019 +

Validate required parameters in inet6_validate_link_af

Begin forwarded message:

Date: Sat, 01 Jun 2019 09:53:51 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 203769] New: Regression: Valid network link no longer detected


https://bugzilla.kernel.org/show_bug.cgi?id=203769

Bug ID: 203769
   Summary: Regression: Valid network link no longer detected
   Product: Networking
   Version: 2.5
Kernel Version: 5.2.0-rc2
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: gwh...@kupulau.com
Regression: No

Commit 7dc2bccab0ee37ac28096b8fcdc390a679a15841 has broken wired networking on
two of my machines.  Drivers e1000e, igb and r8169 all fail to detect link with
this commit applied.  The drivers load and appear to initialize correctly, but
no link is ever detected.

With this commit reverted, they work perfectly.

This is on a fully updated Arch.  Happy to provide any other information that
is of use.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH iproute2-next v1] tc: add support for action act_ctinfo

2019-06-03 Thread Stephen Hemminger
On Mon,  3 Jun 2019 21:41:43 +0100
Kevin Darbyshire-Bryant  wrote:

> ctinfo is an action restoring data stored in conntrack marks to various
> fields.  At present it has two independent modes of operation,
> restoration of DSCP into IPv4/v6 diffserv and restoration of conntrack
> marks into packet skb marks.
> 
> It understands a number of parameters specific to this action in
> additional to the usual action syntax.  Each operating mode is
> independent of the other so all options are optional, however not
> specifying at least one mode is a bit pointless.
> 
> Usage: ... ctinfo [dscp mask[/statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]
> 
> DSCP mode
> 
> dscp enables copying of a DSCP store in the conntrack mark into the
> ipv4/v6 diffserv field.  The mask is a 32bit field and specifies where
> in the conntrack mark the DSCP value is stored.  It must be 6 contiguous
> bits long, e.g. 0xfc00 would restore the DSCP from the upper 6 bits
> of the conntrack mark.
> 
> The DSCP copying may be optionally controlled by a statemask.  The
> statemask is a 32bit field, usually with a single bit set and must not
> overlap the dscp mask.  The DSCP restore operation will only take place
> if the corresponding bit/s in conntrack mark yield a non zero result.
> 
> eg. dscp 0xfc00/0x0100 would retrieve the DSCP from the top 6
> bits, whilst using bit 25 as a flag to do so.  Bit 26 is unused in this
> example.
> 
> CPMARK mode
> 
> cpmark enables copying of the conntrack mark to the packet skb mark.  In
> this mode it is completely equivalent to the existing act_connmark.
> Additional functionality is provided by the optional mask parameter,
> whereby the stored conntrack mark is logically anded with the cpmark
> mask before being stored into skb mark.  This allows shared usage of the
> conntrack mark between applications.
> 
> eg. cpmark 0x00ff would restore only the lower 24 bits of the
> conntrack mark, thus may be useful in the event that the upper 8 bits
> are used by the DSCP function.
> 
> Usage: ... ctinfo [dscp mask [statemask]] [cpmark [mask]] [zone ZONE]
> [CONTROL] [index ]
> where :
>   dscp MASK is the bitmask to restore DSCP
>STATEMASK is the bitmask to determine conditional restoring
>   cpmark MASK mask applied to restored packet mark
>   ZONE is the conntrack zone
>   CONTROL := reclassify | pipe | drop | continue | ok |
>  goto chain 
> 
> Signed-off-by: Kevin Darbyshire-Bryant 
> Reviewed-by: Toke Høiland-Jørgensen 

How about a man page update?


Re: [PATCH iproute2] man: tc-skbedit.8: document 'inheritdsfield'

2019-06-04 Thread Stephen Hemminger
On Fri, 31 May 2019 14:12:15 +0200
Davide Caratti  wrote:

> while at it, fix missing square bracket near 'ptype' and a typo in the
> action description (it's -> its).
> 
> Signed-off-by: Davide Caratti 

Applied. Thanks


Re: [PATCH] devlink: fix libc and kernel headers collision

2019-06-04 Thread Stephen Hemminger
On Thu, 30 May 2019 18:32:27 +0300
Baruch Siach  wrote:

> Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
> use the sys/sysinfo.h header for the sysinfo(2) system call. But since
> iproute2 carries a local version of the kernel struct sysinfo, this
> causes a collision with libc that do not rely on kernel defined sysinfo
> like musl libc:
> 
> In file included from devlink.c:25:0:
> .../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct 
> sysinfo'
>  struct sysinfo {
> ^~~
> In file included from ../include/uapi/linux/kernel.h:5:0,
>  from ../include/uapi/linux/netlink.h:5,
>  from ../include/uapi/linux/genetlink.h:6,
>  from devlink.c:21:
> ../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
>  struct sysinfo {
>   ^~~
> 
> Rely on the kernel header alone to avoid kernel and userspace headers
> collision of definitions.
> 
> Cc: Aya Levin 
> Cc: Moshe Shemesh 
> Signed-off-by: Baruch Siach 

Ok, applied. Note that musl libc is not officially supported or tested
as part of iproute2. I will take patches to fix build and bugs, but you
are on your own if you must use it.



Re: [PATCH] devlink: fix libc and kernel headers collision

2019-06-04 Thread Stephen Hemminger
On Thu, 30 May 2019 18:32:27 +0300
Baruch Siach  wrote:

> Since commit 2f1242efe9d ("devlink: Add devlink health show command") we
> use the sys/sysinfo.h header for the sysinfo(2) system call. But since
> iproute2 carries a local version of the kernel struct sysinfo, this
> causes a collision with libc that do not rely on kernel defined sysinfo
> like musl libc:
> 
> In file included from devlink.c:25:0:
> .../sysroot/usr/include/sys/sysinfo.h:10:8: error: redefinition of 'struct 
> sysinfo'
>  struct sysinfo {
> ^~~
> In file included from ../include/uapi/linux/kernel.h:5:0,
>  from ../include/uapi/linux/netlink.h:5,
>  from ../include/uapi/linux/genetlink.h:6,
>  from devlink.c:21:
> ../include/uapi/linux/sysinfo.h:8:8: note: originally defined here
>  struct sysinfo {
>   ^~~
> 
> Rely on the kernel header alone to avoid kernel and userspace headers
> collision of definitions.
> 
> Cc: Aya Levin 
> Cc: Moshe Shemesh 
> Signed-off-by: Baruch Siach 

Sorry this breaks the glibc build.


CC   devlink.o
devlink.c: In function ‘format_logtime’:
devlink.c:6124:8: warning: implicit declaration of function ‘sysinfo’; did you 
mean ‘psiginfo’? [-Wimplicit-function-declaration]
  err = sysinfo(&s_info);
^~~
psiginfo

I backed out the patch now (before pushing it).
Please fix and resubmit.


Re: [PATCH iproute2] tc: simple: don't hardcode the control action

2019-06-04 Thread Stephen Hemminger
On Fri, 31 May 2019 11:34:46 +0200
Davide Caratti  wrote:

> don't hardcode the 'pipe' control action, so that the following TDC test
> 
>  b776 - Replace simple action with invalid goto chain control
> 
> can detect kernels that correctly validate 'goto chain' control action.
> 
> CC: Andrea Claudi 
> CC: Marcelo Ricardo Leitner 
> Signed-off-by: Davide Caratti 

The commit message is too short for me to understand exactly what
you are trying to fix, and what the test does.

Please resubmit with a more complete description. Including before/after.


[PATCH] revert async probing of VMBus network devices.

2019-06-05 Thread Stephen Hemminger
Doing asynchronous probing can lead to reordered network device names.
And because udev doesn't have any useful information to construct a
persistent name, this causes VM's to sporadically boot with reordered
device names and no connectivity.

This shows up on the Ubuntu image on larger VM's where 30% of the
time eth0 and eth1 get swapped.

Note: udev MAC address policy is disabled on Azure images
because the netvsc and PCI VF will have the same mac address.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 06393b215102..1a2c3206 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2411,9 +2411,6 @@ static struct  hv_driver netvsc_drv = {
.id_table = id_table,
.probe = netvsc_probe,
.remove = netvsc_remove,
-   .driver = {
-   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
-   },
 };
 
 /*
-- 
2.20.1



Re: [PATCH RFC iproute2-next v4] tc: add support for action act_ctinfo

2019-06-05 Thread Stephen Hemminger
On Wed, 05 Jun 2019 11:23:59 -0700
Joe Perches  wrote:

> Strict 80 column limits with long identifiers are just silly.
> 
> I don't know how strictly enforced the iproute2 80 column limit
> actually is, but I suggest ignoring that limit where appropriate.


Not strictly enforced, but long identifiers are discouraged.
Overly deep nesting which is the main cause of long lines
is also discouraged.


Re: [PATCH iproute2-next] devlink: Increase bus,device buffer size to 64 bytes

2019-06-06 Thread Stephen Hemminger
On Thu,  6 Jun 2019 06:49:19 -0500
Parav Pandit  wrote:

> Device name on mdev bus is 36 characters long which follow standard uuid
> RFC 4122.
> This is probably the longest name that a kernel will return for a
> device.
> 
> Hence increase the buffer size to 64 bytes.
> 
> Acked-by: Jiri Pirko 
> Signed-off-by: Parav Pandit 
> 
> ---
>  devlink/devlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index 436935f8..559f624e 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -1523,7 +1523,7 @@ static void __pr_out_handle_start(struct dl *dl, struct 
> nlattr **tb,
>  {
>   const char *bus_name = mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]);
>   const char *dev_name = mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]);
> - char buf[32];
> + char buf[64];
>  
>   sprintf(buf, "%s/%s", bus_name, dev_name);
>  
> @@ -1616,7 +1616,7 @@ static void __pr_out_port_handle_start(struct dl *dl, 
> const char *bus_name,
>  uint32_t port_index, bool try_nice,
>  bool array)
>  {
> - static char buf[32];
> + static char buf[64];
>   char *ifname = NULL;
>  
>   if (dl->no_nice_names || !try_nice ||

I will take this now no need to wait for next


Re: [PATCH iproute2 net-next v1 3/6] taprio: Add support for enabling offload mode

2019-06-06 Thread Stephen Hemminger
On Thu,  6 Jun 2019 10:52:18 -0700
Vedang Patel  wrote:

> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE 
> *f, struct rtattr *opt)
>   struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>   struct tc_mqprio_qopt *qopt = 0;
>   __s32 clockid = CLOCKID_INVALID;
> + __u32 offload_flags = 0;
>   int i;
>  
>   if (opt == NULL)
> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE 
> *f, struct rtattr *opt)
>  
>   print_string(PRINT_ANY, "clockid", "clockid %s", 
> get_clock_name(clockid));
>  
> + if (tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS])
> + offload_flags = 
> rta_getattr_u32(tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS]);
> +
> + print_uint(PRINT_ANY, "offload", " offload %x", offload_flags);

I don't think offload flags should be  printed at all if not present.

Why not?
if (tb[TCA_TAPRIO_ATTR_OFFLOLAD_FLAGS]) {
__u32 offload_flags = 
rta_getattr_u32(tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS]);

print_uint(PRINT_ANY, "offload", " offload %x", offload_flags);
}


Re: [PATCH iproute2 net-next v1 4/6] taprio: add support for setting txtime_delay.

2019-06-06 Thread Stephen Hemminger
On Thu,  6 Jun 2019 10:52:19 -0700
Vedang Patel  wrote:

> + if (tb[TCA_TAPRIO_ATTR_TXTIME_DELAY])
> + txtime_delay = 
> rta_getattr_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
> +
> + print_int(PRINT_ANY, "txtime_delay", " txtime delay %d", txtime_delay);
> +

Once again do not print anything if option is not present.


Re: [PATCH] revert async probing of VMBus network devices.

2019-06-06 Thread Stephen Hemminger
On Wed,  5 Jun 2019 11:51:14 -0700
Stephen Hemminger  wrote:

> Doing asynchronous probing can lead to reordered network device names.
> And because udev doesn't have any useful information to construct a
> persistent name, this causes VM's to sporadically boot with reordered
> device names and no connectivity.
> 
> This shows up on the Ubuntu image on larger VM's where 30% of the
> time eth0 and eth1 get swapped.
> 
> Note: udev MAC address policy is disabled on Azure images
> because the netvsc and PCI VF will have the same mac address.
> 
> Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv 
> drivers")
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/hyperv/netvsc_drv.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 06393b215102..1a2c3206 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2411,9 +2411,6 @@ static struct  hv_driver netvsc_drv = {
>   .id_table = id_table,
>   .probe = netvsc_probe,
>   .remove = netvsc_remove,
> - .driver = {
> - .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> - },
>  };
>  
>  /*

Even though storage can handle out of order devices, networking can not.
The network devices in Hyper-V do not have any persistant properties that will
work with existing udev. The current kernel is breaking current distributions.
This patch fixes it, why did you reject it?


Re: [PATCH iproute2 1/1] tc: Fix binding of gact action by index.

2019-06-06 Thread Stephen Hemminger
On Thu,  6 Jun 2019 17:32:09 -0400
Roman Mashak  wrote:

> The following operation fails:
> % sudo tc actions add action pipe index 1
> % sudo tc filter add dev lo parent : \
>protocol ip pref 10 u32 match ip src 127.0.0.2 \
>flowid 1:10 action gact index 1
> 
> Bad action type index
> Usage: ... gact  [RAND] [INDEX]
> Where:  ACTION := reclassify | drop | continue | pass | pipe |
>   goto chain  | jump 
> RAND := random   
> RANDTYPE := netrand | determ
> VAL : = value not exceeding 1
> JUMP_COUNT := Absolute jump from start of action list
> INDEX := index value used
> 
> However, passing a control action of gact rule during filter binding works:
> 
> % sudo tc filter add dev lo parent : \
>protocol ip pref 10 u32 match ip src 127.0.0.2 \
>flowid 1:10 action gact pipe index 1
> 
> Binding by reference, i.e. by index, has to consistently work with
> any tc action.
> 
> Since tc is sensitive to the order of keywords passed on the command line,
> we can teach gact to skip parsing arguments as soon as it sees 'gact'
> followed by 'index' keyword. 
> 
> Signed-off-by: Roman Mashak 
> ---

Applied


Re: [PATCH iproute2 1/1] tc: Fix binding of gact action by index.

2019-06-06 Thread Stephen Hemminger
On Thu,  6 Jun 2019 17:32:09 -0400
Roman Mashak  wrote:

> The following operation fails:
> % sudo tc actions add action pipe index 1
> % sudo tc filter add dev lo parent : \
>protocol ip pref 10 u32 match ip src 127.0.0.2 \
>flowid 1:10 action gact index 1
> 
> Bad action type index
> Usage: ... gact  [RAND] [INDEX]
> Where:  ACTION := reclassify | drop | continue | pass | pipe |
>   goto chain  | jump 
> RAND := random   
> RANDTYPE := netrand | determ
> VAL : = value not exceeding 1
> JUMP_COUNT := Absolute jump from start of action list
> INDEX := index value used
> 
> However, passing a control action of gact rule during filter binding works:
> 
> % sudo tc filter add dev lo parent : \
>protocol ip pref 10 u32 match ip src 127.0.0.2 \
>flowid 1:10 action gact pipe index 1
> 
> Binding by reference, i.e. by index, has to consistently work with
> any tc action.
> 
> Since tc is sensitive to the order of keywords passed on the command line,
> we can teach gact to skip parsing arguments as soon as it sees 'gact'
> followed by 'index' keyword. 
> 
> Signed-off-by: Roman Mashak 

Applied


Re: [PATCH iproute2 v2] tc: simple: don't hardcode the control action

2019-06-06 Thread Stephen Hemminger
On Wed,  5 Jun 2019 00:30:16 +0200
Davide Caratti  wrote:

> the following TDC test case:
> 
>  b776 - Replace simple action with invalid goto chain control
> 
> checks if the kernel correctly validates the 'goto chain' control action,
> when it is specified in 'act_simple' rules. The test systematically fails
> because the control action is hardcoded in parse_simple(), i.e. it is not
> parsed by command line arguments, so its value is constantly TC_ACT_PIPE.
> Because of that, the following command:
> 
>  # tc action add action simple sdata "test" drop index 7
> 
> installs an 'act_simple' rule that never drops packets, and whose 'index'
> is the first IDR available, plus an 'act_gact' rule with 'index' equal to
> 7, that drops packets.
> 
> Use parse_action_control_dflt(), like we did on many other TC actions, to
> make the control action configurable also with 'act_simple'. The expected
> results of test b776 are summarized below:
> 
>  iproute2
>v   kernel->| 5.1-rc2 (and previous)  | 5.1-rc3 (and subsequent)
>  --+-+-
>  5.1.0 | FAIL (bad IDR)  | FAIL (bad IDR)
>  5.1.0(patched)| FAIL (no rule/bad sdata)| PASS
> 
> Changes since v1:
>  - reword commit message, thanks Stephen Hemminger
> 
> Fixes: 087f46ee4ebd ("tc: introduce simple action")
> CC: Andrea Claudi 
> CC: Marcelo Ricardo Leitner 
> Signed-off-by: Davide Caratti 

Applied, thanks


Re: [PATCH iproute2 net-next v1 3/6] taprio: Add support for enabling offload mode

2019-06-06 Thread Stephen Hemminger
On Thu, 6 Jun 2019 21:13:50 +
"Patel, Vedang"  wrote:

> > On Jun 6, 2019, at 12:43 PM, Stephen Hemminger  
> > wrote:
> > 
> > On Thu,  6 Jun 2019 10:52:18 -0700
> > Vedang Patel  wrote:
> >   
> >> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, 
> >> FILE *f, struct rtattr *opt)
> >>struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
> >>struct tc_mqprio_qopt *qopt = 0;
> >>__s32 clockid = CLOCKID_INVALID;
> >> +  __u32 offload_flags = 0;
> >>int i;
> >> 
> >>if (opt == NULL)
> >> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, 
> >> FILE *f, struct rtattr *opt)
> >> 
> >>print_string(PRINT_ANY, "clockid", "clockid %s", 
> >> get_clock_name(clockid));
> >> 
> >> +  if (tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS])
> >> +  offload_flags = 
> >> rta_getattr_u32(tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS]);
> >> +
> >> +  print_uint(PRINT_ANY, "offload", " offload %x", offload_flags);  
> > 
> > I don't think offload flags should be  printed at all if not present.
> > 
> > Why not?  
> Will make this in the next version.

Mostly this is so that output doesn't change for users who aren't using offload 
or have old kernel.


Fw: [Bug 203843] New: e1000e crash when ethernet is plugged in

2019-06-07 Thread Stephen Hemminger



Begin forwarded message:

Date: Fri, 07 Jun 2019 10:09:19 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 203843] New: e1000e crash when ethernet is plugged in


https://bugzilla.kernel.org/show_bug.cgi?id=203843

Bug ID: 203843
   Summary: e1000e crash when ethernet is plugged in
   Product: Networking
   Version: 2.5
Kernel Version: 5.1.7
  Hardware: Intel
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: justinvan...@gmail.com
Regression: No

Created attachment 283143
  --> https://bugzilla.kernel.org/attachment.cgi?id=283143&action=edit  
log of the crash

I get a crash when i plug in a network cable after the system has booted.

Im on kernel version 5.1.7

e1000e driver version 3.4.2.4






T30 ~ # [   20.027123] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full Duplex,
Flow Control: None
[   20.027151] e1000e :00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
[   20.027211] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[   25.212764] [ cut here ]
[   25.212824] NETDEV WATCHDOG: enp0s31f6 (e1000e): transmit queue 0 timed out
[   25.212909] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461
dev_watchdog+0x1ee/0x200
[   25.212977] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core e1000e(O) snd_pcm tpm_tis
tpm_tis_core efivarfs
[   25.213107] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G   O 
5.1.7-gentoo #1
[   25.213171] Hardware name: Dell Inc. PowerEdge T30/07T4MC, BIOS 1.0.15
07/12/2018
[   25.213236] RIP: 0010:dev_watchdog+0x1ee/0x200
[   25.213277] Code: 00 48 63 4d e0 eb 93 4c 89 e7 c6 05 bf 98 ad 00 01 e8 26
d4 fc ff 89 d9 4c 89 e6 48 c7 c7 b0 85 68 84 48 89 c2 e8 79 96 83 ff <0f> 0b eb
c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 c7 47 08
[   25.213421] RSP: 0018:9eefddb83ea0 EFLAGS: 00010286
[   25.213466] RAX:  RBX:  RCX:

[   25.213525] RDX: 00040400 RSI: 00f6 RDI:
0300
[   25.213584] RBP: 9eefcaa9c440 R08: 039c R09:
00aa
[   25.213643] R10:  R11: b691f01c0220 R12:
9eefcaa9c000
[   25.213702] R13: 0003 R14: 9eefddb83ef0 R15:

[   25.213762] FS:  () GS:9eefddb8()
knlGS:
[   25.213828] CS:  0010 DS:  ES:  CR0: 80050033
[   25.213877] CR2: 7f6a77a0c680 CR3: 0001b820e003 CR4:
003606e0
[   25.213936] DR0:  DR1:  DR2:

[   25.213994] DR3:  DR6: fffe0ff0 DR7:
0400
[   25.214053] Call Trace:
[   25.214080]  
[   25.214107]  ? qdisc_put_unlocked+0x30/0x30
[   25.214148]  call_timer_fn+0x26/0x120
[   25.214185]  run_timer_softirq+0x390/0x3c0
[   25.214225]  ? tick_sched_timer+0x32/0x70
[   25.214266]  ? __hrtimer_run_queues+0x10b/0x280
[   25.214310]  ? recalibrate_cpu_khz+0x10/0x10
[   25.214352]  __do_softirq+0xd3/0x2ec
[   25.214390]  irq_exit+0xa0/0xb0
[   25.214422]  smp_apic_timer_interrupt+0x67/0x130
[   25.214466]  apic_timer_interrupt+0xf/0x20
[   25.214504]  
[   25.214530] RIP: 0010:cpuidle_enter_state+0xac/0x420
[   25.214574] Code: 89 04 24 0f 1f 44 00 00 31 ff e8 ff 66 8f ff 45 84 ff 74
12 9c 58 f6 c4 02 0f 85 3e 03 00 00 31 ff e8 58 ca 93 ff fb 45 85 e4 <0f> 88 7c
02 00 00 49 63 cc 48 8b 34 24 48 2b 74 24 08 48 8d 04 49
[   25.214717] RSP: 0018:b691c00bbe98 EFLAGS: 0206 ORIG_RAX:
ff13
[   25.214787] RAX: 9eefddba0d00 RBX: 8489b260 RCX:
001f
[   25.214846] RDX: 0005decbd37a RSI: 26a5b845 RDI:

[   25.214905] RBP: 9eefddba8200 R08: 0002 R09:
000205c0
[   25.214964] R10: 0017dec25626 R11: 9eefddb9fe44 R12:
0006
[   25.215023] R13: 8489b4b8 R14: 8489b4a0 R15:

[   25.215086]  ? cpuidle_enter_state+0x91/0x420
[   25.215128]  do_idle+0x1a6/0x1e0
[   25.215164]  cpu_startup_entry+0x14/0x20
[   25.215202]  start_secondary+0x159/0x180
[   25.215240]  secondary_startup_64+0xa4/0xb0
[   25.215279] ---[ end trace 7e6afe981485542a ]---
[   25.215346] e1000e :00:1f.6 enp0s31f6: Reset adapter unexpectedly
[   29.186751] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow
Control: None

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH iproute2] ip: reset netns after each command in batch mode

2019-06-07 Thread Stephen Hemminger
On Fri,  7 Jun 2019 12:13:13 +0200
Matteo Croce  wrote:

> +void netns_restore(void)
> +{
> + if (saved_netns != -1) {

If saved_netns is -1 then it is a program bug becase
no save was done? then do something?


> + if (!setns(saved_netns, CLONE_NEWNET)) {
> + close(saved_netns);
> + saved_netns = -1;
> + } else {
> + perror("setns");

If you are going to look for errors. then you need to either
return the error or cause the program to exit.
You don't want later commands in the batch to be applied
to wrong namespace.


Re: [PATCH v2 iproute-next 10/10] ipmonitor: Add nexthop option to monitor

2019-06-07 Thread Stephen Hemminger
On Fri,  7 Jun 2019 15:38:16 -0700
David Ahern  wrote:

> From: David Ahern 
> 
> Add capability to ip-monitor to listen and dump nexthop messages.
> Since the nexthop group = 32 which exceeds the max groups bit
> field, 2 separate flags are needed - one that defaults on to indicate
> nexthop group is joined by default and a second that indicates a
> specific selection by the user (e.g, ip mon nexthop route).
> 
> Signed-off-by: David Ahern 

Acked-by: Stephen Hemminger 


Fw: [Bug 203847] New: VMware e1000 adapter doesn't get an IP address assigned. Manually assigning one does not solve the issue

2019-06-07 Thread Stephen Hemminger



Begin forwarded message:

Date: Fri, 07 Jun 2019 23:53:00 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 203847] New: VMware e1000 adapter doesn't get an IP address 
assigned. Manually assigning one does not solve the issue


https://bugzilla.kernel.org/show_bug.cgi?id=203847

Bug ID: 203847
   Summary: VMware e1000 adapter doesn't get an IP address
assigned. Manually assigning one does not solve the
issue
   Product: Networking
   Version: 2.5
Kernel Version: 5.2-rc3
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: www.anuprita...@gmail.com
Regression: No

Created attachment 283149
  --> https://bugzilla.kernel.org/attachment.cgi?id=283149&action=edit  
Dmesg for 5.2-rc3

When booting 5.2-rc3 on an arubacloud VPS (running VMware and using the e1000
driver for internet access) no IP address is assigned to eth0 

The last working kernel was 5.2-rc2
Systemd-networkd is being used for network setup. Tried networkmanager too.
With systemd-networkd no IP address is assigned to eth0. Networkmanager does
assign an address to eth0 but there is no internet access

Booting 5.2-rc2 or 5.1.7 works with both systemd-networkd and networkmanager
without any issues

The dmesg for 5.2-rc3 has been attached
The kernel configuration can be found here
https://emptyclouds.net/anupri/dot.config

This is an output of ip addr on 5.2-rc3

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 00:50:56:9f:92:d8 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 00:50:56:9f:e4:53 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 00:50:56:9f:31:a5 brd ff:ff:ff:ff:ff:ff

Same thing but with 5.2-rc2 (or 5.1.7 as the output is the same for both)
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc cake state UP group
default qlen 1000
link/ether 00:50:56:9f:92:d8 brd ff:ff:ff:ff:ff:ff
inet 217.61.104.90/24 brd 217.61.104.255 scope global eth0
   valid_lft forever preferred_lft forever
inet6 2a03:a140:10:2c5a::1/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe9f:92d8/64 scope link
   valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 00:50:56:9f:e4:53 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN group default
qlen 1000
link/ether 00:50:56:9f:31:a5 brd ff:ff:ff:ff:ff:ff

ip link on 5.2-rc3
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:92:d8 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:e4:53 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:31:a5 brd ff:ff:ff:ff:ff:ff
[root@archlinux ~]# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc cake state UP mode
DEFAULT group default qlen 1000
link/ether 00:50:56:9f:92:d8 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:e4:53 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000

ip link on 5.2-rc2
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc cake state UP mode
DEFAULT group default qlen 1000
link/ether 00:50:56:9f:92:d8 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:e4:53 brd ff:ff:ff:ff:ff:ff
4: eth2:  mtu 1500 qdisc noop state DOWN mode DEFAULT
group default qlen 1000
link/ether 00:50:56:9f:31:a5 brd ff:ff:ff:ff:ff:ff

-- 
You are receiving this mail b

Re: [PATCH iproute2 v2] ip: reset netns after each command in batch mode

2019-06-10 Thread Stephen Hemminger
On Fri,  7 Jun 2019 22:41:22 +0200
Matteo Croce  wrote:

> When creating a new netns or executing a program into an existing one,
> the unshare() or setns() calls will change the current netns.
> In batch mode, this can run commands on the wrong interfaces, as the
> ifindex value is meaningful only in the current netns. For example, this
> command fails because veth-c doesn't exists in the init netns:
> 
> # ip -b - <<-'EOF'
> netns add client
> link add name veth-c type veth peer veth-s netns client
> addr add 192.168.2.1/24 dev veth-c
> EOF
> Cannot find device "veth-c"
> Command failed -:7
> 
> But if there are two devices with the same name in the init and new netns,
> ip will build a wrong ll_map with indexes belonging to the new netns,
> and will execute actions in the init netns using this wrong mapping.
> This script will flush all eth0 addresses and bring it down, as it has
> the same ifindex of veth0 in the new netns:
> 
> # ip addr
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> 2: eth0:  mtu 1500 qdisc mq state UP 
> group default qlen 1000
> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> inet 192.168.122.76/24 brd 192.168.122.255 scope global dynamic eth0
>valid_lft 3598sec preferred_lft 3598sec
> 
> # ip -b - <<-'EOF'
> netns add client
> link add name veth0 type veth peer name veth1
> link add name veth-ns type veth peer name veth0 netns client
> link set veth0 down
> address flush veth0
> EOF
> 
> # ip addr
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> 2: eth0:  mtu 1500 qdisc mq state DOWN group default 
> qlen 1000
> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
> 3: veth1@veth0:  mtu 1500 qdisc noop state 
> DOWN group default qlen 1000
> link/ether c2:db:d0:34:13:4a brd ff:ff:ff:ff:ff:ff
> 4: veth0@veth1:  mtu 1500 qdisc noop state 
> DOWN group default qlen 1000
> link/ether ca:9d:6b:5f:5f:8f brd ff:ff:ff:ff:ff:ff
> 5: veth-ns@if2:  mtu 1500 qdisc noop state DOWN 
> group default qlen 1000
> link/ether 32:ef:22:df:51:0a brd ff:ff:ff:ff:ff:ff link-netns client
> 
> The same issue can be triggered by the netns exec subcommand with a
> sligthy different script:
> 
> # ip netns add client
> # ip -b - <<-'EOF'
> netns exec client true
> link add name veth0 type veth peer name veth1
> link add name veth-ns type veth peer name veth0 netns client
> link set veth0 down
> address flush veth0
> EOF
> 
> Fix this by adding two netns_{save,reset} functions, which are used
> to get a file descriptor for the init netns, and restore it after
> each batch command.
> netns_save() is called before the unshare() or setns(),
> while netns_restore() is called after each command.
> 
> Fixes: 0dc34c7713bb ("iproute2: Add processless network namespace support")
> Reviewed-and-tested-by: Andrea Claudi 
> Signed-off-by: Matteo Croce 

Applied, thanks.


Re: [PATCH iproute2] ip6tunnel: fix 'ip -6 {show|change} dev ' cmds

2019-06-10 Thread Stephen Hemminger
On Thu,  6 Jun 2019 16:44:26 -0700
Mahesh Bandewar  wrote:

> Inclusion of 'dev' is allowed by the syntax but not handled
> correctly by the command. It produces no output for show
> command and falsely successful for change command but does
> not make any changes.
> 
> can be verified with the following steps
>   # ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit 
> ttl 127 encaplimit none
>   # ip -6 tunnel show ip6tnl1
>   
>   # ip -6 tunnel show dev ip6tnl1
>   
>   # ip -6 tunnel change dev ip6tnl1 local 2001:1234::1 remote 2001:1234::2 
> encaplimit none ttl 127 tos inherit allow-localremote
>   # echo $?
>   0
>   # ip -6 tunnel show ip6tnl1
>   
> 
> Signed-off-by: Mahesh Bandewar 

Applied, thanks.


Re: [PATCH iproute2 1/2] netns: switch netns in the child when executing commands

2019-06-10 Thread Stephen Hemminger
On Tue, 11 Jun 2019 00:16:12 +0200
Matteo Croce  wrote:

> + printf("\nnetns: %s\n", nsname);
> + cmd_exec(argv[0], argv, true, nsname);
>   return 0;

simple printf breaks JSON output.


Re: [PATCH iproute2 1/2] netns: switch netns in the child when executing commands

2019-06-10 Thread Stephen Hemminger
On Tue, 11 Jun 2019 01:03:57 +0200
Matteo Croce  wrote:

> On Tue, Jun 11, 2019 at 12:52 AM Matteo Croce  wrote:
> >
> > On Tue, Jun 11, 2019 at 12:46 AM Stephen Hemminger
> >  wrote:  
> > >
> > > On Tue, 11 Jun 2019 00:16:12 +0200
> > > Matteo Croce  wrote:
> > >  
> > > > + printf("\nnetns: %s\n", nsname);
> > > > + cmd_exec(argv[0], argv, true, nsname);
> > > >   return 0;  
> > >
> > > simple printf breaks JSON output.  
> >
> > It was just moved from on_netns_label(). I will check how the json
> > output works when running doall and provide a similar behaviour.
> >
> > Anyway, I noticed that the VRF env should be reset but only in the
> > child. I'm adding a function pointer to cmd_exec which will
> > point to an hook which changes the netns when doing 'ip netns exec'
> > and reset the VRF on vrf exec.
> >
> > Regards,
> > --
> > Matteo Croce
> > per aspera ad upstream  
> 
> Hi Stephen,
> 
> just checked, but it seems that netns exec in batch mode produces an
> invalid output anyway:
> 
> # ip netns add n1
> # ip netns add n2
> # ip -all -json netns exec date
> 
> netns: n2
> Tue 11 Jun 2019 12:55:11 AM CEST
> 
> netns: n1
> Tue 11 Jun 2019 12:55:11 AM CEST
> 
> Probably there is very little sense in using -all netns exec and json
> together, but worth noting it.
> 
> Bye,

Thanks, just being paranoid about json output.


Re: [PATCH iproute2] utils: don't match empty strings as prefixes

2019-07-15 Thread Stephen Hemminger
On Sun, 14 Jul 2019 16:57:54 +0200
Matteo Croce  wrote:

> On Wed, Jul 10, 2019 at 1:18 AM Matteo Croce  wrote:
> >
> > On Tue, Jul 9, 2019 at 11:38 PM Stephen Hemminger
> >  wrote:  
> > >
> > > On Tue,  9 Jul 2019 22:40:40 +0200
> > > Matteo Croce  wrote:
> > >  
> > > > iproute has an utility function which checks if a string is a prefix for
> > > > another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> > > > instead of 'address'.
> > > >
> > > > This routine unfortunately considers an empty string as prefix
> > > > of any pattern, leading to undefined behaviour when an empty
> > > > argument is passed to ip:
> > > >
> > > > # ip ''
> > > > 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
> > > > group default qlen 1000
> > > > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > > inet 127.0.0.1/8 scope host lo
> > > >valid_lft forever preferred_lft forever
> > > > inet6 ::1/128 scope host
> > > >valid_lft forever preferred_lft forever
> > > >
> > > > # tc ''
> > > > qdisc noqueue 0: dev lo root refcnt 2
> > > >
> > > > # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> > > > # ip addr show dev dummy0
> > > > 6: dummy0:  mtu 1500 qdisc noop state DOWN group 
> > > > default qlen 1000
> > > > link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> > > > inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
> > > >valid_lft forever preferred_lft forever
> > > >
> > > > Rewrite matches() so it takes care of an empty input, and doesn't
> > > > scan the input strings three times: the actual implementation
> > > > does 2 strlen and a memcpy to accomplish the same task.
> > > >
> > > > Signed-off-by: Matteo Croce 
> > > > ---
> > > >  include/utils.h |  2 +-
> > > >  lib/utils.c | 14 +-
> > > >  2 files changed, 10 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/include/utils.h b/include/utils.h
> > > > index 927fdc17..f4d12abb 100644
> > > > --- a/include/utils.h
> > > > +++ b/include/utils.h
> > > > @@ -198,7 +198,7 @@ int nodev(const char *dev);
> > > >  int check_ifname(const char *);
> > > >  int get_ifname(char *, const char *);
> > > >  const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
> > > > -int matches(const char *arg, const char *pattern);
> > > > +int matches(const char *prefix, const char *string);
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int 
> > > > bits);
> > > >  int inet_addr_match_rta(const inet_prefix *m, const struct rtattr 
> > > > *rta);
> > > >
> > > > diff --git a/lib/utils.c b/lib/utils.c
> > > > index be0f11b0..73ce19bb 100644
> > > > --- a/lib/utils.c
> > > > +++ b/lib/utils.c
> > > > @@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const 
> > > > struct rtattr *rta)
> > > >   return name;
> > > >  }
> > > >
> > > > -int matches(const char *cmd, const char *pattern)
> > > > +/* Check if 'prefix' is a non empty prefix of 'string' */
> > > > +int matches(const char *prefix, const char *string)
> > > >  {
> > > > - int len = strlen(cmd);
> > > > + if (!*prefix)
> > > > + return 1;
> > > > + while(*string && *prefix == *string) {
> > > > + prefix++;
> > > > + string++;
> > > > + }
> > > >
> > > > - if (len > strlen(pattern))
> > > > - return -1;
> > > > - return memcmp(pattern, cmd, len);
> > > > + return *prefix;
> > > >  }
> > > >
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int 
> > > > bits)  
> > >
> > > ERROR: space required before the open parenthesis '('
> > > #134: FILE: lib/utils.c:895:
> > > +   while(*string && *prefix == *string) {
> > >
> > > total: 1 err

Re: [PATCH iproute2 net-next v2 1/6] Kernel header update for hardware offloading changes.

2019-07-15 Thread Stephen Hemminger
On Mon, 15 Jul 2019 19:40:19 +
"Patel, Vedang"  wrote:

> Hi Stephen, 
> 
> The kernel patches corresponding to this series have been merged. I just 
> wanted to check whether these iproute2 related patches are on your TODO list.
> 
> Let me know if you need any information from me on these patches.
> 
> Thanks,
> Vedang Patel


David Ahern handles iproute2 next

https://patchwork.ozlabs.org/patch/466/


Re: [PATCH iproute2] tc: util: constrain percentage in 0-100 interval

2019-07-15 Thread Stephen Hemminger
On Sat, 13 Jul 2019 11:44:07 +0200
Andrea Claudi  wrote:

> parse_percent() currently allows to specify negative percentages
> or value above 100%. However this does not seems to make sense,
> as the function is used for probabilities or bandiwidth rates.
> 
> Moreover, using negative values leads to erroneous results
> (using Bernoulli loss model as example):
> 
> $ ip link add test type dummy
> $ ip link set test up
> $ tc qdisc add dev test root netem loss gemodel -10% limit 10
> $ tc qdisc show dev test
> qdisc netem 800c: root refcnt 2 limit 10 loss gemodel p 90% r 10% 1-h 100% 
> 1-k 0%
> 
> Using values above 100% we have instead:
> 
> $ ip link add test type dummy
> $ ip link set test up
> $ tc qdisc add dev test root netem loss gemodel 140% limit 10
> $ tc qdisc show dev test
> qdisc netem 800f: root refcnt 2 limit 10 loss gemodel p 40% r 60% 1-h 100% 
> 1-k 0%
> 
> This commit changes parse_percent() with a check to ensure
> percentage values stay between 1.0 and 0.0.
> parse_percent_rate() function, which already employs a similar
> check, is adjusted accordingly.
> 
> With this check in place, we have:
> 
> $ ip link add test type dummy
> $ ip link set test up
> $ tc qdisc add dev test root netem loss gemodel -10% limit 10
> Illegal "loss gemodel p"
> 
> Fixes: 927e3cfb52b58 ("tc: B.W limits can now be specified in %.")
> Signed-off-by: Andrea Claudi 

Looks good. Applied


Re: [PATCH iproute2 v2] utils: don't match empty strings as prefixes

2019-07-15 Thread Stephen Hemminger
On Mon, 15 Jul 2019 20:04:30 +0200
Matteo Croce  wrote:

> iproute has an utility function which checks if a string is a prefix for
> another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> instead of 'address'.
> 
> This routine unfortunately considers an empty string as prefix
> of any pattern, leading to undefined behaviour when an empty
> argument is passed to ip:
> 
> # ip ''
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
>valid_lft forever preferred_lft forever
> 
> # tc ''
> qdisc noqueue 0: dev lo root refcnt 2
> 
> # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> # ip addr show dev dummy0
> 6: dummy0:  mtu 1500 qdisc noop state DOWN group default 
> qlen 1000
> link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
>valid_lft forever preferred_lft forever
> 
> Rewrite matches() so it takes care of an empty input, and doesn't
> scan the input strings three times: the actual implementation
> does 2 strlen and a memcpy to accomplish the same task.
> 
> Signed-off-by: Matteo Croce 

Thanks for following up. Applied


Re: [PATCH iproute2 master 0/3] devlink dumpit fixes

2019-07-15 Thread Stephen Hemminger
On Wed, 10 Jul 2019 14:03:18 +0300
Tariq Toukan  wrote:

> Hi,
> 
> This series from Aya contains several fixes for devlink health
> dump show command with binary data.
> 
> In patch 1 we replace the usage of doit with a dumpit, which
> is non-blocking and allows transferring larger amount of data.
> 
> Patches 2 and 3 fix the output for binary data prints, for both
> json and non-json.
> 
> Series generated against master commit:
> 2eb23f3e7aaf devlink: Show devlink port number
> 
> Regards,
> Tariq
> 
> Aya Levin (3):
>   devlink: Change devlink health dump show command to dumpit
>   devlink: Fix binary values print
>   devlink: Remove enclosing array brackets binary print with json format
> 
>  devlink/devlink.c | 41 +
>  1 file changed, 21 insertions(+), 20 deletions(-)
> 

Applied


Re: [PATCH iproute2-rc 1/8] rdma: Update uapi headers to add statistic counter support

2019-07-15 Thread Stephen Hemminger
On Wed, 10 Jul 2019 10:24:48 +0300
Leon Romanovsky  wrote:

> From: Mark Zhang 
> 
> Update rdma_netlink.h to kernel commit 6e7be47a5345 ("RDMA/nldev:
> Allow get default counter statistics through RDMA netlink").
> 
> Signed-off-by: Mark Zhang 
> Signed-off-by: Leon Romanovsky 

I am waiting on this until it gets to Linus's tree.


Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags

2019-07-15 Thread Stephen Hemminger
On Mon, 15 Jul 2019 15:51:41 -0700
Vedang Patel  wrote:

> @@ -405,6 +420,7 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE 
> *f, struct rtattr *opt)
>   struct rtattr *tb[TCA_TAPRIO_ATTR_MAX + 1];
>   struct tc_mqprio_qopt *qopt = 0;
>   __s32 clockid = CLOCKID_INVALID;
> + __u32 taprio_flags = 0;
>   int i;
>  
>   if (opt == NULL)
> @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, FILE 
> *f, struct rtattr *opt)
>  
>   print_string(PRINT_ANY, "clockid", "clockid %s", 
> get_clock_name(clockid));
>  
> + if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> + taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> + print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> + }
> +

Overall this looks fine, but three small comments:
1. It is better not to do unnecessary variable initialization
2. It is better to move variables into the basic block where they are used.
3. Use the print_0xhex() instead of print_uint() for hex values. The difference
   is that in the JSON output, print_uint would be decimal but the print_0xhex
   is always hex.  And use "flags %#x" so that it is clear you are printing 
flags in hex.




Re: [PATCH iproute2 net-next v3 3/5] taprio: add support for setting txtime_delay.

2019-07-15 Thread Stephen Hemminger
On Mon, 15 Jul 2019 15:51:42 -0700
Vedang Patel  wrote:

> + if (get_s32(&txtime_delay, *argv, 0)) {

Is txtime_delay of a negative value meaningful?


Re: [PATCH iproute2 net-next v3 2/5] taprio: Add support for setting flags

2019-07-15 Thread Stephen Hemminger
On Mon, 15 Jul 2019 17:15:15 -0700
Jakub Kicinski  wrote:

> On Mon, 15 Jul 2019 16:37:43 -0700, Stephen Hemminger wrote:
> > On Mon, 15 Jul 2019 15:51:41 -0700
> > Vedang Patel  wrote:  
> > > @@ -442,6 +458,11 @@ static int taprio_print_opt(struct qdisc_util *qu, 
> > > FILE *f, struct rtattr *opt)
> > >  
> > >   print_string(PRINT_ANY, "clockid", "clockid %s", 
> > > get_clock_name(clockid));
> > >  
> > > + if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
> > > + taprio_flags = rta_getattr_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
> > > + print_uint(PRINT_ANY, "flags", " flags %x", taprio_flags);
> > > + }  
> >[...]
> > 3. Use the print_0xhex() instead of print_uint() for hex values. The 
> > difference
> >is that in the JSON output, print_uint would be decimal but the 
> > print_0xhex
> >is always hex.  And use "flags %#x" so that it is clear you are printing 
> > flags in hex.  
> 
> In my humble personal experience scripting tests using iproute2 and
> bpftool with Python I found printing the "hex string" instead of just
> outputing the integer value counter productive :( Even tho it looks
> better to the eye, JSON is primarily for machine processing and hex
> strings have to be manually converted.

If it is hex on normal output, it should be hex on JSON output.
And what ever the normal output format is has to be accepted on command line as 
input.


Re: [PATCH iproute2-rc 2/8] rdma: Add "stat qp show" support

2019-07-16 Thread Stephen Hemminger
On Wed, 10 Jul 2019 10:24:49 +0300
Leon Romanovsky  wrote:

> From: Mark Zhang 
> 
> This patch presents link, id, task name, lqpn, as well as all sub
> counters of a QP counter.
> A QP counter is a dynamically allocated statistic counter that is
> bound with one or more QPs. It has several sub-counters, each is
> used for a different purpose.
> 
> Examples:
> $ rdma stat qp show
> link mlx5_2/1 cntn 5 pid 31609 comm client.1 rx_write_requests 0
> rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0 out_of_sequence 0
> duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0
> implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error 0
> resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
> req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0
> LQPN: <178>
> $ rdma stat show link rocep1s0f5/1
> link rocep1s0f5/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 
> out_of_buffer 0 duplicate_request 0
> rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0 
> local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0
> req_cqe_error 0 req_remote_invalid_request 0 req_remote_access_errors 0 
> resp_remote_access_errors 0 resp_cqe_flush_error 0
> req_cqe_flush_error 0 rp_cnp_ignored 0 rp_cnp_handled 0 
> np_ecn_marked_roce_packets 0 np_cnp_sent 0
> $ rdma stat show link rocep1s0f5/1 -p
> link rocep1s0f5/1
> rx_write_requests 0
> rx_read_requests 0
> rx_atomic_requests 0
> out_of_buffer 0
> duplicate_request 0
> rnr_nak_retry_err 0
> packet_seq_err 0
> implied_nak_seq_err 0
> local_ack_timeout_err 0
> resp_local_length_error 0
> resp_cqe_error 0
> req_cqe_error 0
> req_remote_invalid_request 0
> req_remote_access_errors 0
> resp_remote_access_errors 0
> resp_cqe_flush_error 0
> req_cqe_flush_error 0
> rp_cnp_ignored 0
> rp_cnp_handled 0
> np_ecn_marked_roce_packets 0
> np_cnp_sent 0
> 
> Signed-off-by: Mark Zhang 
> Signed-off-by: Leon Romanovsky 
> ---
>  rdma/Makefile |   2 +-
>  rdma/rdma.c   |   3 +-
>  rdma/rdma.h   |   1 +
>  rdma/stat.c   | 268 ++
>  rdma/utils.c  |   7 ++
>  5 files changed, 279 insertions(+), 2 deletions(-)
>  create mode 100644 rdma/stat.c
> 

Headers have been merged, but this patch does not apply cleanly to current 
iproute2



Re: [PATCH iproute2 0/2] Fix IPv6 tunnel add when dev param is used

2019-07-16 Thread Stephen Hemminger
On Tue,  9 Jul 2019 15:16:49 +0200
Andrea Claudi  wrote:

> Commit ba126dcad20e6 ("ip6tunnel: fix 'ip -6 {show|change} dev
> ' cmds") breaks IPv6 tunnel creation when dev parameter
> is used.
> 
> This series revert the original commit, which mistakenly use
> dev for tunnel name, while addressing a issue on tunnel change
> when no interface name is specified.
> 
> Andrea Claudi (2):
>   Revert "ip6tunnel: fix 'ip -6 {show|change} dev ' cmds"
>   ip tunnel: warn when changing IPv6 tunnel without tunnel name
> 
>  ip/ip6tunnel.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 

Both applied, thanks


Re: [PATCH iproute2 net-next v5 1/5] etf: Add skip_sock_check

2019-07-18 Thread Stephen Hemminger
On Thu, 18 Jul 2019 12:55:39 -0700
Vedang Patel  wrote:

> - print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s",
> + print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
>   (qopt->flags & TC_ETF_DEADLINE_MODE_ON) ? "on" 
> : "off");
> + print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
> + (qopt->flags & TC_ETF_SKIP_SOCK_CHECK) ? "on" : 
> "off");

These should really be boolean options in JSON, not string values.


Re: [patch net-next rfc 2/7] net: introduce name_node struct to be used in hashlist

2019-07-19 Thread Stephen Hemminger
On Fri, 19 Jul 2019 13:00:24 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> Signed-off-by: Jiri Pirko 
> ---
>  include/linux/netdevice.h | 10 +++-
>  net/core/dev.c| 96 +++
>  2 files changed, 86 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 88292953aa6f..74f99f127b0e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -918,6 +918,12 @@ struct dev_ifalias {
>  struct devlink;
>  struct tlsdev_ops;
>  
> +struct netdev_name_node {
> + struct hlist_node hlist;
> + struct net_device *dev;
> + char *name

You probably can make this const char *

Do you want to add __rcu to this list?


Re: [patch net-next rfc 0/7] net: introduce alternative names for network interfaces

2019-07-19 Thread Stephen Hemminger
On Fri, 19 Jul 2019 13:00:22 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> In the past, there was repeatedly discussed the IFNAMSIZ (16) limit for
> netdevice name length. Now when we have PF and VF representors
> with port names like "pfXvfY", it became quite common to hit this limit:
> 0123456789012345
> enp131s0f1npf0vf6
> enp131s0f1npf0vf22
> 
> Udev cannot rename these interfaces out-of-the-box and user needs to
> create custom rules to handle them.
> 
> Also, udev has multiple schemes of netdev names. From udev code:
>  * Type of names:
>  *   b - BCMA bus core number
>  *   c - bus id of a grouped CCW or CCW 
> device,
>  *   with all leading zeros stripped 
> [s390]
>  *   o[n|d]
>  * - on-board device index number
>  *   s[f][n|d]
>  * - hotplug slot index number
>  *   x- MAC address
>  *   [P]ps[f][n|d]
>  * - PCI geographical location
>  *   
> [P]ps[f][u][..][c][i]
>  * - USB port number chain
>  *   v   - VIO slot number (IBM PowerVM)
>  *   ai   - Platform bus ACPI instance id
>  *   in  - Netdevsim bus address and port 
> name
> 
> One device can be often renamed by multiple patterns at the
> same time (e.g. pci address/mac).
> 
> This patchset introduces alternative names for network interfaces.
> Main goal is to:
> 1) Overcome the IFNAMSIZ limitation
> 2) Allow to have multiple names at the same time (multiple udev patterns)
> 3) Allow to use alternative names as handle for commands
> 
> See following examples.
> 
> $ ip link
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 3: dummy0:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
> group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> 
> -> Add alternative names for dummy0:  
> 
> $ ip link altname add dummy0 name someothername
> $ ip link altname add dummy0 name someotherveryveryveryverylongname
> $ ip link
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 3: dummy0:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
> group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname someothername
> altname someotherveryveryveryverylongname
>   
> -> Add bridge brx, add it's alternative name and use alternative names to  
>do enslavement.
> 
> $ ip link add name brx type bridge
> $ ip link altname add brx name mypersonalsuperspecialbridge
> $ ip link set someotherveryveryveryverylongname master 
> mypersonalsuperspecialbridge
> $ ip link
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 3: dummy0:  mtu 1500 qdisc noop master brx state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname someothername
> altname someotherveryveryveryverylongname
> 4: brx:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
> group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname mypersonalsuperspecialbridge
> 
> -> Add ipv4 address to the bridge using alternative name:  
> 
> $ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge
> $ ip addr show mypersonalsuperspecialbridge 
> 4: brx:  mtu 1500 qdisc noop state DOWN group default 
> qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname mypersonalsuperspecialbridge
> inet 192.168.0.1/24 scope global brx
>valid_lft forever preferred_lft forever
> 
> -> Delete one of dummy0 alternative names:  
> 
> $ ip link altname del someotherveryveryveryverylongname
> $ ip link
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT group default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 3: dummy0:  mtu 1500 qdisc noop master brx state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname someothername
> 4: brx:  mtu 1500 qdisc noop state DOWN mode DEFAULT 
> group default qlen 1000
> link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
> altname mypersonalsuperspecialbridge
> 
> TODO:
> - notifications for alternative names add/removal
> - sanitization of add/del cmds (similar to get link)
> - test more usecases and write selftests
> - extend support for other netlink ifaces (ovs for example)
> - add sysfs symlink altname->basename?
> 
> Jiri Pirko (7):
>   net: procfs: use index hashlist instead of name hashlist
>   net: introduce name_node struct to be used in hashlis

Re: [PATCH iproute2-rc v1 0/7] Statistics counter support

2019-07-19 Thread Stephen Hemminger
On Wed, 17 Jul 2019 17:31:49 +0300
Leon Romanovsky  wrote:

> From: Leon Romanovsky 
> 
> Changelog v0->v1:
>  * Fixed typo in manual page (Gal)
>  * Rebased on top of d035cc1b "ip tunnel: warn when changing IPv6 tunnel 
> without tunnel name"
>  * Dropped update header file because it was already merged.
> 
> ---
> 
> Hi,
> 
> This is supplementary part of accepted to rdma-next kernel series,
> that kernel series provided an option to get various counters: global
> and per-objects.
> 
> Currently, all counters are printed in format similar to other
> device/link properties, while "-p" option will print them in table like
> format.
> 
> [leonro@server ~]$ rdma stat show
> link mlx5_0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests
> 0 out_of_buffer 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err
> 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error
> 0 resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
> req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0 rp_cnp_ignored 0
> rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
> 
> [leonro@server ~]$ rdma stat show -p
> link mlx5_0/1
>   rx_write_requests 0
>   rx_read_requests 0
>   rx_atomic_requests 0
>   out_of_buffer 0
>   duplicate_request 0
>   rnr_nak_retry_err 0
>   packet_seq_err 0
>   implied_nak_seq_err 0
>   local_ack_timeout_err 0
>   resp_local_length_error 0
>   resp_cqe_error 0
>   req_cqe_error 0
>   req_remote_invalid_request 0
>   req_remote_access_errors 0
>   resp_remote_access_errors 0
>   resp_cqe_flush_error 0
>   req_cqe_flush_error 0
>   rp_cnp_ignored 0
>   rp_cnp_handled 0
>   np_ecn_marked_roce_packets 0
>   np_cnp_sent 0
> 
> Thanks
> 
> Mark Zhang (7):
>   rdma: Add "stat qp show" support
>   rdma: Add get per-port counter mode support
>   rdma: Add rdma statistic counter per-port auto mode support
>   rdma: Make get_port_from_argv() returns valid port in strict port mode
>   rdma: Add stat manual mode support
>   rdma: Add default counter show support
>   rdma: Document counter statistic
> 
>  man/man8/rdma-dev.8   |   1 +
>  man/man8/rdma-link.8  |   1 +
>  man/man8/rdma-resource.8  |   1 +
>  man/man8/rdma-statistic.8 | 167 +
>  man/man8/rdma.8   |   7 +-
>  rdma/Makefile |   2 +-
>  rdma/rdma.c   |   3 +-
>  rdma/rdma.h   |   1 +
>  rdma/stat.c   | 759 ++
>  rdma/utils.c  |  17 +-
>  10 files changed, 954 insertions(+), 5 deletions(-)
>  create mode 100644 man/man8/rdma-statistic.8
>  create mode 100644 rdma/stat.c
> 
> --
> 2.20.1
> 

Applied now, thanks


Re: [PATCH iproute2] json: fix backslash escape typo in jsonw_puts

2019-07-19 Thread Stephen Hemminger
On Wed, 17 Jul 2019 18:15:31 -0700
Ivan Delalande  wrote:

> Fixes: fcc16c22 ("provide common json output formatter")
> Signed-off-by: Ivan Delalande 

Applied.


Re: [patch net-next rfc 2/7] net: introduce name_node struct to be used in hashlist

2019-07-19 Thread Stephen Hemminger
On Fri, 19 Jul 2019 21:17:40 +0200
Jiri Pirko  wrote:

> Fri, Jul 19, 2019 at 06:29:36PM CEST, step...@networkplumber.org wrote:
> >On Fri, 19 Jul 2019 13:00:24 +0200
> >Jiri Pirko  wrote:
> >  
> >> From: Jiri Pirko 
> >> 
> >> Signed-off-by: Jiri Pirko 
> >> ---
> >>  include/linux/netdevice.h | 10 +++-
> >>  net/core/dev.c| 96 +++
> >>  2 files changed, 86 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 88292953aa6f..74f99f127b0e 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -918,6 +918,12 @@ struct dev_ifalias {
> >>  struct devlink;
> >>  struct tlsdev_ops;
> >>  
> >> +struct netdev_name_node {
> >> +  struct hlist_node hlist;
> >> +  struct net_device *dev;
> >> +  char *name  
> >
> >You probably can make this const char *  

Don't bother, it looks ok as is. the problem is you would have
to cast it when calling free.

> >Do you want to add __rcu to this list?  
> 
> Which list?
> 

   struct netdev_name_node __rcu *name_node;

You might also want to explictly init the hlist node rather
than relying on the fact that zero is an empty node ptr.

 
 static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
-  char *name)
+  const char *name)
 {
struct netdev_name_node *name_node;
 
-   name_node = kzalloc(sizeof(*name_node), GFP_KERNEL);
+   name_node = kmalloc(sizeof(*name_node));
if (!name_node)
return NULL;
+
+   INIT_HLIST_NODE(&name_node->hlist);
name_node->dev = dev;
name_node->name = name;
return name_node;



Re: [PATCH iproute2] etf: make printing of variable JSON friendly

2019-07-19 Thread Stephen Hemminger
On Fri, 19 Jul 2019 14:40:43 -0700
Vedang Patel  wrote:

> In iproute2 txtime-assist series, it was pointed out that print_bool()
> should be used to print binary values. This is to make it JSON friendly.
> 
> So, make the corresponding changes in ETF.
> 
> Fixes: 8ccd49383cdc ("etf: Add skip_sock_check")
> Reported-by: Stephen Hemminger 
> Signed-off-by: Vedang Patel 
> ---
>  tc/q_etf.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tc/q_etf.c b/tc/q_etf.c
> index c2090589bc64..307c50eed48b 100644
> --- a/tc/q_etf.c
> +++ b/tc/q_etf.c
> @@ -176,12 +176,12 @@ static int etf_print_opt(struct qdisc_util *qu, FILE 
> *f, struct rtattr *opt)
>get_clock_name(qopt->clockid));
>  
>   print_uint(PRINT_ANY, "delta", "delta %d ", qopt->delta);
> - print_string(PRINT_ANY, "offload", "offload %s ",
> - (qopt->flags & TC_ETF_OFFLOAD_ON) ? "on" : 
> "off");
> - print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
> - (qopt->flags & TC_ETF_DEADLINE_MODE_ON) ? "on" 
> : "off");
> - print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
> - (qopt->flags & TC_ETF_SKIP_SOCK_CHECK) ? "on" : 
> "off");
> + if (qopt->flags & TC_ETF_OFFLOAD_ON)
> + print_bool(PRINT_ANY, "offload", "offload ", true);
> + if (qopt->flags & TC_ETF_DEADLINE_MODE_ON)
> + print_bool(PRINT_ANY, "deadline_mode", "deadline_mode ", true);
> + if (qopt->flags & TC_ETF_SKIP_SOCK_CHECK)
> + print_bool(PRINT_ANY, "skip_sock_check", "skip_sock_check", 
> true);
>  
>   return 0;
>  }

Thanks, but if you are only going to print json boolean if true, then a common
way to indicate that something is enabled is to use print_null().

Could you do that instead?


Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check

2019-07-23 Thread Stephen Hemminger
On Tue, 23 Jul 2019 13:25:37 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> One cannot depend on *argv being null in case of no arg is left on the
> command line. For example in batch mode, this is not always true. Check
> argc instead to prevent crash.
> 
> Reported-by: Alex Kushnarov 
> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> Signed-off-by: Jiri Pirko 
> ---
>  tc/m_action.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tc/m_action.c b/tc/m_action.c
> index ab6bc0ad28ff..0f9c3a27795d 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -222,7 +222,7 @@ done0:
>   goto bad_val;
>   }
>  
> - if (*argv && strcmp(*argv, "cookie") == 0) {
> + if (argc && strcmp(*argv, "cookie") == 0) {
>   size_t slen;
>  
>   NEXT_ARG();

Ok, but we should also fix batch mode to null last argv


Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check

2019-07-23 Thread Stephen Hemminger
On Tue, 23 Jul 2019 13:25:37 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> One cannot depend on *argv being null in case of no arg is left on the
> command line. For example in batch mode, this is not always true. Check
> argc instead to prevent crash.
> 
> Reported-by: Alex Kushnarov 
> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> Signed-off-by: Jiri Pirko 

Actually makeargs does NULL terminate the last arg so what input
to batchmode is breaking this?


Re: [PATCH iproute2] etf: make printing of variable JSON friendly

2019-07-23 Thread Stephen Hemminger
On Tue, 23 Jul 2019 21:34:46 +
"Patel, Vedang"  wrote:

> > On Jul 22, 2019, at 5:11 PM, David Ahern  wrote:
> > 
> > On 7/22/19 1:11 PM, Patel, Vedang wrote:  
> >> 
> >>   
> >>> On Jul 22, 2019, at 11:21 AM, David Ahern  wrote:
> >>> 
> >>> On 7/19/19 3:40 PM, Vedang Patel wrote:  
> >>>> In iproute2 txtime-assist series, it was pointed out that print_bool()
> >>>> should be used to print binary values. This is to make it JSON friendly.
> >>>> 
> >>>> So, make the corresponding changes in ETF.
> >>>> 
> >>>> Fixes: 8ccd49383cdc ("etf: Add skip_sock_check")
> >>>> Reported-by: Stephen Hemminger 
> >>>> Signed-off-by: Vedang Patel 
> >>>> ---
> >>>> tc/q_etf.c | 12 ++--
> >>>> 1 file changed, 6 insertions(+), 6 deletions(-)
> >>>> 
> >>>> diff --git a/tc/q_etf.c b/tc/q_etf.c
> >>>> index c2090589bc64..307c50eed48b 100644
> >>>> --- a/tc/q_etf.c
> >>>> +++ b/tc/q_etf.c
> >>>> @@ -176,12 +176,12 @@ static int etf_print_opt(struct qdisc_util *qu, 
> >>>> FILE *f, struct rtattr *opt)
> >>>>   get_clock_name(qopt->clockid));
> >>>> 
> >>>>  print_uint(PRINT_ANY, "delta", "delta %d ", qopt->delta);
> >>>> -print_string(PRINT_ANY, "offload", "offload %s ",
> >>>> -(qopt->flags & TC_ETF_OFFLOAD_ON) ? 
> >>>> "on" : "off");
> >>>> -print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
> >>>> -(qopt->flags & TC_ETF_DEADLINE_MODE_ON) 
> >>>> ? "on" : "off");
> >>>> -print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
> >>>> -(qopt->flags & TC_ETF_SKIP_SOCK_CHECK) 
> >>>> ? "on" : "off");
> >>>> +if (qopt->flags & TC_ETF_OFFLOAD_ON)
> >>>> +print_bool(PRINT_ANY, "offload", "offload ", true);
> >>>> +if (qopt->flags & TC_ETF_DEADLINE_MODE_ON)
> >>>> +print_bool(PRINT_ANY, "deadline_mode", "deadline_mode 
> >>>> ", true);
> >>>> +if (qopt->flags & TC_ETF_SKIP_SOCK_CHECK)
> >>>> +print_bool(PRINT_ANY, "skip_sock_check", 
> >>>> "skip_sock_check", true);
> >>>> 
> >>>>  return 0;
> >>>> }
> >>>>   
> >>> 
> >>> This changes existing output for TC_ETF_OFFLOAD_ON and
> >>> TC_ETF_DEADLINE_MODE_ON which were added a year ago.  
> >> Yes, this is a good point. I missed that. 
> >> 
> >> Another idea is to use is_json_context() and call print_bool() there. But, 
> >> that will still change values corresponding to the json output for the 
> >> above flags from “on”/“off” to “true”/“false”. I am not sure if this is a 
> >> big issue. 
> >> 
> >> My suggestion is to keep the code as is. what do you think?
> >>   
> > 
> > I think we need automated checkers for new code. ;-)
> > 
> > The first 2 should not change for backward compatibility - unless there
> > is agreement that this feature is too new and long term it is better to
> > print as above.
> > 
> > Then the new one should follow context of the other 2 - consistency IMHO
> > takes precedence.  
> Thanks for the inputs. 
> 
> Let’s keep whatever is currently present upstream and you can ignore this 
> patch.
> 
> Thanks,
> Vedang
Agreed. At this point consistency is better.
Maybe at some future point, all the JSON will be reviewed and fixed (yes it 
would be a breaking flag day).
But for now inconsistent usage across ip, tc, and devlink is a fact of life.


Re: [PATCH] iproute2: devlink: use sys/queue.h from libbsd as a fallback

2019-07-26 Thread Stephen Hemminger
On Wed, 24 Jul 2019 09:18:38 +0100
Sergei Trofimovich  wrote:

> On sys/queue.h does not exist linux-musl targets and
> fails build as:
> 
> devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
>28 | #include 
>   |  ^
> 
> The change pulls in 'sys/queue.h' from libbsd in case
> system headers don't already provides it.
> 
> Tested on linux-musl and linux-glibc.
> 
> Bug: https://bugs.gentoo.org/690486
> CC: Stephen Hemminger 
> CC: netdev@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 
> ---

This is ugly and causes more maintainability issues.

Maybe just fix devlink not to depend on sys/queue.h at all.
It makes more sense to have common code style and usage across all of
iproute2.

We already have local version list.h, why continue with BSD stuff.


Re: [PATCH iproute2] iplink: document the 'link change' subcommand

2019-07-26 Thread Stephen Hemminger
On Wed, 24 Jul 2019 21:12:18 +0200
Matteo Croce  wrote:

> ip link can set parameters both via the 'set' and 'change' keyword.
> In fact, 'change' is an alias for 'set'.
> Document this in the help and manpage.
> 
> Fixes: 1d93483985f0 ("iplink: use netlink for link configuration")
> Signed-off-by: Matteo Croce 

Probably just done originally for compatibility in some way with ip route.
Not sure if it really needs to be documented.


Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check

2019-07-26 Thread Stephen Hemminger
On Tue, 23 Jul 2019 21:36:00 +0200
Jiri Pirko  wrote:

> Tue, Jul 23, 2019 at 07:54:01PM CEST, step...@networkplumber.org wrote:
> >On Tue, 23 Jul 2019 13:25:37 +0200
> >Jiri Pirko  wrote:
> >  
> >> From: Jiri Pirko 
> >> 
> >> One cannot depend on *argv being null in case of no arg is left on the
> >> command line. For example in batch mode, this is not always true. Check
> >> argc instead to prevent crash.
> >> 
> >> Reported-by: Alex Kushnarov 
> >> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> >> Signed-off-by: Jiri Pirko   
> >
> >Actually makeargs does NULL terminate the last arg so what input
> >to batchmode is breaking this?  
> 
> Interesting, there must be another but out there then.
> 
> My input is:
> filter add dev testdummy parent : protocol all prio 11000 flower action 
> drop
> filter add dev testdummy parent : protocol ipv4 prio 1 flower dst_mac 
> 11:22:33:44:55:66 action drop

This maybe related. Looks like the batchsize patches had issues.

# valgrind ./tc/tc -batch filter.bat 
==27348== Memcheck, a memory error detector
==27348== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27348== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==27348== Command: ./tc/tc -batch filter.bat
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==at 0x4EE9C0C: getdelim (iogetdelim.c:59)
==27348==by 0x152A37: getline (stdio.h:120)
==27348==by 0x152A37: getcmdline (utils.c:1311)
==27348==by 0x115543: batch (tc.c:358)
==27348==by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==at 0x152BE4: makeargs (utils.c:1359)
==27348==by 0x115614: batch (tc.c:366)
==27348==by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==at 0x11EBFD: parse_action (m_action.c:225)
==27348==by 0x13633E: flower_parse_opt (f_flower.c:1285)
==27348==by 0x1190EB: tc_filter_modify (tc_filter.c:217)
==27348==by 0x115674: batch (tc.c:404)
==27348==by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Use of uninitialised value of size 8
==27348==at 0x11EC0B: parse_action (m_action.c:225)
==27348==by 0x13633E: flower_parse_opt (f_flower.c:1285)
==27348==by 0x1190EB: tc_filter_modify (tc_filter.c:217)
==27348==by 0x115674: batch (tc.c:404)
==27348==by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
Error: Parent Qdisc doesn't exists.
Error: Parent Qdisc doesn't exists.
Command failed filter.bat:1


Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check

2019-07-26 Thread Stephen Hemminger
On Tue, 23 Jul 2019 13:25:37 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> One cannot depend on *argv being null in case of no arg is left on the
> command line. For example in batch mode, this is not always true. Check
> argc instead to prevent crash.
> 
> Reported-by: Alex Kushnarov 
> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> Signed-off-by: Jiri Pirko 
> ---
>  tc/m_action.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tc/m_action.c b/tc/m_action.c
> index ab6bc0ad28ff..0f9c3a27795d 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -222,7 +222,7 @@ done0:
>   goto bad_val;
>   }
>  
> - if (*argv && strcmp(*argv, "cookie") == 0) {
> + if (argc && strcmp(*argv, "cookie") == 0) {
>   size_t slen;
>  
>   NEXT_ARG();


The logic here is broken at end of file.

do {
if (getcmdline(&line_next, &len, stdin) == -1)
lastline = true;

largc_next = makeargs(line_next, largv_next, 100);
bs_enabled_next = batchsize_enabled(largc_next, largv_next);
if (bs_enabled) {
struct batch_


getcmdline() will return -1 at end of file.
The code will call make_args on an uninitialized pointer.

I see lots of other unnecessary complexity in the whole batch logic.
It needs to be rewritten.

Rather than me fixing the code, I am probably going to revert.

commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0
Author: Chris Mi 
Date:   Fri Jan 12 14:13:16 2018 +0900

tc: Add batchsize feature for filter and actions


Re: [patch iproute2 2/2] tc: batch: fix line/line_next processing in batch

2019-07-26 Thread Stephen Hemminger
On Tue, 23 Jul 2019 13:25:38 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> When getcmdline fails, there is no valid string in line_next.
> So change the flow and don't process it. Alongside with that,
> free the previous line buffer and prevent memory leak.
> 
> Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
> Signed-off-by: Jiri Pirko 


This is not sufficient to avoid valgrind detected uninitialized memory.
The following changes to your patch (#2 alone) is enough to fix
the issue.

The logic here is still a mess and needs to be cleaned up to avoid
future related bugs.

From bbcc22899566556ced9692e4811aea2a38817834 Mon Sep 17 00:00:00 2001
From: Jiri Pirko 
Date: Tue, 23 Jul 2019 13:25:38 +0200
Subject: [PATCH] tc: batch: fix line/line_next processing in batch

When getcmdline fails, there is no valid string in line_next.
So change the flow and don't process it. Alongside with that,
free the previous line buffer and prevent memory leak.

Also, avoid passing uninitialized memory to filters.

Fixes: 485d0c6001c4 ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Jiri Pirko 
Signed-off-by: Stephen Hemminger 
---
 tc/tc.c | 84 +
 1 file changed, 43 insertions(+), 41 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index 64e342dd85bf..95e5481955ad 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -325,11 +325,10 @@ static int batch(const char *name)
 {
struct batch_buf *head = NULL, *tail = NULL, *buf_pool = NULL;
char *largv[100], *largv_next[100];
-   char *line, *line_next = NULL;
+   char *line = NULL, *line_next = NULL;
bool bs_enabled = false;
bool lastline = false;
int largc, largc_next;
-   bool bs_enabled_saved;
bool bs_enabled_next;
int batchsize = 0;
size_t len = 0;
@@ -360,47 +359,40 @@ static int batch(const char *name)
largc = makeargs(line, largv, 100);
bs_enabled = batchsize_enabled(largc, largv);
do {
-   if (getcmdline(&line_next, &len, stdin) == -1)
+   if (getcmdline(&line_next, &len, stdin) == -1) {
lastline = true;
-
-   largc_next = makeargs(line_next, largv_next, 100);
-   bs_enabled_next = batchsize_enabled(largc_next, largv_next);
-   if (bs_enabled) {
-   struct batch_buf *buf;
-
-   buf = get_batch_buf(&buf_pool, &head, &tail);
-   if (!buf) {
-   fprintf(stderr,
-   "failed to allocate batch_buf\n");
-   return -1;
-   }
-   ++batchsize;
-   }
-
-   /*
-* In batch mode, if we haven't accumulated enough commands
-* and this is not the last command and this command & next
-* command both support the batchsize feature, don't send the
-* message immediately.
-*/
-   if (!lastline && bs_enabled && bs_enabled_next
-   && batchsize != MSG_IOV_MAX)
-   send = false;
-   else
send = true;
+   } else {
+   largc_next = makeargs(line_next, largv_next, 100);
+   bs_enabled_next = batchsize_enabled(largc_next, 
largv_next);
+   if (bs_enabled) {
+   struct batch_buf *buf;
+
+   buf = get_batch_buf(&buf_pool, &head, &tail);
+   if (!buf) {
+   fprintf(stderr,
+   "failed to allocate 
batch_buf\n");
+   return -1;
+   }
+   ++batchsize;
+   }
 
-   line = line_next;
-   line_next = NULL;
-   len = 0;
-   bs_enabled_saved = bs_enabled;
-   bs_enabled = bs_enabled_next;
-
-   if (largc == 0) {
-   largc = largc_next;
-   memcpy(largv, largv_next, largc * sizeof(char *));
-   continue;   /* blank line */
+   /*
+* In batch mode, if we haven't accumulated enough
+* commands and this is not the last command and this
+* command & next command both support the batchsize
+* feature, don't send the message immediately.
+*/
+   if (bs_enabled && bs_enabled_next
+

[PATCH] iplink: document 'change' option to ip link

2019-07-26 Thread Stephen Hemminger
Add the command alias "change" to man page.
Don't show it on usage, since it is not commonly used.

Reported-off-by: Matteo Croce 
Signed-off-by: Stephen Hemminger 
---
 man/man8/ip-link.8.in | 5 +
 1 file changed, 5 insertions(+)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 883d88077d66..a8ae72d2b097 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -1815,6 +1815,11 @@ can move the system to an unpredictable state. The 
solution
 is to avoid changing several parameters with one
 .B ip link set
 call.
+The modifier
+.B change
+is equivalent to
+.BR "set" .
+
 
 .TP
 .BI dev " DEVICE "
-- 
2.20.1



Re: [PATCH v2] iproute2: devlink: port from sys/queue.h to list.h

2019-07-26 Thread Stephen Hemminger
On Fri, 26 Jul 2019 22:01:05 +0100
Sergei Trofimovich  wrote:

> sys/queue.h does not exist on linux-musl targets and fails build as:
> 
> devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
>28 | #include 
>   |  ^
> 
> The change ports to list.h API and drops dependency of 'sys/queue.h'.
> The API maps one-to-one.
> 
> Build-tested on linux-musl and linux-glibc.
> 
> Bug: https://bugs.gentoo.org/690486
> CC: Stephen Hemminger 
> CC: netdev@vger.kernel.org
> Signed-off-by: Sergei Trofimovich 

Looks good, thanks for following through.
Applied.


Re: [PATCH] iplink_can: fix format output of clock with flag -details

2019-07-26 Thread Stephen Hemminger
On Fri, 26 Jul 2019 15:06:09 +0200
Antonio Borneo  wrote:

> The command
>   ip -details link show can0
> prints in the last line the value of the clock frequency attached
> to the name of the following value "numtxqueues", e.g.
>   clock 4950numtxqueues 1 numrxqueues 1 gso_max_size
>65536 gso_max_segs 65535
> 
> Add the missing space after the clock value.
> 
> Signed-off-by: Antonio Borneo 

Applied, but CAN looks like it doesn't support single line output correctly.


Re: ip route JSON format is unparseable for "unreachable" routes

2019-07-28 Thread Stephen Hemminger
On Sun, 28 Jul 2019 13:09:55 +0200
Michael Ziegler  wrote:

> Hi,
> 
> I created a couple "unreachable" routes on one of my systems, like such:
> 
> > ip route add unreachable 10.0.0.0/8 metric 255
> > ip route add unreachable 192.168.0.0/16 metric 255  
> 
> Unfortunately this results in unparseable JSON output from "ip":
> 
> > # ip -j route show  | jq .
> > parse error: Objects must consist of key:value pairs at line 1, column 84  
> 
> The offending JSON objects are these:
> 
> > {"unreachable","dst":"10.0.0.0/8","metric":255,"flags":[]}
> > {"unreachable","dst":"192.168.0.0/16","metric":255,"flags":[]}  
> "unreachable" cannot appear on its own here, it needs to be some kind of
> field.
> 
> The manpage says to report here, thus I do :) I've searched the
> archives, but I wasn't able to find any existing bug reports about this.
> I'm running version
> 
> > ip utility, iproute2-ss190107  
> 
> on Debian Buster.
> 
> Regards,
> Michael.

Already fixed upstream by:

commit 073661773872709518d35d4d093f3a715281f21d
Author: Matteo Croce 
Date:   Mon Mar 18 18:19:29 2019 +0100

ip route: print route type in JSON output

ip route generates an invalid JSON if the route type has to be printed,
eg. when detailed mode is active, or the type is different that unicast:

$ ip -d -j -p route show
[ {"unicast",
"dst": "192.168.122.0/24",
"dev": "virbr0",
"protocol": "kernel",
"scope": "link",
"prefsrc": "192.168.122.1",
"flags": [ "linkdown" ]
} ]

$ ip -j -p route show
[ {"unreachable",
"dst": "192.168.23.0/24",
"flags": [ ]
},{"prohibit",
"dst": "192.168.24.0/24",
"flags": [ ]
},{"blackhole",
"dst": "192.168.25.0/24",
"flags": [ ]
} ]

Fix it by printing the route type as the "type" attribute:

$ ip -d -j -p route show
[ {
"type": "unicast",
"dst": "default",
"gateway": "192.168.85.1",
"dev": "wlp3s0",
"protocol": "dhcp",
"scope": "global",
"metric": 600,
"flags": [ ]
},{
"type": "unreachable",
"dst": "192.168.23.0/24",
    "protocol": "boot",
"scope": "global",
"flags": [ ]
},{
"type": "prohibit",
"dst": "192.168.24.0/24",
"protocol": "boot",
"scope": "global",
"flags": [ ]
},{
"type": "blackhole",
"dst": "192.168.25.0/24",
"protocol": "boot",
"scope": "global",
"flags": [ ]
} ]

Fixes: 663c3cb23103 ("iproute: implement JSON and color output")
Acked-by: Phil Sutter 
Reviewed-and-tested-by: Andrea Claudi 
Signed-off-by: Matteo Croce 
Signed-off-by: Stephen Hemminger 


Re: [PATCH iproute2 0/1] Fix s64 argument parsing

2019-07-29 Thread Stephen Hemminger
On Mon, 29 Jul 2019 13:04:09 +0200
Kurt Kanzenbach  wrote:

> On Thu, Jul 04, 2019 at 02:24:26PM +0200, Kurt Kanzenbach wrote:
> > Hi,
> >
> > while using the TAPRIO Qdisc on ARM32 I've noticed that the base_time 
> > parameter is
> > incorrectly configured. The problem is the utility function get_s64() used 
> > by
> > TAPRIO doesn't parse the value correctly.  
> 
> polite ping.
> 
> >
> > Thanks,
> > Kurt
> >
> > Kurt Kanzenbach (1):
> >   utils: Fix get_s64() function
> >
> >  lib/utils.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --
> > 2.11.0
> >  

Not sure why this got marked "Changes Requested"
Applied.


pgprPkXZxKgQO.pgp
Description: OpenPGP digital signature


Re: [PATCH v4 net-next 15/19] ionic: Add netdev-event handling

2019-07-30 Thread Stephen Hemminger
On Mon, 22 Jul 2019 14:40:19 -0700
Shannon Nelson  wrote:

> +
> +static void ionic_lif_set_netdev_info(struct lif *lif)
> +{
> + struct ionic_admin_ctx ctx = {
> + .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> + .cmd.lif_setattr = {
> + .opcode = CMD_OPCODE_LIF_SETATTR,
> + .index = cpu_to_le16(lif->index),
> + .attr = IONIC_LIF_ATTR_NAME,
> + },
> + };
> +
> + strlcpy(ctx.cmd.lif_setattr.name, lif->netdev->name,
> + sizeof(ctx.cmd.lif_setattr.name));
> +
> + dev_info(lif->ionic->dev, "NETDEV_CHANGENAME %s %s\n",
> +  lif->name, ctx.cmd.lif_setattr.name);
> +

There is already a kernel message for this. Repeating the same thing in the
driver is redundant.


[RFC iproute2 0/4] Revert tc batchsize feature

2019-07-31 Thread Stephen Hemminger
The batchsize feature of tc might save a few cycles but it
is a maintaince nightmare, it has uninitialized variables and
poor error handling. 

This patch set reverts back to the original state.
Please don't resubmit original code. Go back to the drawing
board and do something generic.  For example, the routing
daemons have figured out that by using multiple threads and
turning off the netlink ACK they can update millions of routes
quickly.

Stephen Hemminger (4):
  Revert "tc: Remove pointless assignments in batch()"
  Revert "tc: flush after each command in batch mode"
  Revert "tc: fix batch force option"
  Revert "tc: Add batchsize feature for filter and actions"

 tc/m_action.c  |  65 ++--
 tc/tc.c| 201 -
 tc/tc_common.h |   7 +-
 tc/tc_filter.c | 129 ---
 4 files changed, 87 insertions(+), 315 deletions(-)



[RFC iproute2 3/4] Revert "tc: fix batch force option"

2019-07-31 Thread Stephen Hemminger
This reverts commit b133392468d1f404077a8f3554d1f63d48bb45e8.
---
 tc/tc.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index c115155b2234..b7b6bd288897 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -334,7 +334,6 @@ static int batch(const char *name)
int batchsize = 0;
size_t len = 0;
int ret = 0;
-   int err;
bool send;
 
batch_mode = 1;
@@ -403,9 +402,9 @@ static int batch(const char *name)
continue;   /* blank line */
}
 
-   err = do_cmd(largc, largv, tail == NULL ? NULL : tail->buf,
+   ret = do_cmd(largc, largv, tail == NULL ? NULL : tail->buf,
 tail == NULL ? 0 : sizeof(tail->buf));
-   if (err != 0) {
+   if (ret != 0) {
fprintf(stderr, "Command failed %s:%d\n", name,
cmdlineno - 1);
ret = 1;
@@ -427,17 +426,15 @@ static int batch(const char *name)
iov->iov_len = n->nlmsg_len;
}
 
-   err = rtnl_talk_iov(&rth, iovs, batchsize, NULL);
-   put_batch_bufs(&buf_pool, &head, &tail);
-   free(iovs);
-   if (err < 0) {
+   ret = rtnl_talk_iov(&rth, iovs, batchsize, NULL);
+   if (ret < 0) {
fprintf(stderr, "Command failed %s:%d\n", name,
-   cmdlineno - (batchsize + err) - 1);
-   ret = 1;
-   if (!force)
-   break;
+   cmdlineno - (batchsize + ret) - 1);
+   return 2;
}
+   put_batch_bufs(&buf_pool, &head, &tail);
batchsize = 0;
+   free(iovs);
}
} while (!lastline);
 
-- 
2.20.1



[RFC iproute2 2/4] Revert "tc: flush after each command in batch mode"

2019-07-31 Thread Stephen Hemminger
This reverts commit d66fdfda71e4a30c1ca0ddb7b1a048bef30fe79e.
---
 tc/tc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tc/tc.c b/tc/tc.c
index 1f23971ae4b9..c115155b2234 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -405,7 +405,6 @@ static int batch(const char *name)
 
err = do_cmd(largc, largv, tail == NULL ? NULL : tail->buf,
 tail == NULL ? 0 : sizeof(tail->buf));
-   fflush(stdout);
if (err != 0) {
fprintf(stderr, "Command failed %s:%d\n", name,
cmdlineno - 1);
-- 
2.20.1



[RFC iproute2 4/4] Revert "tc: Add batchsize feature for filter and actions"

2019-07-31 Thread Stephen Hemminger
This reverts commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0.
---
 tc/m_action.c  |  65 ++--
 tc/tc.c| 199 -
 tc/tc_common.h |   7 +-
 tc/tc_filter.c | 129 
 4 files changed, 87 insertions(+), 313 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index ab6bc0ad28ff..bdc62720879c 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -556,61 +556,40 @@ bad_val:
return ret;
 }
 
-struct tc_action_req {
-   struct nlmsghdr n;
-   struct tcamsg   t;
-   charbuf[MAX_MSG];
-};
-
 static int tc_action_modify(int cmd, unsigned int flags,
-   int *argc_p, char ***argv_p,
-   void *buf, size_t buflen)
+   int *argc_p, char ***argv_p)
 {
-   struct tc_action_req *req, action_req;
-   char **argv = *argv_p;
-   struct rtattr *tail;
int argc = *argc_p;
-   struct iovec iov;
+   char **argv = *argv_p;
int ret = 0;
-
-   if (buf) {
-   req = buf;
-   if (buflen < sizeof (struct tc_action_req)) {
-   fprintf(stderr, "buffer is too small: %zu\n", buflen);
-   return -1;
-   }
-   } else {
-   memset(&action_req, 0, sizeof (struct tc_action_req));
-   req = &action_req;
-   }
-
-   req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
-   req->n.nlmsg_flags = NLM_F_REQUEST | flags;
-   req->n.nlmsg_type = cmd;
-   req->t.tca_family = AF_UNSPEC;
-   tail = NLMSG_TAIL(&req->n);
+   struct {
+   struct nlmsghdr n;
+   struct tcamsg   t;
+   charbuf[MAX_MSG];
+   } req = {
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
+   .n.nlmsg_flags = NLM_F_REQUEST | flags,
+   .n.nlmsg_type = cmd,
+   .t.tca_family = AF_UNSPEC,
+   };
+   struct rtattr *tail = NLMSG_TAIL(&req.n);
 
argc -= 1;
argv += 1;
-   if (parse_action(&argc, &argv, TCA_ACT_TAB, &req->n)) {
+   if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) {
fprintf(stderr, "Illegal \"action\"\n");
return -1;
}
-   tail->rta_len = (void *) NLMSG_TAIL(&req->n) - (void *) tail;
+   tail->rta_len = (void *) NLMSG_TAIL(&req.n) - (void *) tail;
 
-   *argc_p = argc;
-   *argv_p = argv;
-
-   if (buf)
-   return 0;
-
-   iov.iov_base = &req->n;
-   iov.iov_len = req->n.nlmsg_len;
-   if (rtnl_talk_iov(&rth, &iov, 1, NULL) < 0) {
+   if (rtnl_talk(&rth, &req.n, NULL) < 0) {
fprintf(stderr, "We have an error talking to the kernel\n");
ret = -1;
}
 
+   *argc_p = argc;
+   *argv_p = argv;
+
return ret;
 }
 
@@ -711,7 +690,7 @@ bad_val:
return ret;
 }
 
-int do_action(int argc, char **argv, void *buf, size_t buflen)
+int do_action(int argc, char **argv)
 {
 
int ret = 0;
@@ -721,12 +700,12 @@ int do_action(int argc, char **argv, void *buf, size_t 
buflen)
if (matches(*argv, "add") == 0) {
ret =  tc_action_modify(RTM_NEWACTION,
NLM_F_EXCL | NLM_F_CREATE,
-   &argc, &argv, buf, buflen);
+   &argc, &argv);
} else if (matches(*argv, "change") == 0 ||
  matches(*argv, "replace") == 0) {
ret = tc_action_modify(RTM_NEWACTION,
   NLM_F_CREATE | NLM_F_REPLACE,
-  &argc, &argv, buf, buflen);
+  &argc, &argv);
} else if (matches(*argv, "delete") == 0) {
argc -= 1;
argv += 1;
diff --git a/tc/tc.c b/tc/tc.c
index b7b6bd288897..a0a18f380b46 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -205,18 +205,18 @@ static void usage(void)
"-nm | -nam[es] | { -cf | -conf } path 
}\n");
 }
 
-static int do_cmd(int argc, char **argv, void *buf, size_t buflen)
+static int do_cmd(int argc, char **argv)
 {
if (matches(*argv, "qdisc") == 0)
return do_qdisc(argc-1, argv+1);
if (matches(*argv, "class") == 0)
return do_class(argc-1, argv+1);
if (matches(*argv, "filter") == 0)
-   return do_filter(argc-1, argv+1, buf, buflen);
+   return do_filter(argc-1, argv+1);
if (matches(*argv, "chain") == 0)
-   return do_chain(argc-1, argv+1, buf, buflen);
+   return do_chain(argc-1, argv+1);
if (matches(*argv, "actions") == 0)
-   

[RFC iproute2 1/4] Revert "tc: Remove pointless assignments in batch()"

2019-07-31 Thread Stephen Hemminger
This reverts commit 6358bbc381c6e38465838370bcbbdeb77ec3565a.
---
 tc/tc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tc/tc.c b/tc/tc.c
index 64e342dd85bf..1f23971ae4b9 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -326,11 +326,11 @@ static int batch(const char *name)
struct batch_buf *head = NULL, *tail = NULL, *buf_pool = NULL;
char *largv[100], *largv_next[100];
char *line, *line_next = NULL;
+   bool bs_enabled_next = false;
bool bs_enabled = false;
bool lastline = false;
int largc, largc_next;
bool bs_enabled_saved;
-   bool bs_enabled_next;
int batchsize = 0;
size_t len = 0;
int ret = 0;
@@ -359,6 +359,7 @@ static int batch(const char *name)
goto Exit;
largc = makeargs(line, largv, 100);
bs_enabled = batchsize_enabled(largc, largv);
+   bs_enabled_saved = bs_enabled;
do {
if (getcmdline(&line_next, &len, stdin) == -1)
lastline = true;
@@ -394,6 +395,7 @@ static int batch(const char *name)
len = 0;
bs_enabled_saved = bs_enabled;
bs_enabled = bs_enabled_next;
+   bs_enabled_next = false;
 
if (largc == 0) {
largc = largc_next;
-- 
2.20.1



Fw: [Bug 204399] New: error in handling vlan tag by raw ethernet socket

2019-08-01 Thread Stephen Hemminger



Begin forwarded message:

Date: Thu, 01 Aug 2019 06:37:39 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 204399] New: error in handling vlan tag by raw ethernet socket


https://bugzilla.kernel.org/show_bug.cgi?id=204399

Bug ID: 204399
   Summary: error in handling vlan tag by raw ethernet socket
   Product: Networking
   Version: 2.5
Kernel Version: 5.0.0-21-generic #22-Ubuntu x86_64
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: IPV4
  Assignee: step...@networkplumber.org
  Reporter: lvenkatakumarcha...@gmail.com
Regression: No

Created attachment 284067
  --> https://bugzilla.kernel.org/attachment.cgi?id=284067&action=edit  
set/unset the variable "stripping_vlan_tag" to reproduce the issue

I am using raw ethernet socket and sending/receiving network traffic.

problem I am seeing is if read socket is opened before starting sending the
vlan traffic, read socket is able to capture packets along with vlan tag. but
if I open the read socket before starting sending the packets, vlan tag is
being stripped and only those four bytes are missing from the packet. Rest of
the packet is intact.

Please refer to the small code snippet which consistently produces the bug.

set/unset the variable "stripping_vlan_tag" to reproduce the issue.

is this a bug in the kernel or any problem with my coding ?

I am working on a project where I will be opening all the required sockets at
once and will be using as and when required. In that case, I am not able to
capture vlan tags which is blocking me. However, tcpdump is capture along with
vlan tag.

please be noted that, both the cards are in same system and are connected just
back to back using a cat5 cable. eth0 is built-in card and eth1 is usb based
external network card.

Is there a way to get rid of this issue ?

Best Regards,
Lokesh.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [PATCH iproute2-next] ip tunnel: add json output

2019-08-01 Thread Stephen Hemminger
On Thu,  1 Aug 2019 12:12:58 +0200
Andrea Claudi  wrote:

> Add json support on iptunnel and ip6tunnel.
> The plain text output format should remain the same.
> 
> Signed-off-by: Andrea Claudi 
> ---
>  ip/ip6tunnel.c | 82 +++--
>  ip/iptunnel.c  | 90 +-
>  ip/tunnel.c| 42 +--
>  3 files changed, 148 insertions(+), 66 deletions(-)
> 
> diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> index d7684a673fdc4..f2b9710c1320f 100644
> --- a/ip/ip6tunnel.c
> +++ b/ip/ip6tunnel.c
> @@ -71,57 +71,90 @@ static void usage(void)
>  static void print_tunnel(const void *t)
>  {
>   const struct ip6_tnl_parm2 *p = t;
> - char s1[1024];
> - char s2[1024];
> + SPRINT_BUF(b1);
>  
>   /* Do not use format_host() for local addr,
>* symbolic name will not be useful.
>*/
> - printf("%s: %s/ipv6 remote %s local %s",
> -p->name,
> -tnl_strproto(p->proto),
> -format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
> -rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
> + open_json_object(NULL);
> + print_string(PRINT_ANY, "ifname", "%s: ", p->name);

Print this using color for interface name?


> + snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
> + print_string(PRINT_ANY, "mode", "%s ", b1);
> + print_string(PRINT_ANY,
> +  "remote",
> +  "remote %s ",
> +  format_host_r(AF_INET6, 16, &p->raddr, b1, sizeof(b1)));
> + print_string(PRINT_ANY,
> +  "local",
> +  "local %s",
> +  rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, sizeof(b1)));
> +
>   if (p->link) {
>   const char *n = ll_index_to_name(p->link);
>  
>   if (n)
> - printf(" dev %s", n);
> + print_string(PRINT_ANY, "link", " dev %s", n);
>   }
>  
>   if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
> - printf(" encaplimit none");
> + print_bool(PRINT_ANY,
> +"ip6_tnl_f_ign_encap_limit",
> +" encaplimit none",
> +true);

For flags like this, print_null is more typical JSON than a boolean
value. Null is better for presence flag. Bool is better if both true and
false are printed.


Re: [PATCH iproute2-next] ip tunnel: add json output

2019-08-02 Thread Stephen Hemminger
On Fri, 2 Aug 2019 13:14:15 +0200
Andrea Claudi  wrote:

> On Thu, Aug 1, 2019 at 5:16 PM Stephen Hemminger
>  wrote:
> >
> > On Thu,  1 Aug 2019 12:12:58 +0200
> > Andrea Claudi  wrote:
> >  
> > > Add json support on iptunnel and ip6tunnel.
> > > The plain text output format should remain the same.
> > >
> > > Signed-off-by: Andrea Claudi 
> > > ---
> > >  ip/ip6tunnel.c | 82 +++--
> > >  ip/iptunnel.c  | 90 +-
> > >  ip/tunnel.c| 42 +--
> > >  3 files changed, 148 insertions(+), 66 deletions(-)
> > >
> > > diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> > > index d7684a673fdc4..f2b9710c1320f 100644
> > > --- a/ip/ip6tunnel.c
> > > +++ b/ip/ip6tunnel.c
> > > @@ -71,57 +71,90 @@ static void usage(void)
> > >  static void print_tunnel(const void *t)
> > >  {
> > >   const struct ip6_tnl_parm2 *p = t;
> > > - char s1[1024];
> > > - char s2[1024];
> > > + SPRINT_BUF(b1);
> > >
> > >   /* Do not use format_host() for local addr,
> > >* symbolic name will not be useful.
> > >*/
> > > - printf("%s: %s/ipv6 remote %s local %s",
> > > -p->name,
> > > -tnl_strproto(p->proto),
> > > -format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
> > > -rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
> > > + open_json_object(NULL);
> > > + print_string(PRINT_ANY, "ifname", "%s: ", p->name);  
> >
> > Print this using color for interface name?  
> 
> Thanks for the suggestion, I can do the same also for remote and local
> addresses?
> 
> >
> >  
> > > + snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
> > > + print_string(PRINT_ANY, "mode", "%s ", b1);
> > > + print_string(PRINT_ANY,
> > > +  "remote",
> > > +  "remote %s ",
> > > +  format_host_r(AF_INET6, 16, &p->raddr, b1, 
> > > sizeof(b1)));
> > > + print_string(PRINT_ANY,
> > > +  "local",
> > > +  "local %s",
> > > +  rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, 
> > > sizeof(b1)));
> > > +
> > >   if (p->link) {
> > >   const char *n = ll_index_to_name(p->link);
> > >
> > >   if (n)
> > > - printf(" dev %s", n);
> > > + print_string(PRINT_ANY, "link", " dev %s", n);
> > >   }
> > >
> > >   if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
> > > - printf(" encaplimit none");
> > > + print_bool(PRINT_ANY,
> > > +"ip6_tnl_f_ign_encap_limit",
> > > +" encaplimit none",
> > > +true);  
> >
> > For flags like this, print_null is more typical JSON than a boolean
> > value. Null is better for presence flag. Bool is better if both true and
> > false are printed.  
> 
> Using print_null json JSON output becomes:
> 
>   {
> "ifname": "gre2",
> "mode": "gre/ipv6",
> "remote": "fd::1",
> "local": "::",
> "ip6_tnl_f_ign_encap_limit": null,
> "hoplimit": 64,
> "tclass": "0x00",
> "flowlabel": "0x0",
> "flowinfo": "0x",
> "flags": []
>   }
> 
> Which seems a bit confusing to me (what does the "null" value means?).
> Using print_null we also introduce an inconsistency with the JSON
> output of ip/link_gre6.c and ip/link_ip6tnl.c.
> I would prefer to use print_bool and print out
> ip6_tnl_f_ign_encap_limit both when true and false, patching both
> files above to do the same. WDYT?

JSON has several ways to represent the same type of flag value.
Since JSON always has key/value. Null is used to indicate something is present 
and
in that case the value is unnecessary, which is what the null field was meant 
for.




Re: [patch iproute2] devlink: finish queue.h to list.h transition

2019-08-05 Thread Stephen Hemminger
On Mon,  5 Aug 2019 11:56:56 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> Loose the "q" from the names and name the structure fields in the same
> way rest of the code does. Also, fix list_add arg order which leads
> to segfault.
> 
> Fixes: 33267017faf1 ("iproute2: devlink: port from sys/queue.h to list.h")
> Signed-off-by: Jiri Pirko 

Applied, thanks.


[PATCH net 1/2] docs: admin-guide: remove references to IPX and token-ring

2019-08-05 Thread Stephen Hemminger
Both IPX and TR have not been supported for a while now.
Remove them from the /proc/sys/net documentation.

Signed-off-by: Stephen Hemminger 
---
 Documentation/admin-guide/sysctl/net.rst | 29 +---
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst 
b/Documentation/admin-guide/sysctl/net.rst
index a7d44e71019d..287b98708a40 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -39,7 +39,6 @@ Table : Subdirectories in /proc/sys/net
  802   E802 protocol ax25   AX25
  ethernet  Ethernet protocol rose   X.25 PLP layer
  ipv4  IP version 4  x25X.25 protocol
- ipx   IPX   token-ring IBM token ring
  bridgeBridging  decnet DEC net
  ipv6  IP version 6  tipc   TIPC
  = === = == ==
@@ -401,33 +400,7 @@ interface.
 (network) that the route leads to, the router (may be directly connected), the
 route flags, and the device the route is using.
 
-
-5. IPX
---
-
-The IPX protocol has no tunable values in proc/sys/net.
-
-The IPX  protocol  does,  however,  provide  proc/net/ipx. This lists each IPX
-socket giving  the  local  and  remote  addresses  in  Novell  format (that is
-network:node:port). In  accordance  with  the  strange  Novell  tradition,
-everything but the port is in hex. Not_Connected is displayed for sockets that
-are not  tied to a specific remote address. The Tx and Rx queue sizes indicate
-the number  of  bytes  pending  for  transmission  and  reception.  The  state
-indicates the  state  the  socket  is  in and the uid is the owning uid of the
-socket.
-
-The /proc/net/ipx_interface  file lists all IPX interfaces. For each interface
-it gives  the network number, the node number, and indicates if the network is
-the primary  network.  It  also  indicates  which  device  it  is bound to (or
-Internal for  internal  networks)  and  the  Frame  Type if appropriate. Linux
-supports 802.3,  802.2,  802.2  SNAP  and DIX (Blue Book) ethernet framing for
-IPX.
-
-The /proc/net/ipx_route  table  holds  a list of IPX routes. For each route it
-gives the  destination  network, the router node (or Directly) and the network
-address of the router (or Connected) for internal networks.
-
-6. TIPC
+5. TIPC
 ---
 
 tipc_rmem
-- 
2.20.1



[PATCH net 2/2] net: docs: replace IPX in tuntap documentation

2019-08-05 Thread Stephen Hemminger
IPX is no longer supported, but the example in the documentation
might useful. Replace it with IPv6.

Signed-off-by: Stephen Hemminger 
---
 Documentation/networking/tuntap.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/tuntap.txt 
b/Documentation/networking/tuntap.txt
index 949d5dcdd9a3..0104830d5075 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.txt
@@ -204,8 +204,8 @@ Ethernet device, which instead of receiving packets from a 
physical
 media, receives them from user space program and instead of sending 
 packets via physical media sends them to the user space program. 
 
-Let's say that you configured IPX on the tap0, then whenever 
-the kernel sends an IPX packet to tap0, it is passed to the application
+Let's say that you configured IPv6 on the tap0, then whenever
+the kernel sends an IPv6 packet to tap0, it is passed to the application
 (VTun for example). The application encrypts, compresses and sends it to 
 the other side over TCP or UDP. The application on the other side decompresses
 and decrypts the data received and writes the packet to the TAP device, 
-- 
2.20.1



Re: [PATCH iproute2-next] ss: sctp: fix typo for nodelay

2019-08-06 Thread Stephen Hemminger
On Sat,  3 Aug 2019 10:37:41 +0200
Patrick Talbert  wrote:

> nodealy should be nodelay.
> 
> Signed-off-by: Patrick Talbert 

Both patches applied to current iproute2


Re: [PATCH 12/17] skge: no need to check return value of debugfs_create functions

2019-08-06 Thread Stephen Hemminger
On Tue,  6 Aug 2019 18:11:23 +0200
Greg Kroah-Hartman  wrote:

> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic should
> never do something different based on this.
> 
> Cc: Mirko Lindner 
> Cc: Stephen Hemminger 
> Cc: "David S. Miller" 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman 

Acked-by: Stephen Hemminger 

Did not pull card out of dust bin to test it though


Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames

2019-08-12 Thread Stephen Hemminger
On Mon, 12 Aug 2019 10:31:39 +0200
Jiri Pirko  wrote:

> Mon, Aug 12, 2019 at 03:37:26AM CEST, dsah...@gmail.com wrote:
> >On 8/11/19 7:34 PM, David Ahern wrote:  
> >> On 8/10/19 12:30 AM, Jiri Pirko wrote:  
> >>> Could you please write me an example message of add/remove?  
> >> 
> >> altnames are for existing netdevs, yes? existing netdevs have an id and
> >> a name - 2 existing references for identifying the existing netdev for
> >> which an altname will be added. Even using the altname as the main
> >> 'handle' for a setlink change, I see no reason why the GETLINK api can
> >> not take an the IFLA_ALT_IFNAME and return the full details of the
> >> device if the altname is unique.
> >> 
> >> So, what do the new RTM commands give you that you can not do with
> >> RTM_*LINK?
> >>   
> >
> >
> >To put this another way, the ALT_NAME is an attribute of an object - a
> >LINK. It is *not* a separate object which requires its own set of
> >commands for manipulating.  
> 
> Okay, again, could you provide example of a message to add/remove
> altname using existing setlink message? Thanks!

The existing IFALIAS takes an empty name to do deletion.


Re: [PATCH iproute2] tc: Fix block-handle support for filter operations

2019-08-12 Thread Stephen Hemminger
On Mon, 12 Aug 2019 13:17:06 +0300
Ido Schimmel  wrote:

> From: Ido Schimmel 
> 
> Commit e991c04d64c0 ("Revert "tc: Add batchsize feature for filter and
> actions"") reverted more than it should and broke shared block
> functionality. Fix this by restoring the original functionality.
> 
> To reproduce:
> 
> # tc qdisc add dev swp1 ingress_block 10 ingress
> # tc filter add block 10 proto ip pref 1 flower \
>   dst_ip 192.0.2.0/24 action drop
> Unknown filter "block", hence option "10" is unparsable
> 
> Fixes: e991c04d64c0 ("Revert "tc: Add batchsize feature for filter and 
> actions"")
> Signed-off-by: Ido Schimmel 

Applied


Fw: [Bug 211175] New: gretap does not fragment packets regardless of the DF flag

2021-01-14 Thread Stephen Hemminger



Begin forwarded message:

Date: Wed, 13 Jan 2021 13:37:33 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 211175] New: gretap does not fragment packets regardless of the 
DF flag


https://bugzilla.kernel.org/show_bug.cgi?id=211175

Bug ID: 211175
   Summary: gretap does not fragment packets regardless of the DF
flag
   Product: Networking
   Version: 2.5
Kernel Version: 5.10.4
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: IPV4
  Assignee: step...@networkplumber.org
  Reporter: pupi...@hotmail.com
Regression: No

Hello everyone,

I'm running linux 5.10.4 with iproute-5.10 on Slackware (64bit).

When I try to configure a gretap device with the "ignore-df" I getting this
error: 

ip link add testgre type gretap remote 10.42.44.6 local 10.86.44.6 ignore-df
RTNETLINK answers: Invalid argument

Instead, if I try to run the following command it is going to be executed:
ip link add testgre type gretap remote 10.42.44.6 local 10.86.44.6 noignore-df

Also I have noticed that the icmp datagrams with the DF=none are not fragmented
anyway.
For example this is a tcpdump capture showing a 1459 bytes lenght icmp packet
that is not going to be fragmented and delivered to the other remote gretap
linux box (running the same kernel version).

ethertype 802.1Q (0x8100), length 1477: vlan 802, p 0, ethertype IPv4, (flags
[none], proto ICMP (1), length 1459)
192.168.1.247 > 192.168.1.1: ICMP echo request, id 10287, seq 0, length
1439


Is this expected?


This is my full gretap setup: eth0 mtu is 1500 bytes.

ip link add testgre type gretap remote 10.42.44.6 local 10.86.44.6
ip link set testgre up
ip link set eth0 up


ip link add name br0 type bridge
ip link set br0 up

ip link set testgre master br0
ip link set eth0 master br0


and this my 'ip a s' output:

13: testgre@NONE:  mtu 1462 qdisc pfifo_fast
master br0 state UNKNOWN group default qlen 1000
link/ether 5e:56:0a:0c:12:f0 brd ff:ff:ff:ff:ff:ff
14: br0:  mtu 1462 qdisc noqueue state UP
group default qlen 1000
link/ether 5e:56:0a:0c:12:f0 brd ff:ff:ff:ff:ff:ff

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.


  1   2   3   4   5   6   7   8   9   10   >