from:"Ivan Vecera"

[PATCH ethtool] ethtool: don't report UFO on kernels v4.14 and above

2018-11-29 Thread Ivan Vecera

Support for UDP fragmentation offloading was removed in kernel v4.14.
The ethtool reports incorrectly its state on this and newer kernels:

$ ethtool -k enp0s31f6 | grep udp-frag
udp-fragmentation-offload: off

It's look like that the feature is supported and disabled only. Instead
of this behavior the status of UFO should not be reported on such kernels.

As the UFO is only one feature that was removed from the ethtool I have
decided after discussion with John to implement only simple check for
this instead of implementing more generic solution like 'max_kernel_ver
field in struct off_flag_def'.

Cc: John W. Linville 
Signed-off-by: Ivan Vecera 
---
 ethtool.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 2f7e96b..6121979 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1465,8 +1465,10 @@ static void dump_features(const struct feature_defs 
*defs,
 * kernel version
 */
if (defs->off_flag_matched[i] == 0 &&
-   off_flag_def[i].get_cmd == 0 &&
-   kernel_ver < off_flag_def[i].min_kernel_ver)
+   ((off_flag_def[i].get_cmd == 0 &&
+ kernel_ver < off_flag_def[i].min_kernel_ver) ||
+(off_flag_def[i].get_cmd == ETHTOOL_GUFO &&
+ kernel_ver >= KERNEL_VERSION(4, 14, 0
continue;
 
value = off_flag_def[i].value;
-- 
2.18.1

Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()

2018-11-28 Thread Ivan Vecera

On 28. 11. 18 20:00, David Miller wrote:

From: Ivan Vecera 
Date: Wed, 28 Nov 2018 11:12:22 +0100

On 27. 11. 18 23:51, David Miller wrote:

From: Petr Oros 
Date: Thu, 22 Nov 2018 15:24:07 +0100

@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter
*adapter)
struct net_device *netdev = adapter->netdev;
int status;
   -if (netif_running(netdev))
+   if (netif_running(netdev)) {
+   /* prevent netdev watchdog during tx queue destroy */
+   netif_carrier_off(netdev);
be_close(netdev);
+   }

This will make userspace networking daemons will think that the link
went down.
This absolutely should not be a side effect of making a simple
TX queue configuration change via ethtool.

Yes, you're right Dave. But the same thing (netif_carrier_off()) is
done by number of other drivers (igb, tg3, ixgbe...) during
.set_channels() callback. The patch that Petr sent does the
practically the same thing like this one:

Bug exist in other drivers, thanks for reporting that.

It doesn't mean you should add the same bug here as well.

OK.
And the right way? netif_device_detach() that does not fire linkwatch events?

Thx,
Ivan

Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()

2018-11-28 Thread Ivan Vecera

On 27. 11. 18 23:51, David Miller wrote:

From: Petr Oros 
Date: Thu, 22 Nov 2018 15:24:07 +0100

@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
struct net_device *netdev = adapter->netdev;
int status;

-	if (netif_running(netdev))

+   if (netif_running(netdev)) {
+   /* prevent netdev watchdog during tx queue destroy */
+   netif_carrier_off(netdev);
be_close(netdev);
+   }

This will make userspace networking daemons will think that the link
went down.

This absolutely should not be a side effect of making a simple
TX queue configuration change via ethtool.

Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by 
number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback. 
The patch that Petr sent does the practically the same thing like this one:

commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend 
Date:   Tue Apr 27 02:13:39 2010 +

ixgbe: ixgbe_down needs to stop dev_watchdog

There is a small race between when the tx queues are stopped
and when netif_carrier_off() is called in ixgbe_down.  If the
dev_watchdog() timer fires during this time it is possible for
a false tx timeout to occur.

This patch moves the netif_carrier_off() so that it is called before
the tx queues are stopped preventing the dev_watchdog timer from
detecting false tx timeouts.  The race is seen occosionally when
FCoE or DCB settings are being configured or changed.

Testing note, running ifconfig up/down will not reproduce this
issue because dev_open/dev_close call dev_deactivate() and then
dev_activate().

where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down() 
to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels() 
this way:

ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()

As I said the similar approach is used by other drivers as well.

The mlx4 driver resolves this situation differently. It calls 
mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that 
causes that netif_device_detach() is called prior stopping of Tx queues. This 
also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An 
advantage is netif_device_detach() does not fire linkwatch events.

So... what ways is the _right_ one ?

Thanks,
Ivan

Re: [PATCH] Revert "be2net: remove desc field from be_eq_obj"

2018-10-23 Thread Ivan Vecera

On 23.10.2018 20:03, David Miller wrote:
> From: Ivan Vecera 
> Date: Tue, 23 Oct 2018 16:40:26 +0200
> 
>> The mentioned commit needs to be reverted because we cannot pass
>> string allocated on stack to request_irq(). This function stores
>> uses this pointer for later use (e.g. /proc/interrupts) so we need
>> to keep this string persistently.
>>
>> Fixes: d6d9704af8f4 ("be2net: remove desc field from be_eq_obj")
>>
>> Signed-off-by: Ivan Vecera 
> 
> Please do not put empty lines between Fixes and the other tags,
> all tags should appear together as one contiguous group.
Sure Dave... sorry for that.

Ivan

Re: [PATCH iproute2] iplink: report drop stats for VFs

2018-07-25 Thread Ivan Vecera

On 25.7.2018 19:04, David Ahern wrote:
> On 7/25/18 10:22 AM, Ivan Vecera wrote:
>> Kernel commit c5a9f6f0ab40 ("net/core: Add drop counters to VF
>> statistics") added support for Rx/Tx packet drops but these stats are
>> not reported by 'ip link'.
>>
>> Cc: Eugenia Emantayev 
>> Cc: Saeed Mahameed 
>>
>> Signed-off-by: Ivan Vecera 
>> ---
>>  ip/ipaddress.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
> 
> duplicates a patch from Eran which I just committed to iproute2-next
> 
Oops, I missed it :-)

I.

[PATCH iproute2] iplink: report drop stats for VFs

2018-07-25 Thread Ivan Vecera

Kernel commit c5a9f6f0ab40 ("net/core: Add drop counters to VF
statistics") added support for Rx/Tx packet drops but these stats are
not reported by 'ip link'.

Cc: Eugenia Emantayev 
Cc: Saeed Mahameed 

Signed-off-by: Ivan Vecera 
---
 ip/ipaddress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index ea8211c1..1b5ec02a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -558,6 +558,8 @@ static void print_vf_stats64(FILE *fp, struct rtattr 
*vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
print_u64(PRINT_JSON, "packets", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
+   print_u64(PRINT_JSON, "dropped", NULL,
+  rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
print_u64(PRINT_JSON, "multicast", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_u64(PRINT_JSON, "broadcast", NULL,
@@ -570,26 +572,31 @@ static void print_vf_stats64(FILE *fp, struct rtattr 
*vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_u64(PRINT_JSON, "tx_packets", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   print_u64(PRINT_JSON, "tx_dropped", NULL,
+  rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
close_json_object();
close_json_object();
} else {
/* RX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "RX: bytes  packets  mcast   bcast %s", _SL_);
+   fprintf(fp, "RX: bytes  packets  dropped mcast   bcast %s",
+   _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
+   print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
 
/* TX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "TX: bytes  packets %s", _SL_);
+   fprintf(fp, "TX: bytes  packets  dropped %s", _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
}
 }
 
-- 
2.16.4

[PATCH net-next v2 0/8] be2net: small structures clean-up

2018-07-10 Thread Ivan Vecera

The series:
- removes unused / unneccessary fields in several be2net structures
- re-order fields in some structures to eliminate holes, cache-lines
  crosses
- as result reduces size of main struct be_adapter by 4kB

Ivan Vecera (8):
  be2net: remove unused old AIC info
  be2net: remove unused old custom busy-poll fields
  be2net: remove desc field from be_eq_obj
  be2net: reorder fields in be_eq_obj structure
  be2net: move txcp field in be_tx_obj to eliminate holes in the struct
  be2net: remove unused tx_jiffies field from be_tx_stats
  be2net: re-order fields in be_error_recovert to avoid hole
  be2net: move rss_flags field in rss_info to ensure proper alignment

 drivers/net/ethernet/emulex/benet/be.h  | 39 +++--
 drivers/net/ethernet/emulex/benet/be_main.c |  6 +++--
 2 files changed, 13 insertions(+), 32 deletions(-)

-- 
2.16.4

[PATCH net-next v2 3/8] be2net: remove desc field from be_eq_obj

2018-07-10 Thread Ivan Vecera

The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h  | 1 -
 drivers/net/ethernet/emulex/benet/be_main.c | 6 --
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index a4604dea4560..e71e5e592626 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -185,7 +185,6 @@ static inline void queue_tail_inc(struct be_queue_info *q)
 
 struct be_eq_obj {
struct be_queue_info q;
-   char desc[32];
 
u8 idx; /* array index */
u8 msix_idx;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 8f755009ff38..05e4c0bb25f4 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3403,9 +3403,11 @@ static int be_msix_register(struct be_adapter *adapter)
int status, i, vec;
 
for_all_evt_queues(adapter, eqo, i) {
-   sprintf(eqo->desc, "%s-q%d", netdev->name, i);
+   char irq_name[IFNAMSIZ+4];
+
+   snprintf(irq_name, sizeof(irq_name), "%s-q%d", netdev->name, i);
vec = be_msix_vec_get(adapter, eqo);
-   status = request_irq(vec, be_msix, 0, eqo->desc, eqo);
+   status = request_irq(vec, be_msix, 0, irq_name, eqo);
if (status)
goto err_msix;
 
-- 
2.16.4

[PATCH net-next v2 6/8] be2net: remove unused tx_jiffies field from be_tx_stats

2018-07-10 Thread Ivan Vecera

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 91ca8d132e87..d521364e17cf 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -217,7 +217,6 @@ struct be_tx_stats {
u64 tx_vxlan_offload_pkts;
u64 tx_reqs;
u64 tx_compl;
-   ulong tx_jiffies;
u32 tx_stops;
u32 tx_drv_drops;   /* pkts dropped by driver */
/* the error counters are described in be_ethtool.c */
-- 
2.16.4

[PATCH net-next v2 2/8] be2net: remove unused old custom busy-poll fields

2018-07-10 Thread Ivan Vecera

The commit fb6113e688e0 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.

Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 6cf9d106c989..a4604dea4560 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -193,19 +193,6 @@ struct be_eq_obj {
struct napi_struct napi;
struct be_adapter *adapter;
cpumask_var_t  affinity_mask;
-
-#ifdef CONFIG_NET_RX_BUSY_POLL
-#define BE_EQ_IDLE 0
-#define BE_EQ_NAPI 1   /* napi owns this EQ */
-#define BE_EQ_POLL 2   /* poll owns this EQ */
-#define BE_EQ_LOCKED   (BE_EQ_NAPI | BE_EQ_POLL)
-#define BE_EQ_NAPI_YIELD   4   /* napi yielded this EQ */
-#define BE_EQ_POLL_YIELD   8   /* poll yielded this EQ */
-#define BE_EQ_YIELD(BE_EQ_NAPI_YIELD | BE_EQ_POLL_YIELD)
-#define BE_EQ_USER_PEND(BE_EQ_POLL | BE_EQ_POLL_YIELD)
-   unsigned int state;
-   spinlock_t lock;/* lock to serialize napi and busy-poll */
-#endif  /* CONFIG_NET_RX_BUSY_POLL */
 } cacheline_aligned_in_smp;
 
 struct be_aic_obj {/* Adaptive interrupt coalescing (AIC) info */
-- 
2.16.4

[PATCH net-next v2 5/8] be2net: move txcp field in be_tx_obj to eliminate holes in the struct

2018-07-10 Thread Ivan Vecera

Before patch:
struct be_tx_obj {
u32db_offset;/* 0 4 */

/* XXX 4 bytes hole, try to pack */

struct be_queue_info   q;/* 856 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info   cq;   /*6456 */
struct be_tx_compl_infotxcp; /*   120 4 */

/* XXX 4 bytes hole, try to pack */

/* --- cacheline 2 boundary (128 bytes) --- */
struct sk_buff *   sent_skb_list[2048];  /*   128 16384 */
...
}:

After patch:
struct be_tx_obj {
u32db_offset;/* 0 4 */
struct be_tx_compl_infotxcp; /* 4 4 */
struct be_queue_info   q;/* 856 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info   cq;   /*6456 */
struct sk_buff *   sent_skb_list[2048];  /*   120 16384 */
...
};

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 716b4bc410f5..91ca8d132e87 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -240,9 +240,9 @@ struct be_tx_compl_info {
 
 struct be_tx_obj {
u32 db_offset;
+   struct be_tx_compl_info txcp;
struct be_queue_info q;
struct be_queue_info cq;
-   struct be_tx_compl_info txcp;
/* Remember the skbs that were transmitted */
struct sk_buff *sent_skb_list[TX_Q_LEN];
struct be_tx_stats stats;
-- 
2.16.4

[PATCH net-next v2 7/8] be2net: re-order fields in be_error_recovert to avoid hole

2018-07-10 Thread Ivan Vecera

- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union

These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index d521364e17cf..4f805be43180 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -522,11 +522,13 @@ enum {
 };
 
 struct be_error_recovery {
-   /* Lancer error recovery variables */
-   u8 recovery_retries;
+   union {
+   u8 recovery_retries;/* used for Lancer  */
+   u8 recovery_state;  /* used for BEx and Skyhawk */
+   };
 
/* BEx/Skyhawk error recovery variables */
-   u8 recovery_state;
+   bool recovery_supported;
u16 ue_to_reset_time;   /* Time after UE, to soft reset
 * the chip - PF0 only
 */
@@ -534,7 +536,6 @@ struct be_error_recovery {
 * of SLIPORT_SEMAPHORE reg
 */
u16 last_err_code;
-   bool recovery_supported;
unsigned long probe_time;
unsigned long last_recovery_time;
 
-- 
2.16.4

[PATCH net-next v2 4/8] be2net: reorder fields in be_eq_obj structure

2018-07-10 Thread Ivan Vecera

Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index e71e5e592626..716b4bc410f5 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -186,11 +186,11 @@ static inline void queue_tail_inc(struct be_queue_info *q)
 struct be_eq_obj {
struct be_queue_info q;
 
+   struct be_adapter *adapter;
+   struct napi_struct napi;
u8 idx; /* array index */
u8 msix_idx;
u16 spurious_intr;
-   struct napi_struct napi;
-   struct be_adapter *adapter;
cpumask_var_t  affinity_mask;
 } cacheline_aligned_in_smp;
 
-- 
2.16.4

[PATCH net-next v2 8/8] be2net: move rss_flags field in rss_info to ensure proper alignment

2018-07-10 Thread Ivan Vecera

The current position of .rss_flags field in struct rss_info causes
that fields .rsstable and .rssqueue (both 128 bytes long) crosses
cache-line boundaries. Moving it at the end properly align all fields.

Before patch:
struct rss_info {
u64rss_flags;/* 0 8 */
u8 rsstable[128];/* 8   128 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
u8 rss_queue[128];   /*   136   128 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
u8 rss_hkey[40]; /*   26440 */
};

After patch:
struct rss_info {
u8 rsstable[128];/* 0   128 */
/* --- cacheline 2 boundary (128 bytes) --- */
u8 rss_queue[128];   /*   128   128 */
/* --- cacheline 4 boundary (256 bytes) --- */
u8 rss_hkey[40]; /*   25640 */
u64rss_flags;/*   296 8 */
};

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 4f805be43180..7005949dc17b 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -436,10 +436,10 @@ struct be_port_resources {
 #define be_is_os2bmc_enabled(adapter) (adapter->flags & BE_FLAGS_OS2BMC)
 
 struct rss_info {
-   u64 rss_flags;
u8 rsstable[RSS_INDIR_TABLE_LEN];
u8 rss_queue[RSS_INDIR_TABLE_LEN];
u8 rss_hkey[RSS_HASH_KEY_LEN];
+   u64 rss_flags;
 };
 
 #define BE_INVALID_DIE_TEMP0xFF
-- 
2.16.4

[PATCH net-next v2 1/8] be2net: remove unused old AIC info

2018-07-10 Thread Ivan Vecera

The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.

Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 382891f81e09..6cf9d106c989 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -187,13 +187,6 @@ struct be_eq_obj {
struct be_queue_info q;
char desc[32];
 
-   /* Adaptive interrupt coalescing (AIC) info */
-   bool enable_aic;
-   u32 min_eqd;/* in usecs */
-   u32 max_eqd;/* in usecs */
-   u32 eqd;/* configured val when aic is off */
-   u32 cur_eqd;/* in usecs */
-
u8 idx; /* array index */
u8 msix_idx;
u16 spurious_intr;
-- 
2.16.4

[PATCH net-next 2/8] be2net: remove unused old custom busy-poll fields

2018-06-21 Thread Ivan Vecera

The commit fb6113e688e0 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.

Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 6cf9d106c989..a4604dea4560 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -193,19 +193,6 @@ struct be_eq_obj {
struct napi_struct napi;
struct be_adapter *adapter;
cpumask_var_t  affinity_mask;
-
-#ifdef CONFIG_NET_RX_BUSY_POLL
-#define BE_EQ_IDLE 0
-#define BE_EQ_NAPI 1   /* napi owns this EQ */
-#define BE_EQ_POLL 2   /* poll owns this EQ */
-#define BE_EQ_LOCKED   (BE_EQ_NAPI | BE_EQ_POLL)
-#define BE_EQ_NAPI_YIELD   4   /* napi yielded this EQ */
-#define BE_EQ_POLL_YIELD   8   /* poll yielded this EQ */
-#define BE_EQ_YIELD(BE_EQ_NAPI_YIELD | BE_EQ_POLL_YIELD)
-#define BE_EQ_USER_PEND(BE_EQ_POLL | BE_EQ_POLL_YIELD)
-   unsigned int state;
-   spinlock_t lock;/* lock to serialize napi and busy-poll */
-#endif  /* CONFIG_NET_RX_BUSY_POLL */
 } cacheline_aligned_in_smp;
 
 struct be_aic_obj {/* Adaptive interrupt coalescing (AIC) info */
-- 
2.16.4

[PATCH net-next 4/8] be2net: reorder fields in be_eq_obj structure

2018-06-21 Thread Ivan Vecera

Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index e71e5e592626..716b4bc410f5 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -186,11 +186,11 @@ static inline void queue_tail_inc(struct be_queue_info *q)
 struct be_eq_obj {
struct be_queue_info q;
 
+   struct be_adapter *adapter;
+   struct napi_struct napi;
u8 idx; /* array index */
u8 msix_idx;
u16 spurious_intr;
-   struct napi_struct napi;
-   struct be_adapter *adapter;
cpumask_var_t  affinity_mask;
 } cacheline_aligned_in_smp;
 
-- 
2.16.4

[PATCH net-next 5/8] be2net: move txcp field in be_tx_obj to eliminate holes in the struct

2018-06-21 Thread Ivan Vecera

Before patch:
struct be_tx_obj {
u32db_offset;/* 0 4 */

/* XXX 4 bytes hole, try to pack */

struct be_queue_info   q;/* 856 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info   cq;   /*6456 */
struct be_tx_compl_infotxcp; /*   120 4 */

/* XXX 4 bytes hole, try to pack */

/* --- cacheline 2 boundary (128 bytes) --- */
struct sk_buff *   sent_skb_list[2048];  /*   128 16384 */
...
}:

After patch:
struct be_tx_obj {
u32db_offset;/* 0 4 */
struct be_tx_compl_infotxcp; /* 4 4 */
struct be_queue_info   q;/* 856 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct be_queue_info   cq;   /*6456 */
struct sk_buff *   sent_skb_list[2048];  /*   120 16384 */
...
};

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 716b4bc410f5..91ca8d132e87 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -240,9 +240,9 @@ struct be_tx_compl_info {
 
 struct be_tx_obj {
u32 db_offset;
+   struct be_tx_compl_info txcp;
struct be_queue_info q;
struct be_queue_info cq;
-   struct be_tx_compl_info txcp;
/* Remember the skbs that were transmitted */
struct sk_buff *sent_skb_list[TX_Q_LEN];
struct be_tx_stats stats;
-- 
2.16.4

[PATCH net-next 7/8] be2net: re-order fields in be_error_recovert to avoid hole

2018-06-21 Thread Ivan Vecera

- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union

These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index d521364e17cf..4f805be43180 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -522,11 +522,13 @@ enum {
 };
 
 struct be_error_recovery {
-   /* Lancer error recovery variables */
-   u8 recovery_retries;
+   union {
+   u8 recovery_retries;/* used for Lancer  */
+   u8 recovery_state;  /* used for BEx and Skyhawk */
+   };
 
/* BEx/Skyhawk error recovery variables */
-   u8 recovery_state;
+   bool recovery_supported;
u16 ue_to_reset_time;   /* Time after UE, to soft reset
 * the chip - PF0 only
 */
@@ -534,7 +536,6 @@ struct be_error_recovery {
 * of SLIPORT_SEMAPHORE reg
 */
u16 last_err_code;
-   bool recovery_supported;
unsigned long probe_time;
unsigned long last_recovery_time;
 
-- 
2.16.4

[PATCH net-next 6/8] be2net: remove unused tx_jiffies field from be_tx_stats

2018-06-21 Thread Ivan Vecera

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 91ca8d132e87..d521364e17cf 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -217,7 +217,6 @@ struct be_tx_stats {
u64 tx_vxlan_offload_pkts;
u64 tx_reqs;
u64 tx_compl;
-   ulong tx_jiffies;
u32 tx_stops;
u32 tx_drv_drops;   /* pkts dropped by driver */
/* the error counters are described in be_ethtool.c */
-- 
2.16.4

[PATCH net-next 3/8] be2net: remove desc field from be_eq_obj

2018-06-21 Thread Ivan Vecera

The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h  | 1 -
 drivers/net/ethernet/emulex/benet/be_main.c | 6 --
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index a4604dea4560..e71e5e592626 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -185,7 +185,6 @@ static inline void queue_tail_inc(struct be_queue_info *q)
 
 struct be_eq_obj {
struct be_queue_info q;
-   char desc[32];
 
u8 idx; /* array index */
u8 msix_idx;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 8f755009ff38..05e4c0bb25f4 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3403,9 +3403,11 @@ static int be_msix_register(struct be_adapter *adapter)
int status, i, vec;
 
for_all_evt_queues(adapter, eqo, i) {
-   sprintf(eqo->desc, "%s-q%d", netdev->name, i);
+   char irq_name[IFNAMSIZ+4];
+
+   snprintf(irq_name, sizeof(irq_name), "%s-q%d", netdev->name, i);
vec = be_msix_vec_get(adapter, eqo);
-   status = request_irq(vec, be_msix, 0, eqo->desc, eqo);
+   status = request_irq(vec, be_msix, 0, irq_name, eqo);
if (status)
goto err_msix;
 
-- 
2.16.4

[PATCH net-next 8/8] be2net: move rss_flags field in rss_info to ensure proper alignment

2018-06-21 Thread Ivan Vecera

The current position of .rss_flags field in struct rss_info causes
that fields .rsstable and .rssqueue (both 128 bytes long) crosses
cache-line boundaries. Moving it at the end properly align all fields.

Before patch:
struct rss_info {
u64rss_flags;/* 0 8 */
u8 rsstable[128];/* 8   128 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
u8 rss_queue[128];   /*   136   128 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
u8 rss_hkey[40]; /*   26440 */
};

After patch:
struct rss_info {
u8 rsstable[128];/* 0   128 */
/* --- cacheline 2 boundary (128 bytes) --- */
u8 rss_queue[128];   /*   128   128 */
/* --- cacheline 4 boundary (256 bytes) --- */
u8 rss_hkey[40]; /*   25640 */
u64rss_flags;/*   296 8 */
};

Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 4f805be43180..7005949dc17b 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -436,10 +436,10 @@ struct be_port_resources {
 #define be_is_os2bmc_enabled(adapter) (adapter->flags & BE_FLAGS_OS2BMC)
 
 struct rss_info {
-   u64 rss_flags;
u8 rsstable[RSS_INDIR_TABLE_LEN];
u8 rss_queue[RSS_INDIR_TABLE_LEN];
u8 rss_hkey[RSS_HASH_KEY_LEN];
+   u64 rss_flags;
 };
 
 #define BE_INVALID_DIE_TEMP0xFF
-- 
2.16.4

[PATCH net-next 1/8] be2net: remove unused old AIC info

2018-06-21 Thread Ivan Vecera

The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.

Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera 
---
 drivers/net/ethernet/emulex/benet/be.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 382891f81e09..6cf9d106c989 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -187,13 +187,6 @@ struct be_eq_obj {
struct be_queue_info q;
char desc[32];
 
-   /* Adaptive interrupt coalescing (AIC) info */
-   bool enable_aic;
-   u32 min_eqd;/* in usecs */
-   u32 max_eqd;/* in usecs */
-   u32 eqd;/* configured val when aic is off */
-   u32 cur_eqd;/* in usecs */
-
u8 idx; /* array index */
u8 msix_idx;
u16 spurious_intr;
-- 
2.16.4

[PATCH net-next 0/8] be2net: small structures clean-up

2018-06-21 Thread Ivan Vecera

The series:
- removes unused / unneccessary fields in several be2net structures
- re-order fields in some structures to eliminate holes, cache-lines
  crosses
- as result reduces size of main struct be_adapter by 4kB

Ivan Vecera (8):
  be2net: remove unused old AIC info
  be2net: remove unused old custom busy-poll fields
  be2net: remove desc field from be_eq_obj
  be2net: reorder fields in be_eq_obj structure
  be2net: move txcp field in be_tx_obj to eliminate holes in the struct
  be2net: remove unused tx_jiffies field from be_tx_stats
  be2net: re-order fields in be_error_recovert to avoid hole
  be2net: move rss_flags field in rss_info to ensure proper alignment

 drivers/net/ethernet/emulex/benet/be.h  | 39 +++--
 drivers/net/ethernet/emulex/benet/be_main.c |  6 +++--
 2 files changed, 13 insertions(+), 32 deletions(-)

-- 
2.16.4

Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver

2018-06-21 Thread Ivan Vecera

On 20.6.2018 20:03, Ilias Apalodimas wrote:
> Hi Florian,
> 
> On Wed, Jun 20, 2018 at 10:58:26AM -0700, Florian Fainelli wrote:
>> On 06/20/2018 10:51 AM, Ilias Apalodimas wrote:
>>> Hello Ivan,
>>> On Wed, Jun 20, 2018 at 02:56:48PM +0200, Ivan Vecera wrote:
>>>> On 18.6.2018 22:19, Ilias Apalodimas wrote:
>>>>> Jiri proposed using devlink, which makes sense, but i am not sure it's
>>>>> applicable on this patchset. This will change the driver completely and 
>>>>> will
>>>>> totally break backwards compatibility.
>>>>
>>>> Another good reason for a new driver.
>>>>
>>>> I.
>>> This is actually conflicting at least to my understanding. Jiri proposed 
>>> using 
>>> devlink was used as an alternative method to enable a new mode instead of 
>>> adding it on a .config option. A new driver wouldn't have a need for that 
>>> right?
>>
>> Correct, with a new driver would likely behave correctly upon being
>> probed such that you could have your switch ports act as normal network
>> devices from which you could run IP-config and do NFS boot.
> The current driver also does NFS properly and the 2 ethernet ports act as 
> normal
> network interfaces. The NFS section in the cover letter
> is to cover the cases were users running on NFS need to change the running
> switch configuration(starting from adding the 2 interfaces on a bridge). 
> Since iproute2 is located on the NFS filesystem the moment
> network connectivity is lost, you loose the ability to perform further
> configuration and in certian configuration scenarios render the device
> unusable.

Yes, with a new driver you can drop NFS-boot hack you mentioned in cover letter.
All configuration is done during driver probe and thus prior NFS mount. Only
thing you loose with a new driver is backward compatibility but the question is:
DO you really need it?

Ivan

Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver

2018-06-20 Thread Ivan Vecera

On 20.6.2018 14:59, Ilias Apalodimas wrote:
> On Wed, Jun 20, 2018 at 02:53:47PM +0200, Ivan Vecera wrote:
>> On 20.6.2018 09:08, Jiri Pirko wrote:
>>> Tue, Jun 19, 2018 at 01:19:00AM CEST, grygorii.stras...@ti.com wrote:
>>>>
>>>>
>>>> On 06/14/2018 06:43 AM, Ilias Apalodimas wrote:
>>>>> On Thu, Jun 14, 2018 at 01:39:58PM +0200, Jiri Pirko wrote:
>>>>>> Thu, Jun 14, 2018 at 01:34:04PM CEST, ilias.apalodi...@linaro.org wrote:
>>>>>>> On Thu, Jun 14, 2018 at 01:30:28PM +0200, Jiri Pirko wrote:
>>>>>>>> Thu, Jun 14, 2018 at 01:11:30PM CEST, ilias.apalodi...@linaro.org 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>> @@ -2711,6 +2789,10 @@ static int cpsw_probe_dt(struct 
>>>>>>>>> cpsw_platform_data *data,
>>>>>>>>>   if (of_property_read_bool(node, "dual_emac"))
>>>>>>>>>   data->switch_mode = CPSW_DUAL_EMAC;
>>>>>>>>>
>>>>>>>>> + /* switchdev overrides DTS */
>>>>>>>>> + if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
>>>>>>>>> + data->switch_mode = CPSW_SWITCHDEV;
>>>>>>>>
>>>>>>>> So you force CPSW_SWITCHDEV mode if the CONFIG_TI_CPSW_SWITCHDEV is
>>>>>>>> enabled. That does not sound right. I think that user should tell what
>>>>>>>> mode does he want regardless what the kernel config is.
>>>>>>> We discussed this during the V1 of the RFC. Yes it doesn't seem good, 
>>>>>>> but the
>>>>>>> device currently configures the modes using DTS (which is not correct). 
>>>>>>> I choose
>>>>>>> the .config due to that. I can't think of anything better, but i am 
>>>>>>> open to
>>>>>>> suggestions
>>>>>>
>>>>>> Agreed that DTS does fit as well. I think that this might be a job for
>>>>>> devlink parameters (patchset is going to be sent upstream next week).
>>>>>> You do have 1 bus address for the whole device (both ports), right?
>>>>>>
>>>>> Yes devlink sounds reasonable. I thyink there's only one bus for it, but 
>>>>> then
>>>>> again i am far from an expert on the hardware interrnals. Grygorii can 
>>>>> correct
>>>>> me if i am wrong.
>>>>
>>>> Devlink and NFS boot are not compatible as per my understanding, so .. 
>>>
>>> ? Why do you say so?
>>
>> Why aren't they compatible?? You can have devlink binary in initramfs and
>> configure the ASIC and ports via devlink batch file - prior mounting NFS 
>> root.
> This is doable but, are we taking into consideration device reconfiguration 
> (which was the reason for creating the minimal filesystem) or are we just 
> talking about device booting/initial config?
Devlink calls should be placed in setup.sh

I.

Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver

2018-06-20 Thread Ivan Vecera

On 18.6.2018 22:19, Ilias Apalodimas wrote:
> Jiri proposed using devlink, which makes sense, but i am not sure it's
> applicable on this patchset. This will change the driver completely and will
> totally break backwards compatibility.

Another good reason for a new driver.

I.

Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver

2018-06-20 Thread Ivan Vecera

On 20.6.2018 09:08, Jiri Pirko wrote:
> Tue, Jun 19, 2018 at 01:19:00AM CEST, grygorii.stras...@ti.com wrote:
>>
>>
>> On 06/14/2018 06:43 AM, Ilias Apalodimas wrote:
>>> On Thu, Jun 14, 2018 at 01:39:58PM +0200, Jiri Pirko wrote:
 Thu, Jun 14, 2018 at 01:34:04PM CEST, ilias.apalodi...@linaro.org wrote:
> On Thu, Jun 14, 2018 at 01:30:28PM +0200, Jiri Pirko wrote:
>> Thu, Jun 14, 2018 at 01:11:30PM CEST, ilias.apalodi...@linaro.org wrote:
>>
>> [...]
>>
>>> @@ -2711,6 +2789,10 @@ static int cpsw_probe_dt(struct 
>>> cpsw_platform_data *data,
>>> if (of_property_read_bool(node, "dual_emac"))
>>> data->switch_mode = CPSW_DUAL_EMAC;
>>>
>>> +   /* switchdev overrides DTS */
>>> +   if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
>>> +   data->switch_mode = CPSW_SWITCHDEV;
>>
>> So you force CPSW_SWITCHDEV mode if the CONFIG_TI_CPSW_SWITCHDEV is
>> enabled. That does not sound right. I think that user should tell what
>> mode does he want regardless what the kernel config is.
> We discussed this during the V1 of the RFC. Yes it doesn't seem good, but 
> the
> device currently configures the modes using DTS (which is not correct). I 
> choose
> the .config due to that. I can't think of anything better, but i am open 
> to
> suggestions

 Agreed that DTS does fit as well. I think that this might be a job for
 devlink parameters (patchset is going to be sent upstream next week).
 You do have 1 bus address for the whole device (both ports), right?

>>> Yes devlink sounds reasonable. I thyink there's only one bus for it, but 
>>> then
>>> again i am far from an expert on the hardware interrnals. Grygorii can 
>>> correct
>>> me if i am wrong.
>>
>> Devlink and NFS boot are not compatible as per my understanding, so .. 
> 
> ? Why do you say so?

Why aren't they compatible?? You can have devlink binary in initramfs and
configure the ASIC and ports via devlink batch file - prior mounting NFS root.

>>
>> Again, current driver, as is, supports NFS boot in all modes
>> (how good is the cur driver is question which out of scope of this 
>> discussion).
>>
>> And we discussed this in RFC v1 - driver mode selection will be changed 
>> if we will proceed and it will be new driver for proper switch support.
>>
>> Not sure about about Devlink (need to study it and we never got any requests 
>> from end
>> users for this as I know), and I'd like to note (again) that this is 
>> embedded 
>> (industrial/automotive etc), so everything preferred to be simple, fast and
>> preferably configured statically (in most of the cases end users what boot 
>> time 
>> configuration).
> 
> You need to study it indeed.
> 
>>
>> -- 
>> regards,
>> -grygorii

[PATCH ethtool 6/6] ethtool: remove unreachable code

2018-06-08 Thread Ivan Vecera

The default switch case is unreachable as the MAX_CHANNEL_NUM == 4.

Fixes: a5e73bb ("ethtool:QSFP Plus/QSFP28 Diagnostics Information Support")
Cc: Vidya Sagar Ravipati 
Signed-off-by: Ivan Vecera 
---
 qsfp.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/qsfp.c b/qsfp.c
index aecd5bb..32e195d 100644
--- a/qsfp.c
+++ b/qsfp.c
@@ -661,9 +661,6 @@ static void sff8636_dom_parse(const __u8 *id, struct 
sff_diags *sd)
tx_power_offset = SFF8636_TX_PWR_4_OFFSET;
tx_bias_offset = SFF8636_TX_BIAS_4_OFFSET;
break;
-   default:
-   printf(" Invalid channel: %d\n", i);
-   break;
}
sd->scd[i].bias_cur = OFFSET_TO_U16(tx_bias_offset);
sd->scd[i].rx_power = OFFSET_TO_U16(rx_power_offset);
-- 
2.16.4

[PATCH ethtool 1/6] ethtool: fix uninitialized return value

2018-06-08 Thread Ivan Vecera

Fixes: b0fe96d ("Ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE 
and PHY downshift")
Signed-off-by: Ivan Vecera 
---
 ethtool.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 2e87384..e7495fe 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4723,8 +4723,8 @@ static int do_get_phy_tunable(struct cmd_context *ctx)
 {
int argc = ctx->argc;
char **argp = ctx->argp;
-   int err, i;
u8 downshift_changed = 0;
+   int i;
 
if (argc < 1)
exit_bad_args();
@@ -4750,8 +4750,7 @@ static int do_get_phy_tunable(struct cmd_context *ctx)
cont.ds.id = ETHTOOL_PHY_DOWNSHIFT;
cont.ds.type_id = ETHTOOL_TUNABLE_U8;
cont.ds.len = 1;
-   err = send_ioctl(ctx, );
-   if (err < 0) {
+   if (send_ioctl(ctx, ) < 0) {
perror("Cannot Get PHY downshift count");
return 87;
}
@@ -4762,7 +4761,7 @@ static int do_get_phy_tunable(struct cmd_context *ctx)
fprintf(stdout, "Downshift disabled\n");
}
 
-   return err;
+   return 0;
 }
 
 static __u32 parse_reset(char *val, __u32 bitset, char *arg, __u32 *data)
-- 
2.16.4

[PATCH ethtool 3/6] ethtool: remove unused global variable

2018-06-08 Thread Ivan Vecera

Fixes: 2c2ee7a ("ethtool: Add support for sfc register dumps")
Signed-off-by: Ivan Vecera 
---
 sfc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/sfc.c b/sfc.c
index 9478b38..b4c590f 100644
--- a/sfc.c
+++ b/sfc.c
@@ -3083,9 +3083,6 @@ static const struct efx_nic_reg_field 
efx_nic_reg_fields_TX_PACE[] = {
REGISTER_FIELD_BZ(TX_PACE_SB_AF),
REGISTER_FIELD_BZ(TX_PACE_SB_NOT_AF),
 };
-static const struct efx_nic_reg_field efx_nic_reg_fields_TX_PACE_DROP_QID[] = {
-   REGISTER_FIELD_BZ(TX_PACE_QID_DRP_CNT),
-};
 static const struct efx_nic_reg_field efx_nic_reg_fields_TX_VLAN[] = {
REGISTER_FIELD_BB(TX_VLAN0),
REGISTER_FIELD_BB(TX_VLAN0_PORT0_EN),
-- 
2.16.4

[PATCH ethtool 4/6] ethtool: several fixes in do_gregs()

2018-06-08 Thread Ivan Vecera

- correctly close gregs_dump_file in case of fstat() failure
- check for error from realloc

Fixes: be4c2d0 ("ethtool.c: fix dump_regs heap corruption")
Cc: David Decotigny 
Signed-off-by: Ivan Vecera 
---
 ethtool.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/ethtool.c b/ethtool.c
index e7495fe..2b90984 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3179,17 +3179,26 @@ static int do_gregs(struct cmd_context *ctx)
if (!gregs_dump_raw && gregs_dump_file != NULL) {
/* overwrite reg values from file dump */
FILE *f = fopen(gregs_dump_file, "r");
+   struct ethtool_regs *nregs;
struct stat st;
size_t nread;
 
if (!f || fstat(fileno(f), ) < 0) {
fprintf(stderr, "Can't open '%s': %s\n",
gregs_dump_file, strerror(errno));
+   if (f)
+   fclose(f);
free(regs);
return 75;
}
 
-   regs = realloc(regs, sizeof(*regs) + st.st_size);
+   nregs = realloc(regs, sizeof(*regs) + st.st_size);
+   if (!nregs) {
+   perror("Cannot allocate memory for register dump");
+   free(regs); /* was not freed by realloc */
+   return 73;
+   }
+   regs = nregs;
regs->len = st.st_size;
nread = fread(regs->data, regs->len, 1, f);
fclose(f);
-- 
2.16.4

[PATCH ethtool 5/6] ethtool: correctly free hkey when get_stringset() fails

2018-06-08 Thread Ivan Vecera

Memory allocated for 'hkey' is not freed when
get_stringset(..., ETH_SS_RSS_HASH_FUNCS...) fails.

Fixes: b888f35 ("ethtool: Support for configurable RSS hash function")
Cc: Gal Pressman 
Signed-off-by: Ivan Vecera 
---
 ethtool.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 2b90984..fb93ae8 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3910,7 +3910,7 @@ static int do_srxfhindir(struct cmd_context *ctx, int 
rxfhindir_default,
 static int do_srxfh(struct cmd_context *ctx)
 {
struct ethtool_rxfh rss_head = {0};
-   struct ethtool_rxfh *rss;
+   struct ethtool_rxfh *rss = NULL;
struct ethtool_rxnfc ring_count;
int rxfhindir_equal = 0, rxfhindir_default = 0;
struct ethtool_gstrings *hfuncs = NULL;
@@ -4064,7 +4064,8 @@ static int do_srxfh(struct cmd_context *ctx)
hfuncs = get_stringset(ctx, ETH_SS_RSS_HASH_FUNCS, 0, 1);
if (!hfuncs) {
perror("Cannot get hash functions names");
-   return 1;
+   err = 1;
+   goto free;
}
 
for (i = 0; i < hfuncs->len && !req_hfunc ; i++) {
@@ -4078,8 +4079,8 @@ static int do_srxfh(struct cmd_context *ctx)
if (!req_hfunc) {
fprintf(stderr,
"Unknown hash function: %s\n", req_hfunc_name);
-   free(hfuncs);
-   return 1;
+   err = 1;
+   goto free;
}
}
 
@@ -4120,9 +4121,7 @@ static int do_srxfh(struct cmd_context *ctx)
}
 
 free:
-   if (hkey)
-   free(hkey);
-
+   free(hkey);
free(rss);
free(hfuncs);
return err;
-- 
2.16.4

[PATCH ethtool 2/6] ethtool: fix RING_VF assignment

2018-06-08 Thread Ivan Vecera

Fixes: 36ee712 ("ethtool: support queue and VF fields for rxclass filters")
Cc: Jacob Keller 
Signed-off-by: Ivan Vecera 
---
 rxclass.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rxclass.c b/rxclass.c
index ce4b382..42d122d 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -1066,7 +1066,7 @@ static int rxclass_get_val(char *str, unsigned char *p, 
u32 *flags,
val++;
 
*(u64 *)[opt->offset] &= ~ETHTOOL_RX_FLOW_SPEC_RING_VF;
-   *(u64 *)[opt->offset] = (u64)val << 
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+   *(u64 *)[opt->offset] |= (u64)val << 
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
break;
}
case OPT_RING_QUEUE: {
-- 
2.16.4

[PATCH iproute2] devlink: don't enforce NETLINK_{CAP,EXT}_ACK sock opts

2018-06-01 Thread Ivan Vecera

Since commit 049c58539f5d ("devlink: mnlg: Add support for extended ack")
devlink requires NETLINK_{CAP,EXT}_ACK. This prevents devlink from
working with older kernels that don't support these features.

host # ./devlink/devlink
Failed to connect to devlink Netlink

Fixes: 049c58539f5d ("devlink: mnlg: Add support for extended ack")
Cc: Arkadi Sharshevsky 
Cc: Stephen Hemminger 
Signed-off-by: Ivan Vecera 
---
 devlink/mnlg.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/devlink/mnlg.c b/devlink/mnlg.c
index 3d28453a..c33c90be 100644
--- a/devlink/mnlg.c
+++ b/devlink/mnlg.c
@@ -271,15 +271,9 @@ struct mnlg_socket *mnlg_socket_open(const char 
*family_name, uint8_t version)
if (!nlg->nl)
goto err_mnl_socket_open;
 
-   err = mnl_socket_setsockopt(nlg->nl, NETLINK_CAP_ACK, ,
-   sizeof(one));
-   if (err)
-   goto err_mnl_set_ack;
-
-   err = mnl_socket_setsockopt(nlg->nl, NETLINK_EXT_ACK, ,
-   sizeof(one));
-   if (err)
-   goto err_mnl_set_ext_ack;
+   /* Older kernels may no support capped/extended ACK reporting */
+   mnl_socket_setsockopt(nlg->nl, NETLINK_CAP_ACK, , sizeof(one));
+   mnl_socket_setsockopt(nlg->nl, NETLINK_EXT_ACK, , sizeof(one));
 
err = mnl_socket_bind(nlg->nl, 0, MNL_SOCKET_AUTOPID);
if (err < 0)
@@ -305,8 +299,6 @@ struct mnlg_socket *mnlg_socket_open(const char 
*family_name, uint8_t version)
 err_mnlg_socket_recv_run:
 err_mnlg_socket_send:
 err_mnl_socket_bind:
-err_mnl_set_ext_ack:
-err_mnl_set_ack:
mnl_socket_close(nlg->nl);
 err_mnl_socket_open:
free(nlg->buf);
-- 
2.16.1

Re: [PATCH 0/4] RFC CPSW switchdev mode

2018-05-24 Thread Ivan Vecera

On 24.5.2018 14:54, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 11:48:31AM +0300, Ilias Apalodimas wrote:
>> On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
>>> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodi...@linaro.org wrote:
>>> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
>> Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one 
>> here. 
>> The reason is that TI wants this configured differently from customer facing 
>> ports. Apparently there are existing customers already using the "feature". 
>> So OR'ing and adding the cpu port on every operation (add/del vlans add 
>> ucast/mcast entries etc) was less favoured. 
> 
> Hi Ilias
> 
> Nice to see this device moving away from its custom model and towards
> the switchdev model.
+1

> Did you consider making a clean break from the existing code and write
> a new driver. Let the existing customers using the existing
> driver. Have the new switchdev driver fully conform to switchdev.

I would also prefer fresh new driver. The existing one can be marked as
'bugfix-only' and later pertinently deprecated/removed.
> 
> I don't like having this 'cpu' interface. As you say, it breaks the
> switchhdev model. If we need to extend the switchdev model to support
> some use case, lets do that. Please can you fully describe the use
> cases, so we can discuss how to implement them cleanly within the
> switchdev model.
+1

Ivan

Re: [PATCH net-next mlxsw v2 2/2] net: bridge: Notify about !added_by_user FDB entries

2018-05-03 Thread Ivan Vecera

ridge *br, struct net_bridge_port 
> *p,
> -   const unsigned char *addr, u16 vid)
> +   const unsigned char *addr, u16 vid,
> +   bool swdev_notify)
>  {
>   struct net_bridge_fdb_entry *fdb;
>   int err = 0;
> @@ -1121,7 +1126,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, 
> struct net_bridge_port *p,
>  
>   fdb = br_fdb_find(br, addr, vid);
>   if (fdb && fdb->added_by_external_learn)
> - fdb_delete(br, fdb);
> + fdb_delete(br, fdb, swdev_notify);
>   else
>   err = -ENOENT;
>  
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 1a50931..80a69b8 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -553,9 +553,11 @@ int br_fdb_dump(struct sk_buff *skb, struct 
> netlink_callback *cb,
>  int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p);
>  void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p);
>  int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port 
> *p,
> -   const unsigned char *addr, u16 vid);
> +   const unsigned char *addr, u16 vid,
> +   bool swdev_notify);
>  int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port 
> *p,
> -   const unsigned char *addr, u16 vid);
> +   const unsigned char *addr, u16 vid,
> +   bool swdev_notify);
>  void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
> const unsigned char *addr, u16 vid);
>  
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index 71a03c4..35474d4 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -118,7 +118,7 @@ br_switchdev_fdb_call_notifiers(bool adding, const 
> unsigned char *mac,
>  void
>  br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
>  {
> - if (!fdb->added_by_user || !fdb->dst)
> + if (!fdb->dst)
>   return;
>  
>   switch (type) {
> 

Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net-next mlxsw v2 1/2] switchdev: Add fdb.added_by_user to switchdev notifications

2018-05-03 Thread Ivan Vecera

   fdb->added_by_user);
>   break;
>   case RTM_NEWNEIGH:
>   br_switchdev_fdb_call_notifiers(true, fdb->key.addr.addr,
>   fdb->key.vlan_id,
> - fdb->dst->dev);
> + fdb->dst->dev,
> + fdb->added_by_user);
>   break;
>   }
>  }
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index f3fb3a0..c287f1e 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1441,6 +1441,7 @@ static int dsa_slave_switchdev_event(struct 
> notifier_block *unused,
>unsigned long event, void *ptr)
>  {
>   struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> + struct switchdev_notifier_fdb_info *fdb_info = ptr;
>   struct dsa_switchdev_event_work *switchdev_work;
>  
>   if (!dsa_slave_dev_check(dev))
> @@ -1458,8 +1459,10 @@ static int dsa_slave_switchdev_event(struct 
> notifier_block *unused,
>   switch (event) {
>   case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
>   case SWITCHDEV_FDB_DEL_TO_DEVICE:
> + if (!fdb_info->added_by_user)
> + break;
>   if (dsa_slave_switchdev_fdb_work_init(switchdev_work,
> -   ptr))
> +   fdb_info))
>   goto err_fdb_work_init;
>   dev_hold(dev);
>   break;
> 
LGTM

Acked-by: Ivan Vecera <ivec...@redhat.com>

[PATCH net-next v2] ipv6: addrconf: don't evaluate keep_addr_on_down twice

2018-04-24 Thread Ivan Vecera

The addrconf_ifdown() evaluates keep_addr_on_down state twice. There
is no need to do it.

Cc: David Ahern <dsah...@gmail.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/ipv6/addrconf.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 7b4d7bbf2c17..fbfd71a2d9c8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3622,8 +3622,7 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
struct net *net = dev_net(dev);
struct inet6_dev *idev;
struct inet6_ifaddr *ifa, *tmp;
-   int _keep_addr;
-   bool keep_addr;
+   bool keep_addr = false;
int state, i;
 
ASSERT_RTNL();
@@ -3649,15 +3648,18 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
 
}
 
-   /* aggregate the system setting and interface setting */
-   _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
-   if (!_keep_addr)
-   _keep_addr = idev->cnf.keep_addr_on_down;
-
/* combine the user config with event to determine if permanent
 * addresses are to be removed from address hash table
 */
-   keep_addr = !(how || _keep_addr <= 0 || idev->cnf.disable_ipv6);
+   if (!how && !idev->cnf.disable_ipv6) {
+   /* aggregate the system setting and interface setting */
+   int _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
+
+   if (!_keep_addr)
+   _keep_addr = idev->cnf.keep_addr_on_down;
+
+   keep_addr = (_keep_addr > 0);
+   }
 
/* Step 2: clear hash table */
for (i = 0; i < IN6_ADDR_HSIZE; i++) {
@@ -3707,11 +3709,6 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
write_lock_bh(>lock);
}
 
-   /* re-combine the user config with event to determine if permanent
-* addresses are to be removed from the interface list
-*/
-   keep_addr = (!how && _keep_addr > 0 && !idev->cnf.disable_ipv6);
-
list_for_each_entry_safe(ifa, tmp, >addr_list, if_list) {
struct fib6_info *rt = NULL;
bool keep;
-- 
2.16.1

[PATCH net-next] ipv6: addrconf: don't evaluate keep_addr_on_down twice

2018-04-24 Thread Ivan Vecera

The addrconf_ifdown() evaluates keep_addr_on_down state twice. There
is no need to do it.

Cc: David Ahern <dsah...@gmail.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/ipv6/addrconf.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 78cef00c9596..f40e25fd15ee 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3612,8 +3612,7 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
struct net *net = dev_net(dev);
struct inet6_dev *idev;
struct inet6_ifaddr *ifa, *tmp;
-   int _keep_addr;
-   bool keep_addr;
+   bool keep_addr = false;
int state, i;
 
ASSERT_RTNL();
@@ -3639,15 +3638,18 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
 
}
 
-   /* aggregate the system setting and interface setting */
-   _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
-   if (!_keep_addr)
-   _keep_addr = idev->cnf.keep_addr_on_down;
-
/* combine the user config with event to determine if permanent
 * addresses are to be removed from address hash table
 */
-   keep_addr = !(how || _keep_addr <= 0 || idev->cnf.disable_ipv6);
+   if (!how && !idev->cnf.disable_ipv6) {
+   /* aggregate the system setting and interface setting */
+   int _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
+
+   if (!_keep_addr)
+   _keep_addr = idev->cnf.keep_addr_on_down;
+
+   keep_addr = (_keep_addr > 0);
+   }
 
/* Step 2: clear hash table */
for (i = 0; i < IN6_ADDR_HSIZE; i++) {
@@ -3697,11 +3699,6 @@ static int addrconf_ifdown(struct net_device *dev, int 
how)
write_lock_bh(>lock);
}
 
-   /* re-combine the user config with event to determine if permanent
-* addresses are to be removed from the interface list
-*/
-   keep_addr = (!how && _keep_addr > 0 && !idev->cnf.disable_ipv6);
-
list_for_each_entry_safe(ifa, tmp, >addr_list, if_list) {
struct rt6_info *rt = NULL;
bool keep;
-- 
2.16.1

[PATCH net] net/sched: cls_u32: fix cls_u32 on filter replace

2018-02-08 Thread Ivan Vecera

The following sequence is currently broken:

 # tc qdisc add dev foo ingress
 # tc filter replace dev foo protocol all ingress \
   u32 match u8 0 0 action mirred egress mirror dev bar1
 # tc filter replace dev foo protocol all ingress \
   handle 800::800 pref 49152 \
   u32 match u8 0 0 action mirred egress mirror dev bar2
 Error: cls_u32: Key node flags do not match passed flags.
 We have an error talking to the kernel, -1

The error comes from u32_change() when comparing new and
existing flags. The existing ones always contains one of
TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state.
These flags cannot be passed from userspace so the condition
(n->flags != flags) in u32_change() always fails.

Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
TCA_CLS_FLAGS_IN_HW are not taken into account.

Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status")
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 net/sched/cls_u32.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6311a548046b..c75e68e839c7 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -955,7 +955,8 @@ static int u32_change(struct net *net, struct sk_buff 
*in_skb,
return -EINVAL;
}
 
-   if (n->flags != flags) {
+   if ((n->flags ^ flags) &
+   ~(TCA_CLS_FLAGS_IN_HW | TCA_CLS_FLAGS_NOT_IN_HW)) {
NL_SET_ERR_MSG_MOD(extack, "Key node flags do not match 
passed flags");
return -EINVAL;
}
-- 
2.13.6

Re: [PATCH net v2] cls_u32: fix use after free in u32_destroy_key()

2018-02-02 Thread Ivan Vecera

==
> 
> The problem is that the htnode is freed before the linked knodes and the
> latter will try to access the first at u32_destroy_key() time.
> This change addresses the issue using the htnode refcnt to guarantee
> the correct free order. While at it also add a RCU annotation,
> to keep sparse happy.
> 
> v1 -> v2: use rtnl_derefence() instead of RCU read locks
> 
> Reported-by: Li Shuang <shu...@redhat.com>
> Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter")
> Signed-off-by: Paolo Abeni <pab...@redhat.com>
> ---
>  net/sched/cls_u32.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 60c892c36a60..10440fbf3c68 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -398,10 +398,12 @@ static int u32_init(struct tcf_proto *tp)
>  static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
>  bool free_pf)
>  {
> + struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
> +
>   tcf_exts_destroy(>exts);
>   tcf_exts_put_net(>exts);
> - if (n->ht_down)
> - n->ht_down->refcnt--;
> + if (ht && ht->refcnt-- == 0)
> + kfree(ht);
>  #ifdef CONFIG_CLS_U32_PERF
>   if (free_pf)
>   free_percpu(n->pf);
> @@ -624,7 +626,12 @@ static int u32_destroy_hnode(struct tcf_proto *tp, 
> struct tc_u_hnode *ht,
>   idr_destroy(>handle_idr);
>   idr_remove_ext(_c->handle_idr, ht->handle);
>   RCU_INIT_POINTER(*hn, ht->next);
> - kfree_rcu(ht, rcu);
> +
> + /* u32_destroy_key() will will later free ht for us, if
> +  * it's still referenced by some knode
> +  */
> + if (ht->refcnt == 0)
> + kfree_rcu(ht, rcu);
>   return 0;
>   }
>   }
> @@ -667,7 +674,11 @@ static void u32_destroy(struct tcf_proto *tp, struct 
> netlink_ext_ack *extack)
>  
>   while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
>   RCU_INIT_POINTER(tp_c->hlist, ht->next);
> - kfree_rcu(ht, rcu);
> + /* u32_destroy_key() will will later free ht for us, if
> +  * it's still referenced by some knode
> +  */
> + if (ht->refcnt == 0)
> + kfree_rcu(ht, rcu);
>   }
>  
>   idr_destroy(_c->handle_idr);
> 
Good catch, Paolo.

Tested-by: Ivan Vecera <ivec...@redhat.com>

[PATCH net] be2net: restore properly promisc mode after queues reconfiguration

2018-01-19 Thread Ivan Vecera

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>

Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending 
on number of Rx rings")
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index c6e859a27ee6..e180657a02ef 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4634,6 +4634,15 @@ int be_update_queues(struct be_adapter *adapter)
 
be_schedule_worker(adapter);
 
+   /*
+* The IF was destroyed and re-created. We need to clear
+* all promiscuous flags valid for the destroyed IF.
+* Without this promisc mode is not restored during
+* be_open() because the driver thinks that it is
+* already enabled in HW.
+*/
+   adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS;
+
if (netif_running(netdev))
status = be_open(netdev);
 
-- 
2.13.6

Re: [patch net resend] net: sched: cbq: create block for q->link.block

2017-11-28 Thread Ivan Vecera

On 27.11.2017 18:37, Jiri Pirko wrote:
> From: Jiri Pirko <j...@mellanox.com>
> 
> q->link.block is not initialized, that leads to EINVAL when one tries to
> add filter there. So initialize it properly.
> 
> This can be reproduced by:
> $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 
> 1000Mbit
> $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 
> 0 0x00 flowid 1:1
> 
> Reported-by: Jaroslav Aster <jas...@redhat.com>
> Reported-by: Ivan Vecera <ivec...@redhat.com>
> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  net/sched/sch_cbq.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
> index 6361be7..525eb3a 100644
> --- a/net/sched/sch_cbq.c
> +++ b/net/sched/sch_cbq.c
> @@ -1158,9 +1158,13 @@ static int cbq_init(struct Qdisc *sch, struct nlattr 
> *opt)
>   if ((q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB])) == NULL)
>   return -EINVAL;
>  
> + err = tcf_block_get(>link.block, >link.filter_list, sch);
> + if (err)
> + goto put_rtab;
> +
>   err = qdisc_class_hash_init(>clhash);
>   if (err < 0)
> - goto put_rtab;
> + goto put_block;
>  
>   q->link.sibling = >link;
>   q->link.common.classid = sch->handle;
> @@ -1194,6 +1198,9 @@ static int cbq_init(struct Qdisc *sch, struct nlattr 
> *opt)
>   cbq_addprio(q, >link);
>   return 0;
>  
> +put_block:
> + tcf_block_put(q->link.block);
> +
>  put_rtab:
>   qdisc_put_rtab(q->link.R_tab);
>   return err;
> 
Thanks for quick fix.

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH iproute2 2/2] devlink: add batch command support

2017-11-09 Thread Ivan Vecera

On 10.11.2017 07:57, Leon Romanovsky wrote:
> On Fri, Nov 10, 2017 at 07:20:14AM +0100, Ivan Vecera wrote:
>> The patch adds support to batch devlink commands.
>>
>> Cc: Jiri Pirko <j...@mellanox.com>
>> Cc: Arkadi Sharshevsky <arka...@mellanox.com>
>> Signed-off-by: Ivan Vecera <ivec...@redhat.com>
>> ---
>>  devlink/devlink.c  | 70 
>> +++---
>>  man/man8/devlink.8 | 16 +
>>  2 files changed, 78 insertions(+), 8 deletions(-)
>>
> 
> <..>
> 
>> diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
>> index a480766c..a975ef34 100644
>> --- a/man/man8/devlink.8
>> +++ b/man/man8/devlink.8
>> @@ -12,6 +12,12 @@ devlink \- Devlink tool
>>  .sp
>>
>>  .ti -8
>> +.B devlink
>> +.RB "[ " -force " ] "
>> +.BI "-batch " filename
>> +.sp
>> +
>> +.ti -8
>>  .IR OBJECT " := { "
>>  .BR dev " | " port " | " monitor " }"
>>  .sp
>> @@ -32,6 +38,16 @@ Print the version of the
>>  utility and exit.
>>
>>  .TP
>> +.BR "\-b", " \-batch " 
>> +Read commands from provided file or standard input and invoke them.
>> +First failure will cause termination of devlink.
> 
> It is worth to document the expected format of that file.
> And IMHO, it is better to have ability to load JSON fie which was
> generated by -j, instead of declaring new format/knob.
It's just a list of command-lines... like other utils (bridge,ip...)

I.



signature.asc
Description: OpenPGP digital signature

[PATCH iproute2 0/2] add batch command support to devlink

2017-11-09 Thread Ivan Vecera

This patch series adds support for devlink commands batching. The first
just removes a requirement to have declared 'resolve_hosts' variable in
any command that use any function implemented in utils.c (it is really
confusing to see this declaration in utils like bridge or devlink).

Ivan Vecera (2):
  lib: make resolve_hosts variable common
  devlink: add batch command support

 bridge/bridge.c|  1 -
 devlink/devlink.c  | 70 +++---
 genl/genl.c|  1 -
 ip/ip.c|  1 -
 ip/rtmon.c |  1 -
 lib/utils.c|  1 +
 man/man8/devlink.8 | 16 +
 misc/arpd.c|  2 --
 misc/ss.c  |  1 -
 tc/tc.c|  1 -
 10 files changed, 79 insertions(+), 16 deletions(-)

-- 
2.13.6

[PATCH iproute2 1/2] lib: make resolve_hosts variable common

2017-11-09 Thread Ivan Vecera

Any iproute utility that uses any function from lib/utils.c needs
to declare its own resolve_hosts variable instance although it does
not need/use hostname resolving functionality (currently only 'ip'
and 'ss' commands uses this).
The patch declares single common instance of resolve_hosts directly
in utils.c so the existing ones can be removed (the same approach
that is used for timestamp_short).

Cc: Jiri Pirko <j...@mellanox.com>
Cc: Arkadi Sharshevsky <arka...@mellanox.com>
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 bridge/bridge.c | 1 -
 genl/genl.c | 1 -
 ip/ip.c | 1 -
 ip/rtmon.c  | 1 -
 lib/utils.c | 1 +
 misc/arpd.c | 2 --
 misc/ss.c   | 1 -
 tc/tc.c | 1 -
 8 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index 5ff038d6..6658cb8f 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -18,7 +18,6 @@
 
 struct rtnl_handle rth = { .fd = -1 };
 int preferred_family = AF_UNSPEC;
-int resolve_hosts;
 int oneline;
 int show_stats;
 int show_details;
diff --git a/genl/genl.c b/genl/genl.c
index 747074b0..7e4a208d 100644
--- a/genl/genl.c
+++ b/genl/genl.c
@@ -30,7 +30,6 @@
 int show_stats = 0;
 int show_details = 0;
 int show_raw = 0;
-int resolve_hosts = 0;
 
 static void *BODY;
 static struct genl_util * genl_list;
diff --git a/ip/ip.c b/ip/ip.c
index e66f6970..e2da46dd 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -30,7 +30,6 @@ int human_readable;
 int use_iec;
 int show_stats;
 int show_details;
-int resolve_hosts;
 int oneline;
 int brief;
 int json;
diff --git a/ip/rtmon.c b/ip/rtmon.c
index 1c2981f7..94baa38e 100644
--- a/ip/rtmon.c
+++ b/ip/rtmon.c
@@ -25,7 +25,6 @@
 #include "utils.h"
 #include "libnetlink.h"
 
-int resolve_hosts;
 static int init_phase = 1;
 
 static void write_stamp(FILE *fp)
diff --git a/lib/utils.c b/lib/utils.c
index ac155bf5..f77be1fd 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -37,6 +37,7 @@
 #include "utils.h"
 #include "namespace.h"
 
+int resolve_hosts;
 int timestamp_short;
 
 int get_hex(char c)
diff --git a/misc/arpd.c b/misc/arpd.c
index c2666f76..67d86b67 100644
--- a/misc/arpd.c
+++ b/misc/arpd.c
@@ -38,8 +38,6 @@
 #include "utils.h"
 #include "rt_names.h"
 
-int resolve_hosts;
-
 DB *dbase;
 char   *dbname = "/var/lib/arpd/arpd.db";
 
diff --git a/misc/ss.c b/misc/ss.c
index 56a9ad41..45a0c330 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -88,7 +88,6 @@ static int security_get_initial_context(char *name,  char 
**context)
 }
 #endif
 
-int resolve_hosts;
 int resolve_services = 1;
 int preferred_family = AF_UNSPEC;
 int show_options;
diff --git a/tc/tc.c b/tc/tc.c
index 8e64a82b..32924164 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -39,7 +39,6 @@ int show_graph;
 int timestamp;
 
 int batch_mode;
-int resolve_hosts;
 int use_iec;
 int force;
 bool use_names;
-- 
2.13.6

[PATCH iproute2 2/2] devlink: add batch command support

2017-11-09 Thread Ivan Vecera

The patch adds support to batch devlink commands.

Cc: Jiri Pirko <j...@mellanox.com>
Cc: Arkadi Sharshevsky <arka...@mellanox.com>
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 devlink/devlink.c  | 70 +++---
 man/man8/devlink.8 | 16 +
 2 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 39cda067..1b15eef8 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -3803,12 +3803,16 @@ static int cmd_dpipe(struct dl *dl)
 static void help(void)
 {
pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
+  "   devlink [ -f[orce] ] -b[atch] filename\n"
   "where  OBJECT := { dev | port | sb | monitor | dpipe }\n"
   "   OPTIONS := { -V[ersion] | -n[no-nice-names] | -j[json] | 
-p[pretty] | -v[verbose] }\n");
 }
 
-static int dl_cmd(struct dl *dl)
+static int dl_cmd(struct dl *dl, int argc, char **argv)
 {
+   dl->argc = argc;
+   dl->argv = argv;
+
if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
help();
return 0;
@@ -3832,13 +3836,10 @@ static int dl_cmd(struct dl *dl)
return -ENOENT;
 }
 
-static int dl_init(struct dl *dl, int argc, char **argv)
+static int dl_init(struct dl *dl)
 {
int err;
 
-   dl->argc = argc;
-   dl->argv = argv;
-
dl->nlg = mnlg_socket_open(DEVLINK_GENL_NAME, DEVLINK_GENL_VERSION);
if (!dl->nlg) {
pr_err("Failed to connect to devlink Netlink\n");
@@ -3890,16 +3891,59 @@ static void dl_free(struct dl *dl)
free(dl);
 }
 
+static int dl_batch(struct dl *dl, const char *name, bool force)
+{
+   char *line = NULL;
+   size_t len = 0;
+   int ret = EXIT_SUCCESS;
+
+   if (name && strcmp(name, "-") != 0) {
+   if (freopen(name, "r", stdin) == NULL) {
+   fprintf(stderr,
+   "Cannot open file \"%s\" for reading: %s\n",
+   name, strerror(errno));
+   return EXIT_FAILURE;
+   }
+   }
+
+   cmdlineno = 0;
+   while (getcmdline(, , stdin) != -1) {
+   char *largv[100];
+   int largc;
+
+   largc = makeargs(line, largv, 100);
+   if (!largc)
+   continue;   /* blank line */
+
+   if (dl_cmd(dl, largc, largv)) {
+   fprintf(stderr, "Command failed %s:%d\n",
+   name, cmdlineno);
+   ret = EXIT_FAILURE;
+   if (!force)
+   break;
+   }
+   }
+
+   if (line)
+   free(line);
+
+   return ret;
+}
+
 int main(int argc, char **argv)
 {
static const struct option long_options[] = {
{ "Version",no_argument,NULL, 'V' },
+   { "force",  no_argument,NULL, 'f' },
+   { "batch",  required_argument,  NULL, 'b' },
{ "no-nice-names",  no_argument,NULL, 'n' },
{ "json",   no_argument,NULL, 'j' },
{ "pretty", no_argument,NULL, 'p' },
{ "verbose",no_argument,NULL, 'v' },
{ NULL, 0, NULL, 0 }
};
+   const char *batch_file = NULL;
+   bool force = false;
struct dl *dl;
int opt;
int err;
@@ -3911,7 +3955,7 @@ int main(int argc, char **argv)
return EXIT_FAILURE;
}
 
-   while ((opt = getopt_long(argc, argv, "Vnjpv",
+   while ((opt = getopt_long(argc, argv, "Vfb:njpv",
  long_options, NULL)) >= 0) {
 
switch (opt) {
@@ -3919,6 +3963,12 @@ int main(int argc, char **argv)
printf("devlink utility, iproute2-ss%s\n", SNAPSHOT);
ret = EXIT_SUCCESS;
goto dl_free;
+   case 'f':
+   force = true;
+   break;
+   case 'b':
+   batch_file = optarg;
+   break;
case 'n':
dl->no_nice_names = true;
break;
@@ -3942,13 +3992,17 @@ int main(int argc, char **argv)
argc -= optind;
argv += optind;
 
-   err = dl_init(dl, argc, argv);
+   err = dl_init(dl);
if (err) {
ret = EXIT_FAILURE;
goto dl_free;
}
 
-   err

Re: [patch net-next v2 2/5] net: bridge: Export bridge multicast router state

2017-10-09 Thread Ivan Vecera

On 9.10.2017 11:15, Jiri Pirko wrote:
> From: Yotam Gigi <yot...@mellanox.com>
> 
> Add an access function that, given a bridge netdevice, returns whether the
> bridge device is currently an mrouter or not. The function uses the already
> existing br_multicast_is_router function to check that.
> 
> This function is needed in order to allow ports that join an already
> existing bridge to know the current mrouter state of the bridge device.
> Together with the bridge device mrouter ports switchdev notifications, it
> is possible to have full offloading of the semantics of the bridge device
> mcast router state.
> 
> Due to the fact that the bridge multicast router status can change in
> packet RX path, take the multicast_router bridge spinlock to protect the
> read.
> 
> Signed-off-by: Yotam Gigi <yot...@mellanox.com>
> Reviewed-by: Nogah Frankel <nog...@mellanox.com>
> Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  include/linux/if_bridge.h |  5 +
>  net/bridge/br_multicast.c | 12 
>  2 files changed, 17 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 316ee11..02639eb 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -64,6 +64,7 @@ int br_multicast_list_adjacent(struct net_device *dev,
>  bool br_multicast_has_querier_anywhere(struct net_device *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
> +bool br_multicast_router(const struct net_device *dev);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>struct list_head *br_ip_list)
> @@ -84,6 +85,10 @@ static inline bool br_multicast_enabled(const struct 
> net_device *dev)
>  {
>   return false;
>  }
> +static inline bool br_multicast_router(const struct net_device *dev)
> +{
> + return false;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index bd50550..7947e04 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -2216,6 +2216,18 @@ bool br_multicast_enabled(const struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(br_multicast_enabled);
>  
> +bool br_multicast_router(const struct net_device *dev)
> +{
> + struct net_bridge *br = netdev_priv(dev);
> + bool is_router;
> +
> + spin_lock_bh(>multicast_lock);
> + is_router = br_multicast_is_router(br);
> + spin_unlock_bh(>multicast_lock);
> + return is_router;
> +}
> +EXPORT_SYMBOL_GPL(br_multicast_router);
> +
>  int br_multicast_set_querier(struct net_bridge *br, unsigned long val)
>  {
>   unsigned long max_delay;
> 

Reviewed by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next v2 1/5] net: bridge: Notify on bridge device mrouter state changes

2017-10-09 Thread Ivan Vecera

On 9.10.2017 11:15, Jiri Pirko wrote:
> From: Yotam Gigi <yot...@mellanox.com>
> 
> Add the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER switchdev notification type, used
> to indicate whether the bridge is or isn't mrouter. Notify when the bridge
> changes its state, similarly to the already existing bridged port mrouter
> notifications.
> 
> The notification uses the switchdev_attr.u.mrouter boolean flag to indicate
> the current bridge mrouter status. Thus, it only indicates whether the
> bridge is currently used as an mrouter or not, and does not indicate the
> exact mrouter state of the bridge (learning, permanent, etc.).
> 
> Signed-off-by: Yotam Gigi <yot...@mellanox.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
> v1->v2:
>  - use the timer_pending to distinguish between learning-on and
>learning-off states
> ---
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_multicast.c | 38 +++---
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d767b79..d756fbe 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -51,6 +51,7 @@ enum switchdev_attr_id {
>   SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
>   SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
>   SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
> + SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
>  };
>  
>  struct switchdev_attr {
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 8dc5c8d..bd50550 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -859,8 +859,32 @@ static void br_multicast_router_expired(unsigned long 
> data)
>   spin_unlock(>multicast_lock);
>  }
>  
> +static void br_mc_router_state_change(struct net_bridge *p,
> +   bool is_mc_router)
> +{
> + struct switchdev_attr attr = {
> + .orig_dev = p->dev,
> + .id = SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
> + .flags = SWITCHDEV_F_DEFER,
> + .u.mrouter = is_mc_router,
> + };
> +
> + switchdev_port_attr_set(p->dev, );
> +}
> +
>  static void br_multicast_local_router_expired(unsigned long data)
>  {
> + struct net_bridge *br = (struct net_bridge *)data;
> +
> + spin_lock(>multicast_lock);
> + if (br->multicast_router == MDB_RTR_TYPE_DISABLED ||
> + br->multicast_router == MDB_RTR_TYPE_PERM ||
> + timer_pending(>multicast_router_timer))
> + goto out;
> +
> + br_mc_router_state_change(br, false);
> +out:
> + spin_unlock(>multicast_lock);
>  }
>  
>  static void br_multicast_querier_expired(struct net_bridge *br,
> @@ -1364,9 +1388,12 @@ static void br_multicast_mark_router(struct net_bridge 
> *br,
>   unsigned long now = jiffies;
>  
>   if (!port) {
> - if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY)
> + if (br->multicast_router == MDB_RTR_TYPE_TEMP_QUERY) {
> + if (!timer_pending(>multicast_router_timer))
> + br_mc_router_state_change(br, true);
>   mod_timer(>multicast_router_timer,
> now + br->multicast_querier_interval);
> + }
>   return;
>   }
>  
> @@ -1952,7 +1979,7 @@ void br_multicast_init(struct net_bridge *br)
>  
>   spin_lock_init(>multicast_lock);
>   setup_timer(>multicast_router_timer,
> - br_multicast_local_router_expired, 0);
> + br_multicast_local_router_expired, (unsigned long)br);
>   setup_timer(>ip4_other_query.timer,
>   br_ip4_multicast_querier_expired, (unsigned long)br);
>   setup_timer(>ip4_own_query.timer, br_ip4_multicast_query_expired,
> @@ -2042,9 +2069,14 @@ int br_multicast_set_router(struct net_bridge *br, 
> unsigned long val)
>   switch (val) {
>   case MDB_RTR_TYPE_DISABLED:
>   case MDB_RTR_TYPE_PERM:
> + br_mc_router_state_change(br, val == MDB_RTR_TYPE_PERM);
>   del_timer(>multicast_router_timer);
> - /* fall through */
> + br->multicast_router = val;
> + err = 0;
> + break;
>   case MDB_RTR_TYPE_TEMP_QUERY:
> + if (br->multicast_router != MDB_RTR_TYPE_TEMP_QUERY)
> + br_mc_router_state_change(br, false);
>   br->multicast_router = val;
>   err = 0;
>   break;
> 

Reviewed by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net-next v3 13/13] net: switchdev: Remove bridge bypass support from switchdev

2017-08-07 Thread Ivan Vecera

EINVAL;
> - vlan.vid_begin = vinfo->vid;
> - /* don't allow range of pvids */
> - if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> - return -EINVAL;
> - } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> - if (!vlan.vid_begin)
> - return -EINVAL;
> - vlan.vid_end = vinfo->vid;
> - if (vlan.vid_end <= vlan.vid_begin)
> - return -EINVAL;
> - err = f(dev, );
> - if (err)
> - return err;
> - vlan.vid_begin = 0;
> - } else {
> - if (vlan.vid_begin)
> - return -EINVAL;
> - vlan.vid_begin = vinfo->vid;
> - vlan.vid_end = vinfo->vid;
> - err = f(dev, );
> - if (err)
> - return err;
> - vlan.vid_begin = 0;
> - }
> - }
> -
> - return 0;
> -}
> -
> -/**
> - *   switchdev_port_bridge_setlink - Set bridge port attributes
> - *
> - *   @dev: port device
> - *   @nlh: netlink header
> - *   @flags: netlink flags
> - *
> - *   Called for SELF on rtnl_bridge_setlink to set bridge port
> - *   attributes.
> - */
> -int switchdev_port_bridge_setlink(struct net_device *dev,
> -   struct nlmsghdr *nlh, u16 flags)
> -{
> - struct nlattr *protinfo;
> - struct nlattr *afspec;
> - int err = 0;
> -
> - if (!netif_is_bridge_port(dev))
> - return -EOPNOTSUPP;
> -
> - protinfo = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
> -IFLA_PROTINFO);
> - if (protinfo) {
> - err = switchdev_port_br_setlink_protinfo(dev, protinfo);
> - if (err)
> - return err;
> - }
> -
> - afspec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
> -  IFLA_AF_SPEC);
> - if (afspec)
> - err = switchdev_port_br_afspec(dev, afspec,
> -switchdev_port_obj_add);
> -
> - return err;
> -}
> -EXPORT_SYMBOL_GPL(switchdev_port_bridge_setlink);
> -
> -/**
> - *   switchdev_port_bridge_dellink - Set bridge port attributes
> - *
> - *   @dev: port device
> - *   @nlh: netlink header
> - *   @flags: netlink flags
> - *
> - *   Called for SELF on rtnl_bridge_dellink to set bridge port
> - *   attributes.
> - */
> -int switchdev_port_bridge_dellink(struct net_device *dev,
> -   struct nlmsghdr *nlh, u16 flags)
> -{
> - struct nlattr *afspec;
> -
> - if (!netif_is_bridge_port(dev))
> - return -EOPNOTSUPP;
> -
> - afspec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
> -      IFLA_AF_SPEC);
> - if (afspec)
> - return switchdev_port_br_afspec(dev, afspec,
> - switchdev_port_obj_del);
> -
> - return 0;
> -}
> -EXPORT_SYMBOL_GPL(switchdev_port_bridge_dellink);
> -
> -/**
> - *   switchdev_port_fdb_add - Add FDB (MAC/VLAN) entry to port
> - *
> - *   @ndmsg: netlink hdr
> - *   @nlattr: netlink attributes
> - *   @dev: port device
> - *   @addr: MAC address to add
> - *   @vid: VLAN to add
> - *
> - *   Add FDB entry to switch device.
> - */
> -int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
> -struct net_device *dev, const unsigned char *addr,
> -u16 vid, u16 nlm_flags)
> -{
> - struct switchdev_obj_port_fdb fdb = {
> - .obj.orig_dev = dev,
> - .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
> - .vid = vid,
> - };
> -
> - ether_addr_copy(fdb.addr, addr);
> - return switchdev_port_obj_add(dev, );
> -}
> -EXPORT_SYMBOL_GPL(switchdev_port_fdb_add);
> -
> -/**
> - *   switchdev_port_fdb_del - Delete FDB (MAC/VLAN) entry from port
> - *
> - *   @ndmsg: netlink hdr
> - *   @nlattr: netlink attributes
> - *   @dev: port device
> - *   @addr: MAC address to delete
> - *   @vid: VLAN to delete
> - *
> - *   Delete FDB entry from switch device.
> - */
> -int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
> -struct net_device *dev, const unsigned char *addr,
> -u16 vid)
> -{
> - struct switchdev_obj_port_fdb fdb = {
> - .obj.orig_dev = dev,
> - .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
> - .vid = vid,
> - };
> -
> - ether_addr_copy(fdb.addr, addr);
> - return switchdev_port_obj_del(dev, );
> -}
> -EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
> -
>  bool switchdev_port_same_parent_id(struct net_device *a,
>  struct net_device *b)
>  {
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net-next v3 12/13] net: bridge: Remove FDB deletion through switchdev object

2017-08-07 Thread Ivan Vecera

On 6.8.2017 15:15, Arkadi Sharshevsky wrote:
> At this point no driver supports FDB add/del through switchdev object
> but rather via notification chain, thus, it is removed.
> 
> Signed-off-by: Arkadi Sharshevsky <arka...@mellanox.com>
> Reviewed-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com>
> ---
>  net/bridge/br_fdb.c | 18 --
>  1 file changed, 18 deletions(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index a5e4a73..a79b648 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -169,29 +169,11 @@ static void fdb_del_hw_addr(struct net_bridge *br, 
> const unsigned char *addr)
>   }
>  }
>  
> -static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
> -{
> - struct switchdev_obj_port_fdb fdb = {
> - .obj = {
> - .orig_dev = f->dst->dev,
> - .id = SWITCHDEV_OBJ_ID_PORT_FDB,
> - .flags = SWITCHDEV_F_DEFER,
> - },
> - .vid = f->vlan_id,
> - };
> -
> - ether_addr_copy(fdb.addr, f->addr.addr);
> - switchdev_port_obj_del(f->dst->dev, );
> -}
> -
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
>  {
>   if (f->is_static)
>   fdb_del_hw_addr(br, f->addr.addr);
>  
> - if (f->added_by_external_learn)
> - fdb_del_external_learn(f);
> -
>   hlist_del_init_rcu(>hlist);
>   fdb_notify(br, f, RTM_DELNEIGH);
>   call_rcu(>rcu, fdb_rcu_free);
> 
Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net-next v3 1/9] net: switchdev: add SET_SWITCHDEV_OPS helper

2017-06-30 Thread Ivan Vecera

On 29.6.2017 22:08, Simon Horman wrote:
> Add a helper to allow switchdev ops to be set if NET_SWITCHDEV is configured
> and do nothing otherwise. This allows for slightly cleaner code which
> uses switchdev but does not select NET_SWITCHDEV.
> 
> Signed-off-by: Simon Horman <simon.hor...@netronome.com>
> ---
>  include/net/switchdev.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index c784a6ac6ef1..8ae9e3b6392e 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -217,6 +217,8 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
>  
>  bool switchdev_port_same_parent_id(struct net_device *a,
>  struct net_device *b);
> +
> +#define SWITCHDEV_SET_OPS(netdev, ops) ((netdev)->switchdev_ops = (ops))
>  #else
>  
>  static inline void switchdev_deferred_process(void)
> @@ -322,6 +324,8 @@ static inline bool switchdev_port_same_parent_id(struct 
> net_device *a,
>   return false;
>  }
>  
> +#define SWITCHDEV_SET_OPS(netdev, ops) do {} while (0)
> +
>  #endif
>  
>  #endif /* _LINUX_SWITCHDEV_H_ */
> 

Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next v2 04/19] net: switchdev: Change notifier chain to be atomic

2017-06-08 Thread Ivan Vecera

On 8.6.2017 08:44, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arka...@mellanox.com>
> 
> In order to use the switchdev notifier chain for FDB sync with the
> device it has to be changed to atomic. The is done because the bridge
> can learn new FDBs in atomic context.
> 
> Signed-off-by: Arkadi Sharshevsky <arka...@mellanox.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> Reviewed-by: Ivan Vecera <ivec...@redhat.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  net/switchdev/switchdev.c | 30 ++
>  1 file changed, 6 insertions(+), 24 deletions(-)
> 
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 8d40a7d..25dc67e 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -571,24 +571,17 @@ int switchdev_port_obj_dump(struct net_device *dev, 
> struct switchdev_obj *obj,
>  }
>  EXPORT_SYMBOL_GPL(switchdev_port_obj_dump);
>  
> -static RAW_NOTIFIER_HEAD(switchdev_notif_chain);
> +static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
>  
>  /**
>   *   register_switchdev_notifier - Register notifier
>   *   @nb: notifier_block
>   *
> - *   Register switch device notifier. This should be used by code
> - *   which needs to monitor events happening in particular device.
> - *   Return values are same as for atomic_notifier_chain_register().
> + *   Register switch device notifier.
>   */
>  int register_switchdev_notifier(struct notifier_block *nb)
>  {
> - int err;
> -
> - rtnl_lock();
> - err = raw_notifier_chain_register(_notif_chain, nb);
> - rtnl_unlock();
> - return err;
> + return atomic_notifier_chain_register(_notif_chain, nb);
>  }
>  EXPORT_SYMBOL_GPL(register_switchdev_notifier);
>  
> @@ -597,16 +590,10 @@ EXPORT_SYMBOL_GPL(register_switchdev_notifier);
>   *   @nb: notifier_block
>   *
>   *   Unregister switch device notifier.
> - *   Return values are same as for atomic_notifier_chain_unregister().
>   */
>  int unregister_switchdev_notifier(struct notifier_block *nb)
>  {
> - int err;
> -
> - rtnl_lock();
> - err = raw_notifier_chain_unregister(_notif_chain, nb);
> - rtnl_unlock();
> - return err;
> + return atomic_notifier_chain_unregister(_notif_chain, nb);
>  }
>  EXPORT_SYMBOL_GPL(unregister_switchdev_notifier);
>  
> @@ -616,18 +603,13 @@ EXPORT_SYMBOL_GPL(unregister_switchdev_notifier);
>   *   @dev: port device
>   *   @info: notifier information data
>   *
> - *   Call all network notifier blocks. This should be called by driver
> - *   when it needs to propagate hardware event.
> - *   Return values are same as for atomic_notifier_call_chain().
> - *   rtnl_lock must be held.
> + *   Call all network notifier blocks.
>   */
>  int call_switchdev_notifiers(unsigned long val, struct net_device *dev,
>struct switchdev_notifier_info *info)
>  {
> - ASSERT_RTNL();
> -
>   info->dev = dev;
> - return raw_notifier_call_chain(_notif_chain, val, info);
> + return atomic_notifier_call_chain(_notif_chain, val, info);
>  }
>  EXPORT_SYMBOL_GPL(call_switchdev_notifiers);
>  
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next v2 05/19] net: bridge: Add support for notifying devices about FDB add/del

2017-06-08 Thread Ivan Vecera

net_bridge *br,
>   struct sk_buff *skb;
>   int err = -ENOBUFS;
>  
> + br_switchdev_fdb_notify(fdb, type);
> +
>   skb = nlmsg_new(fdb_nlmsg_size(), GFP_ATOMIC);
>   if (skb == NULL)
>   goto errout;
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index a122684..98410ea 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -1085,6 +1085,8 @@ bool nbp_switchdev_allowed_egress(const struct 
> net_bridge_port *p,
>  int br_switchdev_set_port_flag(struct net_bridge_port *p,
>  unsigned long flags,
>  unsigned long mask);
> +void br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb,
> +  int type);
>  #else
>  static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
>  {
> @@ -1108,6 +1110,11 @@ static inline int br_switchdev_set_port_flag(struct 
> net_bridge_port *p,
>  {
>   return 0;
>  }
> +
> +static inline void
> +br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
> +{
> +}
>  #endif /* CONFIG_NET_SWITCHDEV */
>  
>  #endif
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index b975959..181a44d 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -98,3 +98,36 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
>  
>   return 0;
>  }
> +
> +static void
> +br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
> + u16 vid, struct net_device *dev)
> +{
> + struct switchdev_notifier_fdb_info info;
> + unsigned long notifier_type;
> +
> + info.addr = mac;
> + info.vid = vid;
> + notifier_type = adding ? SWITCHDEV_FDB_ADD_TO_DEVICE : 
> SWITCHDEV_FDB_DEL_TO_DEVICE;
> + call_switchdev_notifiers(notifier_type, dev, );
> +}
> +
> +void
> +br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
> +{
> + if (!fdb->added_by_user)
> + return;
> +
> + switch (type) {
> + case RTM_DELNEIGH:
> + br_switchdev_fdb_call_notifiers(false, fdb->addr.addr,
> + fdb->vlan_id,
> + fdb->dst->dev);
> + break;
> + case RTM_NEWNEIGH:
> + br_switchdev_fdb_call_notifiers(true, fdb->addr.addr,
> + fdb->vlan_id,
> + fdb->dst->dev);
> + break;
> + }
> +}
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next v2 01/19] net: switchdev: Add support for querying supported bridge flags by hardware

2017-06-08 Thread Ivan Vecera

On 8.6.2017 08:44, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arka...@mellanox.com>
> 
> This is done as a preparation stage before setting the bridge port flags
> from the bridge code. Currently the device can be queried for the bridge
> flags state, but the querier cannot distinguish if the flag is disabled
> or if it is not supported at all. Thus, add new attr and a bit-mask which
> include information regarding the support on a per-flag basis.
> 
> Drivers that support bridge offload but not support bridge flags should
> return zeroed bitmask.
> 
> Signed-off-by: Arkadi Sharshevsky <arka...@mellanox.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> Reviewed-by: Ivan Vecera <ivec...@redhat.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  include/net/switchdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index 929d6af..63a754d 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -46,6 +46,7 @@ enum switchdev_attr_id {
>   SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
>   SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>   SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
> + SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
>   SWITCHDEV_ATTR_ID_PORT_MROUTER,
>   SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
>   SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
> @@ -62,6 +63,7 @@ struct switchdev_attr {
>   struct netdev_phys_item_id ppid;/* PORT_PARENT_ID */
>   u8 stp_state;   /* PORT_STP_STATE */
>   unsigned long brport_flags; /* PORT_BRIDGE_FLAGS */
> + unsigned long brport_flags_support; /* 
> PORT_BRIDGE_FLAGS_SUPPORT */
>   bool mrouter;   /* PORT_MROUTER */
>   clock_t ageing_time;        /* BRIDGE_AGEING_TIME */
>   bool vlan_filtering;/* 
> BRIDGE_VLAN_FILTERING */
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next v2 02/19] net: bridge: Add support for offloading port attributes

2017-06-08 Thread Ivan Vecera

;   br_vlan_tunnel_old = (p->flags & BR_VLAN_TUNNEL) ? true : false;
> - br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
> + err = br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
> + if (err)
> + return err;
> +
>   if (br_vlan_tunnel_old && !(p->flags & BR_VLAN_TUNNEL))
>   nbp_vlan_tunnel_info_flush(p);
>  
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2062692..7f43992 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -1076,6 +1076,9 @@ void nbp_switchdev_frame_mark(const struct 
> net_bridge_port *p,
> struct sk_buff *skb);
>  bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
> const struct sk_buff *skb);
> +int br_switchdev_set_port_flag(struct net_bridge_port *p,
> +unsigned long flags,
> +unsigned long mask);
>  #else
>  static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
>  {
> @@ -1092,6 +1095,13 @@ static inline bool nbp_switchdev_allowed_egress(const 
> struct net_bridge_port *p,
>  {
>   return true;
>  }
> +
> +static inline int br_switchdev_set_port_flag(struct net_bridge_port *p,
> +  unsigned long flags,
> +  unsigned long mask)
> +{
> + return 0;
> +}
>  #endif /* CONFIG_NET_SWITCHDEV */
>  
>  #endif
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index f4097b9..b975959 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -55,3 +55,46 @@ bool nbp_switchdev_allowed_egress(const struct 
> net_bridge_port *p,
>   return !skb->offload_fwd_mark ||
>  BR_INPUT_SKB_CB(skb)->offload_fwd_mark != p->offload_fwd_mark;
>  }
> +
> +/* Flags that can be offloaded to hardware */
> +#define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
> +   BR_MCAST_FLOOD | BR_BCAST_FLOOD)
> +
> +int br_switchdev_set_port_flag(struct net_bridge_port *p,
> +unsigned long flags,
> +unsigned long mask)
> +{
> + struct switchdev_attr attr = {
> + .orig_dev = p->dev,
> + .id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
> + };
> + int err;
> +
> + if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
> + return 0;
> +
> + err = switchdev_port_attr_get(p->dev, );
> + if (err == -EOPNOTSUPP)
> + return 0;
> + if (err)
> + return err;
> +
> + /* Check if specific bridge flag attribute offload is supported */
> + if (!(attr.u.brport_flags_support & mask)) {
> + br_warn(p->br, "bridge flag offload is not supported %u(%s)\n",
> + (unsigned int)p->port_no, p->dev->name);
> + return -EOPNOTSUPP;
> + }
> +
> + attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
> + attr.flags = SWITCHDEV_F_DEFER;
> + attr.u.brport_flags = flags;
> + err = switchdev_port_attr_set(p->dev, );
> + if (err) {
> + br_warn(p->br, "error setting offload flag on port %u(%s)\n",
> + (unsigned int)p->port_no, p->dev->name);
> + return err;
> + }
> +
> + return 0;
> +}
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next 01/19] net: switchdev: Add support for querying supported bridge flags by hardware

2017-06-05 Thread Ivan Vecera

On 5.6.2017 11:20, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arka...@mellanox.com>
> 
> This is done as a preparation stage before setting the bridge port flags
> from the bridge code. Currently the device can be queried for the bridge
> flags state, but the querier cannot distinguish if the flag is disabled
> or if it is not supported at all. Thus, add new attr and a bit-mask which
> include information regarding the support on a per-flag basis.
> 
> Drivers that support bridge offload but not support bridge flags should
> return zeroed bitmask.
> 
> Signed-off-by: Arkadi Sharshevsky <arka...@mellanox.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  include/net/switchdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index 929d6af..63a754d 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -46,6 +46,7 @@ enum switchdev_attr_id {
>   SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
>   SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>   SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
> + SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT,
>   SWITCHDEV_ATTR_ID_PORT_MROUTER,
>   SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
>   SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
> @@ -62,6 +63,7 @@ struct switchdev_attr {
>   struct netdev_phys_item_id ppid;/* PORT_PARENT_ID */
>   u8 stp_state;   /* PORT_STP_STATE */
>   unsigned long brport_flags; /* PORT_BRIDGE_FLAGS */
> + unsigned long brport_flags_support; /* 
> PORT_BRIDGE_FLAGS_SUPPORT */
>   bool mrouter;   /* PORT_MROUTER */
>   clock_t ageing_time;    /* BRIDGE_AGEING_TIME */
>   bool vlan_filtering;/* 
> BRIDGE_VLAN_FILTERING */
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next 04/19] net: switchdev: Change notifier chain to be atomic

2017-06-05 Thread Ivan Vecera

On 5.6.2017 11:20, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arka...@mellanox.com>
> 
> In order to use the switchdev notifier chain for FDB sync with the
> device it has to be changed to atomic. The is done because the bridge
> can learn new FDBs in atomic context.
> 
> Signed-off-by: Arkadi Sharshevsky <arka...@mellanox.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> Signed-off-by: Jiri Pirko <j...@mellanox.com>
> ---
>  net/switchdev/switchdev.c | 30 ++
>  1 file changed, 6 insertions(+), 24 deletions(-)
> 
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 8d40a7d..25dc67e 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -571,24 +571,17 @@ int switchdev_port_obj_dump(struct net_device *dev, 
> struct switchdev_obj *obj,
>  }
>  EXPORT_SYMBOL_GPL(switchdev_port_obj_dump);
>  
> -static RAW_NOTIFIER_HEAD(switchdev_notif_chain);
> +static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
>  
>  /**
>   *   register_switchdev_notifier - Register notifier
>   *   @nb: notifier_block
>   *
> - *   Register switch device notifier. This should be used by code
> - *   which needs to monitor events happening in particular device.
> - *   Return values are same as for atomic_notifier_chain_register().
> + *   Register switch device notifier.
>   */
>  int register_switchdev_notifier(struct notifier_block *nb)
>  {
> - int err;
> -
> - rtnl_lock();
> - err = raw_notifier_chain_register(_notif_chain, nb);
> - rtnl_unlock();
> - return err;
> + return atomic_notifier_chain_register(_notif_chain, nb);
>  }
>  EXPORT_SYMBOL_GPL(register_switchdev_notifier);
>  
> @@ -597,16 +590,10 @@ EXPORT_SYMBOL_GPL(register_switchdev_notifier);
>   *   @nb: notifier_block
>   *
>   *   Unregister switch device notifier.
> - *   Return values are same as for atomic_notifier_chain_unregister().
>   */
>  int unregister_switchdev_notifier(struct notifier_block *nb)
>  {
> - int err;
> -
> - rtnl_lock();
> - err = raw_notifier_chain_unregister(_notif_chain, nb);
> - rtnl_unlock();
> - return err;
> + return atomic_notifier_chain_unregister(_notif_chain, nb);
>  }
>  EXPORT_SYMBOL_GPL(unregister_switchdev_notifier);
>  
> @@ -616,18 +603,13 @@ EXPORT_SYMBOL_GPL(unregister_switchdev_notifier);
>   *   @dev: port device
>   *   @info: notifier information data
>   *
> - *   Call all network notifier blocks. This should be called by driver
> - *   when it needs to propagate hardware event.
> - *   Return values are same as for atomic_notifier_call_chain().
> - *   rtnl_lock must be held.
> + *   Call all network notifier blocks.
>   */
>  int call_switchdev_notifiers(unsigned long val, struct net_device *dev,
>struct switchdev_notifier_info *info)
>  {
> -     ASSERT_RTNL();
> -
>   info->dev = dev;
> - return raw_notifier_call_chain(_notif_chain, val, info);
> + return atomic_notifier_call_chain(_notif_chain, val, info);
>  }
>  EXPORT_SYMBOL_GPL(call_switchdev_notifiers);
>  
> 

Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net-next v2] bridge: fix hello and hold timers starting/stopping

2017-05-20 Thread Ivan Vecera

2017-05-20 7:57 GMT+02:00 Hangbin Liu <liuhang...@gmail.com>:
> On Fri, May 19, 2017 at 07:30:43PM +0200, Ivan Vecera wrote:
>> Current bridge code incorrectly handles starting/stopping of hello and
>> hold timers during STP enable/disable.
>>
>> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>>transition. The timers are already stopped in NO_STP state so
>>this is confusing no-op.
>
> Hi Ivan,
>
> Shouldn't we start hello timer in br_stp_start when NO_STP -> BR_KERNEL_STP ?

As Nikolay mentioned, this is fixed by
https://patchwork.ozlabs.org/patch/764685/

>>
>> 2. During USER_STP->NO_STP transition the timers are started. This
>>does not make sense and is confusion because the timer should not be
>>active in NO_STP state.
>
> Yes, but what about BR_KERNEL_STP -> NO_STP in function br_stp_stop() ?

The timer is lazily stopped by itself in its handler... or not rearmed
respectively.

>> Cc: da...@davemloft.net
>> Cc: sas...@cumulusnetworks.com
>> Cc: step...@networkplumber.org
>> Cc: bri...@lists.linux-foundation.org
>> Cc: lucien@gmail.com
>> Cc: niko...@cumulusnetworks.com
>> Signed-off-by: Ivan Vecera <c...@cera.cz>
>> ---
>>  net/bridge/br_stp_if.c | 11 ---
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>> index 08341d2aa9c9..a05027027513 100644
>> --- a/net/bridge/br_stp_if.c
>> +++ b/net/bridge/br_stp_if.c
>> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
>> *arg)
>>
>>  static void br_stp_start(struct net_bridge *br)
>>  {
>> - struct net_bridge_port *p;
>>   int err = -ENOENT;
>>
>>   if (net_eq(dev_net(br->dev), _net))
>> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
>>   if (!err) {
>>   br->stp_enabled = BR_USER_STP;
>>   br_debug(br, "userspace STP started\n");
>> -
>> - /* Stop hello and hold timers */
>> - del_timer(>hello_timer);
>> - list_for_each_entry(p, >port_list, list)
>> - del_timer(>hold_timer);
>
> I'm not sure if user space daemon will send bpdu or not? In comment
> 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and
> hold timers"). Nikolay said we should not handle it with BR_USER_STP.
>
>>   } else {
>>   br->stp_enabled = BR_KERNEL_STP;
>>   br_debug(br, "using kernel STP\n");
>> @@ -187,7 +181,6 @@ static void br_stp_start(struct net_bridge *br)
>>
>>  static void br_stp_stop(struct net_bridge *br)
>>  {
>> - struct net_bridge_port *p;
>>   int err;
>>
>>   if (br->stp_enabled == BR_USER_STP) {
>> @@ -196,10 +189,6 @@ static void br_stp_stop(struct net_bridge *br)
>>   br_err(br, "failed to stop userspace STP (%d)\n", err);
>>
>>   /* To start timers on any ports left in blocking */
>> - mod_timer(>hello_timer, jiffies + br->hello_time);
>> - list_for_each_entry(p, >port_list, list)
>> - mod_timer(>hold_timer,
>> -   round_jiffies(jiffies + BR_HOLD_TIME));
>
> If we do not del hello_timer. after it expired in br_hello_timer_expired(),
> Our state is br->dev->flags & IFF_UP and br->stp_enabled == NO_STP, it will
> call mod_timer(>hello_timer, round_jiffies(jiffies + br->hello_time))
> and we will keep sending bpdu message even after stp stoped.
>
>>   spin_lock_bh(>lock);
>>   br_port_state_selection(br);
>>   spin_unlock_bh(>lock);
>> --
>
> So how about just like
>
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index d8ad73b..0198f62 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -183,6 +183,7 @@ static void br_stp_start(struct net_bridge *br)
> } else {
> br->stp_enabled = BR_KERNEL_STP;
> br_debug(br, "using kernel STP\n");
> +   mod_timer(>hello_timer, jiffies + br->hello_time);
>
> /* To start timers on any ports left in blocking */
> br_port_state_selection(br);
> @@ -202,7 +203,6 @@ static void br_stp_stop(struct net_bridge *br)
> br_err(br, "failed to stop userspace STP (%d)\n", 
> err);
>
> /* To start timers on any ports left in blocking */
> -   mod_timer(>hello_timer, jiffies + br->hello_time);
> list_for_each_entry(p, >port_list, list)
> mod_timer(>hold_timer,
>   round_jiffies(jiffies + BR_HOLD_TIME));
> @@ -211,6 +211,7 @@ static void br_stp_stop(struct net_bridge *br)
> spin_unlock_bh(>lock);
> }
>
> +   del_timer_sync(>hello_timer);
> br->stp_enabled = BR_NO_STP;
>  }
>
> Thanks
> Hangbin

[PATCH net-next v2] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

Current bridge code incorrectly handles starting/stopping of hello and
hold timers during STP enable/disable.

1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
   transition. The timers are already stopped in NO_STP state so
   this is confusing no-op.

2. During USER_STP->NO_STP transition the timers are started. This
   does not make sense and is confusion because the timer should not be
   active in NO_STP state.

Cc: da...@davemloft.net
Cc: sas...@cumulusnetworks.com
Cc: step...@networkplumber.org
Cc: bri...@lists.linux-foundation.org
Cc: lucien@gmail.com
Cc: niko...@cumulusnetworks.com
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/bridge/br_stp_if.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 08341d2aa9c9..a05027027513 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
*arg)
 
 static void br_stp_start(struct net_bridge *br)
 {
-   struct net_bridge_port *p;
int err = -ENOENT;
 
if (net_eq(dev_net(br->dev), _net))
@@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
if (!err) {
br->stp_enabled = BR_USER_STP;
br_debug(br, "userspace STP started\n");
-
-   /* Stop hello and hold timers */
-   del_timer(>hello_timer);
-   list_for_each_entry(p, >port_list, list)
-   del_timer(>hold_timer);
} else {
br->stp_enabled = BR_KERNEL_STP;
br_debug(br, "using kernel STP\n");
@@ -187,7 +181,6 @@ static void br_stp_start(struct net_bridge *br)
 
 static void br_stp_stop(struct net_bridge *br)
 {
-   struct net_bridge_port *p;
int err;
 
if (br->stp_enabled == BR_USER_STP) {
@@ -196,10 +189,6 @@ static void br_stp_stop(struct net_bridge *br)
br_err(br, "failed to stop userspace STP (%d)\n", err);
 
/* To start timers on any ports left in blocking */
-   mod_timer(>hello_timer, jiffies + br->hello_time);
-   list_for_each_entry(p, >port_list, list)
-   mod_timer(>hold_timer,
- round_jiffies(jiffies + BR_HOLD_TIME));
spin_lock_bh(>lock);
br_port_state_selection(br);
spin_unlock_bh(>lock);
-- 
2.13.0

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

2017-05-19 18:55 GMT+02:00 Nikolay Aleksandrov <niko...@cumulusnetworks.com>:
> On 5/19/17 7:25 PM, Ivan Vecera wrote:
>>
>> Current bridge code incorrectly handles starting/stopping of hello and
>> hold timers during STP enable/disable.
>>
>> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>> transition. This is not correct as the timers are stopped in NO_STP
>> case.
>
>
> This really is a noop, but ok.

Yes, stopping of stopped timers are safe but confusing.

>> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>> This is not also correct as the timers should be stopped in NO_STP
>> state.
>
>
> Indeed, but the actual end result is almost as them being stopped because
> in the timers there are specific checks if the STP == KERNEL_STP (see
> br_transmit_config()) and the hold_timers will simply expire and not rearm
> in any other mode. The only real problem is the hello_timer which continues
> to rearm itself, but with Xin's earlier patch that is taken care of too.

Yes, this is clean-up as well. The starting of timers are more
confusing than dangerous
but from a reader's point of view the starting of timers is non-sense
when STP is
going to be disabled.

>>
>> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>> transition. They should be stopped as they are running in KERNEL_STP
>> state and should not run in NO_STP case.
>
>
> Same comment as for point 2.
This can be removed... and leave hello_timer handler to stop itself.

>> The patch is a follow-up for "bridge: start hello_timer when enabling
>> KERNEL_STP in br_stp_start" patch from Xin Long.
>>
>
> I'd say this is more of a cleanup/improvement after Xin's patch and thus
> would
> suggest targeting net-next. The only real issue is fixed by his patch.
Agree... will send resend against net-next.

Thanks for comments,
Ivan

Re: [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

2017-05-19 18:51 GMT+02:00 Xin Long <lucien@gmail.com>:
> On Sat, May 20, 2017 at 12:25 AM, Ivan Vecera <c...@cera.cz> wrote:
>> Current bridge code incorrectly handles starting/stopping of hello and
>> hold timers during STP enable/disable.
>>
>> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>>transition. This is not correct as the timers are stopped in NO_STP
>>case.
>>
>> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>>This is not also correct as the timers should be stopped in NO_STP
>>state.
>>
>> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>>transition. They should be stopped as they are running in KERNEL_STP
>>state and should not run in NO_STP case.
>>
>> The patch is a follow-up for "bridge: start hello_timer when enabling
>> KERNEL_STP in br_stp_start" patch from Xin Long.
>>
>> Cc: da...@davemloft.net
>> Cc: sas...@cumulusnetworks.com
>> Cc: step...@networkplumber.org
>> Cc: bri...@lists.linux-foundation.org
>> Cc: lucien@gmail.com
>> Cc: niko...@cumulusnetworks.com
>> Signed-off-by: Ivan Vecera <c...@cera.cz>
>> ---
>>  net/bridge/br_stp_if.c | 15 +--
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
>> index 0db8102995a5..f137ebf27755 100644
>> --- a/net/bridge/br_stp_if.c
>> +++ b/net/bridge/br_stp_if.c
>> @@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
>> *arg)
>>
>>  static void br_stp_start(struct net_bridge *br)
>>  {
>> -   struct net_bridge_port *p;
>> int err = -ENOENT;
>>
>> if (net_eq(dev_net(br->dev), _net))
>> @@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
>> if (!err) {
>> br->stp_enabled = BR_USER_STP;
>> br_debug(br, "userspace STP started\n");
>> -
>> -   /* Stop hello and hold timers */
>> -   del_timer(>hello_timer);
>> -   list_for_each_entry(p, >port_list, list)
>> -   del_timer(>hold_timer);
>> } else {
>> br->stp_enabled = BR_KERNEL_STP;
>> br_debug(br, "using kernel STP\n");
>> @@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
>> br_err(br, "failed to stop userspace STP (%d)\n", 
>> err);
>>
>> /* To start timers on any ports left in blocking */
>> -   mod_timer(>hello_timer, jiffies + br->hello_time);
>> -   list_for_each_entry(p, >port_list, list)
>> -   mod_timer(>hold_timer,
>> - round_jiffies(jiffies + BR_HOLD_TIME));
>> spin_lock_bh(>lock);
>> br_port_state_selection(br);
>> spin_unlock_bh(>lock);
>> +   } else {
>> +   /* BR_KERNEL_STP - stop hello and hold timers */
>> +   del_timer(>hello_timer);
>> +   list_for_each_entry(p, >port_list, list)
>> +   del_timer(>hold_timer);
> I'm thinking, what if the timers are running when deleting them ?
> del_timer may not be going to delete it, and still have to stop itself
> next time when br->stp_enabled = BR_NO_STP.
>
> So do you think it's better to do nothing here and just leave it to be
> stopped by itself when checking br->stp_enabled  in
> br_hello_timer_expired ?

Yes, this kind of "lazy stopping" could be safer.

I.

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:20 GMT+02:00 Xin Long <lucien@gmail.com>:
> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
> kernel hello and hold timers"), bridge would not start hello_timer if
> stp_enabled is not KERNEL_STP when br_dev_open.
>
> The problem is even if users set stp_enabled with KERNEL_STP later,
> the timer will still not be started. It causes that KERNEL_STP can
> not really work. Users have to re-ifup the bridge to avoid this.
>
> This patch is to fix it by starting br->hello_timer when enabling
> KERNEL_STP in br_stp_start.
>
> As an improvement, it's also to start hello_timer again only when
> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
> no reason to start the timer again when it's NO_STP.
>
> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello 
> and hold timers")
> Reported-by: Haidong Li <ha...@redhat.com>
> Signed-off-by: Xin Long <lucien@gmail.com>
> ---
>  net/bridge/br_stp_if.c| 1 +
>  net/bridge/br_stp_timer.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 08341d2..0db8102 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -179,6 +179,7 @@ static void br_stp_start(struct net_bridge *br)
> br_debug(br, "using kernel STP\n");
>
> /* To start timers on any ports left in blocking */
> +   mod_timer(>hello_timer, jiffies + br->hello_time);
> br_port_state_selection(br);
> }
>
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index c98b3e5..60b6fe2 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
> if (br->dev->flags & IFF_UP) {
> br_config_bpdu_generation(br);
>
> -   if (br->stp_enabled != BR_USER_STP)
> +   if (br->stp_enabled == BR_KERNEL_STP)
> mod_timer(>hello_timer,
>   round_jiffies(jiffies + br->hello_time));
> }
> --
> 2.1.0
>
Reviewed-by: Ivan Vecera <c...@cera.cz>

[PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Ivan Vecera

Current bridge code incorrectly handles starting/stopping of hello and
hold timers during STP enable/disable.

1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
   transition. This is not correct as the timers are stopped in NO_STP
   case.

2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
   This is not also correct as the timers should be stopped in NO_STP
   state.

3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
   transition. They should be stopped as they are running in KERNEL_STP
   state and should not run in NO_STP case.

The patch is a follow-up for "bridge: start hello_timer when enabling
KERNEL_STP in br_stp_start" patch from Xin Long.

Cc: da...@davemloft.net
Cc: sas...@cumulusnetworks.com
Cc: step...@networkplumber.org
Cc: bri...@lists.linux-foundation.org
Cc: lucien@gmail.com
Cc: niko...@cumulusnetworks.com
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/bridge/br_stp_if.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 0db8102995a5..f137ebf27755 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -150,7 +150,6 @@ static int br_stp_call_user(struct net_bridge *br, char 
*arg)
 
 static void br_stp_start(struct net_bridge *br)
 {
-   struct net_bridge_port *p;
int err = -ENOENT;
 
if (net_eq(dev_net(br->dev), _net))
@@ -169,11 +168,6 @@ static void br_stp_start(struct net_bridge *br)
if (!err) {
br->stp_enabled = BR_USER_STP;
br_debug(br, "userspace STP started\n");
-
-   /* Stop hello and hold timers */
-   del_timer(>hello_timer);
-   list_for_each_entry(p, >port_list, list)
-   del_timer(>hold_timer);
} else {
br->stp_enabled = BR_KERNEL_STP;
br_debug(br, "using kernel STP\n");
@@ -197,13 +191,14 @@ static void br_stp_stop(struct net_bridge *br)
br_err(br, "failed to stop userspace STP (%d)\n", err);
 
/* To start timers on any ports left in blocking */
-   mod_timer(>hello_timer, jiffies + br->hello_time);
-   list_for_each_entry(p, >port_list, list)
-   mod_timer(>hold_timer,
- round_jiffies(jiffies + BR_HOLD_TIME));
spin_lock_bh(>lock);
br_port_state_selection(br);
spin_unlock_bh(>lock);
+   } else {
+   /* BR_KERNEL_STP - stop hello and hold timers */
+   del_timer(>hello_timer);
+   list_for_each_entry(p, >port_list, list)
+   del_timer(>hold_timer);
}
 
br->stp_enabled = BR_NO_STP;
-- 
2.13.0

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 17:05 GMT+02:00 Nikolay Aleksandrov <niko...@cumulusnetworks.com>:
> On 5/19/17 6:03 PM, Ivan Vecera wrote:
>>
>> 2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov
>> <niko...@cumulusnetworks.com>:
>>>
>>> On 5/19/17 5:51 PM, Ivan Vecera wrote:
>>>>
>>>>
>>>> 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov
>>>> <niko...@cumulusnetworks.com>:
>>>>>
>>>>>
>>>>> On 5/19/17 5:20 PM, Xin Long wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>>>>>> kernel hello and hold timers"), bridge would not start hello_timer if
>>>>>> stp_enabled is not KERNEL_STP when br_dev_open.
>>>>>>
>>>>>> The problem is even if users set stp_enabled with KERNEL_STP later,
>>>>>> the timer will still not be started. It causes that KERNEL_STP can
>>>>>> not really work. Users have to re-ifup the bridge to avoid this.
>>>>>>
>>>>>> This patch is to fix it by starting br->hello_timer when enabling
>>>>>> KERNEL_STP in br_stp_start.
>>>>>>
>>>>>> As an improvement, it's also to start hello_timer again only when
>>>>>> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
>>>>>> no reason to start the timer again when it's NO_STP.
>>>>>>
>>>>>> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>>>>>> kernel
>>>>>> hello and hold timers")
>>>>>> Reported-by: Haidong Li <ha...@redhat.com>
>>>>>> Signed-off-by: Xin Long <lucien@gmail.com>
>>>>>> ---
>>>>>> net/bridge/br_stp_if.c| 1 +
>>>>>> net/bridge/br_stp_timer.c | 2 +-
>>>>>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>
>>>>> This doesn't make much sense to me, how do you change from USER_STP to
>>>>> KERNEL_STP without first going through NO_STP ?
>>>>>
>>>>> If you go through NO_STP then all will be fine because br_stp_stop will
>>>>> restart
>>>>> the timers if the previous val was USER_STP.
>>>>>
>>>> The problem occurs when KERNEL_STP is enabled if the bridge itself is
>>>> already
>>>> up. Then the hello_timer is not started. If the hello and hold timers
>>>> should run only
>>>> when KERNEL_STP is used then there are another problematic places
>>>> (will send follow-up).
>>>>
>>>> Ivan
>>>>
>>>
>>> Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only
>>> then, because you cannot do direct USER_STP -> KERNEL_STP.
>>>
>> No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as
>> USER_STP->NO_STP:
>>
>> 1) NO_STP->KERNEL_STP issue
>> hello_timer should be started in br_stp_start() - this patch
>>
>
> Right, I was talking only about this patch. By the way what about
> the port hold_timers ? This patch only starts the hello_timer.

The hold_timers should be started indirectly from br_transmit_config()
called from
br_config_bpdu_generation() called from hello_timer_expired() handler.

I.

>> 2) KERNEL_STP->NO_STP issue
>> hello timer and hold timers should be stopped (deleted) in br_stp_stop()
>>
>> 3) USER_STP->NO_STP issue
>> hello timer and hold timers should NOT be started in br_stp_stop()
>>
>
> Yep, ack.
>
>> Ivan
>>
>

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:57 GMT+02:00 Nikolay Aleksandrov <niko...@cumulusnetworks.com>:
> On 5/19/17 5:51 PM, Ivan Vecera wrote:
>>
>> 2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov
>> <niko...@cumulusnetworks.com>:
>>>
>>> On 5/19/17 5:20 PM, Xin Long wrote:
>>>>
>>>>
>>>> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>>>> kernel hello and hold timers"), bridge would not start hello_timer if
>>>> stp_enabled is not KERNEL_STP when br_dev_open.
>>>>
>>>> The problem is even if users set stp_enabled with KERNEL_STP later,
>>>> the timer will still not be started. It causes that KERNEL_STP can
>>>> not really work. Users have to re-ifup the bridge to avoid this.
>>>>
>>>> This patch is to fix it by starting br->hello_timer when enabling
>>>> KERNEL_STP in br_stp_start.
>>>>
>>>> As an improvement, it's also to start hello_timer again only when
>>>> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
>>>> no reason to start the timer again when it's NO_STP.
>>>>
>>>> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel
>>>> hello and hold timers")
>>>> Reported-by: Haidong Li <ha...@redhat.com>
>>>> Signed-off-by: Xin Long <lucien@gmail.com>
>>>> ---
>>>>net/bridge/br_stp_if.c| 1 +
>>>>net/bridge/br_stp_timer.c | 2 +-
>>>>2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>
>>> This doesn't make much sense to me, how do you change from USER_STP to
>>> KERNEL_STP without first going through NO_STP ?
>>>
>>> If you go through NO_STP then all will be fine because br_stp_stop will
>>> restart
>>> the timers if the previous val was USER_STP.
>>>
>> The problem occurs when KERNEL_STP is enabled if the bridge itself is
>> already
>> up. Then the hello_timer is not started. If the hello and hold timers
>> should run only
>> when KERNEL_STP is used then there are another problematic places
>> (will send follow-up).
>>
>> Ivan
>>
>
> Oh, the problem seems to be rather going from NO_STP -> KERNEL_STP only
> then, because you cannot do direct USER_STP -> KERNEL_STP.
>
No only NO_STP->KERNEL_STP but KERNEL_STP->NO_STP as well as USER_STP->NO_STP:

1) NO_STP->KERNEL_STP issue
hello_timer should be started in br_stp_start() - this patch

2) KERNEL_STP->NO_STP issue
hello timer and hold timers should be stopped (deleted) in br_stp_stop()

3) USER_STP->NO_STP issue
hello timer and hold timers should NOT be started in br_stp_stop()

Ivan

Re: [PATCH net] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-19 Thread Ivan Vecera

2017-05-19 16:45 GMT+02:00 Nikolay Aleksandrov :
> On 5/19/17 5:20 PM, Xin Long wrote:
>>
>> Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
>> kernel hello and hold timers"), bridge would not start hello_timer if
>> stp_enabled is not KERNEL_STP when br_dev_open.
>>
>> The problem is even if users set stp_enabled with KERNEL_STP later,
>> the timer will still not be started. It causes that KERNEL_STP can
>> not really work. Users have to re-ifup the bridge to avoid this.
>>
>> This patch is to fix it by starting br->hello_timer when enabling
>> KERNEL_STP in br_stp_start.
>>
>> As an improvement, it's also to start hello_timer again only when
>> br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
>> no reason to start the timer again when it's NO_STP.
>>
>> Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel
>> hello and hold timers")
>> Reported-by: Haidong Li 
>> Signed-off-by: Xin Long 
>> ---
>>   net/bridge/br_stp_if.c| 1 +
>>   net/bridge/br_stp_timer.c | 2 +-
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>
>
> This doesn't make much sense to me, how do you change from USER_STP to
> KERNEL_STP without first going through NO_STP ?
>
> If you go through NO_STP then all will be fine because br_stp_stop will
> restart
> the timers if the previous val was USER_STP.
>
The problem occurs when KERNEL_STP is enabled if the bridge itself is already
up. Then the hello_timer is not started. If the hello and hold timers
should run only
when KERNEL_STP is used then there are another problematic places
(will send follow-up).

Ivan

Re: [PATCH] switchdev: documentation: fix whitespace issues

2017-05-01 Thread Ivan Vecera

 |   |  |   |
> + +--++  +---+
> +|
> +   kernel   | HW bus (eg PCI)
> +  +---+
> +   hardware |
> + +--+---++
> + | Switch device (sw1)   |
> + |  ++   ++
> + |  |v offloaded data path   | mgmt port
> + |  ||   |
> + +--||++++---+
> +||||||
> +++++++
>     p1   p2   p3   p4   p5   p6
> -   
> - front-panel ports   
>  
> - 
>  
> +
> + front-panel ports
> +
>  
>  Fig 1.
>  
> 

Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [PATCH net v2 1/2] bridge: implement missing ndo_uninit()

2017-04-08 Thread Ivan Vecera

2017-04-08 16:14 GMT+02:00 Stephen Hemminger :
> On Sat, 8 Apr 2017 17:05:48 +0300
> Nikolay Aleksandrov  wrote:
>
>> On 08/04/17 16:49, Ido Schimmel wrote:
>> > On Sat, Apr 08, 2017 at 09:30:42AM -0400, Stephen Hemminger wrote:
>> >> On Sat, 8 Apr 2017 14:41:58 +0300
>> >>  wrote:
>> >>
>> >>>  static void br_dev_free(struct net_device *dev)
>> >>>  {
>> >>> - struct net_bridge *br = netdev_priv(dev);
>> >>> -
>> >>> - free_percpu(br->stats);
>> >>>   free_netdev(dev);
>> >>>  }
>> >>>
>> >>
>> >> Since the only thing left is free_netdev, you can now just set 
>> >> dev->destructor
>> >> to be free_netdev.
>> >
>> > Fine.
>> >
>> > Beside stylistic issues, I would appreciate comments on how this should
>> > be handled. Are we reverting the patch in the Fixes line or applying
>> > this patchset?
>> >
>> > I prefer the first option. Then after net is merged into net-next I can
>> > re-post this patchset with the requested changes.
>> >
>>
>> +1
>>
>>
>
> If this fixes the issue, then the one fix should go to stable, net and 
> net-next.
> There is no good reason to have two versions.
>
+1

Re: [patch net-next v2 0/8] Add support for pipeline debug (dpipe)

2017-03-28 Thread Ivan Vecera


Dne 28.3.2017 v 17:24 Jiri Pirko napsal(a):

From: Jiri Pirko <j...@mellanox.com>

Arkadi says:

While doing the hardware offloading process much of the hardware
specifics cannot be presented. An example for such is the routing
LPM algorithm which differ in hardware implementation from the
kernel software implementation. The only information the user receives
is whether specific route is offloaded or not, but he cannot really
understand the underlying implementation nor get the specific statistics
related to that process.

Another example is ACL offload using TC which is commonly implemented
using TCAM memory. Currently there is no capability to gain visibility
into the TCAM structure and to debug suboptimal resource allocation.

This patchset introduces capability for exporting the ASICs pipeline
abstraction via devlink infrastructure, which should serve as an
complementary tool. This infrastructure allows the user to get visibility
into the ASIC by modeling it as a set of match/action tables.

The main objects defined:
Table - abstraction for a single pipeline stage. Contains the
available match/actions and counter availability.
Entry - entry in a specific table with specific matches/actions
values and dedicated counter.
Header/field - tuples which describes the tables behavior.

As an example one of the ASIC's L3 blocks will be modeled. The egress
rif (router interface) table is the final step in the L3 pipeline
processing which does match on the internal rif index which was
determined before by the routing logic. The erif table determines
whether to forward or drop the packet and updates the corresponding
rif L3 statistics.

To expose this internal resources a special metadata header will
be introduced that describes the internal information gathered by
the ASIC's pipeline and contains the following fields: rif_port_index,
forward and drop.

Some internal hardware resources have direct mapping to kernel
objects. For example the rif_port_index is mapped to the net-devices
ifindex. By providing this mapping the users gains visibility into
the offloading process.

Follow-up work will include exporting more L3 tables which will give
visibility into the routing process.

First stage is adding support for dpipe in devlink. Next add support
in spectrum driver. Finally implement egress router interface
(erif) table for spectrum ASIC as an example.

---
v1->v2: Please see individual patches

Arkadi Sharshevsky (8):
  devlink: Support for pipeline debug (dpipe)
  mlxsw: reg: Add counter fields to RITR register
  mlxsw: spectrum: Add placeholder for dpipe
  mlxsw: spectrum: Add definition for egress rif table
  mlxsw: reg: Add Router Interface Counter Register
  mlxsw: spectrum: Support for counters on router interfaces
  mlxsw: spectrum_router: Add rif helper functions
  mlxsw: spectrum: Add Support for erif table entries access

 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h  | 178 +
 drivers/net/ethernet/mellanox/mlxsw/resources.h|   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  10 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c |   9 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |   1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 351 +
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   |  43 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 174 +
 .../net/ethernet/mellanox/mlxsw/spectrum_router.h  |  58 ++
 include/net/devlink.h  | 259 +++
 include/uapi/linux/devlink.h   |  67 +-
 net/core/devlink.c | 836 +
 13 files changed, 1988 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.h



Reviewed-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next 0/8] Add support for pipeline debug (dpipe)

2017-03-28 Thread Ivan Vecera


Dne 28.3.2017 v 09:59 Jiri Pirko napsal(a):

Tue, Mar 28, 2017 at 09:57:22AM CEST, j...@resnulli.us wrote:

Tue, Mar 28, 2017 at 09:42:58AM CEST, ivec...@redhat.com wrote:

Dne 28.3.2017 v 09:10 Jiri Pirko napsal(a):

Tue, Mar 28, 2017 at 12:48:34AM CEST, da...@davemloft.net wrote:


Please fix up these warnings and resubmit:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 
‘mlxsw_sp_rif_counter_free’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:208:2: warning: 
‘p_counter_index’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 mlxsw_sp_rif_counter_edit(mlxsw_sp, rif->rif_index,
 ^


This is I believe also gcc bug. Code looks fine. I'm not getting the warning
with gcc 6


No it is not a gcc bug. The function mlxsw_sp_rif_counter_free() is not
static so the compiler cannot know all its callers and so 'dir' parameter can
be theoretically anything.
You call mlxsw_sp_rif_p_counter_get() there it assumes dir can be only
MLXSW_SP_RIF_COUNTER_EGRESS or MLXSW_SP_RIF_COUNTER_INGRESS so initializes
*pp_counter_index only for them. For any other value the value is
uninitialized.


Interesting, why gcc 6.2.1 is silent then?


Oh, I see that it is not :) My bad, will fix, thanks!

Probably because 'dir' is enumeration and this gcc assumes that any caller uses 
only defined enum values.


Ivan

Re: [patch net-next 0/8] Add support for pipeline debug (dpipe)

2017-03-28 Thread Ivan Vecera


Dne 28.3.2017 v 08:51 Jiri Pirko napsal(a):

Tue, Mar 28, 2017 at 12:48:34AM CEST, da...@davemloft.net wrote:


Please fix up these warnings and resubmit:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 
‘mlxsw_sp_rif_counter_free’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:208:2: warning: 
‘p_counter_index’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 mlxsw_sp_rif_counter_edit(mlxsw_sp, rif->rif_index,
 ^
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c: In function 
‘mlxsw_sp_table_erif_entries_dump’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c:220:9: warning: missing 
braces around initializer [-Wmissing-braces]
 struct devlink_dpipe_value match_value = {0}, action_value = {0};


I know about this warning. I believe that it is a gcc bug. It happens
for gcc 4, but gcc 6 is silent, therefore I choose to ignore this. Do
you still want me to fix it?

Yes, this is gcc bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119). The 
problem is the first member of struct is union so older gcc assumes initializer 
like '{ { 0 } }' instead of '{ 0 }'.


IMHO it should be better to workaround this bug for older gccs as gcc6 is not 
only supported compiler.


Ivan

Re: [patch net-next 0/8] Add support for pipeline debug (dpipe)

2017-03-28 Thread Ivan Vecera


Dne 28.3.2017 v 09:10 Jiri Pirko napsal(a):

Tue, Mar 28, 2017 at 12:48:34AM CEST, da...@davemloft.net wrote:


Please fix up these warnings and resubmit:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 
‘mlxsw_sp_rif_counter_free’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:208:2: warning: 
‘p_counter_index’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 mlxsw_sp_rif_counter_edit(mlxsw_sp, rif->rif_index,
 ^


This is I believe also gcc bug. Code looks fine. I'm not getting the warning
with gcc 6

No it is not a gcc bug. The function mlxsw_sp_rif_counter_free() is not static 
so the compiler cannot know all its callers and so 'dir' parameter can be 
theoretically anything.
You call mlxsw_sp_rif_p_counter_get() there it assumes dir can be only 
MLXSW_SP_RIF_COUNTER_EGRESS or MLXSW_SP_RIF_COUNTER_INGRESS so initializes 
*pp_counter_index only for them. For any other value the value is uninitialized.


Ivan

[PATCH ethtool] ethtool: sync help output for -x/-X with man page

2017-02-20 Thread Ivan Vecera

Add missing words to the help output for -x and -X commands as well
as an ability to set the indirection table to default value.

Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 ethtool.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 7af039e..61043ee 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -4786,10 +4786,10 @@ static const struct option {
{ "-T|--show-time-stamping", 1, do_tsinfo,
  "Show time stamping capabilities" },
{ "-x|--show-rxfh-indir|--show-rxfh", 1, do_grxfh,
- "Show Rx flow hash indirection and/or hash key" },
+ "Show Rx flow hash indirection table and/or RSS hash key" },
{ "-X|--set-rxfh-indir|--rxfh", 1, do_srxfh,
- "Set Rx flow hash indirection and/or hash key",
- " [ equal N | weight W0 W1 ... ]\n"
+ "Set Rx flow hash indirection table and/or RSS hash key",
+ " [ equal N | weight W0 W1 ... | default ]\n"
  " [ hkey %x:%x:%x:%x:%x: ]\n" },
{ "-f|--flash", 1, do_flash,
  "Flash firmware image from the specified file to a region on the 
device",
-- 
2.10.2

Re: [patch net-next 01/10] switchdev: bridge: Offload multicast disabled

2017-02-09 Thread Ivan Vecera


Dne 9.2.2017 v 14:54 Jiri Pirko napsal(a):

From: Nogah Frankel <nog...@mellanox.com>

Offload multicast disabled flag, for more accurate mc flood behavior:
When it is on, the mdb should be ignored.
When it is off, unregistered mc packets should be flooded to mc router
ports.

Signed-off-by: Nogah Frankel <nog...@mellanox.com>
Signed-off-by: Yotam Gigi <yot...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/switchdev.h   |  2 ++
 net/bridge/br_multicast.c | 16 
 2 files changed, 18 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index eba80c4..2971c2a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -48,6 +48,7 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
+   SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
 };

 struct switchdev_attr {
@@ -62,6 +63,7 @@ struct switchdev_attr {
unsigned long brport_flags; /* PORT_BRIDGE_FLAGS */
clock_t ageing_time;/* BRIDGE_AGEING_TIME */
bool vlan_filtering;/* 
BRIDGE_VLAN_FILTERING */
+   bool mc_disabled;   /* MC_DISABLED */
} u;
 };

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 1de3438..8c0e896 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
 #include 
@@ -1007,6 +1008,18 @@ static void br_ip6_multicast_port_query_expired(unsigned 
long data)
 }
 #endif

+static void br_mc_disabled_update(struct net_device *dev, bool value)
+{
+   struct switchdev_attr attr = {
+   .orig_dev = dev,
+   .id = SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
+   .flags = SWITCHDEV_F_DEFER,
+   .u.mc_disabled = value,
+   };
+
+   switchdev_port_attr_set(dev, );
+}
+
 int br_multicast_add_port(struct net_bridge_port *port)
 {
port->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
@@ -1019,6 +1032,8 @@ int br_multicast_add_port(struct net_bridge_port *port)
setup_timer(>ip6_own_query.timer,
br_ip6_multicast_port_query_expired, (unsigned long)port);
 #endif
+   br_mc_disabled_update(port->dev, port->br->multicast_disabled);
+
port->mcast_stats = netdev_alloc_pcpu_stats(struct bridge_mcast_stats);
if (!port->mcast_stats)
return -ENOMEM;
@@ -2121,6 +2136,7 @@ int br_multicast_toggle(struct net_bridge *br, unsigned 
long val)
if (br->multicast_disabled == !val)
goto unlock;

+   br_mc_disabled_update(br->dev, !val);
br->multicast_disabled = !val;
if (br->multicast_disabled)
goto unlock;


Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next 03/10] switchdev: bridge: Offload mc router ports

2017-02-09 Thread Ivan Vecera


Dne 9.2.2017 v 14:54 Jiri Pirko napsal(a):

From: Nogah Frankel <nog...@mellanox.com>

Offload the mc router ports list, whenever it is being changed.
It is done because in some cases mc packets needs to be flooded to all
the ports in this list.

Signed-off-by: Nogah Frankel <nog...@mellanox.com>
Signed-off-by: Yotam Gigi <yot...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/switchdev.h   |  2 ++
 net/bridge/br_multicast.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 2971c2a..929d6af 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -46,6 +46,7 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
+   SWITCHDEV_ATTR_ID_PORT_MROUTER,
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
@@ -61,6 +62,7 @@ struct switchdev_attr {
struct netdev_phys_item_id ppid;/* PORT_PARENT_ID */
u8 stp_state;   /* PORT_STP_STATE */
unsigned long brport_flags; /* PORT_BRIDGE_FLAGS */
+   bool mrouter;   /* PORT_MROUTER */
clock_t ageing_time;/* BRIDGE_AGEING_TIME */
bool vlan_filtering;/* 
BRIDGE_VLAN_FILTERING */
bool mc_disabled;   /* MC_DISABLED */
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2add6d4..b760f26 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1317,6 +1317,19 @@ br_multicast_update_query_timer(struct net_bridge *br,
mod_timer(>timer, jiffies + br->multicast_querier_interval);
 }

+static void br_port_mc_router_state_change(struct net_bridge_port *p,
+  bool is_mc_router)
+{
+   struct switchdev_attr attr = {
+   .orig_dev = p->dev,
+   .id = SWITCHDEV_ATTR_ID_PORT_MROUTER,
+   .flags = SWITCHDEV_F_DEFER,
+   .u.mrouter = is_mc_router,
+   };
+
+   switchdev_port_attr_set(p->dev, );
+}
+
 /*
  * Add port to router_list
  *  list is maintained ordered by pointer value
@@ -1342,6 +1355,7 @@ static void br_multicast_add_router(struct net_bridge *br,
else
hlist_add_head_rcu(>rlist, >router_list);
br_rtr_notify(br->dev, port, RTM_NEWMDB);
+   br_port_mc_router_state_change(port, true);
 }

 static void br_multicast_mark_router(struct net_bridge *br,
@@ -2049,6 +2063,7 @@ static void __del_port_router(struct net_bridge_port *p)
return;
hlist_del_init_rcu(>rlist);
br_rtr_notify(p->br->dev, p, RTM_DELMDB);
+   br_port_mc_router_state_change(p, false);

/* don't allow timer refresh */
if (p->multicast_router == MDB_RTR_TYPE_TEMP)


Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next 02/10] bridge: mcast: Merge the mc router ports deletions to one function

2017-02-09 Thread Ivan Vecera


Dne 9.2.2017 v 14:54 Jiri Pirko napsal(a):

From: Nogah Frankel <nog...@mellanox.com>

There are three places where a port gets deleted from the mc router port
list. This patch join the actual deletion to one function.
It will be helpful for later patch that will offload changes in the mc
router ports list.

Signed-off-by: Nogah Frankel <nog...@mellanox.com>
Signed-off-by: Yotam Gigi <yot...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 net/bridge/br_multicast.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 8c0e896..2add6d4 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -47,6 +47,7 @@ static void br_ip4_multicast_leave_group(struct net_bridge 
*br,
 __u16 vid,
 const unsigned char *src);

+static void __del_port_router(struct net_bridge_port *p);
 #if IS_ENABLED(CONFIG_IPV6)
 static void br_ip6_multicast_leave_group(struct net_bridge *br,
 struct net_bridge_port *port,
@@ -850,16 +851,10 @@ static void br_multicast_router_expired(unsigned long 
data)
spin_lock(>multicast_lock);
if (port->multicast_router == MDB_RTR_TYPE_DISABLED ||
port->multicast_router == MDB_RTR_TYPE_PERM ||
-   timer_pending(>multicast_router_timer) ||
-   hlist_unhashed(>rlist))
+   timer_pending(>multicast_router_timer))
goto out;

-   hlist_del_init_rcu(>rlist);
-   br_rtr_notify(br->dev, port, RTM_DELMDB);
-   /* Don't allow timer refresh if the router expired */
-   if (port->multicast_router == MDB_RTR_TYPE_TEMP)
-   port->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
-
+   __del_port_router(port);
 out:
spin_unlock(>multicast_lock);
 }
@@ -1101,13 +1096,8 @@ void br_multicast_disable_port(struct net_bridge_port 
*port)
if (!(pg->flags & MDB_PG_FLAGS_PERMANENT))
br_multicast_del_pg(br, pg);

-   if (!hlist_unhashed(>rlist)) {
-   hlist_del_init_rcu(>rlist);
-   br_rtr_notify(br->dev, port, RTM_DELMDB);
-   /* Don't allow timer refresh if disabling */
-   if (port->multicast_router == MDB_RTR_TYPE_TEMP)
-   port->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
-   }
+   __del_port_router(port);
+
del_timer(>multicast_router_timer);
del_timer(>ip4_own_query.timer);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -2059,6 +2049,10 @@ static void __del_port_router(struct net_bridge_port *p)
return;
hlist_del_init_rcu(>rlist);
br_rtr_notify(p->br->dev, p, RTM_DELMDB);
+
+   /* don't allow timer refresh */
+   if (p->multicast_router == MDB_RTR_TYPE_TEMP)
+   p->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
 }

 int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)


Acked-by: Ivan Vecera <ivec...@redhat.com>

Re: [patch net-next] MAINTAINERS: add Ivan as a switchdev maintainer

2017-02-02 Thread Ivan Vecera


Dne 2.2.2017 v 17:05 Jiri Pirko napsal(a):

From: Jiri Pirko <j...@mellanox.com>

Ivan will be taking care of switchdev code from now on.

Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 300d2ec..eacacb2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11908,6 +11908,7 @@ F:  include/linux/swiotlb.h

 SWITCHDEV
 M: Jiri Pirko <j...@resnulli.us>
+M: Ivan Vecera <ivec...@redhat.com>
 L: netdev@vger.kernel.org
 S: Supported
 F: net/switchdev/


Acked-by: Ivan Vecera <ivec...@redhat.com>

[PATCH net v2] be2net: fix initial MAC setting

2017-01-31 Thread Ivan Vecera

Recent commit 34393529163a ("be2net: fix MAC addr setting on privileged
BE3 VFs") allows privileged BE3 VFs to set its MAC address during
initialization. Although the initial MAC for such VFs is already
programmed by parent PF the subsequent setting performed by VF is OK,
but in certain cases (after fresh boot) this command in VF can fail.

The MAC should be initialized only when:
1) no MAC is programmed (always except BE3 VFs during first init)
2) programmed MAC is different from requested (e.g. MAC is set when
   interface is down). In this case the initial MAC programmed by PF
   needs to be deleted.

The adapter->dev_mac contains MAC address currently programmed in HW so
it should be zeroed when the MAC is deleted from HW and should not be
filled when MAC is set when interface is down in be_mac_addr_set() as
no programming is performed in this case.

Example of failure without the fix (immediately after fresh boot):

# ip link set eth0 up  <- eth0 is BE3 PF
be2net :01:00.0 eth0: Link is Up

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create 1 VF
...
be2net :01:04.0: Emulex OneConnect(be3): VF  port 0

# ip link set eth8 up  <- eth8 is created privileged VF
be2net :01:04.0: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

# echo 0 > /sys/class/net/eth0/device/sriov_numvfs  <- Delete VF
iommu: Removing device :01:04.0 from group 33
...

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create it again
iommu: Removing device :01:04.0 from group 33
...

# ip link set eth8 up
be2net :01:04.0 eth8: Link is Up

Initialization is now OK.

v2 - Corrected the comment and condition check suggested by Suresh & Harsha

Fixes: 34393529163a ("be2net: fix MAC addr setting on privileged BE3 VFs")
Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 1a7f8ad7b9c6..cd49a54c538d 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -362,8 +362,10 @@ static int be_mac_addr_set(struct net_device *netdev, void 
*p)
status = -EPERM;
goto err;
}
-done:
+
+   /* Remember currently programmed MAC */
ether_addr_copy(adapter->dev_mac, addr->sa_data);
+done:
ether_addr_copy(netdev->dev_addr, addr->sa_data);
dev_info(dev, "MAC address changed to %pM\n", addr->sa_data);
return 0;
@@ -3618,8 +3620,10 @@ static void be_disable_if_filters(struct be_adapter 
*adapter)
 {
/* Don't delete MAC on BE3 VFs without FILTMGMT privilege  */
if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
-   check_privilege(adapter, BE_PRIV_FILTMGMT))
+   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
be_dev_mac_del(adapter, adapter->pmac_id[0]);
+   eth_zero_addr(adapter->dev_mac);
+   }
 
be_clear_uc_list(adapter);
be_clear_mc_list(adapter);
@@ -3773,12 +3777,27 @@ static int be_enable_if_filters(struct be_adapter 
*adapter)
if (status)
return status;
 
-   /* Don't add MAC on BE3 VFs without FILTMGMT privilege */
-   if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
-   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
+   /* Normally this condition usually true as the ->dev_mac is zeroed.
+* But on BE3 VFs the initial MAC is pre-programmed by PF and
+* subsequent be_dev_mac_add() can fail (after fresh boot)
+*/
+   if (!ether_addr_equal(adapter->dev_mac, adapter->netdev->dev_addr)) {
+   int old_pmac_id = -1;
+
+   /* Remember old programmed MAC if any - can happen on BE3 VF */
+   if (!is_zero_ether_addr(adapter->dev_mac))
+   old_pmac_id = adapter->pmac_id[0];
+
status = be_dev_mac_add(adapter, adapter->netdev->dev_addr);
if (status)
return status;
+
+   /* Delete the old programmed MAC as we successfully programmed
+* a new MAC
+*/
+   if (old_pmac_id >= 0 && old_pmac_id != adapter->pmac_id[0])
+   be_dev_mac_del(adapter, old_pmac_id);
+
ether_addr_copy(adapter->dev_mac, adapter->netdev->dev_addr);
}
 
@@ -4552,6 +4571,10 @@ static int be_mac_setup(struct be_adapter *adapter)
 
memcpy(adapter->netdev->

Re: [PATCH net] be2net: fix initial MAC setting

2017-01-31 Thread Ivan Vecera

My apologies for my previous replies... Gmail sent them as HTML emails
that are not acceptable for ML...

Thanks,
Ivan

Re: [PATCH net] be2net: fix initial MAC setting

2017-01-31 Thread Ivan Vecera

2017-01-31 19:01 GMT+01:00 David Miller :
>
>
> Sriharsha and other Broadcom folks, you must follow-up on Ivan's
> explanations and proof of testing.
>
> It is not acceptable for you to leave this patch's review state in
> limbo for days, no matter how complicated or big the patch is.  You
> must at the very least say what you are doing, and how long it will
> take for you to do that before you will be able to fully respond.
>
> Thank you.

Hi Dave,
I have some review notes from them but they have not cc-ed netdev
mailing list. I'm including their comments:

Reply 1:
">> > +
>> > +   /* Delete old programmed MAC if necessary */
>> > +   if (old_pmac_id > 0 && old_pmac_id !=
>> > adapter->pmac_id[0])
>> > +   be_dev_mac_del(adapter, old_pmac_id);
>>
>> I'm trying to understand why you added the above call to be_dev_mac_del()
>> here.
>> Since be_close() --> be_disable_if_filters() already does this.
>>
> It is necessary, see:
>
> 1) Lets say, we have created BE3 VF... it has programmed MAC1 by PF
> 2) This VF is initially down
> 3) Lets change its MAC address to MAC2. Because the interface is down then
> no programming is done and MAC1 is still active in HW
> -> adapter->dev_mac = MAC1 and adapter->netdev->dev_addr = MAC2
> 4) Now we set the interface up and be_enable_if_filters() is executed
>-> dev_mac and dev_addr is different. so MAC2 is programmed by
> be_dev_mac_add()
> 5) Interface is up and has MAC2 as an active MAC
> 6) Lets change active MAC to MAC1
>-> this will fail if you didn't delete inital MAC1 at step 4

Ok, so this takes care of the case where we do another set-mac without
an intervening interface-down right ? Can you please change the
comment to something like this (drop the "if necessary" since I feel
it is a distraction).

"Delete the old programmed MAC as we successfully programmed a new MAC"

Thanks for the fix, we will update you as soon as we are done with all
our tests tomorrow."

Reply 2:
"I am seeing the collision error with the steps mentioned in your
previous mail. MAC address will not be deleted when old_mac_id is 0.
ZERO (0) is the valid pmac id. So the above check should be >=.

Can you please verify again and fix it.

Regards,
Suresh."

So I will submit v2 with corrected comment and condition check. Btw.
Suresh, Harsha, next time please always cc netdev mailing list. It is
not my responsibility to replicate your emails.

Thanks,
Ivan

Re: [PATCH net] be2net: fix initial MAC setting

2017-01-27 Thread Ivan Vecera

2017-01-27 13:38 GMT+01:00 Sriharsha Basavapatna
:
> Hi Ivan,
>
> This patch is a bit involved. We need some time to review and test
> this to make sure it works with other (non-BE3/VF - Skyhawk, Lancer)
> devices. Also, IIRC we shouldn't see this issue with the latest FW.
>
> Thanks,
> -Harsha

Btw. FW 11.1.215.0 is the latest available... As I described the
problem is reproducible after fresh boot.

1) After _FRESH_ boot we have enp1s0f0 interface that is BE3 PF with
SR-IOV enabled
[root@sm-02 ~]# uname -r
4.10.0-rc4-00189-ga47b70e
[root@sm-02 ~]# ethtool -i enp1s0f0 | egrep '(firm|bus)'
firmware-version: 11.1.215.0
bus-info: :01:00.0
[root@sm-02 ~]# lspci -vv -s :01:00.0 | egrep '(Name:|VFs)'
   Product Name: OCe11102-FM 2P 10GbE Tomcat Enterprise CNA, NIC PF
   Initial VFs: 32, Total VFs: 32, Number of VFs: 0,
Function Dependency Link: 00

2) Now we create 1 VF
[root@sm-02 ~]# ip link set enp1s0f1 up
be2net :01:00.1 enp1s0f1: Link is Up
[root@sm-02 ~]# echo 1 > /sys/class/net/enp1s0f1/device/sriov_numvfs
...
be2net :01:00.1: VF0 has FILTMGMT privilege
...
be2net :01:08.0: Emulex OneConnect(be3): VF  port 1
be2net :01:08.0 enp1s8f0: renamed from eth0

3) Now we have 1 privileged VF named enp1s8f0 that is down
[root@sm-02 ~]# ip link show dev enp1s8f0
12: enp1s8f0:  mtu 1500 qdisc noop portid
02333630384254 state DOWN mode DEFAULT qlen 1000
   link/ether 00:00:c9:9c:5c:60 brd ff:ff:ff:ff:ff:ff

4) Now we set the interface up (this fails)
[root@sm-02 ~]# ip link set enp1s8f0 up
be2net :01:08.0: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

5) Delete VF and create 2 new VFs
[root@sm-02 ~]# echo 0 > /sys/class/net/enp1s0f1/device/sriov_numvfs
...
[root@sm-02 ~]# echo 2 > /sys/class/net/enp1s0f1/device/sriov_numvfs
...
be2net :01:08.0 enp1s8f0: renamed from eth0
...
be2net :01:08.1 enp1s8f1: renamed from eth0

6) We have 2 privileged VFs named enp1s8f0 and enp1s8f1, both down
[root@sm-02 ~]# ip link show dev enp1s8f0
13: enp1s8f0:  mtu 1500 qdisc noop portid
02333630384254 state DOWN mode DEFAULT qlen 1000
   link/ether 00:00:c9:9c:5c:60 brd ff:ff:ff:ff:ff:ff
[root@sm-02 ~]# ip link show dev enp1s8f1
14: enp1s8f1:  mtu 1500 qdisc noop portid
02333630384254 state DOWN mode DEFAULT qlen 1000
   link/ether 00:00:c9:9c:5c:61 brd ff:ff:ff:ff:ff:ff

7) Now set them up
[root@sm-02 ~]# ip link set enp1s8f0 up
be2net :01:08.0 enp1s8f0: Link is Up
[root@sm-02 ~]# ip link set enp1s8f1 up
be2net :01:08.1: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

As you can see the re-used interface enp1s8f0 is OK but not-yet-used
interface enp1s8f1 fails.

8) Try to set enp1s8f1 up again
[root@sm-02 ~]# ip link set enp1s8f1 up
be2net :01:08.1 enp1s8f1: Link is Up

Now it is OK... This is because be_close() is called when
be_enable_if_filters() fails. be_close() calls be_disable_if_filters()
and there the MAC pre-programmed by PF is deleted by be_dev_mac_del().
So when you try to set the interface up again then the pre-programmed
MAC is away and be_dev_mac_add() succeeds.

Regards,
Ivan

Re: [PATCH net] be2net: fix initial MAC setting

2017-01-27 Thread Ivan Vecera

2017-01-27 13:38 GMT+01:00 Sriharsha Basavapatna
<sriharsha.basavapa...@broadcom.com>:
> Hi Ivan,
>
> This patch is a bit involved. We need some time to review and test
> this to make sure it works with other (non-BE3/VF - Skyhawk, Lancer)
> devices. Also, IIRC we shouldn't see this issue with the latest FW.
>
I have done tests on BE3 PF & both privileged and non-privileged as well
as on Skyhawk PF and VF. No problems found.
Regarding firmware, I'm using version 11.1.215.0 on my devices and this version
is affected on BE3. Older versions are affected as well.

Ivan
>
> On Thu, Jan 26, 2017 at 3:58 PM, Ivan Vecera <c...@cera.cz> wrote:
>> Recent commit 34393529163a ("be2net: fix MAC addr setting on privileged
>> BE3 VFs") allows privileged BE3 VFs to set its MAC address during
>> initialization. Although the initial MAC for such VFs is already
>> programmed by parent PF the subsequent setting performed by VF is OK,
>> but in certain cases (after fresh boot) this command in VF can fail.
>>
>> The MAC should be initialized only when:
>> 1) no MAC is programmed (always except BE3 VFs during first init)
>> 2) programmed MAC is different from requested (e.g. MAC is set when
>>interface is down). In this case the initial MAC programmed by PF
>>needs to be deleted.
>>
>> The adapter->dev_mac contains MAC address currently programmed in HW so
>> it should be zeroed when the MAC is deleted from HW and should not be
>> filled when MAC is set when interface is down in be_mac_addr_set() as
>> no programming is performed in this case.
>>
>> Example of failure without the fix (immediately after fresh boot):
>>
>> # ip link set eth0 up  <- eth0 is BE3 PF
>> be2net :01:00.0 eth0: Link is Up
>>
>> # echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create 1 VF
>> ...
>> be2net :01:04.0: Emulex OneConnect(be3): VF  port 0
>>
>> # ip link set eth8 up  <- eth8 is created privileged VF
>> be2net :01:04.0: opcode 59-1 failed:status 1-76
>> RTNETLINK answers: Input/output error
>>
>> # echo 0 > /sys/class/net/eth0/device/sriov_numvfs  <- Delete VF
>> iommu: Removing device :01:04.0 from group 33
>> ...
>>
>> # echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create it again
>> iommu: Removing device :01:04.0 from group 33
>> ...
>>
>> # ip link set eth8 up
>> be2net :01:04.0 eth8: Link is Up
>>
>> Initialization is now OK.
>>
>> Fixes: 34393529163a ("be2net: fix MAC addr setting on privileged BE3 VFs")
>> Cc: Sathya Perla <sathya.pe...@broadcom.com>
>> Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
>> Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
>> Cc: Somnath Kotur <somnath.ko...@broadcom.com>
>> Signed-off-by: Ivan Vecera <c...@cera.cz>
>> ---
>>  drivers/net/ethernet/emulex/benet/be_main.c | 31 
>> -
>>  1 file changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
>> b/drivers/net/ethernet/emulex/benet/be_main.c
>> index 1a7f8ad7b9c6..cf13b99b8551 100644
>> --- a/drivers/net/ethernet/emulex/benet/be_main.c
>> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
>> @@ -362,8 +362,10 @@ static int be_mac_addr_set(struct net_device *netdev, 
>> void *p)
>> status = -EPERM;
>> goto err;
>> }
>> -done:
>> +
>> +   /* Remember currently programmed MAC */
>> ether_addr_copy(adapter->dev_mac, addr->sa_data);
>> +done:
>> ether_addr_copy(netdev->dev_addr, addr->sa_data);
>> dev_info(dev, "MAC address changed to %pM\n", addr->sa_data);
>> return 0;
>> @@ -3618,8 +3620,10 @@ static void be_disable_if_filters(struct be_adapter 
>> *adapter)
>>  {
>> /* Don't delete MAC on BE3 VFs without FILTMGMT privilege  */
>> if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
>> -   check_privilege(adapter, BE_PRIV_FILTMGMT))
>> +   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
>> be_dev_mac_del(adapter, adapter->pmac_id[0]);
>> +   eth_zero_addr(adapter->dev_mac);
>> +   }
>>
>> be_clear_uc_list(adapter);
>> be_clear_mc_list(adapter);
>> @@ -3773,12 +3777,25 @@ static int be_enable_if_filters(struct be_adapter 
>> *adapter)
>> if (status)
>> return status;
>>
>

[PATCH net] be2net: fix initial MAC setting

2017-01-26 Thread Ivan Vecera

Recent commit 34393529163a ("be2net: fix MAC addr setting on privileged
BE3 VFs") allows privileged BE3 VFs to set its MAC address during
initialization. Although the initial MAC for such VFs is already
programmed by parent PF the subsequent setting performed by VF is OK,
but in certain cases (after fresh boot) this command in VF can fail.

The MAC should be initialized only when:
1) no MAC is programmed (always except BE3 VFs during first init)
2) programmed MAC is different from requested (e.g. MAC is set when
   interface is down). In this case the initial MAC programmed by PF
   needs to be deleted.

The adapter->dev_mac contains MAC address currently programmed in HW so
it should be zeroed when the MAC is deleted from HW and should not be
filled when MAC is set when interface is down in be_mac_addr_set() as
no programming is performed in this case.

Example of failure without the fix (immediately after fresh boot):

# ip link set eth0 up  <- eth0 is BE3 PF
be2net :01:00.0 eth0: Link is Up

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create 1 VF
...
be2net :01:04.0: Emulex OneConnect(be3): VF  port 0

# ip link set eth8 up  <- eth8 is created privileged VF
be2net :01:04.0: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

# echo 0 > /sys/class/net/eth0/device/sriov_numvfs  <- Delete VF
iommu: Removing device :01:04.0 from group 33
...

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create it again
iommu: Removing device :01:04.0 from group 33
...

# ip link set eth8 up
be2net :01:04.0 eth8: Link is Up

Initialization is now OK.

Fixes: 34393529163a ("be2net: fix MAC addr setting on privileged BE3 VFs")
Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 31 -
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 1a7f8ad7b9c6..cf13b99b8551 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -362,8 +362,10 @@ static int be_mac_addr_set(struct net_device *netdev, void 
*p)
status = -EPERM;
goto err;
}
-done:
+
+   /* Remember currently programmed MAC */
ether_addr_copy(adapter->dev_mac, addr->sa_data);
+done:
ether_addr_copy(netdev->dev_addr, addr->sa_data);
dev_info(dev, "MAC address changed to %pM\n", addr->sa_data);
return 0;
@@ -3618,8 +3620,10 @@ static void be_disable_if_filters(struct be_adapter 
*adapter)
 {
/* Don't delete MAC on BE3 VFs without FILTMGMT privilege  */
if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
-   check_privilege(adapter, BE_PRIV_FILTMGMT))
+   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
be_dev_mac_del(adapter, adapter->pmac_id[0]);
+   eth_zero_addr(adapter->dev_mac);
+   }
 
be_clear_uc_list(adapter);
be_clear_mc_list(adapter);
@@ -3773,12 +3777,25 @@ static int be_enable_if_filters(struct be_adapter 
*adapter)
if (status)
return status;
 
-   /* Don't add MAC on BE3 VFs without FILTMGMT privilege */
-   if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
-   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
+   /* Normally this condition usually true as the ->dev_mac is zeroed.
+* But on BE3 VFs the initial MAC is pre-programmed by PF and
+* subsequent be_dev_mac_add() can fail (after fresh boot)
+*/
+   if (!ether_addr_equal(adapter->dev_mac, adapter->netdev->dev_addr)) {
+   int old_pmac_id = -1;
+
+   /* Remember old programmed MAC if any - can happen on BE3 VF */
+   if (!is_zero_ether_addr(adapter->dev_mac))
+   old_pmac_id = adapter->pmac_id[0];
+
status = be_dev_mac_add(adapter, adapter->netdev->dev_addr);
if (status)
return status;
+
+   /* Delete old programmed MAC if necessary */
+   if (old_pmac_id > 0 && old_pmac_id != adapter->pmac_id[0])
+   be_dev_mac_del(adapter, old_pmac_id);
+
ether_addr_copy(adapter->dev_mac, adapter->netdev->dev_addr);
}
 
@@ -4552,6 +4569,10 @@ static int be_mac_setup(struct be_adapter *adapter)
 
memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+
+   /* Initial MAC

[PATCH net v3] bridge: netlink: call br_changelink() during br_dev_newlink()

2017-01-20 Thread Ivan Vecera

Any bridge options specified during link creation (e.g. ip link add)
are ignored as br_dev_newlink() does not process them.
Use br_changelink() to do it.

Fixes: 1332351 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/bridge/br_netlink.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 71c7453..7109b38 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -781,20 +781,6 @@ static int br_validate(struct nlattr *tb[], struct nlattr 
*data[])
return 0;
 }
 
-static int br_dev_newlink(struct net *src_net, struct net_device *dev,
- struct nlattr *tb[], struct nlattr *data[])
-{
-   struct net_bridge *br = netdev_priv(dev);
-
-   if (tb[IFLA_ADDRESS]) {
-   spin_lock_bh(>lock);
-   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
-   spin_unlock_bh(>lock);
-   }
-
-   return register_netdevice(dev);
-}
-
 static int br_port_slave_changelink(struct net_device *brdev,
struct net_device *dev,
struct nlattr *tb[],
@@ -1115,6 +1101,25 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return 0;
 }
 
+static int br_dev_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+   struct net_bridge *br = netdev_priv(dev);
+   int err;
+
+   if (tb[IFLA_ADDRESS]) {
+   spin_lock_bh(>lock);
+   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
+   spin_unlock_bh(>lock);
+   }
+
+   err = br_changelink(dev, tb, data);
+   if (err)
+   return err;
+
+   return register_netdevice(dev);
+}
+
 static size_t br_get_size(const struct net_device *brdev)
 {
return nla_total_size(sizeof(u32)) +/* IFLA_BR_FORWARD_DELAY  */
-- 
2.10.2

[PATCH net v2] bridge: netlink: call br_changelink() during br_dev_newlink()

2017-01-20 Thread Ivan Vecera

Any bridge options specified during link creation (e.g. ip link add)
are ignored as br_dev_newlink() does not process them.
Use br_changelink() to do it.

Fixes: 1332351 bridge: implement rtnl_link_ops->changelink
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/bridge/br_netlink.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 71c7453..7109b38 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -781,20 +781,6 @@ static int br_validate(struct nlattr *tb[], struct nlattr 
*data[])
return 0;
 }
 
-static int br_dev_newlink(struct net *src_net, struct net_device *dev,
- struct nlattr *tb[], struct nlattr *data[])
-{
-   struct net_bridge *br = netdev_priv(dev);
-
-   if (tb[IFLA_ADDRESS]) {
-   spin_lock_bh(>lock);
-   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
-   spin_unlock_bh(>lock);
-   }
-
-   return register_netdevice(dev);
-}
-
 static int br_port_slave_changelink(struct net_device *brdev,
struct net_device *dev,
struct nlattr *tb[],
@@ -1115,6 +1101,25 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return 0;
 }
 
+static int br_dev_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+   struct net_bridge *br = netdev_priv(dev);
+   int err;
+
+   if (tb[IFLA_ADDRESS]) {
+   spin_lock_bh(>lock);
+   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
+   spin_unlock_bh(>lock);
+   }
+
+   err = br_changelink(dev, tb, data);
+   if (err)
+   return err;
+
+   return register_netdevice(dev);
+}
+
 static size_t br_get_size(const struct net_device *brdev)
 {
return nla_total_size(sizeof(u32)) +/* IFLA_BR_FORWARD_DELAY  */
-- 
2.10.2

[PATCH net] bridge: netlink: call br_changelink() during br_dev_newlink()

2017-01-20 Thread Ivan Vecera

Any bridge options specified during link creation (e.g. ip link add)
are ignored as br_dev_newlink() does not process them.
Use br_changelink() to do it.

Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 net/bridge/br_netlink.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 71c7453..7109b38 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -781,20 +781,6 @@ static int br_validate(struct nlattr *tb[], struct nlattr 
*data[])
return 0;
 }
 
-static int br_dev_newlink(struct net *src_net, struct net_device *dev,
- struct nlattr *tb[], struct nlattr *data[])
-{
-   struct net_bridge *br = netdev_priv(dev);
-
-   if (tb[IFLA_ADDRESS]) {
-   spin_lock_bh(>lock);
-   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
-   spin_unlock_bh(>lock);
-   }
-
-   return register_netdevice(dev);
-}
-
 static int br_port_slave_changelink(struct net_device *brdev,
struct net_device *dev,
struct nlattr *tb[],
@@ -1115,6 +1101,25 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return 0;
 }
 
+static int br_dev_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+   struct net_bridge *br = netdev_priv(dev);
+   int err;
+
+   if (tb[IFLA_ADDRESS]) {
+   spin_lock_bh(>lock);
+   br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS]));
+   spin_unlock_bh(>lock);
+   }
+
+   err = br_changelink(dev, tb, data);
+   if (err)
+   return err;
+
+   return register_netdevice(dev);
+}
+
 static size_t br_get_size(const struct net_device *brdev)
 {
return nla_total_size(sizeof(u32)) +/* IFLA_BR_FORWARD_DELAY  */
-- 
2.10.2

[PATCH net 1/3] be2net: fix status check in be_cmd_pmac_add()

2017-01-13 Thread Ivan Vecera

Return value from be_mcc_notify_wait() contains a base completion status
together with an additional status. The base_status() macro need to be
used to access base status.

Fixes: e3a7ae2 be2net: Changing MAC Address of a VF was broken
Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c 
b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 0e74529..30e8550 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1118,7 +1118,7 @@ int be_cmd_pmac_add(struct be_adapter *adapter, u8 
*mac_addr,
 err:
mutex_unlock(>mcc_lock);
 
-if (status == MCC_STATUS_UNAUTHORIZED_REQUEST)
+if (base_status(status) == MCC_STATUS_UNAUTHORIZED_REQUEST)
status = -EPERM;
 
return status;
-- 
2.10.2

[PATCH net 2/3] be2net: don't delete MAC on close on unprivileged BE3 VFs

2017-01-13 Thread Ivan Vecera

BE3 VFs without FILTMGMT privilege are not allowed to modify its MAC,
VLAN table and UC/MC lists. So don't try to delete MAC on such VFs.

Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index ec010ce..d606e20 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3609,7 +3609,11 @@ static void be_rx_qs_destroy(struct be_adapter *adapter)
 
 static void be_disable_if_filters(struct be_adapter *adapter)
 {
-   be_dev_mac_del(adapter, adapter->pmac_id[0]);
+   /* Don't delete MAC on BE3 VFs without FILTMGMT privilege  */
+   if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
+   check_privilege(adapter, BE_PRIV_FILTMGMT))
+   be_dev_mac_del(adapter, adapter->pmac_id[0]);
+
be_clear_uc_list(adapter);
be_clear_mc_list(adapter);
 
-- 
2.10.2

[PATCH net 3/3] be2net: fix MAC addr setting on privileged BE3 VFs

2017-01-13 Thread Ivan Vecera

During interface opening MAC address stored in netdev->dev_addr is
programmed in the HW with exception of BE3 VFs where the initial
MAC is programmed by parent PF. This is OK when MAC address is not
changed when an interfaces is down. In this case the requested MAC is
stored to netdev->dev_addr and later is stored into HW during opening.
But this is not done for all BE3 VFs so the NIC HW does not know
anything about this change and all traffic is filtered.

This is the case of bonding if fail_over_mac == 0 where the MACs of
the slaves are changed while they are down.

The be2net behavior is too restrictive because if a BE3 VF has
the FILTMGMT privilege then it is able to modify its MAC without
any restriction.

To solve the described problem the driver should take care about these
privileged BE3 VFs so the MAC is programmed during opening. And by
contrast unpriviled BE3 VFs should not be allowed to change its MAC
in any case.

Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index d606e20..1a7f8ad 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -318,6 +318,13 @@ static int be_mac_addr_set(struct net_device *netdev, void 
*p)
if (ether_addr_equal(addr->sa_data, adapter->dev_mac))
return 0;
 
+   /* BE3 VFs without FILTMGMT privilege are not allowed to set its MAC
+* address
+*/
+   if (BEx_chip(adapter) && be_virtfn(adapter) &&
+   !check_privilege(adapter, BE_PRIV_FILTMGMT))
+   return -EPERM;
+
/* if device is not running, copy MAC to netdev->dev_addr */
if (!netif_running(netdev))
goto done;
@@ -3766,8 +3773,9 @@ static int be_enable_if_filters(struct be_adapter 
*adapter)
if (status)
return status;
 
-   /* For BE3 VFs, the PF programs the initial MAC address */
-   if (!(BEx_chip(adapter) && be_virtfn(adapter))) {
+   /* Don't add MAC on BE3 VFs without FILTMGMT privilege */
+   if (!BEx_chip(adapter) || !be_virtfn(adapter) ||
+   check_privilege(adapter, BE_PRIV_FILTMGMT)) {
status = be_dev_mac_add(adapter, adapter->netdev->dev_addr);
if (status)
return status;
-- 
2.10.2

[PATCH net] be2net: fix unicast list filling

2017-01-06 Thread Ivan Vecera

The adapter->pmac_id[0] item is used for primary MAC address but
this is not true for adapter->uc_list[0] as is assumed in
be_set_uc_list(). There are N UC addresses copied first from net_device
to adapter->uc_list[1..N] and then N UC addresses from
adapter->uc_list[0..N-1] are sent to HW. So the last UC address is never
stored into HW and address 00:00:00:00;00:00 (from uc_list[0]) is used
instead.

Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Fixes: b717241 be2net: replace polling with sleeping in the FW completion path
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 3510352..ec010ce 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1695,9 +1695,8 @@ static void be_set_uc_list(struct be_adapter *adapter)
}
 
if (adapter->update_uc_list) {
-   i = 1; /* First slot is claimed by the Primary MAC */
-
/* cache the uc-list in adapter array */
+   i = 0;
netdev_for_each_uc_addr(ha, netdev) {
ether_addr_copy(adapter->uc_list[i].mac, ha->addr);
i++;
-- 
2.10.2

[PATCH net] be2net: fix accesses to unicast list

2017-01-06 Thread Ivan Vecera

Commit 988d44b "be2net: Avoid redundant addition of mac address in HW"
introduced be_dev_mac_add & be_uc_mac_add helpers that incorrectly
access adapter->uc_list as an array of bytes instead of an array of
be_eth_addr. Consequently NIC is not filled with valid data so unicast
filtering is broken.

Cc: Sathya Perla <sathya.pe...@broadcom.com>
Cc: Ajit Khaparde <ajit.khapa...@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapa...@broadcom.com>
Cc: Somnath Kotur <somnath.ko...@broadcom.com>
Fixes: 988d44b be2net: Avoid redundant addition of mac address in HW
Signed-off-by: Ivan Vecera <c...@cera.cz>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 225e9a4..3510352 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -275,8 +275,7 @@ static int be_dev_mac_add(struct be_adapter *adapter, u8 
*mac)
 
/* Check if mac has already been added as part of uc-list */
for (i = 0; i < adapter->uc_macs; i++) {
-   if (ether_addr_equal((u8 *)>uc_list[i * ETH_ALEN],
-mac)) {
+   if (ether_addr_equal(adapter->uc_list[i].mac, mac)) {
/* mac already added, skip addition */
adapter->pmac_id[0] = adapter->pmac_id[i + 1];
return 0;
@@ -1655,14 +1654,12 @@ static void be_clear_mc_list(struct be_adapter *adapter)
 
 static int be_uc_mac_add(struct be_adapter *adapter, int uc_idx)
 {
-   if (ether_addr_equal((u8 *)>uc_list[uc_idx * ETH_ALEN],
-adapter->dev_mac)) {
+   if (ether_addr_equal(adapter->uc_list[uc_idx].mac, adapter->dev_mac)) {
adapter->pmac_id[uc_idx + 1] = adapter->pmac_id[0];
return 0;
}
 
-   return be_cmd_pmac_add(adapter,
-  (u8 *)>uc_list[uc_idx * ETH_ALEN],
+   return be_cmd_pmac_add(adapter, adapter->uc_list[uc_idx].mac,
   adapter->if_handle,
   >pmac_id[uc_idx + 1], 0);
 }
-- 
2.10.2

Re: [patch net-next 3/5] mlxsw: Move PCI id table definitions into driver modules

2016-10-27 Thread Ivan Vecera


Dne 27.10.2016 v 15:12 Jiri Pirko napsal(a):

From: Jiri Pirko <j...@mellanox.com>

So far, mlxsw_pci.ko is the module that registers PCI table for all
drivers (spectrum and switchx2). That is problematic for example with
dracut. Since mlxsw_spectrum.ko and mlxsw_switchx2.ko are loaded
dynamically from within mlxsw_core.ko, dracut does not have track of
them and avoids them from being included in initramfs.

So make this in an ordinary way and define the PCI tables in individual
driver modules, so it can be properly loaded and included in dracut
initramfs image. As a side effect, this patch could remove no longer
necessary driver "kind" strings which were used to link PCI ids with
individual mlxsw drivers.

Suggested-by: Ivan Vecera <ivec...@redhat.com>
Tested-by: Ivan Vecera <ivec...@redhat.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
Reviewed-by: Ido Schimmel <ido...@mellanox.com>

Thanks Jiri.

Acked-by: Ivan Vecera <ivec...@redhat.com>

[PATCH net] arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold

2016-10-27 Thread Ivan Vecera

Commit 01cfbad "ipv4: Update parameters for csum_tcpudp_magic to their
original types" changed parameters for csum_tcpudp_magic and
csum_tcpudp_nofold for many platforms but not for PowerPC.

Fixes: 01cfbad "ipv4: Update parameters for csum_tcpudp_magic to their original 
types"
Cc: Alexander Duyck <adu...@mirantis.com>
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 arch/powerpc/include/asm/checksum.h | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/checksum.h 
b/arch/powerpc/include/asm/checksum.h
index ee655ed..1e8fceb 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -53,10 +53,8 @@ static inline __sum16 csum_fold(__wsum sum)
return (__force __sum16)(~((__force u32)sum + tmp) >> 16);
 }
 
-static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
- unsigned short len,
- unsigned short proto,
- __wsum sum)
+static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
 #ifdef __powerpc64__
unsigned long s = (__force u32)sum;
@@ -83,10 +81,8 @@ static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 
daddr,
  * computes the checksum of the TCP/UDP pseudo-header
  * returns a 16-bit checksum, already complemented
  */
-static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
-   unsigned short len,
-   unsigned short proto,
-   __wsum sum)
+static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
+   __u8 proto, __wsum sum)
 {
return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
 }
-- 
2.7.3

Re: [PATCH net] bnx2: fix locking when netconsole is used

2016-10-18 Thread Ivan Vecera


Dne 17.10.2016 v 19:21 David Miller napsal(a):

From: Ivan Vecera <ivec...@redhat.com>
Date: Mon, 17 Oct 2016 18:20:31 +0200


diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c
index 27f11a5..9c408413 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -272,21 +272,24 @@ static u32
 bnx2_reg_rd_ind(struct bnx2 *bp, u32 offset)
 {
u32 val;
+   unsigned long flags;



Please order local variable declarations from longest to shortest
line.

Sure Dave, done.

Ivan

[PATCH net v2] bnx2: fix locking when netconsole is used

2016-10-18 Thread Ivan Vecera

Functions bnx2_reg_rd_ind(), bnx2_reg_wr_ind() and bnx2_ctx_wr()
can be called with IRQs disabled when netconsole is enabled. So they
should use spin_{,un}lock_irq{save,restore} instead of _bh variants.

Example call flow:
bnx2_poll()
  ->bnx2_poll_link()
->bnx2_phy_int()
  ->bnx2_set_remote_link()
->bnx2_shmem_rd()
  ->bnx2_reg_rd_ind()
-> spin_lock_bh(>indirect_lock);
   spin_unlock_bh(>indirect_lock);
   ...
   -> __local_bh_enable_ip

static inline void __local_bh_enable_ip(unsigned long ip)
  WARN_ON_ONCE(in_irq() || irqs_disabled());   <<<<<< WARN

Cc: Sony Chacko <sony.cha...@qlogic.com>
Cc: dept-hsglinuxnic...@qlogic.com
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c
index 27f11a5..b3791b3 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -271,22 +271,25 @@ static inline u32 bnx2_tx_avail(struct bnx2 *bp, struct 
bnx2_tx_ring_info *txr)
 static u32
 bnx2_reg_rd_ind(struct bnx2 *bp, u32 offset)
 {
+   unsigned long flags;
u32 val;
 
-   spin_lock_bh(>indirect_lock);
+   spin_lock_irqsave(>indirect_lock, flags);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW_ADDRESS, offset);
val = BNX2_RD(bp, BNX2_PCICFG_REG_WINDOW);
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
return val;
 }
 
 static void
 bnx2_reg_wr_ind(struct bnx2 *bp, u32 offset, u32 val)
 {
-   spin_lock_bh(>indirect_lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(>indirect_lock, flags);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW_ADDRESS, offset);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW, val);
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
 }
 
 static void
@@ -304,8 +307,10 @@ bnx2_shmem_rd(struct bnx2 *bp, u32 offset)
 static void
 bnx2_ctx_wr(struct bnx2 *bp, u32 cid_addr, u32 offset, u32 val)
 {
+   unsigned long flags;
+
offset += cid_addr;
-   spin_lock_bh(>indirect_lock);
+   spin_lock_irqsave(>indirect_lock, flags);
if (BNX2_CHIP(bp) == BNX2_CHIP_5709) {
int i;
 
@@ -322,7 +327,7 @@ bnx2_ctx_wr(struct bnx2 *bp, u32 cid_addr, u32 offset, u32 
val)
BNX2_WR(bp, BNX2_CTX_DATA_ADR, offset);
BNX2_WR(bp, BNX2_CTX_DATA, val);
}
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
 }
 
 #ifdef BCM_CNIC
-- 
2.7.3

[PATCH net] bnx2: fix locking when netconsole is used

2016-10-17 Thread Ivan Vecera

Functions bnx2_reg_rd_ind(), bnx2_reg_wr_ind() and bnx2_ctx_wr()
can be called with IRQs disabled when netconsole is enabled. So they
should use spin_{,un}lock_irq{save,restore} instead of _bh variants.

Example call flow:
bnx2_poll()
  ->bnx2_poll_link()
->bnx2_phy_int()
  ->bnx2_set_remote_link()
->bnx2_shmem_rd()
  ->bnx2_reg_rd_ind()
-> spin_lock_bh(>indirect_lock);
   spin_unlock_bh(>indirect_lock);
   ...
   -> __local_bh_enable_ip

static inline void __local_bh_enable_ip(unsigned long ip)
  WARN_ON_ONCE(in_irq() || irqs_disabled());   <<<<<< WARN

Cc: Sony Chacko <sony.cha...@qlogic.com>
Cc: dept-hsglinuxnic...@qlogic.com
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c
index 27f11a5..9c408413 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -272,21 +272,24 @@ static u32
 bnx2_reg_rd_ind(struct bnx2 *bp, u32 offset)
 {
u32 val;
+   unsigned long flags;
 
-   spin_lock_bh(>indirect_lock);
+   spin_lock_irqsave(>indirect_lock, flags);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW_ADDRESS, offset);
val = BNX2_RD(bp, BNX2_PCICFG_REG_WINDOW);
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
return val;
 }
 
 static void
 bnx2_reg_wr_ind(struct bnx2 *bp, u32 offset, u32 val)
 {
-   spin_lock_bh(>indirect_lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(>indirect_lock, flags);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW_ADDRESS, offset);
BNX2_WR(bp, BNX2_PCICFG_REG_WINDOW, val);
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
 }
 
 static void
@@ -304,8 +307,10 @@ bnx2_shmem_rd(struct bnx2 *bp, u32 offset)
 static void
 bnx2_ctx_wr(struct bnx2 *bp, u32 cid_addr, u32 offset, u32 val)
 {
+   unsigned long flags;
+
offset += cid_addr;
-   spin_lock_bh(>indirect_lock);
+   spin_lock_irqsave(>indirect_lock, flags);
if (BNX2_CHIP(bp) == BNX2_CHIP_5709) {
int i;
 
@@ -322,7 +327,7 @@ bnx2_ctx_wr(struct bnx2 *bp, u32 cid_addr, u32 offset, u32 
val)
BNX2_WR(bp, BNX2_CTX_DATA_ADR, offset);
BNX2_WR(bp, BNX2_CTX_DATA, val);
}
-   spin_unlock_bh(>indirect_lock);
+   spin_unlock_irqrestore(>indirect_lock, flags);
 }
 
 #ifdef BCM_CNIC
-- 
2.7.3

Re: ethtool.h compile warning on c++

2016-10-17 Thread Ivan Vecera


Dne 14.10.2016 v 21:46 Ben Greear napsal(a):

I am getting warnings about sign missmatch.

Maybe make SPEED_UNKNOWN be ((__u32)(0x)) ?


The 0xU could be better.

Ivan

[PATCH net 2/2] bna: fix crash in bnad_get_strings()

2016-09-15 Thread Ivan Vecera

Commit 6e7333d "net: add rx_nohandler stat counter" added the new entry
rx_nohandler into struct rtnl_link_stats64. Unfortunately the bna
driver foolishly depends on the structure. It uses part of it for
ethtool statistics and it's not bad but the driver assumes its size
is constant as it defines string for each existing entry. The problem
occurs when the structure is extended because you need to modify bna
driver as well. If not any attempt to retrieve ethtool statistics results
in crash in bnad_get_strings().
The patch changes BNAD_ETHTOOL_STATS_NUM so it counts real number of
strings in the array and also removes rtnl_link_stats64 entries that
are not used in output and are always zero.

Fixes: 6e7333d "net: add rx_nohandler stat counter"
Signed-off-by: Ivan Vecera <ivec...@redhat.com>
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c | 50 -
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c 
b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 5671353..31f61a7 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -34,12 +34,7 @@
 #define BNAD_NUM_RXQ_COUNTERS 7
 #define BNAD_NUM_TXQ_COUNTERS 5
 
-#define BNAD_ETHTOOL_STATS_NUM \
-   (sizeof(struct rtnl_link_stats64) / sizeof(u64) +   \
-   sizeof(struct bnad_drv_stats) / sizeof(u64) +   \
-   offsetof(struct bfi_enet_stats, rxf_stats[0]) / sizeof(u64))
-
-static const char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
+static const char *bnad_net_stats_strings[] = {
"rx_packets",
"tx_packets",
"rx_bytes",
@@ -50,22 +45,10 @@ static const char 
*bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
"tx_dropped",
"multicast",
"collisions",
-
"rx_length_errors",
-   "rx_over_errors",
"rx_crc_errors",
"rx_frame_errors",
-   "rx_fifo_errors",
-   "rx_missed_errors",
-
-   "tx_aborted_errors",
-   "tx_carrier_errors",
"tx_fifo_errors",
-   "tx_heartbeat_errors",
-   "tx_window_errors",
-
-   "rx_compressed",
-   "tx_compressed",
 
"netif_queue_stop",
"netif_queue_wakeup",
@@ -254,6 +237,8 @@ static const char 
*bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
"fc_tx_fid_parity_errors",
 };
 
+#define BNAD_ETHTOOL_STATS_NUM ARRAY_SIZE(bnad_net_stats_strings)
+
 static int
 bnad_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
 {
@@ -859,9 +844,9 @@ bnad_get_ethtool_stats(struct net_device *netdev, struct 
ethtool_stats *stats,
   u64 *buf)
 {
struct bnad *bnad = netdev_priv(netdev);
-   int i, j, bi;
+   int i, j, bi = 0;
unsigned long flags;
-   struct rtnl_link_stats64 *net_stats64;
+   struct rtnl_link_stats64 net_stats64;
u64 *stats64;
u32 bmap;
 
@@ -876,14 +861,25 @@ bnad_get_ethtool_stats(struct net_device *netdev, struct 
ethtool_stats *stats,
 * under the same lock
 */
spin_lock_irqsave(>bna_lock, flags);
-   bi = 0;
-   memset(buf, 0, stats->n_stats * sizeof(u64));
-
-   net_stats64 = (struct rtnl_link_stats64 *)buf;
-   bnad_netdev_qstats_fill(bnad, net_stats64);
-   bnad_netdev_hwstats_fill(bnad, net_stats64);
 
-   bi = sizeof(*net_stats64) / sizeof(u64);
+   memset(_stats64, 0, sizeof(net_stats64));
+   bnad_netdev_qstats_fill(bnad, _stats64);
+   bnad_netdev_hwstats_fill(bnad, _stats64);
+
+   buf[bi++] = net_stats64.rx_packets;
+   buf[bi++] = net_stats64.tx_packets;
+   buf[bi++] = net_stats64.rx_bytes;
+   buf[bi++] = net_stats64.tx_bytes;
+   buf[bi++] = net_stats64.rx_errors;
+   buf[bi++] = net_stats64.tx_errors;
+   buf[bi++] = net_stats64.rx_dropped;
+   buf[bi++] = net_stats64.tx_dropped;
+   buf[bi++] = net_stats64.multicast;
+   buf[bi++] = net_stats64.collisions;
+   buf[bi++] = net_stats64.rx_length_errors;
+   buf[bi++] = net_stats64.rx_crc_errors;
+   buf[bi++] = net_stats64.rx_frame_errors;
+   buf[bi++] = net_stats64.tx_fifo_errors;
 
/* Get netif_queue_stopped from stack */
bnad->stats.drv_stats.netif_queue_stopped = netif_queue_stopped(netdev);
-- 
2.7.3

1 2 3 >

1 - 100 of 203 matches

Mail list logo