Re: [PATCH iproute2-next 1/9] libnetlink: Set NLA_F_NESTED in rta_nest

2019-05-30 Thread Michal Kubecek
On Thu, May 30, 2019 at 11:50:59AM -0600, David Ahern wrote:
> On 5/30/19 11:43 AM, Stephen Hemminger wrote:
> > 
> > I assume older kernels ignore the attribute?
> > 
> > Also, how is this opt-in for running iproute2-next on old kernels?
> 
> from what I can see older kernel added the flag when generating a nest
> (users of nla_nest_start), but did not pay attention to the flag for
> messages received from userspace.

Most of kernel generated messages do not set NLA_F_NESTED as
nla_nest_start() did not set it automatically before commit ae0be8de9a53
("netlink: make nla_nest_start() add NLA_F_NESTED flag") which only
reached mainline in 5.2 merge window. Unfortunately we cannot simply
start setting the flag everywhere as there may be userspace software
using nla->type directly rather than through a wrapper masking out the
flags.

On the other hand, it's safe to set set NLA_F_NESTED in messages sent to
kernel. When checking kernel tree for direct nla->type access, I found
only few (~10) places doing that and with one exception, those were
special cases, e.g. when attribute type was (ab)used as an array index.
Most of the code (and IIRC all of rtnetlink) either uses parse functions
or nla_type() accessor.

Michal


Re: [PATCH V1 net-next 02/11] net: ena: ethtool: add extra properties retrieval via get_priv_flags

2019-05-31 Thread Michal Kubecek
On Wed, May 29, 2019 at 12:49:55PM +0300, same...@amazon.com wrote:
> From: Arthur Kiyanovski 
> 
> This commit adds a mechanism for exposing different driver
> properties via ethtool's priv_flags.
> 
> In this commit we:
> 
> Add commands, structs and defines necessary for handling
> extra properties
> 
> Add functions for:
> Allocation/destruction of a buffer for extra properties strings.
> Retreival of extra properties strings and flags from the network device.
> 
> Handle the allocation of a buffer for extra properties strings.
> 
> * Initialize buffer with extra properties strings from the
>   network device at driver startup.
> 
> Use ethtool's get_priv_flags to expose extra properties of
> the ENA device
> 
> Signed-off-by: Arthur Kiyanovski 
> Signed-off-by: Sameeh Jubran 
> ---
...
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c 
> b/drivers/net/ethernet/amazon/ena/ena_com.c
> index bd0d785b2..935e8fa8d 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
> @@ -1877,6 +1877,62 @@ int ena_com_get_link_params(struct ena_com_dev 
> *ena_dev,
>   return ena_com_get_feature(ena_dev, resp, ENA_ADMIN_LINK_CONFIG);
>  }
>  
> +int ena_com_extra_properties_strings_init(struct ena_com_dev *ena_dev)
> +{
> + struct ena_admin_get_feat_resp resp;
> + struct ena_extra_properties_strings *extra_properties_strings =
> + &ena_dev->extra_properties_strings;
> + u32 rc;
> +
> + extra_properties_strings->size = ENA_ADMIN_EXTRA_PROPERTIES_COUNT *
> + ENA_ADMIN_EXTRA_PROPERTIES_STRING_LEN;
> +
> + extra_properties_strings->virt_addr =
> + dma_alloc_coherent(ena_dev->dmadev,
> +extra_properties_strings->size,
> +&extra_properties_strings->dma_addr,
> +GFP_KERNEL);

Do you need to fetch the private flag names on each ETHTOOL_GSTRING
request? I suppose they could change e.g. on a firmware update but then
even the count could change which you do not seem to handle. Is there
a reason not to fetch the names once on init rather then accessing the
device memory each time?

My point is that ethtool_ops::get_strings() does not return a value
which indicates that it's supposed to be a trivial operation which
cannot fail, usually a simple copy within kernel memory.

> + if (unlikely(!extra_properties_strings->virt_addr)) {
> + pr_err("Failed to allocate extra properties strings\n");
> + return 0;
> + }
> +
> + rc = ena_com_get_feature_ex(ena_dev, &resp,
> + ENA_ADMIN_EXTRA_PROPERTIES_STRINGS,
> + extra_properties_strings->dma_addr,
> + extra_properties_strings->size);
> + if (rc) {
> + pr_debug("Failed to get extra properties strings\n");
> + goto err;
> + }
> +
> + return resp.u.extra_properties_strings.count;
> +err:
> + ena_com_delete_extra_properties_strings(ena_dev);
> + return 0;
> +}
...
> diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
> b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> index fe596bc30..65bc5a2b2 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
...
> +static void get_private_flags_strings(struct ena_adapter *adapter, u8 *data)
> +{
> + struct ena_com_dev *ena_dev = adapter->ena_dev;
> + u8 *strings = ena_dev->extra_properties_strings.virt_addr;
> + int i;
> +
> + if (unlikely(!strings)) {
> + adapter->ena_extra_properties_count = 0;
> + netif_err(adapter, drv, adapter->netdev,
> +   "Failed to allocate extra properties strings\n");
> + return;
> + }

This is a bit confusing, IMHO. I'm aware we shouldn't really get here as
with strings null, count would be zero and ethtool_get_strings()
wouldn't call the ->get_strings() callback. But if we ever do, it makes
little sense to complain about failed allocation (which happened once on
init) each time userspace makes ETHTOOL_GSTRINGS request for private
flags.

> +
> + for (i = 0; i < adapter->ena_extra_properties_count; i++) {
> + snprintf(data, ETH_GSTRING_LEN, "%s",
> +  strings + ENA_ADMIN_EXTRA_PROPERTIES_STRING_LEN * i);

snprintf() is a bit overkill here, what you are doing is rather
strlcpy() or strscpy(). Or maybe strncpy() as strings returned by
->get_strings() do not have to be null terminated.

> + data += ETH_GSTRING_LEN;
> + }
> +}

Michal Kubecek


Re: [PATCH 2/2] ethtool: Add 100BaseT1 and 1000BaseT1 link modes

2019-05-31 Thread Michal Kubecek
On Thu, May 30, 2019 at 08:06:16PM +0200, Andrew Lunn wrote:
> The kernel can now indicate if the PHY supports operating over a
> single pair at 100Mbps or 1000Mbps.
> 
> Signed-off-by: Andrew Lunn 
> ---
>  ethtool.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 66a907edd97b..35158939e04c 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -494,8 +494,10 @@ static void init_global_link_mode_masks(void)
>   ETHTOOL_LINK_MODE_10baseT_Full_BIT,
>   ETHTOOL_LINK_MODE_100baseT_Half_BIT,
>   ETHTOOL_LINK_MODE_100baseT_Full_BIT,
> + ETHTOOL_LINK_MODE_100baseT1_Full_BIT,
>   ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
>   ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
> + ETHTOOL_LINK_MODE_1000baseT1_Full_BIT,
>   ETHTOOL_LINK_MODE_1000baseKX_Full_BIT,
>   ETHTOOL_LINK_MODE_2500baseX_Full_BIT,
>   ETHTOOL_LINK_MODE_1baseT_Full_BIT,

The only place where the all_advertised_modes_bits[] array is used is

ethtool_link_mode_zero(all_advertised_modes);
ethtool_link_mode_zero(all_advertised_flags);
for (i = 0; i < ARRAY_SIZE(all_advertised_modes_bits); ++i) {
ethtool_link_mode_set_bit(all_advertised_modes_bits[i],
  all_advertised_modes);
ethtool_link_mode_set_bit(all_advertised_modes_bits[i],
  all_advertised_flags);
}

so the order does not really matter. I would prefer to have the elements
ordered the same way as in enum ethtool_link_mode_bit_indices so that
it's easier to check if something is missing.

> @@ -634,10 +636,14 @@ static void dump_link_caps(const char *prefix, const 
> char *an_prefix,
> "100baseT/Half" },
>   { 1, ETHTOOL_LINK_MODE_100baseT_Full_BIT,
> "100baseT/Full" },
> + { 1, ETHTOOL_LINK_MODE_100baseT1_Full_BIT,
> +   "100baseT1/Full" },
>   { 0, ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
> "1000baseT/Half" },
>   { 1, ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
> "1000baseT/Full" },
> + { 1, ETHTOOL_LINK_MODE_1000baseT1_Full_BIT,
> +   "1000baseT1/Full" },
>   { 0, ETHTOOL_LINK_MODE_1000baseKX_Full_BIT,
> "1000baseKX/Full" },
>   { 0, ETHTOOL_LINK_MODE_2500baseX_Full_BIT,

Does it mean that we could end up with lines like

100baseT/Half 100baseT/Full 100baseT1/Full
1000baseT/Full 1000baseT1/Full

if there is a NIC supporting both T and T1? (I'm not sure if it's
possible - but if not, there is no need for setting same_line.) It would
be probably confusing for users as modes on the same line always were
half/full duplex variants of the same.

You should also add the new modes to ethtool.8.in.

Michal


Re: [PATCH v2 2/2] ethtool: Add 100BaseT1 and 1000BaseT1 link modes

2019-05-31 Thread Michal Kubecek
On Fri, May 31, 2019 at 03:57:48PM +0200, Andrew Lunn wrote:
> The kernel can now indicate if the PHY supports operating over a
> single pair at 100Mbps or 1000Mbps.
> 
> Signed-off-by: Andrew Lunn 
> ---

Reviewed-by: Michal Kubecek 

>  ethtool.8.in | 2 ++
>  ethtool.c| 6 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/ethtool.8.in b/ethtool.8.in
> index 430d11b915af..6af63455c636 100644
> --- a/ethtool.8.in
> +++ b/ethtool.8.in
> @@ -639,8 +639,10 @@ lB   l   lB.
>  0x00210baseT Full
>  0x004100baseT Half
>  0x008100baseT Full
> +0x8  100baseT1 Full
>  0x0101000baseT Half  (not supported by IEEE standards)
>  0x0201000baseT Full
> +0x10 1000baseT1 Full
>  0x2  1000baseKX Full
>  0x2001000baseX Full
>  0x8000   2500baseT Full

This reminds me the earlier discussion about which syntax extension
would be more useful:

  ethtool -s  advertise 100baseT1/Full 1000baseT1/Full

(listing modes to be advertised) or

  ethtool -s  advertise 100baseT1/Full off 1000baseT1/Full on

(enabling/disabling selected modes). But maybe we could support both;
after all, it's unlikely there would ever be a link mode named "on" or
"off".

Michal


Re: [PATCH v2 1/2] ethtool: sync ethtool-copy.h with linux-next from 30/05/2019

2019-05-31 Thread Michal Kubecek
On Fri, May 31, 2019 at 03:57:47PM +0200, Andrew Lunn wrote:
> Sync ethtool-copy.h with linux-next from 22/05/2019. This provides
> access to the new link modes for 100BaseT1 and 1000BaseT1.
> 
> Signed-off-by: Andrew Lunn 
> ---

Reviewed-by: Michal Kubecek 

BtW, this differs from the file "make headers_install" produces in
net-next but only in white space so that it doesn't really matter and it
gets sorted in a future sync.

>  ethtool-copy.h | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/ethtool-copy.h b/ethtool-copy.h
> index 92ab10d65fc9..ad16e8f9c290 100644
> --- a/ethtool-copy.h
> +++ b/ethtool-copy.h
> @@ -1481,6 +1481,8 @@ enum ethtool_link_mode_bit_indices {
>   ETHTOOL_LINK_MODE_20baseLR4_ER4_FR4_Full_BIT = 64,
>   ETHTOOL_LINK_MODE_20baseDR4_Full_BIT = 65,
>   ETHTOOL_LINK_MODE_20baseCR4_Full_BIT = 66,
> + ETHTOOL_LINK_MODE_100baseT1_Full_BIT = 67,
> + ETHTOOL_LINK_MODE_1000baseT1_Full_BIT= 68,
>  
>   /* must be last entry */
>   __ETHTOOL_LINK_MODE_MASK_NBITS
> @@ -1597,7 +1599,7 @@ enum ethtool_link_mode_bit_indices {
>  
>  static __inline__ int ethtool_validate_speed(__u32 speed)
>  {
> - return speed <= INT_MAX || speed == SPEED_UNKNOWN;
> + return speed <= INT_MAX || speed == (__u32)SPEED_UNKNOWN;
>  }
>  
>  /* Duplex, half or full. */
> @@ -1710,6 +1712,9 @@ static __inline__ int ethtool_validate_duplex(__u8 
> duplex)
>  #define ETH_MODULE_SFF_8436  0x4
>  #define ETH_MODULE_SFF_8436_LEN  256
>  
> +#define ETH_MODULE_SFF_8636_MAX_LEN  640
> +#define ETH_MODULE_SFF_8436_MAX_LEN  640
> +
>  /* Reset flags */
>  /* The reset() operation must clear the flags for the components which
>   * were actually reset.  On successful return, the flags indicate the
> -- 
> 2.20.1
> 


Re: [PATCH V1 net-next 02/11] net: ena: ethtool: add extra properties retrieval via get_priv_flags

2019-05-31 Thread Michal Kubecek
On Fri, May 31, 2019 at 09:57:51PM +, Machulsky, Zorik wrote:
> >  
> > +int ena_com_extra_properties_strings_init(struct ena_com_dev *ena_dev)
> > +{
> > +   struct ena_admin_get_feat_resp resp;
> > +   struct ena_extra_properties_strings *extra_properties_strings =
> > +   &ena_dev->extra_properties_strings;
> > +   u32 rc;
> > +
> > +   extra_properties_strings->size = 
> ENA_ADMIN_EXTRA_PROPERTIES_COUNT *
> > +   ENA_ADMIN_EXTRA_PROPERTIES_STRING_LEN;
> > +
> > +   extra_properties_strings->virt_addr =
> > +   dma_alloc_coherent(ena_dev->dmadev,
> > +  extra_properties_strings->size,
> > +  &extra_properties_strings->dma_addr,
> > +  GFP_KERNEL);
> 
> Do you need to fetch the private flag names on each ETHTOOL_GSTRING
> request? I suppose they could change e.g. on a firmware update but then
> even the count could change which you do not seem to handle. Is there
> a reason not to fetch the names once on init rather then accessing the
> device memory each time?
> 
> My point is that ethtool_ops::get_strings() does not return a value
> which indicates that it's supposed to be a trivial operation which
> cannot fail, usually a simple copy within kernel memory.
> 
> ena_com_extra_properties_strings_init() is called in probe() only, and not 
> for every ETHTOOL_GSTRING
> request. For the latter we use ena_get_strings(). And just to make sure we 
> are on the same page, extra_properties_strings->virt_addr 
> points to the host memory and not to the device memory. I believe this should 
> answer your concern.

That's what I misunderstood. Sorry for the noise then.

> > +static void get_private_flags_strings(struct ena_adapter *adapter, u8 
> *data)
> > +{
> > +   struct ena_com_dev *ena_dev = adapter->ena_dev;
> > +   u8 *strings = ena_dev->extra_properties_strings.virt_addr;
> > +   int i;
> > +
> > +   if (unlikely(!strings)) {
> > +   adapter->ena_extra_properties_count = 0;
> > +   netif_err(adapter, drv, adapter->netdev,
> > + "Failed to allocate extra properties 
> strings\n");
> > +   return;
> > +   }
> 
> This is a bit confusing, IMHO. I'm aware we shouldn't really get here as
> with strings null, count would be zero and ethtool_get_strings()
> wouldn't call the ->get_strings() callback. But if we ever do, it makes
> little sense to complain about failed allocation (which happened once on
> init) each time userspace makes ETHTOOL_GSTRINGS request for private
> flags.
> 
> I believe we still want to check validity of the strings pointer to keep the 
> driver resilient, however I agree that 
> the logged message is confusing. Let us rework this commit  

Yes, I didn't question the check, only the error message.

Michal


Re: [PATCH net] ethtool: fix potential userspace buffer overflow

2019-06-03 Thread Michal Kubecek
On Fri, May 31, 2019 at 07:12:21PM -0400, Vivien Didelot wrote:
> ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
> and pass it to the kernel driver via ops->get_regs() for filling.
> 
> There is no restriction about what the kernel drivers can or cannot do
> with the open ethtool_regs structure. They usually set regs->version
> and ignore regs->len or set it to the same size as ops->get_regs_len().
> 
> Userspace may actually allocate a smaller buffer for registers dump,
> for instance ethtool does that when dumping the raw registers directly
> into a fixed-size file.

This is not exactly true. AFAICS ethtool always calls the ETHTOOL_GREGS
ioctl with the size returned by ETHTOOL_GDRVINFO. Only after that it may
replace regs->len with file length but it's a file it _reads_ and
replaces the dump (or part of it) with it. (Which doesn't seem to make
sense: if ethtool(8) man page says ethtool is to display registers from
a previously saved dump, why does ethtool execute the ETHTOOL_GREGS
request at all?)
 
> Because the current code uses the regs.len value potentially updated
> by the driver when copying the buffer back to userspace, we may
> actually cause a userspace buffer overflow in the final copy_to_user()
> call.
> 
> To fix this, make this case obvious and store regs.len before calling
> ops->get_regs(), to only copy as much data as requested by userspace,
> up to the value returned by ops->get_regs_len().
> 
> While at it, remove the redundant check for non-null regbuf.
> 
> Signed-off-by: Vivien Didelot  ---
> net/core/ethtool.c | 5 - 1 file changed, 4 insertions(+), 1
> deletion(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c index
> 43e9add58340..1a0196fbb49c 100644 --- a/net/core/ethtool.c +++
> b/net/core/ethtool.c @@ -1338,38 +1338,41 @@ static noinline_for_stack
> int ethtool_set_rxfh(struct net_device *dev, static int
> ethtool_get_regs(struct net_device *dev, char __user *useraddr) {
> struct ethtool_regs regs; const struct ethtool_ops *ops =
> dev->ethtool_ops; void *regbuf; int reglen, ret;
>  
>   if (!ops->get_regs || !ops->get_regs_len) return -EOPNOTSUPP;
>  
>   if (copy_from_user(®s, useraddr, sizeof(regs))) return
>   -EFAULT;
>  
>   reglen = ops->get_regs_len(dev); if (reglen <= 0) return reglen;
>  
>   if (regs.len > reglen) regs.len = reglen;
>  
>   regbuf = vzalloc(reglen); if (!regbuf) return -ENOMEM;
>  
> + if (regs.len < reglen) +reglen = regs.len; +
> ops->get_regs(dev, ®s, regbuf);
>  
>   ret = -EFAULT; if (copy_to_user(useraddr, ®s, sizeof(regs)))
>   goto out; useraddr += offsetof(struct ethtool_regs, data); -
>   if (regbuf && copy_to_user(useraddr, regbuf, regs.len)) +
>   if (copy_to_user(useraddr, regbuf, reglen)) goto out; ret = 0;
>  
>   out: vfree(regbuf); return ret; }

Yes, this will address overflowing the userspace buffer. It will still
either preserve regs.len or replace it with full dump length, depending
on the driver. But to address that, we should first agree which it
should be. I'm afraid there is no good choice, setting regs.len to size
actually returned is IMHO less bad.

Michal


Re: [PATCH net] ethtool: fix potential userspace buffer overflow

2019-06-03 Thread Michal Kubecek
On Mon, Jun 03, 2019 at 01:42:19PM -0400, Vivien Didelot wrote:
> Hi Michal,
> 
> On Mon, 3 Jun 2019 19:15:12 +0200, Michal Kubecek  wrote:
> > On Fri, May 31, 2019 at 07:12:21PM -0400, Vivien Didelot wrote:
> > > ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
> > > and pass it to the kernel driver via ops->get_regs() for filling.
> > > 
> > > There is no restriction about what the kernel drivers can or cannot do
> > > with the open ethtool_regs structure. They usually set regs->version
> > > and ignore regs->len or set it to the same size as ops->get_regs_len().
> > > 
> > > Userspace may actually allocate a smaller buffer for registers dump,
> > > for instance ethtool does that when dumping the raw registers directly
> > > into a fixed-size file.
> > 
> > This is not exactly true. AFAICS ethtool always calls the ETHTOOL_GREGS
> > ioctl with the size returned by ETHTOOL_GDRVINFO. Only after that it may
> > replace regs->len with file length but it's a file it _reads_ and
> > replaces the dump (or part of it) with it. (Which doesn't seem to make
> > sense: if ethtool(8) man page says ethtool is to display registers from
> > a previously saved dump, why does ethtool execute the ETHTOOL_GREGS
> > request at all?)
> 
> You are correct, what ethtool does here is weird. I can remove this statement
> from the commit message in a v2.
> 
> >  
> > > Because the current code uses the regs.len value potentially updated
> > > by the driver when copying the buffer back to userspace, we may
> > > actually cause a userspace buffer overflow in the final copy_to_user()
> > > call.
> > > 
> > > To fix this, make this case obvious and store regs.len before calling
> > > ops->get_regs(), to only copy as much data as requested by userspace,
> > > up to the value returned by ops->get_regs_len().
> > > 
> > > While at it, remove the redundant check for non-null regbuf.
> > > 
> > > Signed-off-by: Vivien Didelot  ---
> > > net/core/ethtool.c | 5 - 1 file changed, 4 insertions(+), 1
> > > deletion(-)
> > > 
> > > diff --git a/net/core/ethtool.c b/net/core/ethtool.c index
> > > 43e9add58340..1a0196fbb49c 100644 --- a/net/core/ethtool.c +++
> > > b/net/core/ethtool.c @@ -1338,38 +1338,41 @@ static noinline_for_stack
> > > int ethtool_set_rxfh(struct net_device *dev, static int
> > > ethtool_get_regs(struct net_device *dev, char __user *useraddr) {
> > > struct ethtool_regs regs; const struct ethtool_ops *ops =
> > > dev->ethtool_ops; void *regbuf; int reglen, ret;
> > >  
> > >   if (!ops->get_regs || !ops->get_regs_len) return -EOPNOTSUPP;
> > >  
> > >   if (copy_from_user(®s, useraddr, sizeof(regs))) return
> > >   -EFAULT;
> > >  
> > >   reglen = ops->get_regs_len(dev); if (reglen <= 0) return reglen;
> > >  
> > >   if (regs.len > reglen) regs.len = reglen;
> > >  
> > >   regbuf = vzalloc(reglen); if (!regbuf) return -ENOMEM;
> > >  
> > > + if (regs.len < reglen) +reglen = regs.len; +
> > > ops->get_regs(dev, ®s, regbuf);
> > >  
> > >   ret = -EFAULT; if (copy_to_user(useraddr, ®s, sizeof(regs)))
> > >   goto out; useraddr += offsetof(struct ethtool_regs, data); -
> > >   if (regbuf && copy_to_user(useraddr, regbuf, regs.len)) +
> > >   if (copy_to_user(useraddr, regbuf, reglen)) goto out; ret = 0;
> > >  
> > >   out: vfree(regbuf); return ret; }
> > 
> > Yes, this will address overflowing the userspace buffer. It will still
> > either preserve regs.len or replace it with full dump length, depending
> > on the driver. But to address that, we should first agree which it
> > should be. I'm afraid there is no good choice, setting regs.len to size
> > actually returned is IMHO less bad.
> 
> I don't think it is necessary to change the current behavior of the drivers
> for the moment. regs.len is kept the same, so we don't impact userspace as
> well. But what's important is to not use regs.len after ops->get_regs()
> is called, because we must use the value passed by userspace (up to
> ops->get_regs_len() for sure). That is what this patch focuses on.
> 
> Do you want me to change something beside rewording the commit message?

I guess we can handle returned regs.len in a follow-up patch. Othere
than that (and the part of commit message discussed above), the patch
looks good to me.

Michal


Re: [PATCH net v2] ethtool: fix potential userspace buffer overflow

2019-06-05 Thread Michal Kubecek
On Mon, Jun 03, 2019 at 04:57:13PM -0400, Vivien Didelot wrote:
> ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
> and pass it to the kernel driver via ops->get_regs() for filling.
> 
> There is no restriction about what the kernel drivers can or cannot do
> with the open ethtool_regs structure. They usually set regs->version
> and ignore regs->len or set it to the same size as ops->get_regs_len().
> 
> But if userspace allocates a smaller buffer for the registers dump,
> we would cause a userspace buffer overflow in the final copy_to_user()
> call, which uses the regs.len value potentially reset by the driver.
> 
> To fix this, make this case obvious and store regs.len before calling
> ops->get_regs(), to only copy as much data as requested by userspace,
> up to the value returned by ops->get_regs_len().
> 
> While at it, remove the redundant check for non-null regbuf.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Michal Kubecek 

I believe we should also unify returned regs.len value as discussed
before. While the easiest way to do that would be copying regs.len to
reglen uncoditionally and restoring regs.len after the call to
ops->get_regs(), we should also clarify the documentation and clean up
in-tree drivers modifying regs.len (probably also add a warning); this
would be rather material for net-next.

Michal Kubecek

> ---
>  net/core/ethtool.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 43e9add58340..1a0196fbb49c 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -1338,38 +1338,41 @@ static noinline_for_stack int ethtool_set_rxfh(struct 
> net_device *dev,
>  static int ethtool_get_regs(struct net_device *dev, char __user *useraddr)
>  {
>   struct ethtool_regs regs;
>   const struct ethtool_ops *ops = dev->ethtool_ops;
>   void *regbuf;
>   int reglen, ret;
>  
>   if (!ops->get_regs || !ops->get_regs_len)
>   return -EOPNOTSUPP;
>  
>   if (copy_from_user(®s, useraddr, sizeof(regs)))
>   return -EFAULT;
>  
>   reglen = ops->get_regs_len(dev);
>   if (reglen <= 0)
>   return reglen;
>  
>   if (regs.len > reglen)
>   regs.len = reglen;
>  
>   regbuf = vzalloc(reglen);
>   if (!regbuf)
>   return -ENOMEM;
>  
> + if (regs.len < reglen)
> + reglen = regs.len;
> +
>   ops->get_regs(dev, ®s, regbuf);
>  
>   ret = -EFAULT;
>   if (copy_to_user(useraddr, ®s, sizeof(regs)))
>   goto out;
>   useraddr += offsetof(struct ethtool_regs, data);
> - if (regbuf && copy_to_user(useraddr, regbuf, regs.len))
> + if (copy_to_user(useraddr, regbuf, reglen))
>   goto out;
>   ret = 0;
>  
>   out:
>   vfree(regbuf);
>   return ret;
>  }
> -- 
> 2.21.0
> 


Re: [PATCH V1 net-next 5/6] net: ena: add ethtool function for changing io queue sizes

2019-06-06 Thread Michal Kubecek
On Thu, Jun 06, 2019 at 02:55:19PM +0300, same...@amazon.com wrote:
> From: Sameeh Jubran 
> 
> Implement the set_ringparam() function of the ethtool interface
> to enable the changing of io queue sizes.
> 
> Signed-off-by: Arthur Kiyanovski 
> Signed-off-by: Sameeh Jubran 
> ---
>  drivers/net/ethernet/amazon/ena/ena_ethtool.c | 25 +++
>  drivers/net/ethernet/amazon/ena/ena_netdev.c  | 14 +++
>  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  5 +++-
>  3 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
> b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> index 101d93f16..33e28ad71 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> @@ -495,6 +495,30 @@ static void ena_get_ringparam(struct net_device *netdev,
>   ring->rx_pending = adapter->rx_ring[0].ring_size;
>  }
>  
> +static int ena_set_ringparam(struct net_device *netdev,
> +  struct ethtool_ringparam *ring)
> +{
> + struct ena_adapter *adapter = netdev_priv(netdev);
> + u32 new_tx_size, new_rx_size;
> +
> + if (ring->rx_mini_pending || ring->rx_jumbo_pending)
> + return -EINVAL;

This check is superfluous as ethtool_set_ringparam() checks supplied
values against maximum returned by ->get_ringparam() which will be 0 in
this case.

> +
> + new_tx_size = clamp_val(ring->tx_pending, ENA_MIN_RING_SIZE,
> + adapter->max_tx_ring_size);
> + new_tx_size = rounddown_pow_of_two(new_tx_size);
> +
> + new_rx_size = clamp_val(ring->rx_pending, ENA_MIN_RING_SIZE,
> + adapter->max_rx_ring_size);
> + new_rx_size = rounddown_pow_of_two(new_rx_size);

For the same reason, clamping from below would suffice here.

Michal Kubecek

> +
> + if (new_tx_size == adapter->requested_tx_ring_size &&
> + new_rx_size == adapter->requested_rx_ring_size)
> + return 0;
> +
> + return ena_update_queue_sizes(adapter, new_tx_size, new_rx_size);
> +}


Re: [PATCH V1 net-next 5/6] net: ena: add ethtool function for changing io queue sizes

2019-06-06 Thread Michal Kubecek
On Thu, Jun 06, 2019 at 02:55:19PM +0300, same...@amazon.com wrote:
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
> b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 938aca254..7d3837c13 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2031,6 +2031,20 @@ static int ena_close(struct net_device *netdev)
>   return 0;
>  }
>  
> +int ena_update_queue_sizes(struct ena_adapter *adapter,
> +int new_tx_size,
> +int new_rx_size)
> +{
> + bool dev_up;
> +
> + dev_up = test_bit(ENA_FLAG_DEV_UP, &adapter->flags);
> + ena_close(adapter->netdev);
> + adapter->requested_tx_ring_size = new_tx_size;
> + adapter->requested_rx_ring_size = new_rx_size;
> + ena_init_io_rings(adapter);
> + return dev_up ? ena_up(adapter) : 0;
> +}

This function is called with u32 values as arguments by its only caller
and copies them into u32 members of struct ena_adapter. Why are its
arguments new_tx_size and new_rx_size declared as int?

Michal Kubecek


Re: [PATCH V1 net-next 5/6] net: ena: add ethtool function for changing io queue sizes

2019-06-10 Thread Michal Kubecek
On Mon, Jun 10, 2019 at 07:46:08AM +, Jubran, Samih wrote:
> > -Original Message-
> > From: Michal Kubecek 
> > Sent: Thursday, June 6, 2019 5:48 PM
> > To: netdev@vger.kernel.org
> > Cc: Jubran, Samih ; da...@davemloft.net;
> > Woodhouse, David ; Machulsky, Zorik
> > ; Matushevsky, Alexander ;
> > Bshara, Saeed ; Wilson, Matt ;
> > Liguori, Anthony ; Bshara, Nafea
> > ; Tzalik, Guy ; Belgazal,
> > Netanel ; Saidi, Ali ;
> > Herrenschmidt, Benjamin ; Kiyanovski, Arthur
> > 
> > Subject: Re: [PATCH V1 net-next 5/6] net: ena: add ethtool function for
> > changing io queue sizes
> > 
> > On Thu, Jun 06, 2019 at 02:55:19PM +0300, same...@amazon.com wrote:
> > > From: Sameeh Jubran 
> > >
> > > Implement the set_ringparam() function of the ethtool interface to
> > > enable the changing of io queue sizes.
> > >
> > > Signed-off-by: Arthur Kiyanovski 
> > > Signed-off-by: Sameeh Jubran 
> > > ---
> > >  drivers/net/ethernet/amazon/ena/ena_ethtool.c | 25
> > > +++
> > drivers/net/ethernet/amazon/ena/ena_netdev.c  |
> > > 14 +++  drivers/net/ethernet/amazon/ena/ena_netdev.h  |  5
> > > +++-
> > >  3 files changed, 43 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> > > b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> > > index 101d93f16..33e28ad71 100644
> > > --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> > > +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
> > > @@ -495,6 +495,30 @@ static void ena_get_ringparam(struct net_device
> > *netdev,
> > >   ring->rx_pending = adapter->rx_ring[0].ring_size;  }
> > >
> > > +static int ena_set_ringparam(struct net_device *netdev,
> > > +  struct ethtool_ringparam *ring) {
> > > + struct ena_adapter *adapter = netdev_priv(netdev);
> > > + u32 new_tx_size, new_rx_size;
> > > +
> > > + if (ring->rx_mini_pending || ring->rx_jumbo_pending)
> > > + return -EINVAL;
> > 
> > This check is superfluous as ethtool_set_ringparam() checks supplied values
> > against maximum returned by ->get_ringparam() which will be 0 in this case.
> > 
> > > +
> > > + new_tx_size = clamp_val(ring->tx_pending, ENA_MIN_RING_SIZE,
> > > + adapter->max_tx_ring_size);
> > > + new_tx_size = rounddown_pow_of_two(new_tx_size);
> > > +
> > > + new_rx_size = clamp_val(ring->rx_pending, ENA_MIN_RING_SIZE,
> > > + adapter->max_rx_ring_size);
> > > + new_rx_size = rounddown_pow_of_two(new_rx_size);
> > 
> > For the same reason, clamping from below would suffice here.
> > 
> > Michal Kubecek
> > 
> > > +
> > > + if (new_tx_size == adapter->requested_tx_ring_size &&
> > > + new_rx_size == adapter->requested_rx_ring_size)
> > > + return 0;
> > > +
> > > + return ena_update_queue_sizes(adapter, new_tx_size,
> > new_rx_size); }
> 
> You are right with both arguments the way the code is written now,
> however, in the future the code might change and we prefer to be extra
> cautious.

If we accept this logic, commit 37e2d99b59c4 ("ethtool: Ensure new ring
parameters are within bounds during SRINGPARAM") would be useless as
every driver would have to duplicate the checks anyway.

Michal Kubecek


Re: [PATCH] ethtool: igb: dump RR2DCDELAY register

2019-07-15 Thread Michal Kubecek
On Mon, Jul 15, 2019 at 09:52:28AM +0300, Artem Bityutskiy wrote:
> Decode 'RR2DCDELAY' register which Linux kernel provides starting from version
> 5.3. The corresponding commit in the Linux kernel is:
> cd502a7f7c9c igb: add RR2DCDELAY to ethtool registers dump
> 
> The RR2DCDELAY register is present in I210 and I211 Intel Gigabit Ethernet
> chips and it stands for "Read Request To Data Completion Delay". Here is how
> this register is described in the I210 datasheet:
> 
> "This field captures the maximum PCIe split time in 16 ns units, which is the
> maximum delay between the read request to the first data completion. This is
> giving an estimation of the PCIe round trip time."
> 
> In practice, this register can be used to measure the time it takes the NIC to
> read data from the host memory.
> 
> Signed-off-by: Artem Bityutskiy 
> ---
>  igb.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/igb.c b/igb.c
> index e0ccef9..ab0896f 100644
> --- a/igb.c
> +++ b/igb.c
> @@ -859,6 +859,18 @@ igb_dump_regs(struct ethtool_drvinfo *info, struct 
> ethtool_regs *regs)
>   "0x03430: TDFPC   (Tx data FIFO packet count)  
> 0x%08X\n",
>   regs_buff[550]);
>  
> + /*
> +  * Starting from kernel version 5.3 the registers dump buffer grew from
> +  * 739 4-byte words to 740 words, and word 740 contains the RR2DCDLAY

nit: "E" missing here:^

> +  * register.
> +  */
> + if (regs->len < 740)
> + return 0;
> +
> + fprintf(stdout,
> + "0x05BF4: RR2DCDELAY  (Max. DMA read delay)
> 0x%08X\n",
> + regs_buff[739]);

Showing a delay as hex number doesn't seem very user friendly but that
also applies to many existing registers so it's probably better to be
consistent and perhaps do an overall cleanup later.

Michal


Re: [PATCH v2] ethtool: igb: dump RR2DCDELAY register

2019-07-15 Thread Michal Kubecek
On Mon, Jul 15, 2019 at 01:59:33PM +0300, Artem Bityutskiy wrote:
> From: Artem Bityutskiy 
> 
> Decode 'RR2DCDELAY' register which Linux kernel provides starting from version
> 5.3. The corresponding commit in the Linux kernel is:
> cd502a7f7c9c igb: add RR2DCDELAY to ethtool registers dump
> 
> The RR2DCDELAY register is present in I210 and I211 Intel Gigabit Ethernet
> chips and it stands for "Read Request To Data Completion Delay". Here is how
> this register is described in the I210 datasheet:
> 
> "This field captures the maximum PCIe split time in 16 ns units, which is the
> maximum delay between the read request to the first data completion. This is
> giving an estimation of the PCIe round trip time."
> 
> In practice, this register can be used to measure the time it takes the NIC to
> read data from the host memory.
> 
> Signed-off-by: Artem Bityutskiy 
> ---
>  igb.c | 12 
>  1 file changed, 12 insertions(+)
> 
> Changelog:
> v2: Fixed a typo in the commentary.
> v1: Initial pach version.
> 
> 
> diff --git a/igb.c b/igb.c
> index e0ccef9..cb24877 100644
> --- a/igb.c
> +++ b/igb.c
> @@ -859,6 +859,18 @@ igb_dump_regs(struct ethtool_drvinfo *info, struct 
> ethtool_regs *regs)
>   "0x03430: TDFPC   (Tx data FIFO packet count)  
> 0x%08X\n",
>   regs_buff[550]);
>  
> + /*
> +  * Starting from kernel version 5.3 the registers dump buffer grew from
> +  * 739 4-byte words to 740 words, and word 740 contains the RR2DCDELAY
> +  * register.
> +  */
> + if (regs->len < 740)
> + return 0;
> +
> + fprintf(stdout,
> + "0x05BF4: RR2DCDELAY  (Max. DMA read delay)
> 0x%08X\n",
> + regs_buff[739]);
> +
>   return 0;
>  }
>  

Reviewed-by: Michal Kubecek 


[PATCH ethtool] gitignore: ignore vim swapfiles and patches

2019-07-29 Thread Michal Kubecek
The .*.swp files are created by vim to hold the undo/redo log. Add them to
.gitignore to prevent "git status" or "git gui" from showing them whenever
some file is open in editor.

Add also *.patch to hide patches created by e.g. "git format-patch".

Signed-off-by: Michal Kubecek 
---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index f1165a2c9037..c4df588c37ea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,6 @@ autom4te.cache
 .deps
 test-*.log
 test-*.trs
+
+.*.swp
+*.patch
-- 
2.22.0



Re: [PATCH ethtool] ethtool: dump nested registers

2019-08-05 Thread Michal Kubecek
On Fri, Aug 02, 2019 at 03:34:54PM -0400, Vivien Didelot wrote:
> Usually kernel drivers set the regs->len value to the same length as
> info->regdump_len, which was used for the allocation. In case where
> regs->len is smaller than the allocated info->regdump_len length,
> we may assume that the dump contains a nested set of registers.
> 
> This becomes handy for kernel drivers to expose registers of an
> underlying network conduit unfortunately not exposed to userspace,
> as found in network switching equipment for example.
> 
> This patch adds support for recursing into the dump operation if there
> is enough room for a nested ethtool_drvinfo structure containing a
> valid driver name, followed by a ethtool_regs structure like this:
> 
> 0  regs->leninfo->regdump_len
> v  vv
> +--+-+--+-- - --+
> | ethtool_regs | ethtool_drvinfo | ethtool_regs |   |
> +--+-+--+-- - --+
> 
> Signed-off-by: Vivien Didelot 
> ---

I'm not sure about this approach. If these additional objects with their
own registers are represented by a network device, we can query their
registers directly. If they are not (which, IIUC, is the case in your
use case), we should use an appropriate interface. AFAIK the CPU ports
are already represented in devlink, shouldn't devlink be also used to
query their registers?

>  ethtool.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 05fe05a08..c0e2903c5 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -1245,7 +1245,7 @@ static int dump_regs(int gregs_dump_raw, int 
> gregs_dump_hex,
>  
>   if (gregs_dump_raw) {
>   fwrite(regs->data, regs->len, 1, stdout);
> - return 0;
> + goto nested;
>   }
>  
>   if (!gregs_dump_hex)
> @@ -1253,7 +1253,7 @@ static int dump_regs(int gregs_dump_raw, int 
> gregs_dump_hex,
>   if (!strncmp(driver_list[i].name, info->driver,
>ETHTOOL_BUSINFO_LEN)) {
>   if (driver_list[i].func(info, regs) == 0)
> - return 0;
> + goto nested;
>   /* This version (or some other
>* variation in the dump format) is
>* not handled; fall back to hex
> @@ -1263,6 +1263,15 @@ static int dump_regs(int gregs_dump_raw, int 
> gregs_dump_hex,
>  
>   dump_hex(stdout, regs->data, regs->len, 0);
>  
> +nested:
> + /* Recurse dump if some drvinfo and regs structures are nested */
> + if (info->regdump_len > regs->len + sizeof(*info) + sizeof(*regs)) {
> + info = (struct ethtool_drvinfo *)(®s->data[0] + regs->len);
> + regs = (struct ethtool_regs *)(®s->data[0] + regs->len + 
> sizeof(*info));
> +
> + return dump_regs(gregs_dump_raw, gregs_dump_hex, info, regs);
> + }
> +
>   return 0;
>  }
>  

For raw and hex dumps, this will dump only the payloads without any
metadata allowing to identify what are the additional blocks for the
other related objects, i.e. where they start, how long they are and what
they belong to. That doesn't seem very useful.

Michal Kubecek


Re: [PATCH ethtool] ethtool: dump nested registers

2019-08-05 Thread Michal Kubecek
On Mon, Aug 05, 2019 at 10:52:16AM -0400, Vivien Didelot wrote:
> Hi Michal!
> 
> On Mon, 5 Aug 2019 10:04:48 +0200, Michal Kubecek  wrote:
> > On Fri, Aug 02, 2019 at 03:34:54PM -0400, Vivien Didelot wrote:
> > > Usually kernel drivers set the regs->len value to the same length as
> > > info->regdump_len, which was used for the allocation. In case where
> > > regs->len is smaller than the allocated info->regdump_len length,
> > > we may assume that the dump contains a nested set of registers.
> > > 
> > > This becomes handy for kernel drivers to expose registers of an
> > > underlying network conduit unfortunately not exposed to userspace,
> > > as found in network switching equipment for example.
> > > 
> > > This patch adds support for recursing into the dump operation if there
> > > is enough room for a nested ethtool_drvinfo structure containing a
> > > valid driver name, followed by a ethtool_regs structure like this:
> > > 
> > > 0  regs->leninfo->regdump_len
> > > v  vv
> > > +--+-+--+-- - --+
> > > | ethtool_regs | ethtool_drvinfo | ethtool_regs |   |
> > > +--+-+--+-- - --+
> > > 
> > > Signed-off-by: Vivien Didelot 
> > > ---
> > 
> > I'm not sure about this approach. If these additional objects with their
> > own registers are represented by a network device, we can query their
> > registers directly. If they are not (which, IIUC, is the case in your
> > use case), we should use an appropriate interface. AFAIK the CPU ports
> > are already represented in devlink, shouldn't devlink be also used to
> > query their registers?
> 
> Yet another interface wasn't that much appropriate for DSA, making the
> stack unnecessarily complex.

AFAICS, there is already devlink support in dsa and CPU ports are
presented as devlink ports. So it wouldn't be a completely new
interface.

> In fact we are already glueing the statistics of the CPU port into the
> master's ethtool ops (both physical ports are wired together).

The ethtool statistics (in general) are one big mess, I don't think it's
an example worth following; rather one showing us what to avoid.

> Adding support for nested registers dump in ethtool makes it simple to
> (pretty) dump CPU port's registers without too much userspace
> addition.

It is indeed convenient for pretty print - but very hard to use for any
automated processing. My point is that CPU port is not represented by
a network device but it is already represented by a devlink port so that
it makes much more sense to tie its register dump to that object than to
add add it to register dump of port's master.

In the future, I would like to transform the ethtool register dump from
current opaque block of data to an actual dump of registers. It is
unfortunate that drivers are already mixing information specific to
a network device with information common for the whole physical device.
Adding more data which is actually related to a different object would
only make things worse.

Michal Kubecek


Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames

2019-08-09 Thread Michal Kubecek
On Fri, Aug 09, 2019 at 08:40:25AM -0700, Roopa Prabhu wrote:
> to that point, I am also not sure why we have a new API For multiple
> names. I mean why support more than two names  (existing old name and
> a new name to remove the length limitation) ?

One use case is to allow "predictable names" from udev/systemd to work
the way do for e.g. block devices, see

  http://lkml.kernel.org/r/20190628162716.gf29...@unicorn.suse.cz

Michal


Re: [PATCH ethtool] ethtool: dump nested registers

2019-08-09 Thread Michal Kubecek
On Fri, Aug 09, 2019 at 12:23:36PM -0400, John W. Linville wrote:
> On Fri, Aug 02, 2019 at 03:34:54PM -0400, Vivien Didelot wrote:
> > Usually kernel drivers set the regs->len value to the same length as
> > info->regdump_len, which was used for the allocation. In case where
> > regs->len is smaller than the allocated info->regdump_len length,
> > we may assume that the dump contains a nested set of registers.
> > 
> > This becomes handy for kernel drivers to expose registers of an
> > underlying network conduit unfortunately not exposed to userspace,
> > as found in network switching equipment for example.
> > 
> > This patch adds support for recursing into the dump operation if there
> > is enough room for a nested ethtool_drvinfo structure containing a
> > valid driver name, followed by a ethtool_regs structure like this:
> > 
> > 0  regs->leninfo->regdump_len
> > v  vv
> > +--+-+--+-- - --+
> > | ethtool_regs | ethtool_drvinfo | ethtool_regs |   |
> > +--+-+--+-- - --+
> > 
> > Signed-off-by: Vivien Didelot 
> 
> I wasn't sure what to do with this one, given the disucssion that
> followed. But since Dave merged "net: dsa: dump CPU port regs through
> master" for net-next, I went ahead and queued this one for the next
> release. If that was the wrong thing to do, speak-up now!

I'm certainly not happy about it as I'm afraid it's going to give me
a headache one day. But as long as the driver packs CPU port registers
into the device's register dump, it doesn't make sense not to allow
ethtool to parse them. And I'm not sure my objections are strong enough
to request a revert of the kernel part as I'm not sure the idea of
transforming the register dump into a structured format (an actual list
of registers rather than an opaque data block) is feasible at all.

So let's keep the patch in.

Michal


Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames

2019-08-10 Thread Michal Kubecek
On Sat, Aug 10, 2019 at 06:46:57AM -0700, Roopa Prabhu wrote:
> On Fri, Aug 9, 2019 at 8:46 AM Michal Kubecek  wrote:
> >
> > On Fri, Aug 09, 2019 at 08:40:25AM -0700, Roopa Prabhu wrote:
> > > to that point, I am also not sure why we have a new API For multiple
> > > names. I mean why support more than two names  (existing old name and
> > > a new name to remove the length limitation) ?
> >
> > One use case is to allow "predictable names" from udev/systemd to work
> > the way do for e.g. block devices, see
> >
> >   http://lkml.kernel.org/r/20190628162716.gf29...@unicorn.suse.cz
> >
> 
> thanks for the link. don't know the details about alternate block
> device names. Does user-space generate multiple and assign them to a
> kernel object as proposed in this series ?. is there a limit to number
> of names ?. my understanding of 'predictable names' was still a single
> name but predictable structure to the name.

It is a single name but IMHO mostly because we can only have one name.
For block devices, udev uses symlinks to create multiple aliases based
on different naming schemes, e.g.

mike@lion:~> find -L /dev/disk/ -samefile /dev/sda2 -exec ls -l {} +
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68_WD-WMC1T3114933-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/scsi-0ATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/scsi-1ATA_WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> 
../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/scsi-350014ee6589cfea0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-id/wwn-0x50014ee6589cfea0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 /dev/disk/by-partlabel/root2 -> 
../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-partuuid/71affb47-a93b-40fd-8986-d2e227e1b39d -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-path/pci-:00:11.0-ata-1-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 srp  5 21:47 
/dev/disk/by-path/pci-:00:11.0-scsi-0:0:0:0-part2 -> ../../sda2

Few years ago, udev even dropped support for renaming block and
character devices (NAME="...") so that it now keeps kernel name and only
creates symlinks to it. Recent versions only allow NAME="..." for
network devices.

Michal


Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames

2019-08-11 Thread Michal Kubecek
On Sat, Aug 10, 2019 at 12:39:31PM -0700, Roopa Prabhu wrote:
> On Sat, Aug 10, 2019 at 8:50 AM Michal Kubecek  wrote:
> >
> > On Sat, Aug 10, 2019 at 06:46:57AM -0700, Roopa Prabhu wrote:
> > > On Fri, Aug 9, 2019 at 8:46 AM Michal Kubecek  wrote:
> > > >
> > > > On Fri, Aug 09, 2019 at 08:40:25AM -0700, Roopa Prabhu wrote:
> > > > > to that point, I am also not sure why we have a new API For multiple
> > > > > names. I mean why support more than two names  (existing old name and
> > > > > a new name to remove the length limitation) ?
> > > >
> > > > One use case is to allow "predictable names" from udev/systemd to work
> > > > the way do for e.g. block devices, see
> > > >
> > > >   http://lkml.kernel.org/r/20190628162716.gf29...@unicorn.suse.cz
> > > >
> > >
> > > thanks for the link. don't know the details about alternate block
> > > device names. Does user-space generate multiple and assign them to a
> > > kernel object as proposed in this series ?. is there a limit to number
> > > of names ?. my understanding of 'predictable names' was still a single
> > > name but predictable structure to the name.
> >
> > It is a single name but IMHO mostly because we can only have one name.
> > For block devices, udev uses symlinks to create multiple aliases based
> > on different naming schemes, e.g.
> >
> > mike@lion:~> find -L /dev/disk/ -samefile /dev/sda2 -exec ls -l {} +
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> 
> > ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68_WD-WMC1T3114933-part2 -> 
> > ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/scsi-0ATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> 
> > ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/scsi-1ATA_WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> 
> > ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/scsi-350014ee6589cfea0-part2 -> ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-id/wwn-0x50014ee6589cfea0-part2 -> ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 /dev/disk/by-partlabel/root2 -> 
> > ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-partuuid/71affb47-a93b-40fd-8986-d2e227e1b39d -> ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-path/pci-:00:11.0-ata-1-part2 -> ../../sda2
> > lrwxrwxrwx 1 root root 10 srp  5 21:47 
> > /dev/disk/by-path/pci-:00:11.0-scsi-0:0:0:0-part2 -> ../../sda2
> >
> > Few years ago, udev even dropped support for renaming block and
> > character devices (NAME="...") so that it now keeps kernel name and only
> > creates symlinks to it. Recent versions only allow NAME="..." for
> > network devices.
> 
> ok thanks for the details. This looks like names that are structured
> on hardware info which could fall into devlinks scope and they point
> to a single name.
> We should think about keeping them under devlink (by-id, by-mac etc).
> It already can recognize network interfaces by id.

Not all of them are hardware based, there are also links based on
filesystem label or UUID. But my point is rather that udev creates
multiple links so that any of them can be used in any place where
a block device is to be identified.

As network devices can have only one name, udev drops kernel provided
name completely and replaces it with name following one naming scheme.
Thus we have to know which naming scheme is going to be used and make
sure it does not change. With multiple alternative names, we could also
have all udev provided names at once (and also the original one from
kernel).

Michal


Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames

2019-08-12 Thread Michal Kubecek
On Mon, Aug 12, 2019 at 08:21:31AM -0700, Roopa Prabhu wrote:
> On Sun, Aug 11, 2019 at 3:10 PM Michal Kubecek  wrote:
> >
> > Not all of them are hardware based, there are also links based on
> > filesystem label or UUID. But my point is rather that udev creates
> > multiple links so that any of them can be used in any place where
> > a block device is to be identified.
> >
> > As network devices can have only one name, udev drops kernel provided
> > name completely and replaces it with name following one naming scheme.
> > Thus we have to know which naming scheme is going to be used and make
> > sure it does not change. With multiple alternative names, we could also
> > have all udev provided names at once (and also the original one from
> > kernel).
> 
> ok, understand the use-case.
> But, Its hard for me to understand how udev is going to manage this
> list of names without structure to them.
> Plus how is udev going to distinguish its own names from user given name ?.
> 
> I thought this list was giving an opportunity to use the long name
> everywhere else.
> But if this is going to be managed by udev with a bunch of structured
> names, I don't understand how the rest of the system is going to use
> these names.
> 
> Maybe we should just call this a udev managed list of names.
> 
> (again, i think the best way to do this for udev is to provide the
> symlink like facility via devlink or any other infra).

I certainly didn't want to suggest for alternative names to be managed
by udev. What I meant was that supporting multiple alternative names
would allow udev to create its names based on e.g. device bus address,
BIOS/UEFI slot number, MAC address etc. But it would still be up to
admins if they want to create their own names.

Michal


Re: [PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST

2021-01-16 Thread Michal Kubecek
On Fri, Jan 15, 2021 at 03:53:25PM -0800, Jakub Kicinski wrote:
> On Fri, 15 Jan 2021 14:59:50 -0800 Edwin Peer wrote:
> > The maximum possible length of an RTNL attribute is 64KB, but the
> > nested VFINFO list exceeds this for more than about 220 VFs (each VF
> > consumes approximately 300 bytes, depending on alignment and optional
> > fields). Exceeding the limit causes IFLA_VFINFO_LIST's length to wrap
> > modulo 16 bits in the kernel's nla_nest_end().
> 
> Let's add Michal to CC, my faulty memory tells me he was fighting with
> this in the past.

I've been looking into this some time ago and even tried to open
a discussion on this topic two or three times but there didn't seem
sufficient interest.

My idea back then was to use a separate  query which would allow getting
VF information using a dump request (one VF per message); the reply for
RTM_GETLINK request would either list all VFs as now if possible or only
as many as fit into a nested attribute and indicate that the information
is incomplete (or maybe omit the VF information in such case as
usefulness of the truncated list is questionable).

However, my take from the discussions was that most developers who took
part rather thought that there is no need for such rtnetlink feature as
there is a devlink interface which does not suffer from this limit and
NICs with so many VFs that IFLA_VFINFO_LIST exceeds 65535 bytes can
provide devlink interface to handle them.

In any case, while the idea of handling the malformed messages composed
by existing kernels makes sense, we should IMHO consider this a kernel
bug which should be fixed so that kernel does not reply with malformed
netlink messages (independently of whether this patch is applied to
iproute2 or not).

Michal


Re: [PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST

2021-01-18 Thread Michal Kubecek
On Mon, Jan 18, 2021 at 10:20:35AM -0800, Edwin Peer wrote:
> On Mon, Jan 18, 2021 at 9:49 AM David Ahern  wrote:
> 
> > Different bug, different solution required. The networking stack hits
> > these kind of scalability problems from time to time with original
> > uapis, so workarounds are needed. One example is rtmsg which only allows
> > 255 routing tables, so RTA_TABLE attribute was added as a u32. Once a
> > solution is found for the VF problem, iproute2 can be enhanced to
> > accommodate.
> 
> The problem is even worse, because user space already depends on the
> broken behavior. Erroring out will cause the whole ip link show
> command to fail, which works today. Even though the VF list is bust,
> the rest of the netdevs are still dumped correctly. A hard fail would
> break those too.

We could cut the list just before overflowing and inform userspace that
the list is incomplete. Not perfect but there is no perfect solution
which would not require userspace changes to work properly for devices
with "too many" VFs.

Michal


Re: [PATCH ethtool] ethtool: do_sset return correct value on fail

2020-12-14 Thread Michal Kubecek
On Sun, Dec 13, 2020 at 04:25:03PM +0200, Tariq Toukan wrote:
> From: Roy Novich 
> 
> The return value for do_sset was constant and returned 0.
> This value is misleading when returned on operation failure.
> Changed return value to the correct function err status.
> 
> Fixes: 32c8037055f5 ("Initial import of ethtool version 3 + a few patches.")
> Signed-off-by: Roy Novich 
> Signed-off-by: Tariq Toukan 
> ---
>  ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 1d9067e774af..5cc875c64591 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -3287,7 +3287,7 @@ static int do_sset(struct cmd_context *ctx)
>   }
>   }
>  
> - return 0;
> + return err;
>  }
>  
>  static int do_gregs(struct cmd_context *ctx)

I'm afraid it's not as easy as this. The problem with -s/--set
subcommand is that its parameters are in fact implemented by three
separate requests (for ioctl, four requests for netlink). Each of them
may fail or succeed independently of others. Currently do_sset() always
returns 0 which is indeed unfortunate but it's not clear what should we
return if some requests fail and some succeed.

With your patch, do_sset() would always return the result of last
request it sent to kernel which is inconsistent; consider e.g.

  ethtool -s eth0 mdix on wol g

If the ETHTOOL_SLINKSETTINGS request setting MDI-X succeeds and
ETHTOOL_SWOL setting WoL fails, the result would be a failure. But if
setting MDI-X fails and setting WoL succeeds, do_sset() would report
success.

So if we really want to change the behaviour after so many years, it
should be consistent: either return non-zero exit code if any request
fails or if all fail. (Personally, I would prefer the latter.) And we
should probably modify nl_sset() to behave the same way as it currently
bails out on first failed request.

Michal


[PATCH net] ethtool: fix string set id check

2020-12-14 Thread Michal Kubecek
Syzbot reported a shift of a u32 by more than 31 in strset_parse_request()
which is undefined behavior. This is caused by range check of string set id
using variable ret (which is always 0 at this point) instead of id (string
set id from request).

Fixes: 71921690f974 ("ethtool: provide string sets with STRSET_GET request")
Reported-by: syzbot+96523fb438937cd01...@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek 
---
 net/ethtool/strset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
index 0baad0ce1832..c3a5489964cd 100644
--- a/net/ethtool/strset.c
+++ b/net/ethtool/strset.c
@@ -182,7 +182,7 @@ static int strset_parse_request(struct ethnl_req_info 
*req_base,
ret = strset_get_id(attr, &id, extack);
if (ret < 0)
return ret;
-   if (ret >= ETH_SS_COUNT) {
+   if (id >= ETH_SS_COUNT) {
NL_SET_ERR_MSG_ATTR(extack, attr,
"unknown string set id");
return -EOPNOTSUPP;
-- 
2.29.2



Re: [PATCH net] ethtool: fix error paths in ethnl_set_channels()

2020-12-15 Thread Michal Kubecek
On Tue, Dec 15, 2020 at 10:08:10AM +0100, Ivan Vecera wrote:
> Fix two error paths in ethnl_set_channels() to avoid lock-up caused
> but unreleased RTNL.
> 
> Fixes: e19c591eafad ("ethtool: set device channel counts with CHANNELS_SET 
> request")
> Cc: Michal Kubecek 
> Reported-by: LiLiang 
> Signed-off-by: Ivan Vecera 
> ---
>  net/ethtool/channels.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> index 5635604cb9ba..25a9e566ef5c 100644
> --- a/net/ethtool/channels.c
> +++ b/net/ethtool/channels.c
> @@ -194,8 +194,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> genl_info *info)
>   if (netif_is_rxfh_configured(dev) &&
>   !ethtool_get_max_rxfh_channel(dev, &max_rx_in_use) &&
>   (channels.combined_count + channels.rx_count) <= max_rx_in_use) {
> + ret = -EINVAL;
>   GENL_SET_ERR_MSG(info, "requested channel counts are too low 
> for existing indirection table settings");
> - return -EINVAL;
> + goto out_ops;
>   }
>  
>   /* Disabling channels, query zero-copy AF_XDP sockets */
> @@ -203,8 +204,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> genl_info *info)
>  min(channels.rx_count, channels.tx_count);
>   for (i = from_channel; i < old_total; i++)
>   if (xsk_get_pool_from_qid(dev, i)) {
> + ret = -EINVAL;
>   GENL_SET_ERR_MSG(info, "requested channel counts are 
> too low for existing zerocopy AF_XDP sockets");
> - return -EINVAL;
> +     goto out_ops;
>   }
>  
>   ret = dev->ethtool_ops->set_channels(dev, &channels);

Oh, the joys of mindless copy and paste... :-(

Reviewed-by: Michal Kubecek 


signature.asc
Description: PGP signature


ethtool 5.10 released

2020-12-16 Thread Michal Kubecek
Hello,

ethtool 5.10 has been released.

Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-5.10.tar.xz

Release notes:

* Feature: infrastructure for JSON output
* Feature: separate FLAGS in -h output
* Feature: use policy dumps to check flags support
* Feature: show pause stats (-a)
* Feature: pretty printing of policy dumps
* Feature: improve error message when SFP module is missing
* Fix: use after free in netlink_run_handler()
* Fix: leaked instances of struct nl_socket
* Fix: improve compatibility between netlink and ioctl (-s)

Enjoy,
Michal


signature.asc
Description: PGP signature


Re: [PATCH iproute2-next 0/5] iproute2: add libbpf support

2020-11-30 Thread Michal Kubecek
On Sun, Nov 29, 2020 at 07:22:54AM +0100, Greg KH wrote:
> On Sat, Nov 28, 2020 at 10:16:35PM -0800, Stephen Hemminger wrote:
> 
> > If someone steps up to doing this then I would be happy to merge it now
> > for 5.10. Otherwise it won't show up until 5.11.
> 
> Don't ever "rush" anything for a LTS/stable release, otherwise I am
> going to have to go back to the old way of not announcing them until
> _after_ they are released as people throw stuff that is not ready for
> a normal merge.
> 
> This looks like a new feature, and shouldn't go in right now in the
> development cycle anyway, all features for 5.10 had to be in linux-next
> before 5.9 was released.

>From the context, I believe Stephen meant merging into iproute2 5.10,
not kernel.

Michal Kubecek


Re: [PATCH v2] net/af_unix: don't create a path for a binded socket

2020-12-01 Thread Michal Kubecek
On Mon, Nov 30, 2020 at 05:30:00PM -0800, Jakub Kicinski wrote:
> On Mon, 30 Nov 2020 16:27:47 +0300 Denis Kirjanov wrote:
> > in the case of the socket which is bound to an adress
^ address
> > there is no sense to create a path in the next attempts
> > 
> > here is a program that shows the issue:
> > 
> > int main()
> > {
> > int s;
> > struct sockaddr_un a;
> > 
> > s = socket(AF_UNIX, SOCK_STREAM, 0);
> > if (s<0)
> > perror("socket() failed\n");
> > 
> > printf("First bind()\n");
> > 
> > memset(&a, 0, sizeof(a));
> > a.sun_family = AF_UNIX;
> > strncpy(a.sun_path, "/tmp/.first_bind", sizeof(a.sun_path));
> > 
> > if ((bind(s, (const struct sockaddr*) &a, sizeof(a))) == -1)
> > perror("bind() failed\n");
> > 
> > printf("Second bind()\n");
> > 
> > memset(&a, 0, sizeof(a));
> > a.sun_family = AF_UNIX;
> > strncpy(a.sun_path, "/tmp/.first_bind_failed", sizeof(a.sun_path));
> > 
> > if ((bind(s, (const struct sockaddr*) &a, sizeof(a))) == -1)
> > perror("bind() failed\n");
> > }
> > 
> > kda@SLES15-SP2:~> ./test
> > First bind()
> > Second bind()
> > bind() failed
> > : Invalid argument
> > 
> > kda@SLES15-SP2:~> ls -la /tmp/.first_bind
> > .first_bind .first_bind_failed
> > 
> > Signed-off-by: Denis Kirjanov 
> > 
> > v2: move a new patch creation after the address assignment check.
  ^ path

> 
> It is a behavior change, but IDK if anyone can reasonably depend on
> current behavior for anything useful. Otherwise LGTM.

For the record, we already changed the error code in this case once by
commit 0fb44559ffd6 ("af_unix: move unix_mknod() out of bindlock"), this
should revert it to the original value. As far as I'm aware, only LTP
guys noticed back then. I tried to raise the question in this list but
nobody seemed to care so I didn't pursue.

Technically, both errors are correct as both "address already in use"
and "socket already bound" conditions are met and the manual page
(neither linux nor posix) doesn't seem to specify which should take
precedence.

Michal


Re: LRO: creating vlan subports affects parent port's LRO settings

2020-12-06 Thread Michal Kubecek
On Sat, Dec 05, 2020 at 07:04:06PM -0500, Jarod Wilson wrote:
> On Mon, Nov 23, 2020 at 7:27 PM Jakub Kicinski  wrote:
> >
> > On Thu, 19 Nov 2020 20:37:27 -0500 Limin Wang wrote:
> > > Under relatively recent kernels (v4.4+), creating a vlan subport on a
> > > LRO supported parent NIC may turn LRO off on the parent port and
> > > further render its LRO feature practically unchangeable.
> >
> > That does sound like an oversight in commit fd867d51f889 ("net/core:
> > generic support for disabling netdev features down stack").
> >
> > Are you able to create a patch to fix this?
> 
> Something like this, perhaps? Completely untested copy-pasta'd
> theoretical patch:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8588ade790cb..a5ce372e02ba 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9605,8 +9605,10 @@ int __netdev_update_features(struct net_device *dev)
> features = netdev_fix_features(dev, features);
> 
> /* some features can't be enabled if they're off on an upper device */
> -   netdev_for_each_upper_dev_rcu(dev, upper, iter)
> -   features = netdev_sync_upper_features(dev, upper, features);
> +   netdev_for_each_upper_dev_rcu(dev, upper, iter) {
> +   if (netif_is_lag_master(upper) || 
> netif_is_bridge_master(upper))
> +   features = netdev_sync_upper_features(dev,
> upper, features);
> +   }
> 
> if (dev->features == features)
> goto sync_lower;
> @@ -9633,8 +9635,10 @@ int __netdev_update_features(struct net_device *dev)
> /* some features must be disabled on lower devices when disabled
>  * on an upper device (think: bonding master or bridge)
>  */
> -   netdev_for_each_lower_dev(dev, lower, iter)
> -   netdev_sync_lower_features(dev, lower, features);
> +   if (netif_is_lag_master(dev) || netif_is_bridge_master(dev)) {
> +   netdev_for_each_lower_dev(dev, lower, iter)
> +   netdev_sync_lower_features(dev, lower, features);
> +   }
> 
> if (!err) {
> netdev_features_t diff = features ^ dev->features;
> 
> I'm not sure what all other upper devices this excludes besides just
> vlan ports though, so perhaps safer add upper device types to not do
> feature sync on than to choose which ones to do them on?

I'm not sure excluding devices from feature sync is the right way,
whether it's an explicit list types or default. The logic still makes
sense to me. Couldn't we address the issue by either setting features in
NETIF_F_UPPER_DISABLES) by default for a new vlan (and probably macvlan)
device? Or perhaps inheriting their values from the lower device.

Michal


Re: [PATCH ethtool v2] Improve error message when SFP module is missing

2020-12-06 Thread Michal Kubecek
On Wed, Dec 02, 2020 at 07:22:14PM +0200, Baruch Siach wrote:
> ETHTOOL_GMODULEINFO request success indicates that SFP cage is present.
> Failure of ETHTOOL_GMODULEEEPROM is most likely because SFP module is
> not plugged in. Add an indication to the user as to what might be the
> reason for the failure.
> 
> Signed-off-by: Baruch Siach 

Applied (with minor whitespace formatting cleanup), thank you.

Michal

> ---
> v2: Limit message to likely errno values (Andrew Lunn)
> ---
>  ethtool.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 1d9067e774af..63b3a095eded 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -4855,7 +4855,11 @@ static int do_getmodule(struct cmd_context *ctx)
>   eeprom->offset = geeprom_offset;
>   err = send_ioctl(ctx, eeprom);
>   if (err < 0) {
> + int saved_errno = errno;
>   perror("Cannot get Module EEPROM data");
> + if (saved_errno == ENODEV || saved_errno == EIO ||
> + saved_errno == ENXIO)
> + fprintf(stderr, "SFP module not in cage?\n");
>   free(eeprom);
>   return 1;
>   }
> -- 
> 2.29.2
> 


signature.asc
Description: PGP signature


Re: [PATCH net-next v1 1/9] ethtool: Add support for configuring frame preemption

2020-12-07 Thread Michal Kubecek
On Mon, Dec 07, 2020 at 02:11:48PM -0800, Vinicius Costa Gomes wrote:
> Jakub Kicinski  writes:
> >> + * @min_frag_size_mult: Minimum size for all non-final fragment size,
> >> + * expressed in terms of X in '(1 + X)*64 + 4'
> >
> > Is this way of expressing the min frag size from the standard?
> >
> 
> The standard has this: "A 2-bit integer value indicating, in units of 64
> octets, the minimum number of octets over 64 octets required in
> non-final fragments by the receiver" from IEEE 802.3br-2016, Table
> 79-7a.

Can we be sure that newer version of the standard cannot change this,
e.g. come with a finer granularity? Perhaps it would be safer to express
the size in bytes in the userspace API and translate to this internal
representation in common ethtool code.

Also, please don't forget to update Documentation/networking/ethtool-netlink.rst

Michal


[PATCH net] ethtool: fix stack overflow in ethnl_parse_bitset()

2020-12-08 Thread Michal Kubecek
Syzbot reported a stack overflow in bitmap_from_arr32() called from
ethnl_parse_bitset() when bitset from netlink message is longer than
target bitmap length. While ethnl_compact_sanity_checks() makes sure that
trailing part is all zeros (i.e. the request does not try to touch bits
kernel does not recognize), we also need to cap change_bits to nbits so
that we don't try to write past the prepared bitmaps.

Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper")
Reported-by: syzbot+9d39fa49d4df294aa...@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek 
---
 net/ethtool/bitset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ethtool/bitset.c b/net/ethtool/bitset.c
index 1fb3603d92ad..0515d6604b3b 100644
--- a/net/ethtool/bitset.c
+++ b/net/ethtool/bitset.c
@@ -628,6 +628,8 @@ int ethnl_parse_bitset(unsigned long *val, unsigned long 
*mask,
return ret;
 
change_bits = nla_get_u32(tb[ETHTOOL_A_BITSET_SIZE]);
+   if (change_bits > nbits)
+   change_bits = nbits;
bitmap_from_arr32(val, nla_data(tb[ETHTOOL_A_BITSET_VALUE]),
  change_bits);
if (change_bits < nbits)
-- 
2.29.2



Re: [RFC net-next 1/2] ethtool: add support for controling the type of adaptive coalescing

2020-11-19 Thread Michal Kubecek
On Thu, Nov 19, 2020 at 05:15:57AM +0100, Andrew Lunn wrote:
> > diff --git a/include/uapi/linux/ethtool_netlink.h 
> > b/include/uapi/linux/ethtool_netlink.h
> > index e2bf36e..e3458d9 100644
> > --- a/include/uapi/linux/ethtool_netlink.h
> > +++ b/include/uapi/linux/ethtool_netlink.h
> > @@ -366,6 +366,7 @@ enum {
> > ETHTOOL_A_COALESCE_TX_USECS_HIGH,   /* u32 */
> > ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH,  /* u32 */
> > ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL,/* u32 */
> > +   ETHTOOL_A_COALESCE_USE_DIM, /* u8 */
> 
> This appears to be a boolean? So /* flag */ would be better. Or do you
> think there is scope for a few different algorithms, and an enum would
> be better. If so, you should add the enum with the two current
> options.

NLA_FLAG would suffice for a read only bool attribute but if the
attribute is expected to be set, we need to distinguish three cases:
set to true, set to false and keep current value. That's why we use
NLA_U8 for read write bool attributes (0 false, anything else true and
attribute not present means leave untouched).

Michal


Re: [RFC net-next 1/2] ethtool: add support for controling the type of adaptive coalescing

2020-11-19 Thread Michal Kubecek
On Thu, Nov 19, 2020 at 04:56:42PM +0800, tanhuazhong wrote:
> On 2020/11/19 12:15, Andrew Lunn wrote:
> > > diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> > > index 9ca87bc..afd8de2 100644
> > > --- a/include/uapi/linux/ethtool.h
> > > +++ b/include/uapi/linux/ethtool.h
> > > @@ -433,6 +433,7 @@ struct ethtool_modinfo {
> > >*  a TX interrupt, when the packet rate is above @pkt_rate_high.
> > >* @rate_sample_interval: How often to do adaptive coalescing packet 
> > > rate
> > >*  sampling, measured in seconds.  Must not be zero.
> > > + * @use_dim: Use DIM for IRQ coalescing, if adaptive coalescing is 
> > > enabled.
> > >*
> > >* Each pair of (usecs, max_frames) fields specifies that interrupts
> > >* should be coalesced until
> > > @@ -483,6 +484,7 @@ struct ethtool_coalesce {
> > >   __u32   tx_coalesce_usecs_high;
> > >   __u32   tx_max_coalesced_frames_high;
> > >   __u32   rate_sample_interval;
> > > + __u32   use_dim;
> > >   };
> > 
> > You cannot do this.
> > 
> > static noinline_for_stack int ethtool_set_coalesce(struct net_device *dev,
> > void __user *useraddr)
> > {
> >  struct ethtool_coalesce coalesce;
> >  int ret;
> > 
> >  if (!dev->ethtool_ops->set_coalesce)
> >  return -EOPNOTSUPP;
> > 
> >  if (copy_from_user(&coalesce, useraddr, sizeof(coalesce)))
> >  return -EFAULT;
> > 
> > An old ethtool binary is not going to set this extra last byte to
> > anything meaningful. You cannot tell if you have an old or new user
> > space, so you have no idea if it put anything into use_dim, or if it
> > is random junk.

Even worse, as there is no indication of data length, ETHTOOL_GCOALESCE
ioctl request from old ethtool on new kernel would result in kernel
writing past the end of userspace buffer.

> > You have to leave the IOCTL interface unchanged, and limit this new
> > feature to the netlink API.
> > 
> 
> Hi, Andrew.
> thanks for pointing out this problem, i will fix it.
> without callling set_coalesce/set_coalesce of ethtool_ops, do you have any
> suggestion for writing/reading this new attribute to/from the driver? add a
> new field in net_device or a new callback function in ethtool_ops seems not
> good.

We could use a similar approach as struct ethtool_link_ksettings, e.g.

struct kernel_ethtool_coalesce {
struct ethtool_coalesce base;
/* new members which are not part of UAPI */
}

get_coalesce() and set_coalesce() would get pointer to struct
kernel_ethtool_coalesce and ioctl code would be modified to only touch
the base (legacy?) part.

While already changing the ops arguments, we could also add extack
pointer, either as a separate argument or as struct member (I slightly
prefer the former).

BtW, please don't forget to update the message descriptions in
Documentation/networking/ethtool-netlink.rst

Michal


Re: [RFC net-next 1/2] ethtool: add support for controling the type of adaptive coalescing

2020-11-19 Thread Michal Kubecek
On Fri, Nov 20, 2020 at 10:59:59AM +0800, tanhuazhong wrote:
> On 2020/11/20 6:02, Michal Kubecek wrote:
> > 
> > We could use a similar approach as struct ethtool_link_ksettings, e.g.
> > 
> > struct kernel_ethtool_coalesce {
> > struct ethtool_coalesce base;
> > /* new members which are not part of UAPI */
> > }
> > 
> > get_coalesce() and set_coalesce() would get pointer to struct
> > kernel_ethtool_coalesce and ioctl code would be modified to only touch
> > the base (legacy?) part.
> >  > While already changing the ops arguments, we could also add extack
> > pointer, either as a separate argument or as struct member (I slightly
> > prefer the former).
> 
> If changing the ops arguments, each driver who implement
> set_coalesce/get_coalesce of ethtool_ops need to be updated. Is it
> acceptable adding two new ops to get/set ext_coalesce info (like
> ecc31c60240b ("ethtool: Add link extended state") does)? Maybe i can send V2
> in this way, and then could you help to see which one is more suitable?

If it were just this one case, adding an extra op would be perfectly
fine. But from long term point of view, we should expect extending also
other existing ethtool requests and going this way for all of them would
essentially double the number of callbacks in struct ethtool_ops.

Michal


Re: [RFC net-next 1/2] ethtool: add support for controling the type of adaptive coalescing

2020-11-20 Thread Michal Kubecek
On Fri, Nov 20, 2020 at 02:39:38PM +0100, Andrew Lunn wrote:
> On Fri, Nov 20, 2020 at 08:23:22AM +0100, Michal Kubecek wrote:
> > On Fri, Nov 20, 2020 at 10:59:59AM +0800, tanhuazhong wrote:
> > > On 2020/11/20 6:02, Michal Kubecek wrote:
> > > > 
> > > > We could use a similar approach as struct ethtool_link_ksettings, e.g.
> > > > 
> > > > struct kernel_ethtool_coalesce {
> > > > struct ethtool_coalesce base;
> > > > /* new members which are not part of UAPI */
> > > > }
> > > > 
> > > > get_coalesce() and set_coalesce() would get pointer to struct
> > > > kernel_ethtool_coalesce and ioctl code would be modified to only touch
> > > > the base (legacy?) part.
> > > >  > While already changing the ops arguments, we could also add extack
> > > > pointer, either as a separate argument or as struct member (I slightly
> > > > prefer the former).
> > > 
> > > If changing the ops arguments, each driver who implement
> > > set_coalesce/get_coalesce of ethtool_ops need to be updated. Is it
> > > acceptable adding two new ops to get/set ext_coalesce info (like
> > > ecc31c60240b ("ethtool: Add link extended state") does)? Maybe i can send 
> > > V2
> > > in this way, and then could you help to see which one is more suitable?
> > 
> > If it were just this one case, adding an extra op would be perfectly
> > fine. But from long term point of view, we should expect extending also
> > other existing ethtool requests and going this way for all of them would
> > essentially double the number of callbacks in struct ethtool_ops.
> 
> coccinella might be useful here.

I played with spatch a bit and it with the spatch and patch below, I got
only three build failures (with allmodconfig) that would need to be
addressed manually - these drivers call the set_coalesce() callback on
device initialization.

I also tried to make the structure argument const in ->set_coalesce()
but that was more tricky as adjusting other functions that the structure
is passed to required either running the spatch three times or repeating
the same two rules three times in the spatch (or perhaps there is
a cleaner way but I'm missing relevant knowledge of coccinelle). Then
there was one more problem in i40e driver which modifies the structure
before passing it on to its helpers. It could be worked around but I'm
not sure if constifying the argument is worth these extra complications.

Michal


semantic patch:

@getc@
 identifier fn;
 identifier ops;
@@
 struct ethtool_ops ops = {
 ...
 ,.get_coalesce = fn,
 ...
 };
 

@@
identifier getc.fn;
identifier dev, data;
@@
-int fn(struct net_device *dev, struct ethtool_coalesce *data)
+int fn(struct net_device *dev, struct netlink_ext_ack *extack, struct 
kernel_ethtool_coalesce *data)
 {
+struct ethtool_coalesce *coal_base = &data->base;
 <...
-data
+coal_base
 ...>
 }


@setc@
 identifier fn;
 identifier ops;
@@
 struct ethtool_ops ops = {
 ...
 ,.set_coalesce = fn,
 ...
 };
 

@@
identifier setc.fn;
identifier dev, data;
@@
-int fn(struct net_device *dev, struct ethtool_coalesce *data)
+int fn(struct net_device *dev, struct netlink_ext_ack *extack, struct 
kernel_ethtool_coalesce *data)
 {
+struct ethtool_coalesce *coal_base = &data->base;
 <...
-data
+coal_base
 ...>
 }


maual part:

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 6408b446051f..01d049d36120 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef CONFIG_COMPAT
@@ -176,6 +177,10 @@ extern int
 __ethtool_get_link_ksettings(struct net_device *dev,
 struct ethtool_link_ksettings *link_ksettings);
 
+struct kernel_ethtool_coalesce {
+   struct ethtool_coalesce base;
+};
+
 /**
  * ethtool_intersect_link_masks - Given two link masks, AND them together
  * @dst: first mask and where result is stored
@@ -436,8 +441,12 @@ struct ethtool_ops {
  struct ethtool_eeprom *, u8 *);
int (*set_eeprom)(struct net_device *,
  struct ethtool_eeprom *, u8 *);
-   int (*get_coalesce)(struct net_device *, struct ethtool_coalesce *);
-   int (*set_coalesce)(struct net_device *, struct ethtool_coalesce *);
+   int (*get_coalesce)(struct net_device *,
+   struct netlink_ext_ack *extack,
+

Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-11-24 Thread Michal Kubecek
On Mon, Nov 23, 2020 at 09:47:53AM +, Danielle Ratson wrote:
> 
> 
> > -Original Message-
> > From: Michal Kubecek 
> > Sent: Thursday, October 22, 2020 7:28 PM
> > To: Danielle Ratson 
> > Cc: Jiri Pirko ; Andrew Lunn ; Jakub 
> > Kicinski ; Ido Schimmel
> > ; netdev@vger.kernel.org; da...@davemloft.net; Jiri 
> > Pirko ; f.faine...@gmail.com; mlxsw
> > ; Ido Schimmel ; 
> > johan...@sipsolutions.net
> > Subject: Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI 
> > with lanes
> > 
> > On Thu, Oct 22, 2020 at 06:15:48AM +0000, Danielle Ratson wrote:
> > > > -Original Message-
> > > > From: Michal Kubecek 
> > > > Sent: Wednesday, October 21, 2020 11:48 AM
> > > >
> > > > Ah, right, it does. But as you extend struct ethtool_link_ksettings
> > > > and drivers will need to be updated to provide this information,
> > > > wouldn't it be more useful to let the driver provide link mode in
> > > > use instead (and derive number of lanes from it)?
> > >
> > > This is the way it is done with the speed parameter, so I have aligned
> > > it to it. Why the lanes should be done differently comparing to the
> > > speed?
> > 
> > Speed and duplex have worked this way since ages and the interface was 
> > probably introduced back in times when combination of
> > speed and duplex was sufficient to identify the link mode. This is no 
> > longer the case and even adding number of lanes wouldn't make
> > the combination unique. So if we are going to extend the interface now and 
> > update drivers to provide extra information, I believe it
> > would be more useful to provide full information.
> > 
> > Michal
> 
> Hi Michal,
> 
> What do you think of passing the link modes you have suggested as a
> bitmask, similar to "supported", that contains only one positive bit?
> Something like that:
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index afae2beacbc3..dd946c88daa3 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -127,6 +127,7 @@ struct ethtool_link_ksettings {
> __ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
> __ETHTOOL_DECLARE_LINK_MODE_MASK(advertising);
> __ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising);
> +   __ETHTOOL_DECLARE_LINK_MODE_MASK(chosen);
> } link_modes;
> u32 lanes;
>  };
> 
> Do you have perhaps a better suggestion?

Not sure if it's better but as we know there is only one mode, we could
simply pass the index. We would still need to reserve a special value
for none/unknown but getting an index would make lookup easier.

> And the speed and duplex parameters should be removed from being
> passed like as well, right?

We cannot remove them from struct ethtool_link_settings and the ioctl
and netlink messages as those are part of UAPI and we have to preserve
backward compatibility. But drivers which provide link mode would not
need to fill speed and duplex in their ->get_link_ksettings() as the
common code could do that for them.

Michal


Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-11-26 Thread Michal Kubecek
On Wed, Nov 25, 2020 at 10:35:35AM +, Danielle Ratson wrote:
> > > What do you think of passing the link modes you have suggested as a
> > > bitmask, similar to "supported", that contains only one positive bit?
> > > Something like that:
> 
> Hi Michal,
> 
> Thanks for your response.
> 
> Actually what I said is not very accurate. 
> In ethtool, for speed 100G and 4 lanes for example, there are few link modes 
> that fits:
> ETHTOOL_LINK_MODE_10baseKR4_Full_BIT
> ETHTOOL_LINK_MODE_10baseSR4_Full_BIT
> ETHTOOL_LINK_MODE_10baseCR4_Full_BIT
> ETHTOOL_LINK_MODE_10baseLR4_ER4_Full_BIT
> 
> The difference is the media. And in the driver we shrink into one bit.
> But maybe that makes passing a bitmask more sense, or am I missing something?

But as far as I understand, at any moment, only one of these will be
actually in use so that's what the driver should report. Or is the
problem that the driver cannot identify the media in use? (To be
precise: by "cannot identify" I mean "it's not possible for the driver
to find out", not "current code does not distinguish".)

Michal


[PATCH ethtool 0/2] netlink: data lifetime error fixes

2020-11-09 Thread Michal Kubecek
Fixes of two data lifetime bugs found by testing with valgrind: one use
after free, one memory leak.

Michal Kubecek (2):
  netlink: fix use after free in netlink_run_handler()
  netlink: fix leaked instances of struct nl_socket

 netlink/netlink.c | 21 +++--
 netlink/nlsock.c  |  3 +++
 2 files changed, 18 insertions(+), 6 deletions(-)

-- 
2.29.2



[PATCH ethtool 1/2] netlink: fix use after free in netlink_run_handler()

2020-11-09 Thread Michal Kubecek
Valgrind detected use after free in netlink_run_handler(): some members of
struct nl_context are accessed after the netlink context is freed by
netlink_done(). Use local variables to store the two flags and check them
instead.

Fixes: 6c19c0d559c8 ("netlink: use genetlink ops information to decide about 
fallback")
Signed-off-by: Michal Kubecek 
---
 netlink/netlink.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/netlink/netlink.c b/netlink/netlink.c
index f655f6ea25b7..bdd3048e 100644
--- a/netlink/netlink.c
+++ b/netlink/netlink.c
@@ -457,6 +457,7 @@ void netlink_run_handler(struct cmd_context *ctx, nl_func_t 
nlfunc,
 bool no_fallback)
 {
bool wildcard = ctx->devname && !strcmp(ctx->devname, WILDCARD_DEVNAME);
+   bool wildcard_unsupported, ioctl_fallback;
struct nl_context *nlctx;
const char *reason;
int ret;
@@ -478,14 +479,17 @@ void netlink_run_handler(struct cmd_context *ctx, 
nl_func_t nlfunc,
nlctx = ctx->nlctx;
 
ret = nlfunc(ctx);
+   wildcard_unsupported = nlctx->wildcard_unsupported;
+   ioctl_fallback = nlctx->ioctl_fallback;
netlink_done(ctx);
-   if (no_fallback || ret != -EOPNOTSUPP || !nlctx->ioctl_fallback) {
-   if (nlctx->wildcard_unsupported)
+
+   if (no_fallback || ret != -EOPNOTSUPP || !ioctl_fallback) {
+   if (wildcard_unsupported)
fprintf(stderr, "%s\n",
"subcommand does not support wildcard dump");
exit(ret >= 0 ? ret : 1);
}
-   if (nlctx->wildcard_unsupported)
+   if (wildcard_unsupported)
reason = "subcommand does not support wildcard dump";
else
reason = "kernel netlink support for subcommand missing";
-- 
2.29.2



[PATCH ethtool 2/2] netlink: fix leaked instances of struct nl_socket

2020-11-09 Thread Michal Kubecek
Valgrind detected memory leaks caused by missing cleanup of netlink
context's ethnl_socket, ethnl2_socket and rtnl_socket. Also, contrary to
its description, nlsock_done() does not free struct nl_socket itself.
Fix nlsock_done() to free the structure and use it to dispose of sockets
pointed to by struct nl_context members.

Fixes: 50efb3cdd2bb ("netlink: netlink socket wrapper and helpers")
Fixes: 87307c30724d ("netlink: initialize ethtool netlink socket")
Fixes: 7f3585b22a4b ("netlink: add handler for permaddr (-P)")
Signed-off-by: Michal Kubecek 
---
 netlink/netlink.c | 11 ---
 netlink/nlsock.c  |  3 +++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/netlink/netlink.c b/netlink/netlink.c
index bdd3048e..ffe06339f099 100644
--- a/netlink/netlink.c
+++ b/netlink/netlink.c
@@ -435,11 +435,16 @@ out_free:
 
 static void netlink_done(struct cmd_context *ctx)
 {
-   if (!ctx->nlctx)
+   struct nl_context *nlctx = ctx->nlctx;
+
+   if (!nlctx)
return;
 
-   free(ctx->nlctx->ops_info);
-   free(ctx->nlctx);
+   nlsock_done(nlctx->ethnl_socket);
+   nlsock_done(nlctx->ethnl2_socket);
+   nlsock_done(nlctx->rtnl_socket);
+   free(nlctx->ops_info);
+   free(nlctx);
ctx->nlctx = NULL;
cleanup_all_strings();
 }
diff --git a/netlink/nlsock.c b/netlink/nlsock.c
index ef31d8c33b29..0ec2738d81d2 100644
--- a/netlink/nlsock.c
+++ b/netlink/nlsock.c
@@ -395,8 +395,11 @@ out_msgbuff:
  */
 void nlsock_done(struct nl_socket *nlsk)
 {
+   if (!nlsk)
+   return;
if (nlsk->sk)
mnl_socket_close(nlsk->sk);
msgbuff_done(&nlsk->msgbuff);
memset(nlsk, '\0', sizeof(*nlsk));
+   free(nlsk);
 }
-- 
2.29.2



[PATCH ethtool 2/2] ethtool: Improve compatibility between netlink and ioctl interfaces

2020-11-09 Thread Michal Kubecek
From: Ido Schimmel 

With the ioctl interface, when autoneg is enabled, but without
specifying speed, duplex or link modes, the advertised link modes are
set to the supported link modes by the ethtool user space utility.

This does not happen when using the netlink interface. Fix this
incompatibility problem by having ethtool query the supported link modes
from the kernel and advertise all the "real" ones when only "autoneg on"
is specified.

Before:

Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  100baseT/Half 100baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
MDI-X: off (auto)
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x0007 (7)
   drv probe link
Link detected: yes

After:

Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
MDI-X: on (auto)
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x0007 (7)
   drv probe link
Link detected: yes

Signed-off-by: Ido Schimmel 
Signed-off-by: Michal Kubecek 
---
 netlink/settings.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/netlink/settings.c b/netlink/settings.c
index dc9280c114b5..90c28b1bc424 100644
--- a/netlink/settings.c
+++ b/netlink/settings.c
@@ -1115,6 +1115,93 @@ static const struct param_parser sset_params[] = {
  */
 #define SSET_MAX_MSGS 4
 
+static int linkmodes_reply_advert_all_cb(const struct nlmsghdr *nlhdr,
+void *data)
+{
+   const struct nlattr *tb[ETHTOOL_A_LINKMODES_MAX + 1] = {};
+   DECLARE_ATTR_TB_INFO(tb);
+   struct nl_msg_buff *req_msgbuff = data;
+   const struct nlattr *ours_attr;
+   struct nlattr *req_bitset;
+   uint32_t *supported_modes;
+   unsigned int modes_count;
+   unsigned int i;
+   int ret;
+
+   ret = mnl_attr_parse(nlhdr, GENL_HDRLEN, attr_cb, &tb_info);
+   if (ret < 0)
+   return MNL_CB_ERROR;
+   ours_attr = tb[ETHTOOL_A_LINKMODES_OURS];
+   if (!ours_attr)
+   return MNL_CB_ERROR;
+   modes_count = bitset_get_count(tb[ETHTOOL_A_LINKMODES_OURS], &ret);
+   if (ret < 0)
+   return MNL_CB_ERROR;
+   supported_modes = get_compact_bitset_mask(tb[ETHTOOL_A_LINKMODES_OURS]);
+   if (!supported_modes)
+   return MNL_CB_ERROR;
+
+   /* keep only "real" link modes */
+   for (i = 0; i < modes_count; i++)
+   if (!lm_class_match(i, LM_CLASS_REAL))
+   supported_modes[i / 32] &= ~((uint32_t)1 << (i % 32));
+
+   req_bitset = ethnla_nest_start(req_msgbuff, ETHTOOL_A_LINKMODES_OURS);
+   if (!req_bitset)
+   return MNL_CB_ERROR;
+
+   if (ethnla_put_u32(req_msgbuff, ETHTOOL_A_BITSET_SIZE, modes_count) ||
+   ethnla_put(req_msgbuff, ETHTOOL_A_BITSET_VALUE,
+  DIV_ROUND_UP(modes_count, 32) * sizeof(uint32_t),
+  supported_modes) ||
+   ethnla_put(req_msgbuff, ETHTOOL_A_BITSET_MASK,
+  DIV_ROUND_UP(modes_count, 32) * sizeof(uint32_t),
+  supported_modes)) {
+   ethnla_nest_cancel(req_msgbuff, req_bitset);
+   return MNL_CB_ERROR;
+   }
+
+   ethnla_nest_end(req_msgbuff, req_bitset);
+   return MNL_CB_OK;
+}
+
+/* For compatibility reasons with ioctl-based ethtool, when "autoneg on" is
+ * specified without "advertise", "speed" and "duplex", we need to query the
+ * supported link modes from the kernel and 

[PATCH ethtool 1/2] netlink: do not send messages and process replies in nl_parser()

2020-11-09 Thread Michal Kubecek
When called with group_style = PARSER_GROUP_MSG, nl_parser() not only
parses the command line and composes the messages but also sends them to
kernel and processes the replies. This is inconsistent with other modes and
also impractical as it takes the control over the process from caller where
it belongs.

Modify nl_parser() to pass composed messages back to caller (which is only
nl_sset() at the moment) and let it send requests and process replies. This
will be needed for an upcoming backward compatibility patch which will need
to inspect and possibly modify one of the composed messages.

Signed-off-by: Michal Kubecek 
---
 netlink/cable_test.c |  2 +-
 netlink/channels.c   |  2 +-
 netlink/coalesce.c   |  2 +-
 netlink/eee.c|  2 +-
 netlink/parser.c | 43 ---
 netlink/parser.h |  3 ++-
 netlink/pause.c  |  2 +-
 netlink/rings.c  |  2 +-
 netlink/settings.c   | 35 ++-
 9 files changed, 66 insertions(+), 27 deletions(-)

diff --git a/netlink/cable_test.c b/netlink/cable_test.c
index 8a7145324610..17139f7d297d 100644
--- a/netlink/cable_test.c
+++ b/netlink/cable_test.c
@@ -574,7 +574,7 @@ int nl_cable_test_tdr(struct cmd_context *ctx)
   ctx->devname, 0))
return -EMSGSIZE;
 
-   ret = nl_parser(nlctx, tdr_params, NULL, PARSER_GROUP_NEST);
+   ret = nl_parser(nlctx, tdr_params, NULL, PARSER_GROUP_NEST, NULL);
if (ret < 0)
return ret;
 
diff --git a/netlink/channels.c b/netlink/channels.c
index c6002ceeb121..894c74bcc11a 100644
--- a/netlink/channels.c
+++ b/netlink/channels.c
@@ -126,7 +126,7 @@ int nl_schannels(struct cmd_context *ctx)
   ctx->devname, 0))
return -EMSGSIZE;
 
-   ret = nl_parser(nlctx, schannels_params, NULL, PARSER_GROUP_NONE);
+   ret = nl_parser(nlctx, schannels_params, NULL, PARSER_GROUP_NONE, NULL);
if (ret < 0)
return 1;
 
diff --git a/netlink/coalesce.c b/netlink/coalesce.c
index 07a92d04b7a1..75922a91c2e7 100644
--- a/netlink/coalesce.c
+++ b/netlink/coalesce.c
@@ -254,7 +254,7 @@ int nl_scoalesce(struct cmd_context *ctx)
   ctx->devname, 0))
return -EMSGSIZE;
 
-   ret = nl_parser(nlctx, scoalesce_params, NULL, PARSER_GROUP_NONE);
+   ret = nl_parser(nlctx, scoalesce_params, NULL, PARSER_GROUP_NONE, NULL);
if (ret < 0)
return 1;
 
diff --git a/netlink/eee.c b/netlink/eee.c
index d3135b2094a4..04d8f0bbe3fc 100644
--- a/netlink/eee.c
+++ b/netlink/eee.c
@@ -174,7 +174,7 @@ int nl_seee(struct cmd_context *ctx)
   ctx->devname, 0))
return -EMSGSIZE;
 
-   ret = nl_parser(nlctx, seee_params, NULL, PARSER_GROUP_NONE);
+   ret = nl_parser(nlctx, seee_params, NULL, PARSER_GROUP_NONE, NULL);
if (ret < 0)
return 1;
 
diff --git a/netlink/parser.c b/netlink/parser.c
index 3b25f5d5a88e..c2eae93efb69 100644
--- a/netlink/parser.c
+++ b/netlink/parser.c
@@ -920,7 +920,7 @@ static void __parser_set(uint64_t *map, unsigned int idx)
 }
 
 struct tmp_buff {
-   struct nl_msg_buff  msgbuff;
+   struct nl_msg_buff  *msgbuff;
unsigned intid;
unsigned intorig_len;
struct tmp_buff *next;
@@ -951,7 +951,12 @@ static struct tmp_buff *tmp_buff_find_or_create(struct 
tmp_buff **phead,
if (!new_buff)
return NULL;
new_buff->id = id;
-   msgbuff_init(&new_buff->msgbuff);
+   new_buff->msgbuff = malloc(sizeof(*new_buff->msgbuff));
+   if (!new_buff->msgbuff) {
+   free(new_buff);
+   return NULL;
+   }
+   msgbuff_init(new_buff->msgbuff);
new_buff->next = NULL;
*pbuff = new_buff;
 
@@ -965,7 +970,10 @@ static void tmp_buff_destroy(struct tmp_buff *head)
 
while (buff) {
next = buff->next;
-   msgbuff_done(&buff->msgbuff);
+   if (buff->msgbuff) {
+   msgbuff_done(buff->msgbuff);
+   free(buff->msgbuff);
+   }
free(buff);
buff = next;
}
@@ -980,13 +988,22 @@ static void tmp_buff_destroy(struct tmp_buff *head)
  *   param_parser::offset)
  * @group_style: defines if identifiers in .group represent separate messages,
  *   nested attributes or are not allowed
+ * @msgbuffs:(only used for @group_style = PARSER_GROUP_MSG) array to store
+ *   pointers to composed messages; caller must make sure this
+ *   array is sufficient, i.e. that it has at least as many entries
+ *   as the number of different .group values in params array;
+ *   entries are filled from the start, rema

[PATCH ethtool 0/2] netlink: improve compatibility with ioctl interface

2020-11-09 Thread Michal Kubecek
Restore special behavior of "ethtool -s  autoneg on" if no advertised
modes, speed and duplex are requested: ioctl code enables all link modes
supported by the device. This is most important for network devices which
report no advertised modes when autonegotiation is disabled.

First patch cleans up the parser interface; it allows nl_sset() to inspect
the composed message and append an attribute to it if needed.

Ido Schimmel (1):
  ethtool: Improve compatibility between netlink and ioctl interfaces

Michal Kubecek (1):
  netlink: do not send messages and process replies in nl_parser()

 netlink/cable_test.c |   2 +-
 netlink/channels.c   |   2 +-
 netlink/coalesce.c   |   2 +-
 netlink/eee.c|   2 +-
 netlink/parser.c |  43 ++-
 netlink/parser.h |   3 +-
 netlink/pause.c  |   2 +-
 netlink/rings.c  |   2 +-
 netlink/settings.c   | 127 +--
 9 files changed, 158 insertions(+), 27 deletions(-)

-- 
2.29.2



Re: [PATCH v2 net] ethtool: netlink: add missing netdev_features_change() call

2020-11-09 Thread Michal Kubecek
On Sun, Nov 08, 2020 at 12:46:15AM +, Alexander Lobakin wrote:
> After updating userspace Ethtool from 5.7 to 5.9, I noticed that
> NETDEV_FEAT_CHANGE is no more raised when changing netdev features
> through Ethtool.
> That's because the old Ethtool ioctl interface always calls
> netdev_features_change() at the end of user request processing to
> inform the kernel that our netdevice has some features changed, but
> the new Netlink interface does not. Instead, it just notifies itself
> with ETHTOOL_MSG_FEATURES_NTF.
> Replace this ethtool_notify() call with netdev_features_change(), so
> the kernel will be aware of any features changes, just like in case
> with the ioctl interface. This does not omit Ethtool notifications,
> as Ethtool itself listens to NETDEV_FEAT_CHANGE and drops
> ETHTOOL_MSG_FEATURES_NTF on it
> (net/ethtool/netlink.c:ethnl_netdev_event()).
> 
> From v1 [1]:
> - dropped extra new line as advised by Jakub;
> - no functional changes.
> 
> [1] 
> https://lore.kernel.org/netdev/alzxq2o5uutvhcfngoiggj8vj3kgo5yiwanqjh0...@cp3-web-009.plabs.ch
> 
> Fixes: 0980bfcd6954 ("ethtool: set netdev features with FEATURES_SET request")
> Signed-off-by: Alexander Lobakin 

Reviewed-by: Michal Kubecek 

> ---
>  net/ethtool/features.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ethtool/features.c b/net/ethtool/features.c
> index 8ee4cdbd6b82..1c9f4df273bd 100644
> --- a/net/ethtool/features.c
> +++ b/net/ethtool/features.c
> @@ -280,7 +280,7 @@ int ethnl_set_features(struct sk_buff *skb, struct 
> genl_info *info)
> active_diff_mask, compact);
>   }
>   if (mod)
> - ethtool_notify(dev, ETHTOOL_MSG_FEATURES_NTF, NULL);
> + netdev_features_change(dev);
>  
>  out_rtnl:
>   rtnl_unlock();
> -- 
> 2.29.2
> 
> 


signature.asc
Description: PGP signature


Re: [PATCH net-next 1/1] netdevsim: support ethtool ring and coalesce settings

2020-11-13 Thread Michal Kubecek
On Thu, Nov 12, 2020 at 04:12:29PM +0100, Antonio Cardace wrote:
> Add ethtool ring and coalesce settings support for testing.
> 
> Signed-off-by: Antonio Cardace 
> ---
>  drivers/net/netdevsim/ethtool.c   | 65 +++
>  drivers/net/netdevsim/netdevsim.h |  9 -
>  2 files changed, 65 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/netdevsim/ethtool.c b/drivers/net/netdevsim/ethtool.c
> index f1884d90a876..25acd3bc1781 100644
> --- a/drivers/net/netdevsim/ethtool.c
> +++ b/drivers/net/netdevsim/ethtool.c
> @@ -7,15 +7,18 @@
>  
>  #include "netdevsim.h"
>  
> +#define UINT32_MAX 0xU

We already have U32_MAX in 

> +#define ETHTOOL_COALESCE_ALL_PARAMS UINT32_MAX

I would rather prefer this constant to include only bits corresponding
to parameters which actually exist, i.e. either GENMASK(21, 0) or
combination of existing ETHTOOL_COALESCE_* macros. It should probably
be defined in include/linux/ethtool.h then.

[...]
> +static void nsim_get_ringparam(struct net_device *dev, struct 
> ethtool_ringparam *ring)
> +{
> + struct netdevsim *ns = netdev_priv(dev);
> +
> + memcpy(ring, &ns->ethtool.ring, sizeof(ns->ethtool.ring));
> +}
> +
> +static int nsim_set_ringparam(struct net_device *dev, struct 
> ethtool_ringparam *ring)
> +{
> + struct netdevsim *ns = netdev_priv(dev);
> +
> + memcpy(&ns->ethtool.ring, ring, sizeof(ns->ethtool.ring));
>   return 0;
>  }
[...]
>  
> +static void nsim_ethtool_coalesce_init(struct netdevsim *ns)
> +{
> + ns->ethtool.ring.rx_max_pending = UINT32_MAX;
> + ns->ethtool.ring.rx_jumbo_max_pending = UINT32_MAX;
> + ns->ethtool.ring.rx_mini_max_pending = UINT32_MAX;
> + ns->ethtool.ring.tx_max_pending = UINT32_MAX;
> +}

This way an ETHTOOL_MSG_RINGS_SET request would never fail. It would be
more useful for selftests if the max values were more realistic and
ideally also configurable via debugfs.

Michal


Re: [PATCH net-next v3 1/4] ethtool: add ETHTOOL_COALESCE_ALL_PARAMS define

2020-11-16 Thread Michal Kubecek
On Sat, Nov 14, 2020 at 12:16:52AM +0100, Antonio Cardace wrote:
> This bitmask represents all existing coalesce parameters.
> 
> Signed-off-by: Antonio Cardace 

Reviewed-by: Michal Kubecek 

> ---
>  include/linux/ethtool.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 6408b446051f..e3da25b51ae4 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -215,6 +215,7 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 
> *legacy_u32,
>  #define ETHTOOL_COALESCE_TX_USECS_HIGH   BIT(19)
>  #define ETHTOOL_COALESCE_TX_MAX_FRAMES_HIGH  BIT(20)
>  #define ETHTOOL_COALESCE_RATE_SAMPLE_INTERVALBIT(21)
> +#define ETHTOOL_COALESCE_ALL_PARAMS  GENMASK(21, 0)
>  
>  #define ETHTOOL_COALESCE_USECS   
> \
>   (ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_TX_USECS)
> -- 
> 2.28.0
> 


signature.asc
Description: PGP signature


Re: [PATCH net-next v3 2/4] netdevsim: support ethtool ring and coalesce settings

2020-11-16 Thread Michal Kubecek
On Sat, Nov 14, 2020 at 12:16:53AM +0100, Antonio Cardace wrote:
> Add ethtool ring and coalesce settings support for testing.
> 
> Signed-off-by: Antonio Cardace 

Reviewed-by: Michal Kubecek 


signature.asc
Description: PGP signature


Re: [PATCH net-next v3 3/4] selftests: extract common functions in ethtool-common.sh

2020-11-16 Thread Michal Kubecek
On Sat, Nov 14, 2020 at 12:16:54AM +0100, Antonio Cardace wrote:
> Factor out some useful functions so that they can be reused
> by other ethtool-netdevsim scripts.
> 
> Signed-off-by: Antonio Cardace 

Reviewed-by: Michal Kubecek 

Just one comment:

[...]
> +function get_netdev_name {
> +local -n old=$1
> +
> +new=$(ls /sys/class/net)
> +
> +for netdev in $new; do
> + for check in $old; do
> +[ $netdev == $check ] && break
> + done
> +
> + if [ $netdev != $check ]; then
> + echo $netdev
> + break
> + fi
> +done
> +}
[...]
> +function make_netdev {
> +# Make a netdevsim
> +old_netdevs=$(ls /sys/class/net)
> +
> +if ! $(lsmod | grep -q netdevsim); then
> + modprobe netdevsim
> +fi
> +
> +echo $NSIM_ID > /sys/bus/netdevsim/new_device
> +echo `get_netdev_name old_netdevs`
> +}

This would be rather unpredictable if someone ran another selftest (or
anything else that would create a network device) in parallel. IMHO it
would be safer (and easier) to get the name of the new device from

  /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/

But as this is not new code and you are just moving existing code, it
can be done in a separate patch.

Michal


signature.asc
Description: PGP signature


Re: [PATCH net-next v4 5/6] selftests: refactor get_netdev_name function

2020-11-17 Thread Michal Kubecek
On Tue, Nov 17, 2020 at 04:20:14PM +0100, Antonio Cardace wrote:
> As pointed out by Michal Kubecek, getting the name
> with the previous approach was racy, it's better
> and easier to get the name of the device with this
> patch's approach.
> 
> Essentialy the function doesn't need to exist
> anymore as it's a simple 'ls' command.
> 
> Signed-off-by: Antonio Cardace 
> ---
>  .../drivers/net/netdevsim/ethtool-common.sh   | 20 ++-
>  1 file changed, 2 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh 
> b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
> index fa44cf6e732c..3c287ac78117 100644
> --- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
> +++ b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
> @@ -20,23 +20,6 @@ function cleanup {
>  
>  trap cleanup EXIT
>  
> -function get_netdev_name {
> -local -n old=$1
> -
> -new=$(ls /sys/class/net)
> -
> -for netdev in $new; do
> - for check in $old; do
> -[ $netdev == $check ] && break
> - done
> -
> - if [ $netdev != $check ]; then
> - echo $netdev
> - break
> - fi
> -done
> -}
> -
>  function check {
>  local code=$1
>  local str=$2
> @@ -65,5 +48,6 @@ function make_netdev {
>  fi
>  
>  echo $NSIM_ID > /sys/bus/netdevsim/new_device
> -echo `get_netdev_name old_netdevs`
> +# get new device name
> +echo $(ls /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/)

Is there a reason for combining command substitution with echo? Couldn't
we use one of

  ls /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/
  echo /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/*

instead?

Michal

>  }
> -- 
> 2.28.0
> 


Re: "ethtool" missing "master-slave" args in "do_sset" function.[TEXT/PLAIN]

2020-12-27 Thread Michal Kubecek
On Sun, Dec 27, 2020 at 12:34:09PM +0800, Bruce LIU wrote:
> Hi Michal Kubecek and Network dev team,
> 
> Good day! Hope you are doing well.
> This is Bruce from China, and please allow me to cc Rudy from Cisco Systems
> in China team.
> 
> We are facing a weird behavior about "master-slave configuration" function
> in ethtool.
> Please correct me if I am wrong
> 
> As you know, start from ethtool 5.8,  "master/slave configuration support"
> added.
> https://lwn.net/Articles/828044/
> 
> 
> Appeal:
> Confirm and discuss workaround
> 
> 
> Issue description:
> As we test in lab, no "master-slave" option supported.
> 
> 
> Issue reproduce:
> root@raspberrypi:~# ethtool -s eth0 master-slave master-preferred
> ethtool: bad command line argument(s)
> For more information run ethtool -h
> 
> 
> Environment:
> debian-live-10.7.0-amd64-standard.iso
> Kernel 5.4.79

This is the problem. Kernel support for this feature was added in
5.8-rc1 so that your kernel does not have it and there is no chance it
could possibly work. Newer ethtool has support for this feature but
kernel must support it as well for it to actually work.

But I agree that the error message is misleading. We handle subcommands
supported only in netlink with proper error message when ioctl fallback
is used but we don't do the same for new parameters of existing
subcommands which are generally supported by ioctl code. That's why the
command line parser used by ioctl code does not recognize the new
parameter and handles it as a syntax error.

We'll need to handle new parameters in ioctl parser so that it produces
more meaningful error for parameters only supported via netlink. Long
term, the proper solution would probably be using one parser for both
netlink and ioctl but that was something I wanted to avoid for now to
reduce the risk of introducing subtle changes in behaviour of existing
code.

Michal


regression in iwlwifi: page fault in iwl_dbg_tlv_alloc_region() (commit ba8f6f4ae254)

2020-12-28 Thread Michal Kubecek
FYI, there is a regression in iwlwifi driver caused by commit
ba8f6f4ae254 ("iwlwifi: dbg: add dumping special device memory")
reported at

  https://bugzilla.kernel.org/show_bug.cgi?id=210733
  https://bugzilla.suse.com/show_bug.cgi?id=1180344

The problem seems to be an attempt to write terminating null character
into a string which may be read only. There is also a proposed fix.

Michal


Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-10-19 Thread Michal Kubecek
On Mon, Oct 19, 2020 at 07:19:34AM +, Danielle Ratson wrote:
> > -Original Message-
> > From: Andrew Lunn 
> > Sent: Saturday, October 17, 2020 1:16 AM
> > 
> > I'm not sure i fully understand all these different link modes, but
> > i thought these 5 are all 100G using 2 lanes? So why cannot the user
> > simply do
> > 
> > ethtool -s swp1 advertise 10baseKR2/Full
> > 
> > and the driver can figure out it needs to use two lanes at 50G?
> > 
> > Andrew
> 
> Hi Andrew,
> 
> Thanks for the feedback.
> 
> I guess you mean " ethtool -s swp1 advertise 10baseKR2/Full on".
> 
> First, the idea might work but only for auto negotiation mode, whereas
> the lanes parameter is a solution for both.
> 
> Second, the command as you have suggested it, wouldn't change anything
> in the current situation as I checked. We can disable all the others
> and leave only the one we want but the command doesn't filter the
> other link modes but it just turns the mentioned link modes up if they
> aren't. However, the lanes parameter is a selector, which make it much
> more user friendly in my opinion.

It would be quite easy to extend the ethtool command line parser to
allow also

  ethtool -s  advertise  ...

in addition to already supported

  ethtool -s  advertise 
  ethtool -s  advertise /
  ethtool -s {  { on | off } } ...

Parser converting simple list of values into a maskless bitset is
already there so it would be only matter of checking if there are at
least two arguments and second is "on" or "off" and using corresponding
parser. I think it would be useful independently of this series.

> Also, we can't turn only one of them up. But you have to set for
> example:
> 
> $ ethtool -s swp1 advertise 10baseKR2/Full on 10baseSR2/Full on 
> 10baseCR2/Full on 10baseLR2_ER2_FR2/Full on 10baseDR2/Full on
> 
> Am I missing something?

IIUC Jakub's concern is rather about real life need for such selectors,
i.e. how realistic is "I want a(ny) 100Gb/s mode with two lanes" as an
actual user need; if it wouldn't be mostly (or only) used as a quick way
to distinguish between two supported 100Gb/s modes.

IMHO if we go this way, we should consider going all the way, i.e. allow
also selecting by the remaining part of the mode ("media type", e.g.
"LR", not sure what the official name is) and, more important, get full
information about link mode in use from driver (we only get speed and
duplex at the moment). But that would require changes in the
get_linksettings() interface and drivers.

Michal


Re: [PATCH ethtool v3 0/7] pause frame stats

2020-10-19 Thread Michal Kubecek
On Sun, Oct 18, 2020 at 02:31:44PM -0700, Jakub Kicinski wrote:
> Hi!
> 
> Sorry about the delay from v2.

Actually, I'm rather surprised you were able to get back to this so
early, given the situation.

> This set adds support for pause frame statistics.
> 
> First pause frame info is extended to support --json.
> 
> Pause stats are first of this kind of statistics so add
> support for a new flag (--include-statistics).
> 
> Next add support for dumping policy to check if the statistics
> flag is supported for a given subcommand.
> 
> Last but not least - display statistics.
> 
> v3:
>  - rename the ctx variable to policy_ctx
>  - instead of returning the flags policy save it to a member
>of struct nl_context, for potential reuse. Also we don't
>have to worry about returning flags and negative errors
>from the read helper this way :)
> 
> Jakub Kicinski (7):
>   update UAPI header copies
>   pause: add --json support
>   separate FLAGS out in -h
>   add support for stats in subcommands
>   netlink: prepare for more per-op info
>   netlink: use policy dumping to check if stats flag is supported
>   pause: add support for dumping statistics
> 
>  ethtool.8.in   |   7 ++
>  ethtool.c  |  17 +++-
>  internal.h |   1 +
>  netlink/coalesce.c |   6 +-
>  netlink/msgbuff.h  |   6 ++
>  netlink/netlink.c  | 179 ++---
>  netlink/netlink.h  |  31 +--
>  netlink/pause.c| 111 ++---
>  uapi/linux/genetlink.h |  11 +++
>  uapi/linux/netlink.h   |   4 +
>  10 files changed, 336 insertions(+), 37 deletions(-)

Series applied, thank you.

Michal


signature.asc
Description: PGP signature


Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-10-19 Thread Michal Kubecek
On Mon, Oct 19, 2020 at 02:26:43PM +0200, Jiri Pirko wrote:
> Mon, Oct 19, 2020 at 01:04:22PM CEST, mkube...@suse.cz wrote:
> >
> >It would be quite easy to extend the ethtool command line parser to
> >allow also
> >
> >  ethtool -s  advertise  ...
> >
> >in addition to already supported
> >
> >  ethtool -s  advertise 
> >  ethtool -s  advertise /
> >  ethtool -s {  { on | off } } ...

This should have been

  ethtool -s  advertise {  { on | off } } ...

> >Parser converting simple list of values into a maskless bitset is
> >already there so it would be only matter of checking if there are at
> >least two arguments and second is "on" or "off" and using corresponding
> >parser. I think it would be useful independently of this series.
> 
> Understood. So basically you will pass some selectors on cmdline and the
> uapi would stay intact.
> How do you imagine the specific lane number selection should look like
> on the cmdline?

As I said, I meant the extension suggested in my mail as independent of
what this series is about. For lanes count selector, I find proposed

  ethtool -s  ... lanes  ...

the most natural.

>From purely syntactic/semantic point of view, there are three types of
requests:

  (1) enable specific set of modes, disable the rest
  (2) enable/disable specific modes, leave the rest as they are
  (3) enable modes matching a condition (and disable the rest)

What I proposed was to allow the use symbolic names instead of masks
(which are getting more and more awful with each new mode) also for (1),
like they can already be used for (2).

The lanes selector is an extension of (3) which I would prefer not to
mix with (1) or (2) within one command line, i.e. either "advertise" or
"speed / duplex / lanes".

IIUC Jakub's and Andrew's comments were not so much about the syntax and
semantic (which is quite clear) but rather about the question if the
requests like "advertise exactly the modes with (100Gb/s speed and) two
lanes" would really address a real life need and wouldn't be more often
used as shortcuts for "advertise 10baseKR2/Full". (On the other
hand, I suspect existing speed and duplex selectors are often used the
same way.)

Michal


[PATCH ethtool 3/4] netlink: add descriptions for genetlink policy dumps

2020-10-19 Thread Michal Kubecek
Add GENL_ID_CTRL message descriptions for messages and attributes used
for policy dumps.

Signed-off-by: Michal Kubecek 
---
 netlink/desc-genlctrl.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/netlink/desc-genlctrl.c b/netlink/desc-genlctrl.c
index 9840179b0a1a..43b41ab395b8 100644
--- a/netlink/desc-genlctrl.c
+++ b/netlink/desc-genlctrl.c
@@ -29,6 +29,59 @@ static const struct pretty_nla_desc __mcgrps_desc[] = {
NLATTR_DESC_NESTED(0, mcgrp),
 };
 
+static const char *__policy_attr_type_names[] = {
+   [NL_ATTR_TYPE_INVALID]  = "NL_ATTR_TYPE_INVALID",
+   [NL_ATTR_TYPE_FLAG] = "NL_ATTR_TYPE_FLAG",
+   [NL_ATTR_TYPE_U8]   = "NL_ATTR_TYPE_U8",
+   [NL_ATTR_TYPE_U16]  = "NL_ATTR_TYPE_U16",
+   [NL_ATTR_TYPE_U32]  = "NL_ATTR_TYPE_U32",
+   [NL_ATTR_TYPE_U64]  = "NL_ATTR_TYPE_U64",
+   [NL_ATTR_TYPE_S8]   = "NL_ATTR_TYPE_S8",
+   [NL_ATTR_TYPE_S16]  = "NL_ATTR_TYPE_S16",
+   [NL_ATTR_TYPE_S32]  = "NL_ATTR_TYPE_S32",
+   [NL_ATTR_TYPE_S64]  = "NL_ATTR_TYPE_S64",
+   [NL_ATTR_TYPE_BINARY]   = "NL_ATTR_TYPE_BINARY",
+   [NL_ATTR_TYPE_STRING]   = "NL_ATTR_TYPE_STRING",
+   [NL_ATTR_TYPE_NUL_STRING]   = "NL_ATTR_TYPE_NUL_STRING",
+   [NL_ATTR_TYPE_NESTED]   = "NL_ATTR_TYPE_NESTED",
+   [NL_ATTR_TYPE_NESTED_ARRAY] = "NL_ATTR_TYPE_NESTED_ARRAY",
+   [NL_ATTR_TYPE_BITFIELD32]   = "NL_ATTR_TYPE_BITFIELD32",
+};
+
+static const struct pretty_nla_desc __policy_attr_desc[] = {
+   NLATTR_DESC_INVALID(NL_POLICY_TYPE_ATTR_UNSPEC),
+   NLATTR_DESC_U32_ENUM(NL_POLICY_TYPE_ATTR_TYPE, policy_attr_type),
+   NLATTR_DESC_S64(NL_POLICY_TYPE_ATTR_MIN_VALUE_S),
+   NLATTR_DESC_S64(NL_POLICY_TYPE_ATTR_MAX_VALUE_S),
+   NLATTR_DESC_U64(NL_POLICY_TYPE_ATTR_MIN_VALUE_U),
+   NLATTR_DESC_U64(NL_POLICY_TYPE_ATTR_MAX_VALUE_U),
+   NLATTR_DESC_U32(NL_POLICY_TYPE_ATTR_MIN_LENGTH),
+   NLATTR_DESC_U32(NL_POLICY_TYPE_ATTR_MAX_LENGTH),
+   NLATTR_DESC_U32(NL_POLICY_TYPE_ATTR_POLICY_IDX),
+   NLATTR_DESC_U32(NL_POLICY_TYPE_ATTR_POLICY_MAXTYPE),
+   NLATTR_DESC_X32(NL_POLICY_TYPE_ATTR_BITFIELD32_MASK),
+   NLATTR_DESC_X64(NL_POLICY_TYPE_ATTR_PAD),
+   NLATTR_DESC_BINARY(NL_POLICY_TYPE_ATTR_MASK),
+};
+
+static const struct pretty_nla_desc __policy_attrs_desc[] = {
+   NLATTR_DESC_NESTED(0, policy_attr),
+};
+
+static const struct pretty_nla_desc __policies_desc[] = {
+   NLATTR_DESC_ARRAY(0, policy_attrs),
+};
+
+static const struct pretty_nla_desc __op_policy_desc[] = {
+   NLATTR_DESC_INVALID(CTRL_ATTR_POLICY_UNSPEC),
+   NLATTR_DESC_U32(CTRL_ATTR_POLICY_DO),
+   NLATTR_DESC_U32(CTRL_ATTR_POLICY_DUMP),
+};
+
+static const struct pretty_nla_desc __op_policies_desc[] = {
+   NLATTR_DESC_NESTED(0, op_policy),
+};
+
 static const struct pretty_nla_desc __attr_desc[] = {
NLATTR_DESC_INVALID(CTRL_ATTR_UNSPEC),
NLATTR_DESC_U16(CTRL_ATTR_FAMILY_ID),
@@ -38,6 +91,9 @@ static const struct pretty_nla_desc __attr_desc[] = {
NLATTR_DESC_U32(CTRL_ATTR_MAXATTR),
NLATTR_DESC_ARRAY(CTRL_ATTR_OPS, attrops),
NLATTR_DESC_ARRAY(CTRL_ATTR_MCAST_GROUPS, mcgrps),
+   NLATTR_DESC_ARRAY(CTRL_ATTR_POLICY, policies),
+   NLATTR_DESC_ARRAY(CTRL_ATTR_OP_POLICY, op_policies),
+   NLATTR_DESC_U32(CTRL_ATTR_OP),
 };
 
 const struct pretty_nlmsg_desc genlctrl_msg_desc[] = {
@@ -51,6 +107,7 @@ const struct pretty_nlmsg_desc genlctrl_msg_desc[] = {
NLMSG_DESC(CTRL_CMD_NEWMCAST_GRP, attr),
NLMSG_DESC(CTRL_CMD_DELMCAST_GRP, attr),
NLMSG_DESC(CTRL_CMD_GETMCAST_GRP, attr),
+   NLMSG_DESC(CTRL_CMD_GETPOLICY, attr),
 };
 
 const unsigned int genlctrl_msg_n_desc = ARRAY_SIZE(genlctrl_msg_desc);
-- 
2.28.0



[PATCH ethtool 4/4] netlink: add message descriptions for pause stats

2020-10-19 Thread Michal Kubecek
Add message descriptions for pretty printing of new attributes used for pause
statistics.

Signed-off-by: Michal Kubecek 
---
 netlink/desc-ethtool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/netlink/desc-ethtool.c b/netlink/desc-ethtool.c
index ee8fc4796987..96291b9bdd3b 100644
--- a/netlink/desc-ethtool.c
+++ b/netlink/desc-ethtool.c
@@ -180,12 +180,19 @@ static const struct pretty_nla_desc __coalesce_desc[] = {
NLATTR_DESC_U32(ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL),
 };
 
+static const struct pretty_nla_desc __pause_stats_desc[] = {
+   NLATTR_DESC_BINARY(ETHTOOL_A_PAUSE_STAT_PAD),
+   NLATTR_DESC_U64(ETHTOOL_A_PAUSE_STAT_TX_FRAMES),
+   NLATTR_DESC_U64(ETHTOOL_A_PAUSE_STAT_RX_FRAMES),
+};
+
 static const struct pretty_nla_desc __pause_desc[] = {
NLATTR_DESC_INVALID(ETHTOOL_A_PAUSE_UNSPEC),
NLATTR_DESC_NESTED(ETHTOOL_A_PAUSE_HEADER, header),
NLATTR_DESC_BOOL(ETHTOOL_A_PAUSE_AUTONEG),
NLATTR_DESC_BOOL(ETHTOOL_A_PAUSE_RX),
NLATTR_DESC_BOOL(ETHTOOL_A_PAUSE_TX),
+   NLATTR_DESC_NESTED(ETHTOOL_A_PAUSE_STATS, pause_stats),
 };
 
 static const struct pretty_nla_desc __eee_desc[] = {
-- 
2.28.0



[PATCH ethtool 1/4] netlink: support u32 enumerated types in pretty printing

2020-10-19 Thread Michal Kubecek
Some numeric attributes take values from a short list/range with symbolic
names. Showing the symbolic names instead of numeric values will make the
pretty printed netlink messages easier to read. If the value is too big for
provided names array (e.g. running on newer kernel) or the name is omitted,
numeric attribute value is shown.

Signed-off-by: Michal Kubecek 
---
 netlink/prettymsg.c |  9 +
 netlink/prettymsg.h | 18 --
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/netlink/prettymsg.c b/netlink/prettymsg.c
index f992dcaf071f..d5d999fddfbb 100644
--- a/netlink/prettymsg.c
+++ b/netlink/prettymsg.c
@@ -137,6 +137,15 @@ static int pretty_print_attr(const struct nlattr *attr,
case NLA_BOOL:
printf("%s", mnl_attr_get_u8(attr) ? "on" : "off");
break;
+   case NLA_U32_ENUM: {
+   uint32_t val = mnl_attr_get_u32(attr);
+
+   if (adesc && val < adesc->n_names && adesc->names[val])
+   printf("%s", adesc->names[val]);
+   else
+   printf("%u", val);
+   break;
+   }
default:
if (alen <= __DUMP_LINE)
__print_binary_short(adata, alen);
diff --git a/netlink/prettymsg.h b/netlink/prettymsg.h
index b5e5f735ac8a..6987c6ec5bca 100644
--- a/netlink/prettymsg.h
+++ b/netlink/prettymsg.h
@@ -28,13 +28,20 @@ enum pretty_nla_format {
NLA_BOOL,
NLA_NESTED,
NLA_ARRAY,
+   NLA_U32_ENUM,
 };
 
 struct pretty_nla_desc {
enum pretty_nla_format  format;
const char  *name;
-   const struct pretty_nla_desc*children;
-   unsigned intn_children;
+   union {
+   const struct pretty_nla_desc*children;
+   const char  *const *names;
+   };
+   union {
+   unsigned intn_children;
+   unsigned intn_names;
+   };
 };
 
 struct pretty_nlmsg_desc {
@@ -81,6 +88,13 @@ struct pretty_nlmsg_desc {
.children = __ ## _children_desc ## _desc, \
.n_children = 1, \
}
+#define NLATTR_DESC_U32_ENUM(_name, _names_table) \
+   [_name] = { \
+   .format = NLA_U32_ENUM, \
+   .name = #_name, \
+   .names = __ ## _names_table ## _names, \
+   .n_children = ARRAY_SIZE(__ ## _names_table ## _names), \
+   }
 
 #define NLMSG_DESC(_name, _attrs) \
[_name] = { \
-- 
2.28.0



[PATCH ethtool 2/4] netlink: support 64-bit attribute types in pretty printed messages

2020-10-19 Thread Michal Kubecek
Add NLA_U64 (unsigned), NLA_X64 (unsigned, printed as hex) and NLA_S64
(signed) attribute types in pretty printing code.

Signed-off-by: Michal Kubecek 
---
 netlink/prettymsg.c | 10 ++
 netlink/prettymsg.h |  6 ++
 2 files changed, 16 insertions(+)

diff --git a/netlink/prettymsg.c b/netlink/prettymsg.c
index d5d999fddfbb..0a1fae3da54e 100644
--- a/netlink/prettymsg.c
+++ b/netlink/prettymsg.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -110,6 +111,9 @@ static int pretty_print_attr(const struct nlattr *attr,
case NLA_U32:
printf("%u", mnl_attr_get_u32(attr));
break;
+   case NLA_U64:
+   printf("%" PRIu64, mnl_attr_get_u64(attr));
+   break;
case NLA_X8:
printf("0x%02x", mnl_attr_get_u8(attr));
break;
@@ -119,6 +123,9 @@ static int pretty_print_attr(const struct nlattr *attr,
case NLA_X32:
printf("0x%08x", mnl_attr_get_u32(attr));
break;
+   case NLA_X64:
+   printf("%" PRIx64, mnl_attr_get_u64(attr));
+   break;
case NLA_S8:
printf("%d", (int)mnl_attr_get_u8(attr));
break;
@@ -128,6 +135,9 @@ static int pretty_print_attr(const struct nlattr *attr,
case NLA_S32:
printf("%d", (int)mnl_attr_get_u32(attr));
break;
+   case NLA_S64:
+   printf("%" PRId64, (int64_t)mnl_attr_get_u64(attr));
+   break;
case NLA_STRING:
printf("\"%.*s\"", alen, (const char *)adata);
break;
diff --git a/netlink/prettymsg.h b/netlink/prettymsg.h
index 6987c6ec5bca..25990cceffca 100644
--- a/netlink/prettymsg.h
+++ b/netlink/prettymsg.h
@@ -17,12 +17,15 @@ enum pretty_nla_format {
NLA_U8,
NLA_U16,
NLA_U32,
+   NLA_U64,
NLA_X8,
NLA_X16,
NLA_X32,
+   NLA_X64,
NLA_S8,
NLA_S16,
NLA_S32,
+   NLA_S64,
NLA_STRING,
NLA_FLAG,
NLA_BOOL,
@@ -62,12 +65,15 @@ struct pretty_nlmsg_desc {
 #define NLATTR_DESC_U8(_name)  NLATTR_DESC(_name, NLA_U8)
 #define NLATTR_DESC_U16(_name) NLATTR_DESC(_name, NLA_U16)
 #define NLATTR_DESC_U32(_name) NLATTR_DESC(_name, NLA_U32)
+#define NLATTR_DESC_U64(_name) NLATTR_DESC(_name, NLA_U64)
 #define NLATTR_DESC_X8(_name)  NLATTR_DESC(_name, NLA_X8)
 #define NLATTR_DESC_X16(_name) NLATTR_DESC(_name, NLA_X16)
 #define NLATTR_DESC_X32(_name) NLATTR_DESC(_name, NLA_X32)
+#define NLATTR_DESC_X64(_name) NLATTR_DESC(_name, NLA_X64)
 #define NLATTR_DESC_S8(_name)  NLATTR_DESC(_name, NLA_U8)
 #define NLATTR_DESC_S16(_name) NLATTR_DESC(_name, NLA_U16)
 #define NLATTR_DESC_S32(_name) NLATTR_DESC(_name, NLA_U32)
+#define NLATTR_DESC_S64(_name) NLATTR_DESC(_name, NLA_S64)
 #define NLATTR_DESC_STRING(_name)  NLATTR_DESC(_name, NLA_STRING)
 #define NLATTR_DESC_FLAG(_name)NLATTR_DESC(_name, NLA_FLAG)
 #define NLATTR_DESC_BOOL(_name)NLATTR_DESC(_name, NLA_BOOL)
-- 
2.28.0



[PATCH ethtool 0/4] add missing message descriptions

2020-10-19 Thread Michal Kubecek
Add message descriptions needed for pretty printing of netlink messages
related to recently added features: genetlink policy dumps and pause frame
statistics. First two patches extend pretty printing code with features
used by these descriptions: support for 64-bit numeric attributes and
and enumerated types (shown as symbolic names rather than numeric values).

Michal Kubecek (4):
  netlink: support u32 enumerated types in pretty printing
  netlink: support 64-bit attribute types in pretty printed messages
  netlink: add descriptions for genetlink policy dumps
  netlink: add message descriptions for pause stats

 netlink/desc-ethtool.c  |  7 +
 netlink/desc-genlctrl.c | 57 +
 netlink/prettymsg.c | 19 ++
 netlink/prettymsg.h | 24 +++--
 4 files changed, 105 insertions(+), 2 deletions(-)

-- 
2.28.0



Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-10-21 Thread Michal Kubecek
On Tue, Oct 20, 2020 at 07:39:13AM +, Danielle Ratson wrote:
> > -Original Message-
> > From: Michal Kubecek 
> > Sent: Monday, October 19, 2020 4:25 PM
> > 
> > As I said, I meant the extension suggested in my mail as independent of what
> > this series is about. For lanes count selector, I find proposed
> > 
> >   ethtool -s  ... lanes  ...
> > 
> > the most natural.
> > 
> > From purely syntactic/semantic point of view, there are three types of
> > requests:
> > 
> >   (1) enable specific set of modes, disable the rest
> >   (2) enable/disable specific modes, leave the rest as they are
> >   (3) enable modes matching a condition (and disable the rest)
> > 
> > What I proposed was to allow the use symbolic names instead of masks
> > (which are getting more and more awful with each new mode) also for (1),
> > like they can already be used for (2).
> > 
> > The lanes selector is an extension of (3) which I would prefer not to mix 
> > with
> > (1) or (2) within one command line, i.e. either "advertise" or "speed / 
> > duplex
> > / lanes".
> > 
> > IIUC Jakub's and Andrew's comments were not so much about the syntax
> > and semantic (which is quite clear) but rather about the question if the
> > requests like "advertise exactly the modes with (100Gb/s speed and) two
> > lanes" would really address a real life need and wouldn't be more often used
> > as shortcuts for "advertise 10baseKR2/Full". (On the other hand, I
> > suspect existing speed and duplex selectors are often used the same way.)
> > 
> > Michal
> 
> So, do you want to change the current approach somehow or we are good
> to go with this one, keeping in mind the future extension you have
> suggested? 

As far as I'm concerned, it makes sense as it is. The only thing I'm not
happy about is ETHTOOL_A_LINKMODES_LANES being a "write only" attribute
(unlike _SPEED and _DUPLEX) but being able to query this information
would require extensive changes far beyond the scope of this series.

Michal


Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-10-21 Thread Michal Kubecek
On Wed, Oct 21, 2020 at 07:20:22AM +, Danielle Ratson wrote:
> > -Original Message-
> > From: Michal Kubecek 
> > Sent: Wednesday, October 21, 2020 10:08 AM
> > 
> > On Tue, Oct 20, 2020 at 07:39:13AM +, Danielle Ratson wrote:
> > > > -Original Message-
> > > > From: Michal Kubecek 
> > > > Sent: Monday, October 19, 2020 4:25 PM
> > > >
> > > > As I said, I meant the extension suggested in my mail as independent
> > > > of what this series is about. For lanes count selector, I find
> > > > proposed
> > > >
> > > >   ethtool -s  ... lanes  ...
> > > >
> > > > the most natural.
> > > >
> > > > From purely syntactic/semantic point of view, there are three types
> > > > of
> > > > requests:
> > > >
> > > >   (1) enable specific set of modes, disable the rest
> > > >   (2) enable/disable specific modes, leave the rest as they are
> > > >   (3) enable modes matching a condition (and disable the rest)
> > > >
> > > > What I proposed was to allow the use symbolic names instead of masks
> > > > (which are getting more and more awful with each new mode) also for
> > > > (1), like they can already be used for (2).
> > > >
> > > > The lanes selector is an extension of (3) which I would prefer not
> > > > to mix with
> > > > (1) or (2) within one command line, i.e. either "advertise" or
> > > > "speed / duplex / lanes".
> > > >
> > > > IIUC Jakub's and Andrew's comments were not so much about the syntax
> > > > and semantic (which is quite clear) but rather about the question if
> > > > the requests like "advertise exactly the modes with (100Gb/s speed
> > > > and) two lanes" would really address a real life need and wouldn't
> > > > be more often used as shortcuts for "advertise 10baseKR2/Full".
> > > > (On the other hand, I suspect existing speed and duplex selectors
> > > > are often used the same way.)
> > > >
> > > > Michal
> > >
> > > So, do you want to change the current approach somehow or we are good
> > > to go with this one, keeping in mind the future extension you have
> > > suggested?
> > 
> > As far as I'm concerned, it makes sense as it is. The only thing I'm not 
> > happy
> > about is ETHTOOL_A_LINKMODES_LANES being a "write only" attribute
> > (unlike _SPEED and _DUPLEX) but being able to query this information would
> > require extensive changes far beyond the scope of this series.
> 
> If I understood you correctly, patch #5 does that query, isn't it?
> "mlxsw: ethtool: Expose the number of lanes in use"

Ah, right, it does. But as you extend struct ethtool_link_ksettings and
drivers will need to be updated to provide this information, wouldn't it
be more useful to let the driver provide link mode in use instead (and
derive number of lanes from it)?

Michal


Re: [PATCH net-next 1/6] ethtool: Extend link modes settings uAPI with lanes

2020-10-22 Thread Michal Kubecek
On Thu, Oct 22, 2020 at 06:15:48AM +, Danielle Ratson wrote:
> > -Original Message-
> > From: Michal Kubecek 
> > Sent: Wednesday, October 21, 2020 11:48 AM
> > 
> > Ah, right, it does. But as you extend struct ethtool_link_ksettings
> > and drivers will need to be updated to provide this information,
> > wouldn't it be more useful to let the driver provide link mode in
> > use instead (and derive number of lanes from it)?
> 
> This is the way it is done with the speed parameter, so I have aligned
> it to it. Why the lanes should be done differently comparing to the
> speed?

Speed and duplex have worked this way since ages and the interface was
probably introduced back in times when combination of speed and duplex
was sufficient to identify the link mode. This is no longer the case and
even adding number of lanes wouldn't make the combination unique. So if
we are going to extend the interface now and update drivers to provide
extra information, I believe it would be more useful to provide full
information.

Michal


Re: [RFC PATCH net-next v3] ethtool: Improve compatibility between netlink and ioctl interfaces

2020-10-28 Thread Michal Kubecek
On Tue, Oct 27, 2020 at 02:53:05PM -0700, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 16:51:14 +0200 Ido Schimmel wrote:
> > From: Ido Schimmel 
> > 
> > With the ioctl interface, when autoneg is enabled, but without
> > specifying speed, duplex or link modes, the advertised link modes are
> > set to the supported link modes by the ethtool user space utility.
> 
> > With the netlink interface, the same thing is done by the kernel, but
> > only if speed or duplex are specified. In which case, the advertised
> > link modes are set by traversing the supported link modes and picking
> > the ones matching the specified speed or duplex.
> 
> > Fix this incompatibility problem by introducing a new flag in the
> > ethtool netlink request header: 'ETHTOOL_FLAG_LEGACY'. The purpose of
> > the flag is to indicate to the kernel that it needs to be compatible
> > with the legacy ioctl interface. A patch to the ethtool user space
> > utility will make sure the flag is set, when supported by the kernel.
> 
> I did not look at the legacy code but I'm confused by what you wrote.
> 
> IIUC for ioctl it's the user space that sets the advertised.
> For netlink it's the kernel.
> So how does the legacy flag make the kernel behave like it used to?

The idea why I suggested "legacy" as the name was that it allowed
ethtool to preserve the old behaviour (without having to query for
supported modes first). But from this point of view it's indeed a bit
confusing.

> If anything LEGACY should mean - don't populate advertised at all,
> user space will do it.

I would prefer not inverting the flag so that at least for the netlink
API, the default semantics would be that ETHTOOL_A_LINKMODES_AUTONEG=1
without other attributes means "enable autonegotiation" as expected
(without touching other settings).

> Also the semantics of a "LEGACY" flag are a little loose for my taste,
> IMHO a new flag attr would be cleaner. ETHTOOL_A_LINKMODES_AUTO_POPULATE?
> But no strong feelings.

Actually, when I suggested using a flag, I had a request specific flag
in mind, not a global one. As for the name, how about
ETHTOOL_A_LINKMODES_ADVERTISE_ALL? It should be probably forbidden to
combine it with ETHTOOL_A_LINKMODES_OURS then.

Michal


Re: [RFC PATCH net-next v3] ethtool: Improve compatibility between netlink and ioctl interfaces

2020-10-29 Thread Michal Kubecek
On Wed, Oct 28, 2020 at 07:34:36PM +0200, Ido Schimmel wrote:
> On Wed, Oct 28, 2020 at 01:53:39AM +0100, Michal Kubecek wrote:
> > On Tue, Oct 27, 2020 at 02:53:05PM -0700, Jakub Kicinski wrote:
> > > 
> > > I did not look at the legacy code but I'm confused by what you wrote.
> > > 
> > > IIUC for ioctl it's the user space that sets the advertised.
> > > For netlink it's the kernel.
> > > So how does the legacy flag make the kernel behave like it used to?
> > 
> > The idea why I suggested "legacy" as the name was that it allowed
> > ethtool to preserve the old behaviour (without having to query for
> > supported modes first). But from this point of view it's indeed a bit
> > confusing.
> 
> I think it would be best to solve this by having user space query the
> kernel for supported link modes if autoneg is being enabled without
> additional parameters. Then user space will issue a set request with
> ETHTOOL_A_LINKMODES_OURS being set to all supported link modes.
> 
> It does not require kernel changes and would be easier on users that
> currently need to resort to old ethtool despite having a kernel that
> supports netlink-based ethtool.

That would certainly be a solution. I'm not exactly happy about having
to issue two requests but (1) it would be limited to specific case with
"autoneg on" without advertise, speed and duplex (and lanes, when/if
it's introduced), (2) we would need an extra request to check support of
the flag anyway and (3) supported modes of a device are unlikely to
change so that we don't have to worry about races.

Michal


Re: [RFC PATCH ethtool] ethtool: Improve compatibility between netlink and ioctl interfaces

2020-11-02 Thread Michal Kubecek
On Mon, Nov 02, 2020 at 08:40:36PM +0200, Ido Schimmel wrote:
> From: Ido Schimmel 
> 
> With the ioctl interface, when autoneg is enabled, but without
> specifying speed, duplex or link modes, the advertised link modes are
> set to the supported link modes by the ethtool user space utility.
> 
> This does not happen when using the netlink interface. Fix this
> incompatibility problem by having ethtool query the supported link modes
> from the kernel and advertise all of them when only "autoneg on" is
> specified.
> 
> Before:
> 
> # ethtool -s eth0 advertise 0xC autoneg on
> # ethtool -s eth0 autoneg on
> # ethtool eth0
> Settings for eth0:
>   Supported ports: [ TP ]
>   Supported link modes:   10baseT/Half 10baseT/Full
>   100baseT/Half 100baseT/Full
>   1000baseT/Full
>   Supported pause frame use: No
>   Supports auto-negotiation: Yes
>   Supported FEC modes: Not reported
>   Advertised link modes:  100baseT/Half 100baseT/Full
>   Advertised pause frame use: No
>   Advertised auto-negotiation: Yes
>   Advertised FEC modes: Not reported
>   Speed: 1000Mb/s
>   Duplex: Full
>   Auto-negotiation: on
>   Port: Twisted Pair
>   PHYAD: 0
>   Transceiver: internal
>   MDI-X: off (auto)
>   Supports Wake-on: umbg
>   Wake-on: d
> Current message level: 0x0007 (7)
>drv probe link
>   Link detected: yes
> 
> After:
> 
> # ethtool -s eth0 advertise 0xC autoneg on
> # ethtool -s eth0 autoneg on
> # ethtool eth0
> Settings for eth0:
>   Supported ports: [ TP ]
>   Supported link modes:   10baseT/Half 10baseT/Full
>   100baseT/Half 100baseT/Full
>   1000baseT/Full
>   Supported pause frame use: No
>   Supports auto-negotiation: Yes
>   Supported FEC modes: Not reported
>   Advertised link modes:  10baseT/Half 10baseT/Full
>   100baseT/Half 100baseT/Full
>   1000baseT/Full
>   Advertised pause frame use: No
>   Advertised auto-negotiation: Yes
>   Advertised FEC modes: Not reported
>   Speed: 1000Mb/s
>   Duplex: Full
>   Auto-negotiation: on
>   Port: Twisted Pair
>   PHYAD: 0
>   Transceiver: internal
>   MDI-X: on (auto)
>   Supports Wake-on: umbg
>   Wake-on: d
> Current message level: 0x0007 (7)
>drv probe link
>   Link detected: yes
> 
> Signed-off-by: Ido Schimmel 
> ---
> Michal / Jakub, let me know if you see a better way. Sending as RFC
> since I want to run it through regression first.
> ---
>  netlink/settings.c | 115 +
>  1 file changed, 115 insertions(+)
> 
> diff --git a/netlink/settings.c b/netlink/settings.c
> index 41a2e5af1945..1f856b1b14d5 100644
> --- a/netlink/settings.c
> +++ b/netlink/settings.c
> @@ -1110,6 +1110,113 @@ static const struct param_parser sset_params[] = {
>   {}
>  };
>  
> +static bool sset_is_autoneg_only(const struct nl_context *nlctx)
> +{
> + return nlctx->argc == 2 && !strcmp(nlctx->argp[0], "autoneg") &&
> +!strcmp(nlctx->argp[1], "on");
> +}

This would only return true if there is only "autoneg on" on command
line; the ioctl parser fills all supported modes whenever none of
"speed", "duplex" and "advertise" is present, even if there are other
parameters not related to advertised modes selection (e.g. "wol").

Doing this properly from command line would require us to duplicate the
parser here which would be very inconvenient and impractical w.r.t.
future extensions.

What would probably make more sense would be modifying nl_parse() not to
send the messages itself in PARSER_GROUP_MSG case but return them back
to caller (like it does in other cases when there is only one message).
Then we could check the ETHTOOL_MSG_LINKMODES_SET message (if there is
one) for presence of ETHTOOL_A_LINKMODES_AUTONEG,
ETHTOOL_A_LINKMODES_OURS, ETHTOOL_A_LINKMODES_SPEED and
ETHTOOL_A_LINKMODES_DUPLEX (and value of ETHTOOL_A_LINKMODES_AUTONEG)
and add ETHTOOL_A_LINKMODES_OURS if needed.

I'll prepare a proof of concept of such nl_parse() rework. Fortunately
this is the only subcommand using PARSER_GROUP_MSG so that there are no
other callers that would need adjusting.

> +static int linkmodes_reply_adver_all_cb(const struct nlmsghdr *nlhdr,

  ^ advert?

> + void *data)
> +{
> + const struct nlattr *bitset_tb[ETHTOOL_A_BITSET_MAX + 1] = {};
> + const struct nlattr *tb[ETHTOOL_A_LINKMODES_MAX + 1] = {};
> + DECLARE_ATTR_TB_INFO(bitset_tb);
> + struct nl_context *nlctx = data;
> + struct nl_msg_buff *msgbuff;
> + DECLARE_ATTR_TB_INFO(tb);
> + struct nl_socket *nlsk;
> + struct nlattr *nest;
> + int ret;
> +
> + ret = mnl_att

Re: [RFC PATCH ethtool] ethtool: Improve compatibility between netlink and ioctl interfaces

2020-11-03 Thread Michal Kubecek
On Tue, Nov 03, 2020 at 04:24:30PM +0200, Ido Schimmel wrote:
> 
> I have the changes you requested here:
> https://github.com/idosch/ethtool/commit/b34d15839f2662808c566c04eda726113e20ee59
> 
> Do you want to integrate it with your nl_parse() rework or should I?

I pushed the combined series to

  https://git.kernel.org/pub/scm/linux/kernel/git/mkubecek/ethtool.git

as branch mk/master/advertise-all. I only ran few quick tests so far,
it's not submission ready yet.

First two patches are unrelated fixes found while testing, I'm going to
submit and push them separately. Third patch reworks nl_parser()
handling of multiple request messages as indicated in my previous mail.
Fourth patch is the ioctl compatibility fix.

Michal


Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

2021-04-18 Thread Michal Kubecek
On Thu, Mar 25, 2021 at 11:13:11AM +0800, Yunsheng Lin wrote:
> Lockless qdisc has below concurrent problem:
> cpu0 cpu1
>  . .
> q->enqueue .
>  . .
> qdisc_run_begin()  .
>  . .
> dequeue_skb()  .
>  . .
> sch_direct_xmit()  .
>  . .
>  .q->enqueue
>  . qdisc_run_begin()
>  .return and do nothing
>  . .
> qdisc_run_end().
> 
> cpu1 enqueue a skb without calling __qdisc_run() because cpu0
> has not released the lock yet and spin_trylock() return false
> for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
> enqueued by cpu1 when calling dequeue_skb() because cpu1 may
> enqueue the skb after cpu0 calling dequeue_skb() and before
> cpu0 calling qdisc_run_end().
> 
> Lockless qdisc has below another concurrent problem when
> tx_action is involved:
> 
> cpu0(serving tx_action) cpu1 cpu2
>   .   ..
>   .  q->enqueue.
>   .qdisc_run_begin()   .
>   .  dequeue_skb() .
>   .   .q->enqueue
>   .   ..
>   . sch_direct_xmit()  .
>   .   . qdisc_run_begin()
>   .   .   return and do nothing
>   .   ..
>  clear __QDISC_STATE_SCHED..
>  qdisc_run_begin()..
>  return and do nothing..
>   .   ..
>   .qdisc_run_end() .
> 
> This patch fixes the above data race by:
> 1. Get the flag before doing spin_trylock().
> 2. If the first spin_trylock() return false and the flag is not
>set before the first spin_trylock(), Set the flag and retry
>another spin_trylock() in case other CPU may not see the new
>flag after it releases the lock.
> 3. reschedule if the flags is set after the lock is released
>at the end of qdisc_run_end().
> 
> For tx_action case, the flags is also set when cpu1 is at the
> end if qdisc_run_end(), so tx_action will be rescheduled
> again to dequeue the skb enqueued by cpu2.
> 
> Only clear the flag before retrying a dequeuing when dequeuing
> returns NULL in order to reduce the overhead of the above double
> spin_trylock() and __netif_schedule() calling.
> 
> The performance impact of this patch, tested using pktgen and
> dummy netdev with pfifo_fast qdisc attached:
> 
>  threads  without+this_patch   with+this_patch  delta
> 12.61Mpps2.60Mpps   -0.3%
> 23.97Mpps3.82Mpps   -3.7%
> 45.62Mpps5.59Mpps   -0.5%
> 82.78Mpps2.77Mpps   -0.3%
>162.22Mpps2.22Mpps   -0.0%
> 
> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> Signed-off-by: Yunsheng Lin 
> ---
> V3: fix a compile error and a few comment typo, remove the
> __QDISC_STATE_DEACTIVATED checking, and update the
> performance data.
> V2: Avoid the overhead of fixing the data race as much as
> possible.

I tried this patch o top of 5.12-rc7 with real devices. I used two
machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs (2 8-core CPUs
with HT disabled) and 16 Rx/Tx queues, receiver has 48 CPUs (2 12-core
CPUs with HT enabled) and 48 Rx/Tx queues. With multiple TCP streams on
a saturated ethernet, the CPU consumption grows quite a lot:

threads unpatched 5.12-rc75.12-rc7 + v3   
  1   25.6%   30.6%
  8   73.1%  241.4%
128  132.2% 1012.0%

(The values are normalized to one core, i.e. 100% corresponds to one
fully used logical CPU.) I didn't perform a full statistical evaluation
but the growth is way beyond any statistical fluctuation with one
exception: 8-thread test of patched kernel showed values from 155.5% to
311.4%. Closer look shows that most of the CPU time was spent in softirq
and running top in parallel with the test confirms that there are
multiple ksofirqd threads running at 100% CPU. I had similar problem
with earlier versions of my patch (work in progress, I still need to
check some corner cases and write commit message explaining the logic)
and tracing confirmed that similar problem (non-empty queue, no other
thread going to clean it up but device queue stopped) was happening
repeatedly most of the time.

The biggest problem IMHO is that the loop in __qdisc_run() may finish
without rescheduling not only when the qdisc queue is empty but also
when the corresponding device Tx queue is stopped whic

Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Michal Kubecek
On Mon, Apr 19, 2021 at 10:04:27AM +0800, Yunsheng Lin wrote:
> > 
> > I tried this patch o top of 5.12-rc7 with real devices. I used two
> > machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs (2 8-core CPUs
> > with HT disabled) and 16 Rx/Tx queues, receiver has 48 CPUs (2 12-core
> > CPUs with HT enabled) and 48 Rx/Tx queues. With multiple TCP streams on
> > a saturated ethernet, the CPU consumption grows quite a lot:
> > 
> > threads unpatched 5.12-rc75.12-rc7 + v3   
> >   1   25.6%   30.6%
> >   8   73.1%  241.4%
> > 128  132.2% 1012.0%
> > 
> > (The values are normalized to one core, i.e. 100% corresponds to one
> > fully used logical CPU.) I didn't perform a full statistical evaluation
> > but the growth is way beyond any statistical fluctuation with one
> > exception: 8-thread test of patched kernel showed values from 155.5% to
> > 311.4%. Closer look shows that most of the CPU time was spent in softirq
> > and running top in parallel with the test confirms that there are
> > multiple ksofirqd threads running at 100% CPU. I had similar problem
> > with earlier versions of my patch (work in progress, I still need to
> > check some corner cases and write commit message explaining the logic)
> 
> Great, if there is a better idea, maybe share the core idea first so
> that we both can work on the that?

I'm not sure if it's really better but to minimize the false positives
and unnecessary calls to __netif_schedule(), I replaced q->seqlock with
an atomic combination of a "running" flag (which corresponds to current
seqlock being locked) and a "drainers" count (number of other threads
going to clean up the qdisc queue). This way we could keep track of them
and get reliable information if another thread is going to run a cleanup
after we leave the qdisc_run() critical section (so that there is no
need to schedule).

> > The biggest problem IMHO is that the loop in __qdisc_run() may finish
> > without rescheduling not only when the qdisc queue is empty but also
> > when the corresponding device Tx queue is stopped which devices tend to
> > do whenever they cannot send any more packets out. Thus whenever
> > __QDISC_STATE_NEED_RESCHEDULE is set with device queue stopped or
> > frozen, we keep rescheduling the queue cleanup without any chance to
> > progress or clear __QDISC_STATE_NEED_RESCHEDULE. For this to happen, all
> > we need is another thready to fail the first spin_trylock() while device
> > queue is stopped and qdisc queue not empty.
> 
> Yes, We could just return false before doing the second spin_trylock() if
> the netdev queue corresponding qdisc is stopped, and when the netdev queue
> is restarted, __netif_schedule() is called again, see netif_tx_wake_queue().
> 
> Maybe add a sch_qdisc_stopped() function and do the testting in 
> qdisc_run_begin:
> 
> if (dont_retry || sch_qdisc_stopped())
>   return false;
> 
> bool sch_qdisc_stopped(struct Qdisc *q)
> {
>   const struct netdev_queue *txq = q->dev_queue;
> 
>   if (!netif_xmit_frozen_or_stopped(txq))
>   return true;
> 
>   reture false;
> }
> 
> At least for qdisc with TCQ_F_ONETXQUEUE flags set is doable?

Either this or you can do the check in qdisc_run_end() - when the device
queue is stopped or frozen, there is no need to schedule as we know it's
going to be done when the flag is cleared again (and we cannot do
anything until then anyway).

> > Another part of the problem may be that to avoid the race, the logic is
> > too pessimistic: consider e.g. (dotted lines show "barriers" where
> > ordering is important):
> > 
> > CPU ACPU B
> > spin_trylock() succeeds
> >  pfifo_fast_enqueue()
> > ..
> > skb_array empty, exit loop
> >  first spin_trylock() fails
> >  set __QDISC_STATE_NEED_RESCHEDULE
> >  second spin_trylock() fails
> > ..
> > spin_unlock()
> > call __netif_schedule()
> > 
> > When we switch the order of spin_lock() on CPU A and second
> > spin_trylock() on CPU B (but keep setting __QDISC_STATE_NEED_RESCHEDULE
> > before CPU A tests it), we end up scheduling a queue cleanup even if
> > there is already one running. And either of these is quite realistic.
> 
> But I am not sure it is a good thing or bad thing when the above happen,
> because it may be able to enable the tx batching?

In either of the two scenarios, we call __netif_schedule() to schedule
a cleanup which does not do anything useful. In first, the qdics queue
is empty so that either the scheduled cleanup finds it empty or there
will be some new packets which would have their own cleanup. In second,
scheduling a cleanu

Re: [Linuxarm] Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Michal Kubecek
On Mon, Apr 19, 2021 at 08:21:38PM +0800, Yunsheng Lin wrote:
> On 2021/4/19 10:04, Yunsheng Lin wrote:
> > On 2021/4/19 6:59, Michal Kubecek wrote:
> >> I tried this patch o top of 5.12-rc7 with real devices. I used two
> >> machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs (2 8-core CPUs
> >> with HT disabled) and 16 Rx/Tx queues, receiver has 48 CPUs (2 12-core
> >> CPUs with HT enabled) and 48 Rx/Tx queues. With multiple TCP streams on
> >> a saturated ethernet, the CPU consumption grows quite a lot:
> >>
> >> threads unpatched 5.12-rc75.12-rc7 + v3   
> >>   1   25.6%   30.6%
> >>   8   73.1%  241.4%
> >> 128  132.2% 1012.0%
> 
> I do not really read the above number, but I understand that v3 has a cpu
> usage impact when it is patched to 5.12-rc7, so I do a test too on a arm64
> system with a hns3 100G netdev, which is in node 0, and node 0 has 32 cpus.
> 
> root@(none)$ cat /sys/class/net/eth4/device/numa_node
> 0
> root@(none)$ numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25 26 27 28 29 30 31
> node 0 size: 128646 MB
> node 0 free: 127876 MB
> node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
> 53 54 55 56 57 58 59 60 61 62 63
> node 1 size: 129019 MB
> node 1 free: 128796 MB
> node 2 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 
> 85 86 87 88 89 90 91 92 93 94 95
> node 2 size: 129019 MB
> node 2 free: 128755 MB
> node 3 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 
> 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
> node 3 size: 127989 MB
> node 3 free: 127772 MB
> node distances:
> node   0   1   2   3
>   0:  10  16  32  33
>   1:  16  10  25  32
>   2:  32  25  10  16
>   3:  33  32  16  10
> 
> and I use below cmd to only use 16 tx/rx queue with 72 tx queue depth in
> order to trigger the tx queue stopping handling:
> ethtool -L eth4 combined 16
> ethtool -G eth4 tx 72
> 
> threads   unpatched  patched_v4patched_v4+queue_stopped_opt
>16  11% (si: 3.8%) 20% (si:13.5%)   11% (si:3.8%)
>32  12% (si: 4.4%) 30% (si:22%) 13% (si:4.4%)
>64  13% (si: 4.9%) 35% (si:26%) 13% (si:4.7%)
> 
> "11% (si: 3.8%)": 11% means the total cpu useage in node 0, and 3.8%
> means the softirq cpu useage .
> And thread number is as below iperf cmd:
> taskset -c 0-31 iperf -c 192.168.100.2 -t 100 -i 1 -P *thread*

The problem I see with this is that iperf's -P option only allows
running the test in multiple connections but they do not actually run in
multiple threads. Therefore this may not result in as much concurrency
as it seems.

Also, 100Gb/s ethernet is not so easy to saturate, trying 10Gb/s or
1Gb/s might put more pressure on qdisc code concurrency.

Michal

> It seems after applying the queue_stopped_opt patch, the cpu usage
> is closed to the unpatch one, at least with my testcase, maybe you
> can try your testcase with the queue_stopped_opt patch to see if
> it make any difference?

I will, I was not aware of v4 submission. I'll write a short note to it
so that it does not accidentally get applied before we know for sure
what the CPU usage impact is.

Michal


Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Michal Kubecek
On Fri, Apr 16, 2021 at 09:16:48AM +0800, Yunsheng Lin wrote:
> Lockless qdisc has below concurrent problem:
> cpu0 cpu1
>  . .
> q->enqueue .
>  . .
> qdisc_run_begin()  .
>  . .
> dequeue_skb()  .
>  . .
> sch_direct_xmit()  .
>  . .
>  .q->enqueue
>  . qdisc_run_begin()
>  .return and do nothing
>  . .
> qdisc_run_end().
> 
> cpu1 enqueue a skb without calling __qdisc_run() because cpu0
> has not released the lock yet and spin_trylock() return false
> for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
> enqueued by cpu1 when calling dequeue_skb() because cpu1 may
> enqueue the skb after cpu0 calling dequeue_skb() and before
> cpu0 calling qdisc_run_end().
> 
> Lockless qdisc has below another concurrent problem when
> tx_action is involved:
> 
> cpu0(serving tx_action) cpu1 cpu2
>   .   ..
>   .  q->enqueue.
>   .qdisc_run_begin()   .
>   .  dequeue_skb() .
>   .   .q->enqueue
>   .   ..
>   . sch_direct_xmit()  .
>   .   . qdisc_run_begin()
>   .   .   return and do nothing
>   .   ..
>  clear __QDISC_STATE_SCHED..
>  qdisc_run_begin()..
>  return and do nothing..
>   .   ..
>   .qdisc_run_end() .
> 
> This patch fixes the above data race by:
> 1. Test STATE_MISSED before doing spin_trylock().
> 2. If the first spin_trylock() return false and STATE_MISSED is
>not set before the first spin_trylock(), Set STATE_MISSED and
>retry another spin_trylock() in case other CPU may not see
>STATE_MISSED after it releases the lock.
> 3. reschedule if STATE_MISSED is set after the lock is released
>at the end of qdisc_run_end().
> 
> For tx_action case, STATE_MISSED is also set when cpu1 is at the
> end if qdisc_run_end(), so tx_action will be rescheduled again
> to dequeue the skb enqueued by cpu2.
> 
> Clear STATE_MISSED before retrying a dequeuing when dequeuing
> returns NULL in order to reduce the overhead of the above double
> spin_trylock() and __netif_schedule() calling.
> 
> The performance impact of this patch, tested using pktgen and
> dummy netdev with pfifo_fast qdisc attached:
> 
>  threads  without+this_patch   with+this_patch  delta
> 12.61Mpps2.60Mpps   -0.3%
> 23.97Mpps3.82Mpps   -3.7%
> 45.62Mpps5.59Mpps   -0.5%
> 82.78Mpps2.77Mpps   -0.3%
>162.22Mpps2.22Mpps   -0.0%
> 
> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> Signed-off-by: Yunsheng Lin 
> Tested-by: Juergen Gross 
> ---
> V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED mirroring
> NAPI's NAPIF_STATE_MISSED, and add Juergen's "Tested-by"
> tag for there is only renaming and typo fixing between
> V4 and V3.
> V3: Fix a compile error and a few comment typo, remove the
> __QDISC_STATE_DEACTIVATED checking, and update the
> performance data.
> V2: Avoid the overhead of fixing the data race as much as
> possible.

As pointed out in the discussion on v3, this patch may result in
significantly higher CPU consumption with multiple threads competing on
a saturated outgoing device. I missed this submission so that I haven't
checked it yet but given the description of v3->v4 changes above, it's
quite likely that it suffers from the same problem.

Michal


Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Michal Kubecek
On Mon, Apr 19, 2021 at 05:29:46PM +0200, Michal Kubecek wrote:
> 
> As pointed out in the discussion on v3, this patch may result in
> significantly higher CPU consumption with multiple threads competing on
> a saturated outgoing device. I missed this submission so that I haven't
> checked it yet but given the description of v3->v4 changes above, it's
> quite likely that it suffers from the same problem.

And it indeed does. However, with the additional patch from the v3
discussion, the numbers are approximately the same as with an unpatched
mainline kernel.

As with v4, I tried this patch on top of 5.12-rc7 with real devices.
I used two machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs
(2 8-core CPUs with HT disabled) and 16 Rx/Tx queues, receiver has
48 CPUs (2 12-core CPUs with HT enabled) and 48 Rx/Tx queues.

  threads5.12-rc75.12-rc7 + v45.12-rc7 + v4 + stop
 125.1%  38.1%22.9%
 866.2% 277.0%74.1%
1690.1% 150.7%91.0%
32   107.2% 272.6%   108.3%
64   116.3% 487.5%   118.1%
   128   126.1% 946.7%   126.9%

(The values are normalized to one core, i.e. 100% corresponds to one
fully used logical CPU.)

So it seems that repeated scheduling while the queue was stopped is
indeed the main performance issue and that other cases of the logic
being too pessimistic do not play significant role. There is an
exception with 8 connections/threads and the result with just this
series also looks abnormally high (e.g. much higher than with
16 threads). It might be worth investigating what happens there and
what do the results with other thread counts around 8 look like.

I'll run some more tests with other traffic patterns tomorrow and
I'm also going to take a closer look at the additional patch.

Michal


Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc

2021-04-20 Thread Michal Kubecek
On Tue, Apr 20, 2021 at 01:55:03AM +0200, Michal Kubecek wrote:
> On Mon, Apr 19, 2021 at 05:29:46PM +0200, Michal Kubecek wrote:
> > 
> > As pointed out in the discussion on v3, this patch may result in
> > significantly higher CPU consumption with multiple threads competing on
> > a saturated outgoing device. I missed this submission so that I haven't
> > checked it yet but given the description of v3->v4 changes above, it's
> > quite likely that it suffers from the same problem.
> 
> And it indeed does. However, with the additional patch from the v3
> discussion, the numbers are approximately the same as with an unpatched
> mainline kernel.
> 
> As with v4, I tried this patch on top of 5.12-rc7 with real devices.
> I used two machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs
> (2 8-core CPUs with HT disabled) and 16 Rx/Tx queues, receiver has
> 48 CPUs (2 12-core CPUs with HT enabled) and 48 Rx/Tx queues.
> 
>   threads5.12-rc75.12-rc7 + v45.12-rc7 + v4 + stop
>  125.1%  38.1%22.9%
>  866.2% 277.0%74.1%
> 1690.1% 150.7%91.0%
> 32   107.2% 272.6%   108.3%
> 64   116.3% 487.5%   118.1%
>128   126.1% 946.7%   126.9%
> 
> (The values are normalized to one core, i.e. 100% corresponds to one
> fully used logical CPU.)

I repeated the tests few more times and with more iterations and it
seems the problem rather was that the CPU utilization numbers are not
very stable, in particular with number of connections/threads close to
the number of CPUs and Tx queues. Refined results (and also other tests)
show that full 3-patch series performs similar to unpatched 5.12-rc7
(within the margin of statistical error).

However, I noticed something disturbing in the results of a simple
1-thread TCP_STREAM test (client sends data through a TCP connection to
server using long writes, we measure the amount of data received by the
server):

  server: 172.17.1.1, port 12543
  iterations: 20, threads: 1, test length: 30
  test: TCP_STREAM, message size: 1048576
  
  1 927403548.4 B/s,  avg   927403548.4 B/s, mdev   0.0 B/s (  0.0%)
  21176317172.1 B/s,  avg  1051860360.2 B/s, mdev   124456811.8 B/s ( 
11.8%), confid. +/-  1581348251.3 B/s (150.3%)
  3 927335837.8 B/s,  avg  1010352186.1 B/s, mdev   117354970.3 B/s ( 
11.6%), confid. +/-   357073677.2 B/s ( 35.3%)
  41176728045.1 B/s,  avg  1051946150.8 B/s, mdev   124576544.7 B/s ( 
11.8%), confid. +/-   228863127.8 B/s ( 21.8%)
  51176788216.3 B/s,  avg  1076914563.9 B/s, mdev   122102985.3 B/s ( 
11.3%), confid. +/-   169478943.5 B/s ( 15.7%)
  61158167055.1 B/s,  avg  1090456645.8 B/s, mdev   115504209.5 B/s ( 
10.6%), confid. +/-   132805140.8 B/s ( 12.2%)
  71176243474.4 B/s,  avg  1102711907.0 B/s, mdev   111069717.1 B/s ( 
10.1%), confid. +/-   110956822.2 B/s ( 10.1%)
  81176771142.8 B/s,  avg  969311.5 B/s, mdev   106744173.5 B/s (  
9.6%), confid. +/-95417120.0 B/s (  8.6%)
  91176206364.6 B/s,  avg  1119106761.8 B/s, mdev   102644185.2 B/s (  
9.2%), confid. +/-83685200.5 B/s (  7.5%)
  10   1175888409.4 B/s,  avg  1124784926.6 B/s, mdev9880.5 B/s (  
8.8%), confid. +/-74537085.1 B/s (  6.6%)
  11   1176541407.6 B/s,  avg  1129490061.2 B/s, mdev9544.8 B/s (  
8.4%), confid. +/-67230249.7 B/s (  6.0%)
  12934185352.8 B/s,  avg  1113214668.9 B/s, mdev   106114984.5 B/s (  
9.5%), confid. +/-70420712.5 B/s (  6.3%)
  13   1176550558.1 B/s,  avg  1118086660.3 B/s, mdev   103339448.9 B/s (  
9.2%), confid. +/-65002902.4 B/s (  5.8%)
  14   1176521808.8 B/s,  avg  1122260599.5 B/s, mdev   100711151.3 B/s (  
9.0%), confid. +/-60333655.0 B/s (  5.4%)
  15   1176744840.8 B/s,  avg  1125892882.3 B/s, mdev98240838.2 B/s (  
8.7%), confid. +/-56319052.3 B/s (  5.0%)
  16   1176593778.5 B/s,  avg  1129061688.3 B/s, mdev95909740.8 B/s (  
8.5%), confid. +/-52771633.5 B/s (  4.7%)
  17   1176583967.4 B/s,  avg  1131857116.5 B/s, mdev93715582.2 B/s (  
8.3%), confid. +/-49669258.6 B/s (  4.4%)
  18   1176853301.8 B/s,  avg  1134356904.5 B/s, mdev91656530.2 B/s (  
8.1%), confid. +/-46905244.8 B/s (  4.1%)
  19   1176592845.7 B/s,  avg  1136579848.8 B/s, mdev89709043.8 B/s (  
7.9%), confid. +/-44424855.9 B/s (  3.9%)
  20   1176608117.3 B/s,  avg  1138581262.2 B/s, mdev87871692.6 B/s (  
7.7%), confid. +/-42193098.5 B/s (  3.7%)
  all avg  1138581262.2 B/s, mdev87871692.6 B/s (  
7.7%), confid. +/-42193098.5 B/s (  3.7%)

Each line shows result of one 30 second long test and average, mean
deviation and 99% confidence interval half width through the iterations
so far. While 17 iteration results are essentially t

Re: [PATCH ethtool v3 0/5] Extend uAPI with lanes parameter

2021-02-14 Thread Michal Kubecek
On Wed, Feb 10, 2021 at 03:48:35PM +0200, Danielle Ratson wrote:
> Currently, there is no way of knowing how many lanes will be use to
> achieve a wanted speed.
> For example, 100G speed can be achieved using: 2X50 or 4X25.
> 
> In order to solve that, extend ethtool uAPI with lanes as a new link
> mode setting so the command below, for example, will be supported:
> $ ethtool -s swp5 lanes N
> 
> Patch #1: Update headers with the new parameter.
> Patch #2: Support lanes in netlink.
> Patch #3: Expose the number of lanes in use.
> Patch #4: Add auto-completion for lanes.
> Patch #5: Add lanes to man page.
> 
> v3:
>   * Add a seperated patch, patch #1, for uapi headers and squash
>   * the rest of it to patch #2.

Series applied, thank you.

Michal

> 
> Danielle Ratson (5):
>   update UAPI header copies
>   netlink: settings: Add netlink support for lanes parameter
>   netlink: settings: Expose the number of lanes in use
>   shell-completion: Add completion for lanes
>   man: Add man page for setting lanes parameter
> 
>  ethtool.8.in  |  4 
>  ethtool.c |  1 +
>  netlink/desc-ethtool.c|  1 +
>  netlink/settings.c| 13 +
>  shell-completion/bash/ethtool |  4 
>  uapi/linux/ethtool.h  |  2 +-
>  uapi/linux/ethtool_netlink.h  |  1 +
>  uapi/linux/if_link.h  | 10 --
>  uapi/linux/netlink.h  |  2 +-
>  uapi/linux/rtnetlink.h| 20 +++-
>  10 files changed, 49 insertions(+), 9 deletions(-)
> 
> -- 
> 2.26.2
> 


signature.asc
Description: PGP signature


[PATCH ethtool] ioctl: less confusing error message for master-slave parameter

2021-02-21 Thread Michal Kubecek
The fallback code issues a reasonable error message when a subcommand
implemented only via netlink would end up being processed by ioctl code,
e.g. because a new ethtool runs on an older kernel without netlink support.
But when a netlink only parameter is passed to subcommand which is
recognized by ioctl code in general, it is handled as an unknown one.

At the the moment, there is only one such parameter: master-slave for
'-s' subcommand. As it is not handled by the generic command line parser,
address this with a quick fix and leave updating the generic parser for
later.

Reported-by: Bruce LIU 
Signed-off-by: Michal Kubecek 
---
 ethtool.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/ethtool.c b/ethtool.c
index fb90e9e456b9..15e9d34831b3 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -72,6 +72,16 @@ static void exit_bad_args(void)
exit(1);
 }
 
+static void exit_nlonly_param(const char *name) __attribute__((noreturn));
+
+static void exit_nlonly_param(const char *name)
+{
+   fprintf(stderr,
+   "ethtool: parameter '%s' can be used only with netlink\n",
+   name);
+   exit(1);
+}
+
 typedef enum {
CMDL_NONE,
CMDL_BOOL,
@@ -3066,6 +3076,8 @@ static int do_sset(struct cmd_context *ctx)
ARRAY_SIZE(cmdline_msglvl));
break;
}
+   } else if (!strcmp(argp[i], "master-slave")) {
+   exit_nlonly_param(argp[i]);
} else {
exit_bad_args();
}
-- 
2.30.1



Re: [PATCH net] ethtool: fix the check logic of at least one channel for RX/TX

2021-02-23 Thread Michal Kubecek
On Tue, Feb 23, 2021 at 09:01:06AM -0800, Jakub Kicinski wrote:
> On Tue, 23 Feb 2021 14:24:40 +0100 Simon Horman wrote:
> > From: Yinjun Zhang 
> > 
> > The command "ethtool -L  combined 0" may clean the RX/TX channel
> > count and skip the error path, since the attrs
> > tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
> > are NULL in this case when recent ethtool is used.
> > 
> > Tested using ethtool v5.10.
> > 
> > Fixes: 7be92514b99c ("ethtool: check if there is at least one channel for 
> > TX/RX in the core")
> > Signed-off-by: Yinjun Zhang 
> > Signed-off-by: Simon Horman 
> > Signed-off-by: Louis Peens 
> 
> Please make sure you CC Michal on ethtool patches.
> 
> > diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> > index 25a9e566ef5c..e35ef627f61f 100644
> > --- a/net/ethtool/channels.c
> > +++ b/net/ethtool/channels.c
> > @@ -175,14 +175,14 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> > genl_info *info)
> >  
> > /* ensure there is at least one RX and one TX channel */
> > if (!channels.combined_count && !channels.rx_count)
> > -   err_attr = tb[ETHTOOL_A_CHANNELS_RX_COUNT];
> > +   err_attr = mod_combined ? tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT] 
> > :
> > + tb[ETHTOOL_A_CHANNELS_RX_COUNT];
> > else if (!channels.combined_count && !channels.tx_count)
> > -   err_attr = tb[ETHTOOL_A_CHANNELS_TX_COUNT];
> > +   err_attr = mod_combined ? tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT] 
> > :
> > + tb[ETHTOOL_A_CHANNELS_TX_COUNT];
> > else
> > err_attr = NULL;
> > if (err_attr) {
> > -   if (mod_combined)
> > -   err_attr = tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT];
> > ret = -EINVAL;
> > NL_SET_ERR_MSG_ATTR(info->extack, err_attr, "requested channel 
> > counts would result in no RX or TX channel being configured");
> > goto out_ops;
> 
> In case driver decides to adjust max counts - I'd lean towards:

I missed this note while reading the e-mail for the first time so I
thouoght this was just a cleanup. You are right, this would address both
issues.

Michal

> 
> diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> index 5635604cb9ba..73d267415819 100644
> --- a/net/ethtool/channels.c
> +++ b/net/ethtool/channels.c
> @@ -116,10 +116,10 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> genl_info *info)
> struct ethtool_channels channels = {};
> struct ethnl_req_info req_info = {};
> struct nlattr **tb = info->attrs;
> -   const struct nlattr *err_attr;
> const struct ethtool_ops *ops;
> struct net_device *dev;
> u32 max_rx_in_use = 0;
> +   u32 err_attr;
> int ret;
>  
> ret = ethnl_parse_header_dev_get(&req_info,
> @@ -157,34 +157,34 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> genl_info *info)
>  
> /* ensure new channel counts are within limits */
> if (channels.rx_count > channels.max_rx)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_RX_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_RX_COUNT;
> else if (channels.tx_count > channels.max_tx)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_TX_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_TX_COUNT;
> else if (channels.other_count > channels.max_other)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_OTHER_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_OTHER_COUNT;
> else if (channels.combined_count > channels.max_combined)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_COMBINED_COUNT;
> else
> -   err_attr = NULL;
> +   err_attr = 0;
> if (err_attr) {
> ret = -EINVAL;
> -   NL_SET_ERR_MSG_ATTR(info->extack, err_attr,
> +   NL_SET_ERR_MSG_ATTR(info->extack, tb[err_attr],
> "requested channel count exceeds 
> maximum");
> goto out_ops;
> }
>  
> /* ensure there is at least one RX and one TX channel */
> if (!channels.combined_count && !channels.rx_count)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_RX_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_RX_COUNT;
> else if (!channels.combined_count && !channels.tx_count)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_TX_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_TX_COUNT;
> else
> -   err_attr = NULL;
> +   err_attr = 0;
> if (err_attr) {
> if (mod_combined)
> -   err_attr = tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT];
> +   err_attr = ETHTOOL_A_CHANNELS_COMBINED_COUNT;
> ret = -EINVAL;
> -   NL_SET_ERR_MSG_ATTR(info->extack, err_attr

Re: [PATCH net] ethtool: fix the check logic of at least one channel for RX/TX

2021-02-23 Thread Michal Kubecek
On Tue, Feb 23, 2021 at 02:24:40PM +0100, Simon Horman wrote:
> From: Yinjun Zhang 
> 
> The command "ethtool -L  combined 0" may clean the RX/TX channel
> count and skip the error path, since the attrs
> tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
> are NULL in this case when recent ethtool is used.
> 
> Tested using ethtool v5.10.
> 
> Fixes: 7be92514b99c ("ethtool: check if there is at least one channel for 
> TX/RX in the core")
> Signed-off-by: Yinjun Zhang 
> Signed-off-by: Simon Horman 
> Signed-off-by: Louis Peens 

Reviewed-by: Michal Kubecek 

> ---
>  net/ethtool/channels.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c
> index 25a9e566ef5c..e35ef627f61f 100644
> --- a/net/ethtool/channels.c
> +++ b/net/ethtool/channels.c
> @@ -175,14 +175,14 @@ int ethnl_set_channels(struct sk_buff *skb, struct 
> genl_info *info)
>  
>   /* ensure there is at least one RX and one TX channel */
>   if (!channels.combined_count && !channels.rx_count)
> - err_attr = tb[ETHTOOL_A_CHANNELS_RX_COUNT];
> + err_attr = mod_combined ? tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT] 
> :
> +   tb[ETHTOOL_A_CHANNELS_RX_COUNT];
>   else if (!channels.combined_count && !channels.tx_count)
> - err_attr = tb[ETHTOOL_A_CHANNELS_TX_COUNT];
> + err_attr = mod_combined ? tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT] 
> :
> +   tb[ETHTOOL_A_CHANNELS_TX_COUNT];
>   else
>   err_attr = NULL;
>   if (err_attr) {
> - if (mod_combined)
> - err_attr = tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT];
>   ret = -EINVAL;
>   NL_SET_ERR_MSG_ATTR(info->extack, err_attr, "requested channel 
> counts would result in no RX or TX channel being configured");
>   goto out_ops;
> -- 
> 2.20.1
> 


signature.asc
Description: PGP signature


Re: [PATCH net] ethtool: fix the check logic of at least one channel for RX/TX

2021-02-23 Thread Michal Kubecek
On Tue, Feb 23, 2021 at 05:02:06PM -0800, Jakub Kicinski wrote:
> On Wed, 24 Feb 2021 01:32:51 +0100 Michal Kubecek wrote:
> > On Tue, Feb 23, 2021 at 02:24:40PM +0100, Simon Horman wrote:
> > > From: Yinjun Zhang 
> > > 
> > > The command "ethtool -L  combined 0" may clean the RX/TX channel
> > > count and skip the error path, since the attrs
> > > tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
> > > are NULL in this case when recent ethtool is used.
> > > 
> > > Tested using ethtool v5.10.
> > > 
> > > Fixes: 7be92514b99c ("ethtool: check if there is at least one channel for 
> > > TX/RX in the core")
> > > Signed-off-by: Yinjun Zhang 
> > > Signed-off-by: Simon Horman 
> > > Signed-off-by: Louis Peens   
> > 
> > Reviewed-by: Michal Kubecek 
> 
> IOW you prefer this to what I proposed?

No, that was my misunderstanding, please see my reply to your e-mail.

Michal


Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

2021-04-06 Thread Michal Kubecek
On Tue, Apr 06, 2021 at 08:55:41AM +0800, Yunsheng Lin wrote:
> 
> Hi, Jiri
> Do you have a reproducer that can be shared here?
> With reproducer, I can debug and test it myself too.

I'm afraid we are not aware of a simple reproducer. As mentioned in the
original discussion, the race window is extremely small and the other
thread has to do quite a lot in the meantime which is probably why, as
far as I know, this was never observed on real hardware, only in
virtualization environments. NFS may also be important as, IIUC, it can
often issue an RPC request from a different CPU right after a data
transfer. Perhaps you could cheat a bit and insert a random delay
between the empty queue check and releasing q->seqlock to make it more
likely to happen.

Other than that, it's rather just "run this complex software in a xen VM
and wait".

Michal


Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

2021-04-06 Thread Michal Kubecek
On Tue, Apr 06, 2021 at 10:46:29AM +0800, Yunsheng Lin wrote:
> On 2021/4/6 9:49, Cong Wang wrote:
> > On Sat, Apr 3, 2021 at 5:23 AM Jiri Kosina  wrote:
> >>
> >> I am still planning to have Yunsheng Lin's (CCing) fix [1] tested in the
> >> coming days. If it works, then we can consider proceeding with it,
> >> otherwise I am all for reverting the whole NOLOCK stuff.
> >>
> >> [1] 
> >> https://lore.kernel.org/linux-can/1616641991-14847-1-git-send-email-linyunsh...@huawei.com/T/#u
> > 
> > I personally prefer to just revert that bit, as it brings more troubles
> > than gains. Even with Yunsheng's patch, there are still some issues.
> > Essentially, I think the core qdisc scheduling code is not ready for
> > lockless, just look at those NOLOCK checks in sch_generic.c. :-/
> 
> I am also awared of the NOLOCK checks too:), and I am willing to
> take care of it if that is possible.
> 
> As the number of cores in a system is increasing, it is the trend
> to become lockless, right? Even there is only one cpu involved, the
> spinlock taking and releasing takes about 30ns on our arm64 system
> when CONFIG_PREEMPT_VOLUNTARY is enable(ip forwarding testing).

I agree with the benefits but currently the situation is that we have
a race condition affecting the default qdisc which is being hit in
production and can cause serious trouble which is made worse by commit
1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host
queues") preventing the retransmits of the stuck packet being sent.

Perhaps rather than patching over current implementation which requires
more and more complicated hacks to work around the fact that we cannot
make the "queue is empty" check and leaving the critical section atomic,
it would make sense to reimplement it in a way which would allow us
making it atomic.

Michal



Re: [PATCH net v1 1/1] ethtool: fix incorrect datatype in set_eee ops

2021-04-06 Thread Michal Kubecek
On Tue, Apr 06, 2021 at 09:17:30PM +0800, Wong Vee Khee wrote:
> The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool
> header file. Hence, we should use ethnl_update_u32() in set_eee ops.

To be precise, the correct reason is that unlike .eee_enabled and
.tx_lpi_enabled, .tx_lpi_timer value is interpreted as a number, not
a logical value (those two are also __u32). But I don't think it's
necessary to resubmit.

Reviewed-by: Michal Kubecek 

> Fixes: fd77be7bd43c ("ethtool: set EEE settings with EEE_SET request")
> Cc:  # 5.10.x
> Cc: Michal Kubecek 
> Signed-off-by: Wong Vee Khee 
> ---
>  net/ethtool/eee.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ethtool/eee.c b/net/ethtool/eee.c
> index 901b7de941ab..e10bfcc07853 100644
> --- a/net/ethtool/eee.c
> +++ b/net/ethtool/eee.c
> @@ -169,8 +169,8 @@ int ethnl_set_eee(struct sk_buff *skb, struct genl_info 
> *info)
>   ethnl_update_bool32(&eee.eee_enabled, tb[ETHTOOL_A_EEE_ENABLED], &mod);
>   ethnl_update_bool32(&eee.tx_lpi_enabled,
>   tb[ETHTOOL_A_EEE_TX_LPI_ENABLED], &mod);
> - ethnl_update_bool32(&eee.tx_lpi_timer, tb[ETHTOOL_A_EEE_TX_LPI_TIMER],
> - &mod);
> + ethnl_update_u32(&eee.tx_lpi_timer, tb[ETHTOOL_A_EEE_TX_LPI_TIMER],
> +  &mod);
>   ret = 0;
>   if (!mod)
>   goto out_ops;
> -- 
> 2.25.1
> 


signature.asc
Description: PGP signature


Re: [PATCH net-next v2 3/9] ethtool: add a new command for reading standard stats

2021-04-17 Thread Michal Kubecek
On Sat, Apr 17, 2021 at 12:18:08PM -0700, Jakub Kicinski wrote:
> On Sat, 17 Apr 2021 12:15:20 -0700 Jakub Kicinski wrote:
> > On Sat, 17 Apr 2021 21:53:40 +0300 Ido Schimmel wrote:
> > > On Sat, Apr 17, 2021 at 11:13:51AM -0700, Jakub Kicinski wrote:  
> > > > On Sat, 17 Apr 2021 10:57:42 -0700 Jakub Kicinski wrote:
> > > >
> > > > FWIW ethnl_parse_bit() -> ETHTOOL_A_BITSET_BIT_NAME
> > > > User space can also use raw flags like --groups 0xf but that's perhaps
> > > > too spartan for serious use.
> > > 
> > > So the kernel can work with ETHTOOL_A_BITSET_BIT_INDEX /
> > > ETHTOOL_A_BITSET_BIT_NAME, but I was wondering if using ethtool binary
> > > we can query the strings that the kernel will accept. I think not?  
> 
> Heh, I misunderstood your question. You're asking if the strings can be
> queried from the command line.
> 
> No, I don't think so. We could add some form of "porcelain" command if
> needed.

We don't have such command but I think it would be useful. After all, as
you pointed out, the request is already implemented at UAPI level so all
we need is to make it available to user.

The syntax will be a bit tricky as some string sets are global and some
per device. Out of

ethtool --show-strings [devname] 
ethtool --show-strings [devname] set 
ethtool --show-strings  [dev ]

the third seems nicest but also least consistent with the rest of
ethtool command line syntax. So probably one of the first two.

Michal


Re: [PATCH ethtool] Fix help message for master-slave option

2021-01-24 Thread Michal Kubecek
On Wed, Jan 20, 2021 at 09:21:22PM +0900, Yuusuke Ashizuka wrote:
> Fixes: 558f7cc33daf ("netlink: add master/slave configuration support")
> Signed-off-by: Yuusuke Ashizuka 

Applied, thank you.

Michal

> ---
>  ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 585aafa..84a61f4 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -5630,7 +5630,7 @@ static const struct option args[] = {
> " [ wol %d[/%d] | p|u|m|b|a|g|s|f|d... 
> ]\n"
> " [ sopass %x:%x:%x:%x:%x:%x ]\n"
> " [ msglvl %d[/%d] | type on|off ... [--] 
> ]\n"
> -   " [ master-slave 
> master-preferred|slave-preferred|master-force|slave-force ]\n"
> +   " [ master-slave 
> preferred-master|preferred-slave|forced-master|forced-slave ]\n"
>   },
>   {
>   .opts   = "-a|--show-pause",
> -- 
> 2.29.2
> 


signature.asc
Description: PGP signature


Re: [PATCH net-next v3 2/7] ethtool: Get link mode in use instead of speed and duplex parameters

2021-01-28 Thread Michal Kubecek
On Wed, Jan 27, 2021 at 01:22:02PM +, Danielle Ratson wrote:
> > -Original Message-
> > From: Edwin Peer 
> > Sent: Tuesday, January 26, 2021 7:14 PM
> > To: Danielle Ratson 
> > Cc: netdev ; David S . Miller 
> > ; Jakub Kicinski ; Jiri Pirko
> > ; Andrew Lunn ; f.faine...@gmail.com; 
> > Michal Kubecek ; mlxsw
> > ; Ido Schimmel 
> > Subject: Re: [PATCH net-next v3 2/7] ethtool: Get link mode in use instead 
> > of speed and duplex parameters
> > 
> > For one thing, it's cleaner if the driver API is symmetric. The
> > proposed solution sets attributes in terms of speeds and lanes,
> > etc., but it gets them in terms of a compound link_info. But, this
> > asymmetry aside, if link_mode may eventually become R/W at the
> > driver API, as you suggest, then it is more appropriate to guard it
> > with a capability bit, as has been done for lanes, rather than use
> > the -1 special value to indicate that the driver did not set it.
> > 
> > Regards,
> > Edwin Peer
> 
> This patchset adds lanes parameter, not link_mode. The link_mode
> addition was added as a read-only parameter for the reasons we
> mentioned, and I am not sure that implementing the symmetric side is
> relevant for this patchset.
> 
> Michal, do you think we will use the Write side of the link_mode
> parameter?

It makes sense, IMHO. Unless we also add "media" (or whatever name would
be most appropriate) parameter, we cannot in general fully determine the
link mode by speed, duplex and lanes. And using "advertise" to select
a specific mode with disabled autonegotiation would be rather confusing,
I believe. (At the moment, ethtool does not even support syntax like
"advertise " but it will be easy to support
"advertise ... [--]" and I think we should extend the syntax to
support it, regardless of what we choose.) So if we want to allow user
to pick a specific link node by name or bit mask (or rather index?),
I would prefer using a separate parameter.

>And if so, do you think it is relevant for this specific
> patchset?

I don't see an obvious problem with leaving that part for later so
I would say it's up to you.

Michal


Re: [PATCH ethtool v2 2/5] netlink: settings: Add netlink support for lanes parameter

2021-02-09 Thread Michal Kubecek
On Tue, Feb 02, 2021 at 08:25:10PM +0200, Danielle Ratson wrote:
> Add support for "ethtool -s  lanes N ..." for setting a specific
> number of lanes.
> 
> Signed-off-by: Danielle Ratson 
> Reviewed-by: Jiri Pirko 
> ---
>  ethtool.c  | 1 +
>  netlink/settings.c | 8 
>  2 files changed, 9 insertions(+)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 585aafa..fcb09f7 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -5620,6 +5620,7 @@ static const struct option args[] = {
>   .nlfunc = nl_sset,
>   .help   = "Change generic options",
>   .xhelp  = " [ speed %d ]\n"
> +   " [ lanes %d ]\n"
> " [ duplex half|full ]\n"
> " [ port tp|aui|bnc|mii|fibre|da ]\n"
> " [ mdix auto|on|off ]\n"
> diff --git a/netlink/settings.c b/netlink/settings.c
> index 90c28b1..6cb5d5b 100644
> --- a/netlink/settings.c
> +++ b/netlink/settings.c
> @@ -20,6 +20,7 @@
>  struct link_mode_info {
>   enum link_mode_classclass;
>   u32 speed;
> + u32 lanes;
>   u8  duplex;
>  };
>  

This structure member is not used anywhere in this patch and, AFAICS,
not even in the rest of your series. Perhaps a leftover from an older
version?

Michal

> @@ -1067,6 +1068,13 @@ static const struct param_parser sset_params[] = {
>   .handler= nl_parse_direct_u32,
>   .min_argc   = 1,
>   },
> + {
> + .arg= "lanes",
> + .group  = ETHTOOL_MSG_LINKMODES_SET,
> + .type   = ETHTOOL_A_LINKMODES_LANES,
> + .handler= nl_parse_direct_u32,
> + .min_argc   = 1,
> + },
>   {
>   .arg= "duplex",
>   .group  = ETHTOOL_MSG_LINKMODES_SET,
> -- 
> 2.26.2
> 


signature.asc
Description: PGP signature


Re: [PATCH ethtool v2 1/5] ethtool: Extend ethtool link modes settings uAPI with lanes

2021-02-09 Thread Michal Kubecek
On Tue, Feb 02, 2021 at 08:25:09PM +0200, Danielle Ratson wrote:
> Add ETHTOOL_A_LINKMODES_LANES, expand ethtool_link_settings with
> lanes attribute and define valid lanes in order to support a new
> lanes-selector.
> 
> Signed-off-by: Danielle Ratson 
> ---

When updating the UAPI header copies, please do it in a separate commit
which updates the whole uapi/ subdirectory to the state of a specific
kernel commit. You can use the script at

  https://www.kernel.org/pub/software/network/ethtool/ethtool-import-uapi

It expects the LINUX_GIT environment variable to point to your local git
repository with kernel tree and takes one argument identifying the
commit you want to import the uapi headers from (commit id, tag or
branch name can be used). In your case, net-next would be the most
likely choice.

Michal

> Notes:
> v2:
>   * Update headers after changes in upstream patches.
> 
>  netlink/desc-ethtool.c   | 1 +
>  uapi/linux/ethtool_netlink.h | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/netlink/desc-ethtool.c b/netlink/desc-ethtool.c
> index 96291b9..fe5d7ba 100644
> --- a/netlink/desc-ethtool.c
> +++ b/netlink/desc-ethtool.c
> @@ -87,6 +87,7 @@ static const struct pretty_nla_desc __linkmodes_desc[] = {
>   NLATTR_DESC_U8(ETHTOOL_A_LINKMODES_DUPLEX),
>   NLATTR_DESC_U8(ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG),
>   NLATTR_DESC_U8(ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE),
> + NLATTR_DESC_U32(ETHTOOL_A_LINKMODES_LANES),
>  };
>  
>  static const struct pretty_nla_desc __linkstate_desc[] = {
> diff --git a/uapi/linux/ethtool_netlink.h b/uapi/linux/ethtool_netlink.h
> index c022883..0cd6906 100644
> --- a/uapi/linux/ethtool_netlink.h
> +++ b/uapi/linux/ethtool_netlink.h
> @@ -227,6 +227,7 @@ enum {
>   ETHTOOL_A_LINKMODES_DUPLEX, /* u8 */
>   ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG,   /* u8 */
>   ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE, /* u8 */
> + ETHTOOL_A_LINKMODES_LANES,  /* u32 */
>  
>   /* add new constants above here */
>   __ETHTOOL_A_LINKMODES_CNT,
> -- 
> 2.26.2
> 


signature.asc
Description: PGP signature


Re: [PATCH ethtool v2 0/5] Extend uAPI with lanes parameter

2021-02-09 Thread Michal Kubecek
On Tue, Feb 02, 2021 at 08:25:08PM +0200, Danielle Ratson wrote:
> Currently, there is no way of knowing how many lanes will be use to
> achieve a wanted speed.
> For example, 100G speed can be achieved using: 2X50 or 4X25.
> 
> In order to solve that, extend ethtool uAPI with lanes as a new link
> mode setting so the command below, for example, will be supported:
> $ ethtool -s swp5 lanes N
> 
> Patch #1: Update headers with the new parameter.
> Patch #2: Support lanes in netlink.
> Patch #3: Expose the number of lanes in use.
> Patch #4: Add auto-completion for lanes.
> Patch #5: Add lanes to man page.
> 
> Danielle Ratson (5):
>   ethtool: Extend ethtool link modes settings uAPI with lanes
>   netlink: settings: Add netlink support for lanes parameter
>   netlink: settings: Expose the number of lanes in use
>   shell-completion: Add completion for lanes
>   man: Add man page for setting lanes parameter
> 
>  ethtool.8.in  |  4 
>  ethtool.c |  1 +
>  netlink/desc-ethtool.c|  1 +
>  netlink/settings.c| 14 ++
>  shell-completion/bash/ethtool |  4 
>  uapi/linux/ethtool_netlink.h  |  1 +
>  6 files changed, 25 insertions(+)

Sorry for the delay, I was busy with other stuff last week and missed
that with kernel part accepted, I should take care of the userspace
counterpart.

The series looks good to me, except for two minor issues I'll comment to
relevant patches.

Michal


signature.asc
Description: PGP signature


Re: [PATCH ethtool v2 1/5] ethtool: Extend ethtool link modes settings uAPI with lanes

2021-02-10 Thread Michal Kubecek
On Wed, Feb 10, 2021 at 12:06:04PM +, Danielle Ratson wrote:
> > When updating the UAPI header copies, please do it in a separate
> > commit which updates the whole uapi/ subdirectory to the state of a
> > specific kernel commit. You can use the script at
> > 
> >   https://www.kernel.org/pub/software/network/ethtool/ethtool-import-uapi
> > 
> > It expects the LINUX_GIT environment variable to point to your local
> > git repository with kernel tree and takes one argument identifying
> > the commit you want to import the uapi headers from (commit id, tag
> > or branch name can be used). In your case, net-next would be the
> > most likely choice.
> 
> Should I use the commit in net-next that I added it uapi headers, or
> generally the last commit of net-next?

That's up to you, I'm fine with either. The important point is to have
a consistent snapshot of all copied (and sanitized) uapi headers.

Michal


signature.asc
Description: PGP signature


Re: [RFC PATCH V3 net-next 1/5] ethtool: Allow network drivers to dump arbitrary EEPROM data

2021-03-18 Thread Michal Kubecek
On Thu, Mar 18, 2021 at 02:03:02PM +0100, Andrew Lunn wrote:
> On Mon, Mar 15, 2021 at 07:12:39PM +0200, Moshe Shemesh wrote:
> >  
> > +EEPROM_DATA
> > +===
> > +
> > +Fetch module EEPROM data dump.
> > +
> > +Request contents:
> > +
> > +  =  ==  ==
> > +  ``ETHTOOL_A_EEPROM_DATA_HEADER``   nested  request header
> > +  ``ETHTOOL_A_EEPROM_DATA_OFFSET``   u32 offset within a page
> > +  ``ETHTOOL_A_EEPROM_DATA_LENGTH``   u32 amount of bytes to read
> 
> I wonder if offset and length should be u8. At most, we should only be
> returning a 1/2 page, so 128 bytes. We don't need a u32.

There is no actual gain using NLA_U8 due to padding. Out of the
interfaces used here, kernel-userspace API is the least flexible so
I would generally prefer NLA_U32, except for bools or enumerated values
where it's absolutely obvious the number of possible values cannot grow
too much. In this case, I can't really say it's impossible we could have
devices with bigger pages in something like 20 years.

> >  Request translation
> >  ===
> >  
> > @@ -1357,8 +1387,8 @@ are netlink only.
> >``ETHTOOL_GET_DUMP_FLAG``   n/a
> >``ETHTOOL_GET_DUMP_DATA``   n/a
> >``ETHTOOL_GET_TS_INFO`` ``ETHTOOL_MSG_TSINFO_GET``
> > -  ``ETHTOOL_GMODULEINFO`` n/a
> > -  ``ETHTOOL_GMODULEEEPROM``   n/a
> > +  ``ETHTOOL_GMODULEINFO`` ``ETHTOOL_MSG_MODULE_EEPROM_GET``
> > +  ``ETHTOOL_GMODULEEEPROM``   ``ETHTOOL_MSG_MODULE_EEPROM_GET``
> >``ETHTOOL_GEEE````ETHTOOL_MSG_EEE_GET``
> >``ETHTOOL_SEEE````ETHTOOL_MSG_EEE_SET``
> >``ETHTOOL_GRSSH``   n/a
> 
> We should check with Michal about this. It is not a direct replacement
> of the old IOCTL API, it is new API. He may want it documented
> differently.

This table is meant to give a hint in the sense "for what you used
ioctl command in left column, use now netlink request in the right".
So IMHO it's appropriate. Perhaps it would deserve a comment explaining
this.

> > +   request->offset = nla_get_u32(tb[ETHTOOL_A_EEPROM_DATA_OFFSET]);
> > +   request->length = nla_get_u32(tb[ETHTOOL_A_EEPROM_DATA_LENGTH]);
> > +   if (tb[ETHTOOL_A_EEPROM_DATA_PAGE] &&
> > +   dev->ethtool_ops->get_module_eeprom_data_by_page &&
> > +   request->offset + request->length > ETH_MODULE_EEPROM_PAGE_LEN)
> > +   return -EINVAL;
> 
> You need to watch out for overflows here. 0xfff0 + 0x20 is less
> than ETH_MODULE_EEPROM_PAGE_LEN when it wraps around, but will cause
> bad things to happen.

BtW, the ioctl code also suffers from this problem and we recently had
a report from customer (IIRC the effect was ethtool trying to allocate
~4GB of memory), upstream fix should follow soon.

Michal


[PATCH ethtool 7/7] build: add -Wextra to default CFLAGS

2020-08-09 Thread Michal Kubecek
As a result of previous commits, ethtool source now builds with gcc
versions 7-11 without any compiler warning with "-Wall -Wextra". Add
"-Wextra" to default cflags to make sure that any new warnings are
caught as early as possible.

Suggested-by: Andrew Lunn 
Signed-off-by: Michal Kubecek 
---
 Makefile.am | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile.am b/Makefile.am
index 2abb2742c335..099182e8d6ad 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,4 +1,4 @@
-AM_CFLAGS = -Wall
+AM_CFLAGS = -Wall -Wextra
 AM_CPPFLAGS = -I$(top_srcdir)/uapi
 LDADD = -lm
 
-- 
2.28.0



[PATCH ethtool 3/7] ioctl: get rid of signed/unsigned comparison warnings

2020-08-09 Thread Michal Kubecek
Comparison between signed and unsigned values is fragile and causes
compiler warnings with recent compilers and stricter CFLAGS. Prevent such
comparisons either by properly declaring variables (mostly loop iterators)
as unsigned or by explicitly casting one side of the comparison.

Signed-off-by: Michal Kubecek 
---
 ethtool.c | 45 -
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 4fa7a2c1716f..d9dcd0448c02 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -225,8 +225,8 @@ static void parse_generic_cmdline(struct cmd_context *ctx,
 {
int argc = ctx->argc;
char **argp = ctx->argp;
-   int i, idx;
-   int found;
+   unsigned int idx;
+   int i, found;
 
for (i = 0; i < argc; i++) {
found = 0;
@@ -641,8 +641,9 @@ static void dump_link_caps(const char *prefix, const char 
*an_prefix,
  "20baseCR4/Full" },
};
int indent;
-   int did1, new_line_pend, i;
+   int did1, new_line_pend;
int fecreported = 0;
+   unsigned int i;
 
/* Indent just like the separate functions used to */
indent = strlen(prefix) + 14;
@@ -1071,7 +1072,7 @@ void dump_hex(FILE *file, const u8 *data, int len, int 
offset)
 static int dump_regs(int gregs_dump_raw, int gregs_dump_hex,
 struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 {
-   int i;
+   unsigned int i;
 
if (gregs_dump_raw) {
fwrite(regs->data, regs->len, 1, stdout);
@@ -1128,7 +1129,8 @@ static int dump_eeprom(int geeprom_dump_raw,
 static int dump_test(struct ethtool_test *test,
 struct ethtool_gstrings *strings)
 {
-   int i, rc;
+   unsigned int i;
+   int rc;
 
rc = test->flags & ETH_TEST_FL_FAILED;
fprintf(stdout, "The test result is %s\n", rc ? "FAIL" : "PASS");
@@ -1359,7 +1361,7 @@ static void dump_one_feature(const char *indent, const 
char *name,
   : "");
 }
 
-static int linux_version_code(void)
+static unsigned int linux_version_code(void)
 {
struct utsname utsname;
unsigned version, patchlevel, sublevel = 0;
@@ -1375,10 +1377,10 @@ static void dump_features(const struct feature_defs 
*defs,
  const struct feature_state *state,
  const struct feature_state *ref_state)
 {
-   int kernel_ver = linux_version_code();
-   u32 value;
+   unsigned int kernel_ver = linux_version_code();
+   unsigned int i, j;
int indent;
-   int i, j;
+   u32 value;
 
for (i = 0; i < OFF_FLAG_DEF_SIZE; i++) {
/* Don't show features whose state is unknown on this
@@ -1411,7 +1413,7 @@ static void dump_features(const struct feature_defs *defs,
 
/* Show matching features */
for (j = 0; j < defs->n_features; j++) {
-   if (defs->def[j].off_flag_index != i)
+   if (defs->def[j].off_flag_index != (int)i)
continue;
if (defs->off_flag_matched[i] != 1)
/* Show all matching feature states */
@@ -1668,8 +1670,8 @@ static struct feature_defs *get_feature_defs(struct 
cmd_context *ctx)
 {
struct ethtool_gstrings *names;
struct feature_defs *defs;
+   unsigned int i, j;
u32 n_features;
-   int i, j;
 
names = get_stringset(ctx, ETH_SS_FEATURES, 0, 1);
if (names) {
@@ -2236,8 +2238,8 @@ static int do_sfeatures(struct cmd_context *ctx)
struct cmdline_info *cmdline_features;
struct feature_state *old_state, *new_state;
struct ethtool_value eval;
+   unsigned int i, j;
int err, rc;
-   int i, j;
 
defs = get_feature_defs(ctx);
if (!defs) {
@@ -2317,7 +2319,7 @@ static int do_sfeatures(struct cmd_context *ctx)
continue;
 
for (j = 0; j < defs->n_features; j++) {
-   if (defs->def[j].off_flag_index != i ||
+   if (defs->def[j].off_flag_index != (int)i ||
!FEATURE_BIT_IS_SET(
old_state->features.features,
j, available) ||
@@ -2724,9 +2726,9 @@ static int do_sset(struct cmd_context *ctx)
u32 msglvl_wanted = 0;
u32 msglvl_mask = 0;
struct cmdline_info cmdline_msglvl[n_flags_msglvl];
-   int argc = ctx->argc;
+   unsigned int argc = ctx->argc;
char **argp = ctx->argp;
-   int i;
+   unsigned int i;
int err = 0;
 
for (i = 0; i < n_flags_msglvl; i++)
@@ -3869,7 +3871,7 @@ static int do_srx

[PATCH ethtool 0/7] compiler warnings cleanup, part 2

2020-08-09 Thread Michal Kubecek
Two compiler warnings still appear when compiling current source:
comparison between signed and unsigned values and missing struct member
initializations.

The former are mostly handled by declaring variables (loop iterators,
mostly) as unsigned, only few required an explicit cast. The latter are
handled by using named field initializers; in link_mode_info[] array,
helper macros are also used to make code easier to read and check.

As the final step, add "-Wextra" to default CFLAGS to catch any future
warnings as early as possible.

Michal Kubecek (7):
  netlink: get rid of signed/unsigned comparison warnings
  ioctl: check presence of eeprom length argument properly
  ioctl: get rid of signed/unsigned comparison warnings
  get rid of signed/unsigned comparison warnings in register dump
parsers
  settings: simplify link_mode_info[] initializers
  ioctl: convert cmdline_info arrays to named initializers
  build: add -Wextra to default CFLAGS

 Makefile.am|   2 +-
 dsa.c  |   2 +-
 ethtool.c  | 435 ++---
 fec.c  |   2 +-
 ibm_emac.c |   2 +-
 marvell.c  |   2 +-
 natsemi.c  |   2 +-
 netlink/features.c |   4 +-
 netlink/netlink.c  |   4 +-
 netlink/netlink.h  |   2 +-
 netlink/nlsock.c   |   2 +-
 netlink/parser.c   |   2 +-
 netlink/settings.c | 242 ++---
 rxclass.c  |   8 +-
 sfpdiag.c  |   2 +-
 tg3.c  |   4 +-
 16 files changed, 439 insertions(+), 278 deletions(-)

-- 
2.28.0



[PATCH ethtool 2/7] ioctl: check presence of eeprom length argument properly

2020-08-09 Thread Michal Kubecek
In do_geeprom(), do_seprom() and do_getmodule(), check if user used
"length" command line argument is done by setting the value to -1 before
parsing and checking if it changed. This is quite ugly and also causes
compiler warnings as the variable is u32.

Use proper "seen" flag to let parser tell us if the argument was used.

Signed-off-by: Michal Kubecek 
---
 ethtool.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index c4ad186cd390..4fa7a2c1716f 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3184,10 +3184,12 @@ static int do_geeprom(struct cmd_context *ctx)
int geeprom_changed = 0;
int geeprom_dump_raw = 0;
u32 geeprom_offset = 0;
-   u32 geeprom_length = -1;
+   u32 geeprom_length = 0;
+   int geeprom_length_seen = 0;
struct cmdline_info cmdline_geeprom[] = {
{ "offset", CMDL_U32, &geeprom_offset, NULL },
-   { "length", CMDL_U32, &geeprom_length, NULL },
+   { "length", CMDL_U32, &geeprom_length, NULL,
+ 0, &geeprom_length_seen },
{ "raw", CMDL_BOOL, &geeprom_dump_raw, NULL },
};
int err;
@@ -3204,7 +3206,7 @@ static int do_geeprom(struct cmd_context *ctx)
return 74;
}
 
-   if (geeprom_length == -1)
+   if (!geeprom_length_seen)
geeprom_length = drvinfo.eedump_len;
 
if (drvinfo.eedump_len < geeprom_offset + geeprom_length)
@@ -3234,14 +3236,16 @@ static int do_seeprom(struct cmd_context *ctx)
 {
int seeprom_changed = 0;
u32 seeprom_magic = 0;
-   u32 seeprom_length = -1;
+   u32 seeprom_length = 0;
u32 seeprom_offset = 0;
u8 seeprom_value = 0;
+   int seeprom_length_seen = 0;
int seeprom_value_seen = 0;
struct cmdline_info cmdline_seeprom[] = {
{ "magic", CMDL_U32, &seeprom_magic, NULL },
{ "offset", CMDL_U32, &seeprom_offset, NULL },
-   { "length", CMDL_U32, &seeprom_length, NULL },
+   { "length", CMDL_U32, &seeprom_length, NULL,
+ 0, &seeprom_length_seen },
{ "value", CMDL_U8, &seeprom_value, NULL,
  0, &seeprom_value_seen },
};
@@ -3262,7 +3266,7 @@ static int do_seeprom(struct cmd_context *ctx)
if (seeprom_value_seen)
seeprom_length = 1;
 
-   if (seeprom_length == -1)
+   if (!seeprom_length_seen)
seeprom_length = drvinfo.eedump_len;
 
if (drvinfo.eedump_len < seeprom_offset + seeprom_length) {
@@ -4538,15 +4542,17 @@ static int do_getmodule(struct cmd_context *ctx)
struct ethtool_modinfo modinfo;
struct ethtool_eeprom *eeprom;
u32 geeprom_offset = 0;
-   u32 geeprom_length = -1;
+   u32 geeprom_length = 0;
int geeprom_changed = 0;
int geeprom_dump_raw = 0;
int geeprom_dump_hex = 0;
+   int geeprom_length_seen = 0;
int err;
 
struct cmdline_info cmdline_geeprom[] = {
{ "offset", CMDL_U32, &geeprom_offset, NULL },
-   { "length", CMDL_U32, &geeprom_length, NULL },
+   { "length", CMDL_U32, &geeprom_length, NULL,
+ 0, &geeprom_length_seen },
{ "raw", CMDL_BOOL, &geeprom_dump_raw, NULL },
{ "hex", CMDL_BOOL, &geeprom_dump_hex, NULL },
};
@@ -4566,7 +4572,7 @@ static int do_getmodule(struct cmd_context *ctx)
return 1;
}
 
-   if (geeprom_length == -1)
+   if (!geeprom_length_seen)
geeprom_length = modinfo.eeprom_len;
 
if (modinfo.eeprom_len < geeprom_offset + geeprom_length)
-- 
2.28.0



[PATCH ethtool 5/7] settings: simplify link_mode_info[] initializers

2020-08-09 Thread Michal Kubecek
Use macro helpers to make link_mode_info[] initializers easier to read and
less prone to mistakes. As a bonus, this gets rid of "missing field
initializer" warnings in netlink/settings.c

This commit should have no effect on resulting code (checked with gcc-11
and -O2).

Signed-off-by: Michal Kubecek 
---
 netlink/settings.c | 236 +
 1 file changed, 86 insertions(+), 150 deletions(-)

diff --git a/netlink/settings.c b/netlink/settings.c
index 99d047a3e497..935724e799da 100644
--- a/netlink/settings.c
+++ b/netlink/settings.c
@@ -64,160 +64,96 @@ static const char *const names_transceiver[] = {
  * there is little chance of getting them separated any time soon so let's
  * sort them out ourselves
  */
+#define __REAL(_speed) \
+   { .class = LM_CLASS_REAL, .speed = _speed, .duplex = DUPLEX_FULL }
+#define __HALF_DUPLEX(_speed) \
+   { .class = LM_CLASS_REAL, .speed = _speed, .duplex = DUPLEX_HALF }
+#define __SPECIAL(_class) \
+   { .class = LM_CLASS_ ## _class }
+
 static const struct link_mode_info link_modes[] = {
-   [ETHTOOL_LINK_MODE_10baseT_Half_BIT] =
-   { LM_CLASS_REAL,10, DUPLEX_HALF },
-   [ETHTOOL_LINK_MODE_10baseT_Full_BIT] =
-   { LM_CLASS_REAL,10, DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_100baseT_Half_BIT] =
-   { LM_CLASS_REAL,100,DUPLEX_HALF },
-   [ETHTOOL_LINK_MODE_100baseT_Full_BIT] =
-   { LM_CLASS_REAL,100,DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_1000baseT_Half_BIT] =
-   { LM_CLASS_REAL,1000,   DUPLEX_HALF },
-   [ETHTOOL_LINK_MODE_1000baseT_Full_BIT] =
-   { LM_CLASS_REAL,1000,   DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_Autoneg_BIT] =
-   { LM_CLASS_AUTONEG },
-   [ETHTOOL_LINK_MODE_TP_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_AUI_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_MII_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_FIBRE_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_BNC_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_1baseT_Full_BIT] =
-   { LM_CLASS_REAL,1,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_Pause_BIT] =
-   { LM_CLASS_PAUSE },
-   [ETHTOOL_LINK_MODE_Asym_Pause_BIT] =
-   { LM_CLASS_PAUSE },
-   [ETHTOOL_LINK_MODE_2500baseX_Full_BIT] =
-   { LM_CLASS_REAL,2500,   DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_Backplane_BIT] =
-   { LM_CLASS_PORT },
-   [ETHTOOL_LINK_MODE_1000baseKX_Full_BIT] =
-   { LM_CLASS_REAL,1000,   DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_1baseKX4_Full_BIT] =
-   { LM_CLASS_REAL,1,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_1baseKR_Full_BIT] =
-   { LM_CLASS_REAL,1,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_1baseR_FEC_BIT] =
-   { LM_CLASS_REAL,1,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_2baseMLD2_Full_BIT] =
-   { LM_CLASS_REAL,2,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_2baseKR2_Full_BIT] =
-   { LM_CLASS_REAL,2,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_4baseKR4_Full_BIT] =
-   { LM_CLASS_REAL,4,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_4baseCR4_Full_BIT] =
-   { LM_CLASS_REAL,4,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_4baseSR4_Full_BIT] =
-   { LM_CLASS_REAL,4,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_4baseLR4_Full_BIT] =
-   { LM_CLASS_REAL,4,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT] =
-   { LM_CLASS_REAL,56000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT] =
-   { LM_CLASS_REAL,56000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT] =
-   { LM_CLASS_REAL,56000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT] =
-   { LM_CLASS_REAL,56000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_25000baseCR_Full_BIT] =
-   { LM_CLASS_REAL,25000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_25000baseKR_Full_BIT] =
-   { LM_CLASS_REAL,25000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_25000baseSR_Full_BIT] =
-   { LM_CLASS_REAL,25000,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_5baseCR2_Full_BIT] =
-   { LM_CLASS_REAL,5,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_5baseKR2_Full_BIT] =
-   { LM_CLASS_REAL,5,  DUPLEX_FULL },
-   [ETHTOOL_LINK_MODE_10baseKR4_Full_BIT] =
-   { LM_CLASS_REAL,1000

[PATCH ethtool 4/7] get rid of signed/unsigned comparison warnings in register dump parsers

2020-08-09 Thread Michal Kubecek
All of these are avoided by declaring a variable (mostly loop iterators)
holding only unsigned values as unsigned.

Signed-off-by: Michal Kubecek 
---
 dsa.c  | 2 +-
 fec.c  | 2 +-
 ibm_emac.c | 2 +-
 marvell.c  | 2 +-
 natsemi.c  | 2 +-
 rxclass.c  | 8 +---
 sfpdiag.c  | 2 +-
 tg3.c  | 4 ++--
 8 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/dsa.c b/dsa.c
index 65502a899194..33c1d39d6605 100644
--- a/dsa.c
+++ b/dsa.c
@@ -824,8 +824,8 @@ static int dsa_mv88e6xxx_dump_regs(struct ethtool_regs 
*regs)
 {
const struct dsa_mv88e6xxx_switch *sw = NULL;
const u16 *data = (u16 *)regs->data;
+   unsigned int i;
u16 id;
-   int i;
 
/* Marvell chips have 32 per-port 16-bit registers */
if (regs->len < 32 * sizeof(u16))
diff --git a/fec.c b/fec.c
index 9cb4f8b1d4e1..d2373d6124c0 100644
--- a/fec.c
+++ b/fec.c
@@ -198,7 +198,7 @@ int fec_dump_regs(struct ethtool_drvinfo *info 
__maybe_unused,
  struct ethtool_regs *regs)
 {
const u32 *data = (u32 *)regs->data;
-   int offset;
+   unsigned int offset;
u32 val;
 
for (offset = 0; offset < regs->len; offset += 4) {
diff --git a/ibm_emac.c b/ibm_emac.c
index ea01d56f609c..9f7cae605482 100644
--- a/ibm_emac.c
+++ b/ibm_emac.c
@@ -238,7 +238,7 @@ static void *print_mal_regs(void *buf)
 {
struct emac_ethtool_regs_subhdr *hdr = buf;
struct mal_regs *p = (struct mal_regs *)(hdr + 1);
-   int i;
+   unsigned int i;
 
printf("MAL%d Registers\n", hdr->index);
printf("-\n");
diff --git a/marvell.c b/marvell.c
index 8afb150327a3..d3d570e4d4ad 100644
--- a/marvell.c
+++ b/marvell.c
@@ -130,7 +130,7 @@ static void dump_fifo(const char *name, const void *p)
 static void dump_gmac_fifo(const char *name, const void *p)
 {
const u32 *r = p;
-   int i;
+   unsigned int i;
static const char *regs[] = {
"End Address",
"Almost Full Thresh",
diff --git a/natsemi.c b/natsemi.c
index 0af465959cbc..4d9fc092b623 100644
--- a/natsemi.c
+++ b/natsemi.c
@@ -967,8 +967,8 @@ int
 natsemi_dump_eeprom(struct ethtool_drvinfo *info __maybe_unused,
struct ethtool_eeprom *ee)
 {
-   int i;
u16 *eebuf = (u16 *)ee->data;
+   unsigned int i;
 
if (ee->magic != NATSEMI_MAGIC) {
fprintf(stderr, "Magic number 0x%08x does not match 0x%08x\n",
diff --git a/rxclass.c b/rxclass.c
index 79972651e706..6cf81fdafc85 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -348,8 +348,9 @@ int rxclass_rule_getall(struct cmd_context *ctx)
 {
struct ethtool_rxnfc *nfccmd;
__u32 *rule_locs;
-   int err, i;
+   unsigned int i;
__u32 count;
+   int err;
 
/* determine rule count */
err = rxclass_get_dev_info(ctx, &count, NULL);
@@ -481,8 +482,9 @@ static int rmgr_find_empty_slot(struct rmgr_ctrl *rmgr,
 static int rmgr_init(struct cmd_context *ctx, struct rmgr_ctrl *rmgr)
 {
struct ethtool_rxnfc *nfccmd;
-   int err, i;
__u32 *rule_locs;
+   unsigned int i;
+   int err;
 
/* clear rule manager settings */
memset(rmgr, 0, sizeof(*rmgr));
@@ -941,7 +943,7 @@ static int rxclass_get_long(char *str, long long *val, int 
size)
 
 static int rxclass_get_ulong(char *str, unsigned long long *val, int size)
 {
-   long long max = ~0ULL >> (64 - size);
+   unsigned long long max = ~0ULL >> (64 - size);
char *endp;
 
errno = 0;
diff --git a/sfpdiag.c b/sfpdiag.c
index fa41651422ea..1fa8b7ba8fec 100644
--- a/sfpdiag.c
+++ b/sfpdiag.c
@@ -190,8 +190,8 @@ static float befloattoh(const __u32 *source)
 
 static void sff8472_calibration(const __u8 *id, struct sff_diags *sd)
 {
-   int i;
__u16 rx_reading;
+   unsigned int i;
 
/* Calibration should occur for all values (threshold and current) */
for (i = 0; i < ARRAY_SIZE(sd->bias_cur); ++i) {
diff --git a/tg3.c b/tg3.c
index ac73b33ae4e3..ebdef2d60e6b 100644
--- a/tg3.c
+++ b/tg3.c
@@ -7,7 +7,7 @@
 int tg3_dump_eeprom(struct ethtool_drvinfo *info __maybe_unused,
struct ethtool_eeprom *ee)
 {
-   int i;
+   unsigned int i;
 
if (ee->magic != TG3_MAGIC) {
fprintf(stderr, "Magic number 0x%08x does not match 0x%08x\n",
@@ -26,7 +26,7 @@ int tg3_dump_eeprom(struct ethtool_drvinfo *info 
__maybe_unused,
 int tg3_dump_regs(struct ethtool_drvinfo *info __maybe_unused,
  struct ethtool_regs *regs)
 {
-   int i;
+   unsigned int i;
u32 reg;
 
fprintf(stdout, "Offset\tValue\n");
-- 
2.28.0



[PATCH ethtool 1/7] netlink: get rid of signed/unsigned comparison warnings

2020-08-09 Thread Michal Kubecek
Get rid of compiler warnings about comparison between signed and
unsigned integer values in netlink code.

Signed-off-by: Michal Kubecek 
---
 netlink/features.c | 4 ++--
 netlink/netlink.c  | 4 ++--
 netlink/netlink.h  | 2 +-
 netlink/nlsock.c   | 2 +-
 netlink/parser.c   | 2 +-
 netlink/settings.c | 6 +++---
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/netlink/features.c b/netlink/features.c
index 8b5b8588ca23..f5862e97a265 100644
--- a/netlink/features.c
+++ b/netlink/features.c
@@ -149,7 +149,7 @@ int dump_features(const struct nlattr *const *tb,
continue;
 
for (j = 0; j < results.count; j++) {
-   if (feature_flags[j] == i) {
+   if (feature_flags[j] == (int)i) {
n_match++;
flag_value = flag_value ||
feature_on(results.active, j);
@@ -163,7 +163,7 @@ int dump_features(const struct nlattr *const *tb,
for (j = 0; j < results.count; j++) {
const char *name = get_string(feature_names, j);
 
-   if (feature_flags[j] != i)
+   if (feature_flags[j] != (int)i)
continue;
if (n_match == 1)
dump_feature(&results, NULL, NULL, j,
diff --git a/netlink/netlink.c b/netlink/netlink.c
index 76b6e825b1d0..e42d57076a4b 100644
--- a/netlink/netlink.c
+++ b/netlink/netlink.c
@@ -33,9 +33,9 @@ int nomsg_reply_cb(const struct nlmsghdr *nlhdr, void *data 
__maybe_unused)
 int attr_cb(const struct nlattr *attr, void *data)
 {
const struct attr_tb_info *tb_info = data;
-   int type = mnl_attr_get_type(attr);
+   uint16_t type = mnl_attr_get_type(attr);
 
-   if (type >= 0 && type <= tb_info->max_type)
+   if (type <= tb_info->max_type)
tb_info->tb[type] = attr;
 
return MNL_CB_OK;
diff --git a/netlink/netlink.h b/netlink/netlink.h
index a4984c82ae76..dd4a02bcc916 100644
--- a/netlink/netlink.h
+++ b/netlink/netlink.h
@@ -45,7 +45,7 @@ struct nl_context {
const char  *cmd;
const char  *param;
char**argp;
-   int argc;
+   unsigned intargc;
boolioctl_fallback;
boolwildcard_unsupported;
 };
diff --git a/netlink/nlsock.c b/netlink/nlsock.c
index c3f09b6ee9ab..ef31d8c33b29 100644
--- a/netlink/nlsock.c
+++ b/netlink/nlsock.c
@@ -168,7 +168,7 @@ static void debug_msg(struct nl_socket *nlsk, const void 
*msg, unsigned int len,
  *
  * Return: error code extracted from the message
  */
-static int nlsock_process_ack(struct nlmsghdr *nlhdr, ssize_t len,
+static int nlsock_process_ack(struct nlmsghdr *nlhdr, unsigned long len,
  unsigned int suppress_nlerr, bool pretty)
 {
const struct nlattr *tb[NLMSGERR_ATTR_MAX + 1] = {};
diff --git a/netlink/parser.c b/netlink/parser.c
index 395bd5743af9..c5a368a65a7a 100644
--- a/netlink/parser.c
+++ b/netlink/parser.c
@@ -604,7 +604,7 @@ static int parse_numeric_bitset(struct nl_context *nlctx, 
uint16_t type,
parser_err_invalid_value(nlctx, arg);
return -EINVAL;
}
-   len1 = maskptr ? (maskptr - arg) : strlen(arg);
+   len1 = maskptr ? (unsigned int)(maskptr - arg) : strlen(arg);
nwords = DIV_ROUND_UP(len1, 8);
nbits = 0;
 
diff --git a/netlink/settings.c b/netlink/settings.c
index de35ad173627..99d047a3e497 100644
--- a/netlink/settings.c
+++ b/netlink/settings.c
@@ -276,10 +276,10 @@ int dump_link_modes(struct nl_context *nlctx, const 
struct nlattr *bitset,
const struct nlattr *bitset_tb[ETHTOOL_A_BITSET_MAX + 1] = {};
DECLARE_ATTR_TB_INFO(bitset_tb);
const unsigned int before_len = strlen(before);
+   unsigned int prev = UINT_MAX - 1;
const struct nlattr *bits;
const struct nlattr *bit;
bool first = true;
-   int prev = -2;
bool nomask;
int ret;
 
@@ -333,7 +333,7 @@ int dump_link_modes(struct nl_context *nlctx, const struct 
nlattr *bitset,
if (first)
first = false;
/* ugly hack to preserve old output format */
-   if (class == LM_CLASS_REAL && (prev == idx - 1) &&
+   if (class == LM_CLASS_REAL && (idx == prev + 1) &&
prev < link_modes_count &&
link_modes[prev].class == LM_CLASS_REAL &&
link_modes[prev].duplex == DUPLEX_HALF)
@@ -375,7 +375,7 @@ int dump_link_modes(struct nl_context *nlctx, const struct 
nlattr *bitset,
first = fals

[PATCH ethtool 6/7] ioctl: convert cmdline_info arrays to named initializers

2020-08-09 Thread Michal Kubecek
To get rid of remaining "missing field initializer" compiler warnings,
convert arrays of struct cmdline_info used for command line parser to
named initializers. This also makes the initializers easier to read.

This commit should have no effect on resulting code (checked with gcc-11
and -O2).

Signed-off-by: Michal Kubecek 
---
 ethtool.c | 378 ++
 1 file changed, 296 insertions(+), 82 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index d9dcd0448c02..40868d064e28 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1825,10 +1825,24 @@ static int do_spause(struct cmd_context *ctx)
int pause_rx_wanted = -1;
int pause_tx_wanted = -1;
struct cmdline_info cmdline_pause[] = {
-   { "autoneg", CMDL_BOOL, &pause_autoneg_wanted,
- &epause.autoneg },
-   { "rx", CMDL_BOOL, &pause_rx_wanted, &epause.rx_pause },
-   { "tx", CMDL_BOOL, &pause_tx_wanted, &epause.tx_pause },
+   {
+   .name   = "autoneg",
+   .type   = CMDL_BOOL,
+   .wanted_val = &pause_autoneg_wanted,
+   .ioctl_val  = &epause.autoneg,
+   },
+   {
+   .name   = "rx",
+   .type   = CMDL_BOOL,
+   .wanted_val = &pause_rx_wanted,
+   .ioctl_val  = &epause.rx_pause,
+   },
+   {
+   .name   = "tx",
+   .type   = CMDL_BOOL,
+   .wanted_val = &pause_tx_wanted,
+   .ioctl_val  = &epause.tx_pause,
+   },
};
int err, changed = 0;
 
@@ -1868,12 +1882,30 @@ static int do_sring(struct cmd_context *ctx)
s32 ring_rx_jumbo_wanted = -1;
s32 ring_tx_wanted = -1;
struct cmdline_info cmdline_ring[] = {
-   { "rx", CMDL_S32, &ring_rx_wanted, &ering.rx_pending },
-   { "rx-mini", CMDL_S32, &ring_rx_mini_wanted,
- &ering.rx_mini_pending },
-   { "rx-jumbo", CMDL_S32, &ring_rx_jumbo_wanted,
- &ering.rx_jumbo_pending },
-   { "tx", CMDL_S32, &ring_tx_wanted, &ering.tx_pending },
+   {
+   .name   = "rx",
+   .type   = CMDL_S32,
+   .wanted_val = &ring_rx_wanted,
+   .ioctl_val  = &ering.rx_pending,
+   },
+   {
+   .name   = "rx-mini",
+   .type   = CMDL_S32,
+   .wanted_val = &ring_rx_mini_wanted,
+   .ioctl_val  = &ering.rx_mini_pending,
+   },
+   {
+   .name   = "rx-jumbo",
+   .type   = CMDL_S32,
+   .wanted_val = &ring_rx_jumbo_wanted,
+   .ioctl_val  = &ering.rx_jumbo_pending,
+   },
+   {
+   .name   = "tx",
+   .type   = CMDL_S32,
+   .wanted_val = &ring_tx_wanted,
+   .ioctl_val  = &ering.tx_pending,
+   },
};
int err, changed = 0;
 
@@ -1937,12 +1969,30 @@ static int do_schannels(struct cmd_context *ctx)
s32 channels_other_wanted = -1;
s32 channels_combined_wanted = -1;
struct cmdline_info cmdline_channels[] = {
-   { "rx", CMDL_S32, &channels_rx_wanted, &echannels.rx_count },
-   { "tx", CMDL_S32, &channels_tx_wanted, &echannels.tx_count },
-   { "other", CMDL_S32, &channels_other_wanted,
- &echannels.other_count },
-   { "combined", CMDL_S32, &channels_combined_wanted,
- &echannels.combined_count },
+   {
+   .name   = "rx",
+   .type   = CMDL_S32,
+   .wanted_val = &channels_rx_wanted,
+   .ioctl_val  = &echannels.rx_count,
+   },
+   {
+   .name   = "tx",
+   .type   = CMDL_S32,
+   .wanted_val = &channels_tx_wanted,
+   .ioctl_val  = &echannels.tx_count,
+   },
+   {
+ 

Re: [PATCH ethtool 1/7] netlink: get rid of signed/unsigned comparison warnings

2020-08-11 Thread Michal Kubecek
On Mon, Aug 10, 2020 at 04:11:22PM +0200, Andrew Lunn wrote:
> On Sun, Aug 09, 2020 at 11:24:19PM +0200, Michal Kubecek wrote:
> > Get rid of compiler warnings about comparison between signed and
> > unsigned integer values in netlink code.
> > 
> > Signed-off-by: Michal Kubecek 
> > ---
> >  netlink/features.c | 4 ++--
> >  netlink/netlink.c  | 4 ++--
> >  netlink/netlink.h  | 2 +-
> >  netlink/nlsock.c   | 2 +-
> >  netlink/parser.c   | 2 +-
> >  netlink/settings.c | 6 +++---
> >  6 files changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git a/netlink/features.c b/netlink/features.c
> > index 8b5b8588ca23..f5862e97a265 100644
> > --- a/netlink/features.c
> > +++ b/netlink/features.c
> > @@ -149,7 +149,7 @@ int dump_features(const struct nlattr *const *tb,
> > continue;
> >  
> > for (j = 0; j < results.count; j++) {
> > -   if (feature_flags[j] == i) {
> > +   if (feature_flags[j] == (int)i) {
> > n_match++;
> > flag_value = flag_value ||
> > feature_on(results.active, j);
> > @@ -163,7 +163,7 @@ int dump_features(const struct nlattr *const *tb,
> > for (j = 0; j < results.count; j++) {
> > const char *name = get_string(feature_names, j);
> >  
> > -   if (feature_flags[j] != i)
> > +   if (feature_flags[j] != (int)i)
> 
> Hi Michal
> 
> Would it be better to make feature_flags an unsigned int * ? And
> change the -1 to MAX_UNIT?

It certainly would. I was actually thinking about this solution for
a moment but then I managed to mistake feature_flags with off_flag_def
and convinced myself that it's shared with ioctl code so that changing
its type would require changes there as well. Thank you for pointing
this out.

Michal


  1   2   3   4   5   6   7   >