Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-27 Thread David Miller
From: Andrew Lunn 
Date: Mon, 26 Jun 2017 15:34:39 +0200

> I still fear this is going to be an ethtool call with only one user.

That is my fear as well.

We are also in a sort-of moratorium for adding new major ethtool
features until the conversion of ethtool over to netlink occurs.


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Gal Pressman

> On Wed, Jun 21, 2017 at 04:04:44PM +0300, Gal Pressman wrote:
>> Currently, drivers can only tell whether the link is up/down through
>> ETHTOOL_GLINK callback, but no additional information is given in case
>> of link down.
>> This patch provides an infrastructure that allows drivers to hint
>> the user with the reason why the link is down, in order to ease the
>> debug process.
>>
>> Reasons are separated to two types, generic and vendor specific.
>> Drivers can reply with a generic reason using the enums provided in this
>> patch (and the ones that will be added in the future), which will be
>> translated to strings by the userspace ethtool.
>> In case of a vendor specific reason (not suitable for most vendors),
>> drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
>> vendor_reason field should be filled with a vendor specific status code
>> which will be parsed by the vendor specific userspace parser if one is
>> available.
>>
>> This kind of information can save system administrators precious time
>> which will not be wasted trying to understand why the link won't go
>> up.
>>
>> For example, when the cable is unplugged:
>> $ ethtool ethXX
>> ...
>> Link detected: no (Cable unplugged)
>>
>> Signed-off-by: Gal Pressman 
>> Signed-off-by: Saeed Mahameed 
>> ---
>>  include/linux/ethtool.h  |  2 ++
>>  include/uapi/linux/ethtool.h | 33 +
>>  net/core/ethtool.c   | 24 
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index 83cc986..d472047 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -374,5 +374,7 @@ struct ethtool_ops {
>>struct ethtool_link_ksettings *);
>>  int (*set_link_ksettings)(struct net_device *,
>>const struct ethtool_link_ksettings *);
>> +int (*get_link_down_reason)(struct net_device *,
>> +struct ethtool_link_down_reason *);
>>  };
>>  #endif /* _LINUX_ETHTOOL_H */
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index 7d4a594..8cf9d2c 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -550,6 +550,13 @@ struct ethtool_pauseparam {
>>  
>>  #define ETH_GSTRING_LEN 32
>>  
>> +struct ethtool_link_down_reason {
>> +__u32   cmd;
>> +__u32   reason;
>> +__u32   vendor_reason;
>> +__u32   reserved[4];
>> +};
> Shouldn't this be a list? The device is over its power budget,
> overheating and does not have a cable plugged in. There can be
> multiple reasons it is down.
Current implementation will report one issue at a time, fix it to reveal the 
next one :).

Reporting more than one issue is possible, for example:
Add another callback to return the number of issues and allocate a buffer 
accordingly, sounds like an overkill to me.
Alternatively, use a bitmap instead of the enum, which will probably get too 
big very quickly.
I don't think it's worth it in this particular case since in most cases one 
reason should suffice.

>
>Andrew



Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Andrew Lunn
> The driver has a good intimate information of his device
> implementation, and hence an analysis done by the device vendor is
> favorable.

For a firmware based NIC, maybe. For a discrete NIC, making use of all
the Linux subsystems, this is going to be hard from within the kernel.

> The driver provider can perform the analysis inside the device (firmware) or 
> in the driver according to his preferences.
> We believe that since devices are becoming smarter, more analysis will be 
> done by the device itself, which has more
> information and faster access.
> Smart NICs/SoCs are very popular this days and this API takes into account 
> the different architectures.
> 
> Since this callback is optional, a user space analysis tool can be added in 
> the future providing more generic analysis approach for
> unsupported devices.

I still fear this is going to be an ethtool call with only one user.

  Andrew


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Andrew Lunn
On Wed, Jun 21, 2017 at 04:04:44PM +0300, Gal Pressman wrote:
> Currently, drivers can only tell whether the link is up/down through
> ETHTOOL_GLINK callback, but no additional information is given in case
> of link down.
> This patch provides an infrastructure that allows drivers to hint
> the user with the reason why the link is down, in order to ease the
> debug process.
> 
> Reasons are separated to two types, generic and vendor specific.
> Drivers can reply with a generic reason using the enums provided in this
> patch (and the ones that will be added in the future), which will be
> translated to strings by the userspace ethtool.
> In case of a vendor specific reason (not suitable for most vendors),
> drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
> vendor_reason field should be filled with a vendor specific status code
> which will be parsed by the vendor specific userspace parser if one is
> available.
> 
> This kind of information can save system administrators precious time
> which will not be wasted trying to understand why the link won't go
> up.
> 
> For example, when the cable is unplugged:
> $ ethtool ethXX
> ...
> Link detected: no (Cable unplugged)
> 
> Signed-off-by: Gal Pressman 
> Signed-off-by: Saeed Mahameed 
> ---
>  include/linux/ethtool.h  |  2 ++
>  include/uapi/linux/ethtool.h | 33 +
>  net/core/ethtool.c   | 24 
>  3 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 83cc986..d472047 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -374,5 +374,7 @@ struct ethtool_ops {
> struct ethtool_link_ksettings *);
>   int (*set_link_ksettings)(struct net_device *,
> const struct ethtool_link_ksettings *);
> + int (*get_link_down_reason)(struct net_device *,
> + struct ethtool_link_down_reason *);
>  };
>  #endif /* _LINUX_ETHTOOL_H */
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 7d4a594..8cf9d2c 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -550,6 +550,13 @@ struct ethtool_pauseparam {
>  
>  #define ETH_GSTRING_LEN  32
>  
> +struct ethtool_link_down_reason {
> + __u32   cmd;
> + __u32   reason;
> + __u32   vendor_reason;
> + __u32   reserved[4];
> +};

Shouldn't this be a list? The device is over its power budget,
overheating and does not have a cable plugged in. There can be
multiple reasons it is down.

 Andrew


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-26 Thread Gal Pressman

> On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
>>> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>> +enum {
>> +ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue 
>> provided in vendor_reason */
>> +ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>> +ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> I think OTHER would be better that UNKNOWN. 
 Fine with me.
>> +ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>> +ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> These two are interesting. We have that information already. Why do we
> want it again?
 My goal is to gather all link issue reasons in one place.
>>> I'm actually wondering if this is a user space problem. Nearly
>>> everything you list is already available. Some you get from ip link,
>>> others from ethtool or ethtool --module-info, including I2C bus
>>> error, since you would expect EIO or ETIMEOUT.
>>>
>>> If you were to write a user space tool using the information what is
>>> currently available, what would be missing?
>>>
>>>   Andrew
>> I think most of the reasons in this list would be missing.
>> Auto negotiation failure,
> You can probably get that from the PHY layer. You get both the local
> and remote AN capabilities.
>
>> unplugged, over temperature, power budget exceeded..
> Don't you get over temperature from the SFF data? Also power budget?
You are right, but it depends on other resources that might fail such as BUS 
failure, invalid EEPROM, etc..

> And what does cable unplugged actually mean? Do you have a micro
> switch inside the socket? So that is maybe a gpio-key?
No, some hardware devices can sense this state.
We would like to expose this information when it's available.

> Another thing to remember is that your device is the exception to the
> rule. You have some firmware doing a lot of the work bringing this all
> together. But nearly every other Ethernet interface has a discrete MAC
> and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
> temperature sensor, etc. How does your call work in this normal
> situation? How do you make calls into all these subsystems to get the
> information? You want a generic solution which can be made to work for
> everybody.
The driver has a good intimate information of his device implementation, and 
hence an analysis done by the device vendor is favorable.
The driver provider can perform the analysis inside the device (firmware) or in 
the driver according to his preferences.
We believe that since devices are becoming smarter, more analysis will be done 
by the device itself, which has more
information and faster access.
Smart NICs/SoCs are very popular this days and this API takes into account the 
different architectures.

Since this callback is optional, a user space analysis tool can be added in the 
future providing more generic analysis approach for
unsupported devices.

>   Andrew
>
>> Keep in mind that this is just an initial list, not to mention the vendor 
>> reasons which are not part of any existing API.
>> I don't see how a user space tool that expects ETIMEOUT/EIO is better than 
>> this suggestion.



Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-25 Thread Andrew Lunn
On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
> 
> > On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>  +enum {
>  +ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue 
>  provided in vendor_reason */
>  +ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>  +ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> >>> I think OTHER would be better that UNKNOWN. 
> >> Fine with me.
>  +ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>  +ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> >>> These two are interesting. We have that information already. Why do we
> >>> want it again?
> >> My goal is to gather all link issue reasons in one place.
> > I'm actually wondering if this is a user space problem. Nearly
> > everything you list is already available. Some you get from ip link,
> > others from ethtool or ethtool --module-info, including I2C bus
> > error, since you would expect EIO or ETIMEOUT.
> >
> > If you were to write a user space tool using the information what is
> > currently available, what would be missing?
> >
> >   Andrew
> 
> I think most of the reasons in this list would be missing.
> Auto negotiation failure,

You can probably get that from the PHY layer. You get both the local
and remote AN capabilities.

> unplugged, over temperature, power budget exceeded..

Don't you get over temperature from the SFF data? Also power budget?

And what does cable unplugged actually mean? Do you have a micro
switch inside the socket? So that is maybe a gpio-key?

Another thing to remember is that your device is the exception to the
rule. You have some firmware doing a lot of the work bringing this all
together. But nearly every other Ethernet interface has a discrete MAC
and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
temperature sensor, etc. How does your call work in this normal
situation? How do you make calls into all these subsystems to get the
information? You want a generic solution which can be made to work for
everybody.

Andrew




> 
> Keep in mind that this is just an initial list, not to mention the vendor 
> reasons which are not part of any existing API.
> I don't see how a user space tool that expects ETIMEOUT/EIO is better than 
> this suggestion.


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-25 Thread Gal Pressman

> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
 +enum {
 +  ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
 vendor_reason */
 +  ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
 +  ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>> I think OTHER would be better that UNKNOWN. 
>> Fine with me.
 +  ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
 +  ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>> These two are interesting. We have that information already. Why do we
>>> want it again?
>> My goal is to gather all link issue reasons in one place.
> I'm actually wondering if this is a user space problem. Nearly
> everything you list is already available. Some you get from ip link,
> others from ethtool or ethtool --module-info, including I2C bus
> error, since you would expect EIO or ETIMEOUT.
>
> If you were to write a user space tool using the information what is
> currently available, what would be missing?
>
> Andrew

I think most of the reasons in this list would be missing.
Auto negotiation failure, link training failure, remote fault indication, bad 
signal integrity, cable protocol mismatch, cable unplugged,
over temperature, power budget exceeded..

Keep in mind that this is just an initial list, not to mention the vendor 
reasons which are not part of any existing API.
I don't see how a user space tool that expects ETIMEOUT/EIO is better than this 
suggestion.


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-24 Thread Andrew Lunn
On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
> 
> >> +enum {
> >> +  ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
> >> vendor_reason */
> >> +  ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> >> +  ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> > I think OTHER would be better that UNKNOWN. 
> 
> Fine with me.
> >> +  ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> >> +  ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> > These two are interesting. We have that information already. Why do we
> > want it again?
> 
> My goal is to gather all link issue reasons in one place.

I'm actually wondering if this is a user space problem. Nearly
everything you list is already available. Some you get from ip link,
others from ethtool or ethtool --module-info, including I2C bus
error, since you would expect EIO or ETIMEOUT.

If you were to write a user space tool using the information what is
currently available, what would be missing?

  Andrew


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-23 Thread Andrew Lunn
On Fri, Jun 23, 2017 at 11:23:17AM +0300, Gal Pressman wrote:
> >> The I2C bus that's connected to this module (interface).
> >> We can add another reason for MDIO BUS errors or merge to one BUS error 
> >> reason.
>  +ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> >>> Which EEPROM?
> >> Module EEPROM.
> > Which module? This is all very vague. Some of the Marvell 10G PHYs
> > have an EEPROM to boot from, for example. Would that count? Or are you
> > talking about the SFP 'EEPROM', which is not actually an EEPROM, in
> > that it is not Electrically Erasable, not is it a ROM, since things
> > like temperature changes with time.
> 

> I am referring to the optical/electrical module EEPROM which is
> exposed through standard interface such as SFF 8472. Might not be an
> actual EEPROM but that's how the SFF committee decided to refer to
> it :).

Right, so at a minimum, put a comment: The following properties
referring to the optical/electrical module EEPROM which is exposed
through standard interface such as SFF 8472.

That makes it a lot less ambiguous.

> 
> >
>  +ETHTOOL_LINK_OVERTEMP, /* Over temperature */
>  +ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>  +ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> >>> It seems like these last 6 are all SFP issues? How about putting SFP
> >>> into the name?
> >> Might be a QSFP issue for example, we can put module in the name though.
> > What is the generic name of SFP, SFP+ QSFP, SFF?
> 
> AFAIK, the name is module.

And as a term, module is overloaded. If the standard is called SFF
8472, then i would suggest putting SFF in these macros.

This is all assuming we actually decide to expose this information
this way

 Andrew


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-23 Thread Gal Pressman
>> The I2C bus that's connected to this module (interface).
>> We can add another reason for MDIO BUS errors or merge to one BUS error 
>> reason.
 +  ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
>>> Which EEPROM?
>> Module EEPROM.
> Which module? This is all very vague. Some of the Marvell 10G PHYs
> have an EEPROM to boot from, for example. Would that count? Or are you
> talking about the SFP 'EEPROM', which is not actually an EEPROM, in
> that it is not Electrically Erasable, not is it a ROM, since things
> like temperature changes with time.

I am referring to the optical/electrical module EEPROM which is exposed through 
standard
interface such as SFF 8472. Might not be an actual EEPROM but that's how the 
SFF committee decided
to refer to it :).

>
 +  ETHTOOL_LINK_OVERTEMP, /* Over temperature */
 +  ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
 +  ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>>> It seems like these last 6 are all SFP issues? How about putting SFP
>>> into the name?
>> Might be a QSFP issue for example, we can put module in the name though.
> What is the generic name of SFP, SFP+ QSFP, SFF?

AFAIK, the name is module.
>
>  Andrew



Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-22 Thread Andrew Lunn
> The I2C bus that's connected to this module (interface).
> We can add another reason for MDIO BUS errors or merge to one BUS error 
> reason.
> >> +  ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> > Which EEPROM?
> 
> Module EEPROM.

Which module? This is all very vague. Some of the Marvell 10G PHYs
have an EEPROM to boot from, for example. Would that count? Or are you
talking about the SFP 'EEPROM', which is not actually an EEPROM, in
that it is not Electrically Erasable, not is it a ROM, since things
like temperature changes with time.

> >> +  ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> >> +  ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> >> +  ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> > It seems like these last 6 are all SFP issues? How about putting SFP
> > into the name?
> 
> Might be a QSFP issue for example, we can put module in the name though.

What is the generic name of SFP, SFP+ QSFP, SFF?

 Andrew


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-22 Thread Gal Pressman

>> +ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
>> +
>> +ETHTOOL_LINK_REASONS_COUNT
>> +};
> Any enumerated list is going to get changed too often.
> Could the API just return a string?

The motivation for the enumerated list is to make this API as generic as 
possible and compatible
with all ethernet drivers.

Returning a string is a good idea, maybe change the vendor specific opcode to a 
string?


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-22 Thread Gal Pressman

>> +enum {
>> +ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
>> vendor_reason */
>> +ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>> +ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> I think OTHER would be better that UNKNOWN. 

Fine with me.
>> +ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>> +ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> These two are interesting. We have that information already. Why do we
> want it again?

My goal is to gather all link issue reasons in one place.
I don't want the user to gather pieces of other reports just to get a sense of 
what's wrong.
>> +ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
>> +ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
>> +ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
>> +ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
>> +ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
>> +ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
> How does an internet error differ from an UNKNOWN?

Internal error means I know what happened but I can't tell you, your vendor 
support might be
able to provide a better analysis.
Unknown means that the driver doesn't have any useful information that can help 
in this case.
>> +ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
>> +ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
>> +ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
> Which I2C bus? What about MDIO BUS errors?

The I2C bus that's connected to this module (interface).
We can add another reason for MDIO BUS errors or merge to one BUS error reason.
>> +ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> Which EEPROM?

Module EEPROM.
>> +ETHTOOL_LINK_OVERTEMP, /* Over temperature */
>> +ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
>> +ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> It seems like these last 6 are all SFP issues? How about putting SFP
> into the name?

Might be a QSFP issue for example, we can put module in the name though.
>
>  Andrew



Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-21 Thread Stephen Hemminger
On Wed, 21 Jun 2017 16:04:44 +0300
Gal Pressman  wrote:

> +
> +enum {
> + ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
> vendor_reason */
> + ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> + ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
> + ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> + ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
> + ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
> + ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
> + ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
> + ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
> + ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
> + ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
> + ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
> + ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
> + ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
> + ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
> + ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> + ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> + ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
> +
> + ETHTOOL_LINK_REASONS_COUNT
> +};

Any enumerated list is going to get changed too often.
Could the API just return a string?


Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-21 Thread Andrew Lunn
> +enum {
> + ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
> vendor_reason */
> + ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
> + ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */

I think OTHER would be better that UNKNOWN. 

> + ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
> + ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */

These two are interesting. We have that information already. Why do we
want it again?

> + ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
> + ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
> + ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
> + ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
> + ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
> + ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */

How does an internet error differ from an UNKNOWN?

> + ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
> + ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
> + ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */

Which I2C bus? What about MDIO BUS errors?

> + ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */

Which EEPROM?

> + ETHTOOL_LINK_OVERTEMP, /* Over temperature */
> + ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
> + ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */

It seems like these last 6 are all SFP issues? How about putting SFP
into the name?

 Andrew


[RFC PATCH net-next 1/3] ethtool: Add link down reason callback

2017-06-21 Thread Gal Pressman
Currently, drivers can only tell whether the link is up/down through
ETHTOOL_GLINK callback, but no additional information is given in case
of link down.
This patch provides an infrastructure that allows drivers to hint
the user with the reason why the link is down, in order to ease the
debug process.

Reasons are separated to two types, generic and vendor specific.
Drivers can reply with a generic reason using the enums provided in this
patch (and the ones that will be added in the future), which will be
translated to strings by the userspace ethtool.
In case of a vendor specific reason (not suitable for most vendors),
drivers can reply with ETHTOOL_VENDOR_SPECIFIC reason, in this case the
vendor_reason field should be filled with a vendor specific status code
which will be parsed by the vendor specific userspace parser if one is
available.

This kind of information can save system administrators precious time
which will not be wasted trying to understand why the link won't go
up.

For example, when the cable is unplugged:
$ ethtool ethXX
...
Link detected: no (Cable unplugged)

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 include/linux/ethtool.h  |  2 ++
 include/uapi/linux/ethtool.h | 33 +
 net/core/ethtool.c   | 24 
 3 files changed, 59 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 83cc986..d472047 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -374,5 +374,7 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_link_down_reason)(struct net_device *,
+   struct ethtool_link_down_reason *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 7d4a594..8cf9d2c 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -550,6 +550,13 @@ struct ethtool_pauseparam {
 
 #define ETH_GSTRING_LEN32
 
+struct ethtool_link_down_reason {
+   __u32   cmd;
+   __u32   reason;
+   __u32   vendor_reason;
+   __u32   reserved[4];
+};
+
 /**
  * enum ethtool_stringset - string set ID
  * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
@@ -1331,6 +1338,8 @@ struct ethtool_per_queue_op {
 #define ETHTOOL_PHY_GTUNABLE   0x004e /* Get PHY tunable configuration */
 #define ETHTOOL_PHY_STUNABLE   0x004f /* Set PHY tunable configuration */
 
+#define ETHTOOL_GLINK_DOWN_RSN 0x0050 /* Get link down reason */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
 #define SPARC_ETH_SSET ETHTOOL_SSET
@@ -1766,4 +1775,28 @@ struct ethtool_link_settings {
 * __u32 map_lp_advertising[link_mode_masks_nwords];
 */
 };
+
+enum {
+   ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in 
vendor_reason */
+   ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
+   ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
+   ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
+   ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
+   ETHTOOL_LINK_AN_FAILED, /* Auto negotiation failed */
+   ETHTOOL_LINK_TRAINING_FAILED, /* Link training failed */
+   ETHTOOL_LINK_RMT_FAULT, /* Remote fault indication */
+   ETHTOOL_LINK_BAD_SIGNAL_INTEGRITY, /* Bad signal integrity */
+   ETHTOOL_LINK_CABLE_MISMATCH, /* Cable protocol mismatch */
+   ETHTOOL_LINK_INTERNAL_ERR, /* Internal error */
+   ETHTOOL_LINK_CABLE_UNPLUGGED, /* Cable unplugged */
+   ETHTOOL_LINK_UNSUPP_MODULE, /* Unsupported module */
+   ETHTOOL_LINK_I2C_BUS_ERR, /* I2C bus error */
+   ETHTOOL_LINK_UNSUPP_EEPROM, /* Unsupported EEPROM */
+   ETHTOOL_LINK_OVERTEMP, /* Over temperature */
+   ETHTOOL_LINK_PWR_BUDGET_EXC, /* Power budget exceeded */
+   ETHTOOL_LINK_MODULE_ADMIN_DOWN, /* Module admin down */
+
+   ETHTOOL_LINK_REASONS_COUNT
+};
+
 #endif /* _UAPI_LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03111a2..b818ad4 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2523,6 +2523,26 @@ static int set_phy_tunable(struct net_device *dev, void 
__user *useraddr)
return ret;
 }
 
+static int get_link_down_reason(struct net_device *dev, void __user *useraddr)
+{
+   struct ethtool_link_down_reason ldr;
+   int ret;
+
+   if (!dev->ethtool_ops->get_link_down_reason)
+   return -EOPNOTSUPP;
+
+   memset(, 0, sizeof(ldr));
+   ldr.cmd = ETHTOOL_GLINK_DOWN_RSN;
+   ret = dev->ethtool_ops->get_link_down_reason(dev, );
+   if (ret)
+   return ret;
+
+   if (copy_to_user(useraddr,