Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
On Fri, 1 Jun 2001, Bogdan Costescu wrote:

> On Fri, 1 Jun 2001, jamal wrote:
>
> > One idea I have been toying with is to maintain hysteresis or a threshold of
> > some form in dev_watchdog;
>
> AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please
> correct me!). So how do you sense link loss if you expect only high Rx
> traffic?

Good question. Makes me think. Thoughts further below.

> > example: if the watchdog timer expires threshold times, you declare the link
> > dead and send a netif_carrier_off netlink message.
> > On recovery, you send netif_carrier_on
>
> I assume that you mean "on recovery" as in "first successful hard_start_xmit".

Right.

> > Assumption:
> > If the tx path is blocked, more than likely the link is down.
>
> Yes, but is this a good approximation? I'm not saying that it's not, I'm
> merely asking for counter-arguments.

It is an indirect approximation. Note that if the traffic is very
asymmetrical, as in the case you pointed out, notification will take a long,
long time. You need a plan B. Still, the Tx watchdogs are a good source of
fault detection when MII link detection is unavailable, and even when it is
present.

I hate making this more complex than it should be. Since we already have a
messaging system within the kernel and between user and kernel space (aka
"netlink"), one could easily add a protocol in user space which "dynamically
heartbeats" the devices. Control should come from user space; it would be a
great idea to avoid ioctls. "Dynamic" in the above sense means trying to
totally avoid making it a synchronous poll. The poll rate is a function of
how many packets go out of that device per average measurement time.
Basically, the period at which the user space app sends "hello" netlink
packets to the kernel is a variable.
cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
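A minimal sketch of the "dynamic heartbeat" period in user-space C. The thread does not say what function of the Tx rate to use; this sketch assumes the heartbeat should run fast when the device sends little (an idle or Rx-heavy link gives the Tx watchdog nothing to go on) and back off when Tx traffic is heavy. The constants, `struct hb_state` and `hb_next_period()` are hypothetical; the tx_packets counter would be sampled from the device statistics (for example `/proc/net/dev`), and the netlink "hello" message itself is not shown.

```c
#include <stdint.h>

/* Hypothetical tuning knobs, not taken from the thread. */
#define HB_MIN_PERIOD_MS   100u   /* fastest heartbeat: device sends almost nothing */
#define HB_MAX_PERIOD_MS 10000u   /* slowest heartbeat: Tx traffic already proves the link */
#define HB_BUSY_PPS       1000u   /* smoothed Tx rate at which we fully back off */

/* Smoothed Tx packet rate for one device. */
struct hb_state {
    double avg_pps;      /* exponentially weighted average, packets/second */
    uint64_t last_tx;    /* tx_packets counter at the previous sample */
};

/* Feed the current tx_packets counter, sampled interval_ms after the previous
 * sample; returns the period (ms) to wait before the next heartbeat. */
unsigned hb_next_period(struct hb_state *s, uint64_t tx_packets, unsigned interval_ms)
{
    uint64_t delta = tx_packets - s->last_tx;
    double pps = interval_ms ? (double)delta * 1000.0 / interval_ms : 0.0;

    s->last_tx = tx_packets;
    s->avg_pps = 0.75 * s->avg_pps + 0.25 * pps;   /* EWMA smoothing */

    if (s->avg_pps >= HB_BUSY_PPS)
        return HB_MAX_PERIOD_MS;

    /* Linear interpolation between the fastest and slowest period. */
    double frac = s->avg_pps / HB_BUSY_PPS;
    return HB_MIN_PERIOD_MS + (unsigned)(frac * (HB_MAX_PERIOD_MS - HB_MIN_PERIOD_MS));
}
```

Sleeping for whatever `hb_next_period()` returns keeps the prober from becoming a fixed-rate synchronous poll. Initialise `last_tx` from the counter's current value so the first sample is not mistaken for a traffic burst.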
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
On Fri, 1 Jun 2001, jamal wrote:

> Jeff, Thanks for copying netdev. Wish more people would do that.

Shame on me, I should have thought of that too... I joined lkml only about
2 weeks ago because netdev-related topics are sometimes discussed only there...

> Not really.
>
> One idea I have been toying with is to maintain hysteresis or a threshold of
> some form in dev_watchdog;

AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please
correct me!). So how do you sense link loss if you expect only high Rx
traffic?

> example: if the watchdog timer expires threshold times, you declare the link
> dead and send a netif_carrier_off netlink message.
> On recovery, you send netif_carrier_on

I assume that you mean "on recovery" as in "first successful hard_start_xmit".

> Assumption:
> If the tx path is blocked, more than likely the link is down.

Yes, but is this a good approximation? I'm not saying that it's not, I'm
merely asking for counter-arguments.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
Jeff,

Thanks for copying netdev. Wish more people would do that.

On Fri, 1 Jun 2001, Bogdan Costescu wrote:

> On Fri, 1 Jun 2001, Jeff Garzik wrote:
>
> > The loss and regain of link status should be proactively signalled to
> > userspace using netlink or something similar.
>
> [ For the general discussion ]
> I fully agree, but I just wanted to give an example of legit use from
> user space of _current_ values from hardware.
>
> > Currently we have
> > netif_carrier_{on,off,ok} but it is only passively checked.
> > netif_carrier_{on,off} should probably schedule_task() to fire off a
> > netlink message...
>
> [ Link status details ]
> Just that not all NICs have hardware support (and/or not all drivers use
> these facilities) for link status change notification using interrupts.
> Right now, most drivers _poll_ for media status and based on the poll
> rate, netif_carrier routines are (or should be) called. We can't make the
> poll interval very short for the general case, as MII access is time
> consuming (same discussion was some months ago when the bonding driver
> was updated). However, for users who know that they need this info to be
> more accurate (at the expense of CPU time), polling through ioctls is the
> only solution.

Not really.

One idea I have been toying with is to maintain hysteresis or a threshold of
some form in dev_watchdog. Example: if the watchdog timer expires threshold
times, you declare the link dead and send a netif_carrier_off netlink
message. On recovery, you send netif_carrier_on.

Assumption:
If the tx path is blocked, more than likely the link is down.

cheers,
jamal
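The threshold idea can be sketched as a small state machine. This is a user-space simulation under stated assumptions, not kernel code: `WD_THRESHOLD`, `struct wd_link`, `wd_on_timeout()` and `wd_on_tx_ok()` are hypothetical names. In a driver the two transitions would correspond to calling `netif_carrier_off()` / `netif_carrier_on()` and emitting the netlink notification, which the sketch only marks in comments. "Recovery" is taken to mean the first successful hard_start_xmit.

```c
#include <stdbool.h>

#define WD_THRESHOLD 3   /* hypothetical: consecutive Tx timeouts before declaring the link dead */

struct wd_link {
    unsigned timeouts;   /* consecutive watchdog expirations since the last good transmit */
    bool carrier;        /* carrier state as currently believed */
};

/* Tx watchdog fired. Returns true only on the transition to "carrier off". */
bool wd_on_timeout(struct wd_link *l)
{
    if (l->timeouts < WD_THRESHOLD)
        l->timeouts++;
    if (l->carrier && l->timeouts >= WD_THRESHOLD) {
        l->carrier = false;   /* driver: netif_carrier_off() + netlink message */
        return true;
    }
    return false;
}

/* A transmit succeeded. Returns true only on the transition to "carrier on". */
bool wd_on_tx_ok(struct wd_link *l)
{
    l->timeouts = 0;          /* any successful transmit resets the hysteresis */
    if (!l->carrier) {
        l->carrier = true;    /* driver: netif_carrier_on() + netlink message */
        return true;
    }
    return false;
}
```

Isolated timeouts never drop the carrier; only `WD_THRESHOLD` consecutive expirations do. As pointed out in the thread, a link carrying mostly Rx traffic gives this scheme no signal at all.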
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
On Fri, 1 Jun 2001, David S. Miller wrote:

> Don't such HA apps need to run as root anyways?

Not necessarily, but you could let root (CAP_NET_ADMIN, anyway) go through
without any limitations; root can bring down the system at will in other
ways. In addition, the rate limiting solution allows a warning to be issued
when the limit is exceeded, so that the poor sysadmin knows what hit him 8-)

--
Bogdan Costescu
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
On Fri, 1 Jun 2001, Jeff Garzik wrote:

> The loss and regain of link status should be proactively signalled to
> userspace using netlink or something similar.

[ For the general discussion ]
I fully agree, but I just wanted to give an example of legit use from user
space of _current_ values from hardware.

> Currently we have
> netif_carrier_{on,off,ok} but it is only passively checked.
> netif_carrier_{on,off} should probably schedule_task() to fire off a
> netlink message...

[ Link status details ]
Just that not all NICs have hardware support (and/or not all drivers use
these facilities) for link status change notification using interrupts.
Right now, most drivers _poll_ for media status and based on the poll rate,
netif_carrier routines are (or should be) called. We can't make the poll
interval very short for the general case, as MII access is time consuming
(same discussion was some months ago when the bonding driver was updated).
However, for users who know that they need this info to be more accurate
(at the expense of CPU time), polling through ioctls is the only solution.

[ Back to general discussion ]
So far, 2 solutions were proposed to the problem of too-frequent access to
hardware:

1. Cache the values. You can then let the user shoot him-/herself in the
   foot by making too many ioctl calls. But this prevents any legit use of
   the current hardware state.
2. Rate limiting. You don't let the user access the hardware too often (to
   be defined), so he/she can't shoot him-/herself in the foot. Legit use of
   the current hardware state is possible.

IMHO, solution 2 is much better. Can you find situations when it's not?
--
Bogdan Costescu
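Solution 2 might look like the following user-space sketch, assuming the caller supplies a millisecond clock and that over-frequent requests are refused rather than delayed. It also folds in the refinement from the reply to David Miller: privileged callers (CAP_NET_ADMIN) bypass the limit, and a warning is printed the first time a request is refused. `MII_MIN_INTERVAL_MS`, `struct mii_limiter` and `mii_rate_check()` are hypothetical; inside the kernel the check would be based on jiffies and would return an error code such as -EBUSY to the ioctl caller.

```c
#include <stdbool.h>
#include <stdio.h>

#define MII_MIN_INTERVAL_MS 1000u  /* hypothetical: at most one unprivileged MII read per second */

struct mii_limiter {
    unsigned long last_ms;   /* time of the last permitted unprivileged access */
    bool have_last;          /* false until the first permitted access */
    unsigned long denied;    /* refused requests, used to warn exactly once */
};

/* Returns 0 if the caller may touch the hardware now, -1 if rate-limited. */
int mii_rate_check(struct mii_limiter *rl, unsigned long now_ms, bool privileged)
{
    if (privileged)          /* e.g. CAP_NET_ADMIN: never limited */
        return 0;

    if (rl->have_last && now_ms - rl->last_ms < MII_MIN_INTERVAL_MS) {
        if (rl->denied++ == 0)   /* first refusal: tell the sysadmin what hit him */
            fprintf(stderr, "mii: unprivileged link queries are being rate-limited\n");
        return -1;
    }

    rl->last_ms = now_ms;
    rl->have_last = true;
    return 0;
}
```

Unlike a cache, a caller that respects the interval always gets a value read from the hardware at the moment it asked.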
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
Jeff Garzik writes:
> For your HA application specifically, right now, I would suggest making
> sure your net driver calls netif_carrier_xxx correctly, then checking
> for IFF_RUNNING interface flag. IFF_RUNNING will disappear if the
> interface is up, but there is no carrier [as according to
> netif_carrier_ok].

Don't such HA apps need to run as root anyways?

Regardless, I agree that, long term, the way to do this is via netlink.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
Bogdan Costescu wrote:
> No way! If I implement a HA application which depends on link status, I
> want the info to be accurate, I don't want to know that 30 seconds ago I
> had good link.

To go off on a tangent a little bit, and add netdev to the CC...

The loss and regain of link status should be proactively signalled to
userspace using netlink or something similar. Currently we have
netif_carrier_{on,off,ok} but it is only passively checked.
netif_carrier_{on,off} should probably schedule_task() to fire off a
netlink message...

For your HA application specifically, right now, I would suggest making
sure your net driver calls netif_carrier_xxx correctly, then checking for
the IFF_RUNNING interface flag. IFF_RUNNING will disappear if the interface
is up but there is no carrier [according to netif_carrier_ok].

--
Jeff Garzik      | Disbelief, that's why you fail.
Building 1024    |
MandrakeSoft     |
Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis
On Fri, 1 Jun 2001, Alan Cox wrote:

> I am sure that to an unprivileged application reporting back the same result
> as we saw last time we asked the hardware unless it is over 30 seconds old
> will work fine. Maybe 10 for link partner?

No way! If I implement a HA application which depends on link status, I
want the info to be accurate; I don't want to know that 30 seconds ago I
had good link. IMHO, rate limiting is the only solution.

--
Bogdan Costescu
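For comparison, Alan's caching proposal might be sketched like this. The 30 s and 10 s maximum ages come from his message; `struct cached_reg`, `cache_can_serve()` and `cache_store()` are hypothetical names, the caller supplies a millisecond clock, and privileged callers are assumed to always read the hardware. The sketch makes Bogdan's objection concrete: an unprivileged reader may be handed a link state up to `LINK_STATUS_MAX_AGE_MS` old.

```c
#include <stdbool.h>

/* Maximum ages from Alan's suggestion: 30 s for link status, maybe 10 s for link partner. */
#define LINK_STATUS_MAX_AGE_MS  30000UL
#define LINK_PARTNER_MAX_AGE_MS 10000UL

/* One cached hardware value (e.g. an MII register) and when it was read. */
struct cached_reg {
    unsigned value;
    unsigned long stamp_ms;
    bool valid;
};

/* True if the cached value may be returned instead of touching the hardware. */
bool cache_can_serve(const struct cached_reg *c, unsigned long now_ms,
                     unsigned long max_age_ms, bool privileged)
{
    if (privileged || !c->valid)
        return false;               /* privileged caller, or nothing cached yet: read hardware */
    return now_ms - c->stamp_ms <= max_age_ms;
}

/* Record a value just read from the hardware. */
void cache_store(struct cached_reg *c, unsigned value, unsigned long now_ms)
{
    c->value = value;
    c->stamp_ms = now_ms;
    c->valid = true;
}
```

When `cache_can_serve()` returns false, the caller reads the register and passes the result to `cache_store()`; otherwise it returns `c->value`, however stale within the allowed age.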