Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-02 Thread jamal



On Fri, 1 Jun 2001, Bogdan Costescu wrote:

> On Fri, 1 Jun 2001, jamal wrote:
>
> > One idea I have been toying with is to maintain hysteresis or a threshold of
> > some form in dev_watchdog;
>
> AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please
> correct me!). So how do you sense link loss if you expect only high Rx
> traffic?
>

Good question. Makes me think. Thoughts further below.

> > example: if the watchdog timer expires threshold times, you declare the link
> > dead and send a netif_carrier_off netlink message.
> > On recovery, you send netif_carrier_on.
>
> I assume that you mean "on recovery" as in "first successful hard_start_xmit".
>

right.

> > Assumption:
> > If the tx path is blocked, more than likely the link is down.
>
> Yes, but is this a good approximation? I'm not saying that it's not, I'm
> merely asking for counter-arguments.

It is an indirect approximation. Note that if the traffic is very
asymmetrical, as in the case you pointed out, notification will take a very
long time. You need a plan B. Still, the tx watchdogs are a good source of
fault detection when MII link detection is unavailable, and even when it
is present.

I hate making this more complex than it should be:

Since we already have a messaging system within the kernel and between
user and kernel space, aka "netlink", one could easily add a user-space
protocol which "dynamically heartbeats" the devices. Control should come
from user space; it would be a great idea to avoid ioctls.

"Dynamic" in the above sense means totally avoiding a synchronous poll.
The poll rate is a function of how many packets go out that device,
averaged over a measurement interval. Basically, the period at which the
user-space app sends "hello" netlink packets to the kernel is a variable.
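A minimal user-space sketch of how such a variable heartbeat period might be chosen. It assumes one reading of the paragraph above: a device that already sends many packets exercises its tx path (and tx watchdog) on its own and needs few extra probes, while an idle or Rx-only device is probed more often. The function name and the HB_* constants are hypothetical; the transmit counts would have to be sampled from somewhere, for example /sys/class/net/<iface>/statistics/tx_packets on current Linux kernels.

```c
#include <limits.h>

/* Hypothetical tuning constants, chosen only for illustration. */
#define HB_MIN_MS      100u  /* idle or Rx-only device: probe often        */
#define HB_MAX_MS     5000u  /* busy device: its own tx traffic is a probe */
#define HB_MS_PER_PPS   10u  /* extra delay per packet/s transmitted       */

/*
 * Return the delay before the next "hello" probe, given how many packets
 * the device transmitted during the last measurement window.
 */
unsigned heartbeat_interval_ms(unsigned long tx_delta, unsigned window_ms)
{
    unsigned long pps;

    if (window_ms == 0)
        return HB_MIN_MS;            /* no measurement yet: be cautious */
    if (tx_delta > ULONG_MAX / 1000ul)
        return HB_MAX_MS;            /* avoid overflow; clearly busy */

    pps = tx_delta * 1000ul / window_ms;
    if (pps >= (HB_MAX_MS - HB_MIN_MS) / HB_MS_PER_PPS)
        return HB_MAX_MS;            /* cap the period */
    return HB_MIN_MS + (unsigned)pps * HB_MS_PER_PPS;
}
```

With these made-up constants an idle link is probed ten times a second, while a link sending 100 packets/s is probed about every 1.1 s.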

cheers,
jamal


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread Bogdan Costescu

On Fri, 1 Jun 2001, jamal wrote:

> Jeff, thanks for copying netdev. I wish more people would do that.

Shame on me, I should have thought of that too... I joined lkml only about
2 weeks ago, because netdev-related topics are sometimes discussed only
there...

> Not really.
>
> One idea I have been toying with is to maintain hysteresis or a threshold of
> some form in dev_watchdog;

AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please
correct me!). So how do you sense link loss if you expect only high Rx
traffic?

> example: if the watchdog timer expires threshold times, you declare the link
> dead and send a netif_carrier_off netlink message.
> On recovery, you send netif_carrier_on.

I assume that you mean "on recovery" as in "first successful hard_start_xmit".

> Assumption:
> If the tx path is blocked, more than likely the link is down.

Yes, but is this a good approximation? I'm not saying that it's not, I'm
merely asking for counter-arguments.

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]





Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread jamal


Jeff, thanks for copying netdev. I wish more people would do that.

On Fri, 1 Jun 2001, Bogdan Costescu wrote:

> On Fri, 1 Jun 2001, Jeff Garzik wrote:
>
> > The loss and regain of link status should be proactively signalled to
> > userspace using netlink or something similar.
>
> [ For the general discussion ]
> I fully agree, but I just wanted to give an example of legitimate use from
> user space of _current_ values from the hardware.
>
> >  Currently we have
> > netif_carrier_{on,off,ok} but it is only passively checked.
> > netif_carrier_{on,off} should probably schedule_task() to fire off a
> > netlink message...
>
> [ Link status details ]
> The catch is that not all NICs have hardware support (and/or not all drivers
> use these facilities) for link status change notification via interrupts.
> Right now, most drivers _poll_ for media status and, based on the poll
> rate, the netif_carrier routines are (or should be) called. We can't make
> the poll interval very short in the general case, as MII access is
> time-consuming (the same discussion took place some months ago when the
> bonding driver was updated). However, for users who know that they need
> this info to be more accurate (at the expense of CPU time), polling
> through ioctls is the only solution.

Not really.

One idea I have been toying with is to maintain hysteresis or a threshold of
some form in dev_watchdog;
example: if the watchdog timer expires threshold times, you declare the link
dead and send a netif_carrier_off netlink message.
On recovery, you send netif_carrier_on.

Assumption:
If the tx path is blocked, more than likely the link is down.
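The threshold idea can be modelled as a small hysteresis state machine. This is a hypothetical user-space sketch, not kernel code: `on_watchdog_timeout()` stands in for a tx watchdog expiry and `on_tx_success()` for the first successful hard_start_xmit after an outage; a `true` return marks where a netif_carrier_off or netif_carrier_on notification would be sent.

```c
#include <stdbool.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct carrier_guess {
    unsigned timeouts;   /* consecutive tx watchdog expiries       */
    unsigned threshold;  /* expiries needed to declare link dead   */
    bool carrier;        /* current verdict                        */
};

/* Called on each watchdog expiry; true means "send carrier-off now". */
bool on_watchdog_timeout(struct carrier_guess *g)
{
    if (g->timeouts < g->threshold)
        g->timeouts++;               /* clamp: no growth while down */
    if (g->carrier && g->timeouts >= g->threshold) {
        g->carrier = false;          /* would notify netif_carrier_off */
        return true;
    }
    return false;
}

/* Called on a successful transmit; true means "send carrier-on now". */
bool on_tx_success(struct carrier_guess *g)
{
    g->timeouts = 0;                 /* any progress resets the count */
    if (!g->carrier) {
        g->carrier = true;           /* would notify netif_carrier_on */
        return true;
    }
    return false;
}
```

A single stray timeout never flips the verdict, while one successful transmit is enough to declare recovery; whether recovery should also require several successes is a tuning choice this sketch does not make.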

cheers,
jamal




Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread Bogdan Costescu

On Fri, 1 Jun 2001, David S. Miller wrote:

> Don't such HA apps need to run as root anyways?

Not necessarily, but you could let root (CAP_NET_ADMIN, anyway) go through
without any limitations; root can bring down the system at will in other
ways.

In addition, the rate limiting solution allows a warning to be issued when
the limit is exceeded, so that the poor sysadmin knows what hit him 8-)
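A sketch of that policy, with hypothetical names: privileged callers (standing in for a CAP_NET_ADMIN check) bypass the limit, unprivileged callers are refused when they come back too soon, and the first refusal logs a warning. Time is passed in explicitly in milliseconds so the logic can be exercised without real hardware.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter; "now" is milliseconds from any monotonic clock. */
struct hw_limiter {
    unsigned long min_gap_ms;  /* minimum spacing between hardware reads */
    unsigned long last_ms;     /* time of the last permitted read        */
    bool have_last;            /* false until the first read             */
    bool warned;               /* warn only once                         */
};

/* Return true if the caller may touch the hardware now. */
bool hw_access_allowed(struct hw_limiter *l, unsigned long now_ms,
                       bool privileged)
{
    if (!privileged && l->have_last &&
        now_ms - l->last_ms < l->min_gap_ms) {
        if (!l->warned) {      /* tell the sysadmin what hit him */
            fprintf(stderr, "link status query rate limit exceeded\n");
            l->warned = true;
        }
        return false;
    }
    /* privileged reads also count, since they hit the hardware too */
    l->last_ms = now_ms;
    l->have_last = true;
    return true;
}
```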

-- 
Bogdan Costescu




Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread Bogdan Costescu

On Fri, 1 Jun 2001, Jeff Garzik wrote:

> The loss and regain of link status should be proactively signalled to
> userspace using netlink or something similar.

[ For the general discussion ]
I fully agree, but I just wanted to give an example of legitimate use from
user space of _current_ values from the hardware.

>  Currently we have
> netif_carrier_{on,off,ok} but it is only passively checked.
> netif_carrier_{on,off} should probably schedule_task() to fire off a
> netlink message...

[ Link status details ]
The catch is that not all NICs have hardware support (and/or not all drivers
use these facilities) for link status change notification via interrupts.
Right now, most drivers _poll_ for media status and, based on the poll
rate, the netif_carrier routines are (or should be) called. We can't make
the poll interval very short in the general case, as MII access is
time-consuming (the same discussion took place some months ago when the
bonding driver was updated). However, for users who know that they need
this info to be more accurate (at the expense of CPU time), polling
through ioctls is the only solution.

[ Back to general discussion ]
So far, two solutions have been proposed to the problem of accessing the
hardware too often:
1. Cache the values. You can then let users shoot themselves in the foot
   by making too many ioctl calls. But this prevents any legitimate use of
   the current hardware state.
2. Rate limiting. You don't let users access the hardware too often (the
   limit is to be defined), so they can't shoot themselves in the foot.
   Legitimate use of the current hardware state is possible.
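To make the difference between the two proposals concrete, here is a hypothetical sketch; the names and the simulated `read_link_from_hw()` are illustrative, not taken from any driver. The cached query may return a value up to `max_age_ms` old; the rate-limited query either returns a fresh reading or refuses with -1, which is one possible interpretation of "don't let the user access the hardware too often".

```c
#include <stdbool.h>

/* Hypothetical stand-in for an expensive MII read; counts calls. */
static unsigned hw_reads;
static bool hw_link = true;

static bool read_link_from_hw(void)
{
    hw_reads++;
    return hw_link;
}

struct link_query {
    unsigned long last_ms;  /* time of the last hardware read */
    bool have_value;
    bool cached;            /* value obtained by that read     */
};

/* Solution 1: serve the cached value unless it is max_age_ms old. */
bool query_cached(struct link_query *q, unsigned long now_ms,
                  unsigned long max_age_ms)
{
    if (!q->have_value || now_ms - q->last_ms >= max_age_ms) {
        q->cached = read_link_from_hw();
        q->last_ms = now_ms;
        q->have_value = true;
    }
    return q->cached;       /* may be stale by up to max_age_ms */
}

/* Solution 2: read the hardware, or refuse (-1) if asked too soon. */
int query_rate_limited(struct link_query *q, unsigned long now_ms,
                       unsigned long min_gap_ms)
{
    if (q->have_value && now_ms - q->last_ms < min_gap_ms)
        return -1;          /* caller must retry later */
    q->cached = read_link_from_hw();
    q->last_ms = now_ms;
    q->have_value = true;
    return q->cached ? 1 : 0;   /* always a current reading */
}
```

Both bound the hardware access rate the same way; they differ only in whether an over-eager caller gets old data or no data.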

IMHO, solution 2 is much better. Can you find situations where it's not?

-- 
Bogdan Costescu




Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread David S. Miller


Jeff Garzik writes:
 > For your HA application specifically, right now, I would suggest making
 > sure your net driver calls netif_carrier_xxx correctly, then checking
 > for the IFF_RUNNING interface flag.  IFF_RUNNING will disappear if the
 > interface is up but there is no carrier [according to
 > netif_carrier_ok].

Don't such HA apps need to run as root anyways?

Regardless, I agree that, long term, the way to do this is via netlink.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread Jeff Garzik

Bogdan Costescu wrote:
> No way! If I implement an HA application which depends on link status, I
> want the info to be accurate; I don't want to know that 30 seconds ago I
> had a good link.

To tangent a little bit, and add netdev to the CC...

The loss and regain of link status should be proactively signalled to
userspace using netlink or something similar.  Currently we have
netif_carrier_{on,off,ok} but it is only passively checked. 
netif_carrier_{on,off} should probably schedule_task() to fire off a
netlink message...

For your HA application specifically, right now, I would suggest making
sure your net driver calls netif_carrier_xxx correctly, then checking
for the IFF_RUNNING interface flag.  IFF_RUNNING will disappear if the
interface is up but there is no carrier [according to
netif_carrier_ok].

-- 
Jeff Garzik  | Disbelief, that's why you fail.
Building 1024|
MandrakeSoft |



Re: [PATCH] support for Cobalt Networks (x86 only) systems (forrealthis

2001-06-01 Thread Bogdan Costescu

On Fri, 1 Jun 2001, Alan Cox wrote:

> I am sure that to an unprivileged application reporting back the same result
> as we saw last time we asked the hardware unless it is over 30 seconds old
> will work fine. Maybe 10 for link partner?

No way! If I implement an HA application which depends on link status, I
want the info to be accurate; I don't want to know that 30 seconds ago I
had a good link.

IMHO, rate limiting is the only solution.

-- 
Bogdan Costescu



