Re: request for information about the ath5k licensing

2007-09-07 Thread Reyk Floeter
On Wed, Sep 05, 2007 at 01:00:15PM -0400, Luis R. Rodriguez wrote:
 On 9/5/07, Michael Buesch [EMAIL PROTECTED] wrote:
  On Wednesday 05 September 2007, Reyk Floeter wrote:
   I'm the author of the free hardware driver layer for wireless Atheros
   devices in OpenBSD, also known as OpenHAL.
  
   I'm still trying to get an idea about the facts and the latest state
   of the incident that violated the copyright of my code, because I
   just returned from vacation.
 
   Could you please give me some feedback about the latest state? Please
   reply in private, I'm not subscribed to any of the Linux lists and I'm
   rather interested in facts than in the usual trolling.
  
   - Has this issue been fixed?
 
  It has never been applied to any repository.
  - No issue and no copyright violation.
 
   - Is there any repository available with the ath5k code using a
   modified/extended license?
 
  No.
 
 Well, that is not accurate. Please give us a little time; we are working on
 verifying some information for you.
 
   I don't know how to find the relevant bits in the various Linux git
   repositories. Sorry, I don't get the structure of it. Are there any
   other sources online except this diff on the linux kernel mailing
   list?
  
   - Are there any plans to release the ath5k code using a
   modified/extended license?
 
  No.
 
 
 Same here. Apologies for this taking so long. It'll all be sorted out soon.
 

I'm still waiting for an answer. Your process is taking too long.

Reyk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][MIPS][7/7] AR7: ethernet

2007-09-07 Thread Geert Uytterhoeven
On Thu, 6 Sep 2007, Andrew Morton wrote:
  On Thu, 6 Sep 2007 17:34:10 +0200 Matteo Croce [EMAIL PROTECTED] wrote:
  Driver for the cpmac 100M ethernet device.
  It works fine with NAPI support disabled; enabling it gives a kernel panic
  when the first IPv6 packet has to be forwarded.
  Other than that it works fine.
 
 The driver does a lot of open-coded dma_cache_inv() calls (in a way which
 assumes a 32-bit bus, too).  I assume that dma_cache_inv() is some mips

No, even i386 has it ;-)

 thing.  I'd have thought that it would be better to use the dma mapping API
 throughout the driver, and its associated dma invalidation APIs.

However, Ralf just posted a patch to remove it on all architectures, and
driver writers should consider it gone.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds


new NAPI interface broken

2007-09-07 Thread Jan-Bernd Themann
Hi Stephen,

I saw that you developed most of the new NAPI interface.
I already addressed this issue a while ago. Please correct me if I got
it wrong. I think there is still a serious problem with the NAPI
changes to make NAPI polling independent of struct net_device objects.
It's about the question of who inserts and removes devices from the poll list.

netif_rx_schedule: sets NAPI_STATE_SCHED flag, insert device in poll list.
netif_rx_complete: clears NAPI_STATE_SCHED
netif_rx_reschedule: sets NAPI_STATE_SCHED, insert device in poll list.
net_rx_action: 
 -removes dev from poll list
 -calls poll function
 -adds dev to poll list if NAPI_STATE_SCHED still set

1) netif_rx_complete and netif_rx_reschedule don't work together
2) On SMP systems: after netif_rx_complete has been called on CPU1
   (+interrupts enabled), netif_rx_schedule could be called on CPU2
   (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED. 
   In that case the device would be added to poll lists of CPU1 and CPU2
   as net_rx_action would see NAPI_STATE_SCHED set.
   This must not happen. It will be caught when netif_rx_complete is
   called the second time (BUG() called)

This would mean we have a problem on all SMP machines right now.
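
The interleaving in point 2 can be replayed in a small user-space model. This is only a sketch under stated assumptions: the `toy_*` names are hypothetical, the two "CPUs" are array slots, and the steps are run in the order the mail describes rather than concurrently. It follows the four-function summary above and is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_NCPU = 2 };

/* Hypothetical stand-in for a NAPI context. */
struct toy_napi {
	bool sched;              /* plays the role of NAPI_STATE_SCHED */
	int on_list[TOY_NCPU];   /* how often the device sits on each CPU's poll list */
};

/* netif_rx_schedule: set the flag and, if it was clear, queue on this CPU. */
static void toy_schedule(struct toy_napi *n, int cpu)
{
	if (!n->sched) {
		n->sched = true;
		n->on_list[cpu]++;
	}
}

/* netif_rx_complete: only clears the flag. */
static void toy_complete(struct toy_napi *n)
{
	n->sched = false;
}

/* net_rx_action, first half: take the device off this CPU's list before polling. */
static void toy_rx_action_begin(struct toy_napi *n, int cpu)
{
	assert(n->on_list[cpu] > 0);
	n->on_list[cpu]--;
}

/* net_rx_action, second half: re-add the device if the flag is still set. */
static void toy_rx_action_end(struct toy_napi *n, int cpu)
{
	if (n->sched)
		n->on_list[cpu]++;
}

/* Replays point 2 and returns on how many poll lists the device ends up. */
static int toy_replay_race(void)
{
	struct toy_napi n = { false, { 0, 0 } };

	toy_schedule(&n, 0);         /* irq on the first CPU */
	toy_rx_action_begin(&n, 0);  /* softirq removes the device and calls poll */
	toy_complete(&n);            /* poll is done, interrupts are enabled again */
	toy_schedule(&n, 1);         /* irq handler runs on the second CPU */
	toy_rx_action_end(&n, 0);    /* first CPU sees the flag set and re-adds */
	return n.on_list[0] + n.on_list[1];
}
```

In this model `toy_replay_race()` leaves the device queued on both lists, which is the state the mail says must not happen.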

If I got all this right then we probably need a further flag to tell
net_rx_action whether to poll again or to stop (with the possibility
that the device has been scheduled on a different CPU in between).
The old NAPI interface uses the return value of poll to determine
if the device has to be polled again or not. 
We can either switch back or in case we want to stick to
the new return value, we might have to add something similar to 
the NAPI_STATE_SCHED flag or a new parameter...

Regards,
Jan-Bernd


Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-07 Thread jamal
On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote:
 Not really. I used 3-year-old, single CPU x86 boxes with e100 
 interfaces. 
 The idle poll change keeps them in polled mode. Without idle 
 poll, I get twice as many interrupts as packets, one for txdone and one 
 for rx. NAPI is continuously scheduled in/out.

Certainly faster than the machine in the paper (which was about 2 years
old in 2005).
I could never get ping -f to do that for me - so things must be getting
worse with newer machines then.

 No. Since I did a flood ping from the machine under test, the improved 
 latency meant that the ping response was handled more quickly, causing 
 the next packet to be sent sooner. So more packets were transmitted in 
 the allotted time (10 seconds).

ok.

 With current NAPI:
 rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 
 1.611/1.421 ms
 
 With idle poll changes:
 rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 
 1.175/1.236 ms

Not bad in terms of latency. The deviation certainly looks better.

 But the CPU has done more work. 

I am going to be the devil's advocate[1]:
If the problem i am trying to solve is to reduce cpu use at lower rates,
then this is not the right answer because your cpu use has gone up.
Your latency numbers have not improved that much (looking at the avg)
and your throughput is not that much higher. Will i be willing to pay
more cpu (of an already piggish cpu use by NAPI at that rate with 2
interrupts per packet)?

Another test: try a simple ping and compare the rtts.

 The problem I started thinking about was the one where NAPI thrashes 
 in/out of polled mode at higher and higher rates as network interface 
 speeds and CPU speeds increase. A flood ping demonstrates this even on 
 100M links on my boxes. 

things must be getting worse in the state of average hardware out there.
It will be a worthwhile exercise to compare on an even faster machine
and see what transpires there.
 
 Networking boxes want consistent 
 performance/latency for all traffic patterns and they need to avoid 
 interrupt livelock. Current practice seems to be to use hardware 
 interrupt mitigation or timers to limit interrupt rate but this just 
 hurts latency, as you noted. So I'm trying to find a way to limit the 
 NAPI interrupt rate without increasing latency. My comment about this 
 approach being suitable for routers and networked servers is that these 
 boxes care more about minimizing packet latency than they do about 
 wasting CPU cycles by polling idle devices.

I think the argument of who cares about a little more cpu is valid
for the case of routers. It is a double edged sword, because it applies
to the case of who cares if NAPI uses a little more cpu at low rates
and who cares if James turns on polling and abuses a little more-more
cpu. Since NAPI is the incumbent, the onus(sp?) is to do better. You
must do better sir!

Look at the timers, she said - that way you may be able to cut the cpu
abuse.

cheers,
jamal

[1] historically the devils advocate was a farce really ;-



Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Paul E. McKenney
On Fri, Sep 07, 2007 at 03:27:15PM +0200, Johannes Berg wrote:
 On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
 
  Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
  this code.  I therefore especially like the indications of where RTNL
  is held and not!!!
 
 :)
 
  Some questions below based on a quick scan.  And a global question:
  should the comments about RTNL being held be replaced by ASSERT_RTNL()?
 
 I don't like ASSERT_RTNL() much because it actually tries to lock it.
 I'd be much happier if it was WARN_ON(!mutex_locked(rtnl_mutex)) or
 something equivalent.

Ah!  It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
or whatever.

 In any case, I have an updated patch I'll be sending soon, and it
 requires a new list walking primitive I'll also send.

Look forward to seeing it!

   - write_lock_bh(&local->sub_if_lock);
   + /* we're under RTNL so all this is fine */
 if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
   - write_unlock_bh(&local->sub_if_lock);
 __ieee80211_if_del(local, sdata);
 return -ENODEV;
 }
   - list_add(&sdata->list, &local->sub_if_list);
   + list_add_tail_rcu(&sdata->list, &local->interfaces);
  
  The _rcu is required because this list isn't protected by RTNL?
 
 Yes, not all walkers of the list are protected by the RTNL.

K.

   @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
 /* Remove all virtual interfaces that use this BSS
  * as their sdata->bss */
 struct ieee80211_sub_if_data *tsdata, *n;
   - LIST_HEAD(tmp_list);
   
   - write_lock_bh(&local->sub_if_lock);
  
  This code is also protected by RTNL?
 
 Yes.

Comment?  (Or is it in the function header?)

 ASSERT_RTNL();
  
  I -like- this!!!  ;-)
 
 :)

Thanx, Paul


[PATCH] Freeing alive inet6 address

2007-09-07 Thread Denis V. Lunev
From: Denis V. Lunev [EMAIL PROTECTED]

addrconf_dad_failure calls addrconf_dad_stop, which takes the referenced address
and drops the count. So the in6_ifa_put performed at out: is extra. This
results in the message "Freeing alive inet6 address" and in dst entries that
are never released.
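
The reference-count imbalance described above can be shown with a toy counter. This is a sketch only: `toy_ifa`, `toy_hold`, `toy_put`, `toy_dad_failure` and `toy_recv_ns` are hypothetical names, and the starting count of 1 (one long-lived owner) is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for an address with a reference count. */
struct toy_ifa {
	int refcnt;
};

static void toy_hold(struct toy_ifa *ifa)
{
	ifa->refcnt++;
}

static void toy_put(struct toy_ifa *ifa)
{
	assert(ifa->refcnt > 0);
	ifa->refcnt--;
}

/* Plays the role of the DAD failure path: it consumes the caller's reference. */
static void toy_dad_failure(struct toy_ifa *ifa)
{
	toy_put(ifa);
}

/*
 * Returns the count left after the DAD failure branch.
 * extra_put != 0 models the old "goto out", 0 models the patched "return".
 */
static int toy_recv_ns(int extra_put)
{
	struct toy_ifa ifa = { 1 };   /* assumed: one reference held elsewhere */

	toy_hold(&ifa);               /* the lookup takes a reference: 2 */
	toy_dad_failure(&ifa);        /* the callee drops it: 1 */
	if (extra_put)
		toy_put(&ifa);        /* second drop: 0 while the address is still in use */
	return ifa.refcnt;
}
```

With the extra put the count reaches zero although the other owner is still there; with the early return it stays at one.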

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]

--- ./net/ipv6/ndisc.c.ipv6dad  2007-09-03 16:54:32.0 +0400
+++ ./net/ipv6/ndisc.c  2007-09-07 13:34:30.0 +0400
@@ -736,7 +736,7 @@ static void ndisc_recv_ns(struct sk_buff
 * so fail our DAD process
 */
addrconf_dad_failure(ifp);
-   goto out;
+   return;
} else {
/*
 * This is not a dad solicitation.


Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Johannes Berg
On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:

 Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
 this code.  I therefore especially like the indications of where RTNL
 is held and not!!!

:)

 Some questions below based on a quick scan.  And a global question:
 should the comments about RTNL being held be replaced by ASSERT_RTNL()?

I don't like ASSERT_RTNL() much because it actually tries to lock it.
I'd be much happier if it was WARN_ON(!mutex_locked(rtnl_mutex)) or
something equivalent.

In any case, I have an updated patch I'll be sending soon, and it
requires a new list walking primitive I'll also send.

  -   write_lock_bh(&local->sub_if_lock);
  +   /* we're under RTNL so all this is fine */
  if (unlikely(local->reg_state == IEEE80211_DEV_UNREGISTERED)) {
  -   write_unlock_bh(&local->sub_if_lock);
  __ieee80211_if_del(local, sdata);
  return -ENODEV;
  }
  -   list_add(&sdata->list, &local->sub_if_list);
  +   list_add_tail_rcu(&sdata->list, &local->interfaces);
 
 The _rcu is required because this list isn't protected by RTNL?

Yes, not all walkers of the list are protected by the RTNL.

  @@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
  /* Remove all virtual interfaces that use this BSS
   * as their sdata->bss */
  struct ieee80211_sub_if_data *tsdata, *n;
  -   LIST_HEAD(tmp_list);
  
  -   write_lock_bh(&local->sub_if_lock);
 
 This code is also protected by RTNL?

Yes.

  ASSERT_RTNL();
 
 I -like- this!!!  ;-)

:)

johannes


signature.asc
Description: This is a digitally signed message part


[RFC] mac80211: fix virtual interface locking

2007-09-07 Thread Johannes Berg
Florian Lohoff noticed a bug in mac80211: when bringing the
master interface down while other virtual interfaces are up
we call dev_close() under a spinlock which is not allowed.
This patch removes the sub_if_lock used by mac80211 in favour
of using an RCU list. All list manipulations are already done
under rtnl so are well protected against each other, and the
read-side locks we took in the RX and TX code are already in
RCU read-side critical sections.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]
Cc: Florian Lohoff [EMAIL PROTECTED]
Cc: Herbert Xu [EMAIL PROTECTED]
Cc: Michal Piotrowski [EMAIL PROTECTED]
Cc: Satyam Sharma [EMAIL PROTECTED]

---
If you want to test this you'll need to get the other pending patches,
as John is at KS he isn't pushing to Dave who is at KS too anyhow. Grab
from
http://johannes.sipsolutions.net/patches/net-2.6.24/all/2007-09-06-13:43/ 
patches 002-011, they are slated to go into net-2.6.24 if timing works out. 
I'll backport this fix to -stable when we actually get around to verifying it.

 net/mac80211/ieee80211.c   |  100 -
 net/mac80211/ieee80211_i.h |5 --
 net/mac80211/ieee80211_iface.c |   31 +---
 net/mac80211/ieee80211_sta.c   |   12 ++--
 net/mac80211/rx.c  |9 +--
 net/mac80211/tx.c  |   10 ++--
 6 files changed, 84 insertions(+), 83 deletions(-)

--- wireless-dev.orig/net/mac80211/ieee80211.c  2007-09-07 10:52:12.604441281 +0200
+++ wireless-dev/net/mac80211/ieee80211.c   2007-09-07 16:30:34.044429746 +0200
@@ -88,24 +88,31 @@ static struct dev_mc_list *ieee80211_get
return NULL;
}
 
-   /* start of iteration, both unassigned */
-   if (!mcd->cur && !mcd->sdata) {
-   mcd->sdata = list_entry(local->sub_if_list.next,
-   struct ieee80211_sub_if_data, list);
-   mcd->cur = mcd->sdata->dev->mc_list;
-   }
+   /*
+* Prepare for iteration if not done already.
+*/
+   list_prepare_entry(mcd->sdata, &local->interfaces, list);
 
-   if (mcd->cur)
+   if (mcd->cur) {
+   /*
+* Iterate over the multicast addresses in
+* the current device (mcd->sdata).
+*/
mcd->cur = mcd->cur->next;
+   }
 
-   while (!mcd->cur) {
-   /* reached end of interface list? */
-   if (mcd->sdata->list.next == &local->sub_if_list)
-   break;
-   /* otherwise try next interface */
-   mcd->sdata = list_entry(mcd->sdata->list.next,
-   struct ieee80211_sub_if_data, list);
-   mcd->cur = mcd->sdata->dev->mc_list;
+   if (!mcd->cur) {
+   /*
+* Iterate over the devices until finding one (the
+* first or the next) with multicast addresses.
+*/
+   list_for_each_entry_continue_rcu(mcd->sdata,
+&local->interfaces,
+list) {
+   mcd->cur = mcd->sdata->dev->mc_list;
+   if (mcd->cur)
+   break;
+   }
}
 
return mcd->cur;
@@ -145,9 +152,10 @@ static void ieee80211_configure_filter(s
 
/*
 * We can iterate through the device list for the multicast
-* address list so need to lock it.
+* address list so need to be in a RCU read-side section,
+* the RTNL isn't held in this function.
 */
-   read_lock(&local->sub_if_lock);
+   rcu_read_lock();
 
/* be a bit nasty */
new_flags |= (1<<31);
@@ -163,7 +171,7 @@ static void ieee80211_configure_filter(s
WARN_ON(mcd.cur);
 
local->filter_flags = new_flags & ~(1<<31);
-   read_unlock(&local->sub_if_lock);
+   rcu_read_unlock();
 
netif_tx_unlock(local->mdev);
 }
@@ -176,14 +184,13 @@ static int ieee80211_master_open(struct 
struct ieee80211_sub_if_data *sdata;
int res = -EOPNOTSUPP;
 
-   read_lock(&local->sub_if_lock);
-   list_for_each_entry(sdata, &local->sub_if_list, list) {
+   /* we hold the RTNL here so can safely walk the list */
+   list_for_each_entry(sdata, &local->interfaces, list) {
if (sdata->dev != dev && netif_running(sdata->dev)) {
res = 0;
break;
}
}
-   read_unlock(&local->sub_if_lock);
return res;
 }
 
@@ -192,11 +199,10 @@ static int ieee80211_master_stop(struct 
struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
struct ieee80211_sub_if_data *sdata;
 
-   read_lock(&local->sub_if_lock);
-   list_for_each_entry(sdata, &local->sub_if_list, list)
+   /* we hold the RTNL here so can safely walk the list */
+   list_for_each_entry(sdata, &local->interfaces, list)

Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Johannes Berg
On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

@@ -226,22 +225,22 @@ void ieee80211_if_reinit(struct net_devi
/* Remove all virtual interfaces that use this BSS
 * as their sdata->bss */
struct ieee80211_sub_if_data *tsdata, *n;
-   LIST_HEAD(tmp_list);

-   write_lock_bh(&local->sub_if_lock);
   
   This code is also protected by RTNL?
  
  Yes.
 
 Comment?  (Or is it in the function header?)

Oh, forgot to say: yes, there is a comment further up and even an
ASSERT_RTNL()

johannes




Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Johannes Berg
On Fri, 2007-09-07 at 07:25 -0700, Paul E. McKenney wrote:

  I don't like ASSERT_RTNL() much because it actually tries to lock it.
  I'd be much happier if it was WARN_ON(!mutex_locked(rtnl_mutex)) or
  something equivalent.
 
 Ah!  It would indeed be nice to have a lower-overhead ASSERT_RTNL_LIGHT()
 or whatever.

I don't know why it tries that anyway. Maybe it's from semaphore days
where you couldn't check _is_locked()?
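
The difference between the two styles of assertion can be sketched in user space. Everything here is hypothetical: `toy_mutex` is a single-threaded stand-in with a `writes` counter added purely to make the cost visible, and neither helper is the kernel's `ASSERT_RTNL()`. The point is only that a trylock-based check has to modify the lock when the assertion fails, while an "is it locked" check only reads it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a mutex. */
struct toy_mutex {
	bool locked;
	int writes;   /* counts state changes, for illustration only */
};

static bool toy_trylock(struct toy_mutex *m)
{
	if (m->locked)
		return false;
	m->locked = true;
	m->writes++;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->locked);
	m->locked = false;
	m->writes++;
}

static bool toy_is_locked(const struct toy_mutex *m)
{
	return m->locked;
}

/* Trylock-style check: returns true if the lock was held by someone. */
static bool toy_assert_trylock(struct toy_mutex *m)
{
	if (toy_trylock(m)) {
		toy_unlock(m);   /* we got it, so nobody held it */
		return false;
	}
	return true;
}

/* Read-only check in the spirit of WARN_ON(!mutex_locked(...)). */
static bool toy_assert_light(const struct toy_mutex *m)
{
	return toy_is_locked(m);
}
```

Both checks only establish that somebody holds the lock, not that the caller does, so the lighter form loses nothing in this model.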

  In any case, I have an updated patch I'll be sending soon, and it
  requires a new list walking primitive I'll also send.
 
 Look forward to seeing it!

Will send in a minute.

johannes




Re: [NFS] problems with lockd in 2.6.22.6

2007-09-07 Thread J. Bruce Fields
On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
 Hello,
 
 we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then 
 we get the message
 
 lockd: too many open TCP sockets, consider increasing the number of nfsd 
 threads
 lockd: last TCP connect from ^\\236^\É^D
 
 1) These random characters in the second line are caused by a bug in 
 svc_tcp_accept.
 I already posted this patch on netdev@vger.kernel.org:

Thanks, I've applied that.  (The bug is a little subtle: there's
actually two previous __svc_print_addr() calls which might have
initialized buf correctly, and it's not obvious that the second isn't
always called (since it's in a dprintk, which is a macro that expands
into a printk inside a conditional)).
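
The subtlety can be reproduced outside the kernel. The sketch below assumes a debug macro that expands to a conditional, as described above; `toy_dprintk`, `toy_print_addr` and `toy_last_connect` are hypothetical names, and the buffer is cleared first so the demo has defined behaviour (in the reported bug it held leftover bytes instead).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int toy_debug_enabled;

/* Debug print that expands to a printf inside a conditional. */
#define toy_dprintk(...) \
	do { if (toy_debug_enabled) printf(__VA_ARGS__); } while (0)

/* Formats an (illustrative) address into buf and returns buf. */
static const char *toy_print_addr(char *buf, size_t len)
{
	assert(len > 0);
	snprintf(buf, len, "10.11.0.12, port=784");
	return buf;
}

/* Returns what a later message would print if it reused buf directly. */
static const char *toy_last_connect(int debug)
{
	static char buf[64];

	buf[0] = '\0';
	toy_debug_enabled = debug;
	/* The argument is only evaluated when debugging is on. */
	toy_dprintk("connect from %s\n", toy_print_addr(buf, sizeof(buf)));
	return buf;
}
```

With debugging off the buffer is never filled, which is why the patch formats the address again at the point of use.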

 with this patch applied one gets something like
 
 lockd: too many open TCP sockets, consider increasing the number of
 nfsd threads lockd: last TCP connect from 10.11.0.12, port=784
 
 
 2) The number of nfsd threads we are running on the machine is 1024.
 So this is not the problem. It seems, though, that in the case of
 lockd svc_tcp_accept does not check the number of nfsd threads but the
 number of lockd threads which is one.  As soon as the number of open
 lockd sockets surpasses 80 this message gets logged.  This usually
 happens every evening when a lot of people shut down their workstations.

So to be clear: there's not an actual problem here other than that the
logs are getting spammed?  (Not that that isn't a problem in itself.)

 3) For an unknown reason these sockets then remain open. In the morning
 when people start their workstations again we therefore not only get a
 lot of these messages again, but often the nfs-server does not properly
 work any more. Restarting the nfs-daemon is a workaround.

Hm, thanks.

--b.


PATCH to bug #8876

2007-09-07 Thread Nikolay Kopitonenko
Hi there!

Below is a fix for this:
http://bugzilla.kernel.org/show_bug.cgi?id=8876


Applies to any version from 2.6.22 up to the latest, 2.6.23-rc5-git1.

please apply :)


-CUT-
diff -urN a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c  2007-07-09 02:32:17.0 +0300
+++ b/net/ipv4/devinet.c  2007-08-10 20:33:22.0 +0300
@@ -1193,7 +1193,7 @@
for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
 ifa = ifa->ifa_next, ip_idx++) {
if (ip_idx < s_ip_idx)
-   goto cont;
+   continue;
if (inet_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid,
 cb->nlh->nlmsg_seq,
 RTM_NEWADDR, NLM_F_MULTI) <= 0)
-/CUT-
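
The hunk does not show where the `cont` label lives. Assuming it sits after the inner loop (an assumption, not something visible in the diff), the effect of the one-line change can be sketched as follows; `toy_dump` is a hypothetical helper that counts addresses instead of filling netlink messages.

```c
#include <assert.h>

/*
 * Counts the addresses "dumped" for one device with n_addrs addresses when
 * the dump resumes at index s_ip_idx. use_goto != 0 models the old code.
 */
static int toy_dump(int n_addrs, int s_ip_idx, int use_goto)
{
	int dumped = 0;
	int ip_idx;

	assert(n_addrs >= 0);
	for (ip_idx = 0; ip_idx < n_addrs; ip_idx++) {
		if (ip_idx < s_ip_idx) {
			if (use_goto)
				goto cont;   /* leaves the loop entirely */
			continue;            /* skips only this address */
		}
		dumped++;
	}
cont:
	return dumped;
}
```

Under that assumption, resuming at index 1 of a device with three addresses dumps nothing with `goto cont` and the remaining two with `continue`.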

Signed-off-by: [EMAIL PROTECTED]


Thanks

Nikolay Kopitonenko


problems with lockd in 2.6.22.6

2007-09-07 Thread Wolfgang Walter
Hello,

we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then 
we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) These random characters in the second line are caused by a bug in 
svc_tcp_accept.
I already posted this patch on netdev@vger.kernel.org:

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]
--- linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c    2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
   serv->sv_name);
printk(KERN_NOTICE
   "%s: last TCP connect from %s\n",
-  serv->sv_name, buf);
+  serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
}
/*
 * Always select the oldest socket. It's not fair,


with this patch applied one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784


2) The number of nfsd threads we are running on the machine is 1024. So this is
not the problem. It seems, though, that in the case of lockd svc_tcp_accept does
not check the number of nfsd threads but the number of lockd threads, which is
one. As soon as the number of open lockd sockets surpasses 80 this message gets
logged. This usually happens every evening when a lot of people shut down their
workstations.

3) For an unknown reason these sockets then remain open. In the morning when people
start their workstations again we therefore not only get a lot of these messages
again, but often the nfs-server does not properly work any more. Restarting the
nfs-daemon is a workaround.
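
The threshold of 80 in point 2 fits a per-service limit that scales with the thread count. The formula below, (threads + 3) * 20, is an assumption chosen because it yields 80 for one thread as reported; it should be checked against svcsock.c before relying on it. `toy_max_tmp_socks` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed limit on temporary sockets for a service with nrthreads threads. */
static int toy_max_tmp_socks(int nrthreads)
{
	assert(nrthreads >= 0);
	return (nrthreads + 3) * 20;
}
```

Under that assumption a single lockd thread gives a limit of 80, whereas 1024 nfsd threads would give 20540, so the nfsd thread count would not help lockd.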

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH] Fix e100 on systems that have cache incoherent DMA

2007-09-07 Thread Kok, Auke

David Acker wrote:

On the systems that have cache incoherent DMA, including ARM, there is a
race condition between software allocating a new receive buffer and hardware
writing into a buffer.  The two race on touching the last Receive Frame
Descriptor (RFD).  It has its el-bit set and its next link equal to 0.
When hardware encounters this buffer it attempts to write data to it and
then update Status Word bits and Actual Count in the RFD.  At the same time
software may try to clear the el-bit and set the link address to a new buffer.

Since the entire RFD is one cache line, the two write operations can collide.
This can lead to the receive unit stalling or interpreting random memory as
its receive area.

The fix is to set the el-bit on and the size to 0 on the next to last buffer
in the chain.  When the hardware encounters this buffer it stops and does not
write to it at all.  The hardware issues an RNR interrupt with the receive
unit in the No Resources state.  Software can write to the tail of the list
because it knows hardware will stop on the previous descriptor that was
marked as the end of list.

Once it has a new next to last buffer prepared, it can clear the el-bit and
set the size on the previous one.  The race on this buffer is safe since
the link already points to a valid next buffer and the software can handle
the race setting the size (assuming aligned 16 bit writes are atomic with
respect to the DMA read). If the hardware sees the el-bit cleared without
the size set, it will move on to the next buffer and skip this one.  If it
sees the size set but the el-bit still set, it will complete that buffer
and then RNR interrupt and wait.

Flags are kept in the software descriptor to note if the el bit is set and if
the size was 0.  When software clears the RFD's el bit and sets its size, it
also clears the el flag but leaves the size was 0 bit set.  This way software
can identify them when the race may have occurred when cleaning the ring.
On these descriptors, it looks ahead and if the next one is complete then
hardware must have skipped the current one.  Logic is added to prevent two
packets in a row being marked while the receiver is running to avoid running
in lockstep with the hardware and thereby limiting the required lookahead.
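
The ordering of the marker update can be sketched on a plain array. This is not the patch's code: `toy_rfd`, `toy_mark`, `toy_move_marker` and the ring size are hypothetical, and the model ignores DMA syncing and the link field. It only shows the sequence "mark the new stop point, then clear el on the old one, then restore its size".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING 8
#define TOY_BUF_SIZE 1518u

/* Hypothetical, simplified receive descriptor. */
struct toy_rfd {
	bool el;         /* end-of-list bit */
	unsigned size;   /* 0 means the hardware must not write here */
};

static void toy_ring_init(struct toy_rfd *ring)
{
	for (int i = 0; i < TOY_RING; i++) {
		ring[i].el = false;
		ring[i].size = TOY_BUF_SIZE;
	}
}

/* Make idx the descriptor the hardware stops on. */
static void toy_mark(struct toy_rfd *ring, int idx)
{
	ring[idx].size = 0;
	ring[idx].el = true;
}

/* Move the stop point; both markers exist briefly, never zero of them. */
static void toy_move_marker(struct toy_rfd *ring, int old_idx, int new_idx)
{
	assert(old_idx != new_idx);
	toy_mark(ring, new_idx);             /* new stop point first */
	ring[old_idx].el = false;            /* then release the old one: el ... */
	ring[old_idx].size = TOY_BUF_SIZE;   /* ... and only then its size */
}

static int toy_count_markers(const struct toy_rfd *ring)
{
	int n = 0;

	for (int i = 0; i < TOY_RING; i++)
		if (ring[i].el)
			n++;
	return n;
}
```

After each completed move exactly one descriptor carries the marker, so in this model the hardware always has a well-defined place to stop.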

This is a patch for 2.6.23-rc4.

Signed-off-by: David Acker [EMAIL PROTECTED]



first impressions are not good: pings are erratic and shoot up to 3 seconds. In 
an overnight stress test, the receive unit went offline and never came back up 
(TX still working).


it sounds like something in the logic is suspending the ru too much, but I 
haven't had time to look deeply into the code yet.


Auke




---

--- linux-2.6.23-rc4/drivers/net/e100.c.orig    2007-08-30 13:32:10.0 -0400
+++ linux-2.6.23-rc4/drivers/net/e100.c 2007-08-30 15:42:07.0 -0400
@@ -106,6 +106,13 @@
  * the RFD, the RFD must be dma_sync'ed to maintain a consistent
  * view from software and hardware.
  *
+ * In order to keep updates to the RFD link field from colliding with
+ * hardware writes to mark packets complete, we use the feature that
+ * hardware will not write to a size 0 descriptor and mark the previous
+ * packet as end-of-list (EL).   After updating the link, we remove EL
+ * and only then restore the size such that hardware may use the
+ *	previous-to-end RFD. 
+ *

  * Under typical operation, the  receive unit (RU) is start once,
  * and the controller happily fills RFDs as frames arrive.  If
  * replacement RFDs cannot be allocated, or the RU goes non-active,
@@ -281,14 +288,14 @@ struct csr {
 };
 
 enum scb_status {

+   rus_no_res   = 0x08,
rus_ready= 0x10,
rus_mask = 0x3C,
 };
 
 enum ru_state  {

-   RU_SUSPENDED = 0,
-   RU_RUNNING   = 1,
-   RU_UNINITIALIZED = -1,
+   ru_stopped = 0,
+   ru_running = 1,
 };
 
 enum scb_stat_ack {

@@ -401,10 +408,16 @@ struct rfd {
u16 size;
 };
 
+enum rx_flags {

+   rx_el = 0x01,
+   rx_s0 = 0x02,
+};
+
 struct rx {
struct rx *next, *prev;
struct sk_buff *skb;
dma_addr_t dma_addr;
+   u8 flags;
 };
 
 #if defined(__BIG_ENDIAN_BITFIELD)

@@ -952,7 +965,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));
 
 	/* Template for a freshly allocated RFD */

-   nic->blank_rfd.command = cpu_to_le16(cb_el);
+   nic->blank_rfd.command = 0;
nic->blank_rfd.rbd = 0x;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
 
@@ -1753,18 +1766,48 @@ static int e100_alloc_cbs(struct nic *ni

return 0;
 }
 
-static inline void e100_start_receiver(struct nic *nic, struct rx *rx)

+static void e100_find_mark_el(struct nic *nic, struct rx *marked_rx, int is_running)
 {
-   if(!nic->rxs) return;
-   if(RU_SUSPENDED != nic->ru_running) return;
+   struct rx *rx = nic->rx_to_use->prev->prev;

auto recycling of TIME_WAIT connections

2007-09-07 Thread Pádraig Brady
As I see it, TIME_WAIT state is required for 2 reasons:

  To handle wandering duplicate packets
  (so a reincarnation of a connection will not be corrupted by these packets).

  To handle the last ACK from the active closer (client) not being received by
  the remote. If that happened, the server, which is in LAST_ACK state, would
  retransmit its FIN (which may contain data also), so the client must be in
  TIME_WAIT state to handle that. If the client is not in TIME_WAIT state, then
  it could only indicate to the server that data was maybe lost (with an RST).

The first issue requires a large timeout, and
the TIME_WAIT timeout is currently 60 seconds on linux.
That timeout effectively limits the connection rate between
local TCP clients and a server to 32k/60s or around 500 connections/second.
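
The estimate is simple arithmetic: if every closed connection pins one local port for the whole TIME_WAIT period, the sustainable rate to a single server address and port is the number of usable ports divided by the timeout. A minimal sketch, with `toy_max_conn_rate` as a hypothetical helper and the 32k figure taken from the text above:

```c
#include <assert.h>

/* Upper bound on new connections per second when each one holds a port for tw_seconds. */
static unsigned toy_max_conn_rate(unsigned ports, unsigned tw_seconds)
{
	assert(tw_seconds > 0);
	return ports / tw_seconds;
}
```

For 32768 ports and a 60 second timeout this gives 546 per second, in line with the "around 500" above; a smaller local port range lowers the bound accordingly.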

But that issue can't really happen when the client
and server are on the same machine, can it? And
even if it could, the timeouts involved would be shorter.

Now linux does have an (undocumented) /proc/sys/net/ipv4/tcp_tw_recycle flag
to enable recycling of TIME_WAIT connections. This is global, however, and could
cause problems in general for external connections.

So how about auto enabling recycling for local connections?

cheers,
Pádraig.


Re: 2.6.23-rc4-mm1: e1000e napi lockup

2007-09-07 Thread David Miller
From: Jiri Slaby [EMAIL PROTECTED]
Date: Fri, 07 Sep 2007 09:19:30 +0200

 I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
 napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call.  Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
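
Why an unbalanced napi_disable() in probe can hang is easier to see in a toy model. The sketch assumes that "disabled" is represented by a single scheduled bit being held and that napi_disable() waits for that bit to clear; both are assumptions for illustration, the `toy_*` names are hypothetical, and the toy returns false where the real call would keep waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NAPI context: sched == true means "disabled or being polled". */
struct toy_napi_ctx {
	bool sched;
};

/* Registration leaves the context disabled. */
static void toy_napi_add(struct toy_napi_ctx *n)
{
	n->sched = true;
}

/* Enable is only legal on a disabled context. */
static void toy_napi_enable(struct toy_napi_ctx *n)
{
	assert(n->sched);
	n->sched = false;
}

/* Returns false where a waiting implementation would never return. */
static bool toy_napi_disable(struct toy_napi_ctx *n)
{
	if (n->sched)
		return false;   /* already disabled and nobody will clear the bit */
	n->sched = true;
	return true;
}
```

In this model a disable right after registration can never succeed, while enable in open followed by disable in close stays balanced.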


Re: wrt Age Entry For IPv4 IPv6 Route Table

2007-09-07 Thread David Miller

I'm travelling, otherwise I would have reviewed and integrated
or given feedback for changes.

I'll be back late next week.


2.6.23-rc4-mm1: e1000e napi lockup

2007-09-07 Thread Jiri Slaby
Hi,

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: [NFS] problems with lockd in 2.6.22.6

2007-09-07 Thread Wolfgang Walter
On Friday, 7 September 2007 18:19, you wrote:
 On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
  Hello,
 
  we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since
  then we get the message
 
  lockd: too many open TCP sockets, consider increasing the number of nfsd
  threads lockd: last TCP connect from ^\\236^\É^D

 
  2) The number of nfsd threads we are running on the machine is 1024.
  So this is not the problem. It seems, though, that in the case of
  lockd svc_tcp_accept does not check the number of nfsd threads but the
  number of lockd threads which is one.  As soon as the number of open
  lockd sockets surpasses 80 this message gets logged.  This usually
  happens every evening when a lot of people shutdown their workstation.

 So to be clear: there's not an actual problem here other than that the
 logs are getting spammed?  (Not that that isn't a problem in itself.)


When more than 80 nfs clients try to lock files at the same time then it
probably would.

  3) For an unknown reason these sockets then remain open. In the morning
  when people start their workstations again we therefore not only get a
  lot of these messages again, but often the nfs-server does not work
  properly any more. Restarting the nfs-daemon is a workaround.

 Hm, thanks.


I don't know if the lockd thing is the reason, though.

2.6.22.6 per se runs stable (no oops, no crash etc.) but kernel nfs seems
to be a little bit unstable. 2.6.17.11 ran for months without any nfsd-related
problems, whereas in 2.6.22.6 nfs needs to be restarted almost every day.
Sometimes this fails with

lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98

after which the server must be rebooted.

I think lockd is involved because there are no problems during the day.
The trouble starts in the morning, when a lot of people log into their
machines and start their desktops (I think KDE locks its config files when
it reads them).
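
To make the threshold from point 2 concrete, here is a tiny sketch. The
formula (threads + 3) * 20 is an assumption on my part, picked because it
gives exactly 80 for a single lockd thread; the function name is
hypothetical and the real check in svc_tcp_accept should be verified in
the source.

```c
/*
 * Assumed limit: (threads + 3) * 20 open TCP sockets per service.
 * This form is an assumption chosen because it yields 80 for one thread;
 * check it against svc_tcp_accept() in your tree.
 */
unsigned int model_tcp_socket_limit(unsigned int nr_threads)
{
	return (nr_threads + 3) * 20;
}
```

Under that assumption one lockd thread allows 80 sockets, while the 1024
nfsd threads would allow 20540, which would explain why only lockd
complains.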

Regards
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH] bonding: update some distro-specific documentation

2007-09-07 Thread Jay Vosburgh
Andy Gospodarek [EMAIL PROTECTED] wrote:

This all looks fine except for one nit (well, request for extra
detail, really):

@@ -802,15 +802,20 @@ BROADCAST=192.168.1.255
 ONBOOT=yes
 BOOTPROTO=none
 USERCTL=no
+BONDING_OPTS="mode=balance-alb miimon=100"

   Be sure to change the networking specific lines (IPADDR,
 NETMASK, NETWORK and BROADCAST) to match your network configuration.
+You also need to set the BONDING_OPTS= line to specify the desired
+options for your bond0 interface.  Specifying bonding options in this
+way is the preferred method for configuring bonding interfaces.

Can you add something here that mentions that, for the
arp_ip_target option, it has to be supplied as arp_ip_target=+10.0.0.1
and not just arp_ip_target=10.0.0.1?  Also, multiple targets require
multiple instances of the arp_ip_target option; it doesn't work to put
multiple IP addresses as in the module option (i.e.,
arp_ip_target=10.0.0.1,10.0.0.2).

This is necessary because ifup-eth isn't adding the + when it
translates the option for use with sysfs or parsing the multiple IP
address syntax.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


[PATCH 2/2][RESEND] ehea: fix last_rx update

2007-09-07 Thread Jan-Bernd Themann
Update last_rx in registered device struct instead of
in the dummy device.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 1e9fd6f..717b129 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -471,7 +471,7 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
 			else
 				netif_receive_skb(skb);
 
-			dev->last_rx = jiffies;
+			port->netdev->last_rx = jiffies;
 		} else {
 			pr->p_stats.poll_receive_errors++;
 			port_reset = ehea_treat_poll_error(pr, rq, cqe,
-- 
1.5.2



PROBLEM: Oops with forcedeth and netkey in 2.6.21 and 22

2007-09-07 Thread Alexandre Ghisoli
Hi there, 

I cannot get my AMD64 machine working with the forcedeth network chip and
Netkey.

After recompiling the kernel to get iptables and OpenSWAN working, I can
no longer boot my computer; it freezes during network setup.

After some reboots and recompiles, I've traced the problem to NetKEY.
If I enable it in the kernel, I get an oops.

Booting in single-user mode, I was able to see that the errors come from
the dhcp client process and the af_packet module. If I don't load af_packet
at boot, I can set up an IP address manually. Unfortunately, when launching
gnome, my computer hangs (probably some process tries to load
af_packet?)

The NIC is on-board NIC on the MSI Neo4 Platinum motherboard (product
MS-7125)

I've tried these kernel versions, with the same behavior:
 - 2.6.21.6
 - 2.6.22.1
 - 2.6.22.6


Here is the oops I got (dmesg captured) :

skb_under_panic: text:c02b089c len:14 put:14 head:f74e8410 data:f74e8402
tail:f74e8400 end:f74e8580 dev:NULL
[ cut here ]
kernel BUG at net/core/skbuff.c:111!
invalid opcode:  [#1]
PREEMPT SMP
Modules linked in: af_packet usbhid sha256 sha1 hmac crypto_hash des
crypto_algapi af_key xfrm_user ohci_hcd parport_pc ehci_hcd usbcore
parport ohci1394 ieee1394 nvidia(P) floppy forcedeth sg
CPU:1
EIP:0060:[c029f0c9]Tainted: P   VLI
EFLAGS: 00010292   (2.6.21.6 #2)
EIP is at skb_under_panic+0x59/0x60
eax: 0072   ebx: f74e8410   ecx: f72a2000   edx: 
esi:    edi: 0800   ebp:    esp: f72a3d6c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process dhcpcd (pid: 3723, ti=f72a2000 task=f76bd030 task.ti=f72a2000)
Stack: c0398bd0 c02b089c 000e 000e f74e8410 f74e8402 f74e8400
f74e8580
   c037d386 f74e8402 f7fa4c80 c02b08a1 f79b4000 f72a3f40 f79b4000
c215a900
   f7fa4c80 f8b95cca f72a3ecc  0148 c016c455 f7a92600
0008cae0
Call Trace:
 [c02b089c] eth_header+0x10c/0x120
 [c02b08a1] eth_header+0x111/0x120
 [f8b95cca] packet_sendmsg+0x14a/0x260 [af_packet]
 [c016c455] link_path_walk+0x65/0xc0
 [c0299dbe] sock_sendmsg+0xce/0x100
 [c0131180] autoremove_wake_function+0x0/0x40
 [c016d3d4] path_lookup+0x14/0x20
 [c02fdb10] unix_find_other+0x30/0x1a0
 [c029a193] sys_sendto+0x133/0x180
 [c029b1ce] sys_socketcall+0x14e/0x280
 [c0102c7e] sysenter_past_esp+0x5f/0x85
 ===
Code: 00 00 89 5c 24 14 8b 98 90 00 00 00 89 54 24 0c 89 5c 24 10 8b 40
60 89 4c 24 04 c7 04 24 d0 8b 39 c0 89 44 24 08 e8 57 f4 e7 ff 0f 0b
eb fe 8d 76 00 56 53 bb 86 d3 37 c0 83 ec 24 8b 70 14 85
EIP: [c029f0c9] skb_under_panic+0x59/0x60 SS:ESP 0068:f72a3d6c
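
Reading the first line of the oops, data (f74e8402) is 14 bytes below head
(f74e8410), which suggests a 14-byte header was pushed with no headroom
left. A minimal user-space sketch of that kind of check follows; the
struct and function names are hypothetical and only model the idea, they
are not the kernel's sk_buff code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff pointers that matter here. */
struct model_buf {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the packet payload */
};

/*
 * Prepend len bytes of header. Returns the new data pointer, or NULL
 * when there is not enough headroom (where the kernel would BUG()).
 */
unsigned char *model_push(struct model_buf *b, size_t len)
{
	if ((size_t)(b->data - b->head) < len)
		return NULL;
	b->data -= len;
	return b->data;
}
```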

Here is my running environment:
Linux amd64-linux 2.6.21.6 #1 SMP PREEMPT Thu Sep 6 23:33:04 CEST 2007
i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD
GNU/Linux
 
Gnu C                  4.2.0
Gnu make               3.81
binutils               Binutils
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.40.2
reiserfsprogs          3.6.19
PPP                    2.4.4
Linux C Library        libc.2.6
Dynamic linker (ldd)   2.6
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.13
Sh-utils               6.9
udev                   114

And my on-board NIC :
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
Subsystem: Micro-Star International Co., Ltd. Unknown device
7125
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast TAbort-
TAbort- MAbort- SERR- PERR-
Latency: 0 (250ns min, 5000ns max)
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fe029000 (32-bit, non-prefetchable)
[size=4K]
Region 1: I/O ports at b400 [size=8]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1
+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=0 PME-

Thanks for your help.

--Alexandre



Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Johannes Berg
On Fri, 2007-09-07 at 18:01 +0200, Michael Buesch wrote:

 What's the problem with trying to lock it?

I think I had a problem with it once when I inserted it into some code
that was atomic and it all blew up badly ;) Nothing important really but
it sort of made me not like it much.

johannes




Re: [PATCH] Fix e100 on systems that have cache incoherent DMA

2007-09-07 Thread David Acker

Kok, Auke wrote:
first impressions are not good: pings are erratic and shoot up to 3 
seconds. In an overnight stress test, the receive unit went offline and 
never came back up (TX still working).


it sounds like something in the logic is suspending the ru too much, but 
I haven't had time to look deeply into the code yet.


I don't have an e100-enabled x86 box handy, but I will look into getting one
set up.

I just applied this patch to my PXA255 based system 
http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm .
It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the 
e100 patch.  I get:

pings going from the embedded system to a desktop machine.
100 packets transmitted, 100 received, 0% packet loss, time 98996ms
rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms

Pings going from the desktop machine to the embedded system:
100 packets transmitted, 100 received, 0% packet loss, time 99217ms
rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms


iperf tcp from embedded to desktop gets:
[  5]  0.0-100.0 sec  1007 MBytes  84.4 Mbits/sec
iperf udp from the embedded to the desktop gets (embedded told to send at 100 Mbps):
[  5] Server Report:
[  5]  0.0-100.0 sec   947 MBytes  79.4 Mbits/sec  0.068 ms   16/675645 (0.0024%)
[  5]  0.0-100.0 sec  1 datagrams received out-of-order

iperf tcp from the desktop to the embedded gets:
[  6]  0.0-100.0 sec  1.01 GBytes  86.4 Mbits/sec
iperf udp from the desktop to the embedded gets the following when the desktop sent at 100 Mbps:
[  5]  0.0-100.0 sec   964 MBytes  80.8 Mbits/sec  0.359 ms 126467/813760 (16%)
[  5]  0.0-100.0 sec  1 datagrams received out-of-order


Boot messages for my e100 are:
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
PCI: enabling device :00:09.0 ( - 0003)
PCI: Setting latency timer of device :00:09.0 to 64
e100: eth0: e100_probe: addr 0x10131000, irq 111, MAC addr 00:09:30:FF:F2:F6
cat /sys/bus/pci/drivers/e100/\:00\:09.0/{device,vendor,subsystem_device,subsystem_vendor}
0x1209
0x8086
0x
0x

It's on its own interrupt line:
cm-debian:~# cat /proc/interrupts |grep eth0
111: 402428   -  eth0

lspci shows:
00:09.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet 
Controller (rev 09)

Let me know if there is any other information I can provide you.  I will look through the code to see what could be 
going on with your machine.  I will also look into reproducing these results with a newer kernel.  This may be tricky 
since compulab's patches are pretty stale and don't always apply easily.


-Ack


Re: [PATCH] Fix e100 on systems that have cache incoherent DMA

2007-09-07 Thread Kok, Auke

David Acker wrote:

Kok, Auke wrote:
first impressions are not good: pings are erratic and shoot up to 3 
seconds. In an overnight stress test, the receive unit went offline and 
never came back up (TX still working).


it sounds like something in the logic is suspending the ru too much, but 
I haven't had time to look deeply into the code yet.


I don't have an e100-enabled x86 box handy, but I will look into getting one
set up.

I just applied this patch to my PXA255 based system 
http://www.compulab.co.il/x255/html/x255-cm-datasheet.htm .
It is running 2.6.18.4 plus compulab patches plus some hostap patches plus the 
e100 patch.  I get:

pings going from the embedded system to a desktop machine.
100 packets transmitted, 100 received, 0% packet loss, time 98996ms
rtt min/avg/max/mdev = 0.239/0.728/1.512/0.571 ms

Pings going from the desktop machine to the embedded system:
100 packets transmitted, 100 received, 0% packet loss, time 99217ms
rtt min/avg/max/mdev = 0.206/0.876/1.473/0.575 ms


ok, I just got a note from our lab saying that that particular system has the 
freak ping times even without your patch applied 8)


ignoring the ping issue, we still have the ru offline, but that could have 
possibly been caused by whatever is causing this ping issue... More testing is 
needed, and I'll try to find a system without the ping issue here first.


Auke


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-07 Thread Mike Christie

Anil Veerabhadrappa wrote:



+
+/* iSCSI stages */
+#define ISCSI_STAGE_SECURITY_NEGOTIATION (0)
+#define ISCSI_STAGE_LOGIN_OPERATIONAL_NEGOTIATION (1)
+#define ISCSI_STAGE_FULL_FEATURE_PHASE (3)
+/* Logout response codes */
+#define ISCSI_LOGOUT_RESPONSE_CONNECTION_CLOSED (0)
+#define ISCSI_LOGOUT_RESPONSE_CID_NOT_FOUND (1)
+#define ISCSI_LOGOUT_RESPONSE_CLEANUP_FAILED (3)
+
+/* iSCSI task types */
+#define ISCSI_TASK_TYPE_READ    (0)
+#define ISCSI_TASK_TYPE_WRITE   (1)
+#define ISCSI_TASK_TYPE_MPATH   (2)




All of these iSCSI codes should be in iscsi_proto.h, or should be added
there.

This is a very tricky proposal as this header file is automatically
generated by a well defined process and is shared between various driver
supporting multiple platform/OS and the firmware. If it is not of a big
issue I would like to keep it the way it is.


The values that are iscsi RFC values should come from the iscsi_proto.h 
file and not be duplicated for each driver.




+/*
+ * hardware reset
+ */
+int bnx2i_reset(struct scsi_cmnd *sc)
+{
+   return 0;
+}


So what is up with this one? It seems like if there is a way to reset 
hardware then you would want it as the scsi eh host reset callout 
instead of dropping the session. We could add some transport level 
recovery callouts for the iscsi specifics.


We may not be able to support HBA cold reset as bnx2 driver is the
primary owner of chip reset and initialization. This is the drawback of
sharing network interface with the NIC driver. If there is a need for
administrator to reset the iSCSI port, the same can be achieved by running
'ifdown eth#' and 'ifup eth#'.
The current driver even allows an ethernet interface reset while there are
active iSCSI connections; all active iSCSI sessions will be reinstated
when the network link comes back up.
 



If you cannot support it or it does not make sense just remove the stub 
then. I say it is not a big deal now, but hopefully we do not hit fun 
like with qla3xxx and qla4xxx :)



+
+void bnx2i_sysfs_cleanup(void)
+{
+   class_device_unregister(port_class_dev);
+   class_unregister(bnx2i_class);
+}

The sysfs bits related to the hba should use one of the scsi sysfs
facilities or, if they are related to iscsi bits and are generic, go
through the iscsi hba.


bnx2i needs 2 sysfs entries -
1. QP size info - this is used to size the per-connection shared data
structures used to issue work requests to the chip (login, scsi cmd, tmf,
nopin) and to get completions from the chip (scsi completions, async
messages, etc.). This is an iSCSI HBA attribute.
2. port mapper - we can be more flexible on classifying this as either
an iSCSI HBA attribute or a bnx2i driver global attribute.
Can hooks be added to the iSCSI transport class to include these?



Which ones were they exactly? I think JamesB wanted only common 
transport values in the transport class. If it is driver specific then 
it should go on the host or target or device with the scsi_host_template 
attrs.



Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates

2007-09-07 Thread Jason Lunz
In gmane.linux.network, you wrote:
 But the CPU has done more work. The flood ping will always show 
 increased CPU with these changes because the driver always stays in the 
 NAPI poll list. For typical LAN traffic, the average CPU usage doesn't 
 increase as much, though more measurements would be useful.

I'd be particularly interested to see what happens to your latency when
other apps are hogging the cpu. I assume from your description that your
cpu is mostly free to schedule the niced softirqd for the device polling
duration, but this won't always be the case. If other tasks are running
at high priority, it could be nearly a full jiffy before softirqd gets
to check the poll list again and the latency introduced could be much
higher than you've yet measured.

Jason


error(s) in 2.6.23-rc5 bonding.txt ?

2007-09-07 Thread Rick Jones
I was perusing Documentation/networking/bonding.txt in a 2.6.23-rc5 tree 
and came across the following discussing the round-robin scheduling:



Note that this out of order delivery occurs when both the
sending and receiving systems are utilizing a multiple
interface bond.  Consider a configuration in which a
balance-rr bond feeds into a single higher capacity network
channel (e.g., multiple 100Mb/sec ethernets feeding a single
gigabit ethernet via an etherchannel capable switch).  In this
configuration, traffic sent from the multiple 100Mb devices to
a destination connected to the gigabit device will not see
packets out of order.  


My first reaction was that this was incorrect - it didn't matter if the 
receiver was using a single link or not because the packets flowing 
across the multiple 100Mb links could hit the intermediate device out of 
order and so stay that way across the GbE link.


Before I go and patch-out that text I thought I'd double check.

rick jones


Re: [PATCH] ixgbe: driver for Intel(R) 82598 PCI-Express 10GbE adapters (v4)

2007-09-07 Thread Jeff Garzik

David Miller wrote:

From: Kok, Auke [EMAIL PROTECTED]
Date: Thu, 06 Sep 2007 11:31:47 -0700


Also available through git:// and http:// here:

   http://foo-projects.org/~sofar/ixgbe-20070905-submission.patch
   http://foo-projects.org/~sofar/ixgbe-20070905-submission.patch.bz2
   (git-am formatted!)

   git://lost.foo-projects.org/~ahkok/linux-2.6 ixgbe-20070905-submission


To be honest I have absolutely no problems with this driver and we
should just cut the crap and merge it in now.

Any objections anyone makes at this point is frankly nit picking crap
which we can cure with followon cleanups and corrections.


Are you responding to a strawman or something?

AFAICS nobody objected to it, and Auke cleaned it up a la e1000e, which 
got queued during KS.


Jeff





[PATCH] DOC: Update networking/multiqueue.txt with correct information.

2007-09-07 Thread PJ Waskiewicz
Updated the multiqueue.txt document to call out the correct kernel options
to select to enable multiqueue.

Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED]
---

 Documentation/networking/multiqueue.txt |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/multiqueue.txt 
b/Documentation/networking/multiqueue.txt
index 00b60cc..ea5a42e 100644
--- a/Documentation/networking/multiqueue.txt
+++ b/Documentation/networking/multiqueue.txt
@@ -58,9 +58,13 @@ software, so it's a straight round-robin qdisc.  It uses the same syntax and
 classification priomap that sch_prio uses, so it should be intuitive to
 configure for people who've used sch_prio.
 
-The PRIO qdisc naturally plugs into a multiqueue device.  If PRIO has been
-built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of
-bands requested is equal to the number of queues on the hardware.  If they
+In order to utilize the multiqueue features of the qdiscs, the network
+device layer needs to enable multiple queue support.  This can be done by
+selecting NETDEVICES_MULTIQUEUE under Drivers.
+
+The PRIO qdisc naturally plugs into a multiqueue device.  If
+NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
+bands requested is compared to the number of queues on the hardware.  If they
 are equal, it sets a one-to-one mapping up between the queues and bands.  If
 they're not equal, it will not load the qdisc.  This is the same behavior
 for RR.  Once the association is made, any skb that is classified will have


Re: [PATCH][MIPS][7/7] AR7: ethernet

2007-09-07 Thread Jeff Garzik

Matteo Croce wrote:

On Friday 07 September 2007 00:30:25, Andrew Morton wrote:

On Thu, 6 Sep 2007 17:34:10 +0200 Matteo Croce [EMAIL PROTECTED] wrote:
Driver for the cpmac 100M ethernet driver.
It works fine disabling napi support, enabling it gives a kernel panic
when the first IPv6 packet has to be forwarded.
Other than that works fine.


I'm not too sure why I got cc'ed on this (and not on patches 1-6?) but
whatever.


I mailed every maintainer in the respective section in the file MAINTAINERS
and you were in the NETWORK DEVICE DRIVERS section

This patch introduces quite a number of basic coding-style mistakes. 
Please run it through scripts/checkpatch.pl and review the output.


Already done. I'm collecting other suggestions before committing


cool, I'll wait for the resend before reviewing, then.

As an author I understand that fixing up coding style / cosmetic stuff 
rather than meat is annoying.


But it is important to emphasize that a clean driver is what makes a 
good, thorough, effective review possible.


Jeff




Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-07 Thread Jeff Garzik

Micah Gruber wrote:

This patch fixes a potential null dereference bug where we dereference dev 
before a null check. This patch simply moves the dereferencing after the null 
check.

Signed-off-by: Micah Gruber [EMAIL PROTECTED]
---

--- a/drivers/net/tulip/uli526x.c
+++ b/drivers/net/tulip/uli526x.c
@@ -663,7 +663,7 @@
 {
 	struct net_device *dev = dev_id;
 	struct uli526x_board_info *db = netdev_priv(dev);
-	unsigned long ioaddr = dev->base_addr;
+	unsigned long ioaddr;
 	unsigned long flags;
 
 	if (!dev) {
@@ -671,6 +671,8 @@
 		return IRQ_NONE;
 	}
 
+	ioaddr = dev->base_addr;
+


as satyam noted, just remove the !dev test




Re: [PATCH] Fix e100 on systems that have cache incoherent DMA

2007-09-07 Thread Jeff Garzik

David Acker wrote:
Let me know if there is any other information I can provide you.  I will 
look through the code to see what could be going on with your machine.  
I will also look into reproducing these results with a newer kernel.  
This may be tricky since compulab's patches are pretty stale and don't 
always apply easily.



pktgen outputs for the various cases modified/unmodified[/others?] would 
be nice, if you have a spot of time.


Jeff




Re: error(s) in 2.6.23-rc5 bonding.txt ?

2007-09-07 Thread Jay Vosburgh
Rick Jones [EMAIL PROTECTED] wrote:
[...]
 Note that this out of order delivery occurs when both the
 sending and receiving systems are utilizing a multiple
 interface bond.  Consider a configuration in which a
 balance-rr bond feeds into a single higher capacity network
 channel (e.g., multiple 100Mb/sec ethernets feeding a single
 gigabit ethernet via an etherchannel capable switch).  In this
 configuration, traffic sent from the multiple 100Mb devices to
 a destination connected to the gigabit device will not see
 packets out of order.  

My first reaction was that this was incorrect - it didn't matter if the
receiver was using a single link or not because the packets flowing across
the multiple 100Mb links could hit the intermediate device out of order
and so stay that way across the GbE link.

Usually it does matter, at least at the time I tested this.

Usually, the even striping of traffic from the balance-rr mode
will deliver in-order to a single higher speed link (e.g., N 100Mb
feeding a single 1Gb).  I say usually because, although I don't see it
happen with the equipment I have, I'm willing to believe that there are
gizmos that would bundle packets arriving on the switch ports.

The reordering (usually) occurs when packet coalescing stuff
(either interrupt mitigation on the device, or NAPI) happens at the
receiver end, after the packets are striped evenly into the interfaces,
e.g.,

	eth0	eth1	eth2
	P1	P2	P3
	P4	P5	P6
	P7	P8	P9

and then eth0 goes and grabs a bunch of its packets, then eth1,
and eth2 do the same afterwards, so the received order ends up something
like P1, P4, P7, P2, P5, P8, P3, P6, P9.  In Ye Olde Dayes Of Yore, with
one packet per interrupt at 10 Mb/sec, this type of configuration
wouldn't reorder (or at least not as badly).
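
For illustration, the batching effect can be modeled in a few lines of
user-space C. This is only a sketch under simplifying assumptions: packets
are striped strictly round-robin, and the receiver drains each interface's
whole backlog before moving to the next one. The function name is
hypothetical.

```c
#include <stddef.h>

/*
 * Packets 1..n are striped round-robin over nif interfaces, so interface
 * i holds packets i+1, i+1+nif, i+1+2*nif, ...  The receiver then drains
 * each interface completely, one after another.  The resulting delivery
 * order is written to out, which must have room for n entries.
 */
void rr_batched_rx_order(int nif, int n, int *out)
{
	size_t k = 0;

	if (nif <= 0)
		return;

	for (int i = 0; i < nif; i++)			/* eth0, eth1, ... */
		for (int p = i + 1; p <= n; p += nif)	/* backlog of iface i */
			out[k++] = p;
}
```

With three interfaces and nine packets the model produces the order given
above (1, 4, 7, 2, 5, 8, 3, 6, 9); with a single interface the packets
stay in order.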

The text probably is lacking in some detail, though.  The real
key is that the last sender before getting to the destination system has
to do the round-robin striping.  Most switches that I'm familiar with
(again, never seen one, but willing to believe there is one) don't have
round-robin as a load balance option for etherchannel, and thus won't
evenly stripe traffic, but instead do some math on the packets so that a
given connection isn't split across ports.

That said, it's certainly plausible that, for a given set of N
ethernets all enslaved to a single bonding balance-rr, the individual
ethernets could get out of sync, as it were (e.g., one running a fuller
tx ring, and thus running behind the others).  If bonding is the only
feeder of the devices, then for a continuous flow of traffic, all the
slaves will generally receive packets (from the kernel, for
transmission) at pretty much the same rate, and so they won't tend to
get ahead or behind.

I haven't looked into this deeply for a few years, but
this is my recollection of what happened with the tests I did then.  I
did testing with multiple 100Mb devices feeding either other sets of
100Mb devices or single gigabit devices.  I'm willing to believe that
things have changed, and an N feeding into one configuration can
reorder, but I haven't seen it (or really looked for it; balance-rr
isn't much the rage these days).

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: 2.6.23-rc4-mm1: e1000e napi lockup

2007-09-07 Thread Jeff Garzik

Kok, Auke wrote:

David Miller wrote:

From: Jiri Slaby [EMAIL PROTECTED]
Date: Fri, 07 Sep 2007 09:19:30 +0200

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e 
driver.

napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.


Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call.  Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.


I'll fix it... it's my patch that adds the new napi code to it and I 
need to get it ready for the merge window anyway.


Well, since it's close to the merge window opening, we could see what
happens if DaveM pulls branch 'upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git


That should make this class of pre-merge-window annoyance go away.

Jeff




Re: request for information about the ath5k licensing

2007-09-07 Thread Jeff Garzik

Reyk Floeter wrote:

I'm still waiting for an answer. Your process is taking too long.



Speaking as a person through whom these changes flow upstream into the
official kernel (ath5k maintainers -> linville -> me -> linus)...



The most important thing for today is that no ath5k stuff has been 
committed (nor has it ever been).



I would rather take it slow and make sure everybody is happy.  There is 
nothing upstream, and so, there is no need to rush and correct something.


Collectively, this is just growing pains.  Everyone is breaking new
ground, trying to figure out how to best support atheros stuff on Linux.
There are new tools to deal with (svn? git? flavor of the day? :)), new
licenses with new ramifications to consider, a new wireless stack to
deal with.


What you are witnessing is but a small part of the chaos as everyone 
tackles these chores simultaneously.


Jeff




Re: error(s) in 2.6.23-rc5 bonding.txt ?

2007-09-07 Thread Rick Jones

That said, it's certainly plausible that, for a given set of N
ethernets all enslaved to a single bonding balance-rr, the individual
ethernets could get out of sync, as it were (e.g., one running a fuller
tx ring, and thus running behind the others). 


That is the scenario of which I was thinking.


If bonding is the only feeder of the devices, then for a continuous
flow of traffic, all the slaves will generally receive packets (from
the kernel, for transmission) at pretty much the same rate, and so
they won't tend to get ahead or behind.


I could see that if there was just one TCP connection doing bulk
transfer or something, but if there were a bulk transmitter coupled with an
occasional request/response (i.e. netperf TCP_STREAM and a TCP_RR) I'd
think the tx rings would no longer remain balanced.



I haven't looked into this deeply for a few years, but
this is my recollection of what happened with the tests I did then.  I
did testing with multiple 100Mb devices feeding either other sets of
100Mb devices or single gigabit devices.  I'm willing to believe that
things have changed, and an N feeding into one configuration can
reorder, but I haven't seen it (or really looked for it; balance-rr
isn't much the rage these days).


Are you OK with that block of text simply being yanked?

rick


[PATCH 1/2] atl1: add CONFIG_ATL1_EXPERIMENTAL to kconfig

2007-09-07 Thread Chris Snook
From: Chris Snook [EMAIL PROTECTED]

Introduce Kconfig ATL1_EXPERIMENTAL to separate mature code from less mature
code in the atl1 driver, and remove EXPERIMENTAL designation for ATL1.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/Kconfig   2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/Kconfig   2007-09-04 10:37:34.0 -0400
@@ -2329,8 +2329,8 @@ config QLA3XXX
 	  will be called qla3xxx.
 
 config ATL1
-	tristate "Attansic L1 Gigabit Ethernet support (EXPERIMENTAL)"
-	depends on PCI && EXPERIMENTAL
+	tristate "Attansic L1 Gigabit Ethernet support"
+	depends on PCI
 	select CRC32
 	select MII
 	help
@@ -2339,6 +2339,16 @@ config ATL1
 	  To compile this driver as a module, choose M here.  The module
 	  will be called atl1.
 
+config ATL1_EXPERIMENTAL
+	bool "atl1 experimental features"
+	depends on ATL1 && EXPERIMENTAL
+	help
+	  This option enables various features that have not yet reached
+	  the maturity of the rest of the atl1 driver.  The driver will
+	  still work fine without this option enabled.
+
+	  If unsure, say N.
+
 endif # NETDEV_1000
 
 #


Re: BUG: unable to handle kernel NULL pointer dereference1

2007-09-07 Thread Michal Piotrowski
Hi Mark,

[Adding netdev to CC]

On 07/09/2007, Mark Nipper [EMAIL PROTECTED] wrote:
 I've received two oopses now from my kernel while running
 the 2.6.22 series.  The first was with 2.6.22.1 back in July and
 the second which happened just within the last day is 2.6.22.5.
 They both appear to be the same bug and I don't think it's
 hardware related.  I'm attaching the entries from logcheck which
 I received when they happened.

 I'm not subscribed to the mailing list, so please make
 sure to copy me directly on any replies.  And let me know if
 anyone needs any additional information to try to track this
 down.  Thanks for reading...

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/


Re: 2.6.23-rc4-mm1: e1000e napi lockup

2007-09-07 Thread Jeff Garzik

Kok, Auke wrote:

Jeff Garzik wrote:

Kok, Auke wrote:

David Miller wrote:

From: Jiri Slaby [EMAIL PROTECTED]
Date: Fri, 07 Sep 2007 09:19:30 +0200

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e 
driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on 
boot.

Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call.  Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
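
As a rough illustration of the balancing rule described here, this is a toy user-space model in C. It is not kernel code: struct toy_napi and the toy_napi_*() helpers are hypothetical names, and a single flag stands in for whatever state the real napi_struct tracks. One deliberate difference is that the model reports an unbalanced disable instead of waiting, which is an assumption based on the boot freeze reported above.

```c
#include <stdbool.h>

/* Toy user-space model, not kernel code: one flag standing in for the
 * state that decides whether polling may run. */
struct toy_napi {
	bool disabled;
};

/* Models netif_napi_add(): the instance starts out disabled. */
static void toy_napi_add(struct toy_napi *n)
{
	n->disabled = true;
}

/* Models napi_enable(): only legal on a disabled instance. */
static bool toy_napi_enable(struct toy_napi *n)
{
	if (!n->disabled)
		return false;	/* unbalanced enable */
	n->disabled = false;
	return true;
}

/* Models napi_disable(): disabling an already-disabled instance is the
 * unbalanced case; the model reports it rather than hanging. */
static bool toy_napi_disable(struct toy_napi *n)
{
	if (n->disabled)
		return false;	/* unbalanced disable */
	n->disabled = true;
	return true;
}
```

In this model, calling toy_napi_disable() straight after toy_napi_add() (the probe-time pattern) fails, while an enable at open followed by a disable at close succeeds.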
I'll fix it... it's my patch that adds the new napi code to it and I 
need to get it ready for the merge window anyway.


well, since it's close to the merge window opening, we could see 
what happens if DaveM pulls branch 'upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git


That should make this class of pre-merge-window annoyance go away.


If I do that now I get a big merge conflict:


oh you are _guaranteed_ conflicts.  most of that is NAPI-area code that 
got changed by both.





[PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL

2007-09-07 Thread Chris Snook
From: Chris Snook [EMAIL PROTECTED]

Make certain problematic optimizations build-time configurable.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c  2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/atl1/atl1_main.c  2007-09-04 11:23:26.0 -0400
@@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
 	struct net_device *netdev;
 	struct atl1_adapter *adapter;
 	static int cards_found = 0;
-	bool pci_using_64 = true;
+	bool pci_using_64 = false;
 	int err;
 
 	err = pci_enable_device(pdev);
 	if (err)
 		return err;
 
+#ifdef CONFIG_ATL1_EXPERIMENTAL
 	err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+	if (!err) {
+		pci_using_64 = true;
+		goto dma_ok;
+	}
+#endif /* CONFIG_ATL1_EXPERIMENTAL */
+	err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
 	if (err) {
-		err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
-		if (err) {
-			dev_err(&pdev->dev, "no usable DMA configuration\n");
-			goto err_dma;
-		}
-		pci_using_64 = false;
+		dev_err(&pdev->dev, "no usable DMA configuration\n");
+		goto err_dma;
 	}
+dma_ok:
 	/* Mark all PCI regions associated with PCI device
 	 * pdev as being reserved by owner atl1_driver_name
 	 */
@@ -2294,11 +2298,13 @@ static int __devinit atl1_probe(struct p
 	netdev->features |= (NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
 
 	/*
-	 * FIXME - Until tso performance gets fixed, disable the feature.
+	 * TSO currently has performance problems,
+	 * so let's disable it by default.
 	 * Enable it with ethtool -K if desired.
 	 */
-	/* netdev->features |= NETIF_F_TSO; */
-
+#ifdef CONFIG_ATL1_EXPERIMENTAL
+	netdev->features |= NETIF_F_TSO;
+#endif
 	if (pci_using_64)
 		netdev->features |= NETIF_F_HIGHDMA;
 


Re: 2.6.23-rc4-mm1: e1000e napi lockup

2007-09-07 Thread Kok, Auke

Jeff Garzik wrote:

Kok, Auke wrote:

Jeff Garzik wrote:

Kok, Auke wrote:

David Miller wrote:

From: Jiri Slaby [EMAIL PROTECTED]
Date: Fri, 07 Sep 2007 09:19:30 +0200

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e 
driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on 
boot.

Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call.  Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I 
need to get it ready for the merge window anyway.
well, since it's close to the merge window opening, we could see 
what happens if DaveM pulls branch 'upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git


That should make this class of pre-merge-window annoyance go away.

If I do that now I get a big merge conflict:


oh you are _guaranteed_ conflicts.  most of that is NAPI-area code that 
got changed by both.



actually that's the only thing it was, and fixing it up was trivial (took me 
about 3 minutes). it was 3x the napi code and once a struct indent change...


I'll have a new e1000e napi patch for andrew in a sec.

Auke


Re: [PATCH 0/2] atl1: Introduce CONFIG_ATL1_EXPERIMENTAL

2007-09-07 Thread Jeff Garzik

Chris Snook wrote:
The atl1 driver is currently marked EXPERIMENTAL, because a few 
supposedly performance-enhancing features still have problems.  When 
these features are disabled, the driver is completely stable, fully 
functional, and performs well.


Patch 1/2 Creates the kconfig option CONFIG_ATL1_EXPERIMENTAL, and 
removes the EXPERIMENTAL designation from CONFIG_ATL1


Patch 2/2 Wraps some currently-disabled features in #ifdef 
CONFIG_ATL1_EXPERIMENTAL, so developers and testers can play with these 
features more easily, and distributions will still get a fast, stable 
driver with existing .config files.


We'll also be using this to wrap around various new features we'll be 
experimenting with in coming months.  Instead of using a half dozen 
different kconfig options for each of them, like some drivers do, we'll 
just use this, and make sure things are safe for everyone before we take 
them out of the experimental wrapper.


Well, I haven't received patch #2 yet, but in general a runtime switch 
(module option?) is greatly preferred.
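
To make the distinction concrete, here is a minimal user-space sketch of the two approaches. Everything in it is hypothetical: TOY_F_TSO stands in for a feature bit such as NETIF_F_TSO, TOY_EXPERIMENTAL for the proposed Kconfig symbol, and the boolean argument for whatever run-time knob (a module parameter, say) a driver might expose.

```c
#include <stdbool.h>

#define TOY_F_TSO 0x1u	/* hypothetical feature bit */

/* Build-time variant: the choice is frozen when the file is compiled,
 * so changing it means rebuilding. */
static unsigned int toy_features_build_time(void)
{
	unsigned int features = 0;

#ifdef TOY_EXPERIMENTAL
	features |= TOY_F_TSO;
#endif
	return features;
}

/* Run-time variant: the same binary serves both cases, and the caller
 * decides at load time. */
static unsigned int toy_features_run_time(bool experimental)
{
	unsigned int features = 0;

	if (experimental)
		features |= TOY_F_TSO;
	return features;
}
```

With the run-time form, a distribution can ship one binary and a tester can flip the behaviour without recompiling, which is presumably why it is preferred.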


Jeff





Re: [PATCH 0/2] atl1: Introduce CONFIG_ATL1_EXPERIMENTAL

2007-09-07 Thread Chris Snook

Jeff Garzik wrote:

Chris Snook wrote:
The atl1 driver is currently marked EXPERIMENTAL, because a few 
supposedly performance-enhancing features still have problems.  When 
these features are disabled, the driver is completely stable, fully 
functional, and performs well.


Patch 1/2 Creates the kconfig option CONFIG_ATL1_EXPERIMENTAL, and 
removes the EXPERIMENTAL designation from CONFIG_ATL1


Patch 2/2 Wraps some currently-disabled features in #ifdef 
CONFIG_ATL1_EXPERIMENTAL, so developers and testers can play with 
these features more easily, and distributions will still get a fast, 
stable driver with existing .config files.


We'll also be using this to wrap around various new features we'll be 
experimenting with in coming months.  Instead of using a half dozen 
different kconfig options for each of them, like some drivers do, 
we'll just use this, and make sure things are safe for everyone before 
we take them out of the experimental wrapper.


Well, I haven't received patch #2 yet, but in general a runtime switch 
(module option?) is greatly preferred.


Jeff


Okay, I'll think about how we want to parameterize this.  I don't want users 
expecting development options to be around forever.  I'll resubmit something 
once I have more of these experimental features ready to submit.


-- Chris


Re: [atl1-devel] [PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL

2007-09-07 Thread Luca
On 9/8/07, Chris Snook [EMAIL PROTECTED] wrote:
 From: Chris Snook [EMAIL PROTECTED]

 Make certain problematic optimizations build-time configurable.

 Signed-off-by: Chris Snook [EMAIL PROTECTED]
 Acked-by: Jay Cliburn [EMAIL PROTECTED]

 --- a/drivers/net/atl1/atl1_main.c  2007-09-04 10:12:38.0 -0400
 +++ b/drivers/net/atl1/atl1_main.c  2007-09-04 11:23:26.0 -0400
 @@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
 struct net_device *netdev;
 struct atl1_adapter *adapter;
 static int cards_found = 0;
 -   bool pci_using_64 = true;
 +   bool pci_using_64 = false;
 int err;

 err = pci_enable_device(pdev);
 if (err)
 return err;

 +#ifdef CONFIG_ATL1_EXPERIMENTAL
 err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
 +   if (!err) {
 +   pci_using_64 = true;
 +   goto dma_ok;
 +   }
 +#endif /* CONFIG_ATL1_EXPERIMENTAL */

This is more like CONFIG_ATL1_PLEASE_KILL_MY_MACHINE; I really don't
see the problem with just limiting the DMA mask:
- if you don't have physical mem over the 4GB boundary limiting DMA
doesn't make any difference
- if you have more than 4GB of memory the machine won't survive long without it
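
The two cases above can be seen in a small user-space sketch of what a DMA mask means. This is an illustration only: the TOY_* constants are assumed to mirror the usual all-ones 32-bit and 64-bit mask values, and toy_dma_addr_ok() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DMA_32BIT_MASK 0x00000000ffffffffULL
#define TOY_DMA_64BIT_MASK 0xffffffffffffffffULL

/* True if the device may be handed this bus address under the mask,
 * i.e. the address has no bits set outside the mask. */
static bool toy_dma_addr_ok(uint64_t addr, uint64_t mask)
{
	return (addr & ~mask) == 0;
}
```

An address below 4GB passes under either mask, so limiting the mask costs nothing there. An address at or above 4GB passes only under the 64-bit mask, which is exactly the case where a card with unreliable 64-bit DMA would be given addresses it cannot handle.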

Luca


Re: [atl1-devel] [PATCH 2/2] atl1: wrap problematic optimizations in CONFIG_ATL1_EXPERIMENTAL

2007-09-07 Thread Chris Snook

Luca wrote:

On 9/8/07, Chris Snook [EMAIL PROTECTED] wrote:

From: Chris Snook [EMAIL PROTECTED]

Make certain problematic optimizations build-time configurable.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Acked-by: Jay Cliburn [EMAIL PROTECTED]

--- a/drivers/net/atl1/atl1_main.c  2007-09-04 10:12:38.0 -0400
+++ b/drivers/net/atl1/atl1_main.c  2007-09-04 11:23:26.0 -0400
@@ -2203,22 +2203,26 @@ static int __devinit atl1_probe(struct p
struct net_device *netdev;
struct atl1_adapter *adapter;
static int cards_found = 0;
-   bool pci_using_64 = true;
+   bool pci_using_64 = false;
int err;

err = pci_enable_device(pdev);
if (err)
return err;

+#ifdef CONFIG_ATL1_EXPERIMENTAL
err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
+   if (!err) {
+   pci_using_64 = true;
+   goto dma_ok;
+   }
+#endif /* CONFIG_ATL1_EXPERIMENTAL */


This is more like CONFIG_ATL1_PLEASE_KILL_MY_MACHINE; I really don't
see the problem with just limiting the DMA mask:
- if you don't have physical mem over the 4GB boundary limiting DMA
doesn't make any difference
- if you have more than 4GB of memory the machine won't survive long without it


Atheros is still working on this, and we plan to fix it.  64-bit DMA *should* 
work.  I just resubmitted your patch with the comment Jeff requested.  I still 
may want to revisit CONFIG_ATL1_EXPERIMENTAL soon when I start playing around 
with more features.


-- Chris


[PATCH] [-MM, FIX V3] e1000e: incorporate napi_struct changes from net-2.6.24.git

2007-09-07 Thread Auke Kok
This incorporates the new napi_struct changes into e1000e. Included
bugfix for ifdown hang from Krishna Kumar for e1000.

Disabling polling is no longer needed at init time, so remove
napi_disable() call from _probe().

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/e1000.h  |2 ++
 drivers/net/e1000e/netdev.c |   39 ---
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index c57e35a..d2499bb 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -187,6 +187,8 @@ struct e1000_adapter {
 	struct e1000_ring *tx_ring /* One per active queue */
 						____cacheline_aligned_in_smp;
 
+	struct napi_struct napi;
+
 	unsigned long tx_queue_len;
 	unsigned int restart_queue;
 	u32 txd_cmd;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 372da46..f8ec537 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1149,12 +1149,12 @@ static irqreturn_t e1000_intr_msi(int irq, void *data)
 			mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}
 
-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1212,12 +1212,12 @@ static irqreturn_t e1000_intr(int irq, void *data)
 			mod_timer(&adapter->watchdog_timer, jiffies + 1);
 	}
 
-	if (netif_rx_schedule_prep(netdev)) {
+	if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
 		adapter->total_tx_bytes = 0;
 		adapter->total_tx_packets = 0;
 		adapter->total_rx_bytes = 0;
 		adapter->total_rx_packets = 0;
-		__netif_rx_schedule(netdev);
+		__netif_rx_schedule(netdev, &adapter->napi);
 	} else {
 		atomic_dec(&adapter->irq_sem);
 	}
@@ -1662,10 +1662,10 @@ set_itr_now:
  * e1000_clean - NAPI Rx polling callback
  * @adapter: board private structure
  **/
-static int e1000_clean(struct net_device *poll_dev, int *budget)
+static int e1000_clean(struct napi_struct *napi, int budget)
 {
-	struct e1000_adapter *adapter;
-	int work_to_do = min(*budget, poll_dev->quota);
+	struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
+	struct net_device *poll_dev = adapter->netdev;
 	int tx_cleaned = 0, work_done = 0;
 
 	/* Must NOT use netdev_priv macro here. */
@@ -1684,25 +1684,20 @@ static int e1000_clean(struct net_device *poll_dev, int *budget)
 		spin_unlock(&adapter->tx_queue_lock);
 	}
 
-	adapter->clean_rx(adapter, &work_done, work_to_do);
-	*budget -= work_done;
-	poll_dev->quota -= work_done;
+	adapter->clean_rx(adapter, &work_done, budget);
 
 	/* If no Tx and not enough Rx work done, exit the polling mode */
-	if ((!tx_cleaned && (work_done == 0)) ||
+	if ((tx_cleaned && (work_done < budget)) ||
 	   !netif_running(poll_dev)) {
 quit_polling:
 		if (adapter->itr_setting & 3)
 			e1000_set_itr(adapter);
-		netif_rx_complete(poll_dev);
-		if (test_bit(__E1000_DOWN, &adapter->state))
-			atomic_dec(&adapter->irq_sem);
-		else
-			e1000_irq_enable(adapter);
+		netif_rx_complete(poll_dev, napi);
+		e1000_irq_enable(adapter);
 		return 0;
 	}
 
-	return 1;
+	return work_done;
 }
 
 static void e1000_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
@@ -2439,7 +2434,7 @@ int e1000e_up(struct e1000_adapter *adapter)
 
 	clear_bit(__E1000_DOWN, &adapter->state);
 
-	netif_poll_enable(adapter->netdev);
+	napi_enable(&adapter->napi);
 	e1000_irq_enable(adapter);
 
 	/* fire a link change interrupt to start the watchdog */
@@ -2472,7 +2467,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 	e1e_flush();
 	msleep(10);
 
-	netif_poll_disable(netdev);
+	napi_disable(&adapter->napi);
 	e1000_irq_disable(adapter);
 
 	del_timer_sync(&adapter->watchdog_timer);
@@ -2605,7 +2600,7 @@ static int e1000_open(struct net_device *netdev)
 	/* From here on the code is the same as e1000e_up() */
 	clear_bit(__E1000_DOWN, &adapter->state);
 
-	netif_poll_enable(netdev);
+	napi_enable(&adapter->napi);
 
 	e1000_irq_enable(adapter);
 
@@ -4090,8 +4085,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,

Re: [PATCH] [-MM, FIX V3] e1000e: incorporate napi_struct changes from net-2.6.24.git

2007-09-07 Thread Kok, Auke

Auke Kok wrote:

This incorporates the new napi_struct changes into e1000e. Included
bugfix for ifdown hang from Krishna Kumar for e1000.

Disabling polling is no longer needed at init time, so remove
napi_disable() call from _probe().



david,

while testing this patch I noticed that the poll routine is now called 100% of 
the time, and since I'm not doing much different than before, I suspect that 
something in the new napi code is staying in polling mode forever? Since e1000e 
is pretty much the same code as e1000, I doubt the problem is there, but you can 
probably tell better. ideas?


Auke





Re: error(s) in 2.6.23-rc5 bonding.txt ?

2007-09-07 Thread Jay Vosburgh
Rick Jones [EMAIL PROTECTED] wrote:
[...]
 If bonding is the only feeder of the devices, then for a continuous
 flow of traffic, all the slaves will generally receive packets (from
 the kernel, for transmission) at pretty much the same rate, and so
 they won't tend to get ahead or behind.

I could see that if there were just one TCP connection doing a bulk
transfer or something, but if there were a bulk transmitter coupled with an
occasional request/response (i.e. netperf TCP_STREAM and a TCP_RR) I'd think
the tx rings would no longer remain balanced.

I'm not sure that would be the case, because even the traffic
bump from the TCP_RR would be funneled through the round-robin.  So,
the next packet of the bulk transmit would simply be pushed back to
the next available interface.

Perhaps varying packet sizes would throw things out of whack, if
the small ones happened to line up all on one interface (regardless of
the other traffic).
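
A toy user-space sketch of that last case, assuming strict per-packet rotation and nothing else; struct toy_bond and the toy_bond_*() helpers are made-up names, not the bonding driver's code. The caller is expected to pass between 1 and TOY_MAX_SLAVES slaves.

```c
#include <stddef.h>

#define TOY_MAX_SLAVES 8

struct toy_bond {
	size_t nslaves;				/* 1..TOY_MAX_SLAVES */
	size_t next;				/* next slave in the rotation */
	unsigned long packets[TOY_MAX_SLAVES];	/* packets queued per slave */
	unsigned long bytes[TOY_MAX_SLAVES];	/* bytes queued per slave */
};

static void toy_bond_init(struct toy_bond *b, size_t nslaves)
{
	size_t i;

	b->nslaves = nslaves;
	b->next = 0;
	for (i = 0; i < TOY_MAX_SLAVES; i++) {
		b->packets[i] = 0;
		b->bytes[i] = 0;
	}
}

/* Hand one packet to the next slave in strict rotation and return the
 * index of the slave that received it. */
static size_t toy_bond_xmit(struct toy_bond *b, unsigned long len)
{
	size_t slave = b->next;

	b->packets[slave]++;
	b->bytes[slave] += len;
	b->next = (slave + 1) % b->nslaves;
	return slave;
}
```

With two slaves and a stream that strictly alternates 1500-byte and 64-byte packets, both slaves end up with the same packet count but very different byte counts, so their tx rings would drain at different rates even though the rotation itself is perfectly even.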

A PAUSE frame to one interface would almost certainly get things
out of whack, but I don't know how long it would stay out of whack (or,
really, how likely getting a PAUSE is).  Probably just as long as all of
the slaves are running at full speed.

  I haven't investigated into this deeply for a few years, but
 this is my recollection of what happened with the tests I did then.  I
 did testing with multiple 100Mb devices feeding either other sets of
 100Mb devices or single gigabit devices.  I'm willing to believe that
 things have changed, and an N feeding into one configuration can
 reorder, but I haven't seen it (or really looked for it; balance-rr
 isn't much the rage these days).

Are you OK with that block of text simply being yanked?

Mmm... I'm an easy sell for a "usually" or other suitable caveat
added in strategic places (avoiding absolute statements and all that).
The text does reflect the results of experiments I ran at the time, so
I'm reluctant to toss it wholesale simply because we speculate over how
it might not be accurate.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [TG3]: Workaround MSI bug on 5714/5780.

2007-09-07 Thread Michael Chan
On Thu, 2007-09-06 at 12:50 -0700, David Miller wrote:
 From: Michael Chan [EMAIL PROTECTED]
 Date: Thu, 06 Sep 2007 12:05:30 -0700
 
  The HT1000 bridge may very well have an MSI issue.  I'm checking with
  ServerWorks and I will do some testing to confirm.  If confirmed, we can
  disable MSI behind the HT1000 bridge instead of globally.  The 5714
  issue is not caused by the HT1000 as it is not behind the HT1000.
 
 What I'm going to do at this point is just merge the tg3
 fix into the current 2.6.23 tree right now.
 
 Meanwhile I'll have the HT1000 MSI quirk revert ready and,
 unless we find a reason not to, I'll ask Greg KH to merge
 that patch into 2.6.24
 
David, I see that you have already done the revert in your 2.6.23 tree.
So the following patch assumes the revert is already done.  I think it
is quite safe for this to go into 2.6.23.

[PCI]: Add MSI quirk for ServerWorks HT1000 PCIX bridge.

This is the fix for the following problem:

https://bugzilla.redhat.com/show_bug.cgi?id=227657

The bnx2 device 5706 complains about MSI not working behind a
ServerWorks HT1000 PCIX bridge. An earlier commit to fix the problem:

e3008dedff4bdc96a5f67224cd3d8d12237082a0:

PCI: disable MSI by default on systems with Serverworks HT1000 chips

was not entirely correct, and has been reverted.

MSI does not work on the PCIX bus because the BIOS did not set the
HT_MSI_FLAGS_ENABLE bit in the HyperTransport MSI capability on the
bridge.  We use the existing quirk_msi_ht_cap() to detect the problem
and disable MSI in all buses behind it.

Signed-off-by: Michael Chan [EMAIL PROTECTED]
Cc: Anantha Subramanyam [EMAIL PROTECTED]
Cc: Naren Sankar [EMAIL PROTECTED]

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6da5a5d..c58429b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1703,6 +1703,9 @@ static void __devinit quirk_msi_ht_cap(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE,
 			quirk_msi_ht_cap);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS,
+			PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB,
+			quirk_msi_ht_cap);
 
 /* The nVidia CK804 chipset may have 2 HT MSI mappings.
  * MSI are supported if the MSI capability set in any of these mappings.
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 3e34dc0..1bdf8be 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -1428,6 +1428,7 @@
 #define PCI_DEVICE_ID_SERVERWORKS_HE 0x0008
 #define PCI_DEVICE_ID_SERVERWORKS_LE 0x0009
 #define PCI_DEVICE_ID_SERVERWORKS_GCNB_LE 0x0017
+#define PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB   0x0036
 #define PCI_DEVICE_ID_SERVERWORKS_EPB0x0103
 #define PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE  0x0132
 #define PCI_DEVICE_ID_SERVERWORKS_OSB4   0x0200
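
For readers who do not have quirks.c open, the decision the changelog describes can be sketched in user space roughly as follows. This is a model under stated assumptions, not the body of quirk_msi_ht_cap(): struct toy_bridge and its fields are hypothetical, and the enable bit is assumed to be bit 0 of the HyperTransport MSI mapping flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_HT_MSI_FLAGS_ENABLE 0x1	/* assumed position of the enable bit */

struct toy_bridge {
	uint8_t ht_msi_flags;		/* flags from the HT MSI mapping capability */
	bool subordinate_no_msi;	/* set when MSI must not be used behind the bridge */
};

/* If the BIOS left the HT MSI mapping disabled, mark everything behind
 * the bridge as unable to use MSI; otherwise leave it alone. */
static void toy_quirk_msi_ht_cap(struct toy_bridge *br)
{
	if (!(br->ht_msi_flags & TOY_HT_MSI_FLAGS_ENABLE))
		br->subordinate_no_msi = true;
}
```

The point of keying on the capability bit rather than on the chipset ID alone is that boards whose BIOS does set the bit keep MSI.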

