date:20070920

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Eric Dumazet


Nagendra Tomar wrote:

 --- Davide Libenzi [EMAIL PROTECTED] wrote:

  On Wed, 19 Sep 2007, David Miller wrote:

   From: Nagendra Tomar [EMAIL PROTECTED]
   Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)

    With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call will
    not return, even when the incoming acks free the buffers.

    Note that this patch assumes that the SOCK_NOSPACE check in
    tcp_check_space is a trivial optimization which can be safely removed.

   I already replied to your patch posting explaining that whatever is
   not setting SOCK_NOSPACE should be fixed instead.

   Please address that, thanks.

  You're not planning on putting the notion of a SOCK_NOSPACE bit inside a
  completely device-unaware interface like epoll, I hope?

 Definitely not!

 The point is that the TCP write-space-available
 wakeup does not get called if the SOCK_NOSPACE bit is not set. This was
 fine when the wakeup was merely a wakeup (since the SOCK_NOSPACE bit
 indicated that someone really cared about the wakeup). Now, after the
 introduction of callback'ed wakeups, we might have some work to
 do inside the callback even if there is nobody interested in the wakeup
 at that point in time.

 In this particular case the ep_poll_callback is not getting called and
 hence the socket fd is not getting added to the ready list.



Does it mean that with your patch each ACK on an ET-managed socket will
trigger an epoll event?

Maybe your very sensitive high-throughput application needs to set a flag or
something at socket level to ask for such behavior.

The default should stay as is. That is, an event should be sent only if someone
cared about the wakeup.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ethtool: marvell register update

2007-09-20 Thread Jeff Garzik


Stephen Hemminger wrote:

Update the decode of sky2 registers.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


applied


Re: [PATCH] phy: export phy_mii_ioctl

2007-09-20 Thread Jeff Garzik


Domen Puncer wrote:

Export phy_mii_ioctl, so network drivers can use it when built
as modules too.

Signed-off-by: Domen Puncer [EMAIL PROTECTED]


applied



Re: [PATCH] pci: Fix e100 interrupt quirk

2007-09-20 Thread Andrew Morton

On Tue, 18 Sep 2007 15:17:37 +0400 Valentine Barshak [EMAIL PROTECTED] wrote:

 PCI memory space may have a 64-bit offset on some architectures
 (for example, PowerPC 440) and the actual PCI memory address
 has to be fixed up (an offset to PCI mem space should be added)
 before remapping. So, pci_iomap should be used instead of
 reading and remapping PCI BAR directly. This has been tested
 on Sequoia PowerPC 440EPx board.
 
 Signed-off-by: Valentine Barshak [EMAIL PROTECTED]
 ---
 
 --- linux-2.6.orig/drivers/pci/quirks.c   2007-09-04 21:15:43.0 +0400
 +++ linux-2.6.bld/drivers/pci/quirks.c    2007-09-05 20:46:14.0 +0400
 @@ -1444,9 +1444,9 @@
  static void __devinit quirk_e100_interrupt(struct pci_dev *dev)
  {
   u16 command;
 - u32 bar;
   u8 __iomem *csr;
   u8 cmd_hi;
 + int rc;
  
   switch (dev->device) {
   /* PCI IDs taken from drivers/net/e100.c */
 @@ -1476,16 +1476,17 @@
    * re-enable them when it's ready.
    */
   pci_read_config_word(dev, PCI_COMMAND, &command);
 - pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &bar);
  
 - if (!(command & PCI_COMMAND_MEMORY) || !bar)
 + rc = pci_request_region(dev, 0, "e100_quirk");
 +
 + if (!(command & PCI_COMMAND_MEMORY) || (rc < 0))
       return;

Really?  So if pci_request_region() failed and !(command & PCI_COMMAND_MEMORY),
we leak the region?  So the next call to this function will fail?


 - csr = ioremap(bar, 8);
 + csr = pci_iomap(dev, 0, 8);
   if (!csr) {
   printk(KERN_WARNING "PCI: Can't map %s e100 registers\n",
   pci_name(dev));
 - return;
 + goto e100_quirk_exit;
   }
  
   cmd_hi = readb(csr + 3);
 @@ -1495,7 +1496,9 @@
   writeb(1, csr + 3);
   }
  
 - iounmap(csr);
 + pci_iounmap(dev, csr);
 +e100_quirk_exit:
 + pci_release_region(dev, 0);
  }
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_e100_interrupt);
  

[PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6

2007-09-20 Thread Joe Perches

In the same vein as print_mac, the implementations
introduce declaration macros:
DECLARE_IP_BUF(var)
DECLARE_IPV6_BUF(var)
and functions:
print_ip
print_ipv6
print_ipv6_nofmt

IPV4 Use:

DECLARE_IP_BUF(ipbuf);
__be32 addr;
print_ip(ipbuf, addr);

IPV6 use:

DECLARE_IPV6_BUF(ipv6buf);
const struct in6_addr *addr;
print_ipv6(ipv6buf, addr);
and
print_ipv6_nofmt(ipv6buf, addr);

compiled x86, defconfig and allyesconfig
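
The buffer-declaration pattern described above can be illustrated with a
small user-space sketch. The macro and function bodies below are assumptions
written for illustration only (every name ending in _sketch, and the
make_addr_be helper, are hypothetical); the actual kernel implementation is
in the patches themselves and may well differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DECLARE_IP_BUF: "255.255.255.255" plus NUL. */
#define DECLARE_IP_BUF_SKETCH(var) char var[16]

/*
 * Hypothetical stand-in for print_ip(): format an IPv4 address held in
 * network byte order into buf and return buf, so the call can be used
 * directly as a printf-style argument. The value is read byte by byte,
 * which avoids any dependence on host endianness.
 */
static char *print_ip_sketch(char *buf, uint32_t addr_be)
{
	unsigned char b[4];

	memcpy(b, &addr_be, sizeof(b));
	snprintf(buf, 16, "%u.%u.%u.%u",
		 (unsigned)b[0], (unsigned)b[1],
		 (unsigned)b[2], (unsigned)b[3]);
	return buf;
}

/* Demo helper: build a network-byte-order address from four octets. */
static uint32_t make_addr_be(unsigned char a, unsigned char b,
			     unsigned char c, unsigned char d)
{
	unsigned char bytes[4] = { a, b, c, d };
	uint32_t addr;

	memcpy(&addr, bytes, sizeof(addr));
	return addr;
}
```

Returning the buffer lets a caller write something like
printf("%s\n", print_ip_sketch(ipbuf, addr)) without a separate statement,
which is presumably the convenience the print_mac-style helpers aim for.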


[PATCH - net-2.6.24 1/2] Introduce and use print_ip

2007-09-20 Thread Joe Perches

This removes the uses of NIPQUAD and HIPQUAD in
drivers/net and net

IPV4 Use:

DECLARE_IP_BUF(ipbuf);
__be32 addr;
print_ip(ipbuf, addr);

Signed-off-by:  Joe Perches [EMAIL PROTECTED]

please pull from:
git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4

stats for print_ipv4:

--

 drivers/net/bonding/bond_main.c|   35 +++-
 drivers/net/bonding/bond_sysfs.c   |   31 ++-
 include/linux/ip.h |8 +++
 include/net/ip_vs.h|   36 +++--
 include/net/sctp/sctp.h|5 +-
 net/atm/clip.c |5 +-
 net/atm/mpc.c  |   28 ++
 net/atm/mpoa_caches.c  |   20 +---
 net/bridge/netfilter/ebt_log.c |   21 
 net/core/netpoll.c |   15 +++--
 net/core/utils.c   |   14 +
 net/dccp/ipv4.c|   10 ++--
 net/dccp/probe.c   |   14 +++--
 net/ipv4/af_inet.c |8 ++-
 net/ipv4/arp.c |9 ++--
 net/ipv4/fib_trie.c|7 ++-
 net/ipv4/icmp.c|   26 +
 net/ipv4/ip_fragment.c |5 +-
 net/ipv4/ip_input.c|8 ++-
 net/ipv4/ipcomp.c  |5 +-
 net/ipv4/ipconfig.c|   46 +---
 net/ipv4/ipvs/ip_vs_conn.c |   63 ++
 net/ipv4/ipvs/ip_vs_core.c |   51 +++---
 net/ipv4/ipvs/ip_vs_ctl.c  |   35 +++-
 net/ipv4/ipvs/ip_vs_dh.c   |   10 ++--
 net/ipv4/ipvs/ip_vs_ftp.c  |   19 ---
 net/ipv4/ipvs/ip_vs_lblc.c |   14 +++--
 net/ipv4/ipvs/ip_vs_lblcr.c|   34 +++-
 net/ipv4/ipvs/ip_vs_lc.c   |5 +-
 net/ipv4/ipvs/ip_vs_nq.c   |5 +-
 net/ipv4/ipvs/ip_vs_proto.c|   20 ---
 net/ipv4/ipvs/ip_vs_proto_ah.c |   24 +---
 net/ipv4/ipvs/ip_vs_proto_esp.c|   24 +---
 net/ipv4/ipvs/ip_vs_proto_tcp.c|   20 ---
 net/ipv4/ipvs/ip_vs_proto_udp.c|   10 ++--
 net/ipv4/ipvs/ip_vs_rr.c   |5 +-
 net/ipv4/ipvs/ip_vs_sed.c  |5 +-
 net/ipv4/ipvs/ip_vs_sh.c   |   10 ++--
 net/ipv4/ipvs/ip_vs_sync.c |5 +-
 net/ipv4/ipvs/ip_vs_wlc.c  |5 +-
 net/ipv4/ipvs/ip_vs_wrr.c  |5 +-
 net/ipv4/ipvs/ip_vs_xmit.c |   16 +++---
 net/ipv4/netfilter/arp_tables.c|   20 ---
 net/ipv4/netfilter/ip_tables.c |   19 ---
 net/ipv4/netfilter/ipt_CLUSTERIP.c |   16 +++---
 net/ipv4/netfilter/ipt_LOG.c   |   10 ++--
 net/ipv4/netfilter/ipt_SAME.c  |   21 +---
 net/ipv4/netfilter/ipt_iprange.c   |   21 
 net/ipv4/netfilter/ipt_recent.c|5 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   20 ---
 net/ipv4/netfilter/nf_nat_ftp.c|3 +-
 net/ipv4/netfilter/nf_nat_h323.c   |   68 ++-
 net/ipv4/netfilter/nf_nat_irc.c|5 +-
 net/ipv4/netfilter/nf_nat_rule.c   |   16 --
 net/ipv4/netfilter/nf_nat_sip.c|   12 +++--
 net/ipv4/netfilter/nf_nat_snmp_basic.c |   13 +++--
 net/ipv4/route.c   |   58 +
 net/ipv4/tcp_input.c   |5 +-
 net/ipv4/tcp_ipv4.c|   25 +
 net/ipv4/tcp_probe.c   |8 ++-
 net/ipv4/tcp_timer.c   |5 +-
 net/ipv4/udp.c |   14 +++--
 net/ipv6/netfilter/ip6t_LOG.c  |9 ++-
 net/netfilter/nf_conntrack_ftp.c   |   10 ++--
 net/netfilter/nf_conntrack_irc.c   |   18 ---
 net/netfilter/xt_hashlimit.c   |   10 ++--
 net/rxrpc/af_rxrpc.c   |7 ++-
 net/rxrpc/ar-error.c   |5 +-
 net/rxrpc/ar-local.c   |   19 ---
 net/rxrpc/ar-peer.c|9 ++--
 net/rxrpc/ar-proc.c|   23 +---
 net/rxrpc/ar-transport.c   |   17 --
 net/rxrpc/rxkad.c  |4 +-
 net/sctp/protocol.c|   26 ++---
 net/sctp/sm_statefuns.c|6 +-

[git patches] net driver updates

2007-09-20 Thread Jeff Garzik


[this, sans patch which was too big for netdev, was just sent upstream.
 the patch can be recreated via 'git diff net-2.6.24..upstream']



NOTE that sky2 will also be going upstream for 2.6.23-rc, as just posted
on netdev.

Please pull from the 'upstream' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream

to receive the following changes:

Al Viro (20):
  8139cp: trivial endianness annotations
  endianness annotations drivers/net/bonding/
  fix vlan in 8139cp on big-endian
  3c59x: trivial endianness annotations, NULL noise removal
  amd8111e: trivial endianness annotations, NULL noise removal
  amd8111e big-endian fix
  arcnet endianness annotations
  tulip: endianness annotations
  typhoon: trivial endianness annotations
  pcnet32: endianness
  ixgb: endianness
  drivers/net/irda: endianness, NULL noise
  starfire: trivial endianness annotations
  r8169: endianness
  via-rhine: endianness
  pppoe: endianness
  tms380tr: trivial endianness annotations
  drivers/net/appletalk: endianness
  3c509: endianness
  cxgb3: trivial endianness annotations

Alex Landau (1):
  Blackfin EMAC driver: add function to change the MAC address

Bryan Wu (3):
  Blackfin EMAC driver: add power management interface and change the bf537mac_reset to bf537mac_disable
  Blackfin EMAC driver: Add phy abstraction layer supporting in bfin_emac driver
  Blackfin EMAC driver: add a select for the PHYLIB of this driver

David Gibson (1):
  Device tree aware EMAC driver

Dhananjay Phadke (1):
  netxen: ethtool fixes

Jeff Garzik (1):
  [netdrvr] Stop using legacy hooks ->self_test_count, ->get_stats_count

Maciej W. Rozycki (3):
  sb1250-mac.c: Fix stats references
  NET_SB1250_MAC: Update Kconfig entry
  NET_SB1250_MAC: Rename to SB1250_MAC

Sivakumar Subramani (4):
  S2io: Change kmalloc+memset to k[zc]alloc
  S2io: Removed unused feature - bimodal interrupts
  S2io: Added support set_mac_address driver entry point
  S2io: Updating transceiver information in ethtool function

Stephen Hemminger (3):
  sky2: fix VLAN receive processing (resend)
  sky2: ethtool speed report bug
  sky2: version 1.18

Ursula Braun (1):
  s390 networking MAINTAINERS

Vitaly Bordug (2):
  FS_ENET: TX stuff should use fep->tx_lock, instead of fep->lock.
  FS_ENET: Add polling support

 Documentation/powerpc/booting-without-of.txt |  156 +
 MAINTAINERS  |   12 
 arch/mips/configs/bigsur_defconfig   |2 
 arch/mips/configs/sb1250-swarm_defconfig |2 
 arch/powerpc/platforms/44x/Kconfig   |3 
 arch/powerpc/platforms/cell/Kconfig  |4 
 drivers/net/3c509.c  |4 
 drivers/net/3c59x.c  |   39 
 drivers/net/8139cp.c |   59 
 drivers/net/8139too.c|   11 
 drivers/net/Kconfig  |   89 
 drivers/net/Makefile |3 
 drivers/net/amd8111e.c   |9 
 drivers/net/amd8111e.h   |   24 
 drivers/net/appletalk/ipddp.c|2 
 drivers/net/appletalk/ipddp.h|2 
 drivers/net/arcnet/rfc1051.c |4 
 drivers/net/arcnet/rfc1201.c |6 
 drivers/net/atl1/atl1_ethtool.c  |   11 
 drivers/net/b44.c|   11 
 drivers/net/bfin_mac.c   |  347 ++-
 drivers/net/bfin_mac.h   |   53 
 drivers/net/bnx2.c   |   20 
 drivers/net/bonding/bond_3ad.c   |   42 
 drivers/net/bonding/bond_3ad.h   |   20 
 drivers/net/bonding/bond_alb.c   |   19 
 drivers/net/bonding/bond_alb.h   |4 
 drivers/net/bonding/bond_main.c  |   22 
 drivers/net/bonding/bond_sysfs.c |8 
 drivers/net/bonding/bonding.h|6 
 drivers/net/cassini.c|   11 
 drivers/net/chelsio/cxgb2.c  |   11 
 drivers/net/cxgb3/common.h   |4 
 drivers/net/cxgb3/cxgb3_main.c   |   11 
 drivers/net/cxgb3/sge.c  |6 
 drivers/net/e100.c   |   19 
 drivers/net/e1000/e1000_ethtool.c|   22 
 drivers/net/e1000e/ethtool.c |   21 
 drivers/net/ehea/ehea_ethtool.c  |   13 
 drivers/net/forcedeth.c  |   45 
 drivers/net/fs_enet/fs_enet-main.c   |   77 
 drivers/net/fs_enet/mac-fcc.c|   12 
 drivers/net/fs_enet/mac-fec.c|   30 
 drivers/net/fs_enet/mac-scc.c|   20 
 drivers/net/fs_enet/mii-bitbang.c|   10 
 drivers/net/gianfar_ethtool.c|   20 
 drivers/net/ibm_emac/Kconfig

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Nagendra Tomar

--- Eric Dumazet [EMAIL PROTECTED] wrote:

 Nagendra Tomar wrote:
  --- Davide Libenzi [EMAIL PROTECTED] wrote:

  On Wed, 19 Sep 2007, David Miller wrote:

  From: Nagendra Tomar [EMAIL PROTECTED]
  Date: Wed, 19 Sep 2007 15:37:09 -0700 (PDT)

  With the SOCK_NOSPACE check in tcp_check_space(), this epoll_wait call will
  not return, even when the incoming acks free the buffers.
  Note that this patch assumes that the SOCK_NOSPACE check in
  tcp_check_space is a trivial optimization which can be safely removed.
  I already replied to your patch posting explaining that whatever is
  not setting SOCK_NOSPACE should be fixed instead.

  Please address that, thanks.
  You're not planning of putting the notion of a SOCK_NOSPACE bit inside a 
  completely device-unaware interface like epoll, I hope?

  Definitely not ! 

  The point is that the tcp write space available 
  wakeup does not get called if SOCK_NOSPACE bit is not set. This was
  fine when the wakeup was merely a wakeup (since SOCK_NOSPACE bit 
  indicated that someone really cared about the wakeup). Now after the
  introduction of callback'ed wakeups, we might have some work to
  do inside the callback even if there is nobody interested in the wakeup
  at that point of time. 

  In this particular case the ep_poll_callback is not getting called and
  hence the socket fd is not getting added to the ready list.

 Does it mean that with your patch each ACK on an ET-managed socket will
 trigger an epoll event?

 Maybe your very sensitive high-throughput application needs to set a flag or
 something at socket level to ask for such behavior.

 The default should stay as is. That is, an event should be sent only if
 someone cared about the wakeup.

A high-throughput app will always care about the wakeup, or else it will
not be a high-throughput app in the first place. An application that
occasionally writes and then goes to slumber and then writes again will
not be a high-throughput app.

My point is that the SOCK_NOSPACE check does not save us much. For a
high-throughput app it will almost always be set, thus making the
check insignificant, and for the low-throughput case we care less.

Thanx,
Tomar
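
The disagreement in this thread can be made concrete with a toy user-space
model. Everything below is a simplified assumption for illustration: the
toy_* types and functions are hypothetical and are not the kernel's
tcp_check_space() or ep_poll_callback(). The sketch only shows how a flag
check in front of a callback can skip work that the callback would have
done, and that dropping the check makes every call run the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; none of these fields are the kernel's real definitions. */
struct toy_sock {
	bool nospace;        /* models the SOCK_NOSPACE bit */
	bool on_ready_list;  /* models membership in epoll's ready list */
	int callbacks_run;   /* how many times the wakeup callback ran */
};

/* Models a callback-style wakeup: it does real work (queues the fd). */
static void toy_poll_callback(struct toy_sock *sk)
{
	sk->on_ready_list = true;
	sk->callbacks_run++;
}

/*
 * Models the write-space path taken when an ACK frees buffer space.
 * With check_nospace set, the callback is skipped unless a writer
 * previously hit a full buffer and set the flag.
 */
static void toy_check_space(struct toy_sock *sk, bool check_nospace)
{
	if (check_nospace && !sk->nospace)
		return;
	sk->nospace = false;
	toy_poll_callback(sk);
}
```

In this model a socket whose flag is clear never reaches the ready list
while the check is in place (the behaviour the patch description complains
about), and with the check removed each call runs the callback (the cost
raised in the reply quoted above).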


Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-09-20 Thread Urs Thuermann

Hi Patrick,

I have made almost all of the changes to the code that you suggested.  The
changes to use the return value of can_rx_register() also fixed a
minor flaw with failing bind() and setsockopt() on raw sockets.

But there are two things left I would like to ask/understand:

Patrick McHardy [EMAIL PROTECTED] writes:

  When the module is unloaded it calls can_proto_unregister() which
  clears the pointer.  Do you see a race condition here?
 
 Yes, you do request_module, load the module, get the cp pointer
 from proto_tab, the module is unloaded again. cp points to
 stale memory. Using module references would fix this.

How would I use the module reference counter?  Somehow with
try_module_get()?  I have thought something like

    cp = proto_tab[protocol];
    if (!cp ...)
        return ...;

    if (!try_module_get(cp->prot->owner))
        return ...;

    sk = sk_alloc(...)

    module_put(...);
    return ret;

But here I see two problems:

1. Between the check !cp...  and referencing cp->prot->owner the
   module could get unloaded and the reference be invalid.  Is there
   some lock I can hold that prevents module unloading?  I haven't
   found something like this in include/linux/module.h

2. If the module gets unloaded after the first check and
   request_module() but before the call to try_module_get() the
   socket() syscall will return with error, although module auto
   loading would normally be successful.  How can I prevent that?

  find_dev_rcv_lists() is called in one place from can_rcv() with RCU
  lock held, as you write.  The other two calls to find_dev_rcv_lists()
  are from can_rx_register/unregister() functions which change the
  receive lists.  Therefore, we can't only use RCU but need protection
  against simultaneous writes.  We do this with the spin_lock_bh().  The
  _bh variant, because can_rcv() runs in interrupt and we need to block
  that.  I thought this is pretty standard.
  
  I'll check this again tomorrow, but I have put much time in these
  locking issues already, changed it quite a few times and hoped to have
  got it right finally.
 
 
 I'm not saying you should use *only* RCU, you need the lock
 for additions/removal of course, but since the receive path
 doesn't take that lock and relies on RCU, you need to use
 the _rcu list walking variant to avoid races with concurrent
 list changes.

I have no objections to add the _rcu suffix for the code changing the
receive lists, but I don't see why it's necessary.  When I do a
spin_lock_bh() before writing, can't I be sure that there is no
interrupt routine running in parallel while I hold this spinlock?  If
so, there is no reader in parallel because the can_rcv() function runs
in a softirq.  I'd really like to understand why you think the writers
should also use the _rcu variant.  I'm sorry if I miss something
obvious here, but could you try to explain it to me?

urs

Re: [2.6.23-rc4-mm1][Bug] kernel BUG at include/linux/netdevice.h:339!

2007-09-20 Thread Kamalesh Babulal

Andrew Morton wrote:
 On Mon, 17 Sep 2007 17:46:38 +0530
 Kamalesh Babulal [EMAIL PROTECTED] wrote:

   
 Kernel Bug is hit with 2.6.23-rc4-mm1 kernel on ppc64 machine.

 kernel BUG at include/linux/netdevice.h:339!
 

 (please cc netdev@vger.kernel.org on networking-related matters)

 You died here:

 static inline void napi_complete(struct napi_struct *n)
 {
     BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));

 The NAPI changes have had a few problems and hopefully things have
 been fixed up since then.  I'll try to get rc6-mm1 out this evening,
 so please retest that?
   
Hi Andrew,

So far I don't see this bug in 2.6.23-rc6-mm1.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-09-20 Thread Patrick McHardy

Urs Thuermann wrote:
 Patrick McHardy [EMAIL PROTECTED] writes:
 
When the module is unloaded it calls can_proto_unregister() which
clears the pointer.  Do you see a race condition here?

Yes, you do request_module, load the module, get the cp pointer
from proto_tab, the module is unloaded again. cp points to
stale memory. Using module references would fix this.
 
 
 How would I use the module reference counter?  Somehow with
 try_module_get()?  I have thought something like
 
 cp = proto_tab[protocol];
 if (!cp ...)
 return ...;
 
 if (!try_module_get(cp->prot->owner))
 return ...;
 
 sk = sk_alloc(...)
 
 module_put(...);
 return ret;
 
 But here I see two problems:
 
 1. Between the check !cp...  and referencing cp->prot->owner the
module could get unloaded and the reference be invalid.  Is there
some lock I can hold that prevents module unloading?  I haven't
found something like this in include/linux/module.h


No, you need to add your own locking to prevent this, something
like this:

registration/unregistration:

    take lock
    change proto_tab[]
    release lock

lookup:

    take lock
    cp = proto_tab[]
    if (cp && !try_module_get(cp->owner))
        cp = NULL
    release lock
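
The locking outline above can be sketched as a runnable user-space model.
This is only an illustration under stated assumptions: the toy_* names are
hypothetical, a pthread mutex stands in for whatever kernel lock would be
chosen, and toy_module_get() merely imitates the "fails once unloading has
started" behaviour of try_module_get().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPROTO 8

/* Toy stand-ins; not the kernel's struct module or the CAN proto struct. */
struct toy_module {
	bool live;      /* false once unloading has begun */
	int refcount;
};

struct toy_proto {
	struct toy_module *owner;
};

static struct toy_proto *toy_proto_tab[TOY_NPROTO];
static pthread_mutex_t toy_proto_lock = PTHREAD_MUTEX_INITIALIZER;

/* Imitates try_module_get(); caller holds toy_proto_lock. */
static bool toy_module_get(struct toy_module *m)
{
	if (!m->live)
		return false;
	m->refcount++;
	return true;
}

/* Drop a reference taken by a successful lookup. */
static void toy_module_put(struct toy_module *m)
{
	pthread_mutex_lock(&toy_proto_lock);
	m->refcount--;
	pthread_mutex_unlock(&toy_proto_lock);
}

/* Registration and unregistration change the table only under the lock. */
static void toy_proto_set(int idx, struct toy_proto *cp)
{
	pthread_mutex_lock(&toy_proto_lock);
	toy_proto_tab[idx] = cp;
	pthread_mutex_unlock(&toy_proto_lock);
}

/*
 * Lookup: read the slot and take the module reference while still
 * holding the lock, so the entry cannot be unregistered in between.
 * Returns NULL if the slot is empty or no reference could be taken.
 */
static struct toy_proto *toy_proto_get(int idx)
{
	struct toy_proto *cp;

	pthread_mutex_lock(&toy_proto_lock);
	cp = toy_proto_tab[idx];
	if (cp && !toy_module_get(cp->owner))
		cp = NULL;
	pthread_mutex_unlock(&toy_proto_lock);
	return cp;
}
```

The point of the design is that the pointer read and the reference grab
happen inside one critical section; a caller that gets a non-NULL result
already owns a reference and drops it with toy_module_put() when done.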

 2. If the module gets unloaded after the first check and
request_module() but before the call to try_module_get() the
socket() syscall will return with error, although module auto
loading would normally be successful.  How can I prevent that?


Why do you want to prevent it? The admin unloaded the module,
so he apparently doesn't want the operation to succeed.

find_dev_rcv_lists() is called in one place from can_rcv() with RCU
lock held, as you write.  The other two calls to find_dev_rcv_lists()
are from can_rx_register/unregister() functions which change the
receive lists.  Therefore, we can't only use RCU but need protection
against simultaneous writes.  We do this with the spin_lock_bh().  The
_bh variant, because can_rcv() runs in interrupt and we need to block
that.  I thought this is pretty standard.

I'll check this again tomorrow, but I have put much time in these
locking issues already, changed it quite a few times and hoped to have
got it right finally.


I'm not saying you should use *only* RCU, you need the lock
for additions/removal of course, but since the receive path
doesn't take that lock and relies on RCU, you need to use
the _rcu list walking variant to avoid races with concurrent
list changes.
 
 
 I have no objections to add the _rcu suffix for the code changing the
 receive lists, but I don't see why it's necessary.  When I do a
 spin_lock_bh() before writing, can't I be sure that there is no
 interrupt routine running in parallel while I hold this spinlock?  If
 so, there is no reader in parallel because the can_rcv() function runs
 in a softirq.  I'd really like to understand why you think the writers
 should also use the _rcu variant. 


I'm saying you need _rcu for the *read side*. All operations changing
the list already use the _rcu variants.

 I'm sorry if I miss something
 obvious here, but could you try to explain it to me?


spin_lock_bh only disables BHs locally, other CPUs can still process
softirqs. And since rcv_lists_lock is only used in process context,
the BH disabling is actually not even necessary.
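
Why the read side needs its own discipline even when writers hold a lock can
be shown with a user-space analogue. The sketch below is an assumption-laden
model, not kernel code: it uses C11 atomics in place of rcu_assign_pointer()
and the _rcu list walkers, the toy_* names are hypothetical, and it leaves
out removal and grace periods entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy receiver list; not the CAN core's real data structure. */
struct toy_receiver {
	int can_id;
	struct toy_receiver *_Atomic next;
};

static struct toy_receiver *_Atomic toy_head;

/*
 * Writer side. Callers are assumed to serialize among themselves with
 * their own lock. The node is fully initialized before the release
 * store publishes it, so a concurrent reader never sees a half-built
 * node.
 */
static void toy_add(struct toy_receiver *r, int can_id)
{
	r->can_id = can_id;
	atomic_store_explicit(&r->next,
			      atomic_load_explicit(&toy_head,
						   memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&toy_head, r, memory_order_release);
}

/*
 * Reader side: takes no lock, because it may run on another CPU while
 * a writer is active. Each pointer is loaded with acquire ordering so
 * the node contents are visible before they are inspected.
 */
static int toy_count_matches(int can_id)
{
	int n = 0;
	struct toy_receiver *r;

	for (r = atomic_load_explicit(&toy_head, memory_order_acquire); r;
	     r = atomic_load_explicit(&r->next, memory_order_acquire))
		if (r->can_id == can_id)
			n++;
	return n;
}
```

The writers' lock only orders writers against each other; the ordered loads
in the reader are what make a lockless traversal safe against a concurrent
insert, which is the job the _rcu walking variants do in the kernel.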


Re: [PATCH] pci: Fix e100 interrupt quirk

2007-09-20 Thread Valentine Barshak


Andrew Morton wrote:

 On Tue, 18 Sep 2007 15:17:37 +0400 Valentine Barshak [EMAIL PROTECTED] wrote:

  PCI memory space may have a 64-bit offset on some architectures
  (for example, PowerPC 440) and the actual PCI memory address
  has to be fixed up (an offset to PCI mem space should be added)
  before remapping. So, pci_iomap should be used instead of
  reading and remapping PCI BAR directly. This has been tested
  on Sequoia PowerPC 440EPx board.

  Signed-off-by: Valentine Barshak [EMAIL PROTECTED]
  ---

  --- linux-2.6.orig/drivers/pci/quirks.c 2007-09-04 21:15:43.0 +0400
  +++ linux-2.6.bld/drivers/pci/quirks.c  2007-09-05 20:46:14.0 +0400
  @@ -1444,9 +1444,9 @@
   static void __devinit quirk_e100_interrupt(struct pci_dev *dev)
   {
      u16 command;
  -   u32 bar;
      u8 __iomem *csr;
      u8 cmd_hi;
  +   int rc;

      switch (dev->device) {
      /* PCI IDs taken from drivers/net/e100.c */
  @@ -1476,16 +1476,17 @@
       * re-enable them when it's ready.
       */
      pci_read_config_word(dev, PCI_COMMAND, &command);
  -   pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &bar);

  -   if (!(command & PCI_COMMAND_MEMORY) || !bar)
  +   rc = pci_request_region(dev, 0, "e100_quirk");
  +
  +   if (!(command & PCI_COMMAND_MEMORY) || (rc < 0))
          return;

 Really?  So if pci_request_region() failed and !(command & PCI_COMMAND_MEMORY),
 we leak the region?  So the next call to this function will fail?



I've split command and request region checks and submitted new patch:
http://lkml.org/lkml/2007/9/19/106
Please, take a look,
Thanks,
Valentine.
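
One way to separate the two checks so that no exit path leaks the region is
sketched below as a user-space model. It is an assumption about the general
shape only, not the content of the linked patch: the toy_* names are
hypothetical and the functions merely imitate the success/failure behaviour
of pci_request_region() and pci_release_region().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device; the fields only model what the quirk needs. */
struct toy_dev {
	bool memory_enabled;   /* models command & PCI_COMMAND_MEMORY */
	bool region_busy;      /* models BAR 0 being claimed */
	bool map_fails;        /* forces the mapping step to fail */
	bool quirk_applied;
};

/* Imitates pci_request_region(): 0 on success, negative if claimed. */
static int toy_request_region(struct toy_dev *d)
{
	if (d->region_busy)
		return -1;
	d->region_busy = true;
	return 0;
}

static void toy_release_region(struct toy_dev *d)
{
	d->region_busy = false;
}

/*
 * The command check comes first and acquires nothing, so returning
 * early cannot leak. Once the region is held, every exit path goes
 * through the release label.
 */
static void toy_quirk(struct toy_dev *d)
{
	if (!d->memory_enabled)
		return;
	if (toy_request_region(d) < 0)
		return;
	if (d->map_fails)
		goto out_release;
	d->quirk_applied = true;   /* stands in for the register access */
out_release:
	toy_release_region(d);
}
```

Ordering the checks this way means a device with memory decoding disabled
never touches the region at all, so a later call still finds it free.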




  -   csr = ioremap(bar, 8);
  +   csr = pci_iomap(dev, 0, 8);
      if (!csr) {
          printk(KERN_WARNING "PCI: Can't map %s e100 registers\n",
              pci_name(dev));
  -       return;
  +       goto e100_quirk_exit;
      }

      cmd_hi = readb(csr + 3);
  @@ -1495,7 +1496,9 @@
          writeb(1, csr + 3);
      }

  -   iounmap(csr);
  +   pci_iounmap(dev, csr);
  +e100_quirk_exit:
  +   pci_release_region(dev, 0);
   }
   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_e100_interrupt);
 

Re: [PATCH 2/7] CAN: Add PF_CAN core module

2007-09-20 Thread Urs Thuermann

Patrick McHardy [EMAIL PROTECTED] writes:

 No, you need to add your own locking to prevent this, something
 like this:
 
 registration/unregistration:
 
     take lock
     change proto_tab[]
     release lock
 
 lookup:
 
     take lock
     cp = proto_tab[]
     if (cp && !try_module_get(cp->owner))
         cp = NULL
     release lock

Ah, ok.  Thanks for that hint.  I will add it that way.

  2. If the module gets unloaded after the first check and
 request_module() but before the call to try_module_get() the
 socket() syscall will return with error, although module auto
 loading would normally be successful.  How can I prevent that?
 
 
 Why do you want to prevent it? The admin unloaded the module,
 so he apparently doesn't want the operation to succeed.

Well, unloading a module doesn't usually cause the operation to fail
when auto loading is enabled.  It only wouldn't succeed when the
unload happens in the small window between test/request-module and
call to try_module_get().  This looks ugly to me.  But the lock you
described above would also solve this.

 I'm saying you need _rcu for the *read side*. All operations changing
 the list already use the _rcu variants.
 
  I'm sorry if I miss something
  obvious here, but could you try to explain it to me?
 
 
 spin_lock_bh only disables BHs locally, other CPUs can still process
 softirqs. And since rcv_lists_lock is only used in process context,
 the BH disabling is actually not even necessary.

Well, I finally (hopefully) got it and I have changed the code
accordingly.  Thanks for your explanation.

I will post our updated code again, probably today.  The issues still
left are

* module parameter for loopback, but we want to keep that.
* configure option for allowing normal users access to raw and bcm CAN
  sockets.  I'll check how easily an (embedded) system can be set up
  to run relevant/all processes with the CAP_NET_RAW capability.  I
  would like to kill that configure option.
* seq_files for proc fs.  On my TODO list.

urs

Re: [LARTC] ifb and ppp

2007-09-20 Thread Patrick McHardy

Please keep netdev and myself CCed.

Frithjof Hammer wrote:
Does this patch help?
 
 
 A further examination:
 [...]
 printk ("fri: mein type %x\n", dev->type);
 switch (dev->type) {
 
 [...]
 shows this:
 
 [EMAIL PROTECTED]:/usr/src/linux-source-2.6.21# dmesg | grep fri
 fri: mein type 1
 
 that is defined as ARPHRD_ETHER in include/linux/if_arp.h.
 
 As far as I understand, this means that my ppp0 device is recognized as an
 Ethernet interface.
 
 Any further help/ideas?


I misread the code, the device it looks at in tcf_mirred_init is
the target device (ifb). So what it does is check whether the
target device wants a link layer header and if it does restores
the one from the source device. So currently it seems impossible
to get rid of the PPP(oE) header.

Jamal, is that how it's supposed to work?

[PATCH net-2.6.24 0/9]: TCP improvements & cleanups

2007-09-20 Thread Ilpo Järvinen

Hi Dave,

Just in case you're short on what to do ;-) here are some TCP
related cleanups & improvements to net-2.6.24. Including FRTO
undo fix which finally should allow FRTO to be turned on, and
some simple fastpath tweaks simple enough for the 2.6.24
schedule. ...I've a larger fastpath_hint removal patch coming
up later too but it's really a monster which needs more time
though I guess it could really cut down the SACK processing
latencies people are experiencing with high-speed flows (I'll
probably post it with RFC once you've picked these up).

These were boot (& couple of hours) tested on the top of
net-2.6.24 (something after the first large rebase you did,
so you could count that as success report of it too :-)).
Not sure if all those fragment/collapse paths I modified got
executed though.

--
 i.



[PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb

2007-09-20 Thread Ilpo Järvinen

In general, it should not be necessary to call tcp_fragment for
already SACKed skbs, but it's better to be safe than sorry. And
indeed, it can be called from sacktag when a DSACK arrives or
some ACK (with SACK) reordering occurs (sacktag could be made
to avoid the call in the latter case though I'm not sure if it's
worth the trouble and added complexity to cover such a marginal
case).

The collapse case has return for SACKED_ACKED case earlier, so
just WARN_ON if internal inconsistency is detected for some
reason.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d65d17b..9df5b2a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -692,6 +692,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
    TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;
    TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(buff)->seq;
 
+   if (tp->sacked_out && (TCP_SKB_CB(skb)->seq == tp->highest_sack))
+       tp->highest_sack = TCP_SKB_CB(buff)->seq;
+
    /* PSH and FIN should only be set in the second packet. */
    flags = TCP_SKB_CB(skb)->flags;
    TCP_SKB_CB(skb)->flags = flags & ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH);
@@ -1723,6 +1726,10 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
    /* Update sequence range on original skb. */
    TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
+   if (WARN_ON(tp->sacked_out &&
+       (TCP_SKB_CB(next_skb)->seq == tp->highest_sack)))
+       return;
+
    /* Merge over control information. */
    flags |= TCP_SKB_CB(next_skb)->flags; /* This moves PSH/FIN etc. over */
    TCP_SKB_CB(skb)->flags = flags;
-- 
1.5.0.6


[PATCH 2/9] [TCP]: Make fackets_out accurate

2007-09-20 Thread Ilpo Järvinen

Subtraction from fackets_out is unconditional when snd_una
advances, so there is no need to do it inside the loop. Just
make sure the correct bounds are honored.
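
As a rough userspace sketch of the two ideas involved (a bounded subtraction, and a wraparound-safe sequence comparison deciding whether a segment is counted in fackets_out): the helper names below are made up for illustration, and seq_before() only imitates how the kernel's before() is commonly written.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is before seq2" for 32-bit sequence numbers. */
static int seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Subtract acked from *fackets_out without letting it go below zero. */
static void fackets_sub_clamped(uint32_t *fackets_out, uint32_t acked)
{
    uint32_t dec = acked < *fackets_out ? acked : *fackets_out;

    assert(dec <= *fackets_out);
    *fackets_out -= dec;
}

/* A segment counts toward fackets_out only if something has been SACKed
 * and the segment does not start above highest_sack. */
static int counted_in_fackets(uint32_t sacked_out, uint32_t highest_sack,
                              uint32_t skb_seq)
{
    return sacked_out && !seq_before(highest_sack, skb_seq);
}
```

The clamp corresponds to the min() in tcp_clean_rtx_queue(), and the last helper to the first two tests in tcp_adjust_fackets_out().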

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c  |   10 +++---
 net/ipv4/tcp_output.c |   44 ++--
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fd0ae4d..09b6b1d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2302,8 +2302,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 * 1. Reno does not count dupacks (sacked_out) automatically. */
if (!tp->packets_out)
tp->sacked_out = 0;
-   /* 2. SACK counts snd_fack in packets inaccurately. */
-   if (tp->sacked_out == 0)
+
+   if (WARN_ON(!tp->sacked_out && tp->fackets_out))
tp->fackets_out = 0;
 
/* Now state machine starts.
@@ -2571,10 +2571,6 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
} else if (*seq_rtt < 0)
*seq_rtt = now - scb->when;
 
-   if (tp->fackets_out) {
-   __u32 dval = min(tp->fackets_out, packets_acked);
-   tp->fackets_out -= dval;
-   }
tp->packets_out -= packets_acked;
 
BUG_ON(tcp_skb_pcount(skb) == 0);
@@ -2657,7 +2653,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
seq_rtt = now - scb->when;
last_ackt = skb->tstamp;
}
-   tcp_dec_pcount_approx(&tp->fackets_out, skb);
tp->packets_out -= tcp_skb_pcount(skb);
tcp_unlink_write_queue(skb, sk);
sk_stream_free_skb(sk, skb);
@@ -2672,6 +2667,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
tcp_ack_update_rtt(sk, acked, seq_rtt);
tcp_rearm_rto(sk);
 
+   tp->fackets_out -= min(pkts_acked, tp->fackets_out);
if (tcp_is_reno(tp))
tcp_remove_reno_sacks(sk, pkts_acked);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9df5b2a..cbe8bf6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -652,6 +652,26 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb, unsigned
}
 }
 
+/* When a modification to fackets out becomes necessary, we need to check
+ * skb is counted to fackets_out or not. Another important thing is to
+ * tweak SACK fastpath hint too as it would overwrite all changes unless
+ * hint is also changed.
+ */
+static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+  int decr)
+{
+   if (!tp->sacked_out)
+   return;
+
+   if (!before(tp->highest_sack, TCP_SKB_CB(skb)->seq))
+   tp->fackets_out -= decr;
+
+   /* cnt_hint is off-by-one compared with fackets_out (see sacktag) */
+   if (tp->fastpath_skb_hint != NULL &&
+   after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq, TCP_SKB_CB(skb)->seq))
+   tp->fastpath_cnt_hint -= decr;
+}
+
 /* Function to create two new TCP segments.  Shrinks the given segment
  * to the specified size and appends a new segment with the rest of the
  * packet to the list.  This won't be called frequently, I hope.
@@ -746,21 +766,12 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
tp->lost_out -= diff;
 
-   if (diff > 0) {
-   /* Adjust Reno SACK estimate. */
-   if (tcp_is_reno(tp)) {
-   tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-   tcp_verify_left_out(tp);
-   }
-
-   tcp_dec_pcount_approx_int(&tp->fackets_out, diff);
-   /* SACK fastpath might overwrite it unless dealt with */
-   if (tp->fastpath_skb_hint != NULL &&
-   after(TCP_SKB_CB(tp->fastpath_skb_hint)->seq,
- TCP_SKB_CB(skb)->seq)) {
-   tcp_dec_pcount_approx_int(&tp->fastpath_cnt_hint, diff);
-   }
+   /* Adjust Reno SACK estimate. */
+   if (tcp_is_reno(tp) && diff > 0) {
+   tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
+   tcp_verify_left_out(tp);
}
+   tcp_adjust_fackets_out(tp, skb, diff);
}
 
/* Link BUFF into the send queue. */
@@ -1746,10 +1757,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
if (tcp_is_reno(tp) && tp->sacked_out)
tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
 
-

[PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_

2007-09-20 Thread Ilpo Järvinen

In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h |4 ++--
 net/ipv4/tcp_input.c  |   10 +-
 net/ipv4/tcp_output.c |6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f28f382..16dfe3c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1066,8 +1066,8 @@ static inline void tcp_mib_init(void)
TCP_ADD_STATS_USER(TCP_MIB_MAXCONN, -1);
 }
 
-/*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+/* from STCP */
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 09b6b1d..89162a9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1670,7 +1670,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
tp->high_seq = tp->frto_highmark;
TCP_ECN_queue_cwr(tp);
 
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1741,7 +1741,7 @@ void tcp_enter_loss(struct sock *sk, int how)
/* Abort FRTO algorithm if one is in progress */
tp->frto_counter = 0;
 
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
@@ -2106,7 +2106,7 @@ static void tcp_undo_cwr(struct sock *sk, const int undo)
 
/* There is something screwy going on with the retrans hints after
   an undo */
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 }
 
 static inline int tcp_may_undo(struct tcp_sock *tp)
@@ -2199,7 +2199,7 @@ static int tcp_try_undo_loss(struct sock *sk)
TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
}
 
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 
DBGUNDO(sk, "partial loss");
tp->lost_out = 0;
@@ -2656,7 +2656,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
tp->packets_out -= tcp_skb_pcount(skb);
tcp_unlink_write_queue(skb, sk);
sk_stream_free_skb(sk, skb);
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
}
 
if (acked&FLAG_ACKED) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cbe8bf6..f46d24b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
BUG_ON(len > skb->len);
 
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
nsize = skb_headlen(skb) - len;
if (nsize < 0)
nsize = 0;
@@ -1719,7 +1719,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
   tcp_skb_pcount(next_skb) != 1);
 
/* changing transmit queue under us so clear hints */
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 
/* Ok.  We will be able to collapse the packet. */
tcp_unlink_write_queue(next_skb, sk);
@@ -1792,7 +1792,7 @@ void tcp_simple_retransmit(struct sock *sk)
}
}
 
-   clear_all_retrans_hints(tp);
+   tcp_clear_all_retrans_hints(tp);
 
if (!lost)
return;
-- 
1.5.0.6


[PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue

2007-09-20 Thread Ilpo Järvinen

The accounting code is pretty much the same, so it's a shame
we do it in two places.

I'm not too sure if the added fully_acked check in MTU probing is
really what we want; perhaps the added end_seq could be used in
the after() comparison.
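
After this change tcp_tso_acked() only reports how many packets of a partially ACKed TSO skb were covered, and the caller does the accounting. A minimal userspace sketch of that count, assuming the packet count is simply the payload length divided by the MSS and rounded up (pcount() and model_tso_acked() are hypothetical names, and the trimming itself is not modelled):

```c
#include <assert.h>
#include <stdint.h>

/* Number of MSS-sized packets needed to carry len bytes. */
static uint32_t pcount(uint32_t len, uint32_t mss)
{
    return (len + mss - 1) / mss;
}

/* Packets of the segment [seq, end_seq) that disappear once the bytes
 * below snd_una are trimmed from its head.  May be 0 when the ACK ends
 * inside the first packet. */
static uint32_t model_tso_acked(uint32_t seq, uint32_t end_seq,
                                uint32_t snd_una, uint32_t mss)
{
    uint32_t before_trim, after_trim;

    assert(mss > 0);
    assert((int32_t)(snd_una - seq) > 0);      /* something was ACKed */
    assert((int32_t)(end_seq - snd_una) > 0);  /* but not all of it   */

    before_trim = pcount(end_seq - seq, mss);
    after_trim = pcount(end_seq - snd_una, mss);
    return before_trim - after_trim;
}
```

A zero result corresponds to the `if (!packets_acked) break;` path in the new tcp_clean_rtx_queue() loop.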

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   75 +
 1 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 89162a9..d340fd5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,14 +2528,12 @@ static void tcp_rearm_rto(struct sock *sk)
}
 }
 
-static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
-__u32 now, __s32 *seq_rtt)
+static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
__u32 seq = tp->snd_una;
__u32 packets_acked;
-   int acked = 0;
 
/* If we get here, the whole TSO packet has not been
 * acked.
@@ -2548,36 +2546,11 @@ static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
packets_acked -= tcp_skb_pcount(skb);
 
if (packets_acked) {
-   __u8 sacked = scb->sacked;
-
-   acked |= FLAG_DATA_ACKED;
-   if (sacked) {
-   if (sacked & TCPCB_RETRANS) {
-   if (sacked & TCPCB_SACKED_RETRANS)
-   tp->retrans_out -= packets_acked;
-   acked |= FLAG_RETRANS_DATA_ACKED;
-   *seq_rtt = -1;
-   } else if (*seq_rtt < 0)
-   *seq_rtt = now - scb->when;
-   if (sacked & TCPCB_SACKED_ACKED)
-   tp->sacked_out -= packets_acked;
-   if (sacked & TCPCB_LOST)
-   tp->lost_out -= packets_acked;
-   if (sacked & TCPCB_URG) {
-   if (tp->urg_mode &&
-   !before(seq, tp->snd_up))
-   tp->urg_mode = 0;
-   }
-   } else if (*seq_rtt < 0)
-   *seq_rtt = now - scb->when;
-
-   tp->packets_out -= packets_acked;
-
BUG_ON(tcp_skb_pcount(skb) == 0);
BUG_ON(!before(scb->seq, scb->end_seq));
}
 
-   return acked;
+   return packets_acked;
 }
 
 /* Remove acknowledged frames from the retransmission queue. */
@@ -2587,6 +2560,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
const struct inet_connection_sock *icsk = inet_csk(sk);
struct sk_buff *skb;
__u32 now = tcp_time_stamp;
+   int fully_acked = 1;
int acked = 0;
int prior_packets = tp->packets_out;
__s32 seq_rtt = -1;
@@ -2595,6 +2569,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
while ((skb = tcp_write_queue_head(sk)) &&
   skb != tcp_send_head(sk)) {
struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
+   u32 end_seq;
+   u32 packets_acked;
__u8 sacked = scb->sacked;
 
/* If our packet is before the ack sequence we can
@@ -2602,11 +2578,19 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 * the other end.
 */
if (after(scb->end_seq, tp->snd_una)) {
-   if (tcp_skb_pcount(skb) > 1 &&
-   after(tp->snd_una, scb->seq))
-   acked |= tcp_tso_acked(sk, skb,
-  now, &seq_rtt);
-   break;
+   if (tcp_skb_pcount(skb) == 1 ||
+   !after(tp->snd_una, scb->seq))
+   break;
+
+   packets_acked = tcp_tso_acked(sk, skb);
+   if (!packets_acked)
+   break;
+
+   fully_acked = 0;
+   end_seq = tp->snd_una;
+   } else {
+   packets_acked = tcp_skb_pcount(skb);
+   end_seq = scb->end_seq;
}
 
/* Initial outgoing SYN's get put onto the write_queue
@@ -2624,7 +2608,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
}
 
/* MTU probing checks */
-   if (icsk->icsk_mtup.probe_size) {
+   if (fully_acked && icsk->icsk_mtup.probe_size) {
if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
tcp_mtup_probe_success(sk, skb);
}
@@ -2633,27 +2617,32 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32

[PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users

2007-09-20 Thread Ilpo Järvinen

Basically this change enables it; previously other undo_marker
users were left with nothing. Reverse the undo_marker logic
completely to get it set right in CA_Loss. On the other hand,
when a spurious RTO is detected, clear it. Clearing might be too
heavy for some scenarios, but it seems a safe enough starting point
for now and shouldn't have much effect except in a minority of
cases (if in any).

By adding a new FLAG_ we avoid looping through write_queue when
RTO occurs.
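
How the new flag can be derived while walking the queue, without a second pass, can be sketched like this. It is a simplified userspace model: account_acked_skb() is a hypothetical helper, SYN handling is left out, and only the FLAG_NONHEAD_RETRANS_ACKED value is taken from the patch (the other two values are illustrative).

```c
#include <assert.h>

#define FLAG_DATA_ACKED            0x04   /* illustrative value */
#define FLAG_RETRANS_DATA_ACKED    0x08   /* illustrative value */
#define FLAG_NONHEAD_RETRANS_ACKED 0x1000 /* value from the patch */

/* Fold one newly ACKed skb into flag.  FLAG_DATA_ACKED is set only
 * after the retransmission check, so seeing it already set means an
 * earlier skb was ACKed and this one cannot be the head. */
static int account_acked_skb(int flag, int was_retransmitted,
                             unsigned int packets_acked)
{
    assert(packets_acked > 0);

    if (was_retransmitted) {
        flag |= FLAG_RETRANS_DATA_ACKED;
        if ((flag & FLAG_DATA_ACKED) || packets_acked > 1)
            flag |= FLAG_NONHEAD_RETRANS_ACKED;
    }

    flag |= FLAG_DATA_ACKED;
    return flag;
}
```

This ordering is why the patch moves the FLAG_DATA_ACKED/FLAG_SYN_ACKED block below the sacked handling.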

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   42 +++---
 1 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 74accb0..948e79a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -104,6 +104,7 @@ int sysctl_tcp_abc __read_mostly;
 #define FLAG_ONLY_ORIG_SACKED  0x200 /* SACKs only non-rexmit sent before RTO */
 #define FLAG_SND_UNA_ADVANCED  0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK  0x800 /* SACK blocks contained DSACK info */
+#define FLAG_NONHEAD_RETRANS_ACKED 0x1000 /* Non-head rexmitted data was ACKed */
 
 #define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP   (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -1597,6 +1598,8 @@ void tcp_enter_frto(struct sock *sk)
tp->undo_retrans = 0;
 
skb = tcp_write_queue_head(sk);
+   if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+   tp->undo_marker = 0;
if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
tp->retrans_out -= tcp_skb_pcount(skb);
@@ -1646,6 +1649,8 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
/* ...enter this if branch just for the first segment */
flag |= FLAG_DATA_ACKED;
} else {
+   if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+   tp->undo_marker = 0;
TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
}
 
@@ -1661,7 +1666,6 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
tp->snd_cwnd_cnt = 0;
tp->snd_cwnd_stamp = tcp_time_stamp;
-   tp->undo_marker = 0;
tp->frto_counter = 0;
 
tp->reordering = min_t(unsigned int, tp->reordering,
@@ -2587,20 +2591,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
end_seq = scb->end_seq;
}
 
-   /* Initial outgoing SYN's get put onto the write_queue
-* just like anything else we transmit.  It is not
-* true data, and if we misinform our callers that
-* this ACK acks real data, we will erroneously exit
-* connection startup slow start one packet too
-* quickly.  This is severely frowned upon behavior.
-*/
-   if (!(scb->flags & TCPCB_FLAG_SYN)) {
-   flag |= FLAG_DATA_ACKED;
-   } else {
-   flag |= FLAG_SYN_ACKED;
-   tp->retrans_stamp = 0;
-   }
-
/* MTU probing checks */
if (fully_acked && icsk->icsk_mtup.probe_size &&
!after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
@@ -2613,6 +2603,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
tp->retrans_out -= packets_acked;
flag |= FLAG_RETRANS_DATA_ACKED;
seq_rtt = -1;
+   if ((flag & FLAG_DATA_ACKED) ||
+   (packets_acked > 1))
+   flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
if (fully_acked)
@@ -2634,6 +2627,20 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
}
tp->packets_out -= packets_acked;
 
+   /* Initial outgoing SYN's get put onto the write_queue
+* just like anything else we transmit.  It is not
+* true data, and if we misinform our callers that
+* this ACK acks real data, we will erroneously exit
+* connection startup slow start one packet too
+* quickly.  This is severely frowned upon behavior.
+*/
+   if (!(scb->flags & TCPCB_FLAG_SYN)) {
+   flag |= FLAG_DATA_ACKED;
+   } else {
+   flag |= FLAG_SYN_ACKED;
+   tp->retrans_stamp = 0;
+   }
+

[PATCH 7/9] [TCP] FRTO: Update sysctl documentation

2007-09-20 Thread Ilpo Järvinen

Since the SACK enhanced FRTO was added, the code has been
under test numerous times so remove experimental claim
from the documentation. Also be a bit more verbose about
the usage.
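
The value semantics described in the updated text can be summarised with a small sketch. This is not kernel code: enum frto_mode and frto_mode_for() are hypothetical, and falling back to the basic version when the sysctl is 2 but the flow has no SACK is an assumption read from "2 enables SACK enhanced F-RTO if flow uses SACK".

```c
#include <assert.h>

enum frto_mode { FRTO_OFF, FRTO_BASIC, FRTO_SACK_ENHANCED };

/* Map the tcp_frto sysctl value (0, 1 or 2) and the flow's SACK usage
 * to the F-RTO variant that would be applied. */
static enum frto_mode frto_mode_for(int sysctl_value, int flow_uses_sack)
{
    assert(sysctl_value >= 0 && sysctl_value <= 2);

    if (sysctl_value == 0)
        return FRTO_OFF;
    if (sysctl_value == 2 && flow_uses_sack)
        return FRTO_SACK_ENHANCED;
    return FRTO_BASIC; /* 1, or 2 on a non-SACK flow (assumed) */
}
```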

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 Documentation/networking/ip-sysctl.txt |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 32c2e9d..6ae2fef 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -180,13 +180,20 @@ tcp_fin_timeout - INTEGER
to live longer. Cf. tcp_max_orphans.
 
 tcp_frto - INTEGER
-   Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+   Enables Forward RTO-Recovery (F-RTO) defined in RFC4138.
+   F-RTO is an enhanced recovery algorithm for TCP retransmission
timeouts.  It is particularly beneficial in wireless environments
where packet loss is typically due to random radio interference
-   rather than intermediate router congestion. If set to 1, basic
-   version is enabled. 2 enables SACK enhanced F-RTO, which is
-   EXPERIMENTAL. The basic version can be used also when SACK is
-   enabled for a flow through tcp_sack sysctl.
+   rather than intermediate router congestion.  FRTO is sender-side
+   only modification.  Therefore it does not require any support from
+   the peer, but in a typical case, however, where wireless link is
+   the local access link and most of the data flows downlink, the
+   faraway servers should have FRTO enabled to take advantage of it.
+   If set to 1, basic version is enabled.  2 enables SACK enhanced
+   F-RTO if flow uses SACK.  The basic version can be used also when
+   SACK is in use though scenario(s) with it exists where FRTO
+   interacts badly with the packet counting of the SACK enabled TCP
+   flow.
 
 tcp_frto_response - INTEGER
When F-RTO has detected that a TCP retransmission timeout was
-- 
1.5.0.6


[PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue

2007-09-20 Thread Ilpo Järvinen

Implements following cleanups:
- Comment re-placement (CodingStyle)
- tcp_tso_acked() local (wrapper-like) variable removal
  (readability)
- __-types removed (IMHO they make local variables jumpy looking
  and just waste space)
- acked -> flag (naming conventions elsewhere in TCP code)
- linebreak adjustments (readability)
- nested if()s combined (reduced indentation)
- clarifying newlines added

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   66 ++---
 1 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d340fd5..74accb0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2528,55 +2528,49 @@ static void tcp_rearm_rto(struct sock *sk)
}
 }
 
+/* If we get here, the whole TSO packet has not been acked. */
 static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
-   struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
-   __u32 seq = tp->snd_una;
-   __u32 packets_acked;
+   u32 packets_acked;
 
-   /* If we get here, the whole TSO packet has not been
-* acked.
-*/
-   BUG_ON(!after(scb->end_seq, seq));
+   BUG_ON(!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una));
 
packets_acked = tcp_skb_pcount(skb);
-   if (tcp_trim_head(sk, skb, seq - scb->seq))
+   if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
return 0;
packets_acked -= tcp_skb_pcount(skb);
 
if (packets_acked) {
BUG_ON(tcp_skb_pcount(skb) == 0);
-   BUG_ON(!before(scb->seq, scb->end_seq));
+   BUG_ON(!before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq));
}
 
return packets_acked;
 }
 
-/* Remove acknowledged frames from the retransmission queue. */
-static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
+/* Remove acknowledged frames from the retransmission queue. If our packet
+ * is before the ack sequence we can discard it as it's confirmed to have
+ * arrived at the other end.
+ */
+static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 {
struct tcp_sock *tp = tcp_sk(sk);
const struct inet_connection_sock *icsk = inet_csk(sk);
struct sk_buff *skb;
-   __u32 now = tcp_time_stamp;
+   u32 now = tcp_time_stamp;
int fully_acked = 1;
-   int acked = 0;
+   int flag = 0;
int prior_packets = tp->packets_out;
-   __s32 seq_rtt = -1;
+   s32 seq_rtt = -1;
ktime_t last_ackt = net_invalid_timestamp();
 
-   while ((skb = tcp_write_queue_head(sk)) &&
-  skb != tcp_send_head(sk)) {
+   while ((skb = tcp_write_queue_head(sk)) && skb != tcp_send_head(sk)) {
struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
u32 end_seq;
u32 packets_acked;
-   __u8 sacked = scb->sacked;
+   u8 sacked = scb->sacked;
 
-   /* If our packet is before the ack sequence we can
-* discard it as it's confirmed to have arrived at
-* the other end.
-*/
if (after(scb->end_seq, tp->snd_una)) {
if (tcp_skb_pcount(skb) == 1 ||
!after(tp->snd_una, scb->seq))
@@ -2601,38 +2595,38 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p)
 * quickly.  This is severely frowned upon behavior.
 */
if (!(scb->flags & TCPCB_FLAG_SYN)) {
-   acked |= FLAG_DATA_ACKED;
+   flag |= FLAG_DATA_ACKED;
} else {
-   acked |= FLAG_SYN_ACKED;
+   flag |= FLAG_SYN_ACKED;
tp->retrans_stamp = 0;
}
 
/* MTU probing checks */
-   if (fully_acked && icsk->icsk_mtup.probe_size) {
-   if (!after(tp->mtu_probe.probe_seq_end, TCP_SKB_CB(skb)->end_seq)) {
-   tcp_mtup_probe_success(sk, skb);
-   }
+   if (fully_acked && icsk->icsk_mtup.probe_size &&
+   !after(tp->mtu_probe.probe_seq_end, scb->end_seq)) {
+   tcp_mtup_probe_success(sk, skb);
}
 
if (sacked) {
if (sacked & TCPCB_RETRANS) {
if (sacked & TCPCB_SACKED_RETRANS)
tp->retrans_out -= packets_acked;
-   acked |= FLAG_RETRANS_DATA_ACKED;
+   flag |= FLAG_RETRANS_DATA_ACKED;
seq_rtt = -1;
} else if (seq_rtt < 0) {
seq_rtt = now - scb->when;
if (fully_acked)
last_ackt = skb->tstamp;

[PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default

2007-09-20 Thread Ilpo Järvinen

Most of the description that follows comes from my mail to
netdev (some editing done):

Main obstacle to FRTO use is its deployment, as it has to be on
the sender side whereas the wireless link is often the receiver's
access link. Take initiative on behalf of unlucky receivers and
enable it by default in future Linux TCP senders. Also, the IETF
seems to be interested in advancing FRTO from experimental [1].

How does FRTO help?
===================

FRTO detects spurious RTOs and avoids a number of unnecessary
retransmissions and a couple of other problems that can arise
due to incorrect guess made at RTO (i.e., that segments were
lost when they actually got delayed which is likely to occur
e.g. in wireless environments with link-layer retransmission).
Though FRTO cannot prevent the first (potentially unnecessary)
retransmission at RTO, I suspect that it won't cost that much
even if you have to pay for each bit (won't be that high
percentage out of all packets after all :-)). However, usually
when you have a spurious RTO, not only is the first segment
unnecessarily retransmitted but the *whole window*. It goes like
this: all cumulative ACKs got delayed due to in-order delivery,
then TCP will actually send 1.5*original cwnd worth of data in
the RTO's slow-start when the delayed ACKs arrive (basically the
original cwnd worth of it unnecessarily). In case one is
interested in minimizing unnecessary retransmissions e.g. due to
cost, those rexmissions must never see daylight. Besides, in the
worst case the generated burst overloads the bottleneck buffers
which is likely to significantly delay the further progress of
the flow. In case of link-layer retransmissions, ACK compression often
occurs at the same time, making the burst very sharp edged (in
that case TCP often loses most of the segments above high_seq
=> very bad performance too). When FRTO is enabled, those
unnecessary retransmissions are fully avoided except for the
first segment and the cwnd behavior after detected spurious RTO
is determined by the response (one can tune that by sysctl).
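
One back-of-the-envelope way to arrive at the 1.5*cwnd figure is sketched below. It is a toy model under loudly stated assumptions, not a simulation of the Linux sender: after the RTO cwnd is 1 and ssthresh is half the original window, every delayed cumulative ACK covers exactly one segment, each ACK in slow start releases two segments and each later ACK releases one, and congestion-avoidance growth is ignored. The function name is made up.

```c
#include <assert.h>

/* Segments sent after a spurious RTO while the orig_cwnd delayed ACKs
 * arrive, counting the RTO retransmission itself. */
static unsigned int segments_sent_after_spurious_rto(unsigned int orig_cwnd)
{
    unsigned int ssthresh = orig_cwnd / 2;
    unsigned int cwnd = 1;
    unsigned int sent = 1; /* the retransmission triggered by the RTO */
    unsigned int ack;

    assert(orig_cwnd >= 2);

    for (ack = 0; ack < orig_cwnd; ack++) {
        if (cwnd < ssthresh) {
            cwnd++;     /* slow start: window grows by one ...        */
            sent += 2;  /* ... and one segment left the network       */
        } else {
            sent += 1;  /* above ssthresh: one out, one in            */
        }
    }
    return sent;
}
```

For an original window of 20 segments this model sends 30, i.e. the original window plus another half, of which the first 20 duplicate data that was only delayed.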

With the basic version (the non-SACK-enhanced one), FRTO can fail to
detect a spurious RTO as spurious and fall back to conservative
behavior. ACK lossage is much less significant than reordering;
usually the FRTO can detect spurious RTO if at least 2
cumulative ACKs from original window are preserved (excluding
the ACK that advances to high_seq). With SACK-enhanced version,
the detection is quite robust.

FRTO should remove the need to set a high lower bound for the
RTO estimator due to delay spikes that occur relatively commonly
in some environments (esp. in wireless/cellular ones).

[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 948e79a..02b549b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -85,7 +85,7 @@ int sysctl_tcp_adv_win_scale __read_mostly = 2;
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
 int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
-int sysctl_tcp_frto __read_mostly;
+int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_frto_response __read_mostly;
 int sysctl_tcp_nometrics_save __read_mostly;
 
-- 
1.5.0.6


[PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations

2007-09-20 Thread Ilpo Järvinen

There's no reason to clear the sacktag skb hint when a small part
of the rexmit queue changes. Account for the changes (if any) instead
when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits,
so there is no need to discard the SACK tag hint at all.
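
The split into a partial and a full clear, plus the manual hint fix-up on collapse, can be modelled in userspace as below. The struct and function names are hypothetical stand-ins; only the five hint field names follow the patch.

```c
#include <assert.h>
#include <stddef.h>

struct seg { int id; }; /* hypothetical stand-in for struct sk_buff */

struct retrans_hints {
    const struct seg *lost_skb_hint;
    const struct seg *scoreboard_skb_hint;
    const struct seg *retransmit_skb_hint;
    const struct seg *forward_skb_hint;
    const struct seg *fastpath_skb_hint; /* the sacktag hint */
};

/* Drop every hint except the sacktag one. */
static void clear_hints_partial(struct retrans_hints *h)
{
    h->lost_skb_hint = NULL;
    h->scoreboard_skb_hint = NULL;
    h->retransmit_skb_hint = NULL;
    h->forward_skb_hint = NULL;
}

/* Drop everything, including the sacktag hint. */
static void clear_all_hints(struct retrans_hints *h)
{
    clear_hints_partial(h);
    h->fastpath_skb_hint = NULL;
}

/* next has been merged into skb and is about to be freed: keep the
 * sacktag hint alive by pointing it at the surviving skb. */
static void collapse_fixup(struct retrans_hints *h, const struct seg *skb,
                           const struct seg *next)
{
    assert(skb != NULL && next != NULL);

    clear_hints_partial(h);
    if (h->fastpath_skb_hint == next)
        h->fastpath_skb_hint = skb;
}
```

The fix-up matters because the hint would otherwise dangle once next is freed.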

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h |6 +-
 net/ipv4/tcp_input.c  |   14 --
 net/ipv4/tcp_output.c |   12 
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 16dfe3c..07b1faa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,11 +1067,15 @@ static inline void tcp_mib_init(void)
 }
 
 /* from STCP */
-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
+}
+
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+   tcp_clear_retrans_hints_partial(tp);
tp->fastpath_skb_hint = NULL;
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 02b549b..1092b5a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1674,7 +1674,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
tp->high_seq = tp->frto_highmark;
TCP_ECN_queue_cwr(tp);
 
-   tcp_clear_all_retrans_hints(tp);
+   tcp_clear_retrans_hints_partial(tp);
 }
 
 void tcp_clear_retrans(struct tcp_sock *tp)
@@ -1714,10 +1714,14 @@ void tcp_enter_loss(struct sock *sk, int how)
tp->bytes_acked = 0;
tcp_clear_retrans(tp);
 
-   /* Push undo marker, if it was plain RTO and nothing
-* was retransmitted. */
-   if (!how)
+   if (!how) {
+   /* Push undo marker, if it was plain RTO and nothing
+* was retransmitted. */
tp->undo_marker = tp->snd_una;
+   tcp_clear_retrans_hints_partial(tp);
+   } else {
+   tcp_clear_all_retrans_hints(tp);
+   }
 
tcp_for_write_queue(skb, sk) {
if (skb == tcp_send_head(sk))
@@ -1744,8 +1748,6 @@ void tcp_enter_loss(struct sock *sk, int how)
TCP_ECN_queue_cwr(tp);
/* Abort FRTO algorithm if one is in progress */
tp->frto_counter = 0;
-
-   tcp_clear_all_retrans_hints(tp);
 }
 
 static int tcp_check_sack_reneging(struct sock *sk)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f46d24b..cbb83ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -687,7 +687,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 
BUG_ON(len > skb->len);
 
-   tcp_clear_all_retrans_hints(tp);
+   tcp_clear_retrans_hints_partial(tp);
nsize = skb_headlen(skb) - len;
if (nsize < 0)
nsize = 0;
@@ -1718,9 +1718,6 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
BUG_ON(tcp_skb_pcount(skb) != 1 ||
   tcp_skb_pcount(next_skb) != 1);
 
-   /* changing transmit queue under us so clear hints */
-   tcp_clear_all_retrans_hints(tp);
-
/* Ok.  We will be able to collapse the packet. */
tcp_unlink_write_queue(next_skb, sk);
 
@@ -1759,6 +1756,13 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 
tcp_adjust_fackets_out(tp, skb, tcp_skb_pcount(next_skb));
tp->packets_out -= tcp_skb_pcount(next_skb);
+
+   /* changed transmit queue under us so clear hints */
+   tcp_clear_retrans_hints_partial(tp);
+   /* manually tune sacktag skb hint */
+   if (tp->fastpath_skb_hint == next_skb)
+   tp->fastpath_skb_hint = skb;
+
sk_stream_free_skb(sk, next_skb);
}
 }
-- 
1.5.0.6


Re: netif_rx will not free skb when I use ftp in kernel 2.6.22/2.6.21

2007-09-20 Thread Jay Cliburn


Chris Snook wrote:

 wrote:
in function at_alloc_rx_buffers(), pci_unmap_page() and netif_rx() in function
at_clean_rx_irq(),

Okay, I didn't know you were talking about the atl1 driver.  Are you using the
in-tree driver in 2.6.22, or the pre-merge driver on sourceforge, or the vendor
driver from Attansic/Atheros?

Based on the function names (at_*), looks like the vendor driver is being used.
The pre-merge and in-kernel function names begin with atl1_.

Jay

[PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP

2007-09-20 Thread john ye

Bottom Softirq Implementation. John Ye, 2007.08.27

Why this patch:
Make the kernel able to concurrently execute softirq's net code on SMP systems.
Take full advantage of SMP to handle more packets and greatly raise NIC throughput.
The current kernel's net packet processing logic is:
1) The CPU which handles a hardirq must also execute its related softirq.
2) One softirq instance (irqs handled by 1 CPU) can't be executed on more than 2 CPUs
at the same time.
These limitations make it hard for kernel networking to take advantage of SMP.

How this patch:
It splits the current softirq code into 2 parts: the cpu-sensitive top half,
and the cpu-insensitive bottom half, then makes the bottom half (called BS)
execute on SMP concurrently.
The two parts are not equal in terms of size and load. The top part has constant code
size (mainly in net/core/dev.c and NIC drivers), while the bottom part involves
netfilter (iptables), whose load varies very much. An iptables setup with 1000 rules to match
will make the bottom part's load very high. So, if the bottom part softirq
can be randomly distributed to processors and run concurrently on them, the network will
gain much more packet handling capacity and network throughput will be increased
remarkably.

Where useful:
It's useful on SMP machines that meet the following 2 conditions:
1) have high kernel network load (for example, running iptables with thousands of rules).
2) have more CPUs than active NICs (e.g. a 4-CPU machine with 2 NICs).
On these systems, as softirq load increases, some CPUs will be idle
while others (as many as there are NICs) stay busy.
IRQBALANCE will help, but it only shifts IRQs among CPUs and creates no softirq concurrency.
Balancing the load across CPUs will not remarkably increase network speed.

Where NOT useful:
If the bottom half of softirq is too small (without running iptables), or the network
is too idle, the BS patch will not have a visible effect. But it has no
negative effect either.
Users can turn BS functionality on/off with the /proc/sys/net/bs_enable switch.

How to test:
On a Linux box, run iptables and add 2000 rules to table filter & table nat to simulate a huge
softirq load. Then open 20 ftp sessions to download a big file. On another machine (which
uses this test machine as its gateway), open 20 more ftp download sessions. Compare the speed
without BS enabled and with BS enabled.
cat /proc/sys/net/bs_enable. This is a switch to turn BS on/off.
cat /proc/sys/net/bs_status. This shows the usage of each CPU.
Tests showed that when bottom softirq load is high, the network throughput can be nearly
doubled on a 2-CPU machine. Hopefully it may be quadrupled on a 4-CPU Linux box.

Bugs:
It will NOT allow CPU hotplug.
It only allows consecutive CPU ids, starting from 0 up to num_online_cpus().
For example, 0,1,2,3 is OK; 0,1,8,9 is not.

Some considerations for the future:
1) With the BS patch, the irq balance code in arch/i386/kernel/io_apic.c seems
to be no longer needed, at least not for network irqs.
2) Softirq load will become very small. It only runs the top half of the old
softirq, which is much less expensive than the bottom half (the netfilter code).
To let the top softirq process more packets, can these 3 network parameters be
given larger values?
   extern int netdev_max_backlog = 1000;
   extern int netdev_budget = 300;
   extern int weight_p = 64;
3) BS currently runs on the built-in keventd thread. Could we create new
workqueues for it to run on?

Signed-off-by: John Ye (Seeker) [EMAIL PROTECTED]


--- old/net/ipv4/ip_input.c 2007-09-20 20:50:31.0 +0800
+++ new/net/ipv4/ip_input.c 2007-09-21 05:52:40.0 +0800
@@ -362,6 +362,198 @@
 return NET_RX_DROP;
 }

+
+#define CONFIG_BOTTOM_SOFTIRQ_SMP
+#define CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP
+
+/*
+ *
+Bottom Softirq Implementation. John Ye, 2007.08.27
+
+Why this patch:
+Make kernel be able to concurrently execute softirq's net code on SMP system.
+Takes full advantages of SMP to handle more packets and greatly raises NIC throughput.
+The current kernel's net packet processing logic is:
+1) The CPU which handles a hardirq must be executing its related softirq.
+2) One softirq instance(irqs handled by 1 CPU) can't be executed on more than 2 CPUs
+at the same time.
+The limitation make kernel network be hard to take the advantages of SMP.
+
+How this patch:
+It splits the current softirq code into 2 parts: the cpu-sensitive top half,
+and the cpu-insensitive bottom half, then make bottom half(called BS) be
+executed on SMP concurrently.
+The two parts are not equal in terms of size and load. Top part has constant code
+size(mainly, in net/core/dev.c and NIC drivers), while bottom part involves
+netfilter(iptables) whose load varies very much. An iptables with 1000 rules to match
+will make the bottom part's load be very high. So, if the bottom part softirq
+can be randomly distributed to processors and run concurrently on them, the network will
+gain

Re: [LARTC] ifb and ppp

2007-09-20 Thread jamal

On Thu, 2007-20-09 at 13:55 +0200, Patrick McHardy wrote:
 Please keep netdev and myself CCed.

and me too (I am way behind on netdev)

 Frithjof Hammer wrote:

  Any further help/ideas?

Sorry, I didn't follow the thread - what is the goal to be achieved with
the setup?

 I misread the code, the device it looks at in tcf_mirred_init is
 the target device (ifb). So what it does is check whether the
 target device wants a link layer header and if it does restores
 the one from the source device. So currently it seems impossible
 to get rid of the PPP(oE) header.

It is tricky to redirect from devices that have disparity
in their view of link layer headers except for those that we know
don't expect anything.

 Jamal, is that how its supposed to work?

Right - some netdevices on receipt will expect the link layer header.

cheers,
jamal

[PATCH V5 1/11] net/core: add a netdev notification for slave detach

2007-09-20 Thread Moni Shoua

A slave of a bonding master that wants to send a notification before
going down should call netdev_slave_detach(). The handling of this notification
will be done outside the context of unregister_netdevice() which is sometimes
necessary, as with IPoIB slave for example.

Signed-off-by: Moni Shoua monis at voltaire.com
---
 include/linux/if.h |1 +
 net/core/dev.c |   20 
 2 files changed, 21 insertions(+)

Index: net-2.6/net/core/dev.c
===
--- net-2.6.orig/net/core/dev.c 2007-09-20 08:04:47.164051688 +0200
+++ net-2.6/net/core/dev.c  2007-09-20 09:20:21.493060579 +0200
@@ -2588,6 +2588,25 @@ int netdev_set_master(struct net_device 
return 0;
 }
 
+/**
+ * netdev_slave_detach -   notify that slave is about to detach from master
+ * @slave: slave device
+ *
+ * Raise a flag that slave is about to detach from master
+ * and notify the netdev chain.
+ * The caller must hold the rtnl_mutex.
+ */
+
+int netdev_slave_detach(struct net_device *slave)
+{
+   int ret = 0;
+   if (slave->flags & IFF_SLAVE) {
+   slave->priv_flags |= IFF_SLAVE_DETACH;
+   ret = call_netdevice_notifiers(NETDEV_CHANGE, slave);
+   }
+   return ret;
+}
+
 static void __dev_set_promiscuity(struct net_device *dev, int inc)
 {
unsigned short old_flags = dev->flags;
@@ -4120,6 +4139,7 @@ EXPORT_SYMBOL(dev_set_mac_address);
 EXPORT_SYMBOL(free_netdev);
 EXPORT_SYMBOL(netdev_boot_setup_check);
 EXPORT_SYMBOL(netdev_set_master);
+EXPORT_SYMBOL(netdev_slave_detach);
 EXPORT_SYMBOL(netdev_state_change);
 EXPORT_SYMBOL(netif_receive_skb);
 EXPORT_SYMBOL(netif_rx);
Index: net-2.6/include/linux/if.h
===
--- net-2.6.orig/include/linux/if.h 2007-09-20 08:04:47.164051688 +0200
+++ net-2.6/include/linux/if.h  2007-09-20 08:15:29.577729301 +0200
@@ -61,6 +61,7 @@
 #define IFF_MASTER_ALB 0x10    /* bonding master, balance-alb. */
 #define IFF_BONDING    0x20    /* bonding master or slave  */
 #define IFF_SLAVE_NEEDARP 0x40 /* need ARPs for validation */
+#define IFF_SLAVE_DETACH 0x80  /* slave is about to unregister */
 
 #define IF_GET_IFACE   0x0001  /* for querying only */
 #define IF_GET_PROTO   0x0002
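
The flag-then-notify pattern that netdev_slave_detach() introduces can be tried out in user space with the sketch below. It is only a model under stated assumptions: `fake_netdev`, `fake_notifier_fn`, `fake_slave_detach` and `count_detach_events` are hypothetical stand-ins for the kernel structures and the netdev notifier chain, and only the two flag values are taken from the kernel headers and the patch.

```c
#include <assert.h>
#include <stddef.h>

#define IFF_SLAVE        0x800  /* dev->flags bit, as in <linux/if.h> */
#define IFF_SLAVE_DETACH 0x80   /* priv_flags bit added by the patch */

/* Simplified stand-in for struct net_device. */
struct fake_netdev {
    unsigned int flags;
    unsigned int priv_flags;
};

/* Hypothetical stand-in for a listener on the netdev notifier chain. */
typedef int (*fake_notifier_fn)(struct fake_netdev *dev, void *ctx);

/* Mark a slave as detaching, then notify; non-slaves are left alone. */
int fake_slave_detach(struct fake_netdev *slave, fake_notifier_fn notify,
                      void *ctx)
{
    int ret = 0;

    assert(slave != NULL && notify != NULL);
    if (slave->flags & IFF_SLAVE) {
        slave->priv_flags |= IFF_SLAVE_DETACH;  /* set the flag first ... */
        ret = notify(slave, ctx);               /* ... then tell listeners */
    }
    return ret;
}

/* Example listener: counts how often it saw a detaching slave. */
int count_detach_events(struct fake_netdev *dev, void *ctx)
{
    if (dev->priv_flags & IFF_SLAVE_DETACH)
        (*(int *)ctx)++;
    return 0;
}
```

Setting the flag before the notification is what lets a listener such as the bonding driver tell an ordinary change event apart from an imminent unregister.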

[PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister

2007-09-20 Thread Moni Shoua

When the bonding device enslaves IPoIB devices it takes pointers to
functions in the ib_ipoib module. This is fine as long as the ib_ipoib
module remains loaded while the references to its functions exist.
So, to help bonding do a cleanup on time, when the IPoIB net device is a
slave of a bonding master, let the master know that the IPoIB device is
about to unregister (but before calling unregister).

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h  |7 +++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |3 +++
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |1 +
 3 files changed, 11 insertions(+)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-09-20 08:35:34.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-09-20 14:20:16.495147879 +0200
@@ -48,6 +48,7 @@
 #include <linux/in.h>
 
 #include <net/dst.h>
+#include <linux/netdevice.h>
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("IP-over-InfiniBand net driver");
@@ -921,6 +922,7 @@ void ipoib_dev_cleanup(struct net_device
 
/* Delete any child interfaces first */
list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
+   ipoib_slave_detach(cpriv->dev);
unregister_netdev(cpriv->dev);
ipoib_dev_cleanup(cpriv->dev);
free_netdev(cpriv->dev);
@@ -1208,6 +1210,7 @@ static void ipoib_remove_one(struct ib_d
ib_unregister_event_handler(&priv->event_handler);
flush_scheduled_work();
 
+   ipoib_slave_detach(priv->dev);
unregister_netdev(priv->dev);
ipoib_dev_cleanup(priv->dev);
free_netdev(priv->dev);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_vlan.c  2007-09-20 09:26:11.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_vlan.c   2007-09-20 09:27:20.182709679 +0200
@@ -157,6 +157,7 @@ int ipoib_vlan_delete(struct net_device 
mutex_lock(&ppriv->vlan_mutex);
list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
if (priv->pkey == pkey) {
+   ipoib_slave_detach(priv->dev);
unregister_netdev(priv->dev);
ipoib_dev_cleanup(priv->dev);
list_del(&priv->list);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h   2007-09-20 12:18:56.0 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h    2007-09-20 14:21:47.385972207 +0200
@@ -570,6 +570,13 @@ static inline void ipoib_cm_handle_rx_wc
 
 #endif
 
+static inline void ipoib_slave_detach(struct net_device *dev)
+{
+   rtnl_lock();
+   netdev_slave_detach(dev);
+   rtnl_unlock();
+}
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 void ipoib_create_debug_files(struct net_device *dev);
 void ipoib_delete_debug_files(struct net_device *dev);

Re: Please pull 'z1211' branch of wireless-2.6

2007-09-20 Thread John W. Linville

On Wed, Sep 19, 2007 at 11:12:50PM +0100, Daniel Drake wrote:
 John W. Linville wrote:

 BTW: I fairly regularly get email from F7 users complaining about 
 connection intermittency and other bugs that we don't seem to have for 
 the softmac driver (maybe stack related issues, of which I've fixed a 
 couple that affected me personally, I'm a little surprised that F7 
 jumped so early).

Hmmm...please refer any of these to bugzilla.redhat.com if you don't mind.

John
-- 
John W. Linville
[EMAIL PROTECTED]

[PATCH V5 3/11] IB/ipoib: Bound the net device to the ipoib_neigh structure

2007-09-20 Thread Moni Shoua

IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point to the
slave device and calls the slave hard_start_xmit function.

Under this scheme, the assumption in ipoib_neigh_destructor that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be cast to struct ipoib_dev_priv is wrong.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   24 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |3 ++-
 3 files changed, 20 insertions(+), 11 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h   2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h    2007-09-18 17:09:26.534874404 +0200
@@ -328,6 +328,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
 
struct neighbour   *neighbour;
+   struct net_device *dev;
 
struct list_head    list;
 };
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-09-18 17:23:54.725744661 +0200
@@ -511,7 +511,7 @@ static void neigh_add_path(struct sk_buf
struct ipoib_path *path;
struct ipoib_neigh *neigh;
 
-   neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+   neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
if (!neigh) {
++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
@@ -830,6 +830,13 @@ static void ipoib_neigh_cleanup(struct n
unsigned long flags;
struct ipoib_ah *ah = NULL;
 
+   neigh = *to_ipoib_neigh(n);
+   if (neigh) {
+   priv = netdev_priv(neigh->dev);
+   ipoib_dbg(priv, "neigh_destructor for bonding device: %s\n",
+ n->dev->name);
+   } else
+   return;
ipoib_dbg(priv,
  "neigh_cleanup for %06x " IPOIB_GID_FMT "\n",
  IPOIB_QPN(n->ha),
@@ -837,13 +844,10 @@ static void ipoib_neigh_cleanup(struct n
 
spin_lock_irqsave(&priv->lock, flags);
 
-   neigh = *to_ipoib_neigh(n);
-   if (neigh) {
-   if (neigh->ah)
-   ah = neigh->ah;
-   list_del(&neigh->list);
-   ipoib_neigh_free(n->dev, neigh);
-   }
+   if (neigh->ah)
+   ah = neigh->ah;
+   list_del(&neigh->list);
+   ipoib_neigh_free(n->dev, neigh);
 
spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -851,7 +855,8 @@ static void ipoib_neigh_cleanup(struct n
ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+ struct net_device *dev)
 {
struct ipoib_neigh *neigh;
 
@@ -860,6 +865,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
return NULL;
 
neigh->neighbour = neighbour;
+   neigh->dev = dev;
*to_ipoib_neigh(neighbour) = neigh;
skb_queue_head_init(&neigh->queue);
ipoib_cm_set(neigh, NULL);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-09-18 17:08:53.245849217 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-09-18 17:09:26.536874045 +0200
@@ -727,7 +727,8 @@ out:
if (skb->dst &&
skb->dst->neighbour &&
!*to_ipoib_neigh(skb->dst->neighbour)) {
-

Re: Please pull 'ssb-drivers' branch of wireless-2.6

2007-09-20 Thread John W. Linville

On Wed, Sep 19, 2007 at 02:33:56PM -0700, Greg KH wrote:
 On Wed, Sep 19, 2007 at 04:44:28PM -0400, John W. Linville wrote:

  These patches build upon the SSB bus support added to net-2.6.24 to
  support the b43 wireless driver.  Since Dave has that support in his
  tree, I'm asking him to merge these patches as well.

  The second patch adds a driver for a USB OHCI device which lives on
  the SSB bus.  Again, this is found on a number of SoC devices used
  especially in wireless routers and APs.

  The patches are available here:
  
  
  http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/ssb-drivers/0001-b44-port-to-native-ssb-support.patch
  
  http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/ssb-drivers/0002-usb-ssb-hosted-OHCI-driver.patch
 
 This one needs to go through the linux-usb-devel list (not the -users
 list) and get acked by David Brownell, the current OHCI maintainer.

Ooops, sorry -- clicked on the wrong line in MAINTAINERS...

David, please review the patch in the second link above and consider it
for inclusion in 2.6.24 (once the SSB stuff in net-2.6.24 is merged).

Thanks!

John
-- 
John W. Linville
[EMAIL PROTECTED]

[PATCH V5 4/11] IB/ipoib: Verify address handle validity on send

2007-09-20 Thread Moni Shoua

When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happening.

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-09-18 17:09:26.535874225 +0200
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-09-18 17:10:22.375853147 +0200
@@ -686,9 +686,10 @@ static int ipoib_start_xmit(struct sk_bu
goto out;
}
} else if (neigh->ah) {
-   if (unlikely(memcmp(&neigh->dgid.raw,
+   if (unlikely((memcmp(&neigh->dgid.raw,
skb->dst->neighbour->ha + 4,
-   sizeof(union ib_gid)))) {
+   sizeof(union ib_gid))) ||
+(neigh->dev != dev))) {
spin_lock(&priv->lock);
/*
 * It's safe to call ipoib_put_ah() inside
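
The extra condition added in this patch can be exercised outside the kernel with the sketch below. It is a simplified model, not driver code: `fake_netdev`, `fake_ipoib_neigh` and `neigh_is_stale` are hypothetical names, and the 20-byte hardware address layout (4 bytes of flags and QPN followed by a 16-byte GID) is assumed from the `ha + 4` offset used in the diff.

```c
#include <assert.h>
#include <string.h>

#define GID_LEN        16  /* size of an InfiniBand GID in bytes */
#define IPOIB_HW_ALEN  20  /* 4 bytes of flags and QPN, then the GID */

/* Simplified stand-in for struct net_device. */
struct fake_netdev {
    int ifindex;
};

/* Simplified stand-in for struct ipoib_neigh. */
struct fake_ipoib_neigh {
    unsigned char dgid[GID_LEN];    /* GID the cached address handle targets */
    const struct fake_netdev *dev;  /* device the entry was created on */
};

/*
 * Returns nonzero when the cached entry must not be used for this send:
 * either the destination GID changed, or the entry belongs to another
 * device (for example the previous active slave of a bond).
 */
int neigh_is_stale(const struct fake_ipoib_neigh *neigh,
                   const unsigned char ha[IPOIB_HW_ALEN],
                   const struct fake_netdev *dev)
{
    assert(neigh != NULL && ha != NULL);
    return memcmp(neigh->dgid, ha + 4, GID_LEN) != 0 || neigh->dev != dev;
}
```

Before the patch only the GID comparison existed, so an entry created on the old slave would still look valid when the bond started sending on the new one.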

[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER

2007-09-20 Thread Moni Shoua

This patch changes some of the bond netdevice attributes and functions
to be those of the active slave when the enslaved device is not of
ARPHRD_ETHER type. Basically it overrides the settings done by ether_setup(),
which are netdevice **type** dependent and hence might not be appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
of dissimilar ether types, as was concluded in the v1 discussion.

IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3-byte
IB QP (Queue Pair) number and a 16-byte IB port GID (Global ID) of the port this
IPoIB device is bound to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (I have omitted here some details which
are not important for the bonding RFC).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c |   39 +++
 1 files changed, 39 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c    2007-08-15 10:08:59.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 +0300
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct 
return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+   struct net_device *slave_dev)
+{
+   bond_dev->hard_header   = slave_dev->hard_header;
+   bond_dev->rebuild_header= slave_dev->rebuild_header;
+   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
+   bond_dev->header_cache_update   = slave_dev->header_cache_update;
+   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
+
+   bond_dev->neigh_setup   = slave_dev->neigh_setup;
+
+   bond_dev->type  = slave_dev->type;
+   bond_dev->hard_header_len   = slave_dev->hard_header_len;
+   bond_dev->addr_len  = slave_dev->addr_len;
+
+   memcpy(bond_dev->broadcast, slave_dev->broadcast,
+   slave_dev->addr_len);
+}
+
 /* enslave device slave to bond device master */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond
goto err_undo_flags;
}
 
+   /* set bonding device ether type by slave - bonding netdevices are
+* created with ether_setup, so when the slave type is not ARPHRD_ETHER
+* there is a need to override some of the type dependent attribs/funcs.
+*
+* bond ether type mutual exclusion - don't allow slaves of dissimilar
+* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+*/
+   if (bond->slave_cnt == 0) {
+   if (slave_dev->type != ARPHRD_ETHER)
+   bond_setup_by_slave(bond_dev, slave_dev);
+   } else if (bond_dev->type != slave_dev->type) {
+   printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+   "from other slaves (%d), can not enslave it.\n",
+   slave_dev->name,
+   slave_dev->type, bond_dev->type);
+   res = -EINVAL;
+   goto err_undo_flags;
+   }
+
if (slave_dev->set_mac_address == NULL) {
printk(KERN_ERR DRV_NAME
": %s: Error: The slave device you specified does "
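
The type rule enforced in bond_enslave() by this patch (the first slave decides the bond's link-layer type, later slaves must match it) is sketched below as a user-space model. `fake_bond` and `fake_bond_enslave` are hypothetical names; the two ARPHRD values are the ones defined in <linux/if_arp.h>, and copying only the type is a simplification of what bond_setup_by_slave() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARPHRD_ETHER      1   /* values as in <linux/if_arp.h> */
#define ARPHRD_INFINIBAND 32

/* Simplified stand-in for the bonding master. */
struct fake_bond {
    int slave_cnt;        /* slaves currently attached */
    unsigned short type;  /* link-layer type presented by the bond */
};

/* Returns 0 and accepts the slave, or -EINVAL on a type mismatch. */
int fake_bond_enslave(struct fake_bond *bond, unsigned short slave_type)
{
    assert(bond != NULL);
    if (bond->slave_cnt == 0) {
        /* The first slave decides what the bond looks like. */
        if (slave_type != ARPHRD_ETHER)
            bond->type = slave_type;
    } else if (bond->type != slave_type) {
        return -EINVAL;  /* mixed link-layer types are refused */
    }
    bond->slave_cnt++;
    return 0;
}
```

A bond that starts out as ARPHRD_ETHER (the ether_setup() default) therefore becomes an InfiniBand bond once its first slave is an IPoIB device, and from then on rejects Ethernet slaves.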

[PATCH V5 6/11] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()

2007-09-20 Thread Moni Shoua

This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, and remote peers are notified of the mac address
(neighbour) change by a Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c |   87 +++-
 drivers/net/bonding/bonding.h   |1 
 2 files changed, 60 insertions(+), 28 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c    2007-08-15 10:54:13.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:41.971632881 +0300
@@ -1095,6 +1095,14 @@ void bond_change_active_slave(struct bon
if (new_active) {
bond_set_slave_active_flags(new_active);
}
+
+   /* when bonding does not set the slave MAC address, the bond MAC
+* address is the one of the active slave.
+*/
+   if (new_active && !bond->do_set_mac_addr)
+   memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
+   new_active->dev->addr_len);
+
bond_send_gratuitous_arp(bond);
}
 }
@@ -1351,13 +1359,22 @@ int bond_enslave(struct net_device *bond
}
 
if (slave_dev->set_mac_address == NULL) {
-   printk(KERN_ERR DRV_NAME
-   ": %s: Error: The slave device you specified does "
-   "not support setting the MAC address. "
-   "Your kernel likely does not support slave "
-   "devices.\n", bond_dev->name);
-   res = -EOPNOTSUPP;
-   goto err_undo_flags;
+   if (bond->slave_cnt == 0) {
+   printk(KERN_WARNING DRV_NAME
+   ": %s: Warning: The first slave device you "
+   "specified does not support setting the MAC "
+   "address. This bond MAC address would be that "
+   "of the active slave.\n", bond_dev->name);
+   bond->do_set_mac_addr = 0;
+   } else if (bond->do_set_mac_addr) {
+   printk(KERN_ERR DRV_NAME
+   ": %s: Error: The slave device you specified "
+   "does not support setting the MAC addres,."
+   "but this bond uses this practice. \n"
+   , bond_dev->name);
+   res = -EOPNOTSUPP;
+   goto err_undo_flags;
+   }
}
 
new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
@@ -1378,16 +1395,18 @@ int bond_enslave(struct net_device *bond
 */
memcpy(new_slave->perm_hwaddr, slave_dev->dev_addr, ETH_ALEN);
 
-   /*
-* Set slave to master's mac address.  The application already
-* set the master's mac address to that of the first slave
-*/
-   memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-   addr.sa_family = slave_dev->type;
-   res = dev_set_mac_address(slave_dev, &addr);
-   if (res) {
-   dprintk("Error %d calling set_mac_address\n", res);
-   goto err_free;
+   if (bond->do_set_mac_addr) {
+   /*
+* Set slave to master's mac address.  The application already
+* set the master's mac address to that of the first slave
+*/
+   memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
+   addr.sa_family = slave_dev->type;
+   res = dev_set_mac_address(slave_dev, &addr);
+   if (res) {
+   dprintk("Error %d calling set_mac_address\n", res);
+   goto err_free;
+   }
}
 
res = netdev_set_master(slave_dev, bond_dev);
@@ -1612,9 +1631,11 @@ err_close:
dev_close(slave_dev);
 
 err_restore_mac:
-   memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
-   addr.sa_family = slave_dev->type;
-   dev_set_mac_address(slave_dev, &addr);
+   if (bond->do_set_mac_addr) {
+   memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN);
+   addr.sa_family = slave_dev->type;
+   dev_set_mac_address(slave_dev, &addr);
+   }
 
 err_free:
kfree(new_slave);
@@ -1792,10 +1813,12 @@ int bond_release(struct net_device *bond
/* close slave before restoring its mac address */
dev_close(slave_dev);
 
-   /* restore original (permanent) mac address */
-   memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN);
-
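
The decision this patch adds for slaves without set_mac_address() can be modelled in user space as follows. `mac_policy_bond` and `fake_check_mac_policy` are hypothetical names, and the sketch assumes that do_set_mac_addr starts out as 1 for a fresh bond, which is not visible in the part of the diff shown above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the bonding master state used by the check. */
struct mac_policy_bond {
    int slave_cnt;        /* slaves currently attached */
    int do_set_mac_addr;  /* 1: bond pushes its MAC to slaves (assumed default) */
};

/*
 * Decide whether a slave may join, given whether it can change its MAC.
 * Returns 0 on success or -EOPNOTSUPP when the slave cannot follow the
 * shared-MAC scheme an existing bond already relies on.
 */
int fake_check_mac_policy(struct mac_policy_bond *bond, int slave_can_set_mac)
{
    assert(bond != NULL);
    if (!slave_can_set_mac) {
        if (bond->slave_cnt == 0)
            bond->do_set_mac_addr = 0;  /* bond follows the active slave's MAC */
        else if (bond->do_set_mac_addr)
            return -EOPNOTSUPP;
    }
    bond->slave_cnt++;
    return 0;
}
```

The first slave fixes the policy for the lifetime of the bond: an IPoIB-style first slave switches the bond to "follow the active slave", while a bond that already shares one MAC across its slaves refuses a later slave that cannot adopt it.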

[PATCH V5 7/11] net/bonding: Enable IP multicast for bonding IPoIB devices

2007-09-20 Thread Moni Shoua

Allow enslaving devices when the bonding device is not up. In the discussion
of the previous post this seemed to be the cleanest way to go, and it is not
expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast
groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() has set the
bonding device type to be ARPHRD_ETHER and address len to be ETH_ALEN, the net
core code computes a wrong multicast link address. This is because
ip_eth_mc_map() is called, whereas for multicast joins taking place after the
enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond
type is ARPHRD_INFINIBAND).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c  |5 +++--
 drivers/net/bonding/bond_sysfs.c |6 ++
 2 files changed, 5 insertions(+), 6 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c    2007-08-15 10:54:41.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:55:48.431862446 +0300
@@ -1285,8 +1285,9 @@ int bond_enslave(struct net_device *bond
 
/* bond must be initialized by bond_open() before enslaving */
if (!(bond_dev->flags & IFF_UP)) {
-   dprintk("Error, master_dev is not up\n");
-   return -EPERM;
+   printk(KERN_WARNING DRV_NAME
+"%s: master_dev is not up in bond_enslave\n",
+   bond_dev->name);
}
 
/* already enslaved */
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-08-15 10:08:58.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c    2007-08-15 10:55:48.432862269 +0300
@@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
 
/* Quick sanity check -- is the bond interface up? */
if (!(bond->dev->flags & IFF_UP)) {
-   printk(KERN_ERR DRV_NAME
-  ": %s: Unable to update slaves because interface is down.\n",
+   printk(KERN_WARNING DRV_NAME
+  ": %s: doing slave updates when interface is down.\n",
   bond->dev->name);
-   ret = -EPERM;
-   goto out;
}
 
/* Note:  We can't hold bond->lock here, as bond_create grabs it. */

[PATCH V5 0/11] net/bonding: ADD IPoIB support for the bonding driver

2007-09-20 Thread Moni Shoua

This patch series is the fifth version (see below link to V4) of the 
suggested changes to the bonding driver so it would be able to support 
non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. 

Patches 1-10 were originally submitted in V4 and patch 11 is an addition by Jay.

Jay,
The bonding patches you acked remain unchanged, while I guess I still need
to get an official ack from Roland for the IPoIB patches.
Is it OK with you to push the entire series to the networking tree?
Roland has already agreed to do so.


Major changes from the previous version:

1. Style changes
2. IPoIB - notify slave detach on vlan delete
3. Add function to net/core for slave detach instead of having it only in
   ib/ipoib
4. IPoIB - handle ib device and bonding device the same way in neigh_cleanup
   function

Links to earlier discussion:


1. A discussion in netdev about bonding support for IPoIB.
http://lists.openwall.net/netdev/2006/11/30/46

2. V4 series
http://lists.openfabrics.org/pipermail/general/2007-August/039825.html

[PATCH V5 8/11] net/bonding: Handle wrong assumptions that slave is always an Ethernet device

2007-09-20 Thread Moni Shoua

Bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not suitable when it enslaves non-Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c  |3 ++-
 drivers/net/bonding/bond_sysfs.c |   19 +--
 drivers/net/bonding/bonding.h|1 +
 3 files changed, 16 insertions(+), 7 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c    2007-08-15 10:55:48.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:29:11.911298577 +0300
@@ -1224,7 +1224,8 @@ static int bond_compute_features(struct 
struct slave *slave;
struct net_device *bond_dev = bond->dev;
unsigned long features = bond_dev->features;
-   unsigned short max_hard_header_len = ETH_HLEN;
+   unsigned short max_hard_header_len = max((u16)ETH_HLEN,
+   bond_dev->hard_header_len);
int i;
 
features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-08-15 10:55:48.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c    2007-08-15 12:14:41.152469089 +0300
@@ -164,9 +164,7 @@ static ssize_t bonding_store_bonds(struc
printk(KERN_INFO DRV_NAME
": %s is being deleted...\n",
bond->dev->name);
-   bond_deinit(bond->dev);
-   bond_destroy_sysfs_entry(bond);
-   unregister_netdevice(bond->dev);
+   bond_destroy(bond);
rtnl_unlock();
goto out;
}
@@ -260,6 +258,7 @@ static ssize_t bonding_store_slaves(stru
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
int i, res, found, ret = count;
+   u32 original_mtu;
struct slave *slave;
struct net_device *dev = NULL;
struct bonding *bond = to_bond(d);
@@ -325,6 +324,7 @@ static ssize_t bonding_store_slaves(stru
}
 
/* Set the slave's MTU to match the bond */
+   original_mtu = dev->mtu;
if (dev->mtu != bond->dev->mtu) {
if (dev->change_mtu) {
res = dev->change_mtu(dev,
@@ -339,6 +339,9 @@ static ssize_t bonding_store_slaves(stru
}
rtnl_lock();
res = bond_enslave(bond->dev, dev);
+   bond_for_each_slave(bond, slave, i)
+   if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0)
+   slave->original_mtu = original_mtu;
rtnl_unlock();
if (res) {
ret = res;
@@ -351,13 +354,17 @@ static ssize_t bonding_store_slaves(stru
bond_for_each_slave(bond, slave, i)
if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
dev = slave->dev;
+   original_mtu = slave->original_mtu;
break;
}
if (dev) {
printk(KERN_INFO DRV_NAME ": %s: Removing slave %s\n",
bond->dev->name, dev->name);
rtnl_lock();
-   res = bond_release(bond->dev, dev);
+   if (bond->setup_by_slave)
+   res = bond_release_and_destroy(bond->dev, dev);
+   else
+   res = bond_release(bond->dev, dev);
rtnl_unlock();
if (res) {
ret = res;
@@ -365,9 +372,9 @@ static ssize_t bonding_store_slaves(stru
}
/* set the slave MTU to the default */
if (dev->change_mtu) {
-   dev->change_mtu(dev, 1500);
+   dev->change_mtu(dev, original_mtu);
} else {
-   dev->mtu = 1500;
+   dev->mtu = original_mtu;
}
}
else {
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-08-15 10:55:34.0 +0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-08-20 14:29:11.912298402 +0300
@@ -156,6 +156,7 @@ struct slave {
s8 link;/* one of

[PATCH V5 9/11] net/bonding: Delay sending of gratuitous ARP to avoid failure

2007-09-20 Thread Moni Shoua

Delay sending a gratuitous ARP when the LINK_STATE_LINKWATCH_PENDING bit
in the dev->state field is on. This improves the chances for the ARP packet to
be transmitted.
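
The deferral described above amounts to a small state machine, sketched here in user space. All names (`fake_arp_bond`, `fake_change_active_slave`, `fake_monitor_tick`, `fake_send_gratuitous_arp`) are hypothetical; the `linkwatch_pending` field stands in for the __LINK_STATE_LINKWATCH_PENDING bit of the active slave, and `arps_sent` exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the bonding state involved in the deferral. */
struct fake_arp_bond {
    bool linkwatch_pending;  /* link event for the active slave not yet processed */
    bool send_grat_arp;      /* a gratuitous ARP is still owed */
    int arps_sent;           /* counter used only by this sketch */
};

static void fake_send_gratuitous_arp(struct fake_arp_bond *bond)
{
    bond->arps_sent++;
}

/* Called when a new active slave has been selected. */
void fake_change_active_slave(struct fake_arp_bond *bond)
{
    assert(bond != NULL);
    if (bond->linkwatch_pending)
        bond->send_grat_arp = true;  /* defer until the link event is handled */
    else
        fake_send_gratuitous_arp(bond);
}

/* Called periodically, in the role the MII monitor plays in the patch. */
void fake_monitor_tick(struct fake_arp_bond *bond)
{
    assert(bond != NULL);
    if (bond->send_grat_arp && !bond->linkwatch_pending) {
        fake_send_gratuitous_arp(bond);
        bond->send_grat_arp = false;
    }
}
```

The ARP is sent exactly once: either immediately at fail-over, or on the first monitor tick after the pending link event has cleared.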

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c |   24 +---
 drivers/net/bonding/bonding.h   |1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:56:33.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 11:04:37.221123652 
+0300
@@ -1102,8 +1102,14 @@ void bond_change_active_slave(struct bon
if (new_active && !bond->do_set_mac_addr)
memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
new_active->dev->addr_len);
-
-   bond_send_gratuitous_arp(bond);
+   if (bond->curr_active_slave &&
+   test_bit(__LINK_STATE_LINKWATCH_PENDING,
+   &bond->curr_active_slave->dev->state)) {
+   dprintk("delaying gratuitous arp on %s\n",
+   bond->curr_active_slave->dev->name);
+   bond->send_grat_arp = 1;
+   } else
+   bond_send_gratuitous_arp(bond);
}
 }
 
@@ -2083,6 +2089,17 @@ void bond_mii_monitor(struct net_device 
 * program could monitor the link itself if needed.
 */
 
+   if (bond->send_grat_arp) {
+   if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING,
+   &bond->curr_active_slave->dev->state))
+   dprintk("Needs to send gratuitous arp but not yet\n");
+   else {
+   dprintk("sending delayed gratuitous arp on %s\n",
+   bond->curr_active_slave->dev->name);
+   bond_send_gratuitous_arp(bond);
+   bond->send_grat_arp = 0;
+   }
+   }
read_lock(&bond->curr_slave_lock);
oldcurrent = bond->curr_active_slave;
read_unlock(&bond->curr_slave_lock);
@@ -2484,7 +2501,7 @@ static void bond_send_gratuitous_arp(str
 
if (bond->master_ip) {
bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip,
- bond->master_ip, 0);
+   bond->master_ip, 0);
}
 
list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
@@ -4293,6 +4310,7 @@ static int bond_init(struct net_device *
bond->current_arp_slave = NULL;
bond->primary_slave = NULL;
bond->dev = bond_dev;
+   bond->send_grat_arp = 0;
INIT_LIST_HEAD(&bond->vlan_list);
 
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-08-15 10:56:33.0 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-08-15 11:05:41.516451497 
+0300
@@ -187,6 +187,7 @@ struct bonding {
struct   timer_list arp_timer;
s8   kill_timers;
s8   do_set_mac_addr;
+   s8   send_grat_arp;
struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
struct   proc_dir_entry *proc_entry;


[PATCH V5 10/11] net/bonding: Destroy bonding master when last slave is gone

2007-09-20 Thread Moni Shoua

When bonding enslaves non-Ethernet devices, it takes pointers to functions 
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after the last slave is released, 
because we don't know whether the pointers are still valid. Destroying the bond 
when slave_cnt is zero
ensures that these functions cannot be used anymore.
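
As a rough userspace illustration of that rule (a sketch only: the rd_* names are hypothetical, and "registered" merely stands in for the bond netdevice being registered), the release path might be modelled like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bond; not the kernel's struct bonding. */
struct rd_bond {
    int slave_cnt;
    bool setup_by_slave;   /* bond borrowed function pointers from its slaves */
    bool registered;       /* stands in for the master netdevice being registered */
};

/* Plain release: drop one slave, fail if there is none. */
static int rd_release(struct rd_bond *bond)
{
    assert(bond != NULL);
    if (bond->slave_cnt == 0)
        return -EINVAL;
    bond->slave_cnt--;
    return 0;
}

/* Release, then tear the bond down if that was the last slave. */
static int rd_release_and_destroy(struct rd_bond *bond)
{
    int ret = rd_release(bond);

    if (ret == 0 && bond->slave_cnt == 0)
        bond->registered = false;
    return ret;
}

/* Callers choose the path based on how the bond was set up. */
static int rd_remove_slave(struct rd_bond *bond)
{
    if (bond->setup_by_slave)
        return rd_release_and_destroy(bond);
    return rd_release(bond);
}
```

An Ethernet-only bond keeps its own ether_setup() handlers, so in this model it survives with zero slaves; only a bond that copied handlers from its slaves is destroyed.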

Signed-off-by: Moni Shoua monis at voltaire.com
---
 drivers/net/bonding/bond_main.c |   45 +++-
 drivers/net/bonding/bonding.h   |3 ++
 2 files changed, 47 insertions(+), 1 deletion(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-20 
14:43:17.123702132 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-20 14:43:17.850571535 
+0300
@@ -1256,6 +1256,7 @@ static int bond_compute_features(struct 
 static void bond_setup_by_slave(struct net_device *bond_dev,
struct net_device *slave_dev)
 {
+   struct bonding *bond = bond_dev->priv;
bond_dev->hard_header   = slave_dev->hard_header;
bond_dev->rebuild_header= slave_dev->rebuild_header;
bond_dev->hard_header_cache = slave_dev->hard_header_cache;
@@ -1270,6 +1271,7 @@ static void bond_setup_by_slave(struct n
 
memcpy(bond_dev->broadcast, slave_dev->broadcast,
slave_dev->addr_len);
+   bond->setup_by_slave = 1;
 }
 
 /* enslave device slave to bond device master */
@@ -1838,6 +1840,35 @@ int bond_release(struct net_device *bond
 }
 
 /*
+* Destroy a bonding device.
+* Must be under rtnl_lock when this function is called.
+*/
+void bond_destroy(struct bonding *bond)
+{
+   bond_deinit(bond->dev);
+   bond_destroy_sysfs_entry(bond);
+   unregister_netdevice(bond->dev);
+}
+
+/*
+* First release a slave and then destroy the bond if no more slaves are left.
+* Must be under rtnl_lock when this function is called.
+*/
+int  bond_release_and_destroy(struct net_device *bond_dev, struct net_device *slave_dev)
+{
+   struct bonding *bond = bond_dev->priv;
+   int ret;
+
+   ret = bond_release(bond_dev, slave_dev);
+   if ((ret == 0) && (bond->slave_cnt == 0)) {
+   printk(KERN_INFO DRV_NAME " %s: destroying bond for.\n",
+   bond_dev->name);
+   bond_destroy(bond);
+   }
+   return ret;
+}
+
+/*
  * This function releases all slaves.
  */
 static int bond_release_all(struct net_device *bond_dev)
@@ -3322,7 +3353,11 @@ static int bond_slave_netdev_event(unsig
switch (event) {
case NETDEV_UNREGISTER:
if (bond_dev) {
-   bond_release(bond_dev, slave_dev);
+   dprintk("slave %s unregisters\n", slave_dev->name);
+   if (bond->setup_by_slave)
+   bond_release_and_destroy(bond_dev, slave_dev);
+   else
+   bond_release(bond_dev, slave_dev);
}
break;
case NETDEV_CHANGE:
@@ -3331,6 +3366,13 @@ static int bond_slave_netdev_event(unsig
 * sets up a hierarchical bond, then rmmod's
 * one of the slave bonding devices?
 */
+   if (slave_dev->priv_flags & IFF_SLAVE_DETACH) {
+   dprintk("slave %s detaching\n", slave_dev->name);
+   if (bond->setup_by_slave)
+   bond_release_and_destroy(bond_dev, slave_dev);
+   else
+   bond_release(bond_dev, slave_dev);
+   }
break;
case NETDEV_DOWN:
/*
@@ -4311,6 +4353,7 @@ static int bond_init(struct net_device *
bond->primary_slave = NULL;
bond->dev = bond_dev;
bond->send_grat_arp = 0;
+   bond->setup_by_slave = 0;
INIT_LIST_HEAD(&bond->vlan_list);
 
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-08-20 14:43:17.123702132 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-08-20 14:47:52.845180870 
+0300
@@ -188,6 +188,7 @@ struct bonding {
s8   kill_timers;
s8   do_set_mac_addr;
s8   send_grat_arp;
+   s8   setup_by_slave;
struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
struct   proc_dir_entry *proc_entry;
@@ -295,6 +296,8 @@ static inline void bond_unset_master_alb
 struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry 
*curr);
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb, struct 
net_device *slave_dev);
 int bond_create(char *name, struct bond_params *params, struct bonding 
**newbond);
+void

[PATCH V5 5/11] net/bonding: Enable bonding to enslave non ARPHRD_ETHER

2007-09-20 Thread Moni Shoua

This patch changes some of the bond netdevice attributes and functions
to be those of the active slave when the enslaved device is not of
ARPHRD_ETHER type. Basically it overrides the settings done by ether_setup(),
which are netdevice **type** dependent and hence might not be appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
of dissimilar ether types, as was concluded in the v1 discussion.
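
The type check can be sketched in userspace C as follows. This is an illustration under assumptions: the te_* names are hypothetical, the two constants mirror the values of ARPHRD_ETHER and ARPHRD_INFINIBAND, and copying the type stands in for everything bond_setup_by_slave() overrides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local copies of the hardware type values (ARPHRD_ETHER, ARPHRD_INFINIBAND). */
enum { TE_ARPHRD_ETHER = 1, TE_ARPHRD_INFINIBAND = 32 };

/* Hypothetical, much reduced models of a netdevice and a bond. */
struct te_dev {
    int type;
};

struct te_bond {
    struct te_dev dev;     /* the bond master, created with an Ethernet type */
    int slave_cnt;
};

/* The first slave decides the bond's type; later slaves must match it. */
static int te_enslave(struct te_bond *bond, const struct te_dev *slave)
{
    assert(bond != NULL && slave != NULL);
    if (bond->slave_cnt == 0) {
        if (slave->type != TE_ARPHRD_ETHER)
            bond->dev.type = slave->type;   /* stands in for bond_setup_by_slave() */
    } else if (bond->dev.type != slave->type) {
        return -EINVAL;                     /* dissimilar types may not share a bond */
    }
    bond->slave_cnt++;
    return 0;
}
```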

An IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3-byte
IB QP (Queue Pair) number and the 16-byte IB port GID (Global ID) of the port this
IPoIB device is bound to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (I have omitted here some details which
are not important for the bonding RFC).

Signed-off-by: Moni Shoua monis at voltaire.com
Signed-off-by: Or Gerlitz ogerlitz at voltaire.com
---
 drivers/net/bonding/bond_main.c |   39 +++
 1 files changed, 39 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-08-15 
10:08:59.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-08-15 10:54:13.424688411 
+0300
@@ -1237,6 +1237,26 @@ static int bond_compute_features(struct 
return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+   struct net_device *slave_dev)
+{
+   bond_dev->hard_header   = slave_dev->hard_header;
+   bond_dev->rebuild_header= slave_dev->rebuild_header;
+   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
+   bond_dev->header_cache_update   = slave_dev->header_cache_update;
+   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
+
+   bond_dev->neigh_setup   = slave_dev->neigh_setup;
+
+   bond_dev->type  = slave_dev->type;
+   bond_dev->hard_header_len   = slave_dev->hard_header_len;
+   bond_dev->addr_len  = slave_dev->addr_len;
+
+   memcpy(bond_dev->broadcast, slave_dev->broadcast,
+   slave_dev->addr_len);
+}
+
 /* enslave device slave to bond device master */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1311,6 +1331,25 @@ int bond_enslave(struct net_device *bond
goto err_undo_flags;
}
 
+   /* set bonding device ether type by slave - bonding netdevices are
+* created with ether_setup, so when the slave type is not ARPHRD_ETHER
+* there is a need to override some of the type dependent attribs/funcs.
+*
+* bond ether type mutual exclusion - don't allow slaves of dissimilar
+* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond
+*/
+   if (bond->slave_cnt == 0) {
+   if (slave_dev->type != ARPHRD_ETHER)
+   bond_setup_by_slave(bond_dev, slave_dev);
+   } else if (bond_dev->type != slave_dev->type) {
+   printk(KERN_ERR DRV_NAME ": %s ether type (%d) is different "
+   "from other slaves (%d), can not enslave it.\n",
+   slave_dev->name,
+   slave_dev->type, bond_dev->type);
+   res = -EINVAL;
+   goto err_undo_flags;
+   }
+
if (slave_dev->set_mac_address == NULL) {
printk(KERN_ERR DRV_NAME
: %s: Error: The slave device you specified does 


[PATCH 11/11] bonding: Optionally allow ethernet slaves to keep own MAC

2007-09-20 Thread Moni Shoua

Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to IB
devices, as it's occasionally useful for regular ethernet devices as well.

Adds fail_over_mac option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.
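
A small userspace sketch of what the option means at failover time. It rests on assumptions: the fom_* names are hypothetical, and the model covers only the "bond takes over the new active slave's MAC" half of the behaviour, not the traditional path that programs one MAC into every slave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FOM_ADDR_LEN 6   /* Ethernet-sized address, chosen for the example */

/* Hypothetical, reduced models of a netdevice and a bond. */
struct fom_dev {
    unsigned char dev_addr[FOM_ADDR_LEN];
};

struct fom_bond {
    struct fom_dev dev;   /* the bond master */
    int fail_over_mac;    /* 0 = traditional behaviour, 1 = MAC follows active slave */
};

/* On failover, the bond adopts the new active slave's MAC only when the option is on. */
static void fom_change_active(struct fom_bond *bond, const struct fom_dev *new_active)
{
    assert(bond != NULL);
    if (new_active && bond->fail_over_mac)
        memcpy(bond->dev.dev_addr, new_active->dev_addr, FOM_ADDR_LEN);
}
```

With the option off, the bond address is untouched here, which corresponds to the traditional mode where the slaves are set to the bond's MAC instead.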

Includes documentation update.

Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
---
 Documentation/networking/bonding.txt |   33 +++
 drivers/net/bonding/bond_main.c  |   57 +
 drivers/net/bonding/bond_sysfs.c |   49 +
 drivers/net/bonding/bonding.h|6 ++--
 4 files changed, 121 insertions(+), 24 deletions(-)

diff --git a/Documentation/networking/bonding.txt 
b/Documentation/networking/bonding.txt
index 1da5666..1134062 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -281,6 +281,39 @@ downdelay
will be rounded down to the nearest multiple.  The default
value is 0.
 
+fail_over_mac
+
+   Specifies whether active-backup mode should set all slaves to
+   the same MAC address (the traditional behavior), or, when
+   enabled, change the bond's MAC address when changing the
+   active interface (i.e., fail over the MAC address itself).
+
+   Fail over MAC is useful for devices that cannot ever alter
+   their MAC address, or for devices that refuse incoming
+   broadcasts with their own source MAC (which interferes with
+   the ARP monitor).
+
+   The down side of fail over MAC is that every device on the
+   network must be updated via gratuitous ARP, vs. just updating
+   a switch or set of switches (which often takes place for any
+   traffic, not just ARP traffic, if the switch snoops incoming
+   traffic to update its tables) for the traditional method.  If
+   the gratuitous ARP is lost, communication may be disrupted.
+
+   When fail over MAC is used in conjunction with the mii monitor,
+   devices which assert link up prior to being able to actually
+   transmit and receive are particularly susceptible to loss of
+   the gratuitous ARP, and an appropriate updelay setting may be
+   required.
+
+   A value of 0 disables fail over MAC, and is the default.  A
+   value of 1 enables fail over MAC.  This option is enabled
+   automatically if the first slave added cannot change its MAC
+   address.  This option may be modified via sysfs only when no
+   slaves are present in the bond.
+
+   This option was added in bonding version 3.2.0.
+
 lacp_rate
 
Option specifying the rate in which we'll ask our link partner
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77caca3..c01ff9d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -97,6 +97,7 @@ static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
 static char *arp_validate = NULL;
+static int fail_over_mac = 0;
 struct bond_params bonding_defaults;
 
 module_param(max_bonds, int, 0);
@@ -130,6 +131,8 @@ module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, arp targets in n.n.n.n form);
 module_param(arp_validate, charp, 0);
 MODULE_PARM_DESC(arp_validate, validate src/dst of ARP probes: none 
(default), active, backup or all);
+module_param(fail_over_mac, int, 0);
+MODULE_PARM_DESC(fail_over_mac, For active-backup, do not set all slaves to 
the same MAC.  0 of off (default), 1 for on.);
 
 /*- Global variables */
 
@@ -1099,7 +1102,7 @@ void bond_change_active_slave(struct bonding *bond, 
struct slave *new_active)
/* when bonding does not set the slave MAC address, the bond MAC
 * address is the one of the active slave.
 */
-   if (new_active && !bond->do_set_mac_addr)
+   if (new_active && bond->params.fail_over_mac)
memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
new_active->dev->addr_len);
if (bond->curr_active_slave &&
@@ -1371,16 +1374,16 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
if (slave_dev-set_mac_address == NULL) {
if (bond-slave_cnt == 0) {
printk(KERN_WARNING DRV_NAME
-   : %s: Warning: The first slave device you 
-   specified does not support setting the MAC 
-   address. This bond MAC address would be that 
-   of the active slave.\n, bond_dev-name);
-   bond-do_set_mac_addr = 0;
-   } else

Re: Please pull 'z1211' branch of wireless-2.6

2007-09-20 Thread John W. Linville

On Wed, Sep 19, 2007 at 11:08:16PM +0100, Daniel Drake wrote:

> I would like to delay this until 2.6.25 until I have had time to clear up some 
> final issues and do more testing myself of zd1211rw-mac80211. I also 
> think we need to discuss the rename...

Renames being what they are, I was hoping to avoid a bikeshed
discussion about the choice of names.  My main point was to get it
into the tree with a unique and manageable name.  I'm sure we could
still rename it again before 2.6.24 ships or even later.

I know that you will argue that a rename is unnecessary if we
simply port the existing driver to mac80211, which is certainly true.
I just wonder if that is the least bumpy solution for users.  At least
with a new driver, if something doesn't work then the old driver is
still there as a fallback.  Plus you can avoid some confusion with
old howtos and such on the web referring to an old driver instead
of the new one, etc.  Maybe that isn't a huge issue in this case,
but I wouldn't underestimate the possible confusion.

 (just to clarify to others: this is the first I heard of this merge 
 before John posted it).

Yes, sorry...permission, forgiveness...forgive? :-)

 John, thanks a lot for your efforts, I hope you don't mind waiting one 
 extra release cycle for me to sort a few things out.

Well, obviously I would like to get it out now.  The longer we are
without a mac80211-based driver for zd1211 hardware then the longer
we must maintain the softmac component (or at least take bug reports
for it).

If you are determined not to have it in 2.6.24 then I will relent.
I will also suggest that Larry start sending any softmac bugs to
you... :-)

If we will be having a port rather than a new driver, how soon after
2.6.24-rc1 closes can we queue the port for 2.6.25?  I think it
should be almost immediately, to ensure maximum test exposure and to
seal the deal.  What do you think?

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [PATCH - net-2.6.24 1/2] Introduce and use print_ip

2007-09-20 Thread Thomas Graf

* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53
 This removes the uses of NIPQUAD and HIPQUAD in
 drivers/net and net
 
 IPV4 Use:
 
   DECLARE_IP_BUF(ipbuf);
   __be32 addr;
   print_ip(ipbuf, addr)
 
 Signed-off-by:  Joe Perches [EMAIL PROTECTED]
 
 please pull from:
 git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4

Including a patch for review would be helpful.

Re: Please pull 'z1211' branch of wireless-2.6

2007-09-20 Thread Daniel Drake


John W. Linville wrote:

I know that you will argue that a rename is unnecessary if we
simply port the existing driver to mac80211, which is certainly true.
I just wonder if that is the least bumpy solution for users.  At least
with a new driver, if something doesn't work then the old driver is
still there as a fallback.  Plus you can avoid some confusion with
old howtos and such on the web referring to an old driver instead
of the new one, etc.  Maybe that isn't a huge issue in this case,
but I wouldn't underestimate the possible confusion.


Maybe I'll provide a one-off externally building driver for 2.6.25 or 
something like that, just as a basis for comparison. I think biting the 
bullet and simply attacking the issues that come up is the best way.


Old documentation will still be relevant for the mac80211 driver, 
especially if we don't change the driver/config names -- offhand I can't 
think of any obvious differences between the user interface to the 2 
drivers.


(just to clarify to others: this is the first I heard of this merge 
before John posted it).


Yes, sorry...permission, forgiveness...forgive? :-)


Of course :)


If you are determined not to have it in 2.6.24 then I will relent.
I will also suggest that Larry start sending any softmac bugs to
you... :-)


That's fine.


If we will be having a port rather than a new driver, how soon after
2.6.24-rc1 closes can we queue the port for 2.6.25?  I think it
should be almost immediately, to ensure maximum test exposure and to
seal the deal.  What do you think?


I think that's realistic, I'll do what I can.

Thanks,
Daniel


Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-20 Thread Maciej W. Rozycki

On Thu, 20 Sep 2007, Jeff Garzik wrote:

> > You may be pleased (or less so) to hear that the version of sb1250-mac.c in
> > your tree does not even build (because of
> > 42d53d6be113f974d8152979c88e1061b953bd12) and the patch below does not
> > address it.  I ran out of time in the evening, but I will send you a fix
> > shortly.  To be honest I think even with bulk changes it may be worth
> > checking whether they do not break stuff. ;-)
>
> hrm.  I cannot get this to apply on top of linux-2.6.git,
> netdev-2.6.git#upstream (prior to net-2.6.24 rebase) or
> netdev-2.6.git#upstream (after net-2.6.24 rebase)

 It applies on top of current -mm.  It seems to apply to a copy of 
netdev-2.6.git#upstream that I have got, but I am probably missing 
something...  If I try to clone your repository again I get:

$ git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git linux
Initialized empty Git repository in 
/home/macro/GIT-other/linux-netdev/linux/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from 
'git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git' 
failed.
$

For linux-2.6.git the patch-mips-2.6.23-rc5-20070904-sb1250-mac-typedef-7 
version applies as submitted originally; I can resubmit this one if you 
like.

 I am slowly getting lost and I have another big chunk for sb1250-mac.c 
waiting to be put on top of these...

  Maciej

Re: [LARTC] ifb and ppp

2007-09-20 Thread Frithjof Hammer

> Sorry, I didn't follow the thread - what is the goal to be achieved with
> the setup?

Simple ingress shaping on ppp0 (a PPPoE DSL line). I want to replace my old 
imq ingress shaper with ifb. My former script used iptables marks to 
classify the packets. The iptables marks are still being set, just as before 
with imq, but tc does not seem to recognize them: it only uses the default class.

So I ran tcpdump -i ifb0 and discovered that the packets still seem to be 
encapsulated on ifb0. I suppose this is why my iptables stuff is not working.

I've attached the ingress part of my shaping script. 

Thanks for your help
Frithjof

 
tc qdisc del dev ppp0 root 2> /dev/null > /dev/null
tc qdisc del dev ifb0 root 2> /dev/null > /dev/null
tc qdisc del dev ppp0 ingress

 modprobe ifb
 ifconfig ifb0 up

 tc qdisc add dev ppp0 ingress
 tc filter add dev ppp0 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0

 tc qdisc add dev ifb0 handle 1: root hfsc default 32
 tc class add dev ifb0 parent 1: classid 1:1 hfsc sc rate 6000kbit ul rate 
6000kbit

 tc class add dev ifb0 parent 1:1 classid 1:30 hfsc rt umax 208b dmax 20ms rate 
83kbit ls rate 120kbit
 tc class add dev ifb0 parent 1:1 classid 1:31 hfsc sc rate $[(6000-120)/3]kbit 
ul rate 6000kbit
 tc class add dev ifb0 parent 1:1 classid 1:32 hfsc sc rate 
$[(6000-120)/3*2]kbit ul rate  6000kbit

 tc qdisc add dev ifb0 parent 1:30 handle 30: sfq perturb 10
 tc qdisc add dev ifb0 parent 1:31 handle 31: sfq perturb 10
 tc qdisc add dev ifb0 parent 1:32 handle 32: red limit 100 min 5000 max 
10 avpkt 1000 burst 50

 tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 30 fw flowid 1:30
 tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 31 fw flowid 1:31
 tc filter add dev ifb0 parent 1:0 prio 0 protocol ip handle 32 fw flowid 1:32


 iptables -t mangle -N MYSHAPER-IN
 iptables -t mangle -I PREROUTING -i ppp0 -j MYSHAPER-IN

 iptables -t mangle -A MYSHAPER-IN -p tcp -m length --length :64 -j MARK 
--set-mark 31 # short TCP packets are probably ACKs
 iptables -t mangle -A MYSHAPER-IN -p tcp --dport 22 -m length --length :500 -j 
MARK --set-mark 3# secure shell
 iptables -t mangle -A MYSHAPER-IN -p tcp --sport 22 -m length --length :500 -j 
MARK --set-mark 31# secure shell
 iptables -t mangle -A MYSHAPER-IN -p ! tcp -j MARK --set-mark 31  
# Set non-tcp packets to high priority
 iptables -t mangle -A MYSHAPER-IN -m mark --mark 0 -j MARK --set-mark 32   
   # redundant- mark any unmarked packets as 26 (low prio)

[...]

Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6

2007-09-20 Thread Randy Dunlap

On Wed, 19 Sep 2007 23:53:31 -0700 Joe Perches wrote:

 In the same vein as print_mac, the implementations
 introduce declaration macros:
   DECLARE_IP_BUF(var)
   DECLARE_IPV6_BUF(var)
 and functions:
   print_ip
   print_ipv6
   print_ipv6_nofmt
 
 IPV4 Use:
 
   DECLARE_IP_BUF(ipbuf);
   __be32 addr;
   print_ip(ipbuf, addr);
 
 IPV6 use:
 
   DECLARE_IPV6_BUF(ipv6buf);
   const struct in6_addr *addr;
   print_ipv6(ipv6buf, addr);
 and
   print_ipv6_nofmt(ipv6buf, addr);
 
 compiled x86, defconfig and allyesconfig


How large are the patches if you posted them for review instead
of just referencing gits for them?  (which cuts down on review
possibilities)
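
For readers without the git tree at hand, the quoted usage pattern could look roughly like the userspace sketch below. Everything in it is an assumption: the sketch_* names, the buffer size and the returned pointer are made up for illustration, and the actual print_ip in Joe's tree may well differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Room for "255.255.255.255" plus the terminating NUL. */
#define SKETCH_IP_BUF_LEN 16
#define SKETCH_DECLARE_IP_BUF(var) char var[SKETCH_IP_BUF_LEN]

/*
 * Format an IPv4 address held in network byte order into buf and return buf,
 * so the call can be used directly as a printf-style argument.
 */
static char *sketch_print_ip(char *buf, uint32_t be_addr)
{
    unsigned char b[4];

    assert(buf != NULL);
    memcpy(b, &be_addr, sizeof(b));   /* bytes in memory are already in wire order */
    snprintf(buf, SKETCH_IP_BUF_LEN, "%u.%u.%u.%u",
             (unsigned)b[0], (unsigned)b[1], (unsigned)b[2], (unsigned)b[3]);
    return buf;
}
```

Reading the address byte by byte keeps the sketch independent of host endianness, which is the job the NIPQUAD/HIPQUAD pair handled with two separate macros.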

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: net-2.6.24 plans

2007-09-20 Thread Johannes Berg

On Thu, 2007-09-20 at 10:17 -0400, John W. Linville wrote:

  2) ATMEL USB driver
 
 These are both really new.  I think I'll transfer them to my
 wireless-2.6 tree, but still hold them back at least until 2.6.25.

Also, atmel isn't even ported to mac80211 yet, is it?

  3) NL80211
 
 I need to check w/ Johannes to see if the user-facing portions of
 this have stabilized.

I have a patch to basically remove everything from nl80211 that we're
not using today, and make the interface well-defined so each type of
setting has methods to new, del, get, set, for example create, remove,
get info or change a virtual interface. If you wish, I can post this
patch for inclusion into wireless-dev and then copy the resulting
nl80211 to net-2.6.24, including the mac80211 hooks to make use of it.
Shouldn't take more than a few hours.

johannes



Re: net-2.6.24 plans

2007-09-20 Thread John W. Linville

On Wed, Sep 19, 2007 at 03:19:28PM -0700, David Miller wrote:

 So it looks like what's left is:
 
 1) ATH5K driver
 2) ATMEL USB driver

These are both really new.  I think I'll transfer them to my
wireless-2.6 tree, but still hold them back at least until 2.6.25.

 3) NL80211

I need to check w/ Johannes to see if the user-facing portions of
this have stabilized.

 4) misc bits sprinkled around mac80211

These bits are mostly pieces with unsettled user interface issues
or unsettled features that still need some development.  I'll be
holding on to these a while longer.

Thanks,

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: bnx2 dirver's firmware images

2007-09-20 Thread Denys Vlasenko

On Wednesday 19 September 2007 22:43, Michael Chan wrote:
 On Wed, 2007-09-19 at 21:29 +0100, Denys Vlasenko wrote:
 
  Are you saying that you successfully run-tested it?
 
 I've only reviewed the code.  Let's resolve these issues first before
 testing the code.

Please test these two patches.
I updated them according to your comments.
--
vda
diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c
--- linux-2.6.23-rc6/drivers/net/bnx2.c	2007-09-14 00:08:11.0 +0100
+++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c	2007-09-20 15:47:06.0 +0100
@@ -52,6 +52,8 @@
 #include bnx2_fw.h
 #include bnx2_fw2.h
 
+#define FW_BUF_SIZE		0x8000
+
 #define DRV_MODULE_NAME		bnx2
 #define PFX DRV_MODULE_NAME	: 
 #define DRV_MODULE_VERSION	1.6.4
@@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(bp-phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf)
 {
-	if ((bp-gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	int rc;
 
-	if ((bp-strm = kmalloc(sizeof(*bp-strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped */
 
-	bp-strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp-strm-workspace == NULL)
+	rc = -ENOMEM;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm-workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm-next_in = zbuf;
+	strm-avail_in = len;
+	strm-next_out = gunzip_buf;
+	strm-avail_out = sz;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		if (rc == Z_OK)
+			rc = sz - strm-avail_out;
+		else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm-workspace);
 gunzip_nomem3:
-	kfree(bp-strm);
-	bp-strm = NULL;
-
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp-gunzip_buf);
-	bp-gunzip_buf = NULL;
-
-gunzip_nomem1:
-	printk(KERN_ERR PFX %s: Cannot allocate firmware buffer for 
-			uncompression.\n, bp-dev-name);
-	return -ENOMEM;
-}
-
-static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp-strm-workspace);
-
-	kfree(bp-strm);
-	bp-strm = NULL;
-
-	if (bp-gunzip_buf) {
-		vfree(bp-gunzip_buf);
-		bp-gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3]  FNAME)
-		while ((zbuf[n++] != 0)  (n  len));
-
-	bp-strm-next_in = zbuf + n;
-	bp-strm-avail_in = len - n;
-	bp-strm-next_out = bp-gunzip_buf;
-	bp-strm-avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp-strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp-strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp-strm-avail_out;
-	*outbuf = bp-gunzip_buf;
-
-	if ((rc != Z_OK)  (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX %s: Firmware decompression error: %s\n,
-		   bp-dev-name, bp-strm-msg);
-
-	zlib_inflateEnd(bp-strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
 	return rc;
 }
 
@@ -2902,22 +2859,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_
 	/* Load the Text area. */
 	offset = cpu_reg-spad_base + (fw-text_addr - cpu_reg-mips_view_base);
 	if (fw-gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw-gz_text, fw-gz_text_len, text,
- text_len);
-		if (rc)
-			return rc;
-
-		fw-text = text;
-	}
-	if (fw-gz_text) {
+		u32 *text;
 		int j;
 
+		text = vmalloc(FW_BUF_SIZE);
+		if (!text)
+			return -ENOMEM;
+		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len);
+		if (rc  0) {
+			vfree(text);
+			return rc;
+		}
 		for (j = 0; j  (fw-text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw-text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,27 +2935,27 @@ bnx2_init_cpus(struct bnx2 *bp)
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
-
-	if ((rc = bnx2_gunzip_init(bp)) != 0)
-		return rc;
 
 	/* Initialize the RV2P processor. */
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), text,
-			 text_len);
-	if (rc)
+	text = vmalloc(FW_BUF_SIZE);
+	if (!text)
+		return -ENOMEM;
+	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
+	if (rc  0) {
+		vfree(text);
 		goto init_cpu_err;
+	}
+	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);
 
-	load_rv2p_fw(bp, text, text_len, RV2P_PROC1);
-
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2), text,
-			 text_len);
-	if (rc)
+

[PATCH 0/3 Rev-4] Age Entry For IPv4 IPv6 Route Table

2007-09-20 Thread Varun Chandramohan

Hi Dave,
Thanks for the comment. I have created another patch set as you have 
suggested.
Your Comments:
> In avoiding the age initialization at routing cache insertion time,
> you make the value provided totally inaccurate and essentially
> useless especially the very first time the value is asked for.
>
> I really don't like these changes, they have had problems every step
> of the way, and the above proves that we could essentially always
> return an age value of zero and still be compliant with the standards.
>
> +   if (!*age) {
> +       *age = timeval_to_sec(tv);
> +       NLA_PUT_U32(skb, RTA_AGE, *age);

I have made a mistake. Sorry I didn't catch it earlier :-)
So, NLA_PUT_U32(skb, RTA_AGE, 0) would have made more sense?

> +   } else {
> +       NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(tv) - *age);
> +   }

Since you didn't like the hack, I have reimplemented the above by initializing 
the age value at the time of insertion. I hope this is what you pointed out in 
your comments. Please let me know if it's OK.
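
For readers following the thread, the approach described here (stamp the route when it is inserted, subtract the stamp from the current time when the route is dumped) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct route_entry, route_stamp() and route_age_secs() are hypothetical names, not the kernel's, plain time_t seconds stand in for struct timeval, and clamping a negative difference to zero is an added assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a route entry; not the kernel's fib_alias. */
struct route_entry {
	time_t created;	/* wall-clock seconds when the route was inserted */
};

/* Record the creation time at route insertion. */
static void route_stamp(struct route_entry *rt, time_t now)
{
	assert(rt != NULL);
	rt->created = now;
}

/*
 * Age in whole seconds at dump time.
 * Assumption: if the clock stepped backwards, report 0 instead of wrapping.
 */
static uint32_t route_age_secs(const struct route_entry *rt, time_t now)
{
	assert(rt != NULL);
	if (now < rt->created)
		return 0;
	return (uint32_t)(now - rt->created);
}
```

With this shape, a route dumped in the same second it was inserted reports an age of 0, and later dumps report the elapsed seconds without any rounding.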

Stephen, as the age value is human readable we decided that it need not be 
accurate. I thought that rounding up would make it a bit more readable, but I 
think you are right. So in this patch set I have removed the rounding. Is 
this OK?

Regards,
Varun

Original Comment:
According to RFC 4292 (IP Forwarding Table MIB) there is a need for an age 
entry for all the routes in the routing table. The entry in the RFC is 
inetCidrRouteAge and the OID is inetCidrRouteAge.1.10.
Many SNMP applications require this age entry. So I am adding the age field to 
the routing table for IPv4 and IPv6 and providing an interface to this value 
through netlink.

I made a note of the changes I made as per the suggestions given in the community. 
Here is the changelog.

Changelog since ver 1:
----------------------
Changes                                            Suggested by

1) Change in the interface from proc to netlink.   David Miller, Yoshifuji
   (proc was not approved by David Miller and
   Yoshifuji.)
2) Change from jiffies to timeval.                 Eric Dumazet
3) Rounding up timeval.                            Patrick McHardy, Oliver Hartkopp,
                                                   Eric Dumazet
4) Relocate timeval_to_sec.                        Stephen Hemminger, Krishna Kumar
5) Using macro RT6_GET_ROUTE_INFO.                 Krishna Kumar
6) Add proper comment for timeval_to_sec.          Eric Dumazet
7) Add proper comment for timeval insertion.       Thomas Graf
8) Insert the age value at route insertion.        David Miller
9) Remove round off.                               Stephen Hemminger

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---


Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6

2007-09-20 Thread Joe Perches

On Thu, 2007-09-20 at 07:55 -0700, Randy Dunlap wrote:
 How large are the patches if you posted them for review instead
 of just referencing gits for them?  (which cuts down on review
 possibilities)

The v4 is ~130kb, the v6 ~35kb.

There is a gitweb available at:

print_ip:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv4
commit diff:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=1e3a30d5d8b49b3accca07cc84ecf6d977cacdd5

print_ipv6:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv6
commit diff:
http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=e96b794a57a164db84379e2baf5fe2622a5ae3bf



[PATCH 1/3 Rev4] New attribute RTA_AGE

2007-09-20 Thread Varun Chandramohan

A new attribute, RTA_AGE, is added so that the age value can be exported to 
user level using netlink.

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---
 include/linux/rtnetlink.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c91476c..68046a4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -263,6 +263,7 @@ enum rtattr_type_t
RTA_SESSION,
RTA_MP_ALGO, /* no longer used */
RTA_TABLE,
+   RTA_AGE,
__RTA_MAX
 };
 
-- 
1.4.3.4


[PATCH 2/3 Rev4] Initilize and populate age field

2007-09-20 Thread Varun Chandramohan

The age field is filled with the current time when the route is created. 
When the routes are dumped, the value stored in the route structure is 
subtracted from the current time, and the difference is the age expressed 
in seconds.

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---
 net/ipv4/fib_hash.c  |5 +
 net/ipv4/fib_lookup.h|3 ++-
 net/ipv4/fib_semantics.c |   13 ++---
 net/ipv4/fib_trie.c  |1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 9ad1d9f..bb52193 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -385,6 +385,7 @@ static int fn_hash_insert(struct fib_tab
struct fib_alias *fa, *new_fa;
struct fn_zone *fz;
struct fib_info *fi;
+   struct timeval tv;
u8 tos = cfg->fc_tos;
__be32 key;
int err;
@@ -420,6 +421,7 @@ static int fn_hash_insert(struct fib_tab
else
fa = fib_find_alias(&f->fn_alias, tos, fi->fib_priority);
 
+   do_gettimeofday(&tv);
/* Now fa, if non-NULL, points to the first fib alias
 * with the same keys [prefix,tos,priority], if such key already
 * exists or to the node before which we will insert new one.
@@ -448,6 +450,7 @@ static int fn_hash_insert(struct fib_tab
fa->fa_info = fi;
fa->fa_type = cfg->fc_type;
fa->fa_scope = cfg->fc_scope;
+   fa->fa_age = tv.tv_sec;
state = fa->fa_state;
fa->fa_state &= ~FA_S_ACCESSED;
fib_hash_genid++;
@@ -507,6 +510,7 @@ static int fn_hash_insert(struct fib_tab
new_fa->fa_type = cfg->fc_type;
new_fa->fa_scope = cfg->fc_scope;
new_fa->fa_state = 0;
+   new_fa->fa_age = tv.tv_sec;
 
/*
 * Insert new entry to the list.
@@ -697,6 +701,7 @@ fn_hash_dump_bucket(struct sk_buff *skb,
  f->fn_key,
  fz->fz_order,
  fa->fa_tos,
+ fa->fa_age,
  fa->fa_info,
  NLM_F_MULTI) < 0) {
cb->args[4] = i;
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index eef9eec..76c4a47 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -13,6 +13,7 @@ struct fib_alias {
u8  fa_type;
u8  fa_scope;
u8  fa_state;
+   time_t  fa_age;
 };
 
 #define FA_S_ACCESSED  0x01
@@ -27,7 +28,7 @@ extern struct fib_info *fib_create_info(
 extern int fib_nh_match(struct fib_config *cfg, struct fib_info *fi);
 extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
 u32 tb_id, u8 type, u8 scope, __be32 dst,
-int dst_len, u8 tos, struct fib_info *fi,
+int dst_len, u8 tos, time_t age, struct fib_info *fi,
 unsigned int);
 extern void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
  int dst_len, u32 tb_id, struct nl_info *info,
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c434119..fa892ce 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -278,7 +278,8 @@ static inline size_t fib_nlmsg_size(stru
 + nla_total_size(4) /* RTA_TABLE */
 + nla_total_size(4) /* RTA_DST */
 + nla_total_size(4) /* RTA_PRIORITY */
-+ nla_total_size(4); /* RTA_PREFSRC */
++ nla_total_size(4) /* RTA_PREFSRC */
++ nla_total_size(4); /*RTA_AGE*/
 
/* space for nested metrics */
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
@@ -313,7 +314,7 @@ void rtmsg_fib(int event, __be32 key, st
 
err = fib_dump_info(skb, info->pid, seq, event, tb_id,
fa->fa_type, fa->fa_scope, key, dst_len,
-   fa->fa_tos, fa->fa_info, nlm_flags);
+   fa->fa_tos, fa->fa_age, fa->fa_info, nlm_flags);
if (err < 0) {
/* -EMSGSIZE implies BUG in fib_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
@@ -940,11 +941,12 @@ __be32 __fib_res_prefsrc(struct fib_resu
 }
 
 int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
- u32 tb_id, u8 type, u8 scope, __be32 dst, int dst_len, u8 tos,
+ u32 tb_id, u8 type, u8 scope, __be32 dst, int dst_len, u8 tos, time_t age,
  struct fib_info *fi, unsigned int flags)
 {
struct nlmsghdr *nlh;
struct rtmsg *rtm;
+   struct timeval tv;
 
nlh = nlmsg_put(skb, pid,

[PATCH 3/3 Rev4] Initialize and fill IPv6 route age

2007-09-20 Thread Varun Chandramohan

The age field of the IPv6 route structures is initialized with the current 
timeval at the time of route creation. When the route dump is called, the 
route age value stored in the structure is subtracted from the present 
timeval and the difference is passed on as the route age.

Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
---
 include/net/ip6_fib.h |1 +
 net/ipv6/addrconf.c   |5 +
 net/ipv6/route.c  |   14 ++
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index c48ea87..e30a1cf 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -98,6 +98,7 @@ struct rt6_info

u32 rt6i_flags;
u32 rt6i_metric;
+   time_t  rt6i_age;
atomic_t rt6i_ref;
struct fib6_table   *rt6i_table;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 91ef3be..e77c6ad 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4182,6 +4182,7 @@ EXPORT_SYMBOL(unregister_inet6addr_notif
 
 int __init addrconf_init(void)
 {
+   struct timeval tv;
int err = 0;
 
/* The addrconf netdev notifier requires that loopback_dev
@@ -4209,10 +4210,14 @@ int __init addrconf_init(void)
if (err)
return err;
 
+   do_gettimeofday(&tv);
ip6_null_entry.rt6i_idev = in6_dev_get(loopback_dev);
+   ip6_null_entry.rt6i_age = tv.tv_sec;
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
ip6_prohibit_entry.rt6i_idev = in6_dev_get(loopback_dev);
+   ip6_prohibit_entry.rt6i_age = tv.tv_sec;
ip6_blk_hole_entry.rt6i_idev = in6_dev_get(loopback_dev);
+   ip6_blk_hole_entry.rt6i_age = tv.tv_sec;
 #endif
 
register_netdevice_notifier(&ipv6_dev_notf);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 55ea80f..e9a9d00 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -600,7 +600,14 @@ static int __ip6_ins_rt(struct rt6_info
 {
int err;
struct fib6_table *table;
+   struct timeval tv;
 
+   do_gettimeofday(&tv);
+   /* Update the timeval for new routes
+* We add it here to make it common irrespective
+* of how the new route is added.
+*/
+   rt->rt6i_age = tv.tv_sec;
table = rt->rt6i_table;
write_lock_bh(&table->tb6_lock);
err = fib6_add(&table->tb6_root, rt, info);
@@ -2112,6 +2119,7 @@ static inline size_t rt6_nlmsg_size(void
   + nla_total_size(4) /* RTA_IIF */
   + nla_total_size(4) /* RTA_OIF */
   + nla_total_size(4) /* RTA_PRIORITY */
+  + nla_total_size(4) /*RTA_AGE*/
   + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
   + nla_total_size(sizeof(struct rta_cacheinfo));
 }
@@ -2123,6 +2131,7 @@ static int rt6_fill_node(struct sk_buff
 {
struct rtmsg *rtm;
struct nlmsghdr *nlh;
+   struct timeval tv;
long expires;
u32 table;
 
@@ -2186,6 +2195,11 @@ static int rt6_fill_node(struct sk_buff
if (ipv6_get_saddr(&rt->u.dst, dst, &saddr_buf) == 0)
NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
}
+   
+   do_gettimeofday(&tv);
+   if (rt->rt6i_age) {
+   NLA_PUT_U32(skb, RTA_AGE, (tv.tv_sec - rt->rt6i_age));
+   }
 
if (rtnetlink_put_metrics(skb, rt->u.dst.metrics) < 0)
goto nla_put_failure;
-- 
1.4.3.4


Re: wrong arp query with policy routing

2007-09-20 Thread Chuck Ebbert

 Is there a way to force linux to make an arp
 probe with the source ip belonging to the
 same subnet requesting ip?

Umm, arp_filter?

Re: [LARTC] ifb and ppp

2007-09-20 Thread Patrick McHardy

Frithjof Hammer wrote:
Sorry, I didnt follow the thread - what is the goal to be achieved with
the setup?
 
 
 A simple ingress shaping on ppp0 (PPPOE DSL line). I want to replace my old 
 imq ingress shaper in favor of ifb. My former script used iptables marks  to 
 classify the packets. My iptables marks are getting set, as like before with 
 imq. But tc seems not to recognize them: It only uses the default class.
 
 So i run tcpdump -i ifb0  and discovered that the packets seems to be still 
 encapsulated on ifb0. I suppose this is why my iptables stuff is not working.


That's actually a completely different problem. Unlike with imq, packets
are delivered to ifb *before* they pass through iptables, so at that
time they're not marked. I don't see a good solution for this that
lets you keep the iptables rules; I'd suggest switching to ematches.

[PATCH][MIPS][7/7] AR7: ethernet

2007-09-20 Thread Matteo Croce

Driver for the cpmac 100M ethernet driver.
Jeff, here is the meat ;)

Signed-off-by: Matteo Croce [EMAIL PROTECTED]
Signed-off-by: Eugene Konev [EMAIL PROTECTED]

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 6a0863e..28ba0dc 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1822,6 +1822,15 @@ config SC92031
  To compile this driver as a module, choose M here: the module
  will be called sc92031.  This is recommended.
 
+config CPMAC
+   tristate "TI AR7 CPMAC Ethernet support (EXPERIMENTAL)"
+   depends on NET_ETHERNET && EXPERIMENTAL && AR7
+   select PHYLIB
+   select FIXED_PHY
+   select FIXED_MII_100_FDX
+   help
+ TI AR7 CPMAC Ethernet support
+
 config NET_POCKET
bool "Pocket and portable adapters"
depends on PARPORT
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 9501d64..b536934 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -157,6 +157,7 @@ obj-$(CONFIG_8139CP) += 8139cp.o
 obj-$(CONFIG_8139TOO) += 8139too.o
 obj-$(CONFIG_ZNET) += znet.o
 obj-$(CONFIG_LAN_SAA9730) += saa9730.o
+obj-$(CONFIG_CPMAC) += cpmac.o
 obj-$(CONFIG_DEPCA) += depca.o
 obj-$(CONFIG_EWRK3) += ewrk3.o
 obj-$(CONFIG_ATP) += atp.o
diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c
new file mode 100644
index 0000000..50aad94
--- /dev/null
+++ b/drivers/net/cpmac.c
@@ -0,0 +1,1166 @@
+/*
+ * Copyright (C) 2006, 2007 Eugene Konev
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/delay.h>
+#include <linux/version.h>
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/skbuff.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <asm/gpio.h>
+
+MODULE_AUTHOR("Eugene Konev");
+MODULE_DESCRIPTION("TI AR7 ethernet driver (CPMAC)");
+MODULE_LICENSE("GPL");
+
+static int rx_ring_size = 64;
+static int disable_napi;
+static int debug_level = 8;
+static int dumb_switch;
+
+module_param(rx_ring_size, int, 0644);
+module_param(disable_napi, int, 0644);
+/* Next 2 are only used in cpmac_probe, so it's pointless to change them */
+module_param(debug_level, int, 0444);
+module_param(dumb_switch, int, 0444);
+
+MODULE_PARM_DESC(rx_ring_size, "Size of rx ring (in skbs)");
+MODULE_PARM_DESC(disable_napi, "Disable NAPI polling");
+MODULE_PARM_DESC(debug_level, "Number of NETIF_MSG bits to enable");
+MODULE_PARM_DESC(dumb_switch, "Assume switch is not connected to MDIO bus");
+
+/* frame size + 802.1q tag */
+#define CPMAC_SKB_SIZE (ETH_FRAME_LEN + 4)
+#define CPMAC_TX_RING_SIZE 8
+
+/* Ethernet registers */
+#define CPMAC_TX_CONTROL   0x0004
+#define CPMAC_TX_TEARDOWN  0x0008
+#define CPMAC_RX_CONTROL   0x0014
+#define CPMAC_RX_TEARDOWN  0x0018
+#define CPMAC_MBP  0x0100
+# define MBP_RXPASSCRC 0x4000
+# define MBP_RXQOS 0x2000
+# define MBP_RXNOCHAIN 0x1000
+# define MBP_RXCMF 0x0100
+# define MBP_RXSHORT   0x0080
+# define MBP_RXCEF 0x0040
+# define MBP_RXPROMISC 0x0020
+# define MBP_PROMISCCHAN(channel)  (((channel) & 0x7) << 16)
+# define MBP_RXBCAST   0x2000
+# define MBP_BCASTCHAN(channel) (((channel) & 0x7) << 8)
+# define MBP_RXMCAST   0x0020
+# define MBP_MCASTCHAN(channel) ((channel) & 0x7)
+#define CPMAC_UNICAST_ENABLE   0x0104
+#define CPMAC_UNICAST_CLEAR 0x0108
+#define CPMAC_MAX_LENGTH   0x010c
+#define CPMAC_BUFFER_OFFSET 0x0110
+#define CPMAC_MAC_CONTROL  0x0160
+# define MAC_TXPTYPE   0x0200
+# define MAC_TXPACE 0x0040
+# define MAC_MII   0x0020
+# define MAC_TXFLOW 0x0010
+# define MAC_RXFLOW 0x0008
+# define MAC_MTEST 0x0004
+# define MAC_LOOPBACK

Re: [PATCH V5 2/11] IB/ipoib: Notify the world before doing unregister

2007-09-20 Thread Roland Dreier

  +ipoib_slave_detach(cpriv->dev);
   unregister_netdev(cpriv->dev);

Maybe you already answered this before, but I'm still not clear why
this notifier call can't just be added to the start of
unregister_netdevice(), so we can avoid having driver needing to know
anything about bonding internals?

 - R.

Re: [PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

2007-09-20 Thread Roland Dreier

  Roland - can you please queue this up for 2.6.24?

Done, thanks.

Re: [PATCH]: Preliminary release of Sun Neptune driver

2007-09-20 Thread Ariel Hendel


Thanks Dave for your preliminary posting of the driver.
I am copying Matheos Worku. Matheos is intimately familiar with 
the Neptune/NIU family of devices and their respective drivers.


Not only can he be a good reviewer, he can also clarify issues 
around naming and so on. I agree that Neptune is just an overused 
internal codename not worth propagating in the code.


Please feel free to add [EMAIL PROTECTED] to the reviewers list.

Ariel

David Miller wrote:

From: Rick Jones [EMAIL PROTECTED]
Date: Wed, 19 Sep 2007 16:20:39 -0700



so why niu?  To what does niu translate anyway?



Network Interface Unit.  This is what the Niagara-2 programmers manual
refers to the chip as.

I try to name the files for most drivers I write as a 2 or 3 letter
acronyms, it looks so much better than the usual verbose names.  It's
very unix.


Re: [PATCH 1/2] net/: all net/ cleanup with ARRAY_SIZE

2007-09-20 Thread rae l

On 9/17/07, David Miller [EMAIL PROTECTED] wrote:
 From: Denis Cheng [EMAIL PROTECTED]
 Date: Sun,  2 Sep 2007 18:30:17 +0800

  Signed-off-by: Denis Cheng [EMAIL PROTECTED]

 You already submitted the net/ipv4/af_inet.c case
 separately, so I had to remove it from this patch for
 it to apply properly.

 Please keep your patches straight to avoid problems
 like this.
I can only say sorry. At that time I was not sure the earlier patch
for net/ipv4/af_inet.c would be applied; then I realized that the
change should be made in every subsystem of the kernel source, so I
generated a new patch for the whole net/ subsystem. In this situation
I think I should have announced that the earlier patch was deprecated,
shouldn't I?
However, I'll be more cautious with patches.

 Thanks.
Thanks for applying.

-- 
Denis Cheng

Re: Please pull 'z1211' branch of wireless-2.6

2007-09-20 Thread Larry Finger

Daniel Drake wrote:
 John W. Linville wrote:

 If you are determined not to have it in 2.6.24 then I will relent.
 I will also suggest that Larry start sending any softmac bugs to
 you... :-)
 
 That's fine.

You're on. BTW, I will let you be the primary tester of [PATCH] fix softmac
lockdep reports that Johannes posted earlier today. I see you were CC'd. I plan
on testing it with bcm43xx, but I won't get to it for a couple of days.

Larry


Re: Please pull 'z1211' branch of wireless-2.6

2007-09-20 Thread Johannes Berg

On Thu, 2007-09-20 at 11:37 -0500, Larry Finger wrote:

 You're on. BTW, I will let you be the primary tester of [PATCH] fix softmac
 lockdep reports that Johannes posted earlier today. I see you were CC'd. I plan
 on testing it with bcm43xx, but I won't get to it for a couple of days.

The only thing it can possibly fix is our race against some other
functions that use the global workqueue and lock the RTNL from within
the work function while we have it locked while flushing. Conversely, it
can't really break anything either.

johannes



Re: [PATCH 2/3] netlink: the temp variable name max is ambiguous

2007-09-20 Thread rae l

On 9/17/07, David Miller [EMAIL PROTECTED] wrote:
 From: Denis Cheng [EMAIL PROTECTED]
 Date: Sun,  2 Sep 2007 03:45:58 +0800

  with the macro max provided by linux/kernel.h, so changed its name to a 
  more proper one: limit

  Signed-off-by: Denis Cheng [EMAIL PROTECTED]

 Not strictly necessary because CPP knows to differentiate between
 'max(' and plain 'max' when evaluating if a CPP macro should be
 expanded or not.
I also know that GNU CPP is intelligent, but people often are not.
I just think that avoiding names that are ambiguous to human readers
improves readability.

 Nonetheless, applied to net-2.6.24, thanks.
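
The preprocessor point made in this exchange can be checked with a few lines of standalone C. This is a minimal sketch, assuming a simplified max() macro (the kernel's version in linux/kernel.h also does type checking) and a hypothetical helper name, larger_of_value_and_limit(). A function-like macro is expanded only when its name is followed by '(', so a variable named max can coexist with it.

```c
#include <assert.h>

/* Simplified function-like macro; an assumption, not the kernel's max(). */
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: returns the larger of value and a local bound. */
static int larger_of_value_and_limit(int value)
{
	int max = 10;			/* no '(' follows: not expanded */
	int result = max(value, max);	/* '(' follows: macro invocation */

	assert(result >= max);		/* holds whichever operand won */
	return result;
}
```

The code compiles and behaves as expected, which matches the technical remark; the readability argument for a name like "limit" is a separate matter.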

-- 
Denis Cheng

Re: net-2.6.24 plans

2007-09-20 Thread John W. Linville

On Thu, Sep 20, 2007 at 04:50:52PM +0200, Johannes Berg wrote:
 On Thu, 2007-09-20 at 10:17 -0400, John W. Linville wrote:
 
   2) ATMEL USB driver
  
  These are both really new.  I think I'll transfer them to my
  wireless-2.6 tree, but still hold them back at least until 2.6.25.
 
 Also, atmel isn't even ported to mac80211 yet, is it?

Kalle Valo has done some work on this, and I think Eugene Teo has
joined the effort.  They both are in contact with Pavel to accomplish
the mac80211 port.

John
-- 
John W. Linville
[EMAIL PROTECTED]

[PATCH 2.6.23][BNX2]: Add PHY workaround for 5709 A1.

2007-09-20 Thread Michael Chan

[BNX2]: Add PHY workaround for 5709 A1.

Add the DIS_EARLY_DAC PHY workaround for 5709 A1.  Without it, link
sometimes does not come up.

Update version to 1.6.5.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 854d80c..66eed22 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME "bnx2"
 #define PFX DRV_MODULE_NAME ": "
-#define DRV_MODULE_VERSION "1.6.4"
-#define DRV_MODULE_RELDATE "August 3, 2007"
+#define DRV_MODULE_VERSION "1.6.5"
+#define DRV_MODULE_RELDATE "September 20, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -6727,7 +6727,8 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device *dev)
} else if (CHIP_NUM(bp) == CHIP_NUM_5706 ||
   CHIP_NUM(bp) == CHIP_NUM_5708)
bp->phy_flags |= PHY_CRC_FIX_FLAG;
-   else if (CHIP_ID(bp) == CHIP_ID_5709_A0)
+   else if (CHIP_ID(bp) == CHIP_ID_5709_A0 ||
+CHIP_ID(bp) == CHIP_ID_5709_A1)
bp->phy_flags |= PHY_DIS_EARLY_DAC_FLAG;
 
if ((CHIP_ID(bp) == CHIP_ID_5708_A0) ||



Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-20 Thread Jeff Garzik


Maciej W. Rozycki wrote:

On Thu, 20 Sep 2007, Jeff Garzik wrote:


You may be pleased (or less so) to hear that the version of sb1250-mac.c in
your tree does not even build (because of
42d53d6be113f974d8152979c88e1061b953bd12) and the patch below does not
address it.  I ran out of time in the evening, but I will send you a fix
shortly.  To be honest I think even with bulk changes it may be worth
checking whether they do not break stuff. ;-)

hrm.  I cannot get this to apply on top of linux-2.6.git,
netdev-2.6.git#upstream (prior to net-2.6.24 rebase) or
netdev-2.6.git#upstream (after net-2.6.24 rebase)


 It applies on top of current -mm.  It seems to apply to a copy of 
netdev-2.6.git#upstream that I have got, but I am probably missing 
something...  If I try to clone your repository again I get:


$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git linux
Initialized empty Git repository in /home/macro/GIT-other/linux-netdev/linux/.git/
fatal: The remote end hung up unexpectedly
fetch-pack from 'git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux-netdev-2.6.git' failed.


Remove the linux- prefix.

Jeff




Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Davide Libenzi

On Wed, 19 Sep 2007, Nagendra Tomar wrote:

 The tcp_check_space() function calls tcp_new_space() only if the
 SOCK_NOSPACE bit is set in the socket flags. This is causing Edge Triggered
 EPOLLOUT events to be missed for TCP sockets, as the ep_poll_callback() 
 is not called from the wakeup routine.
 
 The SOCK_NOSPACE bit indicates the user's intent to perform writes
 on that socket (set in tcp_sendmsg and tcp_poll). I believe the idea 
 behind the SOCK_NOSPACE check is to optimize away the tcp_new_space call
 in cases when user is not interested in writing to the socket. These two
 take care of all possible scenarios in which a user can convey his intent
 to write on that socket.
 
 Case 1: tcp_sendmsg detects lack of sndbuf space
 Case 2: tcp_poll returns not writable
 
 This is fine if we do not deal with epoll's Edge Triggered events (EPOLLET).
 With ET events we can have a scenario where the SOCK_NOSPACE bit is not set,
 as the user has neither done a sendmsg nor a poll/epoll call that returned
 with the POLLOUT condition not set. 

Looking back at it, I think the current TCP code is right, once you look 
at the event as an output buffer full->with_space transition.
If you drop an fd inside epoll with EPOLLOUT|EPOLLET and you get an event 
(free space on the output buffer), and you do not consume it (say, with a 
tcp_sendmsg that re-fills the buffer), you can't see other OUT events 
anymore, since they happen on the full->with_space transition.
Yes, I know, the read side (EPOLLIN) works differently and you get an 
event for every packet you receive. And yes, I do not like asymmetric 
things. But that does not make the EPOLLOUT|EPOLLET wrong IMO.
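
Under the semantics described here, the usual userspace pattern is to register with EPOLLOUT|EPOLLET and to keep writing until the kernel reports EAGAIN, so that the next write-space event really is a transition from full to writable. The following is a minimal userspace sketch of that pattern, not code from the thread; the helper names set_nonblocking(), watch_for_write_space() and write_until_full() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd for edge-triggered write-space events. */
static int watch_for_write_space(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };

	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Write the same chunk repeatedly until the send buffer is full (EAGAIN).
 * Returns the number of bytes queued, or -1 on any other error.
 * The fd must be non-blocking, otherwise this call blocks.
 */
static ssize_t write_until_full(int fd, const char *buf, size_t len)
{
	ssize_t total = 0;

	assert(buf != NULL && len > 0);
	for (;;) {
		ssize_t n = write(fd, buf, len);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;	/* full: now wait for EPOLLOUT */
		return -1;
	}
}
```

An application that stops writing before it sees EAGAIN has not armed the transition, which is the situation the original report describes.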



- Davide



Re: [PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP

2007-09-20 Thread David Miller


The whole reason the queues are per-cpu is so that we do not
have to touch remote processor state nor use locks of any
kind whatsoever.

With multi-queue networking cards becoming more and more
available, which will split up the packet workload in
hardware across all available cpus, there is less and less
reason to make a patch like this one.

We've known about this issue for ages, and if we felt it
was appropriate to make this change, we would have done
so years ago.

Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6

2007-09-20 Thread Ilpo Järvinen

On Thu, 20 Sep 2007, Joe Perches wrote:

 On Thu, 2007-09-20 at 07:55 -0700, Randy Dunlap wrote:
  How large are the patches if you posted them for review instead
  of just referencing gits for them?  (which cuts down on review
  possibilities)
 
 The v4 is ~130kb, the v6 ~35kb.

 There is a gitweb available at:
 
 print_ip:
 http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv4
 commit diff:
 http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=1e3a30d5d8b49b3accca07cc84ecf6d977cacdd5
 
 print_ipv6:
 http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=shortlog;h=print_ipv6
 commit diff:
 http://repo.or.cz/w/linux-2.6/trivial-mods.git?a=commitdiff;h=e96b794a57a164db84379e2baf5fe2622a5ae3bf

...Alternatively you could split it up a bit and send those smaller 
chunks for reviewing purposes only (even though it would be combined
to a single big patch in the end).

-- 
 i.

Re: [PATCH 2.6.23-rc6 Resending] NETWORKING : Edge Triggered EPOLLOUT events get missed for TCP sockets

2007-09-20 Thread Davide Libenzi

On Thu, 20 Sep 2007, Eric Dumazet wrote:

 Does it means that with your patch each ACK on a ET managed socket will
 trigger an epoll event   ?
 
 Maybe your very sensitive high throuput appication needs to set a flag or
 something at socket level to ask for such a behavior.
 
 The default should stay as is. That is an event should be sent only if someone
 cared about the wakeup.

Unfortunately f_op->poll() does not let the caller specify the events 
it's interested in; that would allow splitting the send/receive wait queues 
and better detecting the read/write cases.
A check of waitqueue_active(->sk_wr_sleep) would work fine in 
detecting if someone is actually waiting for a write, w/out the false 
positives triggered by the read-waiters.
That would be a very sane thing to do, but would require a big&dumb change 
to all the ->poll around (that could be automated by a script - devices 
not caring about the events hint can just continue to use the single queue 
like they currently do), and a more critical and gradual change of all the 
devices that want to take advantage of it.
That way, no more magic bits are needed, and a simple waitqueue_active() 
would tell you if someone is waiting for write-space events.
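
The idea of separate read and write wait queues can be illustrated with a toy userspace model. This is a sketch only, not kernel code: struct toy_sock, toy_queue_active() and toy_write_space() are hypothetical names, and simple counters stand in for wait queue heads. It shows that readers sitting on their own queue cause no false positives on the write-space path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; the counters stand in for kernel wait queue heads. */
struct toy_sock {
	int rd_waiters;		/* tasks waiting for incoming data */
	int wr_waiters;		/* tasks waiting for send-buffer space */
	int wr_wakeups;		/* write-space wakeups actually delivered */
};

/* Models a waitqueue_active()-style check: true if anybody is queued. */
static bool toy_queue_active(int waiters)
{
	return waiters > 0;
}

/* Models the write-space path that runs when ACKs free buffer space. */
static void toy_write_space(struct toy_sock *sk)
{
	assert(sk != NULL);
	if (toy_queue_active(sk->wr_waiters))
		sk->wr_wakeups++;	/* only write-waiters trigger a wakeup */
}
```

With a single shared queue, the three readers in a case like `{ .rd_waiters = 3, .wr_waiters = 0 }` would make the queue look active; with the split, the write-space path stays quiet until a writer is actually waiting.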



- Davide



Re: [PATCH 2.6.23][BNX2]: Add PHY workaround for 5709 A1.

2007-09-20 Thread David Miller

From: Michael Chan [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 11:07:13 -0700

 [BNX2]: Add PHY workaround for 5709 A1.

 Add the DIS_EARLY_DAC PHY workaround for 5709 A1.  Without it, link
 sometimes does not come up.

 Update version to 1.6.5.

 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied, thanks Michael.

Re: [Bug, PATCH and another Bug] Was: Fix refcounting problem with netif_rx_reschedule()

2007-09-20 Thread David Miller

From: Krishna Kumar2 [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 11:24:01 +0530

 Ran 4/16/64 thread iperf on latest bits with this patch and no issues after
 30 mins. I used to
 consistently get the bug within 1-2 mins with just 4 threads prior to this
 patch.

 Tested-by: Krishna Kumar [EMAIL PROTECTED]
 (if any value in that)

There is much value in that :-)  Thanks a lot Krishna.

Re: [PATCH: 2.6.13-15-SMP 3/3] network: concurrently run softirq network code on SMP

2007-09-20 Thread Stephen Hemminger

On Thu, 20 Sep 2007 21:04:16 +0800
john ye [EMAIL PROTECTED] wrote:

 Bottom Softirq Implementation. John Ye, 2007.08.27
 
 Why this patch:
 Make the kernel able to execute softirq network code concurrently on SMP
 systems, taking full advantage of SMP to handle more packets and greatly
 raise NIC throughput.
 The current kernel's net packet processing logic is:
 1) The CPU which handles a hardirq must also execute its related softirq.
 2) One softirq instance (irqs handled by 1 CPU) can't be executed on more
 than 2 CPUs at the same time.
 These limitations make it hard for kernel networking to take advantage of
 SMP.
 
 How this patch:
 It splits the current softirq code into 2 parts: the cpu-sensitive top half
 and the cpu-insensitive bottom half, then makes the bottom half (called BS)
 execute concurrently on SMP.
 The two parts are not equal in terms of size and load. The top part has
 constant code size (mainly in net/core/dev.c and NIC drivers), while the
 bottom part involves netfilter (iptables), whose load varies very much. An
 iptables setup with 1000 rules to match will make the bottom part's load
 very high. So, if the bottom part of the softirq can be randomly distributed
 to processors and run concurrently on them, the network will gain much more
 packet handling capacity and network throughput will be increased
 remarkably.
 
 Where useful:
 It's useful on SMP machines that meet the following 2 conditions:
 1) have high kernel network load (for example, running iptables with
 thousands of rules, etc).
 2) have more CPUs than active NICs (e.g. a 4-CPU machine with 2 NICs).
 On these systems, as softirq load increases, some CPUs will be idle
 while others (as many as there are NICs) keep busy.
 IRQBALANCE will help, but it only shifts IRQs among CPUs and provides no
 softirq concurrency.
 Balancing the load of each CPU will not remarkably increase network speed.
 
 Where NOT useful:
 If the bottom half of the softirq is too small (without running iptables),
 or the network is too idle, the BS patch will not have a visible effect. But
 it has no negative effect either.
 Users can turn BS functionality on/off via the /proc/sys/net/bs_enable switch.
 


If I read this correctly, you basically changed network processing from softirq
to workqueue (which is pretty much what -rt does). Perhaps optimizing and/or 
rearchitecting netfilter rule processing would get more benefit.

But you are ignoring the issue of all the locking assumptions that get changed.
Any performance gain from getting SMP will probably be lost by the additional
locking required.

Also, the patch is formatted badly and has multiple style issues (indentation etc).
If you want to have it seriously considered, follow the 
Documentation/CodingStyle guidelines.
There is even a Perl script to check it: scripts/checkpatch.pl


Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...

2007-09-20 Thread Maciej W. Rozycki

 Remove typedefs, volatiles and convert kmalloc()/memset() pairs to
kcalloc().  Also reformat the surrounding clutter.
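
For readers unfamiliar with the kmalloc()/memset() to kcalloc() conversion, the following userspace sketch shows the same idea with malloc()/memset() and calloc(). It is an analogue, not the driver's code: struct dscr only mirrors the two fields of the descriptor structure in the patch below, and the function names ring_alloc_old() and ring_alloc_new() are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the two 64-bit words of the DMA descriptor; illustration only. */
struct dscr {
	uint64_t dscr_a;
	uint64_t dscr_b;
};

/* Before: allocate, then zero by hand. The n * size product is unchecked. */
static struct dscr *ring_alloc_old(size_t n)
{
	struct dscr *ring = malloc(n * sizeof(*ring));

	if (ring)
		memset(ring, 0, n * sizeof(*ring));
	return ring;
}

/* After: a single call that returns zeroed memory for n elements. */
static struct dscr *ring_alloc_new(size_t n)
{
	assert(n > 0);
	return calloc(n, sizeof(struct dscr));
}
```

Both helpers hand back the same zero-filled ring; the second form is shorter and leaves the element-count multiplication to the allocator.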

Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED]
---
On Thu, 20 Sep 2007, Jeff Garzik wrote:

 Remove the linux- prefix.

 Hmm, it looks like a bad application of `sed' by myself.  Sorry for the 
noise.

  Maciej

patch-netdev-2.6.23-rc6-20070920-sb1250-mac-typedef-9
diff -up --recursive --new-file linux-netdev-2.6.23-rc6-20070920.macro/drivers/net/sb1250-mac.c linux-netdev-2.6.23-rc6-20070920/drivers/net/sb1250-mac.c
--- linux-netdev-2.6.23-rc6-20070920.macro/drivers/net/sb1250-mac.c 2007-09-20 17:55:14.0 +
+++ linux-netdev-2.6.23-rc6-20070920/drivers/net/sb1250-mac.c   2007-09-20 18:09:18.0 +
@@ -140,17 +140,17 @@ MODULE_PARM_DESC(int_timeout_rx, RX tim
  * */
 
 
-typedef enum { sbmac_speed_auto, sbmac_speed_10,
-  sbmac_speed_100, sbmac_speed_1000 } sbmac_speed_t;
+enum sbmac_speed { sbmac_speed_auto, sbmac_speed_10,
+  sbmac_speed_100, sbmac_speed_1000 };
 
-typedef enum { sbmac_duplex_auto, sbmac_duplex_half,
-  sbmac_duplex_full } sbmac_duplex_t;
+enum sbmac_duplex { sbmac_duplex_auto, sbmac_duplex_half,
+   sbmac_duplex_full };
 
-typedef enum { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame,
-  sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t;
+enum sbmac_fc { sbmac_fc_auto, sbmac_fc_disabled, sbmac_fc_frame,
+   sbmac_fc_collision, sbmac_fc_carrier } sbmac_fc_t;
 
-typedef enum { sbmac_state_uninit, sbmac_state_off, sbmac_state_on,
-  sbmac_state_broken } sbmac_state_t;
+enum sbmac_state { sbmac_state_uninit, sbmac_state_off, sbmac_state_on,
+  sbmac_state_broken };
 
 
 /**
@@ -176,55 +176,61 @@ typedef enum { sbmac_state_uninit, sbmac
  *  DMA Descriptor structure
  * */
 
-typedef struct sbdmadscr_s {
+struct sbdmadscr {
uint64_t  dscr_a;
uint64_t  dscr_b;
-} sbdmadscr_t;
-
-typedef unsigned long paddr_t;
+};
 
 /**
  *  DMA Controller structure
  * */
 
-typedef struct sbmacdma_s {
+struct sbmacdma {
 
/*
 * This stuff is used to identify the channel and the registers
 * associated with it.
 */
-
-   struct sbmac_softc *sbdma_eth;  /* back pointer to associated MAC */
-   int  sbdma_channel; /* channel number */
-   int  sbdma_txdir;   /* direction (1=transmit) */
-   int  sbdma_maxdescr;/* total # of descriptors in ring */
+   struct sbmac_softc  *sbdma_eth; /* back pointer to associated
+  MAC */
+   int sbdma_channel;  /* channel number */
+   int sbdma_txdir;/* direction (1=transmit) */
+   int sbdma_maxdescr; /* total # of descriptors
+  in ring */
 #ifdef CONFIG_SBMAC_COALESCE
-   int  sbdma_int_pktcnt;  /* # descriptors rx/tx before interrupt*/
-   int  sbdma_int_timeout; /* # usec rx/tx interrupt */
+   int sbdma_int_pktcnt;
+   /* # descriptors rx/tx
+  before interrupt */
+   int sbdma_int_timeout;
+   /* # usec rx/tx interrupt */
 #endif
-
-   volatile void __iomem *sbdma_config0;   /* DMA config register 0 */
-   volatile void __iomem *sbdma_config1;   /* DMA config register 1 */
-   volatile void __iomem *sbdma_dscrbase;  /* Descriptor base address */
-   volatile void __iomem *sbdma_dscrcnt;   /* Descriptor count register */
-   volatile void __iomem *sbdma_curdscr;   /* current descriptor address */
-   volatile void __iomem *sbdma_oodpktlost;/* pkt drop (rx only) */
-
+   void __iomem*sbdma_config0; /* DMA config register 0 */
+   void __iomem*sbdma_config1; /* DMA config register 1 */
+   void __iomem*sbdma_dscrbase;
+   /* descriptor base address */
+   void __iomem*sbdma_dscrcnt; /* descriptor count register */
+   void __iomem*sbdma_curdscr; /* current descriptor
+  address */
+   void __iomem*sbdma_oodpktlost;
+   /* pkt drop (rx only) */
 
/*
 * This stuff is for maintenance of the ring

Re: [PATCH 1/9] [TCP]: Maintain highest_sack accurately to the highest skb

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:44 +0300

 In general, it should not be necessary to call tcp_fragment for
 already SACKed skbs, but it's better to be safe than sorry. And
 indeed, it can be called from sacktag when a DSACK arrives or
 some ACK (with SACK) reordering occurs (sacktag could be made
 to avoid the call in the latter case though I'm not sure if it's
 worth of the trouble and added complexity to cover such marginal
 case).

 The collapse case has return for SACKED_ACKED case earlier, so
 just WARN_ON if internal inconsistency is detected for some
 reason.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, thanks Ilpo.

Re: [PATCH 2/9] [TCP]: Make fackets_out accurate

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:45 +0300

 Subtraction for fackets_out is unconditional when snd_una
 advances, thus there's no need to do it inside the loop. Just
 make sure correct bounds are honored.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
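
The "correct bounds" remark in the commit message above can be pictured with a small standalone sketch. This is an assumption about the general shape of such a bounded update, not the code from the patch; the helper name fackets_after_ack() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: subtract the newly acked packet count once,
 * outside any per-skb loop, clamping so that the unsigned counter
 * cannot wrap below zero.
 */
static uint32_t fackets_after_ack(uint32_t fackets_out, uint32_t pkts_acked)
{
	uint32_t dec = pkts_acked < fackets_out ? pkts_acked : fackets_out;

	return fackets_out - dec;
}
```

With this shape, acking more packets than the counter holds leaves it at zero instead of producing a huge wrapped value.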

Re: [PATCH 3/9] [TCP]: clear_all_retrans_hints prefixed by tcp_

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:46 +0300

 In addition, fix its function comment spacing.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.

 -/*from STCP */
 -static inline void clear_all_retrans_hints(struct tcp_sock *tp){
 +/* from STCP */
 +static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {

This brace should also be on a line by itself.
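
As a small illustration of the brace placement being asked for, here is a standalone sketch. struct toy_tcp_sock is a stand-in invented for this example (the real struct tcp_sock is far larger), and the toy_ prefixed function names are hypothetical; only the hint field names are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct tcp_sock; only a few hint pointers are modelled. */
struct toy_tcp_sock {
	void *lost_skb_hint;
	void *retransmit_skb_hint;
	void *fastpath_skb_hint;
};

/* Function definitions put the opening brace on a line by itself. */
static inline void toy_clear_retrans_hints_partial(struct toy_tcp_sock *tp)
{
	assert(tp != NULL);
	tp->lost_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
}

static inline void toy_clear_all_retrans_hints(struct toy_tcp_sock *tp)
{
	toy_clear_retrans_hints_partial(tp);
	tp->fastpath_skb_hint = NULL;
}
```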

Re: [PATCH 4/9] [TCP]: Move accounting from tso_acked to clean_rtx_queue

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:47 +0300

 The accounting code is pretty much the same, so it's a shame
 we do it in two places.

 I'm not too sure if the added fully_acked check in MTU probing is
 really what we want; perhaps the added end_seq could be used in
 the after() comparison.

Indeed there are a bunch of tradeoffs to consider when
handling the TSO-partial-ack cases.
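
For context on the after() comparison mentioned above: TCP sequence numbers are 32-bit and wrap around, so they are compared through a signed difference rather than with a plain "<". The sketch below shows that technique in standalone form. The names seq_before(), seq_after() and fully_acked() are chosen for this example, and the fully_acked() helper is only an assumption about how such a check could be phrased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 comes before seq2, even across a 32-bit wrap. */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* True if seq1 comes after seq2. */
static inline bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/* Hypothetical check: is everything up to end_seq covered by snd_una? */
static inline bool fully_acked(uint32_t end_seq, uint32_t snd_una)
{
	return !seq_after(end_seq, snd_una);
}
```

A segment whose end_seq still lies after snd_una is only partially acknowledged, which is the TSO-partial-ack case discussed here.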

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, thanks Ilpo.

Re: [PATCH 5/9] [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:48 +0300

 Implements following cleanups:
 - Comment re-placement (CodingStyle)
 - tcp_tso_acked() local (wrapper-like) variable removal
   (readability)
 - __-types removed (IMHO they make local variables jumpy looking
   and just waste space)
 - acked -> flag (naming conventions elsewhere in TCP code)
 - linebreak adjustments (readability)
 - nested if()s combined (reduced indentation)
 - clarifying newlines added
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.

Re: [PATCH 6/9] [TCP] FRTO: Improve interoperability with other undo_marker users

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:49 +0300

 Basically this change enables it, previously other undo_marker
 users were left with nothing. Reverse undo_marker logic
 completely to get it set right in CA_Loss. On the other hand,
 when spurious RTO is detected, clear it. Clearing might be too
 heavy for some scenarios but seems safe enough starting point
 for now and shouldn't have much effect except in majority of
 cases (if in any).

 By adding a new FLAG_ we avoid looping through write_queue when
 RTO occurs.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.  Thanks for following up on all of this stuff to
get FRTO in shape.

Re: [PATCH 7/9] [TCP] FRTO: Update sysctl documentation

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:50 +0300

 Since the SACK enhanced FRTO was added, the code has been
 under test numerous times so remove experimental claim
 from the documentation. Also be a bit more verbose about
 the usage.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.

Re: [PATCH 8/9] [TCP]: Enable SACK enhanced FRTO (RFC4138) by default

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:51 +0300

 Most of the description that follows comes from my mail to
 netdev (some editing done):
 ...
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, thanks Ilpo!

Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations

2007-09-20 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 15:17:52 +0300

 There's no reason to clear the sacktag skb hint when small part
 of the rexmit queue changes. Account changes (if any) instead when
 fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
 no need to discard SACK tag hint at all.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, and I followed it up with this coding style fixlet.

Thanks!

commit e3723ad866a1e0690f3bc32443180ec1f6657f4a
Author: David S. Miller [EMAIL PROTECTED]
Date:   Thu Sep 20 11:40:37 2007 -0700

[TCP]: Minor coding style fixup.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 07b1faa..991ccdc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1067,14 +1067,16 @@ static inline void tcp_mib_init(void)
 }
 
 /* from STCP */
-static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp) {
+static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
+{
 	tp->lost_skb_hint = NULL;
 	tp->scoreboard_skb_hint = NULL;
 	tp->retransmit_skb_hint = NULL;
 	tp->forward_skb_hint = NULL;
 }
 
-static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp) {
+static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
+{
 	tcp_clear_retrans_hints_partial(tp);
 	tp->fastpath_skb_hint = NULL;
 }

Re: [git patches] net driver updates

2007-09-20 Thread David Miller

From: Jeff Garzik [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 03:26:10 -0400

 Please pull from the 'upstream' branch of
 master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream

 to receive the following changes:

Pulled into net-2.6.24 and pushed out, thanks Jeff!

[PATCH 3/7] CAN: Add raw protocol

2007-09-20 Thread Urs Thuermann

This patch adds the CAN raw protocol.
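
As background for the CAN_RAW_FILTER option that appears in the header below, the match rule for raw CAN filters (a received ID matches when it equals the filter ID under the filter mask) can be sketched in standalone form. struct toy_can_filter only mirrors the can_id/can_mask pair of a CAN filter, and both function names are invented for this example; treat it as an illustration under those assumptions, not as code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the ID/mask pair of a CAN filter. */
struct toy_can_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* One filter matches when the masked received ID equals the masked ID. */
static bool toy_filter_matches(const struct toy_can_filter *f, uint32_t rx_id)
{
	assert(f != NULL);
	return (rx_id & f->can_mask) == (f->can_id & f->can_mask);
}

/* A frame passes a filter set of n entries if any single filter matches. */
static bool toy_any_filter_matches(const struct toy_can_filter *set, size_t n,
				   uint32_t rx_id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_filter_matches(&set[i], rx_id))
			return true;
	return false;
}
```

A filter of { 0x123, 0x7FF } accepts only ID 0x123, while { 0x200, 0x700 } accepts the whole 0x200..0x2FF range; an empty set (n = 0) accepts nothing.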

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/can/raw.h |   31 +
 net/can/Kconfig |   26 +
 net/can/Makefile|3 
 net/can/raw.c   |  828 
 4 files changed, 888 insertions(+)

Index: net-2.6.24/include/linux/can/raw.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/include/linux/can/raw.h  2007-09-20 18:48:59.0 +0200
@@ -0,0 +1,31 @@
+/*
+ * linux/can/raw.h
+ *
+ * Definitions for raw CAN sockets
+ *
+ * Authors: Oliver Hartkopp [EMAIL PROTECTED]
+ *  Urs Thuermann   [EMAIL PROTECTED]
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#ifndef CAN_RAW_H
+#define CAN_RAW_H
+
+#include <linux/can.h>
+
+#define SOL_CAN_RAW (SOL_CAN_BASE + CAN_RAW)
+
+/* for socket options affecting the socket (not the global system) */
+
+enum {
+   CAN_RAW_FILTER = 1, /* set 0 .. n can_filter(s)  */
+   CAN_RAW_ERR_FILTER, /* set filter for error frames   */
+   CAN_RAW_LOOPBACK,   /* local loopback (default:on)   */
+   CAN_RAW_RECV_OWN_MSGS   /* receive my own msgs (default:off) */
+};
+
+#endif
Index: net-2.6.24/net/can/Kconfig
===
--- net-2.6.24.orig/net/can/Kconfig 2007-09-20 18:48:58.0 +0200
+++ net-2.6.24/net/can/Kconfig  2007-09-20 18:48:59.0 +0200
@@ -16,6 +16,32 @@
  If you want CAN support, you should say Y here and also to the
  specific driver for your controller(s) below.
 
+config CAN_RAW
+   tristate "Raw CAN Protocol (raw access with CAN-ID filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Raw CAN protocol option offers access to the CAN bus via
+ the BSD socket API. You probably want to use the raw socket in
+ most cases where no higher level protocol is being used. The raw
+ socket has several filter options e.g. ID-Masking / Errorframes.
+ To receive/send raw CAN messages, use AF_CAN with protocol CAN_RAW.
+
+config CAN_RAW_USER
+   bool "Allow non-root users to access Raw CAN Protocol sockets"
+   depends on CAN_RAW
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_RAW sockets can only send and receive frames to/from CAN
+ interfaces this does not affect security of others networks.
+ Say Y here if you want non-root users to be able to access CAN_RAW
+ sockets.
+
 config CAN_DEBUG_CORE
 	bool "CAN Core debugging messages"
 	depends on CAN
Index: net-2.6.24/net/can/Makefile
===
--- net-2.6.24.orig/net/can/Makefile2007-09-20 18:48:58.0 +0200
+++ net-2.6.24/net/can/Makefile 2007-09-20 18:48:59.0 +0200
@@ -4,3 +4,6 @@
 
 obj-$(CONFIG_CAN)  += can.o
 can-objs   := af_can.o proc.o
+
+obj-$(CONFIG_CAN_RAW)  += can-raw.o
+can-raw-objs   := raw.o
Index: net-2.6.24/net/can/raw.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/net/can/raw.c2007-09-20 18:48:59.0 +0200
@@ -0,0 +1,828 @@
+/*
+ * raw.c - Raw sockets for protocol family CAN
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions, the following disclaimer and
+ *the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License (GPL) version 2 as distributed in the 'COPYING'
+ * file from the main directory of the

[PATCH 1/7] CAN: Allocate protocol numbers for PF_CAN

2007-09-20 Thread Urs Thuermann

This patch adds a protocol/address family number, ARP hardware type,
ethernet packet type, and a line discipline number for the SocketCAN
implementation.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/if_arp.h   |1 +
 include/linux/if_ether.h |1 +
 include/linux/socket.h   |2 ++
 include/linux/tty.h  |3 ++-
 net/core/sock.c  |4 ++--
 5 files changed, 8 insertions(+), 3 deletions(-)

Index: net-2.6.24/include/linux/if_arp.h
===
--- net-2.6.24.orig/include/linux/if_arp.h  2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/include/linux/if_arp.h   2007-09-20 18:48:57.0 +0200
@@ -52,6 +52,7 @@
 #define ARPHRD_ROSE	270
 #define ARPHRD_X25 271 /* CCITT X.25   */
 #define ARPHRD_HWX25   272 /* Boards with X.25 in firmware */
+#define ARPHRD_CAN 280 /* Controller Area Network  */
 #define ARPHRD_PPP 512
 #define ARPHRD_CISCO   513 /* Cisco HDLC   */
 #define ARPHRD_HDLC	ARPHRD_CISCO
Index: net-2.6.24/include/linux/if_ether.h
===
--- net-2.6.24.orig/include/linux/if_ether.h    2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/include/linux/if_ether.h 2007-09-20 18:48:57.0 +0200
@@ -90,6 +90,7 @@
 #define ETH_P_WAN_PPP   0x0007  /* Dummy type for WAN PPP frames*/
 #define ETH_P_PPP_MP	0x0008  /* Dummy type for PPP MP frames */
 #define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type*/
+#define ETH_P_CAN  0x000C  /* Controller Area Network  */
 #define ETH_P_PPPTALK  0x0010  /* Dummy type for Atalk over PPP*/
 #define ETH_P_TR_802_2 0x0011  /* 802.2 frames */
 #define ETH_P_MOBITEX  0x0015  /* Mobitex ([EMAIL PROTECTED])  */
Index: net-2.6.24/include/linux/socket.h
===
--- net-2.6.24.orig/include/linux/socket.h  2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/include/linux/socket.h   2007-09-20 18:48:57.0 +0200
@@ -185,6 +185,7 @@
 #define AF_PPPOX   24  /* PPPoX sockets*/
 #define AF_WANPIPE 25  /* Wanpipe API Sockets */
 #define AF_LLC 26  /* Linux LLC*/
+#define AF_CAN 29  /* Controller Area Network  */
 #define AF_TIPC	30  /* TIPC sockets */
 #define AF_BLUETOOTH   31  /* Bluetooth sockets*/
 #define AF_IUCV	32  /* IUCV sockets */
@@ -220,6 +221,7 @@
 #define PF_PPPOX   AF_PPPOX
 #define PF_WANPIPE AF_WANPIPE
 #define PF_LLC AF_LLC
+#define PF_CAN AF_CAN
 #define PF_TIPC	AF_TIPC
 #define PF_BLUETOOTH   AF_BLUETOOTH
 #define PF_IUCV	AF_IUCV
Index: net-2.6.24/include/linux/tty.h
===
--- net-2.6.24.orig/include/linux/tty.h 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/include/linux/tty.h  2007-09-20 18:48:57.0 +0200
@@ -24,7 +24,7 @@
 #define NR_PTYS	CONFIG_LEGACY_PTY_COUNT   /* Number of legacy ptys */
 #define NR_UNIX98_PTY_DEFAULT  4096  /* Default maximum for Unix98 ptys */
 #define NR_UNIX98_PTY_MAX  (1 << MINORBITS) /* Absolute limit */
-#define NR_LDISCS  17
+#define NR_LDISCS  18
 
 /* line disciplines */
 #define N_TTY  0
@@ -45,6 +45,7 @@
 #define N_SYNC_PPP 14  /* synchronous PPP */
 #define N_HCI  15  /* Bluetooth HCI UART */
 #define N_GIGASET_M101 16  /* Siemens Gigaset M101 serial DECT adapter */
+#define N_SLCAN	17  /* Serial / USB serial CAN Adaptors */
 
 /*
  * This character is the same as _POSIX_VDISABLE: it cannot be used as
Index: net-2.6.24/net/core/sock.c
===
--- net-2.6.24.orig/net/core/sock.c 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/net/core/sock.c  2007-09-20 18:48:57.0 +0200
@@ -154,7 +154,7 @@
   sk_lock-AF_ASH   , sk_lock-AF_ECONET   , sk_lock-AF_ATMSVC   ,
   sk_lock-21   , sk_lock-AF_SNA  , sk_lock-AF_IRDA ,
   sk_lock-AF_PPPOX , sk_lock-AF_WANPIPE  , sk_lock-AF_LLC  ,
-  sk_lock-27   , sk_lock-28  , sk_lock-29  ,
+  sk_lock-27   , sk_lock-28  , sk_lock-AF_CAN  ,
   sk_lock-AF_TIPC  , sk_lock-AF_BLUETOOTH, sk_lock-IUCV,
   sk_lock-AF_RXRPC , sk_lock-AF_MAX
 };
@@ -168,7 +168,7 @@
   slock-AF_ASH   , slock-AF_ECONET   , slock-AF_ATMSVC   ,
   slock-21   , slock-AF_SNA  , slock-AF_IRDA ,
   slock-AF_PPPOX , slock-AF_WANPIPE  , slock-AF_LLC  ,
-  slock-27   , slock-28  , slock-29  ,
+  slock-27   ,

[PATCH 7/7] CAN: Add documentation

2007-09-20 Thread Urs Thuermann

This patch adds documentation for the PF_CAN protocol family.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 Documentation/networking/00-INDEX |2 
 Documentation/networking/can.txt  |  635 ++
 2 files changed, 637 insertions(+)

Index: net-2.6.24/Documentation/networking/can.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/Documentation/networking/can.txt 2007-09-20 18:49:01.0 +0200
@@ -0,0 +1,635 @@
+
+
+can.txt
+
+Readme file for the Controller Area Network Protocol Family (aka Socket CAN)
+
+This file contains
+
+  1 Overview / What is Socket CAN
+
+  2 Motivation / Why using the socket API
+
+  3 Socket CAN concept
+3.1 receive lists
+3.2 loopback
+3.3 network security issues (capabilities)
+3.4 network problem notifications
+
+  4 How to use Socket CAN
+4.1 RAW protocol sockets with can_filters (SOCK_RAW)
+  4.1.1 RAW socket option CAN_RAW_FILTER
+  4.1.2 RAW socket option CAN_RAW_ERR_FILTER
+  4.1.3 RAW socket option CAN_RAW_LOOPBACK
+  4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+4.3 connected transport protocols (SOCK_SEQPACKET)
+4.4 unconnected transport protocols (SOCK_DGRAM)
+
+  5 Socket CAN core module
+5.1 can.ko module params
+5.2 procfs content
+5.3 writing own CAN protocol modules
+
+  6 CAN network drivers
+6.1 general settings
+6.2 loopback
+6.3 CAN controller hardware filters
+6.4 currently supported CAN hardware
+6.5 todo
+
+  7 Credits
+
+
+
+1. Overview / What is Socket CAN
+
+
+The socketcan package is an implementation of CAN protocols
+(Controller Area Network) for Linux.  CAN is a networking technology
+which has widespread use in automation, embedded devices, and
+automotive fields.  While there have been other CAN implementations
+for Linux based on character devices, Socket CAN uses the Berkeley
+socket API, the Linux network stack and implements the CAN device
+drivers as network interfaces.  The CAN socket API has been designed
+as similar as possible to the TCP/IP protocols to allow programmers,
+familiar with network programming, to easily learn how to use CAN
+sockets.
+
+2. Motivation / Why using the socket API
+
+
+There have been CAN implementations for Linux before Socket CAN so the
+question arises, why we have started another project.  Most existing
+implementations come as a device driver for some CAN hardware, they
+are based on character devices and provide comparatively little
+functionality.  Usually, there is only a hardware-specific device
+driver which provides a character device interface to send and
+receive raw CAN frames, directly to/from the controller hardware.
+Queueing of frames and higher-level transport protocols like ISO-TP
+have to be implemented in user space applications.  Also, most
+character-device implementations support only one single process to
+open the device at a time, similar to a serial interface.  Exchanging
+the CAN controller requires employment of another device driver and
+often the need for adaption of large parts of the application to the
+new driver's API.
+
+Socket CAN was designed to overcome all of these limitations.  A new
+protocol family has been implemented which provides a socket interface
+to user space applications and which builds upon the Linux network
+layer, so to use all of the provided queueing functionality.  A device
+driver for CAN controller hardware registers itself with the Linux
+network layer as a network device, so that CAN frames from the
+controller can be passed up to the network layer and on to the CAN
+protocol family module and also vice-versa.  Also, the protocol family
+module provides an API for transport protocol modules to register, so
+that any number of transport protocols can be loaded or unloaded
+dynamically.  In fact, the can core module alone does not provide any
+protocol and cannot be used without loading at least one additional
+protocol module.  Multiple sockets can be opened at the same time,
+on different or the same protocol module and they can listen/send
+frames on different or the same CAN IDs.  Several sockets listening on
+the same interface for frames with the same CAN ID are all passed the
+same received matching CAN frames.  An application wishing to
+communicate using a specific transport protocol, e.g. ISO-TP, just
+selects that protocol when opening the socket, and then can read and
+write application data byte streams, without having to deal with
+CAN-IDs, frames, etc.
+
+Similar functionality visible from user-space

[PATCH 0/7] CAN: Add new PF_CAN protocol family, try #7

2007-09-20 Thread Urs Thuermann

Hello Dave, hello Patrick,

this is the seventh post of the patch series that adds the PF_CAN
protocol family for the Controller Area Network.

Since our last post we have changed the following:

* Changes suggested by Patrick:
  - protect proto_tab[] by a lock.
  - add _rcu to some hlist traversals.
  - use printk_ratelimit() for module autoload failures.
  - make can_proto_unregister() and can_rx_unregister() return void.
  - use return value of can_proto_register() and can_rx_register()
(this also removed a flaw in behavior of raw_bind() and raw_setsockopt()
 in case of failure to can_rx_register() their filters).
  - call kzalloc() with GFP_KERNEL in case NETDEV_REGISTER.
  - use round_jiffies() to calculate expiration times.
  - make some variables static and/or __read_mostly.
  - in can_create() check for net namespace before auto loading modules.
  - add build time check for struct sizes.
  - use skb_share_check() in vcan.
  - fixed some comments.
* Typos in documentation as pointed out by Randy Dunlap and Bill Fink.

The changes in try #6 were:

* Update code to work with namespaces in net-2.6.24.
* Remove SET_MODULE_OWNER() from vcan.

The changes in try #5 were:

* Remove slab destructor from calls to kmem_cache_alloc().
* Add comments about types defined in can.h.
* Update comment on vcan loopback module parameter.
* Fix typo in documentation.

The changes in try #4 were:

* Change vcan network driver to use the new RTNL API, as suggested by
  Patrick.
* Revert our change to use skb->iif instead of skb->cb.  After
  discussion with Patrick and Jamal it turned out, our first
  implementation was correct.
* Use skb_tail_pointer() instead of skb->tail directly.
* Coding style changes to satisfy linux/scripts/checkpatch.pl.
* Minor changes for 64-bit-cleanliness.
* Minor cleanup of #include's

The changes in try #3 were:

* Use skb->sk and skb->pkt_type instead of skb->cb to pass loopback
  flags and originating socket down to the driver and back to the
  receiving socket.  Thanks to Patrick McHardy for pointing out our
  wrong use of skb->cb.
* Use skb->iif instead of skb->cb to pass receiving interface from
  raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
* Set skb->protocol when sending CAN frames to netdevices.
* Removed struct raw_opt and struct bcm_opt and integrated these
  directly into struct raw_sock and bcm_sock resp., like most other
  proto implementations do.
* We have found and fixed race conditions between raw_bind(),
  raw_{set,get}sockopt() and raw_notifier().  This resulted in
  - complete removal of our own notifier list infrastructure in
af_can.c.  raw.c and bcm.c now use normal netdevice notifiers.
  - removal of ro->lock spinlock.  We use lock_sock(sk) now.
  - changed deletion of dev_rcv_lists, which are now marked for
deletion in the netdevice notifier in af_can.c and are actually
deleted when all entries have been deleted using can_rx_unregister().
* Follow changes in 2.6.22 (e.g. ktime_t timestamps in skb).
* Removed obsolete code from vcan.c, as pointed out by Stephen Hemminger.

The changes in try #2 were:

* reduced RCU callback overhead when deleting receiver lists (thx to
  feedback from Paul E. McKenney).
* eliminated some code duplication in net/can/proc.c.
* renamed slock-29 and sk_lock-29 to slock-AF_CAN and sk_lock-AF_CAN in
  net/core/sock.c
* added entry for can.txt in Documentation/networking/00-INDEX
* added error frame definitions in include/linux/can/error.h, which are to
  be used by CAN network drivers.


This patch series applies against net-2.6.24 and is derived from Subversion
revision r484 of http://svn.berlios.de/svnroot/repos/socketcan.
It can be found in the directory
http://svn.berlios.de/svnroot/repos/socketcan/trunk/patch-series/version.

Thanks very much for your work!

Best regards,

Urs Thuermann
Oliver Hartkopp


P.S. Greetings from some BSD and Linux users here at the LUG meeting in
 Braunschweig :-)

[PATCH 5/7] CAN: Add virtual CAN netdevice driver

2007-09-20 Thread Urs Thuermann

This patch adds the virtual CAN bus (vcan) network driver.
The vcan device is just a loopback device for CAN frames, no
real CAN hardware is involved.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 drivers/net/Makefile |1 
 drivers/net/can/Kconfig  |   25 +
 drivers/net/can/Makefile |5 +
 drivers/net/can/vcan.c   |  208 +++
 net/can/Kconfig  |3 
 5 files changed, 242 insertions(+)

Index: net-2.6.24/drivers/net/Makefile
===
--- net-2.6.24.orig/drivers/net/Makefile 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/drivers/net/Makefile 2007-09-20 18:49:00.0 +0200
@@ -10,6 +10,7 @@
 obj-$(CONFIG_CHELSIO_T1) += chelsio/
 obj-$(CONFIG_CHELSIO_T3) += cxgb3/
 obj-$(CONFIG_EHEA) += ehea/
+obj-$(CONFIG_CAN) += can/
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_ATL1) += atl1/
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
Index: net-2.6.24/drivers/net/can/Kconfig
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/drivers/net/can/Kconfig  2007-09-20 18:49:00.0 +0200
@@ -0,0 +1,25 @@
+menu "CAN Device Drivers"
+   depends on CAN
+
+config CAN_VCAN
+   tristate "Virtual Local CAN Interface (vcan)"
+   depends on CAN
+   default N
+   ---help---
+ Similar to the network loopback devices, vcan offers a
+ virtual local CAN interface.
+
+ This driver can also be built as a module.  If so, the module
+ will be called vcan.
+
+config CAN_DEBUG_DEVICES
+   bool "CAN devices debugging messages"
+   depends on CAN
+   default N
+   ---help---
+ Say Y here if you want the CAN device drivers to produce a bunch of
+ debug messages to the system log.  Select this if you are having
+ a problem with CAN support and want to see more of what is going
+ on.
+
+endmenu
Index: net-2.6.24/drivers/net/can/Makefile
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/drivers/net/can/Makefile 2007-09-20 18:49:00.0 +0200
@@ -0,0 +1,5 @@
+#
+#  Makefile for the Linux Controller Area Network drivers.
+#
+
+obj-$(CONFIG_CAN_VCAN) += vcan.o
Index: net-2.6.24/drivers/net/can/vcan.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/drivers/net/can/vcan.c   2007-09-20 18:49:00.0 +0200
@@ -0,0 +1,208 @@
+/*
+ * vcan.c - Virtual CAN interface
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, the following disclaimer and
+ *    the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License (GPL) version 2 as distributed in the 'COPYING'
+ * file from the main directory of the linux kernel source.
+ *
+ * The provided data structures and external interfaces from this code
+ * are not restricted to be used by modules with a GPL compatible license.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/if_ether.h>
+#include

[PATCH 4/7] CAN: Add broadcast manager (bcm) protocol

2007-09-20 Thread Urs Thuermann

This patch adds the CAN broadcast manager (bcm) protocol.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 include/linux/can/bcm.h |   65 +
 net/can/Kconfig |   28 
 net/can/Makefile|3 
 net/can/bcm.c   | 1784 
 4 files changed, 1880 insertions(+)

Index: net-2.6.24/include/linux/can/bcm.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ net-2.6.24/include/linux/can/bcm.h  2007-09-20 18:48:59.0 +0200
@@ -0,0 +1,65 @@
+/*
+ * linux/can/bcm.h
+ *
+ * Definitions for CAN Broadcast Manager (BCM)
+ *
+ * Author: Oliver Hartkopp [EMAIL PROTECTED]
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to [EMAIL PROTECTED]
+ *
+ */
+
+#ifndef CAN_BCM_H
+#define CAN_BCM_H
+
+/**
+ * struct bcm_msg_head - head of messages to/from the broadcast manager
+ * @opcode:    opcode, see enum below.
+ * @flags:     special flags, see below.
+ * @count:     number of frames to send before changing interval.
+ * @ival1:     interval for the first @count frames.
+ * @ival2:     interval for the following frames.
+ * @can_id:    CAN ID of frames to be sent or received.
+ * @nframes:   number of frames appended to the message head.
+ * @frames:    array of CAN frames.
+ */
+struct bcm_msg_head {
+   int opcode;
+   int flags;
+   int count;
+   struct timeval ival1, ival2;
+   canid_t can_id;
+   int nframes;
+   struct can_frame frames[0];
+};
+
+enum {
+   TX_SETUP = 1,   /* create (cyclic) transmission task */
+   TX_DELETE,      /* remove (cyclic) transmission task */
+   TX_READ,        /* read properties of (cyclic) transmission task */
+   TX_SEND,        /* send one CAN frame */
+   RX_SETUP,       /* create RX content filter subscription */
+   RX_DELETE,      /* remove RX content filter subscription */
+   RX_READ,        /* read properties of RX content filter subscription */
+   TX_STATUS,      /* reply to TX_READ request */
+   TX_EXPIRED,     /* notification on performed transmissions (count=0) */
+   RX_STATUS,      /* reply to RX_READ request */
+   RX_TIMEOUT,     /* cyclic message is absent */
+   RX_CHANGED      /* updated CAN frame (detected content change) */
+};
+
+#define SETTIMER            0x0001
+#define STARTTIMER          0x0002
+#define TX_COUNTEVT         0x0004
+#define TX_ANNOUNCE         0x0008
+#define TX_CP_CAN_ID        0x0010
+#define RX_FILTER_ID        0x0020
+#define RX_CHECK_DLC        0x0040
+#define RX_NO_AUTOTIMER     0x0080
+#define RX_ANNOUNCE_RESUME  0x0100
+#define TX_RESET_MULTI_IDX  0x0200
+#define RX_RTR_FRAME        0x0400
+
+#endif /* CAN_BCM_H */
Index: net-2.6.24/net/can/Kconfig
===
--- net-2.6.24.orig/net/can/Kconfig 2007-09-20 18:48:59.0 +0200
+++ net-2.6.24/net/can/Kconfig  2007-09-20 18:48:59.0 +0200
@@ -42,6 +42,34 @@
  Say Y here if you want non-root users to be able to access CAN_RAW
  sockets.
 
+config CAN_BCM
+   tristate "Broadcast Manager CAN Protocol (with content filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Broadcast Manager offers content filtering, timeout monitoring,
+ sending of RTR-frames and cyclic CAN messages without permanent user
+ interaction. The BCM can be 'programmed' via the BSD socket API and
+ informs you on demand e.g. only on content updates / timeouts.
+ You probably want to use the bcm socket in most cases where cyclic
+ CAN messages are used on the bus (e.g. in automotive environments).
+ To use the Broadcast Manager, use AF_CAN with protocol CAN_BCM.
+
+config CAN_BCM_USER
+   bool "Allow non-root users to access CAN broadcast manager sockets"
+   depends on CAN_BCM
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_BCM sockets can only send and receive frames to/from CAN
+ interfaces this does not affect security of other networks.
+ Say Y here if you want non-root users to be able to access CAN_BCM
+ sockets.
+
 config CAN_DEBUG_CORE
    bool "CAN Core debugging messages"
    depends on CAN
Index: net-2.6.24/net/can/Makefile
===
--- net-2.6.24.orig/net/can/Makefile 2007-09-20 18:48:59.0 +0200
+++ net-2.6.24/net/can/Makefile
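
(The archived copy of the patch is cut off here.)

Since struct bcm_msg_head ends in a variable-length frames[] array, a message to the broadcast manager is the head plus nframes CAN frames. The sketch below shows how that size could be computed and how a head for a cyclic TX_SETUP job might be filled in. It is an illustration under assumptions, not code from the patch: all demo_ names are hypothetical stand-ins that mirror the definitions in bcm.h so the example builds without kernel headers, and the meaning given to count, ival1 and ival2 is taken only from the kernel-doc comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Stand-ins mirroring the types used by the patch (hypothetical names). */
typedef uint32_t demo_canid_t;

struct demo_can_frame {
	demo_canid_t can_id;
	uint8_t can_dlc;
	uint8_t data[8];
};

struct demo_bcm_msg_head {
	int opcode;
	int flags;
	int count;
	struct timeval ival1, ival2;
	demo_canid_t can_id;
	int nframes;
	struct demo_can_frame frames[];	/* frames[0] in the patch */
};

/* Values copied from the enum and flag definitions in bcm.h. */
enum { DEMO_TX_SETUP = 1 };
#define DEMO_SETTIMER   0x0001
#define DEMO_STARTTIMER 0x0002

/* Bytes needed for a message head followed by nframes CAN frames. */
size_t demo_bcm_msg_size(int nframes)
{
	return sizeof(struct demo_bcm_msg_head) +
	       (size_t)nframes * sizeof(struct demo_can_frame);
}

/* Fill a head asking for one frame to be sent every interval_ms. */
void demo_fill_tx_setup(struct demo_bcm_msg_head *head,
			demo_canid_t id, long interval_ms)
{
	memset(head, 0, sizeof(*head));
	head->opcode = DEMO_TX_SETUP;
	head->flags = DEMO_SETTIMER | DEMO_STARTTIMER;
	head->count = 0;	/* no frames at ival1, use ival2 throughout */
	head->ival2.tv_sec = interval_ms / 1000;
	head->ival2.tv_usec = (interval_ms % 1000) * 1000;
	head->can_id = id;
	head->nframes = 1;
}
```

A caller would allocate demo_bcm_msg_size(1) bytes, fill the head, copy the frame into frames[0], and hand the whole buffer to the socket in a single write.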

[PATCH 6/7] CAN: Add maintainer entries

2007-09-20 Thread Urs Thuermann

This patch adds entries in the CREDITS and MAINTAINERS file for CAN.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 CREDITS |   16 
 MAINTAINERS |9 +
 2 files changed, 25 insertions(+)

Index: net-2.6.24/CREDITS
===
--- net-2.6.24.orig/CREDITS 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/CREDITS  2007-09-20 18:49:00.0 +0200
@@ -1331,6 +1331,14 @@
 S: 5623 HZ Eindhoven
 S: The Netherlands
 
+N: Oliver Hartkopp
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Andrew Haylett
 E: [EMAIL PROTECTED]
 D: Selection mechanism
@@ -3284,6 +3292,14 @@
 S: F-35042 Rennes Cedex
 S: France
 
+N: Urs Thuermann
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Jon Tombs
 E: [EMAIL PROTECTED]
 W: http://www.esi.us.es/~jon
Index: net-2.6.24/MAINTAINERS
===
--- net-2.6.24.orig/MAINTAINERS 2007-09-20 18:48:21.0 +0200
+++ net-2.6.24/MAINTAINERS  2007-09-20 18:49:00.0 +0200
@@ -975,6 +975,15 @@
 L: [EMAIL PROTECTED]
 S: Maintained
 
+CAN NETWORK LAYER
+P: Urs Thuermann
+M: [EMAIL PROTECTED]
+P: Oliver Hartkopp
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://developer.berlios.de/projects/socketcan/
+S: Maintained
+
 CALGARY x86-64 IOMMU
 P: Muli Ben-Yehuda
 M: [EMAIL PROTECTED]


Re: [PATCH 9/9] [TCP]: Avoid clearing sacktag hint in trivial situations

2007-09-20 Thread Ilpo Järvinen

On Thu, 20 Sep 2007, David Miller wrote:

> From: Ilpo_Järvinen [EMAIL PROTECTED]
> Date: Thu, 20 Sep 2007 15:17:52 +0300
>
> > There's no reason to clear the sacktag skb hint when small part
> > of the rexmit queue changes. Account changes (if any) instead when
> > fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
> > no need to discard SACK tag hint at all.
> >
> > Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
>
> Applied, and I followed it up with this coding style fixlet.

Yeah, thanks for fixing it... I just didn't notice it was left wrong
while doing things that required more thinking to get them right...

-- 
 i.

[git patches] net driver fixes

2007-09-20 Thread Jeff Garzik


This includes the sky2 update that you and sch discussed.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/myri10ge/myri10ge.c |3 +
 drivers/net/phy/phy.c   |1 +
 drivers/net/sky2.c  |  368 +++
 drivers/net/sky2.h  |   41 -
 4 files changed, 292 insertions(+), 121 deletions(-)

Brice Goglin (1):
  myri10ge: Add support for PCI device id 9

Domen Puncer (1):
  phy: export phy_mii_ioctl

Stephen Hemminger (6):
  sky2: fix VLAN receive processing (resend)
  sky2: ethtool speed report bug
  sky2: reorganize chip revision features
  sky2: fe+ chip support
  sky2: receive FIFO checking
  sky2: version 1.18

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 1c42266..556962f 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -3094,9 +3094,12 @@ static void myri10ge_remove(struct pci_dev *pdev)
 }
 
 #define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E 0x0008
+#define PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9   0x0009
 
 static struct pci_device_id myri10ge_pci_tbl[] = {
{PCI_DEVICE(PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E)},
+   {PCI_DEVICE
+(PCI_VENDOR_ID_MYRICOM, PCI_DEVICE_ID_MYRICOM_MYRI10GE_Z8E_9)},
{0},
 };
 
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 0cc4369..cb230f4 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -409,6 +409,7 @@ int phy_mii_ioctl(struct phy_device *phydev,
 
return 0;
 }
+EXPORT_SYMBOL(phy_mii_ioctl);
 
 /**
  * phy_start_aneg - start auto-negotiation for this PHY device
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 5d812de..eaffe55 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -51,7 +51,7 @@
 #include "sky2.h"
 
 #define DRV_NAME       "sky2"
-#define DRV_VERSION    "1.17"
+#define DRV_VERSION    "1.18"
 #define PFX            DRV_NAME " "
 
 /*
@@ -118,12 +118,15 @@ static const struct pci_device_id sky2_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4351) }, /* 88E8036 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4352) }, /* 88E8038 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4353) }, /* 88E8039 */
+   { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4354) }, /* 88E8040 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4356) }, /* 88EC033 */
+   { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x435A) }, /* 88E8048 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4360) }, /* 88E8052 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) }, /* 88E8050 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) }, /* 88E8053 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4363) }, /* 88E8055 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4364) }, /* 88E8056 */
+   { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4365) }, /* 88E8070 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) }, /* 88EC036 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) }, /* 88EC032 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */
@@ -147,6 +150,7 @@ static const char *yukon2_name[] = {
    "Extreme",  /* 0xb5 */
    "EC",       /* 0xb6 */
    "FE",       /* 0xb7 */
+   "FE+",      /* 0xb8 */
 };
 
 static void sky2_set_multicast(struct net_device *dev);
@@ -217,8 +221,7 @@ static void sky2_power_on(struct sky2_hw *hw)
else
sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
-   if (hw->chip_id == CHIP_ID_YUKON_EC_U ||
-       hw->chip_id == CHIP_ID_YUKON_EX) {
+   if (hw->flags & SKY2_HW_ADV_POWER_CTL) {
u32 reg;
 
sky2_pci_write32(hw, PCI_DEV_REG3, 0);
@@ -311,10 +314,8 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
    struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
u16 ctrl, ct1000, adv, pg, ledctrl, ledover, reg;
 
-   if (sky2->autoneg == AUTONEG_ENABLE
-       && !(hw->chip_id == CHIP_ID_YUKON_XL
-            || hw->chip_id == CHIP_ID_YUKON_EC_U
-            || hw->chip_id == CHIP_ID_YUKON_EX)) {
+   if (sky2->autoneg == AUTONEG_ENABLE &&
+       !(hw->flags & SKY2_HW_NEWER_PHY)) {
        u16 ectrl = gm_phy_read(hw, port, PHY_MARV_EXT_CTRL);
 
        ectrl &= ~(PHY_M_EC_M_DSC_MSK | PHY_M_EC_S_DSC_MSK |
@@ -334,7 +335,7 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned port)
 
ctrl = gm_phy_read(hw, port, PHY_MARV_PHY_CTRL);
if (sky2_is_copper(hw)) {
-       if (hw->chip_id == CHIP_ID_YUKON_FE) {
+       if (!(hw->flags & SKY2_HW_GIGABIT)) {
            /* enable automatic crossover */
            ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO) >> 1;
} else {
@@ -346,9 +347,7 @@ static void sky2_phy_init(struct sky2_hw *hw, unsigned

Please pull 'nl80211' branch of wireless-2.6

2007-09-20 Thread John W. Linville

Dave,

This patch adds the basic nl80211 infrastructure.

Thanks!

John

---

Patch is available here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/nl80211/0001-nl80211-add-netlink-interface-to-cfg80211.patch

---

The following changes since commit 0d4cbb5e7f60b2f1a4d8b7f6ea4cc264262c7a01:
  Linus Torvalds (1):
Linux 2.6.23-rc6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git 
nl80211

Johannes Berg (1):
  nl80211: add netlink interface to cfg80211

 include/linux/nl80211.h  |   97 +-
 include/net/cfg80211.h   |   11 +-
 include/net/iw_handler.h |8 +-
 net/mac80211/ieee80211_cfg.c |2 +-
 net/wireless/Kconfig |   17 ++-
 net/wireless/Makefile|1 +
 net/wireless/core.c  |  148 +++
 net/wireless/core.h  |   32 +++
 net/wireless/nl80211.c   |  431 ++
 net/wireless/nl80211.h   |   24 +++
 10 files changed, 762 insertions(+), 9 deletions(-)
 create mode 100644 net/wireless/nl80211.c
 create mode 100644 net/wireless/nl80211.h

diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index 9a30ba2..a5dd030 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -7,7 +7,97 @@
  */
 
 /**
+ * enum nl80211_commands - supported nl80211 commands
+ *
+ * @NL80211_CMD_UNSPEC: unspecified command to catch errors
+ *
+ * @NL80211_CMD_GET_WIPHY: request information about a wiphy or dump request
+ * to get a list of all present wiphys.
+ * @NL80211_CMD_SET_WIPHY: set wiphy name, needs %NL80211_ATTR_WIPHY and
+ * %NL80211_ATTR_WIPHY_NAME.
+ * @NL80211_CMD_NEW_WIPHY: Newly created wiphy, response to get request
+ * or rename notification. Has attributes %NL80211_ATTR_WIPHY and
+ * %NL80211_ATTR_WIPHY_NAME.
+ * @NL80211_CMD_DEL_WIPHY: Wiphy deleted. Has attributes
+ * %NL80211_ATTR_WIPHY and %NL80211_ATTR_WIPHY_NAME.
+ *
+ * @NL80211_CMD_GET_INTERFACE: Request an interface's configuration;
+ * either a dump request on a %NL80211_ATTR_WIPHY or a specific get
+ * on an %NL80211_ATTR_IFINDEX is supported.
+ * @NL80211_CMD_SET_INTERFACE: Set type of a virtual interface, requires
+ * %NL80211_ATTR_IFINDEX and %NL80211_ATTR_IFTYPE.
+ * @NL80211_CMD_NEW_INTERFACE: Newly created virtual interface or response
+ * to %NL80211_CMD_GET_INTERFACE. Has %NL80211_ATTR_IFINDEX,
+ * %NL80211_ATTR_WIPHY and %NL80211_ATTR_IFTYPE attributes. Can also
+ * be sent from userspace to request creation of a new virtual interface,
+ * then requires attributes %NL80211_ATTR_WIPHY, %NL80211_ATTR_IFTYPE and
+ * %NL80211_ATTR_IFNAME.
+ * @NL80211_CMD_DEL_INTERFACE: Virtual interface was deleted, has attributes
+ * %NL80211_ATTR_IFINDEX and %NL80211_ATTR_WIPHY. Can also be sent from
+ * userspace to request deletion of a virtual interface, then requires
+ * attribute %NL80211_ATTR_IFINDEX.
+ *
+ * @NL80211_CMD_MAX: highest used command number
+ * @__NL80211_CMD_AFTER_LAST: internal use
+ */
+enum nl80211_commands {
+/* don't change the order or add anything inbetween, this is ABI! */
+   NL80211_CMD_UNSPEC,
+
+   NL80211_CMD_GET_WIPHY,  /* can dump */
+   NL80211_CMD_SET_WIPHY,
+   NL80211_CMD_NEW_WIPHY,
+   NL80211_CMD_DEL_WIPHY,
+
+   NL80211_CMD_GET_INTERFACE,  /* can dump */
+   NL80211_CMD_SET_INTERFACE,
+   NL80211_CMD_NEW_INTERFACE,
+   NL80211_CMD_DEL_INTERFACE,
+
+   /* add commands here */
+
+   /* used to define NL80211_CMD_MAX below */
+   __NL80211_CMD_AFTER_LAST,
+   NL80211_CMD_MAX = __NL80211_CMD_AFTER_LAST - 1
+};
+
+
+/**
+ * enum nl80211_attrs - nl80211 netlink attributes
+ *
+ * @NL80211_ATTR_UNSPEC: unspecified attribute to catch errors
+ *
+ * @NL80211_ATTR_WIPHY: index of wiphy to operate on, cf.
+ * /sys/class/ieee80211/<phyname>/index
+ * @NL80211_ATTR_WIPHY_NAME: wiphy name (used for renaming)
+ *
+ * @NL80211_ATTR_IFINDEX: network interface index of the device to operate on
+ * @NL80211_ATTR_IFNAME: network interface name
+ * @NL80211_ATTR_IFTYPE: type of virtual interface, see enum nl80211_iftype
+ *
+ * @NL80211_ATTR_MAX: highest attribute number currently defined
+ * @__NL80211_ATTR_AFTER_LAST: internal use
+ */
+enum nl80211_attrs {
+/* don't change the order or add anything inbetween, this is ABI! */
+   NL80211_ATTR_UNSPEC,
+
+   NL80211_ATTR_WIPHY,
+   NL80211_ATTR_WIPHY_NAME,
+
+   NL80211_ATTR_IFINDEX,
+   NL80211_ATTR_IFNAME,
+   NL80211_ATTR_IFTYPE,
+
+   /* add attributes here, update the policy in nl80211.c */
+
+   __NL80211_ATTR_AFTER_LAST,
+   NL80211_ATTR_MAX = __NL80211_ATTR_AFTER_LAST - 1
+};
+
+/**
  * enum nl80211_iftype - (virtual) interface types
+ *
  * @NL80211_IFTYPE_UNSPECIFIED: unspecified type, driver decides
  * @NL80211_IFTYPE_ADHOC: independent BSS member
  * @NL80211_IFTYPE_STATION:
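
(The archived copy of the patch is cut off here.)

The __NL80211_CMD_AFTER_LAST / NL80211_CMD_MAX pair in the enum above is an idiom that lets the highest valid command number track the list automatically as commands are appended. A small sketch of how it could be used for range checking follows. The enum is copied from the patch; the helper demo_cmd_is_known() is hypothetical and not part of nl80211.

```c
#include <stdbool.h>

/* Command numbers as listed in the patch. */
enum nl80211_commands {
	NL80211_CMD_UNSPEC,

	NL80211_CMD_GET_WIPHY,
	NL80211_CMD_SET_WIPHY,
	NL80211_CMD_NEW_WIPHY,
	NL80211_CMD_DEL_WIPHY,

	NL80211_CMD_GET_INTERFACE,
	NL80211_CMD_SET_INTERFACE,
	NL80211_CMD_NEW_INTERFACE,
	NL80211_CMD_DEL_INTERFACE,

	/* new commands are appended here, so MAX moves with them */
	__NL80211_CMD_AFTER_LAST,
	NL80211_CMD_MAX = __NL80211_CMD_AFTER_LAST - 1
};

/* Hypothetical helper: true for any real command, false for UNSPEC
 * and for anything past the last defined command. */
bool demo_cmd_is_known(int cmd)
{
	return cmd > NL80211_CMD_UNSPEC && cmd <= NL80211_CMD_MAX;
}
```

Because existing values never move, appending a command changes only NL80211_CMD_MAX, which is what the "this is ABI" comment in the patch is protecting.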

Re: 2.6.23-rc6-mm1

2007-09-20 Thread Andrew Morton

On Thu, 20 Sep 2007 21:42:44 +0530
Kamalesh Babulal [EMAIL PROTECTED] wrote:

> ...
>
> > i have tested the change with cross compiler for power405 with the same
> > .config
> > with which the build problem is solved, but the build fails with another
> > error
> >
> >   CC [M]  drivers/net/mace.o
> > drivers/net/mace.c: In function 'mace_handle_misc_intrs':
> > drivers/net/mace.c:642: error: 'dev' undeclared (first use in this
> > function)
> > drivers/net/mace.c:642: error: (Each undeclared identifier is reported
> > only once
> > drivers/net/mace.c:642: error: for each function it appears in.)
> > make[2]: *** [drivers/net/mace.o] Error 1
> > make[1]: *** [drivers/net] Error 2
> > make: *** [drivers] Error 2
> >
> > This patch fixes the build failure
> >
> > Signed-off-by: Kamalesh Babulal [EMAIL PROTECTED]
> > ---
> > --- linux-2.6.23-rc6 /drivers/net/mace.c 2007-09-20 17:16:50.0+0530
> > +++ linux-2.6.23-rc6/drivers/net/~mace.c2007-09-20 17:12:
> > 47.0 +0530
> > @@ -633,7 +633,7 @@ static void mace_set_multicast(struct ne
> >      spin_unlock_irqrestore(&mp->lock, flags);
> >  }
> >
> > -static void mace_handle_misc_intrs(struct mace_data *mp, int intr)
> > +static void mace_handle_misc_intrs(struct mace_data *mp, int intr, struct
> > net_device *dev)
> >  {
> >      volatile struct mace __iomem *mb = mp->mace;
> >      static int mace_babbles, mace_jabbers;
> > @@ -669,7 +669,7 @@ static irqreturn_t mace_interrupt(int ir
> >      spin_lock_irqsave(&mp->lock, flags);
> >      intr = in_8(&mb->ir);  /* read interrupt register */
> >      in_8(&mb->xmtrc);  /* get retries */
> > -    mace_handle_misc_intrs(mp, intr);
> > +    mace_handle_misc_intrs(mp, intr, dev);
> >
> >      i = mp->tx_empty;
> >      while (in_8(&mb->pr) & XMTSV) {
> > @@ -682,7 +682,7 @@ static irqreturn_t mace_interrupt(int ir
> >       */
> >      intr = in_8(&mb->ir);
> >      if (intr != 0)
> > -        mace_handle_misc_intrs(mp, intr);
> > +        mace_handle_misc_intrs(mp, intr, dev);
> >      if (mp->tx_bad_runt) {
> >          fs = in_8(&mb->xmtfs);
> >          mp->tx_bad_runt = 0;
> > @@ -817,7 +817,7 @@ static void mace_tx_timeout(unsigned lon
> >      goto out;
> >
> >      /* update various counters */
> > -    mace_handle_misc_intrs(mp, in_8(&mb->ir));
> > +    mace_handle_misc_intrs(mp, in_8(&mb->ir), dev);
> >
> >      cp = mp->tx_cmds + NCMDS_TX * mp->tx_empty;

Thanks, I will fix the wordwrapping in your patch and shall send it in to
David. 

>
> Hi,
>
> The build fails when compiling with the same .config over cross compiler for
> powerpc405
>
> drivers/net/mv643xx_eth.c: In function 'mv643xx_eth_int_handler':
> drivers/net/mv643xx_eth.c:564: error: 'bp' undeclared (first use in this
> function)
> drivers/net/mv643xx_eth.c:564: error: (Each undeclared identifier is
> reported only once
> drivers/net/mv643xx_eth.c:564: error: for each function it appears in.)
> drivers/net/mv643xx_eth.c: At top level:
> drivers/net/mv643xx_eth.c:1010: error: conflicting types for 'mv643xx_poll'
> drivers/net/mv643xx_eth.c:68: error: previous declaration of 'mv643xx_poll'
> was here
> make[2]: *** [drivers/net/mv643xx_eth.o] Error 1
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2

Yes, rather a lot of net drivers got broken in easy-to-fix ways.

[PATCH] sky2: be more selective about FIFO watchdog

2007-09-20 Thread Stephen Hemminger

Be more selective about when to enable the ram buffer watchdog code.
It is unnecessary on XL A3 or later revs, and with Yukon FE
the buffer is so small (4K) that the watchdog detects false positives.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c 2007-09-19 15:36:32.0 -0700
+++ b/drivers/net/sky2.c 2007-09-20 10:43:42.0 -0700
@@ -816,7 +816,8 @@ static void sky2_mac_init(struct sky2_hw
sky2_write8(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_RST_CLR);
sky2_write16(hw, SK_REG(port, TX_GMF_CTRL_T), GMF_OPER_ON);
 
-   if (!(hw->flags & SKY2_HW_RAMBUFFER)) {
+   /* On chips without ram buffer, pause is controlled by MAC level */
+   if (sky2_read8(hw, B2_E_0) == 0) {
sky2_write8(hw, SK_REG(port, RX_GMF_LP_THR), 768/8);
sky2_write8(hw, SK_REG(port, RX_GMF_UP_THR), 1024/8);
 
@@ -1271,7 +1272,7 @@ static int sky2_up(struct net_device *de
struct sky2_port *sky2 = netdev_priv(dev);
    struct sky2_hw *hw = sky2->hw;
    unsigned port = sky2->port;
-   u32 imask;
+   u32 imask, ramsize;
    int cap, err = -ENOMEM;
    struct net_device *otherdev = hw->dev[sky2->port^1];
 
@@ -1326,13 +1327,12 @@ static int sky2_up(struct net_device *de
 
sky2_mac_init(hw, port);
 
-   if (hw->flags & SKY2_HW_RAMBUFFER) {
-       /* Register is number of 4K blocks on internal RAM buffer. */
-       u32 ramsize = sky2_read8(hw, B2_E_0) * 4;
+   /* Register is number of 4K blocks on internal RAM buffer. */
+   ramsize = sky2_read8(hw, B2_E_0) * 4;
+   if (ramsize > 0) {
        u32 rxspace;
 
-       printk(KERN_DEBUG PFX "%s: ram buffer %dK\n", dev->name, ramsize);
-
+       pr_debug(PFX "%s: ram buffer %dK\n", dev->name, ramsize);
        if (ramsize < 16)
            rxspace = ramsize / 2;
        else
@@ -1995,7 +1995,7 @@ static int sky2_change_mtu(struct net_de
 
    synchronize_irq(hw->pdev->irq);
 
-   if (!(hw->flags & SKY2_HW_RAMBUFFER))
+   if (sky2_read8(hw, B2_E_0) == 0)
sky2_set_tx_stfwd(hw, port);
 
ctl = gma_read16(hw, port, GM_GP_CTRL);
@@ -2526,7 +2526,7 @@ static void sky2_watchdog(unsigned long 
++active;
 
/* For chips with Rx FIFO, check if stuck */
-       if ((hw->flags & SKY2_HW_RAMBUFFER) &&
+       if ((hw->flags & SKY2_HW_FIFO_HANG_CHECK) &&
             sky2_rx_hung(dev)) {
            pr_info(PFX "%s: receiver hang detected\n",
                dev->name);
@@ -2684,8 +2684,10 @@ static int __devinit sky2_init(struct sk
    switch(hw->chip_id) {
    case CHIP_ID_YUKON_XL:
        hw->flags = SKY2_HW_GIGABIT
-           | SKY2_HW_NEWER_PHY
-           | SKY2_HW_RAMBUFFER;
+           | SKY2_HW_NEWER_PHY;
+       if (hw->chip_rev < 3)
+           hw->flags |= SKY2_HW_FIFO_HANG_CHECK;
+
break;
 
case CHIP_ID_YUKON_EC_U:
@@ -2711,11 +2713,10 @@ static int __devinit sky2_init(struct sk
            dev_err(&hw->pdev->dev, "unsupported revision Yukon-EC rev A1\n");
            return -EOPNOTSUPP;
        }
-       hw->flags = SKY2_HW_GIGABIT | SKY2_HW_RAMBUFFER;
+       hw->flags = SKY2_HW_GIGABIT | SKY2_HW_FIFO_HANG_CHECK;
break;
 
case CHIP_ID_YUKON_FE:
-       hw->flags = SKY2_HW_RAMBUFFER;
break;
 
case CHIP_ID_YUKON_FE_P:
--- a/drivers/net/sky2.h 2007-09-19 10:05:28.0 -0700
+++ b/drivers/net/sky2.h 2007-09-20 10:44:15.0 -0700
@@ -2063,7 +2063,7 @@ struct sky2_hw {
 #define SKY2_HW_FIBRE_PHY  0x0002
 #define SKY2_HW_GIGABIT        0x0004
 #define SKY2_HW_NEWER_PHY      0x0008
-#define SKY2_HW_RAMBUFFER      0x0010  /* chip has RAM FIFO */
+#define SKY2_HW_FIFO_HANG_CHECK    0x0010
 #define SKY2_HW_NEW_LE         0x0020  /* new LSOv2 format */
 #define SKY2_HW_AUTO_TX_SUM    0x0040  /* new IP decode for Tx */
 #define SKY2_HW_ADV_POWER_CTL  0x0080  /* additional PHY power regs */
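
The net effect of the patch on when the receive-hang watchdog runs can be summarised in a few lines. The following is only a sketch of that policy, not driver code: enum demo_chip and both demo_ functions are hypothetical, the flag values are the ones from sky2.h above, and the "rev < 3" test assumes the comparison that the archive stripped from the sky2_init() hunk, which matches the description "unnecessary on XL A3 or later revs".

```c
/* Flag values as defined in sky2.h by this patch. */
#define SKY2_HW_GIGABIT         0x0004
#define SKY2_HW_NEWER_PHY       0x0008
#define SKY2_HW_FIFO_HANG_CHECK 0x0010

/* Hypothetical chip list; the driver itself uses CHIP_ID_YUKON_*. */
enum demo_chip { DEMO_YUKON_XL, DEMO_YUKON_EC, DEMO_YUKON_FE };

/* Capability flags per chip, following the sky2_init() hunk. */
unsigned long demo_hw_flags(enum demo_chip chip, unsigned rev)
{
	unsigned long flags = 0;

	switch (chip) {
	case DEMO_YUKON_XL:
		flags = SKY2_HW_GIGABIT | SKY2_HW_NEWER_PHY;
		if (rev < 3)	/* assumed: only revisions before A3 */
			flags |= SKY2_HW_FIFO_HANG_CHECK;
		break;
	case DEMO_YUKON_EC:
		flags = SKY2_HW_GIGABIT | SKY2_HW_FIFO_HANG_CHECK;
		break;
	case DEMO_YUKON_FE:
		/* 4K buffer is too small, the check gives false positives */
		break;
	}
	return flags;
}

/* What the watchdog tests before calling the hang detector. */
int demo_should_check_rx_hang(unsigned long flags)
{
	return (flags & SKY2_HW_FIFO_HANG_CHECK) != 0;
}
```

Whether a chip has a RAM buffer at all is now read from the B2_E_0 register instead of a flag, so the flag is free to mean only "run the hang check".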

Re: Pull request for 'r8169-for-jeff-20070919' branch

2007-09-20 Thread Chuck Ebbert

On 09/19/2007 03:56 PM, Francois Romieu wrote:
> Please pull from branch 'r8169-for-jeff-20070919' in repository
>

People are still reporting hangs with this card in 2.6.22.6, are there
any fixes appropriate for that?

