Re: 2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex

2007-06-06 Thread Andrew Morton
On Mon, 04 Jun 2007 14:00:56 -0400 [EMAIL PROTECTED] wrote:

 On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/
 
 Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away.
 
 Under 22-rc3-mm1, it seems to end up wedged and waiting for references to
 go away:
 
 Jun  4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: 
 waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: 
 waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: 
 waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: 
 waiting for ppp0 to become free. Usage count = 8

Interesting refcount.

 'echo t  /proc/sysrq_trigger' shows pppd hung up here:
 
 Jun  4 10:52:57 turing-police kernel: [95478.047892] pppd  D 
 000105ad3830  4968  3815  1 (NOTLB)
 Jun  4 10:52:57 turing-police kernel: [95478.047902]  810008d5fd78 
 0086  81000349
 Jun  4 10:52:57 turing-police kernel: [95478.047911]  810008d5fd28 
 810008d4a040 810003461820 810008d4a2b0
 Jun  4 10:52:57 turing-police kernel: [95478.047920]  000105ad3733 
 0202 00ff 80239795
 Jun  4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
 Jun  4 10:52:57 turing-police kernel: [95478.047936]  [805207a2] 
 schedule_timeout+0x8d/0xb4
 Jun  4 10:52:57 turing-police kernel: [95478.047945]  [805207e2] 
 schedule_timeout_uninterruptible+0x19/0x1b
 Jun  4 10:52:57 turing-police kernel: [95478.047954]  [802397bb] 
 msleep+0x14/0x1e
 Jun  4 10:52:57 turing-police kernel: [95478.047963]  [8048aa4e] 
 netdev_run_todo+0x12f/0x234 
 Jun  4 10:52:57 turing-police kernel: [95478.047972]  [8049166f] 
 rtnl_unlock+0x35/0x37
 Jun  4 10:52:57 turing-police kernel: [95478.047981]  [804894a9] 
 unregister_netdev+0x1e/0x23
 Jun  4 10:52:57 turing-police kernel: [95478.047994]  [88a5f2c2] 
 :ppp_generic:ppp_shutdown_interface+0x67/0xbb
 Jun  4 10:52:57 turing-police kernel: [95478.048018]  [88a5f5b8] 
 :ppp_generic:ppp_release+0x33/0x65
 Jun  4 10:52:57 turing-police kernel: [95478.048028]  [8028d54a] 
 __fput+0xac/0x176
 Jun  4 10:52:57 turing-police kernel: [95478.048036]  [8028d628] 
 fput+0x14/0x16
 Jun  4 10:52:57 turing-police kernel: [95478.048045]  [8028a9c6] 
 filp_close+0x66/0x71
 Jun  4 10:52:57 turing-police kernel: [95478.048054]  [8028bd54] 
 sys_close+0x98/0xd7
 Jun  4 10:52:57 turing-police kernel: [95478.048062]  [8020a03c] 
 tracesys+0xdc/0xe1
 Jun  4 10:52:57 turing-police kernel: [95478.048073]  [2b45cd2429a0]

I don't know what could have caused this, sorry.  If it's still there in next 
-mm
(which is still 10 compile fixes away) it'd be good if you could bisect it.
Suspects would be git-net.patch, get-netdev-all.patch and gregkh-driver-*.patch

Thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Remove incorrect comment from hamradio/scc.

2007-06-06 Thread Dave Jones
scc_rxint doesn't call this function at all.
http://bugzilla.kernel.org/show_bug.cgi?id=8146

Signed-off-by: Dave Jones [EMAIL PROTECTED]

diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
index 6fdaad5..30bed2a 100644
--- a/drivers/net/hamradio/scc.c
+++ b/drivers/net/hamradio/scc.c
@@ -1610,7 +1610,7 @@ static int scc_net_close(struct net_device *dev)
return 0;
 }
 
-/*  receive frame, called from scc_rxint()  */
+/*  receive frame  */
 
 static void scc_net_rx(struct scc_channel *scc, struct sk_buff *skb)
 {

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


typo in via-velocity.c

2007-06-06 Thread Dave Jones
http://bugzilla.kernel.org/show_bug.cgi?id=8160

Signed-off-by: Dave Jones [EMAIL PROTECTED]

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index 25b75b6..b670b97 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -1562,7 +1562,7 @@ static void velocity_print_link_status(struct 
velocity_info *vptr)
if (vptr-mii_status  VELOCITY_LINK_FAIL) {
VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE %s: failed to detect 
cable link\n, vptr-dev-name);
} else if (vptr-options.spd_dpx == SPD_DPX_AUTO) {
-   VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE %s: Link 
autonegation, vptr-dev-name);
+   VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE %s: Link 
auto-negotiation, vptr-dev-name);
 
if (vptr-mii_status  VELOCITY_SPEED_1000)
VELOCITY_PRT(MSG_LEVEL_INFO,  speed 1000M bps);

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive

2007-06-06 Thread Jens Axboe
On Tue, Jun 05 2007, jamal wrote:
 On Tue, 2007-05-06 at 14:20 +0200, Jens Axboe wrote:
 
   
    1800 4ff3 937f e000 6381 7275 0008
   
   Perhaps that hex pattern rings a bell with someone intimate with the
   networking. The remaining wrong bytes don't seem to have anything in
   common.
  
  Ok, the source mac address is 00:18:F3:4F:7F:93 and the destination is
  00:E0:81:63:75:72 which are the middle 12 bytes of the 16.
  
 
 It appears you may have endianness issues and perhaps a 16 bit
 mis-offset. the 0008 at the end looks like a swapped 0x800 which
 is the ethernet type for IPV4.

Yeah, it looks like the first part of the ethernet frame. I'm using an
e1000 on the receive side and a sky2 on the sender.

  Hope that helps someone clue me in as to which network part is reusing
  the data. Do I need to 'pin' the sk_buff until the pipe data has been
  consumed?
 
 I would worry about the driver level first - likely thats where your
 corruption is.

I'd just be a heck-of-a-lot more inclined to suspect my code! Hence I
thought that data reuse for perhaps another packet would be the likely
explanation, especially since it looked like valid network driver data
ending up where it should not.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive

2007-06-06 Thread Jens Axboe
On Tue, Jun 05 2007, Evgeniy Polyakov wrote:
 On Tue, Jun 05, 2007 at 10:05:43AM +0200, Jens Axboe ([EMAIL PROTECTED]) 
 wrote:
  Here's an implementation of tcp network splice receive support. It's
  originally based on the patch set that Intel posted some time ago, but
  has been (close to) 100% reworked.
  
  Now, I'm not a networking guru by any stretch of the imagination, so I'd
  like some input on the direction of the main patch. Is the approach
  feasible? Glaring errors? Missing bits?
 
   263.709262] [ cut here ]
   [  263.713932] kernel BUG at include/linux/mm.h:285!
   [  263.718678] invalid opcode:  [1] PREEMPT SMP 
   [  263.723561] CPU 0 
   [  263.725665] Modules linked in: button loop snd_intel8x0
   snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore psmouse
   snd_page_alloc k8temp i2c_nforcen
   [  263.755666] Pid: 2709, comm: splice-fromnet Not tainted
   2.6.22-rc4-splice #2
   [  263.762759] RIP: 0010:[8038c60c]  [8038c60c]
   skb_splice_bits+0xac/0x1c9
   [  263.771212] RSP: 0018:81003c79fc88  EFLAGS: 00010246
   [  263.776564] RAX:  RBX: 05a8 RCX:
   81003ff04778
   [  263.783743] RDX: 81003ff04778 RSI: 0ab2 RDI:
   0003d52d
   [  263.790925] RBP: 81003c79fdd8 R08:  R09:
   81003d927b78
   [  263.798104] R10: 803b0181 R11: 81003c79fde8 R12:
   81003d52d000
   [  263.805284] R13: 054e R14: 81003d927b78 R15:
   81003bbc6ea0
   [  263.812463] FS:  2ac4089a86d0() GS:804fb000()
   knlGS:
   [  263.820611] CS:  0010 DS:  ES:  CR0: 8005003b
   [  263.826396] CR2: 2ac4088320e0 CR3: 3c987000 CR4:
   06e0
   [  263.833578] Process splice-fromnet (pid: 2709, threadinfo
   81003c79e000, task 81003755c380)
   [  263.842591] Stack:  81003ff04720 
   81003755c380 0046
   [  263.850897]  00c6 0046 81003b0428b8
   81003d0b5b10
   [  263.858543]  00c6 81003d0b5b10 81003b0428b8
   81003d0b5b10
   [  263.865957] Call Trace:
   [  263.868683]  [803dc630] _read_unlock_irq+0x31/0x4e
   [  263.874393]  [803afb54] tcp_splice_data_recv+0x20/0x22
   [  263.880447]  [803afa2b] tcp_read_sock+0xa2/0x1ab
   [  263.885983]  [803afb34] tcp_splice_data_recv+0x0/0x22
   [  263.891951]  [803b01c1] tcp_splice_read+0xae/0x1a3
   [  263.897655]  [8038920f] sock_def_readable+0x0/0x6f
   [  263.903366]  [80384a65] sock_splice_read+0x15/0x17
   [  263.909072]  [8029e773] do_splice_to+0x76/0x88
   [  263.914432]  [8029fcc8] sys_splice+0x1a8/0x232
   [  263.919795]  [802097ce] system_call+0x7e/0x83
   [  263.925067] 
   [  263.926606] 
   [  263.926607] Code: 0f 0b eb fe 44 89 e6 81 e6 ff 0f 00 00 90 ff 42
   08 48 63 55 
   [  263.936418] RIP  [8038c60c] skb_splice_bits+0xac/0x1c9
   [  263.942516]  RSP 81003c79fc88
 
 This a vm_bug_on in get_page().
 
  +static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page 
  *page,
  +   unsigned int len, unsigned int offset)
  +{
  +   struct page *p;
  +
  +   if (unlikely(spd-nr_pages == PIPE_BUFFERS))
  +   return 1;
  +
  +#ifdef NET_COPY_SPLICE
  +   p = alloc_pages(GFP_KERNEL, 0);
  +   if (!p)
  +   return 1;
  +
  +   memcpy(page_address(p) + offset, page_address(page) + offset, len);
  +#else
  +   p = page;
  +   get_page(p);
  +#endif
 
 Some pages have zero reference counter here.
 
 Is commented NET_COPY_SPLICE part from old implementation?
 It will be always slower than existing approach due to allocation
 overhead.

NET_COPY_SPLICE is not supposed to be enabled, it's just a demonstration
of a copy operation if you wanted to test functionality without just
linking in the skb pages. At least that would allow you to test
correctness of the rest of the code, since I don't know if the skb page
linking is entirely correct yet.

But it's somewhat annoying to get pages with zero reference counts
there, I wonder how that happens. I guess if the skb-data originated
from kmalloc() then we don't really know. The main intent there was just
to ensure the page wasn't going away, but clearly it's not good enough
to ensure that reuse isn't taking place.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread Christoph Hellwig
On Tue, Jun 05, 2007 at 01:12:03PM -0700, James Ketrenos wrote:
 Yes, we certainly don't want a driver to be near mainline that does 
 things that the rest of the kernel and other drivers are doing.  We should 
 force them to stay out-of-tree until any and everything is resolved.  
 Heaven forbid that the code should be merged, contributed, and improved 
 upon as a community.
 
 You'd think these guys were trying to enable folks with a driver so they 
 can use the hardware they bought or something.  Its almost laughable.  What 
 idiots.

Maybe you should make sure your companies publishes specs or useable drivers
earlier, and you should stop writing crap code instead of these hypocritical 
comments.  We're a big project and we need to be up to some standards before
we can put code in that we'll have to maintain then.

 
 James
 -
 To unsubscribe from this list: send the line unsubscribe linux-wireless in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sending IPv6 packets via kern_sendmsg

2007-06-06 Thread Roar Bjørgum Rotvik
Anton wrote:
 Hi,
 
 Following on from a previous (now lost :-)) posting, I have been trying 
 to send out IPv6 packets from a kernel module using the kern_sendmsg() 
 function. Since in theory I need this function to be started in 
 interrupt context (actually, softirq context), but since this is 
 impossible because kern_sendmsg() needs to sleep, I have created a work 
 queue which calls the kern_sendmsg() function separately. The work queue 
 is scheduled from the softirq context (actually, at the moment this 
 happens from a Netfilter hook). The work queue function creates a msghdr 
 structure, fills in a sin6_addr structure, calls sock_create_kern() and 
 then uses this socket to send an IPv6 packet, which consists of a header 
 (struct ipv6hdr *iphdr) and some data following on from this. The above 
 packet is placed in the msghdr structure by setting (after the 
 appropriate initializations):
 msg.msg_iov-iov_base = (char *) ip6hdr;
 msg.msg_iov-iov_len = sizeof( struct ipv6hdr + ntohs( 
 ip6hdr-payload_len ) );

Shouldn't this be:
msg.msg_iov-iov_len = sizeof (struct ipv6hdr) + ntohs (ip6hdr-payload_len);

Seems you use sizeof (nthos (...)), and that seems wrong.

-- 
Roar B. Rotvik
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix race in AF_UNIX

2007-06-06 Thread Miklos Szeredi
   Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
   much a non-starter, this will kill performance for multi-threaded
   apps.
  
  That's an rwsem held for read.  It's held for write in unix_gc() only
  for a short duration, and unix_gc() should only rarely be called.  So
  I don't think there's any performance problem here.
 
 It pulls a non-local cacheline into the local thread, that's extremely
 expensive on SMP.

OK, here's an updated patch, that uses -readlock, and passes my
testing.


From: Miklos Szeredi [EMAIL PROTECTED]

There are races involving the garbage collector, that can throw away
perfectly good packets with AF_UNIX sockets in them.

The problems arise when a socket goes from installed to in-flight or
vice versa during garbage collection.  Since gc is done with a
spinlock held, this only shows up on SMP.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux-2.6.22-rc2/net/unix/garbage.c
===
--- linux-2.6.22-rc2.orig/net/unix/garbage.c2007-06-03 23:58:11.0 
+0200
+++ linux-2.6.22-rc2/net/unix/garbage.c 2007-06-06 09:48:36.0 +0200
@@ -186,7 +186,21 @@ void unix_gc(void)
 
forall_unix_sockets(i, s)
{
-   unix_sk(s)-gc_tree = GC_ORPHAN;
+   struct unix_sock *u = unix_sk(s);
+
+   u-gc_tree = GC_ORPHAN;
+
+   /*
+* Hold -readlock to protect against sockets going from
+* in-flight to installed
+*
+* Can't sleep on this, because
+*   a) we are under spinlock
+*   b) skb_recv_datagram() could be waiting for a packet that
+*  is to be sent by this thread
+*/
+   if (!mutex_trylock(u-readlock))
+   goto lock_failed;
}
/*
 *  Everything is now marked
@@ -207,8 +221,6 @@ void unix_gc(void)
 
forall_unix_sockets(i, s)
{
-   int open_count = 0;
-
/*
 *  If all instances of the descriptor are not
 *  in flight we are in use.
@@ -218,10 +230,20 @@ void unix_gc(void)
 *  In this case (see unix_create1()) we set artificial
 *  negative inflight counter to close race window.
 *  It is trick of course and dirty one.
+*
+*  Get the inflight counter first, then the open
+*  counter.  This avoids problems if racing with
+*  sendmsg
+*
+*  If just created socket is not yet attached to
+*  a file descriptor, assume open_count of 1
 */
+   int inflight_count = atomic_read(unix_sk(s)-inflight);
+   int open_count = 1;
+
if (s-sk_socket  s-sk_socket-file)
open_count = file_count(s-sk_socket-file);
-   if (open_count  atomic_read(unix_sk(s)-inflight))
+   if (open_count  inflight_count)
maybe_unmark_and_push(s);
}
 
@@ -300,6 +322,7 @@ void unix_gc(void)
spin_unlock(s-sk_receive_queue.lock);
}
u-gc_tree = GC_ORPHAN;
+   mutex_unlock(u-readlock);
}
spin_unlock(unix_table_lock);
 
@@ -309,4 +332,19 @@ void unix_gc(void)
 
__skb_queue_purge(hitlist);
mutex_unlock(unix_gc_sem);
+   return;
+
+ lock_failed:
+   {
+   struct sock *s1;
+   forall_unix_sockets(i, s1) {
+   if (s1 == s) {
+   spin_unlock(unix_table_lock);
+   mutex_unlock(unix_gc_sem);
+   return;
+   }
+   mutex_unlock(unix_sk(s1)-readlock);
+   }
+   BUG();
+   }
 }

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix race in AF_UNIX

2007-06-06 Thread David Miller
From: Miklos Szeredi [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 10:08:29 +0200

Holding a global mutex over recvmsg() calls under AF_UNIX is pretty
much a non-starter, this will kill performance for multi-threaded
apps.
   
   That's an rwsem held for read.  It's held for write in unix_gc() only
   for a short duration, and unix_gc() should only rarely be called.  So
   I don't think there's any performance problem here.
  
  It pulls a non-local cacheline into the local thread, that's extremely
  expensive on SMP.
 
 OK, here's an updated patch, that uses -readlock, and passes my
 testing.

Thanks a lot, I'll review this as soon as possible unless
someone else beats me to it :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-06-06 Thread Milton Miller


On Jun 5, 2007, at 9:26 PM, Kok, Auke wrote:


Kok, Auke wrote:


Hmm git-revert seems to do the job right. I checked it with git-show 
| patch -p1 -R and the results look OK. The two patches on top of the 
one we want to revert are unrelated enough to apply (manually it 
shows some fuzz, but otherwise it's OK).
Jeff, please `git-revert d52df4a35af569071fda3f4eb08e47cc7023f094` to 
revert the following patch for now:

---
commit  d52df4a35af569071fda3f4eb08e47cc7023f094
Author: Scott Feldman [EMAIL PROTECTED]
Date:   Wed Nov 9 02:18:52 2005 -0500
 [netdrvr e100] experiment with doing RX in a similar manner to 
eepro100
 I was going to say that eepro100's speedo_rx_link() does the 
same DMA
 abuse as e100, but then I noticed one little detail: eepro100 
sets  both
 EL (end of list) and S (suspend) bits in the RFD as it chains it 
 to the
 RFD list.  e100 was only setting the EL bit.  Hmmm, that's  
interesting.
 That means that if HW reads a RFD with the S-bit set,  it'll 
process
 that RFD and then suspend the receive unit.  The  receive unit 
will
 resume when SW clears the S-bit.  There is no need  for SW to 
restart
 the receive unit.  Which means a lot of the receive  unit state 
tracking

 code in the driver goes away.
 So here's a patch against 2.6.14.  (Sorry for inlining it; the 
mailer
 I'm using now will mess with the word wrap).  I can't test this 
on
 XScale (unless someone has an e100 module for Gumstix :) .  It 
should
 be doing exactly what eepro100 does with RFDs.  I don't believe 
this
 change will introduce a performance hit because the S-bit and 
EL-bit  go
 hand-in-hand meaning if we're going to suspend because of the S- 
bit,
 we're on the last resource anyway, so we'll have to wait for SW  
to

 replenish.
 (cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 
commit)

---


A little bit more is needed to explain why we're reverting it for now. 
Jeff, please insert this into the revert commit.


Auke

--
This patch attempted to fix e100 for non-cache coherent memory 
architectures by using the cb style code that eepro100 had and using 
the EL and s bits from the RFD list. Unfortunately the hardware 
doesn't work exactly like this and therefore this patch actually 
breaks e100 on those systems.


on all systems.  (Both the | typo and the removed restart logic).

Reverting the change brings it back to the previously known good state 
for 2.6.22. The pending rewrite in progress to this code can then be 
safely merged later.


Signed-off-by: Auke Kok [EMAIL PROTECTED]
---


milton

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GENETLINK] get pid from userspace

2007-06-06 Thread Ariane Keller

Hi,

I have a user space program which connects to different kernel 
modules with generic netlink. Each module provides its own generic 
netlink family.


For each module to connect to I create a socket and bind it:
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
struct sockaddr_nl addr;
memset(addr, 0, sizeof(addr));
bind(fd, (struct sockaddr *)addr, sizeof(addr));

As far as I understand, this lets the generic netlink controller choose 
a unique pid, which shows up in the kernel module in the callback 
function as genl_info.snd_pid.


Is it possible to get this pid in userspace (without sending a message 
to the kernel module)?


Thanks!

Ariane




-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros

2007-06-06 Thread Mithlesh Thukral
He seems to be the same guy who has dropped the patches from RHEL5 bugzilla 
for now.

--
Mithlesh Thukral

On Tuesday 05 June 2007 00:28, Andy Gospodarek wrote:
 On Sun, Jun 03, 2007 at 11:50:29AM -0400, Jeff Garzik wrote:
  Mithlesh Thukral wrote:
  NetXen: Add NETXEN prefix to a macro
  This patch will add the NETXEN prefix to USER_START macro.
  
  Signed-off by: Wen Xiong [EMAIL PROTECTED]
  Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
  ---
  
   drivers/net/netxen/netxen_nic.h |4 ++--
   drivers/net/netxen/netxen_nic_ethtool.c |2 +-
   drivers/net/netxen/netxen_nic_hw.c  |4 ++--
   drivers/net/netxen/netxen_nic_init.c|4 ++--
   4 files changed, 7 insertions(+), 7 deletions(-)
 
  Your patch description is useless.  Clearly we know -what- it does,
  simply by reading the patch.
 
  But it does not answer the simple question:  why?  why is this needed in
  a bug fix Release Candidate series?

 I can't see why this is needed in an RC branch, but the use of
 generically named macros/variables has been problematic with this driver
 on more 'obscure' arches.

 I submitted a bigger cleanup patch a in March that never got taken so
 this is still a problem.  Here is that patch:

 Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]
 ---

  netxen_nic.h |   47
 --- netxen_nic_ethtool.c |8
 
  netxen_nic_hw.c  |   10 +-
  netxen_nic_init.c|   23 ---
  4 files changed, 45 insertions(+), 43 deletions(-)

 diff --git a/drivers/net/netxen/netxen_nic.h
 b/drivers/net/netxen/netxen_nic.h index dd8ce35..d5f0c06 100644
 --- a/drivers/net/netxen/netxen_nic.h
 +++ b/drivers/net/netxen/netxen_nic.h
 @@ -68,9 +68,10 @@
  #define _NETXEN_NIC_LINUX_SUBVERSION 3
  #define NETXEN_NIC_LINUX_VERSIONID  3.3.3

 -#define NUM_FLASH_SECTORS (64)
 -#define FLASH_SECTOR_SIZE (64 * 1024)
 -#define FLASH_TOTAL_SIZE  (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE)
 +#define NETXEN_NUM_FLASH_SECTORS (64)
 +#define NETXEN_FLASH_SECTOR_SIZE (64 * 1024)
 +#define NETXEN_FLASH_TOTAL_SIZE  (NETXEN_NUM_FLASH_SECTORS \
 + * NETXEN_FLASH_SECTOR_SIZE)

  #define PHAN_VENDOR_ID 0x4040

 @@ -671,28 +672,28 @@ struct netxen_new_user_info {

  /* Flash memory map */
  typedef enum {
 - CRBINIT_START = 0,  /* Crbinit section */
 - BRDCFG_START = 0x4000,  /* board config */
 - INITCODE_START = 0x6000,/* pegtune code */
 - BOOTLD_START = 0x1, /* bootld */
 - IMAGE_START = 0x43000,  /* compressed image */
 - SECONDARY_START = 0x20, /* backup images */
 - PXE_START = 0x3E,   /* user defined region */
 - USER_START = 0x3E8000,  /* User defined region for new boards */
 - FIXED_START = 0x3F  /* backup of crbinit */
 + NETXEN_CRBINIT_START = 0,   /* Crbinit section */
 + NETXEN_BRDCFG_START = 0x4000,   /* board config */
 + NETXEN_INITCODE_START = 0x6000, /* pegtune code */
 + NETXEN_BOOTLD_START = 0x1,  /* bootld */
 + NETXEN_IMAGE_START = 0x43000,   /* compressed image */
 + NETXEN_SECONDARY_START = 0x20,  /* backup images */
 + NETXEN_PXE_START = 0x3E,/* user defined region */
 + NETXEN_USER_START = 0x3E8000,   /* User defined region for new boards */
 + NETXEN_FIXED_START = 0x3F   /* backup of crbinit */
  } netxen_flash_map_t;

 -#define USER_START_OLD PXE_START /* for backward compatibility */
 -
 -#define FLASH_START  (CRBINIT_START)
 -#define INIT_SECTOR  (0)
 -#define PRIMARY_START(BOOTLD_START)
 -#define FLASH_CRBINIT_SIZE   (0x4000)
 -#define FLASH_BRDCFG_SIZE(sizeof(struct netxen_board_info))
 -#define FLASH_USER_SIZE  (sizeof(struct 
 netxen_user_info)/sizeof(u32))
 -#define FLASH_SECONDARY_SIZE (USER_START-SECONDARY_START)
 -#define NUM_PRIMARY_SECTORS  (0x20)
 -#define NUM_CONFIG_SECTORS   (1)
 +#define NETXEN_USER_START_OLD NETXEN_PXE_START   /* for backward
 compatibility */ +
 +#define NETXEN_FLASH_START   (NETXEN_CRBINIT_START)
 +#define NETXEN_INIT_SECTOR   (0)
 +#define NETXEN_PRIMARY_START (NETXEN_BOOTLD_START)
 +#define NETXEN_FLASH_CRBINIT_SIZE(0x4000)
 +#define NETXEN_FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info))
 +#define NETXEN_FLASH_USER_SIZE   (sizeof(struct
 netxen_user_info)/sizeof(u32)) +#define NETXEN_FLASH_SECONDARY_SIZE
   (NETXEN_USER_START-NETXEN_SECONDARY_START) +#define
 NETXEN_NUM_PRIMARY_SECTORS(0x20)
 +#define NETXEN_NUM_CONFIG_SECTORS(1)
  #define PFX NetXen: 
  extern char netxen_nic_driver_name[];

 diff --git a/drivers/net/netxen/netxen_nic_ethtool.c
 b/drivers/net/netxen/netxen_nic_ethtool.c index ee1b5a2..4dfa76b 100644
 --- a/drivers/net/netxen/netxen_nic_ethtool.c
 +++ b/drivers/net/netxen/netxen_nic_ethtool.c
 @@ -94,7 +94,7 @@ static const char
 

Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-06 Thread Patrick McHardy
Urs Thuermann wrote:
 Patrick McHardy [EMAIL PROTECTED] writes:
 
I don't get why you can't directly check the socket option on the
TX path.
 
 
 We have several types of sockets in the PF_CAN family, two of which
 are GPL'ed and which are in the patch series.  These are CAN_RAW and
 CAN_BCM.  The protocol implementations use can_send() in af_can.c to
 send a CAN frame and indicate to can_send() in an int argument,
 whether this frame should be looped back.  Only the raw protocol has a
 socket option (setsockopt(2)) in struct raw_sock for this, bcm always
 sets this to 1 to have the frame looped back.  There is no option in
 struct bcm_sock for this.  In can_send() and in the driver we don't
 know what type of socket skb-sk points to and can't check that
 option.  Changing this would mean we have to add such an option in the
 same position in all CAN socket types and set it to fixed values in
 some of them (e.g. to 1 for bcm).  While it's doable, I wouldn't like
 that very much.
 
 Is there anything that prevents can_send() from using skb-pkt_type to
 pass the loopback flag down to the driver?


No, that should be fine, in fact it should set it anyway even if you
can't use it for this purpose.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 04/09]: Link creation API

2007-06-06 Thread Patrick McHardy
Stephen Hemminger wrote:
 On Wed, 06 Jun 2007 01:17:11 +0200
 Patrick McHardy [EMAIL PROTECTED] wrote:
 
If you want I'll extend existing bridge netlink to use these.


Are you talking about brige-port information or bridge device
configuration? So far the API is not suitable for anything that
currently uses IFLA_PROTINFO because the sender is not the driver
which created the device and doesn't use AF_UNSPEC. For bridge
device configuration it would certainly be nice to have, but I'm
not sure yet how to handle enslave operations. So far my favourite
idea is to add enslave/release operations to rtnl_link_ops and call
them when IFLA_MASTER is set (so the netlink message would look like
this: ifindex: eth0 master: br0 nlmsg_type: RTM_NETLINK). But I
haven't really thought this through yet.
 
 
 Was thinking AF_BRIDGE (we have it already so use it), and both
 add/remove bridge, and enslave/unslave device.


I think we should use two seperate families for bridge devices
and bridge ports. I'll think about the enslave operation some
more and try to add it next week.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Patrick McHardy
Eric W. Biederman wrote:
 Reading through the patches they look usable to me.
 
 Having to patch iproute to create the more interesting network
 devices sucks, but that problem seems fundamental.  We might
 be able to avoid it if we allowed fields to be reused between
 different types of devices but that makes the error checking
 trickier, and we aren't likely to have that many types of
 devices so there likely isn't much value in generalizing.


You don't really need to patch it, installing a new iplink_XXX.so
file is enough. Generalizing driver specific options more than
what we currently have doesn't look very promising. I think your
driver was simple enough to get along with the generic device
attributes though (IFLA_LINK or IFLA_MASTER).

 I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in
 a header file.  So it is easy to get a list of all of the different
 kinds and so we don't conflict.


I don't think conflicts are going to be a problem (it would be
nice if modpost would warn about duplicate aliases though).
How is listing IFLA_KIND types in a header file going to help
get a list? Presuming the user knows what kind of device he
wants to set up and is not just looking for things to play
around with I also don't see any real value in this.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Patrick McHardy
Patrick McHardy wrote:
 The following patches contain the rtnetlink link creation API I promised,
 as well as two simple driver conversion to use the API as an example.
 I've also converted VLAN as a more complex example, but these patches
 need some more work and are most likely not interesting to all the CCed
 parties, so I'm sending them seperately.


I've updated the patches to remove the broken VLAN ID change, added back
some consts, renamed IFLA_INFO_NAME to IFLA_INFO_KIND and rebased to
current net-2.6. The current patches and -git trees can be found at
http://people.netfilter.org/kaber/rtnl_link/

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[WIP][PATCHES] Network xmit batching

2007-06-06 Thread jamal
Folks,

While Krishna and I have been attempting this on the side, progress has
been rather slow - so this is to solicit for more participation so we
can get this over with faster. Success (myself being conservative when
it comes to performance) requires testing on a wide variety of hardware.

The results look promising - certainly from a pktgen perspective where
performance has been known in some cases to go up over 50%.
Tests by Sridhar on a low number of TCP flows also indicate improved
performance as well as lowered CPU use.
 
I have setup the current state of my patches against Linus tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git

This is also clean against 2.6.22-rc4. So if you want just a diff that
will work against 2.6.22-rc4 - i can send it to you.
I also have a tree against Daves net-2.6 at 
git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-net26.git
but iam abandoning that effort until we get this stable due to the
occasional bug that cropped up(like e1000).

I am attaching a pktgen script. There is one experimental parameter
called batch_low - for starters just leave it at 0 in order to reduce
experimental variance. If you have solid results you can muck around
with it.
KK has a netperf script he has been using - if you know netperf your
help will really be appreciated in testing it on your hardware.
KK, can you please post your script?
Testing with forwarding and bridging will also be appreaciated.
Above that, suggestions to changes as long as they are based on
verifiable results or glaringly obvious changes are welcome. My
preference at the moment is to flesh out the patch as is and then
improve on it later if it shows it has some value on a wide variety of
apps. As the subject is indicating this is a WIP and as all eulas
suggest subject to change without notice. 
If you help out, when you post your results, can you please say what
hardware and setup was?

The only real driver that has been changed is e1000 for now. KK is
working on something infiniband related and i plan (if noone beats me)
to get tg3 working. It would be nice if someone converted some 10G
ethernet driver.

cheers,
jamal


pktgen.batch-1-1
Description: application/shellscript


[PATCH] acenic: SET_NETDEV_DEV is always there these days

2007-06-06 Thread Geert Uytterhoeven
acenic: SET_NETDEV_DEV is always there these days

Signed-off-by: Geert Uytterhoeven [EMAIL PROTECTED]

---
Disclaimer: not tested at all

--- a/drivers/net/acenic.c
+++ b/drivers/net/acenic.c
@@ -159,10 +159,6 @@ static struct pci_device_id acenic_pci_t
 };
 MODULE_DEVICE_TABLE(pci, acenic_pci_tbl);
 
-#ifndef SET_NETDEV_DEV
-#define SET_NETDEV_DEV(net, pdev)  do{} while(0)
-#endif
-
 #define ace_sync_irq(irq)  synchronize_irq(irq)
 
 #ifndef offset_in_page

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- Sony Network and Software Technology Center Europe (NSCE)
[EMAIL PROTECTED] --- The Corporate Village, Da Vincilaan 7-D1
Voice +32-2-7008453 Fax +32-2-7008622  B-1935 Zaventem, Belgium
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] acenic: SET_NETDEV_DEV is always there these days

2007-06-06 Thread Jes Sorensen
 Geert == Geert Uytterhoeven [EMAIL PROTECTED] writes:

Geert acenic: SET_NETDEV_DEV is always there these days

Geert Signed-off-by: Geert Uytterhoeven [EMAIL PROTECTED]

Signed-off-by: Jes Sorensen [EMAIL PROTECTED]

Geert --- Disclaimer: not tested at all

Geert --- a/drivers/net/acenic.c +++ b/drivers/net/acenic.c @@ -159,10
Geert +159,6 @@ static struct pci_device_id acenic_pci_t };
Geert MODULE_DEVICE_TABLE(pci, acenic_pci_tbl);
 
Geert -#ifndef SET_NETDEV_DEV -#define SET_NETDEV_DEV(net, pdev) do{}
Geert while(0) -#endif - #define ace_sync_irq(irq) synchronize_irq(irq)
 
Geert  #ifndef offset_in_page

Geert Gr{oetje,eeting}s,

Geert  Geert

Geert -- Geert Uytterhoeven -- Sony Network and Software Technology
Geert Center Europe (NSCE) [EMAIL PROTECTED] --- The
Geert Corporate Village, Da Vincilaan 7-D1 Voice +32-2-7008453 Fax
Geert +32-2-7008622  B-1935 Zaventem, Belgium
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Virtual ethernet tunnel

2007-06-06 Thread Pavel Emelianov
Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.

Mainly it allows to communicate between network namespaces but
it can be used as is as well.

Eric recently sent a similar driver called etun. This
implementation uses another interface - the RTM_NRELINK
message introduced by Patric. The patch fits today netdev
tree with Patrick's patches.

The newlink callback is organized that way to make it easy
to create the peer device in the separate namespace when we
have them in kernel.

The patch for an ip utility is also provided.

Eric, since ethtool interface was from your patch, I add
your Signed-off-by line.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Pavel Emelianov [EMAIL PROTECTED]

---

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 7d57f4a..7e144be 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -119,6 +119,12 @@ config TUN
 
  If you don't know what to use this for, you don't need it.
 
+config VETH
+   tristate Virtual ethernet device
+   ---help---
+ The device is an ethernet tunnel. Devices are created in pairs. When
+ one end receives the packet it appears on its pair and vice versa.
+
 config NET_SB1000
tristate General Instruments Surfboard 1000
depends on PNP
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a77affa..4764119 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o
 obj-$(CONFIG_MACMACE) += macmace.o
 obj-$(CONFIG_MAC89x0) += mac89x0.o
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_NET_NETX) += netx-eth.o
 obj-$(CONFIG_DL2K) += dl2k.o
 obj-$(CONFIG_R8169) += r8169.o
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
new file mode 100644
index 000..6746c91
--- /dev/null
+++ b/drivers/net/veth.c
@@ -0,0 +1,391 @@
+/*
+ *  drivers/net/veth.c
+ *
+ *  Copyright (C) 2007 OpenVZ http://openvz.org, SWsoft Inc
+ *
+ */
+
+#include linux/list.h
+#include linux/netdevice.h
+#include linux/ethtool.h
+#include linux/etherdevice.h
+
+#include net/dst.h
+#include net/xfrm.h
+#include net/veth.h
+
+#define DRV_NAME   veth
+#define DRV_VERSION1.0
+
+struct veth_priv {
+   struct net_device *peer;
+   struct net_device *dev;
+   struct list_head list;
+   struct net_device_stats stats;
+   unsigned ip_summed;
+};
+
+static LIST_HEAD(veth_list);
+
+/*
+ * ethtool interface
+ */
+
+static struct {
+   const char string[ETH_GSTRING_LEN];
+} ethtool_stats_keys[] = {
+   { peer_ifindex },
+};
+
+static int veth_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   cmd-supported  = 0;
+   cmd-advertising= 0;
+   cmd-speed  = SPEED_1;
+   cmd-duplex = DUPLEX_FULL;
+   cmd-port   = PORT_TP;
+   cmd-phy_address= 0;
+   cmd-transceiver= XCVR_INTERNAL;
+   cmd-autoneg= AUTONEG_DISABLE;
+   cmd-maxtxpkt   = 0;
+   cmd-maxrxpkt   = 0;
+   return 0;
+}
+
+static void veth_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo 
*info)
+{
+   strcpy(info-driver, DRV_NAME);
+   strcpy(info-version, DRV_VERSION);
+   strcpy(info-fw_version, N/A);
+}
+
+static void veth_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+   switch(stringset) {
+   case ETH_SS_STATS:
+   memcpy(buf, ethtool_stats_keys, sizeof(ethtool_stats_keys));
+   break;
+   }
+}
+
+static int veth_get_stats_count(struct net_device *dev)
+{
+   return ARRAY_SIZE(ethtool_stats_keys);
+}
+
+static void veth_get_ethtool_stats(struct net_device *dev,
+   struct ethtool_stats *stats, u64 *data)
+{
+   struct veth_priv *priv;
+
+   priv = netdev_priv(dev);
+   data[0] = priv-peer-ifindex;
+}
+
+static u32 veth_get_rx_csum(struct net_device *dev)
+{
+   struct veth_priv *priv;
+
+   priv = netdev_priv(dev);
+   return priv-ip_summed == CHECKSUM_UNNECESSARY;
+}
+
+static int veth_set_rx_csum(struct net_device *dev, u32 data)
+{
+   struct veth_priv *priv;
+
+   priv = netdev_priv(dev);
+   priv-ip_summed = data ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE;
+   return 0;
+}
+
+static u32 veth_get_tx_csum(struct net_device *dev)
+{
+   return (dev-features  NETIF_F_NO_CSUM) != 0;
+}
+
+static int veth_set_tx_csum(struct net_device *dev, u32 data)
+{
+   if (data)
+   dev-features |= NETIF_F_NO_CSUM;
+   else
+   dev-features = ~NETIF_F_NO_CSUM;
+   return 0;
+}
+
+static struct ethtool_ops veth_ethtool_ops = {
+   .get_settings   = veth_get_settings,
+   .get_drvinfo= veth_get_drvinfo,
+   .get_link   = ethtool_op_get_link,
+   .get_rx_csum

[PATCH] Module for ip utility to support veth device

2007-06-06 Thread Pavel Emelianov
The usage is
# ip link add [name] type veth [peer name] [mac mac] [peer_mac mac]

The Makefile is maybe not as beautiful as it could be. It
is to be discussed.

One thing I noticed during testing is the following. When launching
this with link_veth.so module and not specifying any module specific
parameters, the kernel refuses to accept the packet when parsing the
IFLA_LINKINFO. So the hunk for ip/iplink.c doesn't add an empty extra 
header if no extra data expected.

Signed-off-by: Pavel Emelianov [EMAIL PROTECTED]

---

diff --git a/ip/Makefile b/ip/Makefile
index 9a5bfe3..b46bce3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -8,8 +8,9 @@ RTMONOBJ=rtmon.o
 ALLOBJ=$(IPOBJ) $(RTMONOBJ)
 SCRIPTS=ifcfg rtpr routel routef
 TARGETS=ip rtmon
+LIBS=link_veth.so
 
-all: $(TARGETS) $(SCRIPTS)
+all: $(TARGETS) $(SCRIPTS) $(LIBS)
 
 ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL)
 
@@ -24,3 +25,6 @@ clean:
 
 LDLIBS += -ldl
 LDFLAGS+= -Wl,-export-dynamic
+
+%.so: %.c
+   $(CC) $(CFLAGS) -shared $ -o $@
diff --git a/ip/iplink.c b/ip/iplink.c
index 5170419..6975990 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -287,7 +287,7 @@ static int iplink_modify(int cmd, unsign
 strlen(type));
 
lu = get_link_type(type);
-   if (lu) {
+   if (lu  argc) {
struct rtattr * data = NLMSG_TAIL(req.n);
addattr_l(req.n, sizeof(req), IFLA_INFO_DATA, NULL, 0);
 
diff --git a/ip/link_veth.c b/ip/link_veth.c
new file mode 100644
index 000..adfdef6
--- /dev/null
+++ b/ip/link_veth.c
@@ -0,0 +1,77 @@
+#include string.h
+
+#include utils.h
+#include ip_common.h
+#include veth.h
+
+#define ETH_ALEN   6
+
+static void usage(void)
+{
+   printf(Usage: ip link add ... 
+   [peer peer-name] [mac mac] [peer_mac mac]\n);
+}
+
+static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
+   struct nlmsghdr *hdr)
+{
+   __u8 mac[ETH_ALEN];
+
+   for (; argc != 0; argv++, argc--) {
+   if (strcmp(*argv, peer) == 0) {
+   argv++;
+   argc--;
+   if (argc == 0) {
+   usage();
+   return -1;
+   }
+
+   addattr_l(hdr, 1024, VETH_INFO_PEER,
+   *argv, strlen(*argv));
+
+   continue;
+   }
+
+   if (strcmp(*argv, mac) == 0) {
+   argv++;
+   argc--;
+   if (argc == 0) {
+   usage();
+   return -1;
+   }
+
+   if (hexstring_a2n(*argv, mac, sizeof(mac)) == NULL)
+   return -1;
+
+   addattr_l(hdr, 1024, VETH_INFO_MAC,
+   mac, ETH_ALEN);
+   continue;
+   }
+
+   if (strcmp(*argv, peer_mac) == 0) {
+   argv++;
+   argc--;
+   if (argc == 0) {
+   usage();
+   return -1;
+   }
+
+   if (hexstring_a2n(*argv, mac, sizeof(mac)) == NULL)
+   return -1;
+
+   addattr_l(hdr, 1024, VETH_INFO_PEER_MAC,
+   mac, ETH_ALEN);
+   continue;
+   }
+
+   usage();
+   return -1;
+   }
+
+   return 0;
+}
+
+struct link_util veth_link_util = {
+   .id = veth,
+   .parse_opt = veth_parse_opt,
+};
diff --git a/ip/veth.h b/ip/veth.h
new file mode 100644
index 000..74c8e1e
--- /dev/null
+++ b/ip/veth.h
@@ -0,0 +1,13 @@
+#ifndef __NET_VETH_H__
+#define __NET_VETH_H__
+
+enum {
+   VETH_INFO_UNSPEC,
+   VETH_INFO_MAC,
+   VETH_INFO_PEER,
+   VETH_INFO_PEER_MAC,
+
+   VETH_INFO_MAX
+};
+
+#endif
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Module for ip utility to support veth device

2007-06-06 Thread Patrick McHardy
Pavel Emelianov wrote:
 diff --git a/ip/iplink.c b/ip/iplink.c
 index 5170419..6975990 100644
 --- a/ip/iplink.c
 +++ b/ip/iplink.c
 @@ -287,7 +287,7 @@ static int iplink_modify(int cmd, unsign
strlen(type));
  
   lu = get_link_type(type);
 - if (lu) {
 + if (lu  argc) {
   struct rtattr * data = NLMSG_TAIL(req.n);
   addattr_l(req.n, sizeof(req), IFLA_INFO_DATA, NULL, 0);


I've folded this part into my iproute patch, thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Eric W. Biederman
Patrick McHardy [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
 Reading through the patches they look usable to me.
 
 Having to patch iproute to create the more interesting network
 devices sucks, but that problem seems fundamental.  We might
 be able to avoid it if we allowed fields to be reused between
 different types of devices but that makes the error checking
 trickier, and we aren't likely to have that many types of
 devices so there likely isn't much value in generalizing.


 You don't really need to patch it, installing a new iplink_XXX.so
 file is enough. Generalizing driver specific options more than
 what we currently have doesn't look very promising. I think your
 driver was simple enough to get along with the generic device
 attributes though (IFLA_LINK or IFLA_MASTER).

I need to know the other device in the pair of devices I am creating.
If ifindex was selectable IFLA_LINK or IFLA_MASTER might be
interesting however they are currently are not, and I'm not quite
certain about letting a user select the ifindex.  Although there may
come a point when dealing with migration when it makes sense.

Hmm.  I guess if I had a reasonable default I could find out the
pair device by looking at the returned NEW_LINK message.

Looking more close.  IFLA_MASTER does not work because I don't
have a master/slave relationship.

IFLA_LINK looks like it will work.  I don't precisely match the
semantics though which makes me nervous.  In particular my other
device is not something I am sending through but what I am sending
to.  The way the IPv6 code uses iflink to get the link local address
starting with the hardware address of the iflink would be completely
the wrong thing to do in my case.  Now my device won't have the magic
IPv6 tunnel arp type so that code won't trigger.  Still it is
a challenge.

I still think adding a IFLA_PARTNER or a custom attribute is cleaner
in this case.  Slight semantic mismatches are the worst design bugs
to correct.

To some extent this is a practical problematic point for me, because
in the context of multiple network namespaces I could theoretically
have both network devices have the same name and the same ifindex in
different network namespaces.  Although it really doesn't matter
unless they are in the same network namespace in which case they
can't have the same ifindex or ifname.

 I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in
 a header file.  So it is easy to get a list of all of the different
 kinds and so we don't conflict.


 I don't think conflicts are going to be a problem (it would be
 nice if modpost would warn about duplicate aliases though).
 How is listing IFLA_KIND types in a header file going to help
 get a list? Presuming the user knows what kind of device he
 wants to set up and is not just looking for things to play
 around with I also don't see any real value in this.

This isn't about the user this is about maintaining the ABI.

We have to control set of strings for IFLA_KIND.  Having them all
in a single header file means that we can easily look when we add
support for a new kind to see if some other driver has already
used that kind.

The same reason we stick the rest of the enumerations into a header
file.

Strings don't conflict as easily as small integers do, but it is still
possible to have a conflict, so having something like an ifla_kind.h to
hold them would be useful.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread Patrick McHardy
Pavel Emelianov wrote:
 Veth stands for Virtual ETHernet. It is a simple tunnel driver
 that works at the link layer and looks like a pair of ethernet
 devices interconnected with each other.
 
 Mainly it allows to communicate between network namespaces but
 it can be used as is as well.
 
 Eric recently sent a similar driver called etun. This
 implementation uses another interface - the RTM_NRELINK
 message introduced by Patric. The patch fits today netdev
 tree with Patrick's patches.
 
 The newlink callback is organized that way to make it easy
 to create the peer device in the separate namespace when we
 have them in kernel.
 

 +struct veth_priv {
 + struct net_device *peer;
 + struct net_device *dev;
 + struct list_head list;
 + struct net_device_stats stats;


You can use dev-stats instead.

 +static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
 +{
 + struct net_device *rcv = NULL;
 + struct veth_priv *priv, *rcv_priv;
 + int length;
 +
 + skb_orphan(skb);
 +
 + priv = netdev_priv(dev);
 + rcv = priv-peer;
 + rcv_priv = netdev_priv(rcv);
 +
 + if (!(rcv-flags  IFF_UP))
 + goto outf;
 +
 + skb-dev = rcv;

eth_type_trans already sets skb-dev.

 + skb-pkt_type = PACKET_HOST;
 + skb-protocol = eth_type_trans(skb, rcv);
 + if (dev-features  NETIF_F_NO_CSUM)
 + skb-ip_summed = rcv_priv-ip_summed;
 +
 + dst_release(skb-dst);
 + skb-dst = NULL;
 +
 + secpath_reset(skb);
 + nf_reset(skb);


Is skb-mark supposed to survive communication between different
namespaces?

 +static const struct nla_policy veth_policy[VETH_INFO_MAX] = {
 + [VETH_INFO_MAC] = { .type = NLA_BINARY, .len = ETH_ALEN },
 + [VETH_INFO_PEER]= { .type = NLA_STRING },
 + [VETH_INFO_PEER_MAC]= { .type = NLA_BINARY, .len = ETH_ALEN },
 +};


The rtnl_link codes looks fine. I don't like the VETH_INFO_MAC attribute
very much though, we already have a generic device attribute for MAC
addresses. Of course that only allows you to supply one MAC address, so
I'm wondering what you think of allocating only a single device per
newlink operation and binding them in a seperate enslave operation?

 +enum {
 + VETH_INFO_UNSPEC,
 + VETH_INFO_MAC,
 + VETH_INFO_PEER,
 + VETH_INFO_PEER_MAC,
 +
 + VETH_INFO_MAX
 +};

Please follow the

#define VETH_INFO_MAX   (__VETH_INFO_MAX - 1)

convention here.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Patrick McHardy
Eric W. Biederman wrote:
 Patrick McHardy [EMAIL PROTECTED] writes:
 
You don't really need to patch it, installing a new iplink_XXX.so
file is enough. Generalizing driver specific options more than
what we currently have doesn't look very promising. I think your
driver was simple enough to get along with the generic device
attributes though (IFLA_LINK or IFLA_MASTER).
 
 
 I need to know the other device in the pair of devices I am creating.
 If ifindex was selectable IFLA_LINK or IFLA_MASTER might be
 interesting however they are currently are not, and I'm not quite
 certain about letting a user select the ifindex.  Although there may
 come a point when dealing with migration when it makes sense.


It shouldn't be very hard to implement, so far I just didn't see
any use for it.

 Hmm.  I guess if I had a reasonable default I could find out the
 pair device by looking at the returned NEW_LINK message.
 
 Looking more close.  IFLA_MASTER does not work because I don't
 have a master/slave relationship.
 
 IFLA_LINK looks like it will work.  I don't precisely match the
 semantics though which makes me nervous.  In particular my other
 device is not something I am sending through but what I am sending
 to.  The way the IPv6 code uses iflink to get the link local address
 starting with the hardware address of the iflink would be completely
 the wrong thing to do in my case.  Now my device won't have the magic
 IPv6 tunnel arp type so that code won't trigger.  Still it is
 a challenge.
 
 I still think adding a IFLA_PARTNER or a custom attribute is cleaner
 in this case.  Slight semantic mismatches are the worst design bugs
 to correct.


Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to
Pavel to create only a single device per newlink operation and binding
them later, what do you think about that?

I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in
a header file.  So it is easy to get a list of all of the different
kinds and so we don't conflict.


I don't think conflicts are going to be a problem (it would be
nice if modpost would warn about duplicate aliases though).
How is listing IFLA_KIND types in a header file going to help
get a list? Presuming the user knows what kind of device he
wants to set up and is not just looking for things to play
around with I also don't see any real value in this.
 
 
 This isn't about the user this is about maintaining the ABI.
 
 We have to control set of strings for IFLA_KIND.  Having them all
 in a single header file means that we can easily look when we add
 support for a new kind to see if some other driver has already
 used that kind.
 
 The same reason we stick the rest of the enumerations into a header
 file.
 
 Strings don't conflict as easily as small integers do, but it is still
 possible to have a conflict, so having something like an ifla_kind.h to
 hold them would be useful.


Mhh .. we have multiple string based APIs that do just fine. I'd
prefer having someone adding a new driver do a quick grep for
MODULE_ALIAS_RTNL_LINK to adding unused definitions.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread Patrick McHardy
Pavel Emelianov wrote:

 +MODULE_DESCRIPTION(Virtual Ethernet Tunnel);
 +MODULE_LICENSE(GPL v2);

This seems to be missing MODULE_ALIAS_RTNL_LINK(veth);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Alexey Kuznetsov
Hello!

I just suggested to
 Pavel to create only a single device per newlink operation and binding
 them later,

I see some logical inconsistency here.

Look, the second end is supposed to be in another namespace.
It will have identity, which cannot be expressed in any way in namespace,
which is allowed to create the pair: name, ifindex - nothing
is shared between namespaces.

Moreover, do not forget we have two netlink spaces as well.
Events happening in one namespace are reported only inside that namespace.

Actually, the only self-consistent solution, which I see right now
(sorry, did not think that much) is to create the whole pair as
one operation; required parameters (name of partner, identity of namespace)
can be passed as attributes. I guess IFLA_PARTNER approach suggests
the same thing, right?

As response to this action two replies are generated: one RTM_NEWLINK
for one end of device with the whole desciption of partnet
is broadcasted inside this namespace, another RTM_NETLINK with index/name
of partner device is broadcasted inside the second namespace
(and, probably, some attributes, which must be hidden inside namespace,
f.e. identity of main device is suppressed). 

Alexey


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Eric W. Biederman
Patrick McHardy [EMAIL PROTECTED] writes:

 I still think adding a IFLA_PARTNER or a custom attribute is cleaner
 in this case.  Slight semantic mismatches are the worst design bugs
 to correct.


 Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to
 Pavel to create only a single device per newlink operation and binding
 them later, what do you think about that?

I don't think it solves much because we still need a way to report the
partner device.

On the actual using side I think it makes the core of the driver much
more difficult to get right.  

Basically if we can't count on having a partner device we have to
add NULL pointer checks and locking to the packet dispatch which
is otherwise unnecessary.

Eric

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread James Ketrenos
Christoph Hellwig wrote:
 On Tue, Jun 05, 2007 at 01:12:03PM -0700, James Ketrenos wrote:
 Yes, we certainly don't want a driver to be near mainline that does 
 things that the rest of the kernel and other drivers are doing.  We should 
 force them to stay out-of-tree until any and everything is resolved.  
 Heaven forbid that the code should be merged, contributed, and improved 
 upon as a community.

 You'd think these guys were trying to enable folks with a driver so they 
 can use the hardware they bought or something.  Its almost laughable.  What 
 idiots.
 
 Maybe you should make sure your companies publishes specs

I'm flattered that you think I am high enough at Intel to do that.  If it was 
in my power, accurate and complete specifications would be available for all of 
Intel's hardware--for everyone.  I doubt you'll find a single Linux developer 
at Intel who is not fighting to try and reach that goal, including myself.

The reality is, however, (and this may shock you) I am not a vice president or 
head of a major division at Intel.  Its true.  Intel doesn't do what I want 
just because I snap my fingers and say it should be so.

Instead, we try and do what we can -- enable people with a driver (and a 
community to contribute into) so that the hardware they purchased can operate 
to its fullest potential on Linux.  We realize within the community that some 
areas are difficult for people outside of Intel to initially contribute to 
given the lack of specs.  However, there are vast quantities of features and 
functionality that can be enabled and improved given nothing more than the 
driver sources.

Anyway, its obvious from your emails that our efforts are not sufficient and 
you seem to resent that we're even trying.  I apologize.

If you'd like to submit patches to help us make the perfect driver, we're very 
willing to review and merge those in.

James
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Patrick McHardy
Eric W. Biederman wrote:
 Patrick McHardy [EMAIL PROTECTED] writes:
 
 
I still think adding a IFLA_PARTNER or a custom attribute is cleaner
in this case.  Slight semantic mismatches are the worst design bugs
to correct.


Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to
Pavel to create only a single device per newlink operation and binding
them later, what do you think about that?
 
 
 I don't think it solves much because we still need a way to report the
 partner device.

I was thinking of something like this:

ip link add veth0 type veth
ip link add veth1 partner veth0 type veth

ip would resolve veth0 to an ifindex and set IFLA_PARTNER. But Alexey
just raised a few good points, so this might not work.

 On the actual using side I think it makes the core of the driver much
 more difficult to get right.  
 
 Basically if we can't count on having a partner device we have to
 add NULL pointer checks and locking to the packet dispatch which
 is otherwise unnecessary.


All you'd need to do is keep the queue stopped until the device
is bound. No changes to rx or tx path neccessary.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread Stephen Hemminger
On Wed, 06 Jun 2007 19:11:38 +0400
Pavel Emelianov [EMAIL PROTECTED] wrote:

 Veth stands for Virtual ETHernet. It is a simple tunnel driver
 that works at the link layer and looks like a pair of ethernet
 devices interconnected with each other.
 
 Mainly it allows to communicate between network namespaces but
 it can be used as is as well.
 
 Eric recently sent a similar driver called etun. This
 implementation uses another interface - the RTM_NRELINK
 message introduced by Patric. The patch fits today netdev
 tree with Patrick's patches.
 
 The newlink callback is organized that way to make it easy
 to create the peer device in the separate namespace when we
 have them in kernel.
 
 The patch for an ip utility is also provided.
 
 Eric, since ethtool interface was from your patch, I add
 your Signed-off-by line.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
 Signed-off-by: Pavel Emelianov [EMAIL PROTECTED]
 
 ---
 
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index 7d57f4a..7e144be 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -119,6 +119,12 @@ config TUN
  
 If you don't know what to use this for, you don't need it.
  
 +config VETH
 + tristate Virtual ethernet device
 + ---help---
 +   The device is an ethernet tunnel. Devices are created in pairs. When
 +   one end receives the packet it appears on its pair and vice versa.
 +
  config NET_SB1000
   tristate General Instruments Surfboard 1000
   depends on PNP
 diff --git a/drivers/net/Makefile b/drivers/net/Makefile
 index a77affa..4764119 100644
 --- a/drivers/net/Makefile
 +++ b/drivers/net/Makefile
 @@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o
  obj-$(CONFIG_MACMACE) += macmace.o
  obj-$(CONFIG_MAC89x0) += mac89x0.o
  obj-$(CONFIG_TUN) += tun.o
 +obj-$(CONFIG_VETH) += veth.o
  obj-$(CONFIG_NET_NETX) += netx-eth.o
  obj-$(CONFIG_DL2K) += dl2k.o
  obj-$(CONFIG_R8169) += r8169.o
 diff --git a/drivers/net/veth.c b/drivers/net/veth.c
 new file mode 100644
 index 000..6746c91
 --- /dev/null
 +++ b/drivers/net/veth.c
 @@ -0,0 +1,391 @@
 +/*
 + *  drivers/net/veth.c
 + *
 + *  Copyright (C) 2007 OpenVZ http://openvz.org, SWsoft Inc
 + *
 + */
 +
 +#include linux/list.h
 +#include linux/netdevice.h
 +#include linux/ethtool.h
 +#include linux/etherdevice.h
 +
 +#include net/dst.h
 +#include net/xfrm.h
 +#include net/veth.h
 +
 +#define DRV_NAME veth
 +#define DRV_VERSION  1.0
 +
 +struct veth_priv {
 + struct net_device *peer;
 + struct net_device *dev;
 + struct list_head list;
 + struct net_device_stats stats;
 + unsigned ip_summed;
 +};
 +
 +static LIST_HEAD(veth_list);
 +
 +/*
 + * ethtool interface
 + */
 +
 +static struct {
 + const char string[ETH_GSTRING_LEN];
 +} ethtool_stats_keys[] = {
 + { peer_ifindex },
 +};

Seems like a good usage of sysfs attributes, rather than ethtool.

Then you can get rid of all the useless ethtool for what is
basically a virtual device.

 +/*
 + * xmit
 + */
 +
 +static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
 +{
 + struct net_device *rcv = NULL;
 + struct veth_priv *priv, *rcv_priv;
 + int length;
 +
 + skb_orphan(skb);
 +
 + priv = netdev_priv(dev);
 + rcv = priv-peer;
 + rcv_priv = netdev_priv(rcv);
 +
 + if (!(rcv-flags  IFF_UP))
 + goto outf;
 +
 + skb-dev = rcv;
 + skb-pkt_type = PACKET_HOST;
 + skb-protocol = eth_type_trans(skb, rcv);
 + if (dev-features  NETIF_F_NO_CSUM)
 + skb-ip_summed = rcv_priv-ip_summed;
 +
 + dst_release(skb-dst);
 + skb-dst = NULL;
 +
 + secpath_reset(skb);
 + nf_reset(skb);
 +
 + length = skb-len;
 +
 + priv-stats.tx_bytes += length;
 + priv-stats.tx_packets++;
 +
 + rcv_priv-stats.rx_bytes += length;
 + rcv_priv-stats.rx_packets++;

Per-cpu stats? This will cacheline thrash.

 + netif_rx(skb);
 + return 0;
 +
 +outf:
 + kfree_skb(skb);
 + priv-stats.tx_dropped++;
 + return 0;
 +}



-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Patrick McHardy
Alexey Kuznetsov wrote:
   I just suggested to
Pavel to create only a single device per newlink operation and binding
them later,
 
 
 I see some logical inconsistency here.
 
 Look, the second end is supposed to be in another namespace.
 It will have identity, which cannot be expressed in any way in namespace,
 which is allowed to create the pair: name, ifindex - nothing
 is shared between namespaces.


Good point, I didn't think of that. Is there a version of this patch
that already uses different namespaces so I can look at it?

Are network namespace completely seperated or is there some hierarchy
with all lower namespaces visible above or something like that?

 Moreover, do not forget we have two netlink spaces as well.
 Events happening in one namespace are reported only inside that namespace.
 
 Actually, the only self-consistent solution, which I see right now
 (sorry, did not think that much) is to create the whole pair as
 one operation; required parameters (name of partner, identity of namespace)
 can be passed as attributes. I guess IFLA_PARTNER approach suggests
 the same thing, right?


I imagined it more as a bind operation, pretty similar to enslave, so
it would only contain an ifindex, no parameters. But as you say that
doesn't work, so I guess we'd have to nest an entire ifinfomsg + the
attributes for the partner device under it .. not exactly pretty.

 As response to this action two replies are generated: one RTM_NEWLINK
 for one end of device with the whole desciption of partnet
 is broadcasted inside this namespace, another RTM_NETLINK with index/name
 of partner device is broadcasted inside the second namespace
 (and, probably, some attributes, which must be hidden inside namespace,
 f.e. identity of main device is suppressed). 


The identity of the main device has no meaning within a different
namespace, but are there other reasons for hiding it?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 04/09]: Link creation API

2007-06-06 Thread Eric W. Biederman

 +static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 +{
 + struct rtnl_link_ops *ops;
 + struct net_device *dev;
 + struct ifinfomsg *ifm;
 + char name[MODULE_NAME_LEN];
 + char ifname[IFNAMSIZ];
 + struct nlattr *tb[IFLA_MAX+1];
 + struct nlattr *linkinfo[IFLA_INFO_MAX+1];
 + int err;
 +
 + err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 + if (err  0)
 + return err;
 +
 + if (tb[IFLA_IFNAME])
 + nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
 + else
 + ifname[0] = '\0';
 +
 + ifm = nlmsg_data(nlh);
 + if (ifm-ifi_index  0)
 + dev = __dev_get_by_index(ifm-ifi_index);
 + else if (ifname[0])
 + dev = __dev_get_by_name(ifname);
 + else
 + dev = NULL;
 +
 + if (tb[IFLA_LINKINFO]) {
 + err = nla_parse_nested(linkinfo, IFLA_INFO_MAX,
 +tb[IFLA_LINKINFO], ifla_info_policy);
 + if (err  0)
 + return err;
 + } else
 + memset(linkinfo, 0, sizeof(linkinfo));
 +
 + if (linkinfo[IFLA_INFO_NAME]) {
 + nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name));
 + ops = rtnl_link_ops_get(name);

Ugh.  Shouldn't we have the request_module logic here?
Otherwise it looks like we can skip the validate method and 
have other weird interactions.


 + } else {
 + name[0] = '\0';
 + ops = NULL;
 + }
 +
 + if (1) {
 + struct nlattr *attr[ops ? ops-maxtype + 1 : 0], **data = NULL;
 +
 + if (ops) {
 + if (ops-maxtype  linkinfo[IFLA_INFO_DATA]) {
 + err = nla_parse_nested(attr, ops-maxtype,
 + linkinfo[IFLA_INFO_DATA],
 +ops-policy);
 + if (err  0)
 + return err;
 + data = attr;
 + }
 + if (ops-validate) {
 + err = ops-validate(tb, data);
 + if (err  0)
 + return err;
 + }
 + }
 +
 + if (dev) {
 + int modified = 0;
 +
 + if (nlh-nlmsg_flags  NLM_F_EXCL)
 + return -EEXIST;
 + if (nlh-nlmsg_flags  NLM_F_REPLACE)
 + return -EOPNOTSUPP;
 +
 + if (linkinfo[IFLA_INFO_DATA]) {
 + if (!ops || ops != dev-rtnl_link_ops ||
 + !ops-changelink)
 + return -EOPNOTSUPP;
 +
 + err = ops-changelink(dev, tb, data);
 + if (err  0)
 + return err;
 + modified = 1;
 + }
 +
 + return do_setlink(dev, ifm, tb, ifname, modified);
 + }
 +
 + if (!(nlh-nlmsg_flags  NLM_F_CREATE))
 + return -ENODEV;
 +
 + if (ifm-ifi_index)
 + return -EINVAL;
 + if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP])
 + return -EOPNOTSUPP;
 +
 +#ifdef CONFIG_KMOD
 + if (!ops  name[0]) {
 + /* race condition: device may be created while rtnl is
 +  * unlocked, final register_netdevice will catch it.
 +  */
 + __rtnl_unlock();
 + request_module(rtnl-link-%s, name);
 + rtnl_lock();
 + ops = rtnl_link_ops_get(name);
 + }
 +#endif
 + if (!ops)
 + return -EOPNOTSUPP;

 +
 + if (!ifname[0])
 + snprintf(ifname, IFNAMSIZ, %s%%d, ops-name);
 + dev = alloc_netdev(ops-priv_size, ifname, ops-setup);
 + if (!dev)
 + return -ENOMEM;
 +
 + if (strchr(dev-name, '%')) {
 + err = dev_alloc_name(dev, dev-name);
 + if (err  0)
 + goto err_free;
 + }
 + dev-rtnl_link_ops = ops;
 +
 + if (tb[IFLA_MTU])
 + dev-mtu = nla_get_u32(tb[IFLA_MTU]);
 + if (tb[IFLA_TXQLEN])
 + dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
 + if (tb[IFLA_WEIGHT])
 + dev-weight = nla_get_u32(tb[IFLA_WEIGHT]);
 + if (tb[IFLA_OPERSTATE])
 + set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
 + if (tb[IFLA_LINKMODE])
 + dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
 +
 +

Re: [RFC RTNETLINK 04/09]: Link creation API

2007-06-06 Thread Patrick McHardy
Eric W. Biederman wrote:
+ if (linkinfo[IFLA_INFO_NAME]) {
+ nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name));
+ ops = rtnl_link_ops_get(name);
 
 
 Ugh.  Shouldn't we have the request_module logic here?
 Otherwise it looks like we can skip the validate method and 
 have other weird interactions.


Good catch. The easiest solution seems be to simply replay the
request after successful module load, which also avoids the
device lookup race.

Something like this (untested).

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8d2f817..f2868b0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -930,6 +930,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
nlmsghdr *nlh, void *arg)
struct nlattr *linkinfo[IFLA_INFO_MAX+1];
int err;
 
+replay:
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
if (err  0)
return err;
@@ -1012,19 +1013,19 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
nlmsghdr *nlh, void *arg)
if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP])
return -EOPNOTSUPP;
 
+   if (!ops) {
 #ifdef CONFIG_KMOD
-   if (!ops  kind[0]) {
-   /* race condition: device may be created while rtnl is
-* unlocked, final register_netdevice will catch it.
-*/
-   __rtnl_unlock();
-   request_module(rtnl-link-%s, kind);
-   rtnl_lock();
-   ops = rtnl_link_ops_get(kind);
-   }
+   if (kind[0]) {
+   __rtnl_unlock();
+   request_module(rtnl-link-%s, kind);
+   rtnl_lock();
+   ops = rtnl_link_ops_get(kind);
+   if (ops)
+   goto replay;
+   }
 #endif
-   if (!ops)
return -EOPNOTSUPP;
+   }
 
if (!ifname[0])
snprintf(ifname, IFNAMSIZ, %s%%d, ops-kind);


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Eric W. Biederman
Patrick McHardy [EMAIL PROTECTED] writes:

 Alexey Kuznetsov wrote:
  I just suggested to
Pavel to create only a single device per newlink operation and binding
them later,
 
 
 I see some logical inconsistency here.
 
 Look, the second end is supposed to be in another namespace.
 It will have identity, which cannot be expressed in any way in namespace,
 which is allowed to create the pair: name, ifindex - nothing
 is shared between namespaces.


 Good point, I didn't think of that. Is there a version of this patch
 that already uses different namespaces so I can look at it?

We have posted patches a couple of times, and veth or etun were always
a part of it.  But except for some book keeping details we really don't
care.

 Are network namespace completely seperated or is there some hierarchy
 with all lower namespaces visible above or something like that?

Completely separated.  The goal is to look like two separate machines
to user space, with respect to the network stack.

There is a bit of a hierarchy usage wise.  Because frequently only
one namespace will have real hardware devices in it.  So everything
needs to route through there.  But that detail is a usage detail
and is easiest not to reflect in the actual implementation.

 I imagined it more as a bind operation, pretty similar to enslave, so
 it would only contain an ifindex, no parameters. But as you say that
 doesn't work, so I guess we'd have to nest an entire ifinfomsg + the
 attributes for the partner device under it .. not exactly pretty.

In the model I'm working in, is that there is a separate operation:
move device to other namespace, which should work for any network device.

So there should be an interval immediately after device creation
when both devices are in the same namespace, and then one of the
pair is moved to another namespace.

 As response to this action two replies are generated: one RTM_NEWLINK
 for one end of device with the whole desciption of partnet
 is broadcasted inside this namespace, another RTM_NETLINK with index/name
 of partner device is broadcasted inside the second namespace
 (and, probably, some attributes, which must be hidden inside namespace,
 f.e. identity of main device is suppressed). 


 The identity of the main device has no meaning within a different
 namespace, but are there other reasons for hiding it?

Not really.  We can already recognized the type of the device.

Eric

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros

2007-06-06 Thread wendy xiong
On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote:
 Mithlesh Thukral wrote:
  NetXen: Add NETXEN prefix to a macro
  This patch will add the NETXEN prefix to USER_START macro.
  
  Signed-off by: Wen Xiong [EMAIL PROTECTED]
  Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
  ---
  
   drivers/net/netxen/netxen_nic.h |4 ++--
   drivers/net/netxen/netxen_nic_ethtool.c |2 +-
   drivers/net/netxen/netxen_nic_hw.c  |4 ++--
   drivers/net/netxen/netxen_nic_init.c|4 ++--
   4 files changed, 7 insertions(+), 7 deletions(-)
 
 Your patch description is useless.  Clearly we know -what- it does, 
 simply by reading the patch.
 
 But it does not answer the simple question:  why?  why is this needed in 
 a bug fix Release Candidate series?

Hi Jeff,

When we backported netxen driver to 2.6.9 kernel on ppc, we saw the
compile errors for USER_START. Because there is another definition for
USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c.

So we like to add NETXEN before USER_START in netxen device driver, then
netxen driver can be compiled correctly in 2.6.9 kernel.

Let me know if you any question for this patch.

Thanks,
wendy Xiong



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 04/09]: Link creation API

2007-06-06 Thread Eric W. Biederman
Patrick McHardy [EMAIL PROTECTED] writes:

 Eric W. Biederman wrote:
+if (linkinfo[IFLA_INFO_NAME]) {
+nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name));
+ops = rtnl_link_ops_get(name);
 
 
 Ugh.  Shouldn't we have the request_module logic here?
 Otherwise it looks like we can skip the validate method and 
 have other weird interactions.


 Good catch. The easiest solution seems be to simply replay the
 request after successful module load, which also avoids the
 device lookup race.

Looks reasonable to me.   There might be a variable or two that
we need to make certain is initialized but at a quick glance it
looks ok.

Eric


 Something like this (untested).

 diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
 index 8d2f817..f2868b0 100644
 --- a/net/core/rtnetlink.c
 +++ b/net/core/rtnetlink.c
 @@ -930,6 +930,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
 nlmsghdr
 *nlh, void *arg)
   struct nlattr *linkinfo[IFLA_INFO_MAX+1];
   int err;
  
 +replay:
   err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
   if (err  0)
   return err;
 @@ -1012,19 +1013,19 @@ static int rtnl_newlink(struct sk_buff *skb, struct
 nlmsghdr *nlh, void *arg)
   if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP])
   return -EOPNOTSUPP;
  
 + if (!ops) {
  #ifdef CONFIG_KMOD
 - if (!ops  kind[0]) {
 - /* race condition: device may be created while rtnl is
 -  * unlocked, final register_netdevice will catch it.
 -  */
 - __rtnl_unlock();
 - request_module(rtnl-link-%s, kind);
 - rtnl_lock();
 - ops = rtnl_link_ops_get(kind);
 - }
 + if (kind[0]) {
 + __rtnl_unlock();
 + request_module(rtnl-link-%s, kind);
 + rtnl_lock();
 + ops = rtnl_link_ops_get(kind);
 + if (ops)
 + goto replay;
 + }
  #endif
 - if (!ops)
   return -EOPNOTSUPP;
 + }
  
   if (!ifname[0])
   snprintf(ifname, IFNAMSIZ, %s%%d, ops-kind);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros

2007-06-06 Thread Jeff Garzik
On Wed, Jun 06, 2007 at 12:52:44PM -0500, wendy xiong wrote:
 On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote:
  Mithlesh Thukral wrote:
   NetXen: Add NETXEN prefix to a macro
   This patch will add the NETXEN prefix to USER_START macro.
   
   Signed-off by: Wen Xiong [EMAIL PROTECTED]
   Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
   ---
   
drivers/net/netxen/netxen_nic.h |4 ++--
drivers/net/netxen/netxen_nic_ethtool.c |2 +-
drivers/net/netxen/netxen_nic_hw.c  |4 ++--
drivers/net/netxen/netxen_nic_init.c|4 ++--
4 files changed, 7 insertions(+), 7 deletions(-)
  
  Your patch description is useless.  Clearly we know -what- it does, 
  simply by reading the patch.
  
  But it does not answer the simple question:  why?  why is this needed in 
  a bug fix Release Candidate series?
 
 Hi Jeff,
 
 When we backported netxen driver to 2.6.9 kernel on ppc, we saw the
 compile errors for USER_START. Because there is another definition for
 USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c.
 
 So we like to add NETXEN before USER_START in netxen device driver, then
 netxen driver can be compiled correctly in 2.6.9 kernel.
 
 Let me know if you any question for this patch.

That answers my questions, but doesn't address the issue:  You need
to regenerate the patch with the description improved as described.

The patch description is copied verbatim into the kernel changelog, and
preserved verbatim for all eternity.  Four years from now, we need to be
able to read the patch description and understand -why- the NETXEN_xxx
prefix was added.  Simply stating that the NETXEN_xxx prefix was added
is useless, as it is self-evident from reading the patch itself.

Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] NetXen: Fix ping issue

2007-06-06 Thread wendy xiong
On Sun, 2007-06-03 at 11:51 -0400, Jeff Garzik wrote:
 Mithlesh Thukral wrote:
  NetXen: Fix initialization and subsequent ping issue
  This patch will fix the initialization and ping issues seen on
  certain PPC architecture blades.
  
  Signed-off by: Wen Xiong [EMAIL PROTECTED]
  Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
  ---
  
   drivers/net/netxen/netxen_nic_main.c |7 +++
   drivers/net/netxen/netxen_nic_niu.c  |8 ++--
   2 files changed, 9 insertions(+), 6 deletions(-)
 
 Again, your patch description is useless.
 
 You should describe the problem being fixed, and how/why the changes 
 seen in the patch actually fix the issue.
 
Hi Jeff,

Ping problem description:
After we moved up netxen adapter's firmware to 3.4.19, we saw this ping
problem on x/pBlade.  After configured interface up, ping -c 1
10.10.10.10 failed. Netxen adapter couldn't accept ARP broadcast packet
somehow. If I manually added MAC address in the ARP table, then ping
start working.

netxen adapter should finish initilization after system boot. But on
some platform, looks netxen adapter didn't initilization correctly after
system boot up, so have to re-load the firmware again in probe routine.
Also re-initilization netxen_config_0 and netxen_config_1 registers.

Let me know if you have any question for this patch.

Thanks
Wendy Xiong





-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros

2007-06-06 Thread wendy xiong
On Wed, 2007-06-06 at 14:05 -0400, Jeff Garzik wrote:
 On Wed, Jun 06, 2007 at 12:52:44PM -0500, wendy xiong wrote:
  On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote:
   Mithlesh Thukral wrote:
NetXen: Add NETXEN prefix to a macro
This patch will add the NETXEN prefix to USER_START macro.

Signed-off by: Wen Xiong [EMAIL PROTECTED]
Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
---

 drivers/net/netxen/netxen_nic.h |4 ++--
 drivers/net/netxen/netxen_nic_ethtool.c |2 +-
 drivers/net/netxen/netxen_nic_hw.c  |4 ++--
 drivers/net/netxen/netxen_nic_init.c|4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)
   
   Your patch description is useless.  Clearly we know -what- it does, 
   simply by reading the patch.
   
   But it does not answer the simple question:  why?  why is this needed in 
   a bug fix Release Candidate series?
  
  Hi Jeff,
  
  When we backported netxen driver to 2.6.9 kernel on ppc, we saw the
  compile errors for USER_START. Because there is another definition for
  USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c.
  
  So we like to add NETXEN before USER_START in netxen device driver, then
  netxen driver can be compiled correctly in 2.6.9 kernel.
  
  Let me know if you any question for this patch.
 
 That answers my questions, but doesn't address the issue:  You need
 to regenerate the patch with the description improved as described.
 
 The patch description is copied verbatim into the kernel changelog, and
 preserved verbatim for all eternity.  Four years from now, we need to be
 able to read the patch description and understand -why- the NETXEN_xxx
 prefix was added.  Simply stating that the NETXEN_xxx prefix was added
 is useless, as it is self-evident from reading the patch itself.
 
   Jeff
 
 
 
Jeff, 
Thanks.
We will re-generate the patch with correct issue title and description.

Thanks,
wendy

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv

2007-06-06 Thread Thomas Klein
This patch fixes a possible kernel panic due to not checking the vlan group
when processing received VLAN packets and a malfunction in VLAN/hypervisor
registration.


Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea.h 
patched_kernel/drivers/net/ehea/ehea.h
--- linux-2.6.22-rc4/drivers/net/ehea/ehea.h2007-06-05 02:57:25.0 
+0200
+++ patched_kernel/drivers/net/ehea/ehea.h  2007-06-06 12:53:58.0 
+0200
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0061
+#define DRV_VERSIONEHEA_0064
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c   2007-06-05 
02:57:25.0 +0200
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-06-06 12:53:58.0 
+0200
@@ -451,7 +451,8 @@ static struct ehea_cqe *ehea_proc_rwqes(
processed_rq3++;
}
 
-   if (cqe-status  EHEA_CQE_VLAN_TAG_XTRACT)
+   if ((cqe-status  EHEA_CQE_VLAN_TAG_XTRACT)
+port-vgrp)
vlan_hwaccel_receive_skb(skb, port-vgrp,
 cqe-vlan_tag);
else
@@ -1910,10 +1911,7 @@ static void ehea_vlan_rx_register(struct
goto out;
}
 
-   if (grp)
-   memset(cb1-vlan_filter, 0, sizeof(cb1-vlan_filter));
-   else
-   memset(cb1-vlan_filter, 0xFF, sizeof(cb1-vlan_filter));
+   memset(cb1-vlan_filter, 0, sizeof(cb1-vlan_filter));
 
hret = ehea_h_modify_ehea_port(adapter-handle, port-logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct 
}
 
index = (vid / 64);
-   cb1-vlan_filter[index] |= ((u64)(1  (vid  0x3F)));
+   cb1-vlan_filter[index] |= ((u64)(0x8000  (vid  0x3F)));
 
hret = ehea_h_modify_ehea_port(adapter-handle, port-logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct
}
 
index = (vid / 64);
-   cb1-vlan_filter[index] = ~((u64)(1  (vid  0x3F)));
+   cb1-vlan_filter[index] = ~((u64)(0x8000  (vid  0x3F)));
 
hret = ehea_h_modify_ehea_port(adapter-handle, port-logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] NetXen: Fix ping issue

2007-06-06 Thread Jeff Garzik

You'll need to resend the patch...

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] b44: timer power saving

2007-06-06 Thread Michael Buesch
On Monday 04 June 2007 22:25:37 Stephen Hemminger wrote:
 Make the PHY and statistic timer run on one second boundary
 for powersaving.
 
 On resume, the driver should check for link up immediately, to
 get online faster (rather than waiting for the next second).
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 
 ---
  drivers/net/b44.c |9 -
  1 file changed, 4 insertions(+), 5 deletions(-)
 
 --- a/drivers/net/b44.c   2007-06-04 12:31:27.0 -0700
 +++ b/drivers/net/b44.c   2007-06-04 12:31:34.0 -0700
 @@ -599,8 +599,7 @@ static void b44_timer(unsigned long __op
  
   spin_unlock_irq(bp-lock);
  
 - bp-timer.expires = jiffies + HZ;
 - add_timer(bp-timer);
 + mod_timer(bp-timer, round_jiffies(jiffies + HZ));
  }
  
  static void b44_tx(struct b44 *bp)
 @@ -2348,11 +2347,11 @@ static int b44_resume(struct pci_dev *pd
   netif_device_attach(bp-dev);
   spin_unlock_irq(bp-lock);
  
 - bp-timer.expires = jiffies + HZ;
 - add_timer(bp-timer);
 -
   b44_enable_ints(bp);
   netif_wake_queue(dev);
 +
 + mod_timer(bp-timer, jiffies + 1);

I don't think we need +1, if you need to fire immediately
(on the next tick). The timer core will always fire
timers that are in the past immediately.

-- 
Greetings Michael.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Alexey Kuznetsov
Hello!

 Good point, I didn't think of that. Is there a version of this patch
 that already uses different namespaces so I can look at it?

Pavel does not like the idea. It looks not exactly pretty, like you said. :-)

The alternative is to create pair in main namespace and then move
one end to another namespace renaming it and changing index.
Why I do not like it? Because this makes RTM_NEWLINK just useless step,
all its work is undone and real work is remade when the device moves,
with all the unrettiness moved to another place.

From another hand, some move operation is required in any case.
Right now in openvz the problem is solved in tricky, but quite
inerseting way: all the devices in main namespace are assigned
with odd index, child devices get odd index. So that, when a device
moves from main namespace to child, openvz does not need to change
ifindex, conflict is impossible. Well, it is working approach.
But it is not pretty either.


 Are network namespace completely seperated or is there some hierarchy
 with all lower namespaces visible above or something like that?

Right now they are completely separate.

It is possible to make child devices visible in parent namespace
like it is done for process pids: i.e. there is an abstract identity
which is seen under different names and indices in different namespaces.
Sounds cool, but this add a lot of complexity, which has no meaning
outside of context of device creation, I do not think it is worth to do.




 The identity of the main device has no meaning within a different
 namespace, but are there other reasons for hiding it?

Perhaps, security. It is not a good idea to leak information
about parent namespace to child namespace.

Also, people will want to see emulated ethernet inside namespace
looking exactly like ethernet. No freaking additional attributes.

Alexey
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] NetXen: Fix link status messages

2007-06-06 Thread wendy xiong
On Sun, 2007-06-03 at 11:51 -0400, Jeff Garzik wrote:
 Mithlesh Thukral wrote:
  NetXen: Correct link status messages.
  This patch will fix the problem of wrong link status messages
  that were reported. 
  
  Signed-off by: Wen Xiong [EMAIL PROTECTED]
  Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]
  ---
  
   drivers/net/netxen/netxen_nic.h  |1 +
   drivers/net/netxen/netxen_nic_init.c |   21 +
   drivers/net/netxen/netxen_nic_isr.c  |   23 +++
   3 files changed, 37 insertions(+), 8 deletions(-)
 
 dropped due to previous patches being dropped
Hi Jeff, 

Thanks for your review the patch!
We are going to re-submit the patch with correct problem description.

Thanks,
wendy

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread David Miller
From: Pavel Emelianov [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:11:38 +0400

 Veth stands for Virtual ETHernet. It is a simple tunnel driver
 that works at the link layer and looks like a pair of ethernet
 devices interconnected with each other.

I would suggest choosing a different name.

'veth' is also the name of the virtualized ethernet device
found on IBM machines, driven by driver/net/ibmveth.[ch]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PCNET32] Lock solid with netconsole

2007-06-06 Thread Francois Romieu
Emmanuel Fusté [EMAIL PROTECTED] :
[...]
 Did you plan to submit the spin_lock_irqsave patch to mainline ?

No. I will not have enough time to figure why/if it fixes things.

-- 
Ueimor
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread Daniel Lezcano

David Miller wrote:

From: Pavel Emelianov [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:11:38 +0400

  

Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.



I would suggest choosing a different name.

'veth' is also the name of the virtualized ethernet device
found on IBM machines, driven by driver/net/ibmveth.[ch]
  

Eric Biederman proposed the name etun which stands for Ethernet TUNnel.

The goal of the pair device is to pass network packets between namespaces.
That reminds me the well known pipe which pass data between processes.

All network devices have a name describing what they do : eth for 
ethernet,

dummy for a device doing nothing, loopback, ...

Perhaps, a name like epipe or npipe, which reflects what does the 
device, is more appropriate ?


-- Daniel


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [PATCH] Virtual ethernet tunnel

2007-06-06 Thread David Miller
From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 22:38:11 +0200

 Perhaps, a name like epipe or npipe, which reflects what does the 
 device, is more appropriate ?

'npipe' (Network PIPE) or 'epipe' (Ethernet PIPE) are fine with me.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] b44: timer power saving

2007-06-06 Thread Stephen Hemminger

 I don't think we need +1, if you need to fire immediately
 (on the next tick). The timer core will always fire
 timers that are in the past immediately.

Thanks, but is it worth bothering to change it again?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread jamal
On Wed, 2007-06-06 at 17:11 +0200, Patrick McHardy wrote:

 I haven't followed the entire discussion, but I still don't see a
 alternative to touching the qdisc layer - multiple hardware queues
 need multiple queue states if you want to avoid a busy hardware
 queue stopping the qdisc entirely 

If you start with the above premise then ...

 and thereby preventing the qdisc
 to continue feeding packets to other active HW queues. And to make
 use of the multiple queue states you need multiple queues.

 you will logically lead to the above conclusion. 

[Which of course leads to the complexity (and not optimizing for the
common - which is single ring NICs)].

The problem is the premise is _innacurate_.
Since you havent followed the discussion, i will try to be brief (which
is hard).
If you want verbosity it is in my previous emails:

Consider a simple example of strict prio qdisc which is mirror
configuration of a specific hardware. 
Then for sake of discussion, assume two prio queues in the qdisc - PSL
and PSH and two hardware queues/rings in a NIC which does strict prio
with queues PHL and PHH.
The mapping is as follows:
PSL --- maps to --- PHL
PSH --- maps to --- PHH

Assume the PxH has a higher prio than PxL.
Strict prio will always favor H over L.

Two scenarios:
a) a lot of packets for PSL arriving on the stack.
They only get sent from PSL - PHL if and only if there are no
packets from PSH-PHH.
b)a lot of packets for PSH arriving from the stack.
They will always be favored over PSL in sending to the hardware.

From the above:
The only way PHL will ever shutdown the path to the hardware is when
there are sufficient PHL packets.
Corrollary,
The only way PSL will ever shutdown the path to the hardware is when
there are _NO_ PSH packets.

So there is no need to do per queue control because the schedule will
ensure things work out fine as long as what you have the correct qdisc;
and it is a qdisc that will work just fine with a single ring with zero
mods.
What you need is a driver API to ask it to select the ring given an
index. This is similar to the qdisc filter used to select a queue.

You can extend the use case i described above to N queues. You can
extend it to other schedulers (WRR or any non-work conserving queues)
etc. It is consistent. Of course if you configure CBQ for a hardware
that does strict prio - that is a misconfig etc.

Infact for the wired case i see little value (there is some) in using
multiple rings. In the case of wireless (which is strict prio based) it
provides more value.

 I would love to see your alternative patches.

From the above you can see they are simple. I am working on a couple of
things (batching and recovering pktgen ipsec patches)- I will work on
those patches soon after.

Iam actually not against the subqueue control - i know Peter needs it
for certain hardware; i am just against the mucking around of the common
case (single ring NIC) just to get that working. 
 
cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Waskiewicz Jr, Peter P
 [Which of course leads to the complexity (and not optimizing 
 for the common - which is single ring NICs)].

The common for 100 Mbit and older 1Gbit is single ring NICs.  Newer
PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the
hardware, and it's all headed in that direction, so it's becoming the
common case.

 Infact for the wired case i see little value (there is some) 
 in using multiple rings. In the case of wireless (which is 
 strict prio based) it provides more value.

There is value, hence why NIC manufacturers are building wired parts
that support multiple rings today.  And wireless may not want strict
prio in software, and may just want round-robin from the stack.  Either
way, Yi Zhu has represented the wireless side in this discussion
agreeing with these per-queue control patches.  Is wireless not a common
case to be considered?

  I would love to see your alternative patches.
 
 From the above you can see they are simple.

The description above won't provide what I'm trying to solve and what
wireless has stated they want.

 Iam actually not against the subqueue control - i know Peter 
 needs it for certain hardware; i am just against the mucking 
 around of the common case (single ring NIC) just to get that working. 

Single-ring NICs see no difference here.  Please explain why using my
patches with pfifo_fast, sch_prio, or any other existing qdisc will
change the behavior for single-ring NICs?  If the driver doesn't call
alloc_etherdev_mq() explicity, or use the new sch_rr qdisc, then the Tx
path is identical to the kernel today.  What am I mucking around with?
And these patches are not for specific hardware; rather they're for all
the NICs today that have multiple rings, and want to control them in the
OS instead of the driver, which is most of wireless and a handful of
NICs from Intel and Neterion afaik.

-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 18:13:40 -0400

 From the above you can see they are simple. I am working on a couple of
 things (batching and recovering pktgen ipsec patches)- I will work on
 those patches soon after.
 
 Iam actually not against the subqueue control - i know Peter needs it
 for certain hardware; i am just against the mucking around of the common
 case (single ring NIC) just to get that working. 

There are other reasons to do interesting things in this area,
purely for parallelization reasons.

For example, consider a chip that has N totally independant TX packet
queues going out to the same ethernet port.  You can lock and transmit
on them independantly, and the chip internally arbitrates using DRR or
whatever to blast the queues out to the physical port in some fair'ish
manner.

In that case you'd want to be able to do something like:

struct mydev_tx_queue *q = mydev-tx_q[smp_processor_id() % N];

or similar in the -hard_start_xmit() driver.  But something generic
to support this kind of parallelization would be great (and necessary)
because the TX lock is unary per netdev and destroys all of the
parallelization possible with something like the above.

With the above for transmit, and having N struct napi_struct
instances for MSI-X directed RX queues, we'll have no problem keeping
a 10gbit (or even faster) port completely full with lots of cpu to
spare on multi-core boxes.

However, I have to disagree with your analysis of the multi-qdisc
situation, and I tend to agree with Patrick.

If you only have one qdisc to indicate status on, when is the queue
full?  That is the core issue.  Indicating full status when any of
the hardware queues are full is broken, because we should never
block out queuing of higher priority packets just because the
low priority queue can't take any more frames, _and_ vice versa.

I really want to believe your proofs but they are something out of
a fairy tale :-)

 The only way PHL will ever shutdown the path to the hardware is when
 there are sufficient PHL packets.
 Corrollary,
 The only way PSL will ever shutdown the path to the hardware is when
 there are _NO_ PSH packets.

The problem with this line of thinking is that it ignores the fact
that it is bad to not queue to the device when there is space
available, _even_ for lower priority packets.

The more you keep all available TX queues full, the less likely
delays in CPU processing will lead to a device with nothing to
do.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
Date: Wed, 6 Jun 2007 15:30:39 -0700

  [Which of course leads to the complexity (and not optimizing 
  for the common - which is single ring NICs)].
 
 The common for 100 Mbit and older 1Gbit is single ring NICs.  Newer
 PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the
 hardware, and it's all headed in that direction, so it's becoming the
 common case.

I totally agree.  No modern commodity 1gb and faster card is going
to be without many queues on both TX and RX.

  Iam actually not against the subqueue control - i know Peter 
  needs it for certain hardware; i am just against the mucking 
  around of the common case (single ring NIC) just to get that working. 
 
 Single-ring NICs see no difference here.  Please explain why using my
 patches with pfifo_fast, sch_prio, or any other existing qdisc will
 change the behavior for single-ring NICs?

I agree with the implication here, there is no penalty for existing
devices.

There are two core issues in my mind:

1) multi-queue on both RX and TX is going to be very pervasive very
   soon, everyone is putting this into silicon.

   The parallelization gain potential is enormous, and we have to
   design for this.

2) Queues are meant to be filled as much as possible, you can't do
   that by having only one qdisc attached to the device indicating
   unary full status, you simply can't.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Waskiewicz Jr, Peter P
 However, I have to disagree with your analysis of the 
 multi-qdisc situation, and I tend to agree with Patrick.
 
 If you only have one qdisc to indicate status on, when is the 
 queue full?  That is the core issue.  Indicating full status 
 when any of the hardware queues are full is broken, because 
 we should never block out queuing of higher priority packets 
 just because the low priority queue can't take any more 
 frames, _and_ vice versa.
 
 I really want to believe your proofs but they are something 
 out of a fairy tale :-)

Dave,
Can we move forward on this please?  If you are comfortable that
my patches give the kernel the ability to manage hardware queues
sufficiently, I'd like to request that 2.6.23 be opened (wink wink) so I
can submit the patches for inclusion to that kernel.

Thanks,

-PJ Waskiewicz
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
Date: Wed, 6 Jun 2007 15:57:35 -0700

   Can we move forward on this please?  If you are comfortable
 that my patches give the kernel the ability to manage hardware
 queues sufficiently, I'd like to request that 2.6.23 be opened (wink
 wink) so I can submit the patches for inclusion to that kernel.

While I am growing in support of your changes, there are
two things:

1) I want to study them more and hear more about what Patrick has to
   say about them when he returns from his trip on Sunday

2) I don't want to open up a net-2.6.23 tree yet so that people
   concentrate on bug fixes and regressions, pick an open bug or
   regression report and help out if you want net-2.6.23 cut faster
   :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Waskiewicz Jr, Peter P
 While I am growing in support of your changes, there are two things:
 
 1) I want to study them more and hear more about what Patrick has to
say about them when he returns from his trip on Sunday
 
 2) I don't want to open up a net-2.6.23 tree yet so that people
concentrate on bug fixes and regressions, pick an open bug or
regression report and help out if you want net-2.6.23 cut faster
:-)

Where can I find such reports?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/3] qla3xxx: cleanup checksum offload code

2007-06-06 Thread Ron Mercer
Jeff,
You've already appled patch 1/3 in Stephen's series.  I tested this one
(2/3) and it looks good.  I can resubmit this if you want.
Regards, Ron Mercer

 -Original Message-
 From: Stephen Hemminger [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, May 30, 2007 2:23 PM
 To: Jeff Garzik; Linux Driver
 Cc: netdev@vger.kernel.org
 Subject: [PATCH 2/3] qla3xxx: cleanup checksum offload code
 
 The code for checksum is more complex than needed when 
 dealing with VLAN's;
 the higher layers already pass down the location of the IP header.
 
 Compile tested only, no hardware available.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 
 ---
  drivers/net/qla3xxx.c |   37 +++--
  1 file changed, 11 insertions(+), 26 deletions(-)
 
 --- a/drivers/net/qla3xxx.c   2007-05-30 14:09:25.0 -0700
 +++ b/drivers/net/qla3xxx.c   2007-05-30 14:09:31.0 -0700
 @@ -2433,37 +2433,22 @@ static int ql_get_seg_count(struct ql3_a
   return -1;
  }
  
 -static void ql_hw_csum_setup(struct sk_buff *skb,
 +static void ql_hw_csum_setup(const struct sk_buff *skb,
struct ob_mac_iocb_req *mac_iocb_ptr)
  {
 - struct ethhdr *eth;
 - struct iphdr *ip = NULL;
 - u8 offset = ETH_HLEN;
 -
 - eth = (struct ethhdr *)(skb-data);
 -
 - if (eth-h_proto == __constant_htons(ETH_P_IP)) {
 - ip = (struct iphdr *)skb-data[ETH_HLEN];
 - } else if (eth-h_proto == htons(ETH_P_8021Q) 
 -((struct vlan_ethhdr *)skb-data)-
 -h_vlan_encapsulated_proto == 
 __constant_htons(ETH_P_IP)) {
 - ip = (struct iphdr *)skb-data[VLAN_ETH_HLEN];
 - offset = VLAN_ETH_HLEN;
 - }
 -
 - if (ip) {
 - if (ip-protocol == IPPROTO_TCP) {
 - mac_iocb_ptr-flags1 |= 
 OB_3032MAC_IOCB_REQ_TC | 
 + const struct iphdr *ip = ip_hdr(skb);
 +
 + mac_iocb_ptr-ip_hdr_off = skb_network_offset(skb);
 + mac_iocb_ptr-ip_hdr_len = ip-ihl;
 +
 + if (ip-protocol == IPPROTO_TCP) {
 + mac_iocb_ptr-flags1 |= OB_3032MAC_IOCB_REQ_TC |
   OB_3032MAC_IOCB_REQ_IC;
 - mac_iocb_ptr-ip_hdr_off = offset;
 - mac_iocb_ptr-ip_hdr_len = ip-ihl;
 - } else if (ip-protocol == IPPROTO_UDP) {
 - mac_iocb_ptr-flags1 |= 
 OB_3032MAC_IOCB_REQ_UC | 
 + } else {
 + mac_iocb_ptr-flags1 |= OB_3032MAC_IOCB_REQ_UC |
   OB_3032MAC_IOCB_REQ_IC;
 - mac_iocb_ptr-ip_hdr_off = offset;
 - mac_iocb_ptr-ip_hdr_len = ip-ihl;
 - }
   }
 +
  }
  
  /*
 
 -- 
 Stephen Hemminger [EMAIL PROTECTED]
 
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread jamal
On Wed, 2007-06-06 at 15:35 -0700, David Miller wrote:
 From: jamal [EMAIL PROTECTED]
 Date: Wed, 06 Jun 2007 18:13:40 -0400

 There are other reasons to do interesting things in this area,
 purely for parallelization reasons.
 
 For example, consider a chip that has N totally independant TX packet
 queues going out to the same ethernet port.  You can lock and transmit
 on them independantly, and the chip internally arbitrates using DRR or
 whatever to blast the queues out to the physical port in some fair'ish
 manner.
 
 In that case you'd want to be able to do something like:
 
   struct mydev_tx_queue *q = mydev-tx_q[smp_processor_id() % N];
 
 or similar in the -hard_start_xmit() driver.  But something generic
 to support this kind of parallelization would be great (and necessary)
 because the TX lock is unary per netdev and destroys all of the
 parallelization possible with something like the above.
 

I cant think of any egress scheduler that will benefit from that
approach. The scheduler is the decider of which packet goes out next
on the wire.

 With the above for transmit, and having N struct napi_struct
 instances for MSI-X directed RX queues, we'll have no problem keeping
 a 10gbit (or even faster) port completely full with lots of cpu to
 spare on multi-core boxes.
 

RX queues - yes, I can see;  TX queues, it doesnt make sense to put
different rings on different CPUs.

 However, I have to disagree with your analysis of the multi-qdisc
 situation, and I tend to agree with Patrick.
 If you only have one qdisc to indicate status on, when is the queue
 full?  That is the core issue. 

I just described why it is not an issue. If you make the assumption it
is an issue, then it becomes one. 


  Indicating full status when any of
 the hardware queues are full is broken, because we should never
 block out queuing of higher priority packets just because the
 low priority queue can't take any more frames, _and_ vice versa.

Dave, you didnt read anything i said ;- The situation you describe is
impossible. low prio will never block high prio.

 I really want to believe your proofs but they are something out of
 a fairy tale :-)

They are a lot real than it seems. Please read again what i typed in ;-
And i will produce patches since this seems to be complex to explain.

  The only way PHL will ever shutdown the path to the hardware is when
  there are sufficient PHL packets.
  Corrollary,
  The only way PSL will ever shutdown the path to the hardware is when
  there are _NO_ PSH packets.
 
 The problem with this line of thinking is that it ignores the fact
 that it is bad to not queue to the device when there is space
 available, _even_ for lower priority packets.

So use a different scheduler. Dont use strict prio. Strict prio will
guarantee starvation of low prio packets as long as there are high prio
packets. Thats the intent.

 The more you keep all available TX queues full, the less likely
 delays in CPU processing will lead to a device with nothing to
 do.

It is design intent - thats how the specific scheduler works. 

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread Andrew Morton
On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos [EMAIL PROTECTED] wrote:

 
   * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data
   */
  #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
  
  This is identical to ARRAY_SIZE.
 
  And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE!  Don't go 
  off and create some private thing and leave everyone else twisting in the
  wind.
 
 The code was resolving the sparse warnings.  GLOBAL_ARRAY_SIZE removes the 
 part of the ARRAY_SIZE definition that causes sparse to complain ('+ 
 __must_be_array(arr)').  I don't know enough about how sparse works to fix 
 sparse, or to know if its a problem with __must_be_array.  The code itself 
 was fine -- using an array with ARRAY_SIZE.

(These 340-column emails are rather hard to reply to)

Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE().

If ARRAY_SIZE() is spitting some sparse warning then please report it and
we'll take a look into it.

 
  /* Debug and printf string expansion helpers for printing bitfields */
  #define BIT_FMT8 %c%c%c%c-%c%c%c%c
  #define BIT_FMT16 BIT_FMT8 : BIT_FMT8
  #define BIT_FMT32 BIT_FMT16   BIT_FMT16
 ...
  #define BIT_ARG32(x) \
  BITC(x,31),BITC(x,30),BITC(x,29),BITC(x,28),\
  BITC(x,27),BITC(x,26),BITC(x,25),BITC(x,24),\
  BITC(x,23),BITC(x,22),BITC(x,21),BITC(x,20),\
  BITC(x,19),BITC(x,18),BITC(x,17),BITC(x,16),\
  BIT_ARG16(x)
  
  None of the above is appropriate to a driver-private header.
 
 Where would be the appropriate place?

Dunno.  include/linux/bitfield-helpers.h?

  We use it in with iwlwifi; I don't know if others need it or want to use it. 
  Do you know of other projects using something similar?

I haven't looked.

  #define KELVIN_TO_CELSIUS(x) ((x)-273)
  
  Nor is that.
 
 Where would the appropriate place be?

include/linux/temperature.h?  acpi could use it, and there are other things
we could put into temperature.h

  static inline const struct ieee80211_hw_mode *iwl_get_hw_mode(struct 
  iwl_priv
   *priv, int mode)
  {
 int i;
 
 for (i = 0; i  3; i++)
 if (priv-modes[i].mode == mode)
 return priv-modes[i];
 
 return NULL;
  }
  
  Far too large to inline, has five callsites.
 
 Currently CodingStyle states to use inline where you might have otherwise 
 used a macro, and then later if the function is not overly complex (citing 3 
 lines as a guideline).  Is this too long because it has a for loop in it?  
 Or a loop and a branch?

Anything more than 10-20 instructions turns out to be too large.

 Removing static inline from the functions declared in header files means they 
 need to be moved to .c files, declared as extern, and pollute the namespace.  
 In prior drivers we had been beaten up about doing that...

You were mis-beaten-up.  Choose an appropriate namespace (iwl_* sounds OK),
stick to it and you'll be fine.

  #define WLAN_FC_GET_TYPE(fc)(((fc)  IEEE80211_FCTL_FTYPE))
  #define WLAN_FC_GET_STYPE(fc)   (((fc)  IEEE80211_FCTL_STYPE))
  #define WLAN_GET_SEQ_FRAG(seq)  ((seq)  0x000f)
  #define WLAN_GET_SEQ_SEQ(seq)   ((seq)  4)
  
  These don't need to be macros
 
 What would you like these to be?

/*
 * comment goes here
 */
static inline suitable_return_type wlan_fc_get_type(whatever_type_fc_has fc)
{
return fc  IEEE80211_FCTL_FTYPE;
}

 These currently exist as macros in ieee80211.h and there are other examples 
 in the kernel of similar macros.  If a goal is to remove *all macros* then 
 that should be stated in CodingStyle, preferably in a way that helps 
 developers understand how they are supposed to write their code.

These things come up again and again in lkml code-review.  Could be that
CodingStyle doesn't cover everything.  Common sense and taste applies too.

  #define QOS_CONTROL_LEN 2
 
  static inline u16 *ieee80211_get_qos_ctrl(struct ieee80211_hdr *hdr)
  {
 int hdr_len = ieee80211_get_hdrlen(hdr-frame_control);
 if (hdr-frame_control  IEEE80211_STYPE_QOS_DATA)
 return (u16 *) ((u8 *) hdr + (hdr_len) - QOS_CONTROL_LEN);
 return NULL;
  }
  
  Two callsites, too large to inline.
 
 If something is used more than once then it is unsuitable for an inline?  
 Again maybe updating CodingStyle would be helpful?
 
 Change:
   Generally, inline functions are preferable to macros resembling 
 functions.
 
 To:
   Macros resembling functions and inline functions should *NOT* be used.
 
 IIRC, ieee80211_get_qos_ctrl used to be a macros, and was then changed to an 
 inline per CodingStyle.  Now we don't want inlines, and instead want pure 
 functions (and consequently polluted namespace--or do we want to add to 
 CodingStyle that all functions should be implemented in a single file and 
 marked static?)

inlines are better than macros because they are more readable, because they
have typechecking and because for some reasons programmers are more likely
to 

Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread jamal
On Wed, 2007-06-06 at 15:40 -0700, David Miller wrote:


 There are two core issues in my mind:
 
 1) multi-queue on both RX and TX is going to be very pervasive very
soon, everyone is putting this into silicon.
 
The parallelization gain potential is enormous, and we have to
design for this.
 

There is no potential for parallelizing on transmit that i can think of.
Dave, please explain it slowly so i can understand it.

There is huge potential for parallelizing on receive. But i am certainly
missing the value in the transmit.

cheers,
jamal


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Jeff Garzik
On Wed, Jun 06, 2007 at 04:14:12PM -0700, Waskiewicz Jr, Peter P wrote:
  While I am growing in support of your changes, there are two things:
  
  1) I want to study them more and hear more about what Patrick has to
 say about them when he returns from his trip on Sunday
  
  2) I don't want to open up a net-2.6.23 tree yet so that people
 concentrate on bug fixes and regressions, pick an open bug or
 regression report and help out if you want net-2.6.23 cut faster
 :-)
 
 Where can I find such reports?

Michal Piotrowski's list is a good place to start, while not
network-specific, is turning into _the_ place to list regressions.

Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bnx2 driver reports WoL not supported when it actually works

2007-06-06 Thread Tim Mann
Hi, Michael.  We have some IBM blades with a BCM5708-based dual-port
NIC that the bnx2 driver reports as not supporting wake-on-LAN.
(Ethtool says Supports Wake-on: d.)  However, IBM says that WoL is
supported by these NICs, and in fact when I tried putting the blade
into soft-off and sending a magic packet to it, it did wake up!  I
tried both ports, and they both worked.

Some more details:  The driver identifies the NIC as follows:

Broadcom NetXtreme II BCM5708 1000Base-SX (B2) PCI-X 64-bit 133MHz found at mem 
da00, IRQ 153, node addr  00145 ed6f8ea

Here's the full ethtool output:

Supported ports: [ FIBRE ]
Supported link modes:   1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  1000baseT/Full 
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 2
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Cannot get message level: Function not implemented
Link detected: yes

I added a printk to get the full contents of the chip_id register; it's
0x57081021.  So the driver is deciding that the NIC doesn't support WoL
because the SERDES bit is set in the CHIP_BOND_ID.  (It's not the later
CHIP_ID test since this NIC is a 5708_B2.)

/* Disable WOL support if we are running on a SERDES chip. */
if (CHIP_BOND_ID(bp)  CHIP_BOND_ID_SERDES_BIT) {
bp-phy_flags |= PHY_SERDES_FLAG;
bp-flags |= NO_WOL_FLAG;
if (CHIP_NUM(bp) == CHIP_NUM_5708) {
bp-phy_addr = 2;
reg = REG_RD_IND(bp, bp-shmem_base +
 BNX2_SHARED_HW_CFG_CONFIG);
if (reg  BNX2_SHARED_HW_CFG_PHY_2_5G)
bp-phy_flags |= PHY_2_5G_CAPABLE_FLAG;
}
}

if ((CHIP_ID(bp) == CHIP_ID_5708_A0) ||
(CHIP_ID(bp) == CHIP_ID_5708_B0) ||
(CHIP_ID(bp) == CHIP_ID_5708_B1))
bp-flags |= NO_WOL_FLAG;

It would be easy to fix this by commenting out the line that sets the
NO_WOL_FLAG if the SERDES bit is set, but I assume the line is there
for a reason -- I imagine there are *some* NICs that don't support WoL
if the SERDES bit is set (?).

Can you find out what the right fix is?  (Or do we need to ask IBM?)
It's important to me because my software is conservative and does not
put a machine to sleep if ethtool says that the machine doesn't have a
NIC that supports WoL.

Thanks,
-- 
Tim Mann  work: [EMAIL PROTECTED]  home: [EMAIL PROTECTED]
  http://www.vmware.com  http://tim-mann.org
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Rick Jones

  RX queues - yes, I can see;  TX queues, it doesnt make sense to put

different rings on different CPUs.


To what extent might that preclude some cachelines bouncing hither and 
yon between the CPUs?


rick jones

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:32:46 -0400

 On Wed, 2007-06-06 at 15:35 -0700, David Miller wrote:
  With the above for transmit, and having N struct napi_struct
  instances for MSI-X directed RX queues, we'll have no problem keeping
  a 10gbit (or even faster) port completely full with lots of cpu to
  spare on multi-core boxes.
  
 
 RX queues - yes, I can see;  TX queues, it doesnt make sense to put
 different rings on different CPUs.

For the locking is makes a ton of sense.

If you have sendmsg() calls going on N cpus, would you rather
they:

1) All queue up to the single netdev-tx_lock

or

2) All take local per-hw-queue locks

to transmit the data they are sending?

I thought this was obvious... guess not :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:32:46 -0400

 So use a different scheduler. Dont use strict prio. Strict prio will
 guarantee starvation of low prio packets as long as there are high prio
 packets. Thats the intent.

Ok, point taken.

There are of course other uses for multiple TX queues, and in
particular my finer-grained locking example.

I'm still amazed the TX locking issue wasn't clear to you,
too nervous about tonight's game? :)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread jamal
On Wed, 2007-06-06 at 16:48 -0700, Rick Jones wrote:
RX queues - yes, I can see;  TX queues, it doesnt make sense to put
  different rings on different CPUs.
 
 To what extent might that preclude some cachelines bouncing hither and 
 yon between the CPUs?

I think the bouncing will exist a lot more with the multi CPUs. But one
would assume if you go that path, you would also parallelize the stack
on egress to reduce such an effect. I guess the point i am not seeing is
the value. The tx, once hitting the NIC is an IO issue not a CPU issue.
OTOH, the receive path once a packet is received, that is a CPU problem
(and therefore multi CPUs help).

To be fair to Peter, that is not what his patches are trying to address
(and infact, they cant solve that problem).

off for the night.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:35:46 -0400

 There is no potential for parallelizing on transmit that i can think of.
 Dave, please explain it slowly so i can understand it.
 
 There is huge potential for parallelizing on receive. But i am certainly
 missing the value in the transmit.

I gave an example in another response, you have N processes
queueing up data for TCP or UDP or whatever in parallel on
different cpus, all going out the same 10gbit device.

All of them enter into -hard_start_xmit(), and thus all of them try
to take the same netdev-tx_lock

If they have multiple TX queues, independantly programmable, that
single lock is stupid.

We could use per-queue TX locks for such hardware, but we can't
support that currently.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 16:48:59 -0700

RX queues - yes, I can see;  TX queues, it doesnt make sense to put
  different rings on different CPUs.
 
 To what extent might that preclude some cachelines bouncing hither and 
 yon between the CPUs?

I think per-TX-queue locking takes locality as another advantage.
You only touch the TX descriptors for queue N, rather than a single
globally shared one.

Same goes for RX.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Wed, 06 Jun 2007 19:54:47 -0400

 On Wed, 2007-06-06 at 16:48 -0700, Rick Jones wrote:
 RX queues - yes, I can see;  TX queues, it doesnt make sense to put
   different rings on different CPUs.
  
  To what extent might that preclude some cachelines bouncing hither and 
  yon between the CPUs?
 
 I think the bouncing will exist a lot more with the multi CPUs. But one
 would assume if you go that path, you would also parallelize the stack
 on egress to reduce such an effect. I guess the point i am not seeing is
 the value. The tx, once hitting the NIC is an IO issue not a CPU issue.

Disagred, that single TX lock kills cpu cycles.

If all of the TX queues are independantly programmable of one
another, the single TX lock kills performance.

 off for the night.

Enjoy the game.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread James Ketrenos
Andrew Morton wrote:
 On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos [EMAIL PROTECTED] wrote:
 
  * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data
  */
 #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 This is identical to ARRAY_SIZE.

 And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE!  Don't go 
 off and create some private thing and leave everyone else twisting in the
 wind.
 The code was resolving the sparse warnings.  
 
...
 Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE().

From include/linux/kernel.h

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

From drivers/net/wireless/mac80211/iwlwifi/iwl-helpers.h

#define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

The '+ __must_be_array(arr)' part of ARRAY_SIZE was causing sparse to complain 
with:

drivers/net/wireless/mac80211/iwlwifi/base.c:4646:22: error: cannot size 
expression
...

When I had run my builds, I had restricted the sparse checks to just iwlwifi 
(vs. checking the rest of the kernel).

I just ran it C=2 CF=-Wall against the rest of the kernel and do see other code 
with the same problem, eg:

sound/core/memalloc.c:521:14: error: cannot size expression
...

I had erroneously thought it was just a problem with iwlwifi...

Thanks,
James
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv

2007-06-06 Thread Michael Ellerman
On Wed, 2007-06-06 at 20:53 +0200, Thomas Klein wrote:
 This patch fixes a possible kernel panic due to not checking the vlan group
 when processing received VLAN packets and a malfunction in VLAN/hypervisor
 registration.
 
 
 Signed-off-by: Thomas Klein [EMAIL PROTECTED]
 ---
 
 
 diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea.h 
 patched_kernel/drivers/net/ehea/ehea.h
 --- linux-2.6.22-rc4/drivers/net/ehea/ehea.h  2007-06-05 02:57:25.0 
 +0200
 +++ patched_kernel/drivers/net/ehea/ehea.h2007-06-06 12:53:58.0 
 +0200
 @@ -39,7 +39,7 @@
  #include asm/io.h
  
  #define DRV_NAME ehea
 -#define DRV_VERSION  EHEA_0061
 +#define DRV_VERSION  EHEA_0064
  
  #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
   | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
 diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c 
 patched_kernel/drivers/net/ehea/ehea_main.c
 --- linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c 2007-06-05 
 02:57:25.0 +0200
 +++ patched_kernel/drivers/net/ehea/ehea_main.c   2007-06-06 
 12:53:58.0 +0200
 @@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct 
   }
  
   index = (vid / 64);
 - cb1-vlan_filter[index] |= ((u64)(1  (vid  0x3F)));
 + cb1-vlan_filter[index] |= ((u64)(0x8000  (vid  0x3F)));
  
   hret = ehea_h_modify_ehea_port(adapter-handle, port-logical_port_id,
  H_PORT_CB1, H_PORT_CB1_ALL, cb1);
 @@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct
   }
  
   index = (vid / 64);
 - cb1-vlan_filter[index] = ~((u64)(1  (vid  0x3F)));
 + cb1-vlan_filter[index] = ~((u64)(0x8000  (vid  0x3F)));

These two seem ripe for splitting into some sort of helper routine.
Which would leave only one place to get it right.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread Jeff Garzik
On Wed, Jun 06, 2007 at 04:52:15PM -0700, David Miller wrote:
 For the locking is makes a ton of sense.
 
 If you have sendmsg() calls going on N cpus, would you rather
 they:
 
 1) All queue up to the single netdev-tx_lock
 
 or
 
 2) All take local per-hw-queue locks
 
 to transmit the data they are sending?
 
 I thought this was obvious... guess not :-)

Agreed ++

For my part, I definitely want to see parallel Tx as well as parallel Rx.
It's the only thing that makes sense for modern multi-core CPUs.

Two warnings flags are raised in my brain though:

1) you need (a) well-designed hardware _and_ (b) a smart driver writer
to avoid bottlenecking on internal driver locks.  As you can see we have
both (a) and (b) for tg3 ;-)  But it's up in the air whether a
multi-TX-queue scheme can be sanely locked internally on other hardware.

At the moment we have to hope Intel gets it right in their driver...


2) I fear that the getting-it-into-the-Tx-queue part will take some
thought in order to make this happen, too.  Just like you have the
SMP/SMT/Multi-core scheduler scheduling various resources, surely we
will want some smarts so that sockets are not bouncing wildly across
CPUs, absent other factors outside our control.

Otherwise you will negate a lot of the value of the nifty multi-TX-lock
driver API, by bouncing data across CPUs on each transmit anyway.

IOW, you will have to sanely fill each of the TX queues.

Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread Andrew Morton
On Wed, 06 Jun 2007 15:33:46 -0700 James Ketrenos [EMAIL PROTECTED] wrote:

 Andrew Morton wrote:
  On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos [EMAIL PROTECTED] wrote:
  
   * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data
   */
  #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
  This is identical to ARRAY_SIZE.
 
  And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE!  Don't 
  go 
  off and create some private thing and leave everyone else twisting in the
  wind.
  The code was resolving the sparse warnings.  
  
 ...
  Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE().
 
 From include/linux/kernel.h
 
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + 
 __must_be_array(arr))

o crap, sorry, I was looking at one of the other definitions of ARRAY_SIZE :(


 From drivers/net/wireless/mac80211/iwlwifi/iwl-helpers.h
 
 #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
 The '+ __must_be_array(arr)' part of ARRAY_SIZE was causing sparse to 
 complain 
 with:
 
 drivers/net/wireless/mac80211/iwlwifi/base.c:4646:22: error: cannot size 
 expression
 ...

OK, that's a problem in sparse, I guess.

There _should_ be some #ifdeffable thing which is being passed to cpp when
we run sparse (but I'm not sure what it is).  If there is such a thing then
we could disable the __must_be_array() thing during sparse runs.

But longer-term, sparse should be taught about __must_be_array().

 When I had run my builds, I had restricted the sparse checks to just iwlwifi 
 (vs. checking the rest of the kernel).
 
 I just ran it C=2 CF=-Wall against the rest of the kernel and do see other 
 code 
 with the same problem, eg:
 
 sound/core/memalloc.c:521:14: error: cannot size expression
 ...
 
 I had erroneously thought it was just a problem with iwlwifi...
 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-06 Thread jamal
On Wed, 2007-06-06 at 16:53 -0700, David Miller wrote:

 There are of course other uses for multiple TX queues, and in
 particular my finer-grained locking example.

 I'm still amazed the TX locking issue wasn't clear to you,
 too nervous about tonight's game? :)

It's too depressing - so i came back here for a break ;-
I cant even stand Don Cherry today.

As a side note: You will have to do a lot of surgery to the current code
to make tx run on multi CPUs. It needs some experimenting to get right.
And i am begining to like Herberts changes ;-
I am not against multi-rings; iam just suggesting an alternative
approach which is less disruptive.

In regards to the tx lock - my thinking is resolving that via tx
batching. You amortize the lock over multiple packets. There may be
value in fine grained locking - i need to think about it. A small
extension to the batching patches will provide the change i am
proposing.

cheers,
jamal



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: warnings in git-wireless

2007-06-06 Thread Dave Jones
On Wed, Jun 06, 2007 at 06:04:21PM -0700, Andrew Morton wrote:

  There _should_ be some #ifdeffable thing which is being passed to cpp when
  we run sparse (but I'm not sure what it is).

#ifdef __CHECKER__

(See include/linux/compiler.h, this is how we implement __user  friends)

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Failover-friendly TCP retransmission

2007-06-06 Thread noboru . obata . ar
Hi Andi,

I thank you for your comments.

Andi Kleen [EMAIL PROTECTED] writes:
  Your suggestion, to utilize NET_XMIT_* code returned from an
  underlying layer, is done in tcp_transmit_skb.
  
  But my problem is that tcp_transmit_skb is not called during a
  certain period of time.  So I'm suggesting to cap RTO value so
  that tcp_transmit_skb gets called more frequently.
 
 The transmit code controls the transmission timeout. Or at least
 it could change it if it really wanted.
 
 What I wanted to say is: if the loss still happens under control
 of the sending end device and TCP knows this then it could change
 the retransmit timer to fire earlier or even just wait for an 
 event from the device that tells it to retransmit early.

I examined your suggestion to introduce some interface so that
TCP can know, or be notified, to retransmit early.

Then there are two options; pull from TCP or notify to TCP.

The first option, pulling from TCP, must be done at the
expiration of retransmission timer because there is no other
context to do this.  But if RTO is already large, this can
easily miss events or status of underlying layer, such as
symptom of failure, failover, etc.  So I give up pulling from
TCP.

The second option, notifying to TCP, seems a bit promising.
Upon such a notification, TCP may look into a timer structure to
find retransmission events, update their timers so that they
expire earlier, and possibly reset their RTO values.  Perhaps
this should be done for all TCP packets because TCP doesn't know
which packet will be sent to the device of interest at that time.

But I don't quite see if this better solves my problem.  Such
upcalls would be more complicated than capping RTO, and thus may
be error-prone and harder to maintain.  Problems might be
solvable, but I'd prefer a simpler solution.


 The problem with capping RTO is that when there is a loss
 in the network for some other reasons (and there is no reason
 bonding can't be used when talking to the internet) you
 might be too aggressive or not aggressive enough anymore
 to get the data through.

I think capping RTO is robust and better than upcalls.  The
effects of capping RTO, as follows, are small enough.

* It just makes retransmission more frequent.  Since there is
  already a fast retransmission in TCP, retransmitting earlier
  itself does not break TCP.  (I'm going to examine every
  occurrence of TCP_RTO_MAX, though.)

* In the worst case, it does not increase the total number of
  retransmission packets, which is bounded by tcp_retries2.

  Thus the final retransmission timeout comes earlier with same
  tcp_retries2.  If this is a case, one will have to raise
  tcp_retries2.

* The average case, in a certain period of time (say 60[s]), it
  may slightly increase the number of retransmission packets.

  Starting from RTO = 0.2[s], the numbers of retransmission
  packets in first 60[s] are, 8 with TCP_RTO_MAX = 120[s], and
  15 with TCP_RTO_MAX = 5[s].

  I think that increasing several packets per minute per socket
  are acceptable.

Therefore the side effects of capping RTO, even talking to the
Internet, seems to be small enough.

-- 
OBATA Noboru ([EMAIL PROTECTED])
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


extra layer below inet layer.

2007-06-06 Thread Tej Parkash

hi

i just want to have something like tcp layer sitting below inet layer.
so that i can directly offload everything to NIC
and i want it to be placed dynamically. so depending on the nic i can
offload the feature or can make it normal flow.
i.e. both layer should exist and normal functionality should not break.

this code is in af_inet.c

static struct inet_protosw inetsw_array[] =
{
  {
  .type =   SOCK_STREAM,
  .protocol =   IPPROTO_TCP,
  .prot =   tcp_prot,
  .ops =inet_stream_ops,
  .capability = -1,
  .no_check =   0,
  .flags =  INET_PROTOSW_PERMANENT |
INET_PROTOSW_ICSK,
  },

  {
  .type =   SOCK_DGRAM,
  .protocol =   IPPROTO_UDP,
  .prot =   udp_prot,
  .ops =inet_dgram_ops,
  .capability = -1,
  .no_check =   UDP_CSUM_DEFAULT,
  .flags =  INET_PROTOSW_PERMANENT,
 },


 {
 .type =   SOCK_RAW,
 .protocol =   IPPROTO_IP,/* wild card */
 .prot =   raw_prot,
 .ops =inet_sockraw_ops,
 .capability = CAP_NET_RAW,
 .no_check =   UDP_CSUM_DEFAULT,
 .flags =  INET_PROTOSW_REUSE,
 }
};

i was just going through the code i found that if i can regsiter above
kind of registration dynamically i will be able to do that.
i have tried to do that but since normal tcp registered this at boot
up time i am not able to unregister this dynamically

so is it possible to unregister tcp and register my protocol dynamically.

thanks
TEJ
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: extra layer below inet layer.

2007-06-06 Thread Stephen Hemminger
On Thu, 7 Jun 2007 10:10:44 +0530
Tej Parkash [EMAIL PROTECTED] wrote:

 hi
 
 i just want to have something like tcp layer sitting below inet layer.
 so that i can directly offload everything to NIC
 and i want it to be placed dynamically. so depending on the nic i can
 offload the feature or can make it normal flow.
 i.e. both layer should exist and normal functionality should not break.

NIC protocol offloading is a contentious subject since it bypasses
all the security and restricts usefulness of the device for VLAN's,
bonding, bridging, etc.


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html