date:20080110

From: Mirko Lindner [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 10:33:01 +0100

 This patch makes necessary changes in the Neptune driver to support 
 the new Marvell PHY. It also adds support for the LED blinking
 on Neptune cards with Marvell PHY. All registers are using defines
 in the niu.h header file as is already done for the BCM8704 registers.

Applied.  Please provide a proper Signed-off-by:  line next
time as documented in linux/Documentation/SubmittingPatches

Also there were a lot of coding style and other errors.  I
fixed them up because I already put you through the wringer
to fix up the original patch.

I'll note most of them, but reading linux/CodingStyle would be a great
idea:

 -static int xcvr_init_10g(struct niu *np)
 +void mrvl88x2011_act_led(struct niu *np, int val)
 ...
 +void mrvl88x2011_led_blink_rate(struct niu *np, int rate)
 +{

Both of these functions should be marked static, return an int and
check and return all errors indicated by mdio_read() and mdio_write().
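A minimal userspace sketch of the shape being asked for, assuming hypothetical stand-ins so it compiles on its own: the struct niu, the mdio_read()/mdio_write() stubs over a fake register array, and the SKETCH_* device/register numbers are invented here and are not the driver's. Only the blink-rate mask (0x70) and the 134 ms rate code (0x2) come from the patch. The helper is static, returns int, and hands every negative mdio result back to the caller.

```c
#include <assert.h>
#include <errno.h>

/* Values taken from the patch's niu.h hunk. */
#define MRVL88X2011_LED_BLKRATE_MASK	0x70
#define MRVL88X2011_LED_BLKRATE_134MS	0x2

/* Hypothetical device/register numbers, for this sketch only. */
#define SKETCH_DEV_ADDR		2
#define SKETCH_LED_BLINK_CTL	3
#define SKETCH_NUM_REGS		8

/* Hypothetical stand-in for the driver's private state. */
struct niu {
	int phy_addr;
	int fail_io;			/* force mdio errors for testing */
	int regs[SKETCH_NUM_REGS];	/* fake PHY register file */
};

/* Stub: returns the 16-bit register value, or a negative errno. */
static int mdio_read(struct niu *np, int port, int dev, int reg)
{
	(void)port;
	(void)dev;
	if (np->fail_io || reg < 0 || reg >= SKETCH_NUM_REGS)
		return -ENODEV;
	return np->regs[reg];
}

/* Stub: returns 0 on success, or a negative errno. */
static int mdio_write(struct niu *np, int port, int dev, int reg, int data)
{
	(void)port;
	(void)dev;
	if (np->fail_io || reg < 0 || reg >= SKETCH_NUM_REGS)
		return -ENODEV;
	np->regs[reg] = data & 0xffff;
	return 0;
}

/* static, returns int, and every mdio error is passed to the caller. */
static int mrvl88x2011_led_blink_rate(struct niu *np, int rate)
{
	int err;

	err = mdio_read(np, np->phy_addr, SKETCH_DEV_ADDR,
			SKETCH_LED_BLINK_CTL);
	if (err < 0)
		return err;

	/* Replace the 3-bit blink-rate field (bits 4..6) with the new rate. */
	err &= ~MRVL88X2011_LED_BLKRATE_MASK;
	err |= (rate << 4) & MRVL88X2011_LED_BLKRATE_MASK;

	return mdio_write(np, np->phy_addr, SKETCH_DEV_ADDR,
			  SKETCH_LED_BLINK_CTL, err);
}
```

A caller such as the init routine can then simply do `err = mrvl88x2011_led_blink_rate(...); if (err) return err;`.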

 +static int xcvr_init_10g_mrvl88x2011(struct niu *np)
 +{
 ...
 + /* Set LED functions */
 + mrvl88x2011_led_blink_rate(np, MRVL88X2011_LED_BLKRATE_134MS);
 + mrvl88x2011_act_led(np, MRVL88X2011_LED_CTL_OFF);   /* led activity */

Longer than 80-column lines, no error checking.

 + err = mdio_read(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
 + MRVL88X2011_GENERAL_CTL);
 + if (err < 0) {
 + return(err);
 + }

Extraneous opening and closing braces wasting precious screen
real-estate.  return is not a function taking an argument, don't
surround simple values with parentheses.

These happened a lot, I won't mention the other instances.

 + err = mdio_write(np, np->phy_addr, MRVL88X2011_USER_DEV3_ADDR,
 + MRVL88X2011_GENERAL_CTL, err);

Bad indentation, the argument on the second line of the mdio_write()
call should line up with the initial np arg on the
previous line.

If necessary, use tools or editors that help do this automatically for
you.  Several folks (including me) use Emacs's C mode with coding
style set to linux for that purpose.

 +static int xcvr_init_10g(struct niu *np)
 +{
 + int err;
 + u64 val;
 + int phy_id;

Multiple variables of the same type should be on one single line
unless it would make the line too long.

 + /* handle different phy types */
 + switch((phy_id & NIU_PHY_ID_MASK)) {

Space is needed between the switch keyword and the opening parenthesis.
Only one set of parentheses is sufficient here.

 -static int link_status_10g(struct niu *np, int *link_up_p)
 +static int link_status_10g_mrvl(struct niu *np, int *link_up_p)
  {
 - unsigned long flags;
 - int err, link_up;
 + int err;
 + int link_up = 0;
 + int pma_status;
 + int pcs_status;

No tabs please between variable type and names, and again list
multiple variables of the same type on one single line in order to
save precious screen real-estate.

 + pma_status = ((err & MRVL88X2011_LNK_STATUS_OK) ? 1:0);

Spaces are needed to make this easier to read: "? 1 : 0".

 + if (err == (
 + PHYXS_XGXS_LANE_STAT_ALINGED | PHYXS_XGXS_LANE_STAT_LANE3 |
 + PHYXS_XGXS_LANE_STAT_LANE2 | PHYXS_XGXS_LANE_STAT_LANE1 |
 + PHYXS_XGXS_LANE_STAT_LANE0 | PHYXS_XGXS_LANE_STAT_MAGIC | 0x800)) {

This err == ( line should hold the first line of bit values, again
to save lines.

 + mrvl88x2011_act_led(np, link_up ? MRVL88X2011_LED_CTL_PCS_ACT:MRVL88X2011_LED_CTL_OFF);

Line excessively exceeds 80 columns.

   if (type == PHY_TYPE_PMA_PMD || type == PHY_TYPE_PCS) {
 - if ((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_BCM8704)
 + if (((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_BCM8704) &&
 + ((id & NIU_PHY_ID_MASK) != NIU_PHY_ID_MRVL88X2011))

Not indented properly, the second line in the inner if statement
should have its initial parenthesis match up with the second opening
parenthesis on the previous line.

 +/* MRVL88X2011 register control */
 +#define MRVL88X2011_ENA_XFPREFCLK	0x0001
 +#define MRVL88X2011_ENA_PMDTX	0x
 +#define MRVL88X2011_LOOPBACK	0x1
 +#define MRVL88X2011_LED_ACT 0x1
 +#define MRVL88X2011_LNK_STATUS_OK   0x4
 +#define MRVL88X2011_LED_BLKRATE_MASK 0x70
 +#define MRVL88X2011_LED_BLKRATE_034MS	0x0
 +#define MRVL88X2011_LED_BLKRATE_067MS	0x1
 +#define MRVL88X2011_LED_BLKRATE_134MS	0x2
 +#define MRVL88X2011_LED_BLKRATE_269MS	0x3
 +#define MRVL88X2011_LED_BLKRATE_538MS	0x4
 +#define MRVL88X2011_LED_CTL_OFF  0x0
 +#define MRVL88X2011_LED_CTL_PCS_ACT  0x5
 +#define MRVL88X2011_LED_CTL_MASK 0x7
 +#define MRVL88X2011_LED(n,v)	((v)<<((n)*4))
 +#define MRVL88X2011_LED_STAT(n,v)   ((v)>>((n)*4))

These lines inconsistently use tabs vs. spaces to create the
indentation between the macro name and its definition.
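A small sketch of how the per-LED macros are meant to be used. The archive dropped the shift operators, so this assumes MRVL88X2011_LED(n,v) is ((v) << ((n)*4)) and MRVL88X2011_LED_STAT(n,v) is ((v) >> ((n)*4)): each LED owns a 4-bit field whose low 3 bits (MRVL88X2011_LED_CTL_MASK) select its function. led_set_ctl() and led_get_ctl() are hypothetical helpers, not driver functions.

```c
#include <assert.h>

/* From the patch (shift operators assumed, see above). */
#define MRVL88X2011_LED_ACT		0x1
#define MRVL88X2011_LED_CTL_OFF		0x0
#define MRVL88X2011_LED_CTL_PCS_ACT	0x5
#define MRVL88X2011_LED_CTL_MASK	0x7
#define MRVL88X2011_LED(n, v)		((v) << ((n) * 4))
#define MRVL88X2011_LED_STAT(n, v)	((v) >> ((n) * 4))

/* Hypothetical helper: set LED n's control field, keep the other LEDs. */
static int led_set_ctl(int reg, int n, int ctl)
{
	reg &= ~MRVL88X2011_LED(n, MRVL88X2011_LED_CTL_MASK);
	reg |= MRVL88X2011_LED(n, ctl & MRVL88X2011_LED_CTL_MASK);
	return reg;
}

/* Hypothetical helper: read back LED n's control field. */
static int led_get_ctl(int reg, int n)
{
	return MRVL88X2011_LED_STAT(n, reg) & MRVL88X2011_LED_CTL_MASK;
}
```

Clearing only LED n's nibble before OR-ing in the new value is what lets the activity LED be switched without disturbing the neighbouring LED fields in the same register.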

Anyways, after cleaning up all of

[PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms

The neightbl_fill_parms() is called under the write-locked
tbl->lock and accesses parms->dev. The neigh_parms_release()
calls dev_put(parms->dev) without this lock. This
creates a tiny race window in which the parms contain
a potentially stale dev pointer.

To fix this race it's enough to move the dev_put() up
under tbl->lock, but note that the parms are held by
neighbors and thus can live after neigh_parms_release()
is called, so we can still have parms with a bad dev pointer.

I didn't find where neigh->parms->dev is accessed, but I
still think that putting the dev should be done in the place
where the parms are really freed. Am I right about that?
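The lifetime argument can be illustrated with a toy userspace model. All names are hypothetical stand-ins for struct net_device, struct neigh_parms, dev_hold()/dev_put() and the RCU-deferred free: the parms pin the device, neighbors pin the parms, and the device reference is dropped only when the last parms reference goes away, which is where the patch below moves dev_put().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_dev {
	int refcnt;
};

struct toy_parms {
	int refcnt;		/* table link + one per neighbor */
	int dead;
	struct toy_dev *dev;	/* pinned for the whole parms lifetime */
};

static void toy_dev_put(struct toy_dev *dev)
{
	dev->refcnt--;
}

/* Counterpart of neigh_parms_destroy(): the device is released here. */
static void toy_parms_destroy(struct toy_parms *parms)
{
	if (parms->dev)
		toy_dev_put(parms->dev);
	free(parms);
}

static void toy_parms_put(struct toy_parms *parms)
{
	if (--parms->refcnt == 0)
		toy_parms_destroy(parms);
}

/* Counterpart of neigh_parms_release(): mark dead and drop the table's
 * reference, but leave parms->dev alone. */
static void toy_parms_release(struct toy_parms *parms)
{
	parms->dead = 1;
	toy_parms_put(parms);
}

static struct toy_parms *toy_parms_alloc(struct toy_dev *dev)
{
	struct toy_parms *parms = calloc(1, sizeof(*parms));

	if (!parms)
		return NULL;
	parms->refcnt = 1;	/* reference held by the table */
	parms->dev = dev;
	dev->refcnt++;		/* dev_hold() */
	return parms;
}
```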

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29b8ee4..cc8a2f1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1316,8 +1316,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 			*p = parms->next;
 			parms->dead = 1;
 			write_unlock_bh(&tbl->lock);
-			if (parms->dev)
-				dev_put(parms->dev);
 			call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
 			return;
 		}
@@ -1328,6 +1326,8 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 
 void neigh_parms_destroy(struct neigh_parms *parms)
 {
+	if (parms->dev)
+		dev_put(parms->dev);
 	kfree(parms);
 }
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: No idea about shaping trough many pc

2008-01-10 Thread Denys Fedoryshchenko

For proper link bandwidth sharing, I guess something like network counters
would have to be shared between the PCs (with proper locking). I haven't heard
of anything like this.

IMHO, one way to do this: split the destination network into multiple parts
and route them on the Cisco. Let's say you have:
192.168.0.0/16
and you have 4 balancing PCs
total bandwidth 1 Gbit/s (IEC convention: 1000 Mbit/s in 1 Gbit/s)

Then on the Cisco you route:
192.168.0.0/18 via PC1 (shared speed 250 Mbit/s)
192.168.64.0/18 via PC2 (shared speed 250 Mbit/s)
192.168.128.0/18 via PC3 (shared speed 250 Mbit/s)
192.168.192.0/18 via PC4 (shared speed 250 Mbit/s)
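The split is plain prefix arithmetic: cutting a /16 into 4 parts gives /18s of 2^14 = 16384 addresses each, so consecutive networks step by 64 in the third octet. A small sketch (subnet_addr() is a hypothetical helper, with addresses held as host-order 32-bit integers) that computes the idx-th subnet:

```c
#include <assert.h>
#include <stdint.h>

/* Stores the network address of the idx-th /new_len subnet of
 * base/base_len in *out; returns 0, or -1 on bad arguments. */
static int subnet_addr(uint32_t base, int base_len, int new_len,
		       uint32_t idx, uint32_t *out)
{
	if (base_len < 1 || new_len < base_len || new_len > 32)
		return -1;
	if (idx >= (1u << (new_len - base_len)))	/* number of subnets */
		return -1;
	*out = base + idx * (1u << (32 - new_len));	/* addresses per subnet */
	return 0;
}
```

For 192.168.0.0/16 split into /18s, idx 0..3 yields 192.168.0.0, .64.0, .128.0 and .192.0, matching the four routes above.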

You could probably write some scripts to check whether some PC has too much
spare bandwidth (5-minute average), and then give more bandwidth to another PC
that needs it. For example:

5-minute average counters show:
PC1 - uses 100 Mbit/s
PC2 - uses 50 Mbit/s
PC3 - uses 150 Mbit/s
PC4 - uses 230 Mbit/s

Then you change the link speeds (Mbit/s):
PC1 max 200
PC2 max 150
PC3 max 250
PC4 max 400 (100 from PC2 and 50 from PC1)
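One possible rule for such a script, as a hypothetical sketch: each PC keeps its 5-minute average plus a fixed headroom (assumed to be 100 Mbit/s here), never more than its current limit, and all bandwidth freed that way goes to the PC with the highest utilisation. With the numbers above it happens to reproduce the 200/150/250/400 split; the rule itself is an assumption, not something stated in the thread.

```c
#include <assert.h>

#define NPC 4

/* avg[] and limit[] in Mbit/s; limit[] is updated in place. */
static void rebalance(const int avg[NPC], int limit[NPC], int headroom)
{
	int freed = 0, busiest = 0, i;

	/* Highest avg/limit ratio, compared by cross-multiplication so
	 * no division (and no division by zero) is needed. */
	for (i = 1; i < NPC; i++)
		if (avg[i] * limit[busiest] > avg[busiest] * limit[i])
			busiest = i;

	/* Shrink each limit to avg + headroom; never grow it here. */
	for (i = 0; i < NPC; i++) {
		int want = avg[i] + headroom;

		if (want < limit[i]) {
			freed += limit[i] - want;
			limit[i] = want;
		}
	}

	limit[busiest] += freed;
}
```

The total stays at 1000 Mbit/s because bandwidth is only moved between PCs, never created.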

Of course each PC must be able to pass this traffic. And IMHO it is not normal
that your PCs can't handle more than 200 Mbit/s of traffic. I have a
complicated setup with 4 LAN 8139 cards, which passes 200 Mbit/s of traffic in
total. I am sure it can handle up to 300 Mbit/s, but I am already moving it
to a PC with PCI-E e1000/Broadcom NetXtreme cards with offloading capabilities,
large buffers and proper NAPI drivers. Such hardware is currently handling,
for example, 160 Mbit/s, and the counters are:
12:50:41  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle   intr/s
12:50:42  all    0.00    0.00    0.00    0.00    0.25    1.24    0.00   98.51  4009.90
12:50:43  all    0.00    0.00    0.00    0.00    0.00    1.25    0.00   98.75  4024.75
12:50:44  all    0.00    0.00    0.00    0.00    0.00    1.50    0.00   98.50  4181.82
12:50:45  all    0.25    0.00    0.00    0.00    0.00    1.50    0.00   98.25  4626.73
12:50:46  all    0.00    0.00    0.00    0.00    0.00    1.50    0.00   98.50  4351.52
12:50:47  all    0.25    0.00    0.00    0.00    0.00    1.75    0.00   98.00  4805.88

This is 2.6.23.8 with some configuration mistakes; I am going to try
2.6.24-rc7 and some optimizations.

Right now the profile looks like this:
10957  17.0675  mwait_idle_with_hints
 7454  11.6110  read_hpet
 3883   6.0485  _raw_spin_lock
 1605   2.5001  timer_interrupt
 1363   2.1231  irq_entries_start

So maybe I will have to try switching timers to TSC, disabling nmi_watchdog and
tuning the network driver (bnx2).
You should probably check such things too.

On Thu, 10 Jan 2008 12:06:35 +0300, Badalian Vyacheslav wrote
 Hello all.
 I have been trying for more than 2 months to resolve a problem with my
 shaping. Maybe you can help me?
 
 Scheme:
 [ASCII diagram garbled in the archive; it showed the Cisco/CISCO routers
 and the shaping PCs (Shaping PC 1 ... Shaping PC N ... Shaping PC 20)]
 
 Network: over 10k users. Total bandwidth to the Internet is more than
 1 Gbit/s. All computers run BGP with multipath turned on. The Cisco can't do
 per-packet load sharing (that would resolve all my problems =((( ), only by
 DST IP, SRC IP, or +Level4. OK. Each user must get 1 Mbit/s. Let's look at
 the variants:
 1. Create rules with user rate = 1 Mbit/s / N computers. If a user uses N
  connections, all is great, but if they use 1 connection their speed is
  1 Mbit/s / N, which does not look good. All would be great if the Cisco
  could do PER PACKET load sharing =(
 2. Create rules with user rate = 1 Mbit/s. If a user uses 1 connection, all
  is great, but if they use N connections their speed is much more than the
  intended limit =(
 
 Why do I use 20 PCs? Because one PC normally forwards 100-150 Mbit/s... at
 which point it has 100% CPU usage in software interrupts...
 
 Any idea how to resolve this problem?
 
 In my dreams (feature request to netdev ;) ):
 one PC acts as a MASTER TC. All 20 PCs synchronize statistics with the
 MASTER and share common rules and statistics. Then I would use variant 2 and
 be happy... but that's not realistic? =( Maybe there are other variants?
 
 Thanks for help!
 Slavon.
 P.S. Sorry for my english =(
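The trade-off between the two variants in the quoted mail can be put in numbers. A hypothetical sketch, assuming the Cisco's per-flow hashing spreads a user's connections over distinct shaping PCs (the best case) and each PC enforces its own per-user limit independently:

```c
#include <assert.h>

/* Effective per-user rate (kbit/s) when a user's connections land on
 * min(conns, npcs) different shaping PCs, each applying per_pc_limit. */
static int user_rate(int conns, int npcs, int per_pc_limit)
{
	int used = conns < npcs ? conns : npcs;

	return used * per_pc_limit;
}
```

With N = 20 and 1 Mbit/s = 1000 kbit/s, variant 1 gives a single-connection user 50 kbit/s, while variant 2 lets a 20-connection user reach 20000 kbit/s; only shared state between the PCs would give 1000 kbit/s in both cases.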


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


Re: [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 09 Jan 2008 17:45:33 +0100

 The following patchset makes the ipv6 sysctls handle multiple
 network namespaces. Each instance of a network namespace has its own
 set of sysctl values, which means the behavior of the ipv6 stack can be
 different depending on the sysctl values set up in the different
 network namespaces.

I applied all of this to net-2.6.25 but what a rough half hour
it was :-/

Starting at patch #5 there were tons of space before tab errors.
And as I fixed them up, this made subsequent patches need rediffing
since the contextual lines in patches after #5 needed the whitespace
fixed up as well.

I didn't push this back to you because this was already the 3rd round,
but please show me some love and check this stuff out before
submission.  GIT gives you effective ways to verify the whitespace
without even applying the patch.

~davem/bin/pcheck:

#!/bin/sh
set -x
git apply --check --whitespace=error-all $1
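For reference, the "space before tab" error that git reports is a space immediately followed by a tab inside a line's leading indentation. A minimal sketch of that one check applied to the added lines of a patch (the function name is hypothetical; git itself checks several more whitespace classes than this):

```c
#include <assert.h>
#include <stdbool.h>

/* True if an added patch line ("+...") has a space directly before a
 * tab in its leading indentation, i.e. git's "space-before-tab". */
static bool added_line_has_space_before_tab(const char *line)
{
	const char *p;

	if (line[0] != '+')
		return false;		/* only added lines are checked */

	for (p = line + 1; *p == ' ' || *p == '\t'; p++)
		if (p[0] == ' ' && p[1] == '\t')
			return true;
	return false;
}
```

Fixing such a line in patch N also changes the context lines of later patches, which is why the rest of the series needed rediffing.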

Re: Linux IPv6 DAD not full conform to RFC 4862 ?

2008-01-10 Thread Karsten Keil

Hi,
On Wed, Jan 09, 2008 at 09:26:53PM +0100, Karsten Keil wrote:
  
  Reading the section you reference, we do follow all the MUST requirements, and
  we log an error.  Given that the disable section is a SHOULD, I think we can at
  least be somewhat more restrictive in our implementation.  Perhaps we should
  just disable the interface iff the failed address is link-local AND there are no
  other functional addresses assigned to the interface.
 
 I agree here, but it seems that currently the IPv6 Logo Committee thinks
 that the interface has to be disabled to get the IPv6 Ready Logo in the
 future. I already raised that in a discussion on the TAHI users list.
 

JFYI, here is the answer from the TAHI list.

Hi, Karsten.

Thanks for your comments.

I know that it is SHOULD,
but our test tool supports the test specification
published by IPv6 Ready Logo Program http://www.ipv6ready.org/,
and basically the test specification supports all of MUST and SHOULD.

As you may know,
the IPv6 Ready Logo Committee is now also discussing
the next major revision of the test specification.

RFC 4862 Section 5.4.5 is one of the points under discussion.

The public review is over,
but if you have strong concerns about it,
I recommend commenting to [EMAIL PROTECTED].

Personally,
I think that mandating this function is the best way.
But vendors' input will be really important for them.

Regards,
Yukiyo Akisada


So it would be good if some of the networking experts complain there.

-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

[PATCH 2.6.25] [ATM] Oops reading net/atm/arp

2008-01-10 Thread Denis V. Lunev

cat /proc/net/atm/arp causes a NULL pointer dereference in
get_proc_net+0xc/0x3a. This happens because get_proc_net believes that the
parent proc dir entry contains struct net.

Fix this assumption for the net/atm case.

The problem was introduced by commit c0097b07abf5f92ab135d024dd41bd2aada1512f
from Eric W. Biederman/Daniel Lezcano.
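A toy userspace model of the failure, with simplified, hypothetical stand-ins for struct proc_dir_entry, get_proc_net() and proc_net_mkdir(): the lookup takes struct net from the parent directory's data pointer, so a directory made with a plain mkdir leaves it NULL, while a helper that sets it at creation time cannot forget to.

```c
#include <assert.h>
#include <stdlib.h>

struct net {
	int id;
};

struct pde {
	struct pde *parent;
	void *data;
};

/* Plain mkdir: data stays NULL. */
static struct pde *toy_mkdir(struct pde *parent)
{
	struct pde *pde = calloc(1, sizeof(*pde));

	if (pde)
		pde->parent = parent;
	return pde;
}

/* Like proc_net_mkdir(): record the namespace at creation time. */
static struct pde *toy_net_mkdir(struct net *net, struct pde *parent)
{
	struct pde *pde = toy_mkdir(parent);

	if (pde)
		pde->data = net;
	return pde;
}

/* Like get_proc_net(): trusts the parent's data pointer. This toy
 * returns NULL where the kernel version dereferenced it and oopsed. */
static struct net *toy_get_net(const struct pde *file)
{
	return file->parent ? file->parent->data : NULL;
}
```

In the model, /proc/net/atm/arp under an "atm" directory created with toy_mkdir() yields NULL, and the same file under a toy_net_mkdir() directory yields the namespace.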

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

---

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index cfc4f6c..4823c96 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -96,6 +96,17 @@ static struct proc_dir_entry *proc_net_shadow(struct task_struct *task,
 	return task->nsproxy->net_ns->proc_net;
 }
 
+struct proc_dir_entry *proc_net_mkdir(struct net *net, const char *name,
+		struct proc_dir_entry *parent)
+{
+	struct proc_dir_entry *pde;
+	pde = proc_mkdir_mode(name, S_IRUGO | S_IXUGO, parent);
+	if (pde != NULL)
+		pde->data = net;
+	return pde;
+}
+EXPORT_SYMBOL_GPL(proc_net_mkdir);
+
+
 static __net_init int proc_net_ns_init(struct net *net)
 {
 	struct proc_dir_entry *root, *netd, *net_statd;
@@ -107,18 +118,16 @@ static __net_init int proc_net_ns_init(struct net *net)
 		goto out;
 
 	err = -EEXIST;
-	netd = proc_mkdir("net", root);
+	netd = proc_net_mkdir(net, "net", root);
 	if (!netd)
 		goto free_root;
 
 	err = -EEXIST;
-	net_statd = proc_mkdir("stat", netd);
+	net_statd = proc_net_mkdir(net, "stat", netd);
 	if (!net_statd)
 		goto free_net;
 
 	root->data = net;
-	netd->data = net;
-	net_statd->data = net;
 
 	net->proc_net_root = root;
 	net->proc_net = netd;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index a531682..8f92546 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -201,6 +201,8 @@ static inline struct proc_dir_entry *create_proc_info_entry(const char *name,
 extern struct proc_dir_entry *proc_net_fops_create(struct net *net,
 	const char *name, mode_t mode, const struct file_operations *fops);
 extern void proc_net_remove(struct net *net, const char *name);
+extern struct proc_dir_entry *proc_net_mkdir(struct net *net, const char *name,
+	struct proc_dir_entry *parent);
 
 #else
 
diff --git a/net/atm/proc.c b/net/atm/proc.c
index 5d9d5ff..565e75e 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -476,7 +476,7 @@ static void atm_proc_dirs_remove(void)
 		if (e->dirent)
 			remove_proc_entry(e->name, atm_proc_root);
 	}
-	remove_proc_entry("atm", init_net.proc_net);
+	proc_net_remove(&init_net, "atm");
 }
 
 int __init atm_proc_init(void)
@@ -484,7 +484,7 @@ int __init atm_proc_init(void)
 	static struct atm_proc_entry *e;
 	int ret;
 
-	atm_proc_root = proc_mkdir("atm", init_net.proc_net);
+	atm_proc_root = proc_net_mkdir(&init_net, "atm", init_net.proc_net);
 	if (!atm_proc_root)
 		goto err_out;
 	for (e = atm_proc_ents; e->name; e++) {

[PATCH 2.6.25] [NEIGH] Make /proc/net/arp opening consistent with seq_net_open semantics

2008-01-10 Thread Denis V. Lunev

seq_open_net requires the first field of the seq->private data to be
struct seq_net_private. For now this is in reality a single pointer to a
struct net. The patch makes the code consistent.
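Why the first member matters, in a hypothetical userspace model: generic code only knows that private points at something that starts with a struct seq_net_private, so it reads the namespace through that type. Embedding it as the first field (as the patch does with `p`) makes that cast valid, because in C a pointer to a structure, suitably converted, points to its initial member.

```c
#include <assert.h>

struct net {
	int id;
};

struct seq_net_private {
	struct net *net;
};

/* Layout after the patch: the generic header comes first. */
struct neigh_seq_state {
	struct seq_net_private p;
	int bucket;
};

/* What generic seq_net code can do without knowing the outer type. */
static struct net *private_to_net(void *private)
{
	return ((struct seq_net_private *)private)->net;
}
```

With the old layout (a bare struct net * first) the cast happens to work too, but only by coincidence of layout; embedding the real type makes the contract explicit if seq_net_private ever grows.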

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a9dda29..09f9fc6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -223,7 +223,7 @@ extern void __neigh_for_each_release(struct neigh_table *tbl, int (*cb)(struct n
 extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *));
 
 struct neigh_seq_state {
-	struct net *net;
+	struct seq_net_private p;
 	struct neigh_table *tbl;
 	void *(*neigh_sub_iter)(struct neigh_seq_state *state,
 				struct neighbour *n, loff_t *pos);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8024933..19c0dd1 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2142,7 +2142,7 @@ EXPORT_SYMBOL(__neigh_for_each_release);
 static struct neighbour *neigh_get_first(struct seq_file *seq)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net *net = state->net;
+	struct net *net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 	struct neighbour *n = NULL;
 	int bucket = state->bucket;
@@ -2183,7 +2183,7 @@ static struct neighbour *neigh_get_next(struct seq_file *seq,
 					loff_t *pos)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net *net = state->net;
+	struct net *net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 
 	if (state->neigh_sub_iter) {
@@ -2243,7 +2243,7 @@ static struct neighbour *neigh_get_idx(struct seq_file *seq, loff_t *pos)
 static struct pneigh_entry *pneigh_get_first(struct seq_file *seq)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net * net = state->net;
+	struct net * net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 	struct pneigh_entry *pn = NULL;
 	int bucket = state->bucket;
@@ -2266,7 +2266,7 @@ static struct pneigh_entry *pneigh_get_next(struct seq_file *seq,
 					    loff_t *pos)
 {
 	struct neigh_seq_state *state = seq->private;
-	struct net * net = state->net;
+	struct net * net = state->p.net;
 	struct neigh_table *tbl = state->tbl;
 
 	pn = pn->next;

Re: Linux IPv6 DAD not full conform to RFC 4862 ?

2008-01-10 Thread Karsten Keil

On Wed, Jan 09, 2008 at 03:32:12PM -0800, David Miller wrote:
 From: Karsten Keil [EMAIL PROTECTED]
 Date: Wed, 9 Jan 2008 16:36:56 +0100

 If the address is a link-local address formed from an interface
 identifier based on the hardware address, which is supposed to be
 uniquely assigned (e.g., EUI-64 for an Ethernet interface), IP
 operation on the interface SHOULD be disabled.  By disabling IP
 operation, the node will then:

 -  not send any IP packets from the interface,

 -  silently drop any IP packets received on the interface, and

 -  not forward any IP packets to the interface (when acting as a
router or processing a packet with a Routing header).

 I question any RFC mandate that shuts down IP communication on a node
 because of packets received from remote systems.

 If the TAHI test can trigger this, so can a compromised system on your
 network and won't that be fun? :-)

I agree, but on the other hand, an interface with a real duplicate HW address
sending packets on the network can also cause very serious problems, and
may not be as easy to detect as a system whose interface never comes
up because of this. So maybe it makes sense to implement it as an option, not
as the default.
And the DoS scenario already exists, even without disabling IP completely,
since you can deny any IPv6 address assignment with faked DAD packets.
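The three consequences listed in the quoted RFC text, combined with the suggestion to make this an option rather than the default, could look like the following hypothetical sketch (struct and function names are invented for illustration and are not the kernel's IPv6 code):

```c
#include <assert.h>
#include <stdbool.h>

struct toy_if {
	bool disable_on_lla_dad_fail;	/* opt-in knob, off by default */
	bool ip_disabled;
};

/* DAD failed for a link-local address derived from the hardware address. */
static void toy_hw_lla_dad_failed(struct toy_if *ifp)
{
	if (ifp->disable_on_lla_dad_fail)
		ifp->ip_disabled = true;
}

/* RFC 4862 5.4.5: once disabled, do not send IP packets ... */
static bool toy_may_send(const struct toy_if *ifp)
{
	return !ifp->ip_disabled;
}

/* ... silently drop IP packets received on the interface ... */
static bool toy_accept_rx(const struct toy_if *ifp)
{
	return !ifp->ip_disabled;
}

/* ... and do not forward IP packets to the interface. */
static bool toy_may_forward_to(const struct toy_if *ifp)
{
	return !ifp->ip_disabled;
}
```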

-- 
Karsten Keil
SuSE Labs
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

Re: FW: ccid2/ccid3 oopses

2008-01-10 Thread Gerrit Renker

|  So maybe the cause triggering this oops is somewhere else.
| 
| yes, probably. Sorry - I didn't tell, or maybe I didn't know when writing
| my first mail to the module authors, and forgot to add that before forwarding here.
| 
| For me, the problem does not happen with the SUSE kernel of the day
| (2.6.24-rc6-git7-20080102160500-default, .config attached) but it happens
| with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached)
| 
There are 256 differences between the two .config files. I think there are other
people on the list who will be able to give more information regarding the .config
files. The differences that struck me in the one which doesn't work are that

 -- CONFIG_DEBUG_KERNEL and
 -- CONFIG_DEBUG_BUGVERBOSE were not set. Both are very useful for bug-hunting;
the latter is much better for decoding oopses.

I can't say anything about the SUSE kernel. We use the plain kernel from
www.kernel.org, specifically the netdev tree:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
If you can't get further here, try a kernel.org kernel or check the SUSE forums.

 1. The tests yesterday were done on the DCCP test tree based on the above
    netdev-2.6 2.6.24-rc7 tree, from git://eden-feed.erg.abdn.ac.uk/dccp_exp
    (dccp subtree). Tested your for-loop for 60 seconds each for CCID3/4 -- no oops.

 2. Also repeated the tests on an unmodified 2.6.24-rc7 tree from netdev-2.6
    (today): 120 seconds of the for-loop each -- no oops.

As said, if the above does not help, try a www.kernel.org kernel (or one of the
above trees) first.
| 
| |   the easiest way to reproduce is:
|  |   
|  |   while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
|  |   after short time, the kernel oopses (messages below)
|  |   

Re: [PATCH netns-2.6.25 0/19] routing virtualization v2

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Wed, 09 Jan 2008 21:03:03 +0300

 This set adds namespace support for routing tables & rules
 manipulation in the different namespaces. So, one could create a
 namespace and set up IPv4 routing there however he wants.

All 19 patches applied, thanks Denis.

Re: [PATCH][NEIGH] Fix race between neigh_parms_release and neightbl_fill_parms

From: Pavel Emelyanov [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 13:56:53 +0300

 The neightbl_fill_parms() is called under the write-locked
 tbl->lock and accesses parms->dev. The neigh_parms_release()
 calls dev_put(parms->dev) without this lock. This
 creates a tiny race window in which the parms contain
 a potentially stale dev pointer.

 To fix this race it's enough to move the dev_put() up
 under tbl->lock, but note that the parms are held by
 neighbors and thus can live after neigh_parms_release()
 is called, so we can still have parms with a bad dev pointer.

 I didn't find where neigh->parms->dev is accessed, but I
 still think that putting the dev should be done in the place
 where the parms are really freed. Am I right about that?

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

It is accessed in lookup_neigh_parms(), neightbl_fill_parms(), and
neightbl_fill_info() (hmmm, that BUG_ON(tbl->parms.dev) is cute).

Your fix looks correct, patch applied, thanks!

Re: [PATCH 2.6.25] [ATM] Oops reading net/atm/arp

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 14:28:53 +0300

 cat /proc/net/atm/arp causes a NULL pointer dereference in
 get_proc_net+0xc/0x3a. This happens because get_proc_net believes that the
 parent proc dir entry contains struct net.

 Fix this assumption for the net/atm case.

 The problem was introduced by commit c0097b07abf5f92ab135d024dd41bd2aada1512f
 from Eric W. Biederman/Daniel Lezcano.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, thanks.

Re: [PATCH 2.6.25] [ATM] Simplify /proc/net/atm/arp opening

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 14:30:44 +0300

 The iterator state->ns.neigh_sub_iter initialization is moved from
 arp_seq_open to clip_seq_start for convenience. This should not be a problem
 as the iterator will be used only after the seq_start callback.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, thanks.


Re: [PATCH 2.6.25] [NEIGH] Make /proc/net/arp opening consistent with seq_net_open semantics

From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 10 Jan 2008 14:32:19 +0300

 seq_open_net requires the first field of the seq->private data to be
 struct seq_net_private. For now this is in reality a single pointer to a
 struct net. The patch makes the code consistent.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied, thanks for correcting this.

Re: [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3

2008-01-10 Thread Daniel Lezcano

David Miller wrote:

From: Daniel Lezcano [EMAIL PROTECTED]
Date: Wed, 09 Jan 2008 17:45:33 +0100

The following patchset makes the ipv6 sysctls handle multiple
network namespaces. Each instance of a network namespace has its own
set of sysctl values, which means the behavior of the ipv6 stack can be
different depending on the sysctl values set up in the different
network namespaces.

I applied all of this to net-2.6.25 but what a rough half hour
it was :-/

Starting at patch #5 there were tons of space before tab errors.
And as I fixed them up, this made subsequent patches need rediffing
since the contextual lines in patches after #5 needed the whitespace
fixed up as well.

I didn't push this back to you because this was already the 3rd round,
but please show me some love and check this stuff out before
submission.  GIT gives you effective ways to verify the whitespace
without even applying the patch.

~davem/bin/pcheck:

#!/bin/sh
set -x
git apply --check --whitespace=error-all $1

Sorry, I will check that in the future :|
Many thanks for taking the time to fix that.

  -- Daniel.

Re: Linux IPv6 DAD not full conform to RFC 4862 ?

2008-01-10 Thread Neil Horman

On Wed, Jan 09, 2008 at 04:09:57PM -0500, Vlad Yasevich wrote:
 Neil Horman wrote:
 On Thu, Jan 10, 2008 at 01:38:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
 In article [EMAIL PROTECTED] (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil [EMAIL PROTECTED] says:

 So I think we should disable the interface now, if DAD fails on a
 hardware based LLA.
 I don't want to do this, at least, unconditionally.

 Options (not exclusive):

 - we could have dad_reaction interface variable and
   1: disable interface
  = 1: disable IPv6
   0: ignore (as we do now)

 I like the flexibility of this solution, but given that the only part of the
 RFC that we're missing at the moment is that we SHOULD disable the interface
 on DAD failure for a link-local address, I would think this scheme would be
 good:

 < 0 : ignore, and del address from interface (current behavior)
 = 0 : disable interface for dad failure for a link-local address
 > 0 : disable interface for dad failure for any address
 Regards
 Neil
  

 Just a friendly reminder that such a scheme should only be
 applied to autoconfigured addresses.  A manually configured
 duplicated address should not bring down the whole interface.


I agree, but I think that case would be covered by the default option above
(sysctl < 0).

Neil

 -vlad
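Reading the proposal as "< 0: ignore and delete the address (current behavior), = 0: disable the interface on DAD failure of a link-local address, > 0: disable it on DAD failure of any address", and adding Vlad's restriction to autoconfigured addresses, the decision reduces to a few comparisons. This is a hypothetical sketch; the enum and function names are not from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum dad_action {
	DAD_DEL_ADDR,		/* current behavior: drop the address */
	DAD_DISABLE_IFACE,
};

static enum dad_action dad_failure_action(int sysctl, bool link_local,
					  bool autoconf)
{
	/* Manually configured addresses never take the link down. */
	if (sysctl < 0 || !autoconf)
		return DAD_DEL_ADDR;
	if (sysctl == 0 && !link_local)
		return DAD_DEL_ADDR;
	return DAD_DISABLE_IFACE;
}
```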

Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

From: Eric Dumazet [EMAIL PROTECTED]
Date: Wed, 9 Jan 2008 11:37:27 +0100

 [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

 In rt_cache_get_next(), there is no need to guard seq->private by a
 rcu_dereference() since seq is private to the thread running this function.
 Reading seq->private once (as guaranteed by rcu_dereference()) or several
 times, if the compiler really is dumb enough, won't change the result.

 But we miss the real spots where rcu_dereference() is needed, both in
 rt_cache_get_first() and rt_cache_get_next().
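As a loose userspace analogy only (C11 atomics, not the kernel RCU API): publishing a pointer pairs a release store with an acquire load on the reader side, which is roughly the role of rcu_assign_pointer()/rcu_dereference(), while a cursor private to the reading thread, like the seq_file iterator state, needs no ordered load at all. The names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct entry {
	int value;
};

/* Shared: a writer may replace it at any time. */
static _Atomic(struct entry *) shared_head;

/* Writer: initialise first, then publish (cf. rcu_assign_pointer()). */
static void publish(int value)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return;
	e->value = value;
	atomic_store_explicit(&shared_head, e, memory_order_release);
}

/* Reader: the ordered load is needed only for the shared pointer
 * (cf. rcu_dereference()); *cursor is private to this thread and is
 * read and written normally. */
static int read_value(struct entry **cursor)
{
	*cursor = atomic_load_explicit(&shared_head, memory_order_acquire);
	return *cursor ? (*cursor)->value : -1;
}
```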

 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

I've applied this to net-2.6, thanks!

Re: [PATCH v3] AX25: kill user triggable printks

From: maximilian attems [EMAIL PROTECTED]
Date: Wed,  9 Jan 2008 11:21:10 +0100

 sfuzz can easily trigger any of those.

 Move the printk messages into the corresponding comments: this makes the
 intention of the code clear and easy to pick up on a scheduled removal.
 As a bonus, simplify the brace placement.

 Signed-off-by: maximilian attems [EMAIL PROTECTED]

Applied, thanks.

EQL / doubts

Hi All,
I have a few questions about the EQL driver:

*) Why is tx_queue_len set to 5? For example, if we bond 3 lines
and each has a tx_queue_len of 1000, will the bonding line's (eql)
tx_queue_len be the sum of these three tx_queue_len values, i.e. 3000?

*) Why is list_add used instead of list_add_tail? For a queue
implementation, list_add_tail would be required. Why is the
slave queue implemented as a stack?

 File:  linux/drivers/net/eql.c
 Function: __eql_insert_slave(slave_queue_t *queue, slave_t *slave)

Code:

/* queue->lock must be held */

static int __eql_insert_slave(slave_queue_t *queue, slave_t *slave)
{
	if (!eql_is_full(queue)) {
		slave_t *duplicate_slave = NULL;
		duplicate_slave = __eql_find_slave_dev(queue, slave->dev);
		if (duplicate_slave != 0)
			eql_kill_one_slave(queue, duplicate_slave);

		list_add(&slave->list, &queue->all_slaves);

Why has list_add been used here instead of list_add_tail? I assume
queue->all_slaves is meant to be a queue.
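For reference, the difference between the two calls: list_add() links the new entry right after the list head, so a walk from the head sees the newest entry first (stack order), while list_add_tail() links it just before the head, giving insertion (queue) order. The sketch below re-implements that idea in userspace; list_add_sketch(), list_add_tail_sketch() and struct slave_sketch are hypothetical names, not the kernel's <linux/list.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list with a sentinel head, as in <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link n between two adjacent entries prev and next. */
static void list_insert_between(struct list_head *n, struct list_head *prev,
				struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: iteration sees newest entries first (LIFO). */
static void list_add_sketch(struct list_head *n, struct list_head *head)
{
	list_insert_between(n, head, head->next);
}

/* Insert right before the head: iteration sees oldest entries first (FIFO). */
static void list_add_tail_sketch(struct list_head *n, struct list_head *head)
{
	list_insert_between(n, head->prev, head);
}

/* Hypothetical slave record with an embedded list node, like slave->list. */
struct slave_sketch {
	int id;
	struct list_head list;
};

/* Copy slave ids in iteration order into out[]; returns how many were copied. */
static int collect_ids(const struct list_head *head, int *out, int max)
{
	const struct list_head *p;
	int n = 0;

	assert(head != NULL && out != NULL);
	for (p = head->next; p != head && n < max; p = p->next) {
		const struct slave_sketch *s = (const struct slave_sketch *)
			((const char *)p - offsetof(struct slave_sketch, list));
		out[n++] = s->id;
	}
	return n;
}
```

Whether the order matters for EQL depends on how the driver walks all_slaves when it picks a slave to transmit on; if it scans the whole list each time, head insertion only changes the scan order.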

   


*) Is it possible to improve load-balancing performance using multiple
processors? For example, if a server has two processors and N network
interfaces, is it possible to assign one processor to handle tx and rx for
N/2 of the interfaces and the other processor for the remaining N/2?


Thanks
Jeba



Re: [PATCH 3/4] [XFRM]: Kill some bloat

2008-01-10 Thread Ilpo Järvinen

On Tue, 8 Jan 2008, Ilpo Järvinen wrote:

 On Mon, 7 Jan 2008, David Miller wrote:
 
  From: Andi Kleen [EMAIL PROTECTED]
  Date: Tue, 8 Jan 2008 06:00:07 +0100
  
   On Mon, Jan 07, 2008 at 07:37:00PM -0800, David Miller wrote:
The vast majority of them are one, two, and three liners.
   
   % awk '  { line++ } ; /^{/ { total++; start = line } ; /^}/ { 
   len=line-start-3; if (len > 4) l++; if (len >= 10) k++; } ; END { print 
   total, l, l/total, k, k/total }'  include/net/tcp.h
   68 28 0.411765 20 0.294118
   
   41% are over 4 lines, 29% are >= 10 lines.
  
  Take out the comments and whitespace lines, your script is
  too simplistic.

In addition, it triggered spuriously on every struct/enum closing brace :-)
and was then using the last known function-starting brace, so no wonder
the numbers were that high... Counting with the corrected lines
(len=line-start-1) and spurious matches removed:

74 19 0.256757 7 0.0945946
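The corrected counting rule can be sketched in C as below. This is a hypothetical reimplementation, not Ilpo's script: it assumes kernel brace style, treats a line starting with `{` as a function body opening and a line consisting of exactly `}` as its close (so a struct/enum `};` never matches), and measures len = line - start - 1. Like the awk version, it still counts comment and blank lines inside a body, which was David's objection.

```c
#include <assert.h>
#include <string.h>

struct len_stats {
	int total;      /* function bodies seen */
	int over4;      /* bodies longer than 4 lines */
	int atleast10;  /* bodies of 10 lines or more */
};

/* Scan C source text line by line and classify function body lengths. */
static struct len_stats count_body_lengths(const char *src)
{
	struct len_stats st = { 0, 0, 0 };
	int line = 0, start = 0, open = 0;
	const char *p = src;

	assert(src != NULL);
	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		line++;
		if (n >= 1 && p[0] == '{') {
			/* function body opens on its own line */
			st.total++;
			start = line;
			open = 1;
		} else if (n == 1 && p[0] == '}' && open) {
			/* lines strictly between the two braces */
			int len = line - start - 1;

			if (len > 4)
				st.over4++;
			if (len >= 10)
				st.atleast10++;
			open = 0;
		}
		p = eol ? eol + 1 : p + n;
	}
	return st;
}
```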


Here are (finally) the measured bytes (a couple of the functions are
missing because I had a couple of bugs in the regexps, and the #if trickery
around the inlines resulted in failed compiles):

 12 funcs,  242+, 1697-, diff: -1455  tcp_set_state
 13 funcs,   92+,  632-, diff:  -540  tcp_is_cwnd_limited
 12 funcs, 2836+, 3225-, diff:  -389  tcp_current_ssthresh
  5 funcs,  261+,  556-, diff:  -295  tcp_prequeue
  7 funcs, 2777+, 3049-, diff:  -272  tcp_clear_retrans_hints_partial
 11 funcs,   64+,  275-, diff:  -211  tcp_win_from_space
  6 funcs,  128+,  320-, diff:  -192  tcp_prequeue_init
 12 funcs,   45+,  209-, diff:  -164  tcp_set_ca_state
  7 funcs,  106+,  237-, diff:  -131  tcp_fast_path_check
  5 funcs,  167+,  291-, diff:  -124  tcp_write_queue_purge
  6 funcs,   43+,  160-, diff:  -117  tcp_push_pending_frames
  9 funcs,   55+,  159-, diff:  -104  tcp_v4_check
  6 funcs,    4+,   97-, diff:   -93  tcp_packets_in_flight
  7 funcs,   58+,  150-, diff:   -92  tcp_fast_path_on
  4 funcs,    4+,   91-, diff:   -87  tcp_clear_options
  6 funcs,  141+,  217-, diff:   -76  tcp_openreq_init
  8 funcs,   38+,  111-, diff:   -73  tcp_unlink_write_queue
  7 funcs,   32+,  103-, diff:   -71  tcp_checksum_complete
  7 funcs,   35+,  101-, diff:   -66  __tcp_fast_path_on
  5 funcs,    4+,   66-, diff:   -62  tcp_receive_window
  6 funcs,   67+,  128-, diff:   -61  tcp_add_write_queue_tail
  7 funcs,   30+,   86-, diff:   -56  tcp_ca_event
  6 funcs,   73+,  106-, diff:   -33  tcp_paws_check
  4 funcs,    4+,   36-, diff:   -32  tcp_highest_sack_seq
  6 funcs,   46+,   78-, diff:   -32  tcp_fin_time
  3 funcs,    4+,   35-, diff:   -31  tcp_clear_all_retrans_hints
  7 funcs,   30+,   51-, diff:   -21  __tcp_add_write_queue_tail
  3 funcs,    4+,   14-, diff:   -10  tcp_enable_fack
  4 funcs,    4+,   14-, diff:   -10  keepalive_time_when
  8 funcs,   66+,   73-, diff:    -7  tcp_full_space
  3 funcs,    4+,    5-, diff:    -1  tcp_wnd_end
  4 funcs,   97+,   97-, diff:    +0  tcp_mib_init
  3 funcs,    4+,    3-, diff:    +1  tcp_skb_is_last
  2 funcs,    4+,    2-, diff:    +2  keepalive_intvl_when
  2 funcs,    4+,    2-, diff:    +2  tcp_is_fack
  2 funcs,    4+,    2-, diff:    +2  tcp_skb_mss
  2 funcs,    4+,    2-, diff:    +2  tcp_write_queue_empty
  2 funcs,    4+,    2-, diff:    +2  tcp_advance_highest_sack
  2 funcs,    4+,    2-, diff:    +2  tcp_advance_send_head
  2 funcs,    4+,    2-, diff:    +2  tcp_check_send_head
  2 funcs,    4+,    2-, diff:    +2  tcp_highest_sack_reset
  2 funcs,    4+,    2-, diff:    +2  tcp_init_send_head
  2 funcs,    4+,    2-, diff:    +2  tcp_sack_reset
  6 funcs,   47+,   44-, diff:    +3  tcp_space
  5 funcs,   55+,   50-, diff:    +5  tcp_too_many_orphans
  3 funcs,    8+,    2-, diff:    +6  tcp_minshall_update
  3 funcs,    8+,    2-, diff:    +6  tcp_update_wl
  8 funcs,   25+,   14-, diff:   +11  between
  3 funcs,   14+,    2-, diff:   +12  tcp_put_md5sig_pool
  3 funcs,   14+,    2-, diff:   +12  tcp_clear_xmit_timers
  5 funcs,   30+,   17-, diff:   +13  tcp_dec_pcount_approx_int
  6 funcs,   33+,   20-, diff:   +13  tcp_insert_write_queue_after
  3 funcs,   17+,    2-, diff:   +15  __tcp_checksum_complete
  5 funcs,   17+,    2-, diff:   +15  tcp_init_wl
  4 funcs,   57+,   41-, diff:   +16  tcp_dec_quickack_mode
  4 funcs,   40+,   22-, diff:   +18  __tcp_add_write_queue_head
  5 funcs,   36+,   16-, diff:   +20  tcp_highest_sack_combine
  4 funcs,   40+,   18-, diff:   +22  tcp_dec_pcount_approx
  6 funcs,   29+,    5-, diff:   +24  tcp_is_sack
  4 funcs,   28+,    2-, diff:   +26  tcp_is_reno
  5 funcs,   50+,   24-, diff:   +26  tcp_insert_write_queue_before
  4 funcs,   83+,   56-, diff:   +27  tcp_check_probe_timer
  8 funcs,   69+,   14-, diff:   +55  tcp_left_out
 11 funcs, 2995+, 2893-, diff:  +102  tcp_skb_pcount
 30 funcs,  930+,    2-, diff:  +928  before

-- 
 i.

[PATCH net-2.6.25 0/6][NETNS]: Make ipv6_devconf (all and default) live in net namespaces

The ipv6_devconf(_all) and ipv6_devconf_dflt are currently
global, but should be per-namespace.

This set moves them onto the struct net. Or, more precisely,
onto the struct netns_ipv6, which has already been added.

Unfortunately, much of the ipv6 code cannot yet provide a
correct struct net to get the ipv6_devconf from (e.g. the routing
code), so this part of the job has to be done after the appropriate
parts are virtualized.

However, after this set a user can play with the ipv6_devconf
inside a namespace without affecting the others.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

[PATCH net-2.6.25 1/6][NETNS]: Clean out the ipv6-related sysctls creation/destruction

The addrconf sysctls and neigh sysctls are registered and
unregistered always in pairs, so they can be joined into
one (well, two) functions that accept the struct inet6_dev
and do all the work.

This also gets rid of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   63 +++---
 1 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6a48bb8..27b35dd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -102,7 +102,15 @@
 
 #ifdef CONFIG_SYSCTL
 static void addrconf_sysctl_register(struct inet6_dev *idev);
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p);
+static void addrconf_sysctl_unregister(struct inet6_dev *idev);
+#else
+static inline void addrconf_sysctl_register(struct inet6_dev *idev)
+{
+}
+
+static inline void addrconf_sysctl_unregister(struct inet6_dev *idev)
+{
+}
 #endif
 
 #ifdef CONFIG_IPV6_PRIVACY
@@ -392,13 +400,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
ipv6_mc_init_dev(ndev);
ndev->tstamp = jiffies;
-#ifdef CONFIG_SYSCTL
-   neigh_sysctl_register(dev, ndev->nd_parms, NET_IPV6,
- NET_IPV6_NEIGH, "ipv6",
- ndisc_ifinfo_sysctl_change,
- NULL);
addrconf_sysctl_register(ndev);
-#endif
/* protected by rtnl_lock */
rcu_assign_pointer(dev->ip6_ptr, ndev);
 
@@ -2391,15 +2393,8 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
case NETDEV_CHANGENAME:
if (idev) {
snmp6_unregister_dev(idev);
-#ifdef CONFIG_SYSCTL
-   addrconf_sysctl_unregister(&idev->cnf);
-   neigh_sysctl_unregister(idev->nd_parms);
-   neigh_sysctl_register(dev, idev->nd_parms,
- NET_IPV6, NET_IPV6_NEIGH, "ipv6",
- ndisc_ifinfo_sysctl_change,
- NULL);
+   addrconf_sysctl_unregister(idev);
addrconf_sysctl_register(idev);
-#endif
err = snmp6_register_dev(idev);
if (err)
return notifier_from_errno(err);
@@ -2523,10 +2518,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
/* Shot the device (if unregistered) */
 
if (how == 1) {
-#ifdef CONFIG_SYSCTL
-   addrconf_sysctl_unregister(&idev->cnf);
-   neigh_sysctl_unregister(idev->nd_parms);
-#endif
+   addrconf_sysctl_unregister(idev);
neigh_parms_release(&nd_tbl, idev->nd_parms);
neigh_ifdown(&nd_tbl, dev);
in6_dev_put(idev);
@@ -4106,21 +4098,34 @@ out:
return;
 }
 
+static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
+{
+   struct addrconf_sysctl_table *t;
+
+   if (p->sysctl == NULL)
+   return;
+
+   t = p->sysctl;
+   p->sysctl = NULL;
+   unregister_sysctl_table(t->sysctl_header);
+   kfree(t->dev_name);
+   kfree(t);
+}
+
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
+   neigh_sysctl_register(idev->dev, idev->nd_parms, NET_IPV6,
+ NET_IPV6_NEIGH, "ipv6",
+ ndisc_ifinfo_sysctl_change,
+ NULL);
__addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
idev, &idev->cnf);
 }
 
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p)
+static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 {
-   if (p->sysctl) {
-   struct addrconf_sysctl_table *t = p->sysctl;
-   p->sysctl = NULL;
-   unregister_sysctl_table(t->sysctl_header);
-   kfree(t->dev_name);
-   kfree(t);
-   }
+   __addrconf_sysctl_unregister(&idev->cnf);
+   neigh_sysctl_unregister(idev->nd_parms);
 }
 
 
@@ -4232,8 +4237,8 @@ void addrconf_cleanup(void)
unregister_netdevice_notifier(ipv6_dev_notf);
 
 #ifdef CONFIG_SYSCTL
-   addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-   addrconf_sysctl_unregister(&ipv6_devconf);
+   __addrconf_sysctl_unregister(&ipv6_devconf_dflt);
+   __addrconf_sysctl_unregister(&ipv6_devconf);
 #endif
 
rtnl_lock();
-- 
1.5.3.4


[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache

2008-01-10 Thread Eric Dumazet

Hi David

Here is the DECNET part, shadowing commit 0bcceadceb0907094ba4e40bf9a7cd9b080f13fb 
([IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache )

Thank you


[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache

In dn_rt_cache_get_next(), there is no need to guard seq->private with rcu_dereference(),
since seq is private to the thread running this function. Reading seq->private
once (as guaranteed by rcu_dereference()) or several times, if the compiler really is
dumb enough, won't change the result.
 
But we miss the real spots where rcu_dereference() is needed, both in
dn_rt_cache_get_first() and dn_rt_cache_get_next().

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 3e5..0e10ff2 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1665,12 +1665,12 @@ static struct dn_route *dn_rt_cache_get_first(struct seq_file *seq)
break;
rcu_read_unlock_bh();
}
-   return rt;
+   return rcu_dereference(rt);
 }
 
 static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_route *rt)
 {
-   struct dn_rt_cache_iter_state *s = rcu_dereference(seq->private);
+   struct dn_rt_cache_iter_state *s = seq->private;
 
rt = rt->u.dst.dn_next;
while(!rt) {
@@ -1680,7 +1680,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou
rcu_read_lock_bh();
rt = dn_rt_hash_table[s->bucket].chain;
}
-   return rt;
+   return rcu_dereference(rt);
 }
 
 static void *dn_rt_cache_seq_start(struct seq_file *seq, loff_t *pos)
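The rule applied in both the rt_cache and decnet_cache patches is that rcu_dereference() belongs on pointers fetched from the shared, concurrently updated hash chains, not on iterator state that only the reading thread touches. A rough userspace analogy follows, assuming C11 atomics as a stand-in: a release store plays the role of rcu_assign_pointer() and an acquire load the role of rcu_dereference(). The types and functions (struct node, publish_head(), count_chain()) are hypothetical, a single serialized writer is assumed, and grace-period reclamation is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of one RCU-protected hash chain. */
struct node {
	int key;
	struct node *_Atomic next;
};

struct bucket {
	struct node *_Atomic chain;
};

/* Writer (assumed serialized by a lock): initialize the node completely,
 * then publish it with a release store, the rcu_assign_pointer() role. */
static void publish_head(struct bucket *b, struct node *n, int key)
{
	struct node *old = atomic_load_explicit(&b->chain, memory_order_relaxed);

	assert(n != NULL);
	n->key = key;
	atomic_store_explicit(&n->next, old, memory_order_relaxed);
	atomic_store_explicit(&b->chain, n, memory_order_release);
}

/* Reader: every pointer loaded from shared memory goes through an acquire
 * load (standing in for rcu_dereference()) before it is followed. */
static struct node *chain_first(struct bucket *b)
{
	return atomic_load_explicit(&b->chain, memory_order_acquire);
}

static struct node *chain_next(struct node *n)
{
	return atomic_load_explicit(&n->next, memory_order_acquire);
}

/* Iterator state is private to the reading thread, so plain accesses are
 * enough, mirroring "s = seq->private" without rcu_dereference(). */
struct iter_state {
	int visited;
};

static int count_chain(struct bucket *b, struct iter_state *s)
{
	struct node *n;

	s->visited = 0;
	for (n = chain_first(b); n != NULL; n = chain_next(n))
		s->visited++;
	return s->visited;
}
```

The kernel's rcu_dereference() relies on cheaper dependency ordering; acquire is used here because it is at least as strong and is portable C11.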

SMP code / network stack

Hi All,

If a server has multiple processors and N ethernet cards, is it
possible for each processor to handle transmission separately? In
other words, can each processor be responsible for tx on a few ethernet
cards?



Example: a server has 4 processors and 8 ethernet cards. Is it possible
for each processor to transmit using only 2 ethernet cards, so that at
any instant data is being sent out on all 8 ethernet cards?


Thanks
Jeba

[PATCH net-2.6.25 4/6][NETNS]: Create ipv6 devconf-s for namespaces

This is the core. Declare and register the pernet subsys for
addrconf. The init callback will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() call is moved above the ipv6_add_dev()
call for loopback, because ipv6_add_dev() will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 include/net/netns/ipv6.h |2 +
 net/ipv6/addrconf.c  |   82 +++---
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 10733a6..06b4dc0 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -28,5 +28,7 @@ struct netns_sysctl_ipv6 {
 
 struct netns_ipv6 {
struct netns_sysctl_ipv6 sysctl;
+   struct ipv6_devconf *devconf_all;
+   struct ipv6_devconf *devconf_dflt;
 };
 #endif
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index bde50c6..3ad081e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4135,6 +4135,70 @@ static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 
 #endif
 
+static int addrconf_init_net(struct net *net)
+{
+   int err;
+   struct ipv6_devconf *all, *dflt;
+
+   err = -ENOMEM;
+   all = &ipv6_devconf;
+   dflt = &ipv6_devconf_dflt;
+
+   if (net != &init_net) {
+   all = kmemdup(all, sizeof(ipv6_devconf), GFP_KERNEL);
+   if (all == NULL)
+   goto err_alloc_all;
+
+   dflt = kmemdup(dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL);
+   if (dflt == NULL)
+   goto err_alloc_dflt;
+   }
+
+   net->ipv6.devconf_all = all;
+   net->ipv6.devconf_dflt = dflt;
+
+#ifdef CONFIG_SYSCTL
+   err = __addrconf_sysctl_register(net, "all", NET_PROTO_CONF_ALL,
+   NULL, all);
+   if (err < 0)
+   goto err_reg_all;
+
+   err = __addrconf_sysctl_register(net, "default", NET_PROTO_CONF_DEFAULT,
+   NULL, dflt);
+   if (err < 0)
+   goto err_reg_dflt;
+#endif
+   return 0;
+
+#ifdef CONFIG_SYSCTL
+err_reg_dflt:
+   __addrconf_sysctl_unregister(all);
+err_reg_all:
+   kfree(dflt);
+#endif
+err_alloc_dflt:
+   kfree(all);
+err_alloc_all:
+   return err;
+}
+
+static void addrconf_exit_net(struct net *net)
+{
+#ifdef CONFIG_SYSCTL
+   __addrconf_sysctl_unregister(net->ipv6.devconf_dflt);
+   __addrconf_sysctl_unregister(net->ipv6.devconf_all);
+#endif
+   if (net != &init_net) {
+   kfree(net->ipv6.devconf_dflt);
+   kfree(net->ipv6.devconf_all);
+   }
+}
+
+static struct pernet_operations addrconf_ops = {
+   .init = addrconf_init_net,
+   .exit = addrconf_exit_net,
+};
+
 /*
  *  Device notifier
  */
@@ -4167,6 +4231,8 @@ int __init addrconf_init(void)
return err;
}
 
+   register_pernet_subsys(&addrconf_ops);
+
/* The addrconf netdev notifier requires that loopback_dev
 * has it's ipv6 private information allocated and setup
 * before it can bring up and give link-local addresses
@@ -4190,7 +4256,7 @@ int __init addrconf_init(void)
err = -ENOMEM;
rtnl_unlock();
if (err)
-   return err;
+   goto errlo;
 
ip6_null_entry.u.dst.dev = init_net.loopback_dev;
ip6_null_entry.rt6i_idev = in6_dev_get(init_net.loopback_dev);
@@ -4218,16 +4284,11 @@ int __init addrconf_init(void)
 
ipv6_addr_label_rtnl_register();
 
-#ifdef CONFIG_SYSCTL
-   __addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
-   NULL, &ipv6_devconf);
-   __addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
-   NULL, &ipv6_devconf_dflt);
-#endif
-
return 0;
 errout:
unregister_netdevice_notifier(ipv6_dev_notf);
+errlo:
+   unregister_pernet_subsys(&addrconf_ops);
 
return err;
 }
@@ -4240,10 +4301,7 @@ void addrconf_cleanup(void)
 
unregister_netdevice_notifier(ipv6_dev_notf);
 
-#ifdef CONFIG_SYSCTL
-   __addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-   __addrconf_sysctl_unregister(&ipv6_devconf);
-#endif
+   unregister_pernet_subsys(&addrconf_ops);
 
rtnl_lock();
 
-- 
1.5.3.4
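The init/exit pattern in addrconf_init_net() (the initial namespace keeps the static defaults, every other namespace gets private copies, and errors unwind in reverse order through goto labels) can be sketched in plain C as below. Everything here is hypothetical: struct ns and struct devconf_sketch stand in for struct net and struct ipv6_devconf, malloc() stands in for kmemdup(), and the sysctl registration step is left out. The unwind path frees only memory this call allocated, so the initial namespace's static copies are never passed to free().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct ipv6_devconf and struct net. */
struct devconf_sketch {
	int forwarding;
	int hop_limit;
};

struct ns {
	int is_initial;                   /* plays the role of net == &init_net */
	struct devconf_sketch *conf_all;
	struct devconf_sketch *conf_dflt;
};

/* Statically allocated defaults, used directly by the initial namespace. */
static struct devconf_sketch static_all = { 0, 64 };
static struct devconf_sketch static_dflt = { 0, 64 };

static struct devconf_sketch *dup_conf(const struct devconf_sketch *src)
{
	struct devconf_sketch *c = malloc(sizeof(*c));

	if (c)
		memcpy(c, src, sizeof(*c));
	return c;
}

/* init: non-initial namespaces copy the current values; on failure undo
 * in reverse order, freeing only what this call allocated. */
static int ns_conf_init(struct ns *net)
{
	struct devconf_sketch *all = &static_all, *dflt = &static_dflt;

	assert(net != NULL);
	if (!net->is_initial) {
		all = dup_conf(all);
		if (all == NULL)
			goto err_alloc_all;
		dflt = dup_conf(dflt);
		if (dflt == NULL)
			goto err_alloc_dflt;
	}
	net->conf_all = all;
	net->conf_dflt = dflt;
	return 0;

err_alloc_dflt:
	free(all);
err_alloc_all:
	return -ENOMEM;
}

/* exit: release the private copies, never the static defaults. */
static void ns_conf_exit(struct ns *net)
{
	if (!net->is_initial) {
		free(net->conf_dflt);
		free(net->conf_all);
	}
	net->conf_all = net->conf_dflt = NULL;
}
```

Each copy starts from the values current at creation time, so later changes inside one namespace do not leak into the others.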


[PATCH net-2.6.25 5/6][NETNS]: Use the per-net ipv6_devconf_dflt

All its users are in net/ipv6/addrconf.c's sysctl handlers.
Since they already have the struct net to get from, the
per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3ad081e..9b96de3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -334,7 +334,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
rwlock_init(&ndev->lock);
ndev->dev = dev;
-   memcpy(&ndev->cnf, &ipv6_devconf_dflt, sizeof(ndev->cnf));
+   memcpy(&ndev->cnf, dev->nd_net->ipv6.devconf_dflt, sizeof(ndev->cnf));
ndev->cnf.mtu6 = dev->mtu;
ndev->cnf.sysctl = NULL;
ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl);
@@ -481,11 +481,11 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
struct net *net;
 
net = (struct net *)table->extra2;
-   if (p == &ipv6_devconf_dflt.forwarding)
+   if (p == &net->ipv6.devconf_dflt->forwarding)
return;
 
if (p == &ipv6_devconf.forwarding) {
-   ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
+   net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
addrconf_forward_change(net);
} else if ((!*p) ^ (!old))
dev_forward_change((struct inet6_dev *)table->extra1);
-- 
1.5.3.4


[PATCH net-2.6.25 2/6][NETNS]: Make the __addrconf_sysctl_register return an error

This error code will be needed to abort the namespace
creation if registration fails.

It should probably also be checked when a new device is
created (currently it is ignored).

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 27b35dd..18d4334 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4044,7 +4044,7 @@ static struct addrconf_sysctl_table
},
 };
 
-static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
+static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
struct inet6_dev *idev, struct ipv6_devconf *p)
 {
int i;
@@ -4088,14 +4088,14 @@ static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
goto free_procname;
 
p->sysctl = t;
-   return;
+   return 0;
 
 free_procname:
kfree(t->dev_name);
 free:
kfree(t);
 out:
-   return;
+   return -ENOBUFS;
 }
 
 static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
-- 
1.5.3.4



[PATCH net-2.6.25 3/6][NETNS]: Make the ctl-tables per-namespace

This includes passing the net to __addrconf_sysctl_register
and saving it in ctl_table->extra2 for use in the
handlers that need it.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   24 ++--
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 18d4334..bde50c6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,13 +456,13 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(void)
+static void addrconf_forward_change(struct net *net)
 {
struct net_device *dev;
struct inet6_dev *idev;
 
read_lock(&dev_base_lock);
-   for_each_netdev(&init_net, dev) {
+   for_each_netdev(net, dev) {
rcu_read_lock();
idev = __in6_dev_get(dev);
if (idev) {
@@ -478,12 +478,15 @@ static void addrconf_forward_change(void)
 
 static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 {
+   struct net *net;
+
+   net = (struct net *)table->extra2;
if (p == &ipv6_devconf_dflt.forwarding)
return;
 
if (p == &ipv6_devconf.forwarding) {
ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
-   addrconf_forward_change();
+   addrconf_forward_change(net);
} else if ((!*p) ^ (!old))
dev_forward_change((struct inet6_dev *)table->extra1);
 
@@ -4044,8 +4047,8 @@ static struct addrconf_sysctl_table
},
 };
 
-static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
-   struct inet6_dev *idev, struct ipv6_devconf *p)
+static int __addrconf_sysctl_register(struct net *net, char *dev_name,
+   int ctl_name, struct inet6_dev *idev, struct ipv6_devconf *p)
 {
int i;
struct addrconf_sysctl_table *t;
@@ -4068,6 +4071,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
for (i=0; t->addrconf_vars[i].data; i++) {
t->addrconf_vars[i].data += (char*)p - (char*)&ipv6_devconf;
t->addrconf_vars[i].extra1 = idev; /* embedded; no ref */
+   t->addrconf_vars[i].extra2 = net;
}
 
/*
@@ -4082,7 +4086,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].procname = t->dev_name;
addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].ctl_name = ctl_name;
 
-   t->sysctl_header = register_sysctl_paths(addrconf_ctl_path,
+   t->sysctl_header = register_net_sysctl_table(net, addrconf_ctl_path,
t->addrconf_vars);
if (t->sysctl_header == NULL)
goto free_procname;
@@ -4118,8 +4122,8 @@ static void addrconf_sysctl_register(struct inet6_dev *idev)
  NET_IPV6_NEIGH, "ipv6",
  ndisc_ifinfo_sysctl_change,
  NULL);
-   __addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
-   idev, &idev->cnf);
+   __addrconf_sysctl_register(idev->dev->nd_net, idev->dev->name,
+   idev->dev->ifindex, idev, &idev->cnf);
 }
 
 static void addrconf_sysctl_unregister(struct inet6_dev *idev)
@@ -4215,9 +4219,9 @@ int __init addrconf_init(void)
ipv6_addr_label_rtnl_register();
 
 #ifdef CONFIG_SYSCTL
-   __addrconf_sysctl_register("all", NET_PROTO_CONF_ALL,
+   __addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
NULL, &ipv6_devconf);
-   __addrconf_sysctl_register("default", NET_PROTO_CONF_DEFAULT,
+   __addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
NULL, &ipv6_devconf_dflt);
 #endif
 
-- 
1.5.3.4


[PATCH net-2.6.25 6/6][NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers

Actually, net->ipv6.devconf_all can be used in a few places, but to
keep the /proc/sys/net/ipv6/conf/ sysctls working consistently
inside the namespace we should use the per-net devconf_all in the
sysctl forwarding handler.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/ipv6/addrconf.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 9b96de3..cd90f9a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,7 +456,7 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(struct net *net)
+static void addrconf_forward_change(struct net *net, __s32 newf)
 {
struct net_device *dev;
struct inet6_dev *idev;
@@ -466,8 +466,8 @@ static void addrconf_forward_change(struct net *net)
rcu_read_lock();
idev = __in6_dev_get(dev);
if (idev) {
-   int changed = (!idev->cnf.forwarding) ^ (!ipv6_devconf.forwarding);
-   idev->cnf.forwarding = ipv6_devconf.forwarding;
+   int changed = (!idev->cnf.forwarding) ^ (!newf);
+   idev->cnf.forwarding = newf;
if (changed)
dev_forward_change(idev);
}
@@ -484,9 +484,10 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
if (p == &net->ipv6.devconf_dflt->forwarding)
return;
 
-   if (p == &ipv6_devconf.forwarding) {
-   net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
-   addrconf_forward_change(net);
+   if (p == &net->ipv6.devconf_all->forwarding) {
+   __s32 newf = net->ipv6.devconf_all->forwarding;
+   net->ipv6.devconf_dflt->forwarding = newf;
+   addrconf_forward_change(net, newf);
} else if ((!*p) ^ (!old))
dev_forward_change((struct inet6_dev *)table->extra1);
 
-- 
1.5.3.4



TCP/IP stack / SMP kernel

Hi All,
I am just wondering how the TCP/IP stack runs in an SMP kernel in a
multiprocessor environment. Does the TCP/IP stack run on one processor,
or is it shared among the different processors?

thanks
Jeba

Re: SMP code / network stack

2008-01-10 Thread Eric Dumazet

On Thu, 10 Jan 2008 14:05:46 +
Jeba Anandhan [EMAIL PROTECTED] wrote:

 Hi All,
 
 If a server has multiple processors and N ethernet cards, is it
 possible for each processor to handle transmission separately? In
 other words, can each processor be responsible for tx on a few ethernet
 cards?
 
 
 
 Example: a server has 4 processors and 8 ethernet cards. Is it possible
 for each processor to transmit using only 2 ethernet cards, so that at
 any instant data is being sent out on all 8 ethernet cards?

Hi Jeba

Modern ethernet cards have a big TX queue, so even one CPU is enough
to keep several cards busy in parallel.

You can check /proc/interrupts and change /proc/irq/*/smp_affinity to direct
IRQs to particular CPUs, but transmit is usually triggered by processes that
might run on different CPUs.

If all ethernet cards are on the same IRQ, then you might have a problem...

Example on a dual processor :
# cat /proc/interrupts
           CPU0       CPU1
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
121:  163095662  166443627   IO-APIC-level  libata
NMI:          0          0
LOC:   85887285   85887193
ERR:          0
MIS:          0

You can see that eth0 is on IRQ 97.
Then:
# cat /proc/irq/97/smp_affinity
0001
# echo 2 > /proc/irq/97/smp_affinity
# grep 97 /proc/interrupts
 97: 1830035216       2259   IO-APIC-level  ehci_hcd, eth0
# sleep 10
# grep 97 /proc/interrupts
 97: 1830035216       5482   IO-APIC-level  ehci_hcd, eth0

You can see that only CPU1 is now handling IRQ 97 (but CPU0 can still
give eth0 some transmit work).
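The value written to smp_affinity is a hexadecimal CPU bitmask in which bit i allows CPU i, which is why `echo 2` moves IRQ 97 to CPU1 alone and the earlier `0001` meant CPU0 only. A small hypothetical helper that builds such a mask (limited to 32 CPUs for simplicity; real masks can be wider):

```c
#include <assert.h>
#include <stdio.h>

/* Build the CPU bitmask for /proc/irq/<N>/smp_affinity: bit i set means
 * CPU i may service the interrupt. */
static unsigned int cpus_to_mask(const int *cpus, int ncpus)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < ncpus; i++) {
		assert(cpus[i] >= 0 && cpus[i] < 32);
		mask |= 1u << cpus[i];
	}
	return mask;
}

/* Format the mask as the hex string one would echo into the proc file. */
static void mask_to_string(unsigned int mask, char *buf, size_t len)
{
	snprintf(buf, len, "%x", mask);
}
```

For example, CPUs { 0, 3 } give mask 0x9, i.e. `echo 9 > /proc/irq/<N>/smp_affinity`.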

You might want to check /proc/net/softnet_stat too.

If your server is doing something very special (network traffic only, no
disk accesses or number crunching), you might need to bind application
processes to CPUs, not only network IRQs.

process A, using nic eth0 & eth1, bound to CPU 0 (process and IRQs)
process B, using nic eth2 & eth3, bound to CPU 1
process C, using nic eth4 & eth5, bound to CPU 2
process D, using nic eth6 & eth7, bound to CPU 3
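Binding the application side can be done from inside the process with sched_setaffinity(2); the sketch below pins the calling process to one CPU and reads the mask back with sched_getaffinity(2). The helper names are hypothetical, and _GNU_SOURCE must be defined before any header for cpu_set_t and the CPU_* macros to be visible. IRQ affinity still has to be set separately through /proc/irq/<N>/smp_affinity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling process to a single CPU.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity). */
static int bind_self_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}

/* Return the single CPU the caller is allowed to run on, or -1 if the
 * affinity mask cannot be read or does not contain exactly one CPU. */
static int bound_cpu(void)
{
	cpu_set_t set;
	int cpu;

	if (sched_getaffinity(0, sizeof(set), &set) != 0 || CPU_COUNT(&set) != 1)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```

From a shell, `taskset` does the same job for an already running process.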


Also, take a look at the ethtool -c ethX command.

ipip tunnel code (IPV4)

2008-01-10 Thread Andy Johnson

Hello,

I am trying to learn the IPv4 ipip tunnel code (net/ipv4/ipip.c)
and I have two small questions about the meaning of some variable
names:

ipip_fb_tunnel_init - what does fb stand for?

In tunnels_wc: what does wc stand for?

Regards,
Andy

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Andy Gospodarek

On Thu, Jan 10, 2008 at 11:58:09AM +1100, Herbert Xu wrote:
 On Wed, Jan 09, 2008 at 03:19:10PM -0800, Jay Vosburgh wrote:
 
  No that's not the point.  The point is to move the majority of the code
  into process context so that you can take the RTNL.  Once you have taken
  the RTNL you can disable BH all you want and I don't care one bit.
  
  I'm not sure how we could move more code into a process context;
  much of the bonding driver is at the mercy of its callers, as in this
  case.  The monitoring stuff and enslave / deslave is all in a process
  context now (workqueue).  The transmit processing functions, for
  example, can't be assumed to be in any particular context as they're
  called by dev_queue_xmit.
 
 No I'm not calling for you to move any more code into process context.
 I was replying to the comment that changing the read_lock calls in
 process context to read_lock_bh somehow undoes the benefit of moving
 softirq code into process context.  It does not since the point of the
 move is to be able to take the RTNL, which you can still do as long as
 you do it before you disable BH.
 

That wasn't the only purpose, Herbert.  Making sure that
dev_set_mac_address was called from process context was important at
the time of the coding as well, since at least the tg3 driver took locks
that could not be taken reliably in soft-irq context.  Michael Chan
fixed this here:

commit 986e0aeb9ae09127b401c3baa66f15b7a31f354c
Author: Michael Chan [EMAIL PROTECTED]
Date:   Sat May 5 12:10:20 2007 -0700

[TG3]: Remove reset during MAC address changes.

so it wasn't as much of an issue after that, but moving as much of the
code as possible to process context was important for that as well (hence
the move away from trying to avoid bh-locks everywhere).



[PATCH net-2.6.25][NEIGH]: Add a comment describing what a NUD stands for.

When I studied the neighbor code, I puzzled for quite a long time
over what NUD could mean.

Finally I asked Alexey, and he said that it was something
like neighbor unreachability detection.

Is it worth adding a comment to help future developers
understand what's going on?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 09f9fc6..bc34144 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -26,6 +26,10 @@
 #include <linux/sysctl.h>
 #include <net/rtnetlink.h>
 
+/*
+ * NUD stands for neighbor unreachability detection
+ */
+
 #define NUD_IN_TIMER   (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
 #define NUD_VALID  (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
 #define NUD_CONNECTED  (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)
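To see what the composite masks above select, the sketch below tests a single neighbour state against them. The individual NUD_* bit values are redefined so it builds without kernel headers; they are meant to mirror <linux/neighbour.h> and should be checked against it. nud_is() is a hypothetical helper, not a kernel function, and it rejects NUD_NONE (0) since that state has no bit.

```c
#include <assert.h>

/* Single-state bits, intended to match <linux/neighbour.h>. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_STALE	0x04
#define NUD_DELAY	0x08
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

/* Composite masks, as in include/net/neighbour.h. */
#define NUD_IN_TIMER	(NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
#define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
#define NUD_CONNECTED	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

/* A neighbour is in exactly one state, so mask membership is a bit test. */
static int nud_is(unsigned int state, unsigned int mask)
{
	assert(state != 0 && (state & (state - 1)) == 0);   /* one bit set */
	return (state & mask) != 0;
}
```

For instance, a STALE entry is still usable (NUD_VALID) but is not considered connected, and a PERMANENT entry never runs a timer.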

Re: No idea about shaping trough many pc

2008-01-10 Thread Lennart Sorensen

On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
 Hello all.
 I have been trying for more than 2 months to resolve a problem with my
 shaping. Maybe you can help me?
 
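 Scheme:
               +---------------+
          +--> | Shaping PC 1  | --+
         /     +---------------+    \
 +-------+     +---------------+     +-------+
 | Cisco | --> | Shaping PC N  | --> | CISCO |
 +-------+     +---------------+     +-------+
         \     +---------------+    /
          +--> | Shaping PC 20 | --+
               +---------------+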
 
 Network: over 10k users. Total bandwidth to the Internet is more than 1 Gbps.
 All computers run BGP with multipath turned on.
 The Cisco can't do per-packet load sharing (that would solve all my
 problems =((( ), only by DST IP, SRC IP, or +Level4.
 OK. Each user must get a speed of 1 Mbps.
 Let's look at the options:
 1. Create per-user rules of (1 Mbps / N computers). If a user opens N
 connections all is great, but if he uses 1 connection his speed is
 1 Mbps / N, which is not good. All would be great if the Cisco could do
 PER PACKET load sharing =(
 2. Create per-user rules of 1 Mbps. If a user uses 1 connection all is
 great, but if he uses N connections his speed is much more than the
 intended limit =(
 
 Why do I use 20 PCs? Because 1 PC normally forwards 100-150 Mbps... at
 which point it is at 100% CPU usage on software interrupts...

I have managed forwarding of 600Mbps using about 15% CPU load on a
500MHz Geode LX, using 4 100Mbit pcnet32 interfaces and a small tweak to
how NAPI is implemented on it.  Adding traffic shaping and such to
the processing would certainly increase the CPU load, but hopefully not
by much.  The reason I didn't get more than 600Mbps was that the PCI bus
is now full.

 Any idea how to solve this problem?
 
 In my dreams (feature request to netdev ;) ):
 A PC titled MASTER TC. All 20 PCs synchronize their statistics with the
 MASTER and share common rules and statistics. Then I would use option 2
 and be happy... but that's not realistic? =(
 Are there other options?

Well, not sure about synchronizing and all that.  I still think if I can
manage a 600Mbps forwarding rate using a slowpoke Geode, then a modern CPU
like a Q6600 with a number of PCIe gigabit ports should be able to do quite
a lot.

The tweak I did was to add a timer to the driver that I can activate
whenever I finish emptying the receive queue.  When the timer expires it
adds the port back to the NAPI queue, and when the poll is called again
it will either process whatever packets arrived during the delay, or
actually unmask the IRQ and go back to IRQ mode.  The delay I use is
1 jiffy; I run with HZ=1000 and set the queues to 256 packets, since 1ms
at 100Mbps can carry at most about 200 packets (64-byte worst case).
Whenever I empty the queue, I simply check how many packets I just
processed.  If that is greater than 0, I remove the port from NAPI
polling, leave its IRQ masked and set the timer to expire on the next
jiffy.  If it was 0, then I must have been called again after the timer
expired and still had no packets to process, in which case I unmask the
IRQ and don't enable the timer.  I had to change HZ to 1000 since at 250
or 100 I wouldn't be able to handle the worst-case number of packets
(the pcnet32 has a maximum of 512 packets in a queue).
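The arithmetic and the decision rule above can be sketched as a small user-space model in C. This is not the pcnet32 driver code: frames_per_tick() and on_queue_empty() are hypothetical helpers, and, like the estimate in the text, the frame count ignores preamble and inter-frame gap.

```c
#include <assert.h>

/* Worst-case number of minimum-size frames arriving in one timer tick.
 * Ignores preamble and inter-frame gap, so it overestimates somewhat. */
static unsigned long frames_per_tick(unsigned long link_bps,
				     unsigned long frame_bytes,
				     unsigned long hz)
{
	return link_bps / (frame_bytes * 8) / hz;
}

/* What the modified poll routine does once the RX queue is empty. */
enum poll_action {
	ARM_TIMER_KEEP_IRQ_MASKED,	/* re-poll on the next jiffy */
	UNMASK_IRQ			/* idle: go back to IRQ mode */
};

static enum poll_action on_queue_empty(unsigned int processed)
{
	/* Work was done in this pass, so more packets are likely: leave the
	 * IRQ masked and let a one-jiffy timer re-add the port to the NAPI
	 * poll list. */
	if (processed > 0)
		return ARM_TIMER_KEEP_IRQ_MASKED;

	/* Called again after the timer fired and still nothing to do. */
	return UNMASK_IRQ;
}
```

With these numbers, HZ=1000 gives about 195 worst-case frames per tick, which fits a 256-entry ring; HZ=250 gives about 781 and HZ=100 about 1953, both above the pcnet32's 512-entry maximum.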

With NAPI the normal behaviour is that whenever you empty the receive
queue, you re-enable IRQs, but it doesn't take that fast a CPU to
actually empty the queue all the time, and then you end up with the
overhead of masking IRQs every time you receive packets, processing them,
and then unmasking the IRQ, only to get an IRQ for the next packet within
a fraction of a millisecond.  By delaying the IRQ unmask until the next
jiffy you introduce a potential lag of up to 1ms in processing packets
(less than that on average), but the IRQ load drops dramatically and the
overhead of IRQ masking and the IRQ handler goes away.  On this system
the CPU load dropped from 90% at 500Mbps to 15% at 600Mbps, and the
interrupt rate dropped from one IRQ every couple of packets to one IRQ
at the start of each burst of packets.

I believe some gigabit Ethernet ports and most 10Gig ports can delay
IRQs, waiting for a certain number of packets before generating an IRQ,
which is pretty much what I tried to emulate with my tweak, and it sure
works amazingly well.

--
Len Sorensen

Re: [Bugme-new] [Bug 9719] New: when a system is configured as a bridge, and at the same time configured to have multipath weighted route, with one leg goes thru NAT and another without NAT, the nat


Andrew Morton wrote:

Distribution: iptables 1.4.0 was used with kernel 2.6.23 and iptables 1.3.8
with 2.6.22.15
Hardware Environment: 3 interfaces, 2 interfaces bridged to form br0, and
another connects to internet using pppoe.
Software Environment: bridge, multipath routing
Problem Description: when a system is configured as a bridge with an IP
assigned to the br0 interface, and at the same time it is configured with a
weighted multipath default route where one of the default routes is NAT-ed
and the other is not, then the NAT-ed interface will occasionally leak
packets with private IPs.



That is most likely because the route changes over time (when the cache
is flushed) and the NAT mappings for the connection have been set up on
a different interface. The way to properly do this is to add routing
rules based on fwmark and use CONNMARK to bind a connection to one of
the interfaces after the initial multipath routing decision.
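A toy model in C of why pinning helps, assuming two uplinks where uplink 1 is NAT-ed and uplink 2 is not. multipath_pick(), the epoch counter standing in for a routing-cache flush, and the other names are hypothetical; in a real setup the "save" and "restore" steps would be done with CONNMARK --save-mark/--restore-mark and an ip rule matching on fwmark, as suggested above.

```c
#include <assert.h>

/* Toy connection: mark 0 means "not yet pinned", otherwise the uplink. */
struct conn {
	int mark;
};

/* Stand-in for a weighted multipath decision whose result can change
 * after the routing cache is flushed (modelled by 'epoch'). */
static int multipath_pick(unsigned int flow_hash, unsigned int epoch)
{
	return (int)((flow_hash + epoch) % 2) + 1;
}

/* Without marking: every lookup may choose a different uplink. */
static int route_unpinned(unsigned int flow_hash, unsigned int epoch)
{
	return multipath_pick(flow_hash, epoch);
}

/* With CONNMARK-style pinning: the first decision is saved on the
 * connection, and later packets are routed by that mark instead. */
static int route_pinned(struct conn *c, unsigned int flow_hash,
			unsigned int epoch)
{
	if (c->mark == 0)
		c->mark = multipath_pick(flow_hash, epoch);	/* save */
	return c->mark;						/* restore */
}
```

Without pinning, the same flow moves from uplink 1 to uplink 2 after the "flush", so its packets can leave through an interface that does not match the connection's NAT mapping; with pinning, every later packet follows the first decision.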

Re: virtio_net and SMP guests

2008-01-10 Thread Christian Borntraeger

Am Donnerstag, 10. Januar 2008 schrieb Christian Borntraeger:
 Am Donnerstag, 10. Januar 2008 schrieb Christian Borntraeger:
  Am Dienstag, 18. Dezember 2007 schrieb Rusty Russell:
   To me this points to doing interrupt suppression a different way.  If we
   have a ->disable_cb() virtio function, and call it before we call
   netif_rx_schedule, does that fix it?
  
  The fix looks good and I agree with it.
  
  There is one problem I have been trying to track down for some days: the
  following BUG_ON triggers:
  
  static void vring_disable_cb(struct virtqueue *_vq)
  {
  	struct vring_virtqueue *vq = to_vvq(_vq);
  
  	START_USE(vq);
  	BUG_ON(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
  	vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
  	END_USE(vq);
  }
 
 Ok, I found it:
 
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	try_fill_recv(vi);
 	/* If we didn't even get one input buffer, we're useless. */
 	if (vi->num == 0)
 		return -ENOMEM;
 				--> interrupt for new packet
 				static void skb_recv_done(struct virtqueue *rvq)
 				{
 					struct virtnet_info *vi = rvq->vdev->priv;
 					/* Suppress further interrupts. */
 					rvq->vq_ops->disable_cb(rvq);
 					netif_rx_schedule(vi->dev, &vi->napi);
 				}
 				--> poll is not yet possible, no softirq
 				--> return from interrupt
 	napi_enable(&vi->napi);
 	vi->rvq->vq_ops->disable_cb(vi->rvq);
 				--> BUG: it's already disabled

Btw. this problem also happens on single processor guests.

What about the following patch:

---
 drivers/net/virtio_net.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: kvm/drivers/net/virtio_net.c
===
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -179,9 +179,12 @@ static void try_fill_recv(struct virtnet
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
-	/* Suppress further interrupts. */
-	rvq->vq_ops->disable_cb(rvq);
-	netif_rx_schedule(vi->dev, &vi->napi);
+	/* Schedule NAPI, Suppress further interrupts if successful. */
+
+	if (netif_rx_schedule_prep(vi->dev, &vi->napi)) {
+		rvq->vq_ops->disable_cb(rvq);
+		__netif_rx_schedule(vi->dev, &vi->napi);
+	}
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
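To make the race concrete, here is a small user-space model in C; it is not the virtio or NAPI code. napi_sched stands in for the NAPI_STATE_SCHED bit (assumed set while NAPI is disabled and cleared by napi_enable()), cb_disabled for VRING_AVAIL_F_NO_INTERRUPT, and the BUG_ON is modelled as a counter. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model {
	bool napi_sched;	/* models NAPI_STATE_SCHED */
	bool cb_disabled;	/* models VRING_AVAIL_F_NO_INTERRUPT */
	int bugs;		/* times the BUG_ON condition was hit */
};

static void disable_cb(struct model *m)
{
	if (m->cb_disabled)
		m->bugs++;		/* the kernel would BUG() here */
	m->cb_disabled = true;
}

/* Like netif_rx_schedule_prep(): succeeds only if not already scheduled. */
static bool schedule_prep(struct model *m)
{
	if (m->napi_sched)
		return false;
	m->napi_sched = true;
	return true;
}

/* RX interrupt before the patch: always suppress further callbacks. */
static void recv_done_old(struct model *m)
{
	disable_cb(m);
	(void)schedule_prep(m);
}

/* RX interrupt after the patch: suppress only if NAPI got scheduled. */
static void recv_done_new(struct model *m)
{
	if (schedule_prep(m))
		disable_cb(m);
}

/* Tail of virtnet_open(): enable NAPI, then disable callbacks itself. */
static void open_tail(struct model *m)
{
	m->napi_sched = false;	/* napi_enable() */
	disable_cb(m);
}
```

Replaying the trace (an interrupt before napi_enable(), then the disable in virtnet_open()) trips the check with the old handler but not with the patched one, because the prep step fails while NAPI is still disabled and the callback is left enabled.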

[VLAN]: nested VLAN: fix lockdep's recursive locking warning

Allow VLANs to nest other VLANs without lockdep warnings (max. 2 levels,
i.e. parent + child). Thanks to Patrick McHardy for pointing out a bug in
the first version of this patch.

Reported-by: Benny Amorsen

Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
tree 2f0792e8240151b1e5437b05130d1f569175f572
parent e2474f60798c97f5c05d29a906045dd1f416ba7f
author Jarek Poplawski [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:00 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:00 +0100

 net/8021q/vlan.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 4add9bd..032bf44 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -323,6 +323,7 @@ static const struct header_ops vlan_header_ops = {
 static int vlan_dev_init(struct net_device *dev)
 {
 	struct net_device *real_dev = VLAN_DEV_INFO(dev)->real_dev;
+	int subclass = 0;
 
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
 	dev->flags  = real_dev->flags & ~IFF_UP;
@@ -349,7 +350,11 @@ static int vlan_dev_init(struct net_device *dev)
 		dev->hard_start_xmit = vlan_dev_hard_start_xmit;
 	}
 
-	lockdep_set_class(&dev->_xmit_lock, &vlan_netdev_xmit_lock_key);
+	if (real_dev->priv_flags & IFF_802_1Q_VLAN)
+		subclass = 1;
+
+	lockdep_set_class_and_subclass(&dev->_xmit_lock,
+				&vlan_netdev_xmit_lock_key, subclass);
 	return 0;
 }
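A small user-space model of the subclass rule in this patch, with a hypothetical struct dev standing in for struct net_device. It shows why the patch supports at most two levels: a VLAN on a VLAN gets subclass 1, but so does a VLAN on top of that one, so a third level would collide with the second again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct dev {
	bool is_vlan;			/* models priv_flags & IFF_802_1Q_VLAN */
	const struct dev *real_dev;	/* lower device, NULL for a NIC */
};

/* Subclass the patch assigns to a VLAN device's _xmit_lock. */
static int vlan_xmit_lock_subclass(const struct dev *vlan)
{
	return vlan->real_dev->is_vlan ? 1 : 0;
}
```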

Re: [PATCH take2] Re: Nested VLAN causes recursive locking error


Jarek Poplawski wrote:

As a matter of fact I started to doubt it's a real problem: 2 VLAN
headers in a row - does that work?


Yes, apparently some people are using this.


Anyway, as Patrick pointed out, the previous patch was a bit buggy, and
deeper nesting needs a little more (if it can work at all...). So,
here is something minimal.

Patrick, if you have something else in mind, then of course disregard
this patch.


No, this seems fine, thanks. Even better would be a way to get
the last lockdep subclass through lockdep somehow, but I couldn't
find a clean way for this. So I've applied your patch and also
fixed macvlan.



Re: SMP code / network stack

Hi Eric,
Thanks for the reply. I have one more question. For example, suppose we
have 2 processors and 4 Ethernet cards, and only CPU0 does all the work
for the 8 cards. If we set the affinity of each Ethernet card to a CPU
number, will it be more efficient?

Will this be the default behavior?

# cat /proc/interrupts 
   CPU0   CPU1   
  0:   11472559   74291833IO-APIC-edge  timer
  2:  0  0  XT-PIC  cascade
  8:  0  1IO-APIC-edge  rtc
 81:  0  0   IO-APIC-level  ohci_hcd
 97: 1830022231847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231847   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213847   IO-APIC-level  ehci_hcd, eth3
#sleep 10

# cat /proc/interrupts 
   CPU0   CPU1   
  0:   11472559   74291833IO-APIC-edge  timer
  2:  0  0  XT-PIC  cascade
  8:  0  1IO-APIC-edge  rtc
 81:  0  0   IO-APIC-level  ohci_hcd
 97: 2031409801847   IO-APIC-level  ehci_hcd, eth0
 97: 4813981390847   IO-APIC-level  ehci_hcd, eth1
 97: 7123982139847   IO-APIC-level  ehci_hcd, eth2
 97: 8030193010847   IO-APIC-level  ehci_hcd, eth3


If instead we set the affinity for eth2 and eth3, the output will be:

# cat /proc/interrupts 
   CPU0   CPU1   
  0:   11472559   74291833IO-APIC-edge  timer
  2:  0  0  XT-PIC  cascade
  8:  0  1IO-APIC-edge  rtc
 81:  0  0   IO-APIC-level  ohci_hcd
 97: 1830022231847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231923   IO-APIC-level  ehci_hcd, eth2
 97: 68300322131230   IO-APIC-level  ehci_hcd, eth3
#sleep 10

# cat /proc/interrupts 
   CPU0   CPU1   
  0:   11472559   74291833IO-APIC-edge  timer
  2:  0  0  XT-PIC  cascade
  8:  0  1IO-APIC-edge  rtc
 81:  0  0   IO-APIC-level  ohci_hcd
 97: 2300022231847   IO-APIC-level  ehci_hcd, eth0
 97: 4010212232847   IO-APIC-level  ehci_hcd, eth1
 97: 58300522311847   IO-APIC-level  ehci_hcd, eth2
 97: 68300322132337   IO-APIC-level  ehci_hcd, eth3

In this case, will performance improve?
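For reference, IRQ affinity is set per IRQ line by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity (root is required). A minimal C sketch, assuming fewer than 32 CPUs; cpu_mask_hex() and set_irq_affinity() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Hex mask selecting a single CPU, e.g. CPU1 -> "2" (cpu < 32 assumed). */
static int cpu_mask_hex(unsigned int cpu, char *buf, size_t len)
{
	return snprintf(buf, len, "%x", 1u << cpu);
}

/* Pin one IRQ line to one CPU. Returns 0 on success, -1 on error. */
static int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
	char path[64], mask[16];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	cpu_mask_hex(cpu, mask, sizeof(mask));

	f = fopen(path, "w");
	if (f == NULL)
		return -1;		/* no such IRQ, or not root */
	if (fprintf(f, "%s\n", mask) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, set_irq_affinity(97, 1) would write 2 to /proc/irq/97/smp_affinity. Since affinity applies to an IRQ line, devices sharing one IRQ number move together.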

Thanks
Jeba
On Thu, 2008-01-10 at 15:45 +0100, Eric Dumazet wrote:

[MACVLAN]: Prevent nesting macvlan devices

Don't allow nesting macvlan devices, since it causes lockdep warnings and
isn't really useful for anything.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 80a76fbde679793a17482a3dd842386801fca66b
tree 07f67e78ac0ae505a5de81e7e770a1b7d597f120
parent 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
author Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:01 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 10 Jan 2008 16:25:01 +0100

 drivers/net/macvlan.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2e4bcd5..e8dc2f4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -384,6 +384,13 @@ static int macvlan_newlink(struct net_device *dev,
 	if (lowerdev == NULL)
 		return -ENODEV;
 
+	/* Don't allow macvlans on top of other macvlans - it's not really
+	 * wrong, but lockdep can't handle it and it's not useful for anything
+	 * you couldn't do directly on top of the real device.
+	 */
+	if (lowerdev->rtnl_link_ops == dev->rtnl_link_ops)
+		return -ENODEV;
+
 	if (!tb[IFLA_MTU])
 		dev->mtu = lowerdev->mtu;
 	else if (dev->mtu > lowerdev->mtu)

e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Breno Leitao

Hello, 

I've noticed a performance issue when running netperf
against 4 e1000 links connected end-to-end to another machine with 4
e1000 interfaces.

I have two 4-port cards in my machine, but the test only
uses 2 ports on each card.

When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
of transfer rate. If I run 4 netperf against 4 different interfaces, I
get around 720 * 10^6 bits/sec.  

If I run the same test against 2 interfaces I get a 940 * 10^6 bits/sec
transfer rate also, and if I run it against 3 interfaces I get around
850 * 10^6 bits/sec performance. 

I got these results using the upstream netdev-2.6 branch kernel plus
David Miller's set of 7 NAPI patches[1]. With kernel 2.6.23.12 the result
is a bit worse: the transfer rate was around 600 * 10^6
bits/sec.

[1] http://marc.info/?l=linux-netdev&m=119977075917488&w=2

PS: I am not using a switch between the interfaces (they are connected
end-to-end) and the connections are independent.

-- 
Breno Leitao [EMAIL PROTECTED]


Re: e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Ben Hutchings

Breno Leitao wrote:
 Hello, 
 
 I've perceived that there is a performance issue when running netperf
 against 4 e1000 links connected end-to-end to another machine with 4
 e1000 interfaces. 
 
 I have 2 4-port interfaces on my machine, but the test is just
 considering 2 port for each interfaces card.
 
 When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
 of transfer rate. If I run 4 netperf against 4 different interfaces, I
 get around 720 * 10^6 bits/sec.
snip

I take it that's the average for individual interfaces, not the
aggregate?  RX processing for multi-gigabits per second can be quite
expensive.  This can be mitigated by interrupt moderation and NAPI
polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO).
I don't think e1000 hardware does LRO, but the driver could presumably
be changed to use Linux's software LRO.

Even with these optimisations, if all RX processing is done on a
single CPU this can become a bottleneck.  Does the test system have
multiple CPUs?  Are IRQs for the multiple NICs balanced across
multiple CPUs?
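To put "multi-gigabits per second" into packet rates, a back-of-the-envelope sketch in C, assuming full-size frames and 38 bytes of per-frame wire overhead (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap); frames_per_sec() is a hypothetical helper.

```c
#include <assert.h>

/* Per-frame bytes on the wire beyond the MTU-sized payload. */
#define ETH_WIRE_OVERHEAD 38UL

/* Maximum full-size frames per second on a link for a given MTU. */
static unsigned long frames_per_sec(unsigned long link_bps, unsigned long mtu)
{
	return link_bps / ((mtu + ETH_WIRE_OVERHEAD) * 8);
}
```

At line rate that is about 81,274 frames/s per gigabit link with a 1500-byte MTU, so roughly 325,000 frames/s for four links, versus about 13,830 frames/s per link with a 9000-byte MTU, which is why jumbo frames cut the per-packet RX cost so much.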

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.

Re: e1000 performance issue in 4 simultaneous links

Ben,
I am facing a performance issue when we bond multiple interfaces into
a virtual interface. It could be related to this thread.
My questions are:
*) When we use multiple NICs, will the performance of the overall system
be the sum of all the individual links' XX bits/sec?
*) What factors improve performance when we have multiple
interfaces? [e.g. tuning parameters in /proc]

Breno,
I hope this thread will help with the performance issue I have
with the bonding driver.

Jeba
On Thu, 2008-01-10 at 16:36 +, Ben Hutchings wrote:
 Breno Leitao wrote:
  Hello, 
  
  I've perceived that there is a performance issue when running netperf
  against 4 e1000 links connected end-to-end to another machine with 4
  e1000 interfaces. 
  
  I have 2 4-port interfaces on my machine, but the test is just
  considering 2 port for each interfaces card.
  
  When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
  of transfer rate. If I run 4 netperf against 4 different interfaces, I
  get around 720 * 10^6 bits/sec.
 snip
 
 I take it that's the average for individual interfaces, not the
 aggregate?  RX processing for multi-gigabits per second can be quite
 expensive.  This can be mitigated by interrupt moderation and NAPI
 polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO).
 I don't think e1000 hardware does LRO, but the driver could presumably
 be changed to use Linux's software LRO.
 
 Even with these optimisations, if all RX processing is done on a
 single CPU this can become a bottleneck.  Does the test system have
 multiple CPUs?  Are IRQs for the multiple NICs balanced across
 multiple CPUs?
 
 Ben.
 

questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


Hi all,

I've got an issue that's popped up with a deployed system running 
2.6.10.  I'm looking for some help figuring out why incoming network 
packets aren't being processed fast enough.


After a recent userspace app change, we've started seeing packets being 
dropped by the ethernet hardware (e1000, NAPI is enabled).  The 
error/dropped/fifo counts are going up in ethtool:


 rx_packets: 32180834
 rx_bytes: 5480756958
 rx_errors: 862506
 rx_dropped: 771345
 rx_length_errors: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 91161
 rx_missed_errors: 91161

This link is receiving roughly 13K packets/sec, and we're dropping 
roughly 51 packets/sec due to fifo errors.


Increasing the rx descriptor ring size from 256 up to around 3000 or so 
seems to make the problem stop, but it seems to me that this is just a 
workaround for the latency in processing the incoming packets.
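A quick worked estimate of how much stall a ring can absorb, assuming packets arrive at a steady rate and nothing drains the ring in the meantime; ring_fill_ms() is a hypothetical helper.

```c
#include <assert.h>

/* Milliseconds until an RX ring of 'ring_entries' descriptors is full
 * if packets keep arriving at 'pps' and the ring is not serviced. */
static double ring_fill_ms(unsigned int ring_entries, double pps)
{
	return ring_entries * 1000.0 / pps;
}
```

At roughly 13K packets/sec, 256 descriptors last about 20 ms and 3000 descriptors about 230 ms, so the drops suggest the RX ring occasionally goes unserviced for more than about 20 ms, consistent with the view that the larger ring only hides the latency.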


So, I'm looking for some suggestions on how to fix this or to figure out 
where the latency is coming from.


Some additional information:


1) Interrupts are being processed on both cpus:

[EMAIL PROTECTED]:/root cat /proc/interrupts
   CPU0   CPU1
 30:17037564530785  U3-MPIC Level eth0




2) top shows a fair amount of time processing softirqs, but very 
little time in ksoftirqd (or is that a sampling artifact?).



Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
Mem:  4007812k total, 2199148k used,  1808664k free, 0k buffers
Swap:   0k total,   0k used,  0k free,   219844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5375 root      15   0 2682m 1.8g 6640 S 99.9 46.7  31:17.68 SigtranServices
 7696 root      17   0  6952 3212 1192 S  7.3  0.1   0:15.75 schedmon.ppc210
 7859 root      16   0  2688 1228  964 R  0.7  0.0   0:00.04 top
 2956 root       8  -8 18940 7436 5776 S  0.3  0.2   0:01.35 blademtc
    1 root      16   0  1660  620  532 S  0.0  0.0   0:30.62 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/0
    3 root      15   0     0    0    0 S  0.0  0.0   0:00.55 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
    5 root      15   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1


3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300


So...anyone have any ideas/suggestions?

Thanks,

Chris

[PROCFS] [NETNS] issue with /proc/net entries

2008-01-10 Thread Benjamin Thery


Hi Eric,

While testing the current network namespace stuff merged in net-2.6.25,
I bumped into the following problem with the /proc/net/ entries.
It doesn't always display the actual data of the current namespace,
but sometimes displays data from other namespaces.

I bisected the problem to the commit:
proc: remove/Fix proc generic d_revalidate
3790ee4bd86396558eedd86faac1052cb782e4e1

The problem: If a process in a particular network namespace changes
current directory to /proc/net, then processes in other network
namespaces trying to look at /proc/net entries will see data from the
first namespace (the one with CWD /proc/net). (See test case below).

As your comments in the commit suggest, you seem to be aware of some
issues when CONFIG_NET_NS=y. Is this one of the corner cases you
identified? Any idea how we can fix it?

Thanks.

Benjamin


Test case:
--
(1) Shell 1, in init namespace:
$ cat /proc/net/dev
lo ...
eth0 ...

(2) Shell 2, in another network namespace
$ cat /proc/net/dev
lo ...

(3) Shell 1
$ cd /proc/net
$ cat dev
lo ...
eth0 ...

(4) Shell 2
$ cat /proc/net/dev
lo ...
eth0 ...

Argh, lo + eth0 in the child namespace: the device list of the init netns
is displayed in /proc/net/dev of the child namespace :-(

(5) Shell 1
$ cd /

(6) Shell 2
$ cat /proc/net/dev
lo ...

Back to normality.


--
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

   http://www.bull.com

[PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-10 Thread Dzianis Kahanovich


Map classid x:y to mark = (mark & x) | y (classid :y is equivalent to
-j MARK --set-mark y, etc.).
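The mapping can be modelled in user space as the sch_ingress.c hunk of this patch computes it. The mask mirrors TC_H_MIN() from <linux/pkt_sched.h>; tc2mark() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same minor-number mask as TC_H_MIN() in <linux/pkt_sched.h>. */
#define TC_H_MIN_MASK	0x0000FFFFU
#define TC_H_MIN(h)	((h) & TC_H_MIN_MASK)

/* New mark for classid x:y: keep only the mark bits set in the major
 * number x, then OR in the minor number y. */
static uint32_t tc2mark(uint32_t mark, uint32_t classid)
{
	return (mark & (classid >> 16)) | TC_H_MIN(classid);
}
```

For example, classid 1:2 with an incoming mark of 0xff yields (0xff & 1) | 2 = 3, and classid :5 (major 0) simply sets the mark to 5.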

--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@
  To compile this code as a module, choose M here: the
  module will be called sch_ingress.

+config NET_SCH_INGRESS_TC2MARK
+	bool "ingress classify -> mark"
+	depends on NET_SCH_INGRESS && NET_CLS_ACT
+	---help---
+	  This enables access to mark value via classid
+	  Example: set tc filter ... flowid|classid 1:2
+	  eq netfilter mark mark=mark&1|2
+	  
+	  But classid may be undefined (?) - use flowid :0.
+
 comment Classification

 config NET_CLS
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -161,2 +161,5 @@
 			skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+			skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
+#endif
 		default:


--
WBR,
Denis Kaganovich,  [EMAIL PROTECTED]  http://mahatma.bspu.unibel.by

Re: e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Breno Leitao

On Thu, 2008-01-10 at 16:36 +, Ben Hutchings wrote:
  When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
  of transfer rate. If I run 4 netperf against 4 different interfaces, I
  get around 720 * 10^6 bits/sec.
 snip
 
 I take it that's the average for individual interfaces, not the
 aggregate?
Right, each of these results are for individual interfaces. Otherwise,
we'd have a huge problem. :-)

 This can be mitigated by interrupt moderation and NAPI
 polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO).
 I don't think e1000 hardware does LRO, but the driver could presumably
 be changed to use Linux's software LRO.
Without using these features, and keeping the MTU at 1500, do you think
we could get better performance than this?

I also tried increasing my interface MTU to 9000, but I am afraid that
netperf only transmits packets smaller than 1500 bytes. Still investigating.

 single CPU this can become a bottleneck.  Does the test system have
 multiple CPUs?  Are IRQs for the multiple NICs balanced across
 multiple CPUs?
Yes, this machine has 8 PPC 1.9GHz CPUs, and the IRQs are balanced
across the CPUs, as I see in /proc/interrupts:

# cat /proc/interrupts 
   CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   
CPU6   CPU7   
 16:940760   1047904993777
975813   XICS  Level IPI
 18:  4  3  4  1  3  6  
8  3   XICS  Level hvc_console
 19:  0  0  0  0  0  0  
0  0   XICS  Level RAS_EPOW
273:  10728  10850  10937  10833  10884  10788  
10868  10776   XICS  Level eth4
275:  0  0  0  0  0  0  
0  0   XICS  Level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
277: 234933 230275 229770 234048 235906 229858 
229975 233859   XICS  Level eth6
278: 266225 267606 262844 265985 268789 266869 
263110 267422   XICS  Level eth7
279:893919857909867917
894881   XICS  Level eth0
305: 439246 439117 438495 436072 438053 440111 
438973 438951   XICS  Level eth0 Neterion Xframe II 10GbE network 
adapter
321:   3268   3088   3143   3113   3305   2982   
3326   3084   XICS  Level ipr
323: 268030 273207 269710 271338 270306 273258 
270872 273281   XICS  Level eth16
324: 215012 221102 219494 216732 216531 220460 
219718 218654   XICS  Level eth17
325:   7103   3580   7246   3475   7132   3394   
7258   3435   XICS  Level pata_pdc2027x
BAD:   4216

Thanks,

-- 
Breno Leitao [EMAIL PROTECTED]


Re: questions on NAPI processing latency and dropped network packets

Chris Friesen wrote:
 Hi all,
 
 I've got an issue that's popped up with a deployed system running
 2.6.10.  I'm looking for some help figuring out why incoming network
 packets aren't being processed fast enough.
 
 After a recent userspace app change, we've started seeing packets being
 dropped by the ethernet hardware (e1000, NAPI is enabled).  The
 error/dropped/fifo counts are going up in ethtool:
 
  rx_packets: 32180834
  rx_bytes: 5480756958
  rx_errors: 862506
  rx_dropped: 771345
  rx_length_errors: 0
  rx_over_errors: 0
  rx_crc_errors: 0
  rx_frame_errors: 0
  rx_fifo_errors: 91161
  rx_missed_errors: 91161
 
 This link is receiving roughly 13K packets/sec, and we're dropping
 roughly 51 packets/sec due to fifo errors.
 
 Increasing the rx descriptor ring size from 256 up to around 3000 or so
 seems to make the problem stop, but it seems to me that this is just a
 workaround for the latency in processing the incoming packets.
 
 So, I'm looking for some suggestions on how to fix this or to figure out
 where the latency is coming from.
 
 Some additional information:
 
 
 1) Interrupts are being processed on both cpus:
 
 [EMAIL PROTECTED]:/root cat /proc/interrupts
CPU0   CPU1
  30:17037564530785  U3-MPIC Level eth0
 
 
 
 
 2) top shows a fair amount of time processing softirqs, but very
 little time in ksoftirqd (or is that a sampling artifact?).
 
 
 Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
 Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
 Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
 Mem:  4007812k total, 2199148k used,  1808664k free, 0k buffers
 Swap:   0k total,   0k used,  0k free,   219844k cached
 
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  5375 root      15   0 2682m 1.8g 6640 S 99.9 46.7  31:17.68 SigtranServices
  7696 root      17   0  6952 3212 1192 S  7.3  0.1   0:15.75 schedmon.ppc210
  7859 root      16   0  2688 1228  964 R  0.7  0.0   0:00.04 top
  2956 root       8  -8 18940 7436 5776 S  0.3  0.2   0:01.35 blademtc
     1 root      16   0  1660  620  532 S  0.0  0.0   0:30.62 init
     2 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/0
     3 root      15   0     0    0    0 S  0.0  0.0   0:00.55 ksoftirqd/0
     4 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
     5 root      15   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1
 
 
 3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300
 
 
 So...anyone have any ideas/suggestions?

You're using 2.6.10... you can always replace the e1000 module with the
out-of-tree version from e1000.sf.net; this might help a bit, since the
version in the 2.6.10 kernel is very, very old.

It also appears that your app is eating up CPU time. Perhaps setting the
app to a nicer nice level might mitigate things a bit. Also turn off the
in-kernel irq mitigation; it just causes cache misses, and you really need
the network irq to sit on a single cpu most (if not all) of the time to get
the best performance. Use the userspace irqbalance daemon instead to
achieve this.

Auke


Re: [PATCH 2.6.23+] ingress classify to [nf]mark


Dzianis Kahanovich wrote:

--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -161,2 +161,5 @@
 			skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+			skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
+#endif
 default:



Behaviour like this shouldn't depend on compile-time options.


Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


Kok, Auke wrote:


You're using 2.6.10... you can always replace the e1000 module with the
out-of-tree version from e1000.sf.net, this might help a bit - the version in
the 2.6.10 kernel is very very old.


Do you have any reason to believe this would improve things?  It seems 
like the problem lies in the NAPI/softirq code rather than in the e1000 
driver itself, no?



it also appears that your app is eating up CPU time. perhaps setting the app to
a nicer nice level might mitigate things a bit.


If we're not handling the softirq work from ksoftirqd how would changing 
scheduler settings affect anything?


Also turn off the in-kernel irq mitigation, it just causes cache misses and
you really need the network irq to sit on a single cpu at most (if not all)
the time to get the best performance. Use the userspace irqbalance daemon
instead to achieve this.


Using userspace irqbalance would be some effort to test and deploy 
properly.  However, as a quick test I tried setting the irq affinity for 
this device and it didn't help.


One thing that might be of interest is that it seems to be bursty rather 
than gradual.  Here are some timestamps (in seconds) along with the 
number of overruns on eth0:


6552.15  overruns:260097
6552.69  overruns:260097
6553.32  overruns:260097
6553.83  overruns:260097
6554.35  overruns:260097
6554.87  overruns:260097
6555.41  overruns:260097
6555.94  overruns:260097
6556.51  overruns:260097
6557.07  overruns:260282
6557.58  overruns:260282
6558.23  overruns:260282
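The burstiness is easy to quantify from samples like these. A small sketch, with hypothetical struct and function names, that counts the intervals in which the cumulative counter moved and finds the largest single jump; it assumes the counter only ever increases (no driver reset or wrap).

```c
#include <stddef.h>

struct overrun_sample {
    double t;                /* timestamp, seconds */
    unsigned long overruns;  /* cumulative rx overrun counter */
};

struct overrun_summary {
    size_t bursty_intervals; /* intervals in which the counter grew */
    unsigned long max_jump;  /* largest growth within one interval */
    double max_jump_start;   /* timestamp at the start of that interval */
};

static struct overrun_summary
summarize_overruns(const struct overrun_sample *s, size_t n)
{
    struct overrun_summary sum = { 0, 0, 0.0 };

    for (size_t i = 1; i < n; i++) {
        /* Assumes a monotonic counter; a reset would show up as a huge jump. */
        unsigned long jump = s[i].overruns - s[i - 1].overruns;

        if (jump == 0)
            continue;
        sum.bursty_intervals++;
        if (jump > sum.max_jump) {
            sum.max_jump = jump;
            sum.max_jump_start = s[i - 1].t;
        }
    }
    return sum;
}
```

Fed the twelve samples above, it reports a single interval with growth: 185 overruns between 6556.51 and 6557.07, with none in the roughly four seconds before it.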


Chris

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread James Chapman

Chris Friesen wrote:
 Hi all,
 
 I've got an issue that's popped up with a deployed system running
 2.6.10.  I'm looking for some help figuring out why incoming network
 packets aren't being processed fast enough.
 
 After a recent userspace app change, we've started seeing packets being
 dropped by the ethernet hardware (e1000, NAPI is enabled).

What's changed in your application? Any real-time threads in there?

From the top output below, looks like SigtranServices is consuming all
your CPU...

 The
 error/dropped/fifo counts are going up in ethtool:
 
  rx_packets: 32180834
  rx_bytes: 5480756958
  rx_errors: 862506
  rx_dropped: 771345
  rx_length_errors: 0
  rx_over_errors: 0
  rx_crc_errors: 0
  rx_frame_errors: 0
  rx_fifo_errors: 91161
  rx_missed_errors: 91161
 
 This link is receiving roughly 13K packets/sec, and we're dropping
 roughly 51 packets/sec due to fifo errors.
 
 Increasing the rx descriptor ring size from 256 up to around 3000 or so
 seems to make the problem stop, but it seems to me that this is just a
 workaround for the latency in processing the incoming packets.
 
 So, I'm looking for some suggestions on how to fix this or to figure out
 where the latency is coming from.
 
 Some additional information:
 
 
 1) Interrupts are being processed on both cpus:
 
 [EMAIL PROTECTED]:/root cat /proc/interrupts
CPU0   CPU1
  30:17037564530785  U3-MPIC Level eth0
 
 
 
 
 2) top shows a fair amount of time processing softirqs, but very
 little time in ksoftirqd (or is that a sampling artifact?).
 
 
 Tasks: 79 total, 1 running, 78 sleeping, 0 stopped, 0 zombie
 Cpu0: 23.6% us, 30.9% sy, 0.0% ni, 36.9% id, 0.0% wa, 0.3% hi, 8.3% si
 Cpu1: 30.4% us, 24.1% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.7% hi, 38.9% si
 Mem:  4007812k total, 2199148k used,  1808664k free, 0k buffers
 Swap:   0k total,   0k used,  0k free,   219844k cached
 
    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   5375 root  15   0 2682m 1.8g 6640 S 99.9 46.7  31:17.68 SigtranServices
   7696 root  17   0  6952 3212 1192 S  7.3  0.1   0:15.75 schedmon.ppc210
   7859 root  16   0  2688 1228  964 R  0.7  0.0   0:00.04 top
   2956 root   8  -8 18940 7436 5776 S  0.3  0.2   0:01.35 blademtc
      1 root  16   0  1660  620  532 S  0.0  0.0   0:30.62 init
      2 root  RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/0
      3 root  15   0     0    0    0 S  0.0  0.0   0:00.55 ksoftirqd/0
      4 root  RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
      5 root  15   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1
 
 
 3) /proc/sys/net/core/netdev_max_backlog is set to the default of 300
 
 
 So...anyone have any ideas/suggestions?
 
 Thanks,
 
 Chris

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


Re: e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Rick Jones


Many many things to check when running netperf :)

*) Are the cards on the same or separate PCImumble busses, and what sort of bus is it?

*) is the two-interface performance measured with two interfaces on the same
four-port card, or with an interface from each of the two four-port cards?


*) is there a dreaded (IMO) irqbalance daemon running?  one of the very 
first things I do when running netperf is terminate the irqbalance 
daemon with as extreme a prejudice as I can.


*) what is the distribution of interrupts from the interfaces to the 
CPUs?  if you've tried to set that manually, the dreaded irqbalance 
daemon will come along shortly thereafter and ruin everything.


*) what does netperf say about the overall CPU utilization of the 
system(s) when the tests are running?


*) what does top say about the utilization of any single CPU in the 
system(s) when the tests are running?


*) are you using the global -T option to spread the netperf/netserver 
processes across the CPUs, or leaving that all up to the 
stack/scheduler/etc?


I suspect there could be more but that is what comes to mind thus far as 
far as things I often check when running netperf.
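On the -T point: binding a process to one CPU comes down to the sched_setaffinity() system call. A minimal sketch of that mechanism (not netperf's own code; pin_self_to_cpu() and first_allowed_cpu() are hypothetical helpers) that pins the caller to a single CPU:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU this process may currently run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/* Restrict the calling thread (pid 0) to exactly one CPU; for a
 * single-threaded process that is the whole process.
 * Returns 0 on success, -1 with errno set on failure. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Running one such process per CPU, each pinned to a different CPU, gives the kind of spread across CPUs that the -T option is meant to arrange.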


rick jones


Re: SMP code / network stack

Arnaldo Carvalho de Melo wrote:
 On Thu, Jan 10, 2008 at 03:26:59PM +, Jeba Anandhan wrote:
 Hi Eric,
 Thanks for the reply. I have one more question. For example, suppose we have 2
 processors and 4 ethernet cards, and only CPU0 does all the work for the 8 cards.
 If we set the affinity of each ethernet card to a CPU number, will it be
 efficient?

 Will this be default behavior?

 # cat /proc/interrupts 
CPU0   CPU1   
   0:   11472559   74291833IO-APIC-edge  timer
   2:  0  0  XT-PIC  cascade
   8:  0  1IO-APIC-edge  rtc
  81:  0  0   IO-APIC-level  ohci_hcd
  97: 1830022231847   IO-APIC-level  ehci_hcd, eth0
  97: 3830012232847   IO-APIC-level  ehci_hcd, eth1
  97: 5830052231847   IO-APIC-level  ehci_hcd, eth2
  97: 6830032213847   IO-APIC-level  ehci_hcd, eth3

another thing to try: if you don't need usb2 support, remove the ehci_hcd
module - this will give slightly less overhead servicing irqs in your system.

I take it that you have no MSI support in these ethernet cards?

Auke

Re: e1000 performance issue in 4 simultaneous links

Breno Leitao wrote:
 On Thu, 2008-01-10 at 16:36 +, Ben Hutchings wrote:
 When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
 of transfer rate. If I run 4 netperf against 4 different interfaces, I
 get around 720 * 10^6 bits/sec.
 snip

 I take it that's the average for individual interfaces, not the
 aggregate?
 Right, each of these results are for individual interfaces. Otherwise,
 we'd have a huge problem. :-)
 
 This can be mitigated by interrupt moderation and NAPI
 polling, jumbo frames (MTU > 1500) and/or Large Receive Offload (LRO).
 I don't think e1000 hardware does LRO, but the driver could presumably
 be changed to use Linux's software LRO.
 Without using these features and keeping the MTU as 1500, do you think
 we could get a better performance than this one?
 
 I also tried to increase my interface MTU to 9000, but I am afraid that
 netperf only transmits packets with less than 1500. Still investigating.
 
 single CPU this can become a bottleneck.  Does the test system have
 multiple CPUs?  Are IRQs for the multiple NICs balanced across
 multiple CPUs?
 Yes, this machine has 8 ppc 1.9Ghz CPUs. And the IRQs are balanced
 across the CPUs, as I see in /proc/interrupts: 


which is wrong and hurts performance. you want your ethernet irqs to stick to a
CPU for long periods to prevent cache thrashing.

please disable the in-kernel irq balancing code and use the userspace
`irqbalance` daemon.

Gee I should put that in my signature, I already wrote that twice today :)

Auke

 
 # cat /proc/interrupts
        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  16:940760   1047904993777 975813   XICS  Level IPI
  18:      4      3      4      1      3      6      8      3   XICS  Level hvc_console
  19:      0      0      0      0      0      0      0      0   XICS  Level RAS_EPOW
 273:  10728  10850  10937  10833  10884  10788  10868  10776   XICS  Level eth4
 275:      0      0      0      0      0      0      0      0   XICS  Level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 277: 234933 230275 229770 234048 235906 229858 229975 233859   XICS  Level eth6
 278: 266225 267606 262844 265985 268789 266869 263110 267422   XICS  Level eth7
 279:893919857909867917 894881   XICS  Level eth0
 305: 439246 439117 438495 436072 438053 440111 438973 438951   XICS  Level eth0 Neterion Xframe II 10GbE network adapter
 321:   3268   3088   3143   3113   3305   2982   3326   3084   XICS  Level ipr
 323: 268030 273207 269710 271338 270306 273258 270872 273281   XICS  Level eth16
 324: 215012 221102 219494 216732 216531 220460 219718 218654   XICS  Level eth17
 325:   7103   3580   7246   3475   7132   3394   7258   3435   XICS  Level pata_pdc2027x
 BAD:   4216
 
 Thanks,
 


[PATCH] New driver sfc for Solarstorm SFC4000 controller - 4th attempt

2008-01-10 Thread Robert Stonehouse

This is a resubmission of a new driver for Solarflare network controllers.

The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six
different 10G and 1G boards.

Hardware based on this network controller is now available from SMC as
part numbers SMC10GPCIe-XFP and SMC10GPCIe-10BT.

The previous thread was:
  http://marc.info/?l=linux-netdev&m=119825632209357&w=2


Thanks to the people who looked at the previous patches. We have addressed
the following from comments received after the 3rd submission:
 - Kerneldoc style comment
 - Kconfig changes
 - Reduced size slightly

I am also sending a request to [EMAIL PROTECTED] for review of
the MTD part of the driver.


Previous reviewers have noted that the driver is quite large (but it
would not be the largest network driver by source or compiled module
size). I think it is a reasonable size for a driver that supports a
fully featured NIC, across a range of MACs, PHYs and silicon
revisions.

One aspect that is worth mentioning is that the NIC has no firmware.
A benefit is no dreaded binary blob!  A downside is that more support
code is needed, but this tends to be around initialisation and is
readable, commented C.

To give a small breakdown of the sizes of the different driver parts
(wc output: lines, words, bytes):

 Part                          | Lines  Words  Bytes
 Core control/datapath         |  5001  16405 139467  = efx.c rx.c tx.c
 Controller HW support         |  3653  11823 107554  = falcon.c
 HW defs                       |  1588   4838  47050  = falcon_hwdefs.h
 Board support                 |  1848   7105  52455
 MAC support                   |  1623   4977  51007
 PHY support                   |  2196   7904  67711
 Headers                       |  4565  20645 162402
 Self test code                |   863   3088  24981
 Ethtool support               |   751   2144  22845
 MTD code (separate module)    |  1021   3200  26944
 Debugfs code (Kconfig option) |   863   2543  24896


Are there further review comments that we need to address before it can be
merged?


The patch (against net-2.6.25) is at:
 https://support.solarflare.com/netdev/4/net-2.6.25-sfc-2.2.0038.patch

The new files may also be downloaded as a tarball:
 https://support.solarflare.com/netdev/4/net-2.6.25-sfc-2.2.0038.tgz

And for verification there is:
 https://support.solarflare.com/netdev/4/MD5SUMS

Regards

-- 
Rob Stonehouse

Re: e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Rick Jones


I also tried to increase my interface MTU to 9000, but I am afraid that
netperf only transmits packets with less than 1500. Still investigating.


It may seem like picking a tiny nit, but netperf never transmits 
packets.  It only provides buffers of specified size to the stack. It is 
then the stack which transmits and determines the size of the packets on 
the network.


Drifting a bit more...

While there are settings, conditions and known stack behaviours where 
one can be confident of the packet size on the network based on the 
options passed to netperf, generally speaking one should not ass-u-me a 
direct relationship between the options one passes to netperf and the 
size of the packets on the network.


And for JumboFrames to be effective it must be set on both ends, 
otherwise the TCP MSS exchange will result in the smaller of the two 
MTUs winning, as it were.
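A worked version of the MSS point, as a sketch assuming IPv4 and TCP without options (20-byte headers each); effective_mss() is a hypothetical helper. Each side advertises an MSS derived from its own MTU, and a sender stays within the peer's advertised value as well as its own, so the smaller MTU decides the segment size.

```c
/* IPv4 and TCP header sizes without options, in bytes. */
#define IPV4_HDR_LEN 20
#define TCP_HDR_LEN  20

/* Largest TCP payload per segment when one end has local_mtu and the
 * other end has peer_mtu: the smaller MTU minus the two headers. */
static unsigned int effective_mss(unsigned int local_mtu, unsigned int peer_mtu)
{
    unsigned int mtu = local_mtu < peer_mtu ? local_mtu : peer_mtu;

    return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```

With 9000 on one end and 1500 on the other, segments are capped at 1460 bytes in both directions; only with 9000 on both ends does the MSS reach 8960.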



single CPU this can become a bottleneck.  Does the test system have
multiple CPUs?  Are IRQs for the multiple NICs balanced across
multiple CPUs?


Yes, this machine has 8 ppc 1.9Ghz CPUs. And the IRQs are balanced
across the CPUs, as I see in /proc/interrupts: 


That suggests to me anyway that the dreaded irqbalanced is running, 
shuffling the interrupts as you go.  Not often a happy place for running 
netperf when one wants consistent results.




# cat /proc/interrupts
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
 16:940760   1047904993777 975813   XICS  Level IPI
 18:      4      3      4      1      3      6      8      3   XICS  Level hvc_console
 19:      0      0      0      0      0      0      0      0   XICS  Level RAS_EPOW
273:  10728  10850  10937  10833  10884  10788  10868  10776   XICS  Level eth4
275:      0      0      0      0      0      0      0      0   XICS  Level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
277: 234933 230275 229770 234048 235906 229858 229975 233859   XICS  Level eth6
278: 266225 267606 262844 265985 268789 266869 263110 267422   XICS  Level eth7
279:893919857909867917 894881   XICS  Level eth0
305: 439246 439117 438495 436072 438053 440111 438973 438951   XICS  Level eth0 Neterion Xframe II 10GbE network adapter
321:   3268   3088   3143   3113   3305   2982   3326   3084   XICS  Level ipr
323: 268030 273207 269710 271338 270306 273258 270872 273281   XICS  Level eth16
324: 215012 221102 219494 216732 216531 220460 219718 218654   XICS  Level eth17
325:   7103   3580   7246   3475   7132   3394   7258   3435   XICS  Level pata_pdc2027x
BAD:   4216


IMO, what you want (in the absence of multi-queue NICs) is one CPU 
taking the interrupts of one port/interface, and each port/interface's 
interrupts going to a separate CPU.  So, something that looks roughly 
like concocted example:


      CPU0  CPU1  CPU2  CPU3
  1:  1234     0     0     0   eth0
  2:     0  1234     0     0   eth1
  3:     0     0  1234     0   eth2
  4:     0     0     0  1234   eth3

which you should be able to achieve via the method I think someone else 
has already mentioned about echoing values into 
/proc/irq/<irq>/smp_affinity  - after you have slain the dreaded 
irqbalance daemon.
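A sketch of the echo step in C, for anyone scripting it: the value written to /proc/irq/<irq>/smp_affinity is a hex CPU bitmask, so CPU n corresponds to 1 << n. cpu_mask_string() and set_irq_affinity() are hypothetical helpers; writing the file needs root and, as Rick says, irqbalance has to be stopped first or it will move the interrupt again. The sketch only handles CPUs 0-31, which keeps the mask within a single 32-bit group.

```c
#include <stdio.h>

/* Format the hex mask selecting a single CPU, e.g. cpu 3 -> "8".
 * Returns 0 on success, -1 if cpu is outside 0..31 or buf is too small. */
int cpu_mask_string(int cpu, char *buf, size_t len)
{
    if (cpu < 0 || cpu > 31)
        return -1;
    int n = snprintf(buf, len, "%x", 1U << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Steer interrupt `irq` to `cpu` by writing its smp_affinity file.
 * Requires root; returns 0 on success, -1 on any failure. */
int set_irq_affinity(unsigned int irq, int cpu)
{
    char path[64], mask[16];
    FILE *f;

    if (cpu_mask_string(cpu, mask, sizeof(mask)) < 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%s\n", mask) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, set_irq_affinity(277, 2) would steer eth6's interrupt (IRQ 277 in the listing above) to CPU2.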


rick jones

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Rick Jones


1) Interrupts are being processed on both cpus:

[EMAIL PROTECTED]:/root cat /proc/interrupts
   CPU0   CPU1
 30:17037564530785  U3-MPIC Level eth0


IIRC none of the e1000 driven cards are multi-queue, so while the above 
shows that interrupts from eth0 have been processed on both CPUs at 
various points in the past, it doesn't necessarily mean that they are 
being processed on both CPUs at the same time right?


rick jones

RE: ipip tunnel code (IPV4)

2008-01-10 Thread Templin, Fred L

Andy,

 -Original Message-
 From: Andy Johnson [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 10, 2008 6:35 AM
 To: netdev@vger.kernel.org
 Subject: ipip tunnel code (IPV4)

 Hello,

 I am trying to learn the IPV4 ipip tunnel code  (net/ipv4/ipip.c)
 and I have two little questions about
 semantics of variables:

 ipip_fb_tunnel_init - what does fb stand for ?

 In tunnels_wc   : what does wc stand for ?

Similar names occur in net/ipv6/sit.c, which is the
IPv6-in-IPv4 analog of ipip.c. I am 90% certain that
wc stands for wildcard - it is used for selecting
the default tunnel interface when no other tunnel
interfaces match a specific (src, dst) pair.

In that light, I assume fb stands for something like
fallback although I am not certain. It would seem to
fit though, because the fallback tunnel interface is
the one that is selected by a wildcard match.
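Fred's description of the wildcard match can be sketched as a lookup that prefers an exact (src, dst) match and otherwise falls back to the tunnel with both endpoints unset. The struct and tunnel_lookup() are hypothetical and much simpler than net/ipv4/ipip.c, which uses hashed lists and also tries matches on only one endpoint before the wildcard; the device names in the usage are only examples.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel table entry. */
struct tunnel {
    uint32_t saddr;   /* 0 means "any source" */
    uint32_t daddr;   /* 0 means "any destination" */
    const char *name;
};

/* Prefer an exact (src, dst) match; otherwise fall back to the
 * wildcard entry, i.e. the one with both endpoints unset.
 * Returns NULL if neither exists. */
static const struct tunnel *
tunnel_lookup(const struct tunnel *tbl, size_t n, uint32_t src, uint32_t dst)
{
    const struct tunnel *wildcard = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].saddr == src && tbl[i].daddr == dst)
            return &tbl[i];
        if (tbl[i].saddr == 0 && tbl[i].daddr == 0)
            wildcard = &tbl[i];
    }
    return wildcard;
}
```

With a table holding a wildcard entry named tunl0 and a configured tunnel t1 for 10.0.0.1 -> 10.0.0.2, traffic for that pair selects t1 and any other pair falls through to tunl0.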

Would be interested if anyone could confirm or correct
my assumptions.

Thanks - Fred
[EMAIL PROTECTED] 

 Regards,
 Andy

Re: questions on NAPI processing latency and dropped network packets

Chris Friesen wrote:
 Kok, Auke wrote:
 
 You're using 2.6.10... you can always replace the e1000 module with the
 out-of-tree version from e1000.sf.net, this might help a bit - the
 version in the 2.6.10 kernel is very very old.
 
 Do you have any reason to believe this would improve things?  It seems
 like the problem lies in the NAPI/softirq code rather than in the e1000
 driver itself, no?

your real issue is that your userspace app is hogging the CPU. While networking
is not really cpu intensive, it does require that it be given ample CPU time at
frequent intervals to run cleanups and prevent FIFO issues.

alternatively, you can increase your rx/tx ring descriptor count (with ethtool),
which basically lets the hardware go unserviced for a longer period: there are
more buffers available, so the card can keep going longer when userspace is
hogging the CPU.
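To put numbers on the ring-size tradeoff using Chris's figures (about 13K packets/sec, rings of 256 and roughly 3000 descriptors), a rough sketch of how long the NIC can go unserviced before the rx ring fills. ring_headroom_ms() is a hypothetical helper; it assumes one descriptor per packet and a host that stops refilling entirely, and ignores interrupt moderation. The ring size itself is changed with ethtool -G.

```c
/* Milliseconds until an rx ring of `descriptors` entries is full when
 * packets arrive at `pps` per second and nothing is being refilled. */
static double ring_headroom_ms(unsigned int descriptors, double pps)
{
    return 1000.0 * descriptors / pps;
}
```

With those figures it gives roughly 19.7 ms for 256 descriptors and roughly 231 ms for 3000, which fits Chris's reading that the larger ring rides out longer stalls rather than removing the latency behind them.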

 it also appears that your app is eating up CPU time. perhaps setting
 the app to a nicer nice level might mitigate things a bit.
 
 If we're not handling the softirq work from ksoftirqd how would changing
 scheduler settings affect anything?

correct, it might not.

 Also turn off the in-kernel irq
 mitigation, it just causes cache misses and you really need the
 network irq to sit on a single cpu at most (if not all) the time to
 get the best performance. Use the userspace irqbalance daemon instead
 to achieve this.
 
 Using userspace irqbalance would be some effort to test and deploy
 properly.  However, as a quick test I tried setting the irq affinity for
 this device and it didn't help.

irqbalance is a simple userspace app that drops into any system seamlessly and
does the best job all around - it often even beats manual tuning of smp_affinity ;)

Auke

Re: questions on NAPI processing latency and dropped network packets

Rick Jones wrote:
 1) Interrupts are being processed on both cpus:

 [EMAIL PROTECTED]:/root cat /proc/interrupts
CPU0   CPU1
  30:17037564530785  U3-MPIC Level eth0
 
 IIRC none of the e1000 driven cards are multi-queue

the pci-express variants are, but the functionality is almost always disabled (and
relatively new anyway).

even with multiqueue, you can still have only a single irq line (which mostly
defeats the purpose, of course).

, so while the above 
 shows that interrupts from eth0 have been processed on both CPUs at
 various points in the past, it doesn't necessarily mean that they are
 being processed on both CPUs at the same time right?

never will, an irq can only be processed on one cpu at a time anyway. obviously
the irq here has been migrated at least ONCE from one of the cpus to the other.
unfortunately you can't see from /proc/interrupts whether this happens frequently
or not, or how many times it happened before.

Auke

RE: [ipw3945-devel] [PATCH 2/5] iwlwifi: iwl3945 synchronize interrupt and tasklet for down iwlwifi

2008-01-10 Thread Chatre, Reinette

On , Joonwoo Park  wrote:

 --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
 +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
 @@ -6262,6 +6262,10 @@ static void __iwl_down(struct iwl_priv *priv)
   /* tell the device to stop sending interrupts */
   iwl_disable_interrupts(priv);
 
 + /* synchronize irq and tasklet */
 + synchronize_irq(priv->pci_dev->irq);
 + tasklet_kill(&priv->irq_tasklet);
 +

Could synchronize_irq() be moved into iwl_disable_interrupts()? I am
also wondering whether we could call tasklet_kill() before
iwl_disable_interrupts() ... thus preventing it from being scheduled
when we are going down. 

Reinette

Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module

2008-01-10 Thread supersud501




Stephen Hemminger wrote:

On Wed, 9 Jan 2008 16:03:00 -0800
Andrew Morton [EMAIL PROTECTED] wrote:


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed,  9 Jan 2008 13:05:34 -0800 (PST)
[EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9721

   Summary: wake on lan fails with sky2 module
   Product: ACPI
   Version: 2.5
 KernelVersion: 2.6.24-rc7
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Power-Sleep-Wake
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

This post-2.6.23 regression was assigned to ACPI but is quite possibly a
net driver problem?


Latest working kernel version: 2.6.23.12
Earliest failing kernel version: 2.6.24-rc6 (not tested earlier kernel,
2.6.24-rc7 still failing)
Distribution: Ubuntu 8.04 (but kernel built from kernel.org and system modified
to make wake on lan work, i.e. network cards are not shut down on poweroff)
Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module SKY2
Software Environment:
Problem Description:

When enabling wake on lan with 'ethtool -s eth0 wol', I get the following
status:

21:56:29 ~ # sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Advertised auto-negotiation: Yes

Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pg
Wake-on: g wol enabled
Current message level: 0x00ff (255)
Link detected: yes

but after shutting down, the PC doesn't wake up when a magic packet is sent.

the status lights of the network card are still on (so the card seems to be
online).

On the same system, with only the kernel changed to 2.6.23.12 and the same
procedure as above, wake on lan works.

Steps to reproduce: enable WoL on a network card that uses the sky2 module and
check whether it fails to wake as well.

If you need more information, just tell me; it's my first bug report.
regards




Wake from power off works on 2.6.24-rc7 for me.
Wake from suspend doesn't because Network Manager, HAL, or some other
user space tool gets confused.

I just rechecked it with a Fujitsu Lifebook, which has sky2 (88E8055).
There are many variations of this chip, and it may be a chip-specific problem
or an ACPI/BIOS issue.  If you don't enable Wake on Lan in the BIOS, the
driver can't do it for you. Also, check how you are shutting down.

Also since the device has to restart the PHY, it could be a switch
issue if you have some fancy pants switch doing intrusion detection
or something, but I doubt that.

Is it a clean or fast shutdown? Most distributions mark network
devices as down on shutdown, but if the distribution does something 
stupid like remove the driver module, then the driver is unable to set up Wake On Lan.

The wake on lan setup is done in one place in the driver, add
a printk to see if it is ever called.




I only tried wake from shutdown (poweroff), and as I wrote, it works on the 
same system with kernel 2.6.23.12 (nothing changed but vmlinuz and 
initrd, using the same kernel config on 2.6.24-rc6/7 via make oldconfig 
with the default answer to all questions). So it seems to me like a 
problem in the kernel.


Every wake-up setting (wake up by PCI device, RTC alarm, modem ...) in the 
BIOS is also enabled; otherwise it couldn't work in 2.6.23.12 (or Windows).


If you say your sky2 card works, it might be an ACPI problem not related 
to sky2, contrary to what I thought. When I am at home I'll try to start my PC 
with a timer (via /proc/acpi/alarm) on kernel 2.6.24-rc7 to check whether 
ACPI wakeup works, and report back (if it is any help in finding the 
source of my problem).


And regarding printk, I'll try to find out what you mean (my first 
steps into kernel debugging :) - I think you mean adding a line in the 
source to print out something when the function is called).


regards

Re: [PROCFS] [NETNS] issue with /proc/net entries

2008-01-10 Thread Eric W. Biederman

Benjamin Thery [EMAIL PROTECTED] writes:

 Hi Eric,

 While testing the current network namespace stuff merged in net-2.6.25,
 I bumped into the following problem with the /proc/net/ entries.
 It doesn't always display the actual data of the current namespace,
 but sometimes displays data from other namespaces.

 I bisected the problem to the commit:
 proc: remove/Fix proc generic d_revalidate
 3790ee4bd86396558eedd86faac1052cb782e4e1

 The problem: If a process in a particular network namespace changes
 current directory to /proc/net, then processes in other network
 namespaces trying to look at /proc/net entries will see data from the
 first namespace (the one with CWD /proc/net). (See test case below).

 As you comments in the commit suggest, you seem to be aware of some
 issues when CONFIG_NET_NS=y. Is it one of these corner cases you
 identified? Any idea on how we can fix it?

Yes.  It isn't especially hard.  I have most of it in my queue;
I just need to get the silly patches out of there.

Essentially we need to fix the caching of proc_generic entries,
so that we can have a proper d_revalidate implementation.

To get d_revalidate and the caching correct for /proc/net will take
just a bit more work.  We need to make /proc/net a symlink
to something like /proc/self/net so that we don't get excess
revalidates when switching between different processes.

Or else we can't properly implement the case you have described,
where being in the directory causes the wrong version of /proc/net
to show up. Changing the contents of the dentry for /proc/net
should only happen during unshare, not when we switch between
processes, or else we get into the d_revalidate-leaks-mount-points
problem again.

We also need to check whether something is mounted on top of
us before we drop the dentry.  But if we don't even try until
we know the dentry is invalid, it should not be too bad.
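The symlink approach relies on /proc/self, which the kernel resolves per process: readlink() on it returns the PID of whoever asks. A small userspace sketch of that behaviour (proc_self_matches_pid() is a hypothetical helper; it assumes /proc is mounted for the caller's own PID namespace). A /proc/net pointing at self/net would be resolved the same way, per process, without the /proc/net dentry itself having to change when switching between processes.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if /proc/self resolves to our own PID, 0 if it resolves to
 * something else, -1 if the link cannot be read. */
static int proc_self_matches_pid(void)
{
    char link[32], pid[32];
    ssize_t n = readlink("/proc/self", link, sizeof(link) - 1);

    if (n < 0)
        return -1;
    link[n] = '\0';  /* readlink() does not NUL-terminate */
    snprintf(pid, sizeof(pid), "%d", (int)getpid());
    return strcmp(link, pid) == 0;
}
```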

Eric

Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #2)

Jeff Garzik wrote:
 Looks pretty decent.  Main comments (style mostly, driver operation path
 seems sound):

thanks again for the comments. I am about to send an updated patch just before my
vacation, and before I do, let me quickly touch on your comments below:

 * kill the bitfields and unions [in descriptor structs].  they are not
 endian-safe as presented, generate poor code, and are otherwise
 undesirable.

that bitfield was unused and so I removed the code. I don't see any more bitfields
at all now in this driver.
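Jeff's point about bitfields can be illustrated with a descriptor word built from explicit masks and shifts and then stored in a fixed little-endian byte order, so the in-memory layout depends neither on how the compiler allocates bitfields nor on host endianness. The field layout and names below are hypothetical, not the 82575's; in the kernel the store would typically be a cpu_to_le32() into an __le32 field rather than the hand-rolled put_le32().

```c
#include <stdint.h>

/* Hypothetical descriptor word layout:
 *   bits  0-15  buffer length
 *   bits 16-19  command flags
 *   bits 20-31  reserved (zero) */
#define DESC_LEN_MASK  0x0000FFFFu
#define DESC_CMD_SHIFT 16
#define DESC_CMD_MASK  0x000F0000u

/* Pack the fields with shifts and masks instead of bitfields. */
static uint32_t desc_pack(uint16_t len, uint8_t cmd)
{
    return ((uint32_t)len & DESC_LEN_MASK) |
           (((uint32_t)cmd << DESC_CMD_SHIFT) & DESC_CMD_MASK);
}

/* Store a 32-bit value little-endian regardless of host byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```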

 * the basic operations are too verbose:  E1000_READ_REG(hw, REGISTER) is
 far more readable as ER32(REGISTER), following the style of other
 drivers.  Furthermore, the E1000_ prefix, in addition to being overly
 redundant (used in each register read/write), is also incorrect,
 because this is not E1000...

partially I agree, and I refined the register writes to remove the need for the
hw part.

However the hardware *is* e1000; we ended up making a new driver since it just
does not make sense to add all of this infrastructure for older chipsets anymore.

renaming everything (from e1000_ to igb_) would just make life really hard for
us when looking up historical diffs, history, etc., and most importantly when
comparing with e1000/e1000e when we encounter an issue that might affect the
other drivers. For now it is easier to just leave these alone.

However, I do not rule out that we change this at a later stage ...
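The ER32(REGISTER) style Jeff describes, and the change dropping the hw argument, amount to a macro that picks up a local hw variable implicitly. A userspace sketch against a simulated register file; the register names, offsets and the rd32()/wr32() macro names are hypothetical, and a real driver would use readl()/writel() on the mapped BAR instead of an array.

```c
#include <stdint.h>

/* Simulated register file standing in for the device's MMIO window. */
struct hw {
    uint32_t regs[64];
};

/* Hypothetical register offsets (in 32-bit words). */
#define REG_CTRL   0
#define REG_STATUS 2

/* Long form: the hardware handle is passed explicitly every time. */
#define READ_REG(hw, reg)       ((hw)->regs[(reg)])
#define WRITE_REG(hw, reg, val) ((hw)->regs[(reg)] = (val))

/* Short form: relies on a variable named `hw` being in scope, so call
 * sites only name the register. */
#define rd32(reg)      READ_REG(hw, REG_##reg)
#define wr32(reg, val) WRITE_REG(hw, REG_##reg, (val))

/* Flip `bit` in CTRL and return the new CTRL value. */
static uint32_t toggle_ctrl_bit(struct hw *hw, uint32_t bit)
{
    wr32(CTRL, rd32(CTRL) ^ bit);
    return rd32(CTRL);
}
```

The tradeoff is that call sites get shorter, but every function using rd32()/wr32() must have a variable literally named hw in scope.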

 * in general, rename everything with e1000_ prefix.  this will
 eliminate plenty of human confusion in the long run.

I'm doing this for all functions, which solves the namespace collisions. The
e1000 specific static structs (which are the same in igb as they are in e1000,
e1000e) as well as the registers (ditto) I'll keep unchanged for now.

 * API:   unless you have chips in the lab that will require an API hook,
 don't create one.  For example, a direct call to
 e1000_acquire_nvm_82575() should replace all -acquire_nvm() hooks
 if there are no chips in pipeline GUARANTEED to have a different
 -acquire_nvm() feature.

Noted

Note also that there are already far fewer hooks than there are in e1000e. We did
already make an effort to scrub as many as we can.


Cheers,

Auke


debian iproute2 patches branch rebased.

2008-01-10 Thread Andreas Henriksson

Hello Stephen!

I've rebased the patches branch we carry in debian on top of the new
080108 release of iproute2.

See patches branch of git://git.debian.org/git/collab-maint/pkg-iproute

I've dropped one of the patches you picked up[1], so there's now one of the
old ones left and a new manpage for routel/routef.
(Any reason you didn't pull the actual commit we served you with git?)

The old remaining patch fixes the infinite loop in ip route flush exactly the
same way you fixed the same problem in ip neigh flush[2].
An additional patch (not available in Debian), created at the request of
Patrick McHardy, will be provided in a follow-up mail. It makes the maximum
number of rounds configurable (and 0 means retry forever, so you can restore
the old behaviour).
Patrick and I disagree on what the default should be[3]. He thinks the 'ip
route flush' aka 'loop forever' behaviour should stay, while I vote for the
'ip neigh flush' behaviour of bailing out after N attempts.
IMNSHO looping infinitely is an *insane* default, especially since this is a
tool used in bootup scripts.

[1]: See commit ea5dd59c03b36fe2acec8f03a8d7a2f7b7036b04
[2]: See commit 660818498d0f5a3f52c05355a3e82c23f670fcc1
 The commit message seems to be wrong about "Limit ip route flush...",
 since it's actually ip neigh flush that's being modified.
[3]: Read thread from here on:
 http://www.spinics.net/lists/netdev/msg44920.html
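The policy under discussion, bounded retries versus looping forever with 0 selecting the old behaviour, can be sketched as follows. flush_bounded() and fake_round() are hypothetical, not iproute2's code; the callback stands in for one dump-and-delete round and returns how many entries remain.

```c
/* One flush round: returns entries still present (0 = done), -1 on error. */
typedef int (*flush_round_fn)(void *ctx);

/* Repeat flush rounds until nothing is left. max_rounds > 0 bails out
 * after that many rounds; 0 means keep trying forever.
 * Returns 0 when everything was flushed, 1 if we gave up, -1 on error. */
static int flush_bounded(flush_round_fn round_fn, void *ctx, int max_rounds)
{
    for (int round = 0; max_rounds == 0 || round < max_rounds; round++) {
        int left = round_fn(ctx);

        if (left < 0)
            return -1;
        if (left == 0)
            return 0;
    }
    return 1;
}

/* Hypothetical stand-in for a table that needs *ctx rounds to empty. */
static int fake_round(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining;
}
```

With max_rounds set to 10 (the MAX_ROUNDS value in the ip addr flush patch below), a table that never empties makes the call return 1 after ten rounds instead of hanging a boot script; passing 0 restores the loop-forever behaviour.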


commit 1eef590948f81b5c84e8450d5c95dd73744b4278
Author: Andreas Henriksson [EMAIL PROTECTED]
Date:   Thu Jan 3 16:48:56 2008 +0100

Add routel and routef man page.

diff --git a/Makefile b/Makefile
index de04176..723eb5d 100644
--- a/Makefile
+++ b/Makefile
@@ -56,6 +56,7 @@ install: all
ln -sf lnstat.8  $(DESTDIR)$(MANDIR)/man8/rtstat.8
ln -sf lnstat.8  $(DESTDIR)$(MANDIR)/man8/ctstat.8
ln -sf rtacct.8  $(DESTDIR)$(MANDIR)/man8/nstat.8
+   ln -sf routel.8  $(DESTDIR)$(MANDIR)/man8/routef.8
install -m 0755 -d $(DESTDIR)$(MANDIR)/man3
install -m 0644 $(shell find man/man3 -maxdepth 1 -type f) $(DESTDIR)$(MANDIR)/man3
 
diff --git a/man/man8/routel.8 b/man/man8/routel.8
new file mode 100644
index 000..cdf8f55
--- /dev/null
+++ b/man/man8/routel.8
@@ -0,0 +1,32 @@
+.TH ROUTEL 8 "3 Jan, 2008" iproute2 Linux
+.SH NAME
+.LP 
+routel \- list routes with pretty output format
+.br
+routef \- flush routes
+.SH SYNTAX
+.LP 
+routel [\fItablenr\fP [\fIraw ip args...\fP]]
+.br 
+routef
+.SH DESCRIPTION
+.LP 
+These programs are a set of helper scripts you can use instead of raw iproute2 commands.
+.br
+The routel script will list routes in a format that some might consider easier to interpret than the ip route list equivalent.
+.br
+The routef script does not take any arguments and will simply flush the routing table down the drain. Beware! This means deleting all routes, which will make your network unusable!
+
+.SH FILES
+.LP 
+\fI/usr/bin/routef\fP 
+.br 
+\fI/usr/bin/routel\fP 
+.SH AUTHORS
+.LP 
+The routel script was written by Stephen R. van den Berg [EMAIL PROTECTED], 1999/04/18 and donated to the public domain.
+.br
+This manual page was written by Andreas Henriksson [EMAIL PROTECTED], for the Debian GNU/Linux system.
+.SH SEE ALSO
+.LP 
+ip(8)

commit 1d1dab5826d1a9091e0bb2cf832f0785dc2add63
Author: Daniel Silverstone [EMAIL PROTECTED]
Date:   Fri Oct 19 13:32:24 2007 +0200

Avoid infinite loop in ip addr flush.

Fix ip addr flush the same way ip neigh flush was previously fixed,
by bailing out if the flush hasn't completed after MAX_ROUNDS (10) tries.

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index d1c6620..34379d0 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -34,6 +34,8 @@
 #include "ll_map.h"
 #include "ip_common.h"
 
+#define MAX_ROUNDS 10
+
 static struct
 {
int ifindex;
@@ -667,7 +669,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
filter.flushp = 0;
filter.flushe = sizeof(flushb);
 
-   for (;;) {
+   while (round < MAX_ROUNDS) {
if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
perror("Cannot send dump request");
exit(1);
@@ -694,6 +696,8 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
fflush(stdout);
}
}
+   fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", MAX_ROUNDS); fflush(stderr);
+   return 1;
}
 
if (filter.family != AF_PACKET) {



-- 
Regards,
Andreas Henriksson
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Please pull 'fixes-jgarzik' branch of wireless-2.6

Jeff,

A couple more fixes for 2.6.24.  The one from Mattias Nissler is already
in your upstream tree...FYI.

Let me know if there are problems!

Thanks,

John

---

Individual patches available here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/fixes-jgarzik/

---

The following changes since commit 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235:
  Linus Torvalds (1):
Linux 2.6.24-rc7

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Ivo van Doorn (1):
  rt2x00: Corectly initialize rt2500usb MAC

Mattias Nissler (1):
  rt2x00: Allow rt61 to catch up after a missing tx report

 drivers/net/wireless/rt2x00/rt2500usb.c |2 +-
 drivers/net/wireless/rt2x00/rt61pci.c   |   13 +
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2500usb.c b/drivers/net/wireless/rt2x00/rt2500usb.c
index 50775f9..18b1f91 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.c
+++ b/drivers/net/wireless/rt2x00/rt2500usb.c
@@ -257,7 +257,7 @@ static const struct rt2x00debug rt2500usb_rt2x00debug = {
 static void rt2500usb_config_mac_addr(struct rt2x00_dev *rt2x00dev,
  __le32 *mac)
 {
-   rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, &mac,
+   rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, mac,
  (3 * sizeof(__le16)));
 }
 
diff --git a/drivers/net/wireless/rt2x00/rt61pci.c b/drivers/net/wireless/rt2x00/rt61pci.c
index 01dbef1..0d9436d 100644
--- a/drivers/net/wireless/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/rt2x00/rt61pci.c
@@ -1738,6 +1738,7 @@ static void rt61pci_txdone(struct rt2x00_dev *rt2x00dev)
 {
struct data_ring *ring;
struct data_entry *entry;
+   struct data_entry *entry_done;
struct data_desc *txd;
u32 word;
u32 reg;
@@ -1791,6 +1792,18 @@ static void rt61pci_txdone(struct rt2x00_dev *rt2x00dev)
!rt2x00_get_field32(word, TXD_W0_VALID))
return;
 
+   entry_done = rt2x00_get_data_entry_done(ring);
+   while (entry != entry_done) {
+   /* Catch up. Just report any entries we missed as
+* failed. */
+   WARNING(rt2x00dev,
+   "TX status report missed for entry %p\n",
+   entry_done);
+   rt2x00pci_txdone(rt2x00dev, entry_done, TX_FAIL_OTHER,
+0);
+   entry_done = rt2x00_get_data_entry_done(ring);
+   }
+
/*
 * Obtain the status about this packet.
 */
-- 
John W. Linville
[EMAIL PROTECTED]

Please pull 'upstream-jgarzik-2' branch of wireless-2.6

Jeff,

This is additive on top of the pull request posted on Tuesday evening:

http://marc.info/?l=linux-wireless&m=119985065704687&w=2

If you pull this one, you will get that one as well.

Please let me know if there are any problems!

Thanks,

John

---

Individual patches are available here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-jgarzik-2/

---

The following changes since commit deb27641a93290475f6c66b99d2fceabbc28d6fb:
  Michael Buesch (1):
zd1211rw: fix alignment for QOS and WDS frames

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik-2

John W. Linville (2):
  b43: finish removal of pio support
  iwlwifi: fix-up damage from rebase of namespace separation patches

Michael Buesch (3):
  b43: Add N-PHY register definitions
  b43: Fix PHY register routing
  b43: Remove the PHY spinlock

Pavel Roskin (1):
  hostap_cs: don't match revisions in presense of the MAC chip name

 drivers/net/wireless/b43/Makefile   |1 +
 drivers/net/wireless/b43/b43.h  |   17 +-
 drivers/net/wireless/b43/debugfs.c  |6 +-
 drivers/net/wireless/b43/lo.c   |   64 ++--
 drivers/net/wireless/b43/main.c |4 -
 drivers/net/wireless/b43/nphy.c |   34 ++
 drivers/net/wireless/b43/nphy.h |  706 +++
 drivers/net/wireless/b43/phy.c  |  235 +
 drivers/net/wireless/b43/phy.h  |   57 +--
 drivers/net/wireless/b43/pio.c  |  652 -
 drivers/net/wireless/b43/pio.h  |  153 --
 drivers/net/wireless/hostap/hostap_cs.c |   15 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c |9 +-
 drivers/net/wireless/iwlwifi/iwl4965-base.c |9 +-
 14 files changed, 949 insertions(+), 1013 deletions(-)
 create mode 100644 drivers/net/wireless/b43/nphy.c
 create mode 100644 drivers/net/wireless/b43/nphy.h
 delete mode 100644 drivers/net/wireless/b43/pio.c
 delete mode 100644 drivers/net/wireless/b43/pio.h

Omnibus patch attached as upstream-jgarzik-2.patch.bz2
-- 
John W. Linville
[EMAIL PROTECTED]



iproute2: make max rounds in ip {neigh,addr} flush configurable.

2008-01-10 Thread Andreas Henriksson


On Thu, 2008-01-10 at 20:54 +0100, Andreas Henriksson wrote:
 An additional patch will be provided in a followup mail (not available in
 Debian) that was created by request from Patrick McHardy. This one makes max
 rounds configurable (and 0 means try to infinity, so you can restore old
 behaviour).

In my opinion 10 tries should be enough for anyone, but here's the patch anyway.

This one is on top of the patches in the previous mail.


diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index ff9e318..232fd64 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -67,6 +67,7 @@ static void usage(void)
fprintf(stderr, "       ip addr del IFADDR dev STRING\n");
fprintf(stderr, "       ip addr {show|flush} [ dev STRING ] [ scope SCOPE-ID ]\n");
fprintf(stderr, "                            [ to PREFIX ] [ FLAG-LIST ] [ label PATTERN ]\n");
+   fprintf(stderr, "                            [ maxrounds N ]\n");
fprintf(stderr, "IFADDR := PREFIX | ADDR peer PREFIX\n");
fprintf(stderr, "          [ broadcast ADDR ] [ anycast ADDR ]\n");
fprintf(stderr, "          [ label STRING ] [ scope SCOPE-ID ]\n");
@@ -566,6 +567,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
struct nlmsg_list *l, *n;
char *filter_dev = NULL;
int no_link = 0;
+   unsigned maxrounds = MAX_ROUNDS;
 
ipaddr_reset_filter(oneline);
filter.showqueue = 1;
@@ -630,6 +632,10 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
} else if (strcmp(*argv, "label") == 0) {
NEXT_ARG();
filter.label = *argv;
+   } else if (strcmp(*argv, "maxrounds") == 0) {
+   NEXT_ARG();
+   if (get_unsigned(&maxrounds, *argv, 0))
+   invarg("maxrounds must be 0 (infinite) or higher", "maxrounds");
} else {
if (strcmp(*argv, dev) == 0) {
NEXT_ARG();
@@ -669,7 +675,7 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
filter.flushp = 0;
filter.flushe = sizeof(flushb);
 
-   while (round < MAX_ROUNDS) {
+   while (maxrounds == 0 || round < maxrounds) {
if (rtnl_wilddump_request(&rth, filter.family, RTM_GETADDR) < 0) {
perror("Cannot send dump request");
exit(1);
@@ -696,7 +702,10 @@ int ipaddr_list_or_flush(int argc, char **argv, int flush)
fflush(stdout);
}
}
-   fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", MAX_ROUNDS); fflush(stderr);
+   fprintf(stderr,
+   "*** Flush remains incomplete after %u rounds. ***\n",
+   maxrounds);
+   fflush(stderr);
return 1;
}
 
diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index db684f5..61fac66 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -53,6 +53,7 @@ static void usage(void)
  "          [ nud { permanent | noarp | stale | reachable } ]\n"
  "          | proxy ADDR } [ dev DEV ]\n");
fprintf(stderr, "       ip neigh {show|flush} [ to PREFIX ] [ dev DEV ] [ nud STATE ]\n");
+   fprintf(stderr, "                             [ maxrounds N ]\n");
exit(-1);
 }
 
@@ -321,6 +322,7 @@ int do_show_or_flush(int argc, char **argv, int flush)
 {
char *filter_dev = NULL;
int state_given = 0;
+   unsigned maxrounds = MAX_ROUNDS;
 
ipneigh_reset_filter();
 
@@ -361,6 +363,10 @@ int do_show_or_flush(int argc, char **argv, int flush)
if (state == 0)
state = 0x100;
filter.state |= state;
+   } else if (strcmp(*argv, "maxrounds") == 0) {
+   NEXT_ARG();
+   if (get_unsigned(&maxrounds, *argv, 0))
+   invarg("maxrounds must be 0 (infinite) or higher", "maxrounds");
} else {
if (strcmp(*argv, "to") == 0) {
NEXT_ARG();
@@ -392,7 +398,7 @@ int do_show_or_flush(int argc, char **argv, int flush)
filter.flushe = sizeof(flushb);
filter.state &= ~NUD_FAILED;
 
-   while (round < MAX_ROUNDS) {
+   while (maxrounds == 0 || round < maxrounds) {
if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
perror("Cannot send dump request");
exit(1);
@@ -418,8 +424,10 @@ int do_show_or_flush(int argc, char **argv, int flush)
fflush(stdout);
}
}
-   printf(*** Flush not complete bailing out after %d

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Herbert Xu

On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:

 That wasn't the only purpose, Herbert.  Making sure that calls to
 dev_set_mac_address were called from process context was important at
 the time of the coding as well since at least the tg3 driver took locks
 that could not be taken reliably in soft-irq context.  Michael Chan
 fixed this here:

Sure, but where do you call that function while holding the bond lock?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Jay Vosburgh

Herbert Xu [EMAIL PROTECTED] wrote:

On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:

 That wasn't the only purpose, Herbert.  Making sure that calls to
 dev_set_mac_address were called from process context was important at
 the time of the coding as well since at least the tg3 driver took locks
 that could not be taken reliably in soft-irq context.  Michael Chan
 fixed this here:

Sure, but where do you call that function while holding the bond lock?

If I recall correctly, the problem was that tg3, et al, did
things that might sleep, and bonding was calling from a timer context,
which couldn't sleep.  It wasn't about the lock.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-10 Thread Jay Vosburgh

[EMAIL PROTECTED] wrote:

Yes it's what i'm looking for. I don't understand how to change the 
arp_ip_target with the gateway, arp_ip_target is a module option.

If you're running a relatively recent bonding driver (version
3.0.0 or later), the arp_ip_targets can be changed on the fly via sysfs,
e.g.,

echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target
echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target

You can check out Documentation/networking/bonding.txt (in the
kernel source code) for more details.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


----- Original Message -----
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 0:26:38
Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card

[EMAIL PROTECTED] wrote:

I mean that instead of arp test an ip in lan or else, i want it to
 test 127.0.0.1 but in order to do this it must go out and re-enter and
 then use wlan0 to go out.

In other words, what I think you're saying (and I'm not entirely
sure here) is that you want probes to go to a remote node on the
network, and back, without having to actually know the identity of the
remote node (because, presumably, on a roaming type of wireless
configuration, your gateway and whatnot can change from time to time).

Is that what you're looking for?

That isn't available now, but it might be straightforward to plug
into the address update system to keep the arp_ip_target set to
the current gateway as the gateway changes.  I haven't looked into the
details of doing that, but in theory it sounds straightforward.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Jay Vosburgh

Andy Gospodarek [EMAIL PROTECTED] wrote:
[...]
That wasn't the only purpose, Herbert.  Making sure that calls to
dev_set_mac_address were called from process context was important at
the time of the coding as well since at least the tg3 driver took locks
that could not be taken reliably in soft-irq context.  Michael Chan
fixed this here:

commit 986e0aeb9ae09127b401c3baa66f15b7a31f354c
Author: Michael Chan [EMAIL PROTECTED]
Date:   Sat May 5 12:10:20 2007 -0700

[TG3]: Remove reset during MAC address changes.

so if wasn't as much of an issue after that, but moving as much of the
code to process context was important for that as well (hence the move
to not continue to try to not use bh-locks everywhere).

Well, not for tg3 perhaps, but other network device drivers do
the same thing (if memory serves, any USB ethernet adapter will have
issues there).  Also, I believe the netlink notifier callback,
rtnetlink_event, which every dev_set_whatever calls, does a
possibly-sleeping memory allocation (rtmsg_ifinfo -> nlmsg_new ->
alloc_skb(GFP_KERNEL)); so we don't really want to hold extra locks for
that, either.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Herbert Xu

On Thu, Jan 10, 2008 at 04:03:53PM -0500, Andy Gospodarek wrote:

  Sure, but where do you call that function while holding the bond lock?
  
  If I recall correctly, the problem was that tg3, et al, did
  things that might sleep, and bonding was calling from a timer context,
  which couldn't sleep.  It wasn't about the lock.
 
 Exactly, I was just about to post the same.

In other words, changing read_lock on bond->lock to read_lock_bh doesn't
affect this one bit.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH take2] Re: Nested VLAN causes recursive locking error

2008-01-10 Thread Jarek Poplawski

On Thu, Jan 10, 2008 at 04:31:22PM +0100, Patrick McHardy wrote:
...
 No, this seems fine, thanks. Even better would be a way to get
 the last lockdep subclass through lockdep somehow, but I couldn't
 find a clean way for this. So I've applied your patch and also
 fixed macvlan.

As a matter of fact, this simplified version was done mainly to remove
the bad-looking effect of a never-decreasing global. Of course, your
proposal of using the parent's subclass + 1 would be better if deeper
nesting is required, so I could try to enhance this (probably with
such an additional lockdep macro) after some hint.

But some 'quirks' are still possible there: removing and adding
devices 'properly' would often require resetting many subclasses,
which means quite a lot of activity with many devices. And this is
probably not very common, since nobody has requested it until now...

Thanks,
Jarek P.

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-10 Thread Andy Gospodarek

On Thu, Jan 10, 2008 at 12:50:46PM -0800, Jay Vosburgh wrote:
 Herbert Xu [EMAIL PROTECTED] wrote:
 
 On Thu, Jan 10, 2008 at 09:51:44AM -0500, Andy Gospodarek wrote:
 
  That wasn't the only purpose, Herbert.  Making sure that calls to
  dev_set_mac_address were called from process context was important at
  the time of the coding as well since at least the tg3 driver took locks
  that could not be taken reliably in soft-irq context.  Michael Chan
  fixed this here:
 
 Sure, but where do you call that function while holding the bond lock?
 
   If I recall correctly, the problem was that tg3, et al, did
 things that might sleep, and bonding was calling from a timer context,
 which couldn't sleep.  It wasn't about the lock.
 

Exactly, I was just about to post the same.

RE: e1000 performance issue in 4 simultaneous links

2008-01-10 Thread Brandeburg, Jesse

Breno Leitao wrote:
 When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
 of transfer rate. If I run 4 netperf against 4 different interfaces, I
 get around 720 * 10^6 bits/sec.

This is actually a known issue that we have worked on with your company
before.  It comes down to your system's default behavior of round-robin
distribution of interrupts (see cat /proc/interrupts while running the test)
combined with e1000's way of exiting / rescheduling NAPI.

The default round robin behavior of the interrupts on your system is the
root cause of this issue, and here is what happens:

4 interfaces start generating interrupts; if you're lucky, the round
robin balancer has them all on different CPUs.
As the e1000 driver goes into and out of polling mode, the round robin
balancer keeps moving the interrupt to the next CPU.
Eventually 2 or more driver instances end up on the same CPU, which
causes both driver instances to stay in NAPI polling mode, due to the
amount of work being done and the fact that there are always more than
netdev->weight packets to do for each instance.  This keeps *hardware*
interrupts for each interface *disabled*.
Staying in NAPI polling mode causes higher CPU utilization on that one
processor, which guarantees that when the hardware round robin balancer
moves any other network interrupt onto that CPU, it too will join the
NAPI polling mode chain.
So no matter how many processors you have, with this round robin style
of hardware interrupts, it guarantees you that if there is a lot of work
to do (more than weight) at each softirq, then all network interfaces
will end up on the same CPU eventually (the busiest one).
Your performance becomes the same as if you had booted with maxcpus=1.

I hope this explanation makes sense, but what it comes down to is that
combining hardware round robin balancing with NAPI is a BAD IDEA.  In
general the behavior of hardware round robin balancing is bad and I'm
sure it is causing all sorts of other performance issues that you may
not even be aware of.

I'm sure your problem will go away if you run e1000 in interrupt mode
(use make CFLAGS_EXTRA=-DE1000_NO_NAPI).
 
 If I run the same test against 2 interfaces I get a 940 * 10^6
 bits/sec transfer rate also, and if I run it against 3 interfaces I
 get around 850 * 10^6 bits/sec performance.
 
 I got this results using the upstream netdev-2.6 branch kernel plus
 David Miller's 7 NAPI patches set[1]. In the kernel 2.6.23.12 the
 result is a bit worse, and the the transfer rate was around 600 * 10^6
 bits/sec.

Thank you for testing the latest kernel.org kernel.

Hope this helps.

Re: Please pull 'fixes-jgarzik' branch of wireless-2.6

On Thu, Jan 10, 2008 at 02:49:22PM -0500, John W. Linville wrote:
 Jeff,
 
 A couple more fixes for 2.6.24.  The one from Mattias Nissler is already
 in your upstream tree...FYI.
 
 Let me know if there are problems!

Please disregard this request.  The 'upstream-jgarzik-2' request is still valid.

Thanks,

John

-- 
John W. Linville
[EMAIL PROTECTED]

Re: [PATCH take2] Re: Nested VLAN causes recursive locking error

2008-01-10 Thread Jarek Poplawski

On Thu, Jan 10, 2008 at 10:08:16PM +0100, Jarek Poplawski wrote:
...
 But still some 'quirks' are possible there: removing and adding
 devices 'properly' would often require resetting of many subclasses,

...Hmm, probably they are always removed from/with the children, then no
problem! (I know, I could've checked...)

Jarek P.

Re: questions on NAPI processing latency and dropped network packets

2008-01-10 Thread Chris Friesen


James Chapman wrote:

What's changed in your application? Any real-time threads in there?

From the top output below, looks like SigtranServices is consuming all
your CPU...

There are two cpus, and SigtranServices is multithreaded with many
threads.  Most of these threads are affined to cpu0, a couple to cpu1.
None of the threads are realtime.

Top is showing 37% idle on cpu0, and 6% idle on cpu1, so not all the cpu
is being consumed.  However, I'm wondering if we're hitting bursty bits
and we're just running out of time.

I'm going to try a system with MAX_SOFTIRQ_RESTART bumped up a bit, and 
also enable profiling.


Chris

Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-10 Thread jamal


On Thu, 2008-10-01 at 17:05 -0200, Dzianis Kahanovich wrote:
 To classid x:y = mark=mark&x|y (classid :y = -j MARK --set-mark y,
 etc).
 
 --- linux-2.6.23-gentoo-r2/net/sched/Kconfig
 +++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
 @@ -222,6 +222,16 @@
[..]
  skb->tc_index = TC_H_MIN(res.classid);
 +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
 + skb->mark = (skb->mark & (res.classid >> 16)) | TC_H_MIN(res.classid);
 +#endif
   default:


Please either use the ipt action and the netfilter fwmarker for this, or
create a new action.
If you choose the latter (for example, because you want to compute
the mark dynamically), look at net/sched/act_simple.c as a starting point;
I can help you if you have any questions.
 
If you want to use ipt action, the syntax would be something like:

---
tc qdisc add dev XXX ingress
tc filter add dev XXX parent : protocol ip prio 5 \
u32 blah bleh \
flowid 1:12 action ipt -j mark --set-mark 13 
-

cheers,
jamal


Please pull 'fixes-jgarzik' branch of wireless-2.6 (use this one)

[2nd try -- turns out the Mattias Nissler patch needed an extra tweak.
It will probably also cause build breakage when you rebase since
rt2x00lib_txdone(...) becomes rt2x00pci_txdone(rt2x00dev,...) in 2.6.25,
so FYI... :-)

This also includes another patch (the 4 byte boundary one) which
is already in your upstream branch.  So, beware of that one too. :-)]

Jeff,

Three more fixes for 2.6.24.  The one from Mattias Nissler is already
in your upstream tree...FYI.

Let me know if there are problems!

Thanks,

John

---

Individual patches available here:


http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/fixes-jgarzik/

---

The following changes since commit 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235:
  Linus Torvalds (1):
Linux 2.6.24-rc7

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik

Ivo van Doorn (2):
  rt2x00: Corectly initialize rt2500usb MAC
  rt2x00: Put 802.11 data on 4 byte boundary

Mattias Nissler (1):
  rt2x00: Allow rt61 to catch up after a missing tx report

 drivers/net/wireless/rt2x00/rt2500usb.c |2 +-
 drivers/net/wireless/rt2x00/rt2x00pci.c |   20 
 drivers/net/wireless/rt2x00/rt2x00usb.c |   17 +++--
 drivers/net/wireless/rt2x00/rt61pci.c   |   12 
 4 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2500usb.c b/drivers/net/wireless/rt2x00/rt2500usb.c
index 50775f9..18b1f91 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.c
+++ b/drivers/net/wireless/rt2x00/rt2500usb.c
@@ -257,7 +257,7 @@ static const struct rt2x00debug rt2500usb_rt2x00debug = {
 static void rt2500usb_config_mac_addr(struct rt2x00_dev *rt2x00dev,
  __le32 *mac)
 {
-   rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, &mac,
+   rt2500usb_register_multiwrite(rt2x00dev, MAC_CSR2, mac,
  (3 * sizeof(__le16)));
 }
 
diff --git a/drivers/net/wireless/rt2x00/rt2x00pci.c b/drivers/net/wireless/rt2x00/rt2x00pci.c
index 2780df0..6d5d9ab 100644
--- a/drivers/net/wireless/rt2x00/rt2x00pci.c
+++ b/drivers/net/wireless/rt2x00/rt2x00pci.c
@@ -124,7 +124,10 @@ void rt2x00pci_rxdone(struct rt2x00_dev *rt2x00dev)
struct data_entry *entry;
struct data_desc *rxd;
struct sk_buff *skb;
+   struct ieee80211_hdr *hdr;
struct rxdata_entry_desc desc;
+   int header_size;
+   int align;
u32 word;
 
while (1) {
@@ -138,17 +141,26 @@ void rt2x00pci_rxdone(struct rt2x00_dev *rt2x00dev)
memset(&desc, 0x00, sizeof(desc));
rt2x00dev->ops->lib->fill_rxdone(entry, &desc);
 
+   hdr = (struct ieee80211_hdr *)entry->data_addr;
+   header_size =
+   ieee80211_get_hdrlen(le16_to_cpu(hdr->frame_control));
+
+   /*
+* The data behind the ieee80211 header must be
+* aligned on a 4 byte boundary.
+*/
+   align = NET_IP_ALIGN + (2 * (header_size % 4 == 0));
+
/*
 * Allocate the sk_buffer, initialize it and copy
 * all data into it.
 */
-   skb = dev_alloc_skb(desc.size + NET_IP_ALIGN);
+   skb = dev_alloc_skb(desc.size + align);
if (!skb)
return;
 
-   skb_reserve(skb, NET_IP_ALIGN);
-   skb_put(skb, desc.size);
-   memcpy(skb->data, entry->data_addr, desc.size);
+   skb_reserve(skb, align);
+   memcpy(skb_put(skb, desc.size), entry->data_addr, desc.size);
 
/*
 * Send the frame to rt2x00lib for further processing.
diff --git a/drivers/net/wireless/rt2x00/rt2x00usb.c b/drivers/net/wireless/rt2x00/rt2x00usb.c
index 1f5675d..ab4797e 100644
--- a/drivers/net/wireless/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/rt2x00/rt2x00usb.c
@@ -221,7 +221,9 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
struct data_ring *ring = entry->ring;
struct rt2x00_dev *rt2x00dev = ring->rt2x00dev;
struct sk_buff *skb;
+   struct ieee80211_hdr *hdr;
struct rxdata_entry_desc desc;
+   int header_size;
int frame_size;
 
if (!test_bit(DEVICE_ENABLED_RADIO, &rt2x00dev->flags) ||
@@ -253,9 +255,20 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
skb_put(skb, frame_size);
 
/*
-* Trim the skb_buffer to only contain the valid
-* frame data (so ignore the device's descriptor).
+* The data behind the ieee80211 header must be
+* aligned on a 4 byte boundary.
+* After that trim the entire buffer down to only
+* contain the valid frame data excluding the device
+* descriptor.
 */
+   hdr = (struct ieee80211_hdr *)entry->skb->data;
+

Re: AF_UNIX MSG_PEEK bug?

2008-01-10 Thread Brent Casavant

Here's what I think is a better patch.  Or maybe just simpler.

However, I'm still unsure what the effect of this patch on
file descriptor passing might be.  Reading the prior code,
and the parallel portions/comments in unix_dgram_recvmsg(),
it looks like there's been a lot of uncertainty as to how
file descriptor passing should be handled during MSG_PEEK
operations.  To quote:

/* It is questionable: on PEEK we could:
   - do not return fds - good, but too simple 8)
   - return fds, and do not return them on read (old strategy,
 apparently wrong)
   - clone fds (I chose it for now, it is the most universal
 solution)

   POSIX 1003.1g does not actually define this clearly
   at all. POSIX 1003.1g doesn't define a lot of things
   clearly however!
*/

With this patch, passed file descriptors are ignored during MSG_PEEK.
This is essentially the first case in the comment above.  What I
can't seem to figure out is why this is incorrect.  I suspect there's
some history here that I can't find via Google, mailing list archives,
or revision logs.

So, that said, here's a cleaner patch.  It's still not ready for
application until the file descriptor passing is better understood.

Thanks,
Brent

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 060bba4..6d6cdb4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1750,6 +1750,8 @@ static int unix_stream_recvmsg(struct ki
int target;
int err = 0;
long timeo;
+   struct sk_buff *skb;
+   struct sk_buff_head peek_stack;
 
err = -EINVAL;
if (sk->sk_state != TCP_ESTABLISHED)
@@ -1759,6 +1761,9 @@ static int unix_stream_recvmsg(struct ki
if (flags&MSG_OOB)
goto out;
 
+   if (flags & MSG_PEEK)
+   skb_queue_head_init(&peek_stack);
+
target = sock_rcvlowat(sk, flags&MSG_WAITALL, size);
timeo = sock_rcvtimeo(sk, flags&MSG_DONTWAIT);
 
@@ -1778,7 +1783,6 @@ static int unix_stream_recvmsg(struct ki
do
{
int chunk;
-   struct sk_buff *skb;
 
unix_state_lock(sk);
skb = skb_dequeue(&sk->sk_receive_queue);
@@ -1864,19 +1868,14 @@ static int unix_stream_recvmsg(struct ki
 
if (siocb->scm->fp)
break;
-   }
-   else
-   {
-   /* It is questionable, see note in unix_dgram_recvmsg.
-*/
-   if (UNIXCB(skb).fp)
-   siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+   } else
+   __skb_queue_head(&peek_stack, skb);
+   } while (size);
 
-   /* put message back and return */
+   /* Push all peeked skbs back onto receive queue */
+   if (flags & MSG_PEEK)
+   while ((skb = __skb_dequeue(&peek_stack)))
skb_queue_head(&sk->sk_receive_queue, skb);
-   break;
-   }
-   } while (size);
 
mutex_unlock(&u->readlock);
scm_recv(sock, msg, siocb->scm, flags);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
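The reordering trick in Brent's patch (dequeue each peeked skb, push it onto a private stack with __skb_queue_head(), then pop the stack back onto the head of the receive queue) restores the original order because a LIFO reversal applied twice is the identity. A minimal user-space sketch, assuming a hypothetical singly linked `struct buf` in place of `struct sk_buff` and plain helper functions in place of the kernel's skb queue API; file descriptor passing is not modelled:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: one queued chunk of data. */
struct buf {
	int id;
	struct buf *next;
};

/* Hypothetical stand-in for struct sk_buff_head (head-only list). */
struct queue {
	struct buf *head;
};

/* Like skb_queue_head()/__skb_queue_head(): insert at the front. */
static void queue_push_head(struct queue *q, struct buf *b)
{
	b->next = q->head;
	q->head = b;
}

/* Like skb_dequeue()/__skb_dequeue(): remove from the front, or NULL. */
static struct buf *queue_pop_head(struct queue *q)
{
	struct buf *b = q->head;

	if (b) {
		q->head = b->next;
		b->next = NULL;
	}
	return b;
}

/*
 * Peek up to 'n' buffers: take them off 'rx', record their ids, keep
 * them on a private stack, then put them back.  Returns the number
 * peeked; the order of 'rx' is unchanged afterwards.
 */
static int peek_buffers(struct queue *rx, int n, int *ids_out)
{
	struct queue peek_stack = { NULL };
	struct buf *b;
	int count = 0;

	while (count < n && (b = queue_pop_head(rx)) != NULL) {
		ids_out[count++] = b->id;        /* "copy" the data out */
		queue_push_head(&peek_stack, b); /* newest ends up on top */
	}

	/* The stack yields the newest buffer first; pushing each one onto
	 * the head of 'rx' leaves the oldest back in front. */
	while ((b = queue_pop_head(&peek_stack)) != NULL)
		queue_push_head(rx, b);

	return count;
}
```

Peeking a queue holding buffers 1, 2, 3 reports them in that order and leaves the queue as 1, 2, 3.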

Re: AF_UNIX MSG_PEEK bug?

2008-01-10 Thread Alan Cox

 and the parallel portions/comments in unix_dgram_recvmsg(),
 it looks like there's been a lot of uncertainty as to how
 file descriptor passing should be handled during MSG_PEEK
 operations.  To quote:

The specs basically don't answer the question. What is critical is that
the behaviour does not change compared with older Linux releases.


Re : Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card

2008-01-10 Thread patnel972-linux

I tried ARP monitoring but it doesn't work! To test an IP, the interface must
have an address, and dhcpcd is launched by ifplugd only once bond0 has link ...
so it goes around in circles.
So I went back to miimon, and I figured out that bonding detects when wlan0 is
associated and makes it the active interface. But when I switch rf_kill, it
doesn't react. So I tried to deassociate and, like magic, it detects the
interface as down!! I presume it is a bug in the wlan driver, which does not
re-initialise the link info. So I made a small ACPI script to provide that
behavior.



- Original message -
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 21:59:20
Subject: Re: Re : Re : Re : Re : Bonding : Monitoring of 4965 wireless card 

[EMAIL PROTECTED] wrote:

Yes, that's what I'm looking for. I don't understand how to change
 arp_ip_target to the gateway; arp_ip_target is a module option.

If you're running a relatively recent bonding driver (version
3.0.0 or later), the arp_ip_targets can be changed on the fly via sysfs, e.g.,

echo +10.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target
echo -20.0.0.1 > /sys/class/net/bond0/bonding/arp_ip_target

You can check out Documentation/networking/bonding.txt (in the
kernel source code) for more details.
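The same update can be made from a program by writing to that sysfs attribute. A minimal sketch, assuming the /sys/class/net/<bond>/bonding/arp_ip_target layout used in the echo commands above; the helper name and the `root` parameter (normally "/sys/class/net", overridable for testing) are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Add ('+') or remove ('-') an ARP monitoring target on a bonding
 * device by writing e.g. "+10.0.0.1\n" to its arp_ip_target attribute,
 * just as echo does.  Returns 0 on success, -1 on any error.
 */
static int bond_arp_target(const char *root, const char *bond,
			   char op, const char *ip)
{
	char path[256];
	FILE *f;
	int n, ok;

	if (op != '+' && op != '-')
		return -1;

	n = snprintf(path, sizeof(path), "%s/%s/bonding/arp_ip_target",
		     root, bond);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;              /* path would be truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%c%s\n", op, ip) > 0;
	/* A sysfs store error may only surface when the buffer is flushed,
	 * so the result of fclose() is checked as well. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, bond_arp_target("/sys/class/net", "bond0", '+', "10.0.0.1") corresponds to the first echo line.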

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


- Original message -
From: Jay Vosburgh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, 10 January 2008, 00:26:38
Subject: Re: Re : Re : Re : Bonding : Monitoring of 4965 wireless card 

[EMAIL PROTECTED] wrote:

I mean that instead of ARP testing an IP on the LAN or elsewhere, I want it
 to test 127.0.0.1, but in order to do this it must go out and re-enter,
 and then use wlan0 to go out.

In other words, what I think you're saying (and I'm not entirely
sure here) is that you want probes to go to a remote node on the
network, and back, without having to actually know the identity of the
remote node (because, presumably, on a roaming type of wireless
configuration, your gateway and whatnot can change from time to time).

Is that what you're looking for?

That isn't available now, but might be straightforward to plug
into the address update system to keep arp_ip_target set to
the current gateway as the gateway changes.  I haven't looked into the
details of doing that, but in theory it sounds straightforward.
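Jay's suggestion is an in-kernel one. As a user-space stand-in (an assumption, not something the bonding driver does), a helper could look up the current default gateway in /proc/net/route and feed it to arp_ip_target. A minimal sketch of the lookup half, assuming the usual /proc/net/route layout (interface name, then destination and gateway as 8-digit hex); `default_gateway` is a hypothetical name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Given the text of /proc/net/route, find the default route
 * (Destination 00000000) via 'iface' and write its gateway as a
 * dotted quad into 'out'.  Returns 0 on success, -1 if none is found.
 */
static int default_gateway(const char *route_text, const char *iface,
			   char *out, size_t outlen)
{
	const char *line = route_text;

	while (line && *line) {
		char name[64];
		unsigned int dest, gw;
		struct in_addr addr;

		/* The header line fails the %x conversions and is skipped. */
		if (sscanf(line, "%63s %x %x", name, &dest, &gw) == 3 &&
		    strcmp(name, iface) == 0 && dest == 0 && gw != 0) {
			/* The kernel prints the network-order address as a
			 * native integer, so storing the parsed value back
			 * in memory restores the original byte order. */
			memcpy(&addr.s_addr, &gw, sizeof(addr.s_addr));
			return inet_ntop(AF_INET, &addr, out, outlen) ? 0 : -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

The result could then be written to arp_ip_target with a "+" prefix whenever the gateway changes (and the old one removed with "-").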

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-10 Thread Jarek Poplawski

Eric Dumazet wrote, On 01/09/2008 11:37 AM:
...
 [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
...
 diff --git a/net/ipv4/route.c b/net/ipv4/route.c
 index d337706..28484f3 100644
 --- a/net/ipv4/route.c
 +++ b/net/ipv4/route.c
 @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
   break;
   rcu_read_unlock_bh();
   }
 - return r;
 + return rcu_dereference(r);
  }
  
  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
  {
 - struct rt_cache_iter_state *st = rcu_dereference(seq->private);
 + struct rt_cache_iter_state *st = seq->private;
  
   r = r->u.dst.rt_next;
   while (!r) {
 @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
   rcu_read_lock_bh();
   r = rt_hash_table[st->bucket].chain;
   }
 - return r;
 + return rcu_dereference(r);
  }

It seems this optimization could have a side effect: if during such a
loop updates are done, and r is seen !NULL during while() check, but
NULL after rcu_dereference(), the listing/counting could stop too
soon. So, IMHO, probably the first version of this patch is more
reliable. (Or alternatively additional check is needed before return.)

Regards,
Jarek P.

[PATCH 1/4] ipg: balance locking in irq handler

Spotted-by: [EMAIL PROTECTED]
Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..cd1650e 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1630,6 +1630,8 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 #ifdef JUMBO_FRAME
ipg_nic_rxrestore(dev);
 #endif
+   spin_lock(&sp->lock);
+
/* Get interrupt source information, and acknowledge
 * some (i.e. TxDMAComplete, RxDMAComplete, RxEarly,
 * IntRequested, MacControlFrame, LinkEvent) interrupts
@@ -1647,9 +1649,7 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
handled = 1;
 
if (unlikely(!netif_running(dev)))
-   goto out;
-
-   spin_lock(&sp->lock);
+   goto out_unlock;
 
/* If RFDListEnd interrupt, restore all used RFDs. */
if (status & IPG_IS_RFD_LIST_END) {
@@ -1733,9 +1733,9 @@ out_enable:
ipg_w16(IPG_IE_TX_DMA_COMPLETE | IPG_IE_RX_DMA_COMPLETE |
IPG_IE_HOST_ERROR | IPG_IE_INT_REQUESTED | IPG_IE_TX_COMPLETE |
IPG_IE_LINK_EVENT | IPG_IE_UPDATE_STATS, INT_ENABLE);
-
+out_unlock:
spin_unlock(&sp->lock);
-out:
+
return IRQ_RETVAL(handled);
 }
 
-- 
1.5.3.3
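The fix follows the usual kernel rule for goto-based cleanup: take the lock before the first goto that can reach the unlock label, so every path through the label has acquired the lock exactly once. A simplified user-space sketch of that control flow, assuming a hypothetical `locked` flag with assertions in place of `spinlock_t sp->lock` and a `running` flag in place of netif_running(); it does not reproduce the real handler's Rx/Tx work:

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver state in ipg.c. */
struct nic {
	int locked;   /* models spinlock_t sp->lock: 1 while held */
	int running;  /* models netif_running(dev) */
};

static void nic_lock(struct nic *sp)
{
	assert(!sp->locked);    /* would deadlock on a real spinlock */
	sp->locked = 1;
}

static void nic_unlock(struct nic *sp)
{
	assert(sp->locked);     /* unlocking an unheld lock is a bug */
	sp->locked = 0;
}

/* Returns 1 if the interrupt was ours (cf. IRQ_RETVAL(handled)). */
static int nic_interrupt(struct nic *sp, unsigned int status)
{
	int handled = 0;

	nic_lock(sp);           /* taken before the first goto */

	if (!status)
		goto out_unlock;    /* not our interrupt */
	handled = 1;

	if (!sp->running)
		goto out_unlock;    /* device down: still unlocks */

	/* ... acknowledge and service events under the lock ... */

out_unlock:
	nic_unlock(sp);
	return handled;
}
```

Every call, whichever exit it takes, leaves `locked` back at 0.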


[PATCH 0/4] Pull request for 'ipg-fixes' branch

Please pull from branch 'ipg-fixes' in repository

git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git ipg-fixes

to get the changes below.

I have tested the driver with a PIV HT based motherboard. The network
controller is connected to a fast ethernet switch (yeah, I'm cheap).
A second host is performing two loops of scp in both directions for a
400Mb file. The files are sha1sumed after each scp. I have added a
'ping -q -f -l16' from the computer under test after some time.

After 35 copies from the computer under test and 28 copies to it
(the ping eats a bit):
             total       used       free     shared    buffers     cached
Mem:       1019040    1003860      15180          0      20556     936792
-/+ buffers/cache:      46512     972528
Swap:      2031608          0    2031608

Before:
             total       used       free     shared    buffers     cached
Mem:       1019040     572036     447004          0      14988     525924
-/+ buffers/cache:      31124     987916
Swap:      2031608          0    2031608

/proc/slabinfo before and after the test are attached.
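The "-/+ buffers/cache" row of free(1) is derived from the "Mem:" row: its used figure is used minus buffers minus cached, and its free figure is free plus buffers plus cached. A small check of that arithmetic against both runs (values in KiB; `struct meminfo` and the helper names are hypothetical):

```c
#include <assert.h>

/* One "Mem:" row of free(1) output, in KiB. */
struct meminfo {
	long total, used, free, buffers, cached;
};

/* "-/+ buffers/cache" used: memory not accounted as buffers or cache. */
static long app_used(const struct meminfo *m)
{
	return m->used - m->buffers - m->cached;
}

/* "-/+ buffers/cache" free: free memory plus buffers and cache,
 * as free(1) counts it. */
static long app_free(const struct meminfo *m)
{
	return m->free + m->buffers + m->cached;
}
```

With the figures above, the non-cache footprint grew by 46512 - 31124 = 15388 KiB during the test, while cached grew by 936792 - 525924 = 410868 KiB.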

The driver is still a POMS but it seems better now.

I will not be available to work further on this issue before Sunday.
I'd appreciate being Cced though.

Distance from 'net-2.6/master' (27d1cba21fcc50c37eef5042c6be9fa7135e88fc)
-

286c83ce6e8263a5c4c55a57b4c1040800de0171
d42f3afc953f9c99ffe84667a3ecf0d3b69f3d64
358bf4b8e8cbde5d6411b219e93a61728c892685
a58cceed4464ba8ae94294184c15f43e92a5de89

Diffstat


 drivers/net/ipg.c |   36 
 1 files changed, 12 insertions(+), 24 deletions(-)

Shortlog


Francois Romieu (4):
  ipg: balance locking in irq handler
  ipg: plug Tx completion leak
  ipg: fix queue stop condition in the xmit handler
  ipg: fix Tx completion irq request

Patch
-

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..50f0c17 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -857,21 +857,14 @@ static void init_tfdlist(struct net_device *dev)
 static void ipg_nic_txfree(struct net_device *dev)
 {
struct ipg_nic_private *sp = netdev_priv(dev);
-   void __iomem *ioaddr = sp->ioaddr;
-   unsigned int curr;
-   u64 txd_map;
-   unsigned int released, pending;
-
-   txd_map = (u64)sp->txd_map;
-   curr = ipg_r32(TFD_LIST_PTR_0) -
-   do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+   unsigned int released, pending, dirty;
 
IPG_DEBUG_MSG("_nic_txfree\n");
 
pending = sp->tx_current - sp->tx_dirty;
+   dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
for (released = 0; released < pending; released++) {
-   unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
struct sk_buff *skb = sp->TxBuff[dirty];
struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -882,11 +875,8 @@ static void ipg_nic_txfree(struct net_device *dev)
 * If the TFDDone bit is set, free the associated
 * buffer.
 */
-   if (dirty == curr)
-   break;
-
-   /* Setup TFDDONE for compatible issue. */
-   txfd->tfc |= cpu_to_le64(IPG_TFC_TFDDONE);
+   if (!(txfd->tfc & cpu_to_le64(IPG_TFC_TFDDONE)))
+break;
 
/* Free the transmit buffer. */
if (skb) {
@@ -898,6 +888,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
sp->TxBuff[dirty] = NULL;
}
+   dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
}
 
sp->tx_dirty += released;
@@ -1630,6 +1621,8 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
 #ifdef JUMBO_FRAME
ipg_nic_rxrestore(dev);
 #endif
+   spin_lock(&sp->lock);
+
/* Get interrupt source information, and acknowledge
 * some (i.e. TxDMAComplete, RxDMAComplete, RxEarly,
 * IntRequested, MacControlFrame, LinkEvent) interrupts
@@ -1647,9 +1640,7 @@ static irqreturn_t ipg_interrupt_handler(int irq, void *dev_inst)
handled = 1;
 
if (unlikely(!netif_running(dev)))
-   goto out;
-
-   spin_lock(&sp->lock);
+   goto out_unlock;
 
/* If RFDListEnd interrupt, restore all used RFDs. */
if (status & IPG_IS_RFD_LIST_END) {
@@ -1733,9 +1724,9 @@ out_enable:
ipg_w16(IPG_IE_TX_DMA_COMPLETE | IPG_IE_RX_DMA_COMPLETE |
IPG_IE_HOST_ERROR | IPG_IE_INT_REQUESTED | IPG_IE_TX_COMPLETE |
IPG_IE_LINK_EVENT | IPG_IE_UPDATE_STATS, INT_ENABLE);
-
+out_unlock:
spin_unlock(&sp->lock);
-out:
+
return IRQ_RETVAL(handled);
 }
 
@@ -1943,10 +1934,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 */
if (sp->tenmbpsmode)
txfd->tfc |=

[PATCH 2/4] ipg: plug Tx completion leak

The Tx skb release could not free more than one skb per call.
Combine that with the fact that the xmit handler does not check for
a queue full condition and you have a recipe for a quick leak.

Let's release every pending Tx descriptor which has been given
back to the host CPU by the network controller. The xmit handler
suggests that it is done through the IPG_TFC_TFDDONE bit.

Remove the former curr computation: it does not produce anything
usable in its current form.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |   19 +--
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index cd1650e..9752902 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -857,21 +857,14 @@ static void init_tfdlist(struct net_device *dev)
 static void ipg_nic_txfree(struct net_device *dev)
 {
struct ipg_nic_private *sp = netdev_priv(dev);
-   void __iomem *ioaddr = sp->ioaddr;
-   unsigned int curr;
-   u64 txd_map;
-   unsigned int released, pending;
-
-   txd_map = (u64)sp->txd_map;
-   curr = ipg_r32(TFD_LIST_PTR_0) -
-   do_div(txd_map, sizeof(struct ipg_tx)) - 1;
+   unsigned int released, pending, dirty;
 
IPG_DEBUG_MSG("_nic_txfree\n");
 
pending = sp->tx_current - sp->tx_dirty;
+   dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
 
for (released = 0; released < pending; released++) {
-   unsigned int dirty = sp->tx_dirty % IPG_TFDLIST_LENGTH;
struct sk_buff *skb = sp->TxBuff[dirty];
struct ipg_tx *txfd = sp->txd + dirty;
 
@@ -882,11 +875,8 @@ static void ipg_nic_txfree(struct net_device *dev)
 * If the TFDDone bit is set, free the associated
 * buffer.
 */
-   if (dirty == curr)
-   break;
-
-   /* Setup TFDDONE for compatible issue. */
-   txfd->tfc |= cpu_to_le64(IPG_TFC_TFDDONE);
+   if (!(txfd->tfc & cpu_to_le64(IPG_TFC_TFDDONE)))
+break;
 
/* Free the transmit buffer. */
if (skb) {
@@ -898,6 +888,7 @@ static void ipg_nic_txfree(struct net_device *dev)
 
sp->TxBuff[dirty] = NULL;
}
+   dirty = (dirty + 1) % IPG_TFDLIST_LENGTH;
}
 
sp->tx_dirty += released;
-- 
1.5.3.3
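The reworked ipg_nic_txfree() walks the ring from the oldest unreclaimed slot and stops at the first descriptor the controller has not handed back, instead of freeing at most one skb. A user-space sketch of that loop, assuming a hypothetical `RING_LEN` in place of IPG_TFDLIST_LENGTH, a `done` flag in place of the IPG_TFC_TFDDONE bit, and malloc'd buffers in place of skbs; `tx_current`/`tx_dirty` are free-running unsigned counters as in the driver:

```c
#include <assert.h>
#include <stdlib.h>

#define RING_LEN 8            /* hypothetical; IPG_TFDLIST_LENGTH in ipg.c */

struct tx_desc {
	int done;             /* set by the "hardware" once sent */
	void *buf;            /* stand-in for the skb */
};

struct tx_ring {
	struct tx_desc desc[RING_LEN];
	unsigned int tx_current;  /* next slot the CPU will fill */
	unsigned int tx_dirty;    /* oldest slot not yet reclaimed */
};

/* Free every completed descriptor, oldest first, stopping at the first
 * one still owned by the hardware.  Returns how many were freed. */
static unsigned int tx_reclaim(struct tx_ring *r)
{
	unsigned int pending = r->tx_current - r->tx_dirty;
	unsigned int dirty = r->tx_dirty % RING_LEN;
	unsigned int released;

	for (released = 0; released < pending; released++) {
		struct tx_desc *d = &r->desc[dirty];

		if (!d->done)
			break;
		free(d->buf);
		d->buf = NULL;
		d->done = 0;                    /* slot can be reused */
		dirty = (dirty + 1) % RING_LEN; /* wrap around the ring */
	}
	r->tx_dirty += released;
	return released;
}
```

With four pending slots spanning the end of the ring (6, 7, 0, 1) and the first three completed, one call frees three buffers and advances tx_dirty by three.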


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-10 Thread Paul E. McKenney

On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
 Eric Dumazet wrote, On 01/09/2008 11:37 AM:
 ...
  [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
 ...
  diff --git a/net/ipv4/route.c b/net/ipv4/route.c
  index d337706..28484f3 100644
  --- a/net/ipv4/route.c
  +++ b/net/ipv4/route.c
  @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
  break;
  rcu_read_unlock_bh();
  }
  -   return r;
  +   return rcu_dereference(r);
   }
   
   static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
   {
  -   struct rt_cache_iter_state *st = rcu_dereference(seq->private);
  +   struct rt_cache_iter_state *st = seq->private;
   
  r = r->u.dst.rt_next;
  while (!r) {
  @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
  rcu_read_lock_bh();
  r = rt_hash_table[st->bucket].chain;
  }
  -   return r;
  +   return rcu_dereference(r);
   }
 
 It seems this optimization could have a side effect: if during such a
 loop updates are done, and r is seen !NULL during while() check, but
 NULL after rcu_dereference(), the listing/counting could stop too
 soon. So, IMHO, probably the first version of this patch is more
 reliable. (Or alternatively additional check is needed before return.)

Looks to me like r is a local variable (argument list), so there
should not be any possibility of it being changed by some other
task, right?

Thanx, Paul

[PATCH 4/4] ipg: fix Tx completion irq request

The current logic only requests an ack for the first pending packet,
so no irq is triggered once the CPU submits a few packets in quick
succession.  Let's request an irq for every packet instead.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index b234b29..50f0c17 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1934,10 +1934,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
 */
if (sp->tenmbpsmode)
txfd->tfc |= cpu_to_le64(IPG_TFC_TXINDICATE);
-   else if (!((sp->tx_current - sp->tx_dirty + 1) >
-   IPG_FRAMESBETWEENTXDMACOMPLETES)) {
-   txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
-   }
+   txfd->tfc |= cpu_to_le64(IPG_TFC_TXDMAINDICATE);
/* Based on compilation option, determine if FCS is to be
 * appended to transmit frame by IPG.
 */
-- 
1.5.3.3
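The effect described in the commit message can be illustrated by counting how many packets of a back-to-back burst ask for a Tx completion irq. This sketch models the old behaviour as stated in the text ("only request an ack for the first pending packet"), not the exact IPG_FRAMESBETWEENTXDMACOMPLETES arithmetic; the policy functions are hypothetical:

```c
#include <assert.h>

/* Old behaviour as described: flag a packet only if nothing else was
 * pending when it was queued. */
static int old_policy(unsigned int pending_before)
{
	return pending_before == 0;
}

/* New behaviour: flag every packet. */
static int new_policy(unsigned int pending_before)
{
	(void)pending_before;
	return 1;
}

/* Queue 'burst' packets with no completions in between and count how
 * many of them request a completion irq. */
static unsigned int irq_requests(int (*policy)(unsigned int),
				 unsigned int burst)
{
	unsigned int i, n = 0;

	for (i = 0; i < burst; i++)
		n += policy(i);   /* i packets are already pending */
	return n;
}
```

For a burst of five packets the old policy requests a single irq, the new one five.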


[PATCH 3/4] ipg: fix queue stop condition in the xmit handler

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
---
 drivers/net/ipg.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 9752902..b234b29 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1994,7 +1994,7 @@ static int ipg_nic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipg_w32(IPG_DC_TX_DMA_POLL_NOW, DMA_CTRL);
 
if (sp->tx_current == (sp->tx_dirty + IPG_TFDLIST_LENGTH))
-   netif_wake_queue(dev);
+   netif_stop_queue(dev);
 
spin_unlock_irqrestore(&sp->lock, flags);
 
-- 
1.5.3.3
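The corrected test relies on tx_current and tx_dirty being free-running unsigned counters: the ring is full exactly when the producer is a whole ring ahead of the consumer, and that is the moment to stop (not wake) the queue. A sketch with a hypothetical `RING_LEN` in place of IPG_TFDLIST_LENGTH, showing that the comparison still holds after the counters wrap past UINT_MAX:

```c
#include <assert.h>
#include <limits.h>

#define RING_LEN 256u  /* hypothetical; IPG_TFDLIST_LENGTH in ipg.c */

/* Full when the producer is a whole ring ahead of the consumer.
 * Unsigned arithmetic is modulo UINT_MAX + 1, so the test stays
 * correct after either counter wraps. */
static int ring_full(unsigned int tx_current, unsigned int tx_dirty)
{
	return tx_current == tx_dirty + RING_LEN;
}

/* Number of descriptors currently handed to the hardware. */
static unsigned int ring_used(unsigned int tx_current, unsigned int tx_dirty)
{
	return tx_current - tx_dirty;
}
```

In a driver, a true ring_full() after queueing a packet would lead to netif_stop_queue(), and the Tx completion path would typically wake the queue again once descriptors are reclaimed.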


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-10 Thread Herbert Xu

On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:

 It seems this optimization could have a side effect: if during such a
 loop updates are done, and r is seen !NULL during while() check, but
 NULL after rcu_dereference(), the listing/counting could stop too
 soon. So, IMHO, probably the first version of this patch is more
 reliable. (Or alternatively additional check is needed before return.)

No, while the value of r->u.dst.rt_next can change between two readings,
the value of r cannot.
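Herbert's point can be seen in a user-space analogue of the rt_cache iterator. In this sketch (hypothetical `struct entry`/`struct table`, C11 atomics, with an acquire load standing in for rcu_dereference(); the kernel's rcu_read_lock_bh() toggling between buckets is not modelled), each shared pointer is loaded once into the local `r`, and the loop test and the final return both use that local copy, so a concurrent writer cannot turn a non-NULL `r` into NULL between the check and the return:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 4

/* Hypothetical stand-ins for struct rtable and rt_hash_table. */
struct entry {
	int key;
	struct entry *_Atomic next;
};

struct table {
	struct entry *_Atomic chain[NBUCKETS];
};

/* Analogue of rcu_dereference(): a single acquire load. */
static struct entry *load_ptr(struct entry *_Atomic *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}

/* Analogue of rt_cache_get_first(): first entry of the first
 * non-empty bucket, or NULL. */
static struct entry *get_first(struct table *t, int *bucket)
{
	struct entry *r = NULL;

	for (*bucket = 0; *bucket < NBUCKETS; ++*bucket) {
		r = load_ptr(&t->chain[*bucket]);
		if (r)
			break;
	}
	return r;
}

/* Analogue of rt_cache_get_next(): follow the chain, then move on to
 * later buckets when it ends. */
static struct entry *get_next(struct table *t, int *bucket, struct entry *r)
{
	r = load_ptr(&r->next);          /* shared field read once */
	while (!r) {
		if (++*bucket >= NBUCKETS)
			return NULL;
		r = load_ptr(&t->chain[*bucket]);
	}
	return r;                        /* local copy, cannot change */
}
```

Iterating with get_first()/get_next() visits every entry across all buckets.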

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)

2008-01-10 Thread Jeff Garzik

Kok, Auke wrote:

All,

here is the third version of the igb (82575) ethernet controller driver. This
driver was previously posted 2007-07-13 and 2007-12-11. Many comments received
were addressed:

- removed indirection wrappers in the same way as e1000e and ixgbe.
- largely cleaned up against sparse and checkpatch
- removed module parameters and moved functionality to ethtool ioctls
- new NAPI API rewrites
- by default the driver runs in multiqueue mode with 2 to 40 RX queues enabled.

and specifically in this version:

- register macros were condensed for readability
- fixed namespace collisions by renaming functions to igb_*

Since the driver is still too large to post to this list (although the patch
shrank from 558k to 416k to 407k, almost 38% of its size), I am attaching the
bzipped patch here. You can get the same driver alternatively from here:

http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch [407k]
http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch.bz2 [74k]

or through git:
git://lost.foo-projects.org/~ahkok/git/linux-2.6 #igb

There are several concerns still open for this driver:
- The hardware code is still a large API. We're expecting more hardware to be
supported by this driver in the future. The API has already been scrubbed, but
we anticipate that the remaining hooks will be used in the future.
- The register defines are still named E1000_ as they are mostly identical to
those of the e1000 chipsets (the igb register space is a superset of most
recent e1000 register sets).

I think we can throw it into netdev#upstream if you're ready...

Jeff


Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)