date:20080219

Re: [patch] pegasus.c

2008-02-19 Thread Petko Manolov

This is another attempt on sending the pegasus patch with Alpine.  I know 
it's even worse, but i've attached gzip-ed version just in case.


Here goes a brief description of what's changed:

---
This patch is fixing a driver bug triggered when malformed string is 
passed to the 'devid' module parameter.  The expected format is:


device_name:vendor_id:device_id:flags

but it turned out people often type:

somename::0

instead of:

somename:::0

which typically ends up dereferencing null pointer.


Signed-off-by: Petko Manolov [EMAIL PROTECTED]
---


cheers,
Petko


On Mon, 11 Feb 2008, Jeff Garzik wrote:


Petko Manolov wrote:

Hi Jeff,

Attached you'll find a patch that is fixing a driver bug triggered when 
malformed string is passed to the 'devid' module parameter.  The expected 
format is:


device_name:vendor_id:device_id:flags

but it turned out people often type:

somename::0

instead of:

somename:::0


ACK but two process problems preventing application:

* patch is base64-encoded

* no signed-off-by included




pegasus.c.patch.gz
Description: Binary data
--- drivers/net/usb/pegasus.c.orig	2008-01-09 12:16:52.0 +0200
+++ drivers/net/usb/pegasus.c	2008-01-09 12:16:58.0 +0200
@@ -1461,12 +1461,24 @@ static void parse_id(char *id)
 
 	if ((token = strsep(id, :)) != NULL)
 		name = token;
+	else
+		goto err;
 	/* name now points to a null terminated string*/
 	if ((token = strsep(id, :)) != NULL)
 		vendor_id = simple_strtoul(token, NULL, 16);
+	else
+		goto err;
+
 	if ((token = strsep(id, :)) != NULL)
 		device_id = simple_strtoul(token, NULL, 16);
-	flags = simple_strtoul(id, NULL, 16);
+	else
+		goto err;
+
+	if (id != NULL)
+		flags = simple_strtoul(id, NULL, 16);
+	else
+		goto err;
+
 	pr_info(%s: new device %s, vendor ID 0x%04x, device ID 0x%04x, flags: 0x%x\n,
 	driver_name, name, vendor_id, device_id, flags);
 
@@ -1476,6 +1488,7 @@ static void parse_id(char *id)
 		return;
 
 	for (i=0; usb_dev_id[i].name; i++);
+
 	usb_dev_id[i].name = name;
 	usb_dev_id[i].vendor = vendor_id;
 	usb_dev_id[i].device = device_id;
@@ -1483,6 +1496,11 @@ static void parse_id(char *id)
 	pegasus_ids[i].match_flags = USB_DEVICE_ID_MATCH_DEVICE;
 	pegasus_ids[i].idVendor = vendor_id;
 	pegasus_ids[i].idProduct = device_id;
+
+	return;
+
+err:
+	pr_info(malformed 'devid' module parameter\n);
 }
 
 static int __init pegasus_init(void)

Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-19 Thread Kay Sievers

On Feb 18, 2008 9:06 PM, Stephen Hemminger
[EMAIL PROTECTED] wrote:
 On Mon, 18 Feb 2008 19:42:25 + (GMT)
 Chris Rankin [EMAIL PROTECTED] wrote:

  --- Stephen Hemminger [EMAIL PROTECTED] wrote:
 sysfs: duplicate filename 'bridge' can not be created
 WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
 Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
  [c0105020] show_trace_log_lvl+0x1a/0x2f
  [c0105990] show_trace+0x12/0x14
  [c010613d] dump_stack+0x6c/0x72
  [c01991bf] sysfs_add_one+0x57/0xbc
  [c0199e41] sysfs_create_link+0xc2/0x10d
  [c01bae9a] pci_bus_add_devices+0xbd/0x103
  [c034016c] pci_legacy_init+0x56/0xe3
  [c03274e1] kernel_init+0x157/0x2c3
  [c0104c83] kernel_thread_helper+0x7/0x10
  ===
 pci :00:01.0: Error creating sysfs bridge symlink, continuing...
 sysfs: duplicate filename 'bridge' can not be created
 WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
 Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
  [c0105020] show_trace_log_lvl+0x1a/0x2f
  [c0105990] show_trace+0x12/0x14
  [c010613d] dump_stack+0x6c/0x72
  [c01991bf] sysfs_add_one+0x57/0xbc
  [c0199e41] sysfs_create_link+0xc2/0x10d
  [c01bae9a] pci_bus_add_devices+0xbd/0x103
  [c01bae82] pci_bus_add_devices+0xa5/0x103
  [c034016c] pci_legacy_init+0x56/0xe3
  [c03274e1] kernel_init+0x157/0x2c3
  [c0104c83] kernel_thread_helper+0x7/0x10
  ===
   
I have a vague feeling that this was fixed, perhaps in 2.6.24.x?
  
   Never heard of this, what is the initialization script that causes this?
   Also do you have the SYSFS_DEPRECATED option configured? that caused 
   issues
   with regular network drivers.
 
  Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8.

 There was a bug (fixed in 2.6.24) that had to do with sysfs_create_link
 and SYSFS_DEPRECATED probably there is a similar problem with directories.

Chris, could you enable CONFIG_DEBUG_KOBJECT=y, it might show what
objects try to claim the same name.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

AW: Problem receiving multicast/promiscuous-mode with kernel.2.6.24

2008-02-19 Thread Reither Robert

Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, 
Stand G16




Hi,

ok, i managed to shrink down my code to show the behaviour in small size ;-)

It shows the same effect as my app, sometimes i get all the packets after a 
mc_join, sometimes i get only the timeouts.
I find no predictive behaviour, i'm really desperate ...
Hope you can see the effect too .. Do u need an app sending the multicast VLAN 
packets ?

Robert

-

#include stdint.h
#include sys/select.h
#include errno.h
#include stdio.h
#include string.h
#include time.h
#include sys/types.h
#include sys/socket.h
#include netdb.h
#include netinet/in.h
#include stdio.h
#include stdlib.h
#include unistd.h
#include fcntl.h
#include arpa/inet.h
#include net/if.h
#include errno.h
#include netdb.h

//#include linux/if.h
#include linux/sockios.h
#include sys/ioctl.h

//#define JOIN_SINGLE_IF
#undef JOIN_SINGLE_IF


#define PORT 10500

/*
** Try to receive VLAN-tagged UDP multicast packets 
** VLAN-ID:3, VLAN_PRI=6, Addr: 224.1.9.0, dstnport 10500
**
** Kernel used: 2.6.24
**
** HW: VIA Epia 5000, VIA Epia LT (VIA Rhine II and VIA VT6107)
**
** Additional network settings:
** vconfig add eth0 3
** vconfig set_egress_map eth0.3 6 6
** ifconfig eth0.3 10.0.0.1
** route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0.3
**
*/

#define MAX_PAYLOAD 1000

/* 
 * Network representation of the rtp header.
 */

struct rtp_header {
#ifdef WORDS_BIGENDIAN
uint16_t version:2;
uint16_t padbit:1;
uint16_t extbit:1;
uint16_t cc:4;
uint16_t markbit:1;
uint16_t paytype:7;
#else
uint16_t cc:4;
uint16_t extbit:1;
uint16_t padbit:1;
uint16_t version:2;
uint16_t paytype:7;
uint16_t markbit:1;
#endif
uint16_t seq_number;
uint32_t timestamp;
uint32_t ssrc;

/* In fact we rely on rtp_header being not larger than 
 * the minimum header length. So, there's no contributing
 * sync source field here. */
};

struct rtp_packet {
struct rtp_header header;
unsigned char payload[MAX_PAYLOAD];
};


#ifdef WORDS_BIGENDIAN
#error Sorry this is not supported on bigendian targets.
#endif

#define BYTE0(x) ((x)  0xff)
#define BYTE1(x) (((x)  8)   0xff)
#define BYTE2(x) (((x)  16)   0xff)
#define BYTE3(x) (((x)  24)   0xff)

static char *my_inet_ntoa (char *buf, size_t n, struct in_addr in)
{
if (snprintf (buf, n, %u.%u.%u.%u, BYTE0(in.s_addr), 
BYTE1(in.s_addr), BYTE2(in.s_addr), BYTE3(in.s_addr))  = n) {
return NULL;
}
return buf;
}

static int set_non_blocking (int fd)
{
int flags;

if ((flags = fcntl (fd, F_GETFL))  0) {
perror (fcntl getflags ());
return -1;
}
if (fcntl (fd, F_SETFL, flags | O_NONBLOCK)  0) {
perror (fcntl setflags ());
return -1;
}

return fd;
}

int main()
{
  int fd;
  fd_set read_fds;
  struct timespec timeout;
  
  while(1)
  {
int i, ret;

if ((fd = subscribe_udp (PORT))  0) {
printf(Could not register UDP port.\n);
return 1;
}

  
if (join_multicast_group (fd, 224.1.9.0)  0)
{
  printf(Error joining multicast !\n);
  return 1;  
}

for(i=0;i10;i++)
{

  FD_ZERO (read_fds);
FD_SET (fd, read_fds);
timeout.tv_sec = 1;
timeout.tv_nsec = 0;
  
  
ret = pselect (fd+1, read_fds, NULL, NULL, timeout, NULL);
  
  
if (ret  0) {
if (errno == EINTR) {
continue;
} else {
printf(select failed.\n);
return 1;
}
}

  if (FD_ISSET(fd, read_fds))
  {
ssize_t len;
struct rtp_packet p; 
struct sockaddr_in from;
socklen_t fromlen;
char from_ip[20];

  len = recvfrom (fd, p, sizeof (p), 0, (struct sockaddr*) from, 
fromlen);
my_inet_ntoa (from_ip, sizeof (from_ip), from.sin_addr);

printf(Received data packet from %s, length %u\n, from_ip, len);
  }
  else
  {
printf(Got timeout receiving Multicast packets !\n);  
  }
} /* loop 10 */
  
if (leave_multicast_group(fd, 224.1.9.0)  0)
{
  printf(Error leaving multicast !\n);
  return 1;  
}
close(fd);  
  }
}

int subscribe_udp (int port)
{
int s; 

s = create_listening_socket (port, 1);
return s;
}

int create_listening_socket (int listen_port, int udp) 
{
struct sockaddr_in a;
int s;
int yes;
if ((s = socket (AF_INET, udp ? SOCK_DGRAM : SOCK_STREAM, 0))

Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Zhang, Yanmin

On Tue, 2008-02-19 at 08:35 +0100, Eric Dumazet wrote:
 Zhang, Yanmin a �crit :
  On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
  On Mon, 18 Feb 2008 16:12:38 +0800
  Zhang, Yanmin [EMAIL PROTECTED] wrote:

  On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
  From: Eric Dumazet [EMAIL PROTECTED]
  Date: Fri, 15 Feb 2008 15:21:48 +0100

  On linux-2.6.25-rc1 x86_64 :

  offsetof(struct dst_entry, lastuse)=0xb0
  offsetof(struct dst_entry, __refcnt)=0xb8
  offsetof(struct dst_entry, __use)=0xbc
  offsetof(struct dst_entry, next)=0xc0

  So it should be optimal... I dont know why tbench prefers __refcnt 
  being 
  on 0xc0, since in this case lastuse will be on a different cache line...

  Each incoming IP packet will need to change lastuse, __refcnt and 
  __use, 
  so keeping them in the same cache line is a win.

  I suspect then that even this patch could help tbench, since it avoids 
  writing lastuse...
  I think your suspicions are right, and even moreso
  it helps to keep __refcnt out of the same cache line
  as input/output/ops which are read-almost-entirely :-
  I think you are right. The issue is these three variables sharing the 
  same cache line
  with input/output/ops.

  )

  I haven't done an exhaustive analysis, but it seems that
  the write traffic to lastuse and __refcnt are about the
  same.  However if we find that __refcnt gets hit more
  than lastuse in this workload, it explains the regression.
  I also think __refcnt is the key. I did a new testing by adding 2 
  unsigned long
  pading before lastuse, so the 3 members are moved to next cache line. The 
  performance is
  recovered.

  How about below patch? Almost all performance is recovered with the new 
  patch.

  Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

  ---

  --- linux-2.6.25-rc1/include/net/dst.h2008-02-21 14:33:43.0 
  +0800
  +++ linux-2.6.25-rc1_work/include/net/dst.h   2008-02-21 
  14:36:22.0 +0800
  @@ -52,11 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required 
  */
unsigned short  trailer_len;/* space to reserve at tail */

  - u32 metrics[RTAX_MAX];
  - struct dst_entry*path;
  -
  - unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
  + unsigned long   rate_last;  /* rate limiting for ICMP */
  +
  + struct dst_entry*path;

   #ifdef CONFIG_NET_CLS_ROUTE
__u32   tclassid;
  @@ -70,10 +69,12 @@ struct dst_entry
int (*output)(struct sk_buff*);

struct  dst_ops *ops;
  - 
  - unsigned long   lastuse;
  +
  + u32 metrics[RTAX_MAX];
  +
atomic_t__refcnt;   /* client references*/
int __use;
  + unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;

  Well, after this patch, we grow dst_entry by 8 bytes :
  With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, 
  I don't
  enable it. I will move tclassid under ops.

  sizeof(struct dst_entry)=0xd0
  offsetof(struct dst_entry, input)=0x68
  offsetof(struct dst_entry, output)=0x70
  offsetof(struct dst_entry, __refcnt)=0xb4
  offsetof(struct dst_entry, lastuse)=0xc0
  offsetof(struct dst_entry, __use)=0xb8
  sizeof(struct rtable)=0x140

  So we dirty two cache lines instead of one, unless your cpu have 128 bytes 
  cache lines ?

  I am quite suprised that my patch to not change lastuse if already set to 
  jiffies changes nothing...

  If you have some time, could you also test this (unrelated) patch ?

  We can avoid dirty all the time a cache line of loopback device.

  diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
  index f2a6e71..0a4186a 100644
  --- a/drivers/net/loopback.c
  +++ b/drivers/net/loopback.c
  @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct 
  net_device *dev)
  return 0;
  }
   #endif
  -   dev-last_rx = jiffies;
  +#ifdef CONFIG_SMP
  +   if (dev-last_rx != jiffies)
  +#endif
  +   dev-last_rx = jiffies;

  /* it's OK to use per_cpu_ptr() because BHs are off */
  pcpu_lstats = netdev_priv(dev);

  Although I didn't test it, I don't think it's ok. The key is __refcnt 
  shares the same
  cache line with ops/input/output.

 Note it was unrelated to struct dst, but dirtying of one cache line of 
 'loopback netdevice'

 I tested it, and tbench result was better with this patch : 890 MB/s instead 
 of 870 MB/s on a bi dual core machine.
I tested your new patch and it doesn't help tbench.

On my 8-core stoakley machine, the regression is only 5%, but it's 30% on 
16-core tigerton.
It looks like the scalability is poor.

 I was curious of the potential gain on your 16 cores

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread James Chapman

David Miller wrote:

From: James Chapman [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 22:09:24 +

Here's a new version of the patch. The patch avoids disabling irqs
and fixes the sk_dst_get() usage that DaveM mentioned. But even with
this patch, lockdep still complains if hundreds of ppp sessions are
inserted into a tunnel as rapidly as possible (lockdep trace is
below). I can stop these errors by wrapping the call to ppp_input()
in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is
a better fix?

Firstly, let's fix one thing at a time.  Leave the sk_dst_get()
thing alone until we can prove that it's part of the lockdep
traces.

In reproducing the problem, I obtained several lockdep traces that 
implicated sk_dst_get(). I changed the code to use __sk_dst_check() as 
you suggested and they went away. At that point, I was hopeful the 
locking issues were fixed. But after several minutes of 
creating/deleting hundreds of ppp sessions, lockdep dumped another 
error. It is that error that I posted yesterday.

Next, I can't see why ppp_input() needs to be invoked with
interrupts disabled.  There are many other things that invoke
that in software interrupt context, such as pppoe.

I agree. I'm seeking advice on what the underlying cause is of this new 
trace.

Please provide the lockdep traces without the ppp_input() IRQ
disabling so this can be properly analyzed.

The trace _was_ without ppp_input IRQ disabling. The trace doesn't occur 
if I disable IRQs around the ppp_input() call. The patch I sent showed 
the changes I made before running the tests that created the new lockdep 
trace. I'm sorry this wasn't clear.

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Denis V. Lunev

On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote:
 Denis V. Lunev wrote:
  Default ARP parameters should be findable regardless of the context.
  Required to make inetdev_event working.
  
  Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
  ---
   net/core/neighbour.c |4 +---
   1 files changed, 1 insertions(+), 3 deletions(-)
  
  diff --git a/net/core/neighbour.c b/net/core/neighbour.c
  index c895ad4..45ed620 100644
  --- a/net/core/neighbour.c
  +++ b/net/core/neighbour.c
  @@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
  *lookup_neigh_params(struct neigh_table *tbl,
  struct neigh_parms *p;
  
  for (p = tbl-parms; p; p = p-next) {
  -   if (p-net != net)
  -   continue;
  -   if ((p-dev  p-dev-ifindex == ifindex) ||
  +   if ((p-dev  p-dev-ifindex == ifindex  p-net == net) ||
  (!p-dev  !ifindex))
  return p;
  }
 
 If the values are:
   p-dev == NULL
   ifindex == 0
   p-net != net
 
 The parms should not be taken into account and the looping must 
 continue. But with this modification it is not the case, if we specify 
 parms ifindex == 0, the first parms with the dev field set to NULL will 
 be taken belonging or not to the right net.

They should be taken. In the other case inetdev_event will fail for sure
in the middle. You could check.

These are ARP defaults and I do not see a problem for now to get them.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread James Chapman


Jarek Poplawski wrote:

On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote:

Jarek Poplawski wrote:

Hi,

It seems, this nice report is still uncomplete: could you check if
there could have been something more yet?

Unfortunately the ISP's syslog stops. But I've been able to borrow two
Quad Xeon boxes and have reproduced the problem.

Here's a new version of the patch. The patch avoids disabling irqs and
fixes the sk_dst_get() usage that DaveM mentioned. But even with this
patch, lockdep still complains if hundreds of ppp sessions are inserted
into a tunnel as rapidly as possible (lockdep trace is below). I can
stop these errors by wrapping the call to ppp_input() in
pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a
better fix?


Hmm... This is a really long report and quite a bit different from
the previous one. I need some time for this. BTW: you sent before a
lockdep report with hlist_lock problem. I think this could be fixed
in some independent patch to make this all more readable. Are all
the other changes in this current patch only because of this or
previous lockdep report or for some other reasons (or reports) yet?


As I mentioned in my reply to davem, modifying the pppol2tp driver as 
described in the patch I sent made the original lockdep problems go away.



--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Daniel Lezcano


Denis V. Lunev wrote:

On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote:

Denis V. Lunev wrote:

Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event working.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
*lookup_neigh_params(struct neigh_table *tbl,
struct neigh_parms *p;

for (p = tbl-parms; p; p = p-next) {
-   if (p-net != net)
-   continue;
-   if ((p-dev  p-dev-ifindex == ifindex) ||
+   if ((p-dev  p-dev-ifindex == ifindex  p-net == net) ||
(!p-dev  !ifindex))
return p;
}

If the values are:
p-dev == NULL
ifindex == 0
p-net != net

The parms should not be taken into account and the looping must 
continue. But with this modification it is not the case, if we specify 
parms ifindex == 0, the first parms with the dev field set to NULL will 
be taken belonging or not to the right net.


They should be taken. In the other case inetdev_event will fail for sure
in the middle. You could check.

These are ARP defaults and I do not see a problem for now to get them.


Because there is a parms default per namespace. So several instances of 
them per nd table. That was the initial approach with Eric's patchset.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Denis V. Lunev

On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote:
 Denis V. Lunev wrote:
  On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote:
  Denis V. Lunev wrote:
  Default ARP parameters should be findable regardless of the context.
  Required to make inetdev_event working.
 
  Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
  ---
   net/core/neighbour.c |4 +---
   1 files changed, 1 insertions(+), 3 deletions(-)
 
  diff --git a/net/core/neighbour.c b/net/core/neighbour.c
  index c895ad4..45ed620 100644
  --- a/net/core/neighbour.c
  +++ b/net/core/neighbour.c
  @@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
  *lookup_neigh_params(struct neigh_table *tbl,
struct neigh_parms *p;
 
for (p = tbl-parms; p; p = p-next) {
  - if (p-net != net)
  - continue;
  - if ((p-dev  p-dev-ifindex == ifindex) ||
  + if ((p-dev  p-dev-ifindex == ifindex  p-net == net) ||
(!p-dev  !ifindex))
return p;
}
  If the values are:
 p-dev == NULL
 ifindex == 0
 p-net != net
 
  The parms should not be taken into account and the looping must 
  continue. But with this modification it is not the case, if we specify 
  parms ifindex == 0, the first parms with the dev field set to NULL will 
  be taken belonging or not to the right net.
  
  They should be taken. In the other case inetdev_event will fail for sure
  in the middle. You could check.
  
  These are ARP defaults and I do not see a problem for now to get them.
 
 Because there is a parms default per namespace. So several instances of 
 them per nd table. That was the initial approach with Eric's patchset.
 

These changes are not in mainstream and I do not want to touch ARP as
this is not a simple thing. In reality ARP will be needed only when
we'll have a real device inside a namespace.

Right now I prefer to have minimal set of working changes to finish IP
and upper layers.

Regards,
Den

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Daniel Lezcano


Denis V. Lunev wrote:

On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote:

Denis V. Lunev wrote:

On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote:

Denis V. Lunev wrote:

Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event working.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
*lookup_neigh_params(struct neigh_table *tbl,
struct neigh_parms *p;

for (p = tbl-parms; p; p = p-next) {
-   if (p-net != net)
-   continue;
-   if ((p-dev  p-dev-ifindex == ifindex) ||
+   if ((p-dev  p-dev-ifindex == ifindex  p-net == net) ||
(!p-dev  !ifindex))
return p;
}

If the values are:
p-dev == NULL
ifindex == 0
p-net != net

The parms should not be taken into account and the looping must 
continue. But with this modification it is not the case, if we specify 
parms ifindex == 0, the first parms with the dev field set to NULL will 
be taken belonging or not to the right net.

They should be taken. In the other case inetdev_event will fail for sure
in the middle. You could check.

These are ARP defaults and I do not see a problem for now to get them.
Because there is a parms default per namespace. So several instances of 
them per nd table. That was the initial approach with Eric's patchset.




These changes are not in mainstream and I do not want to touch ARP as
this is not a simple thing. In reality ARP will be needed only when
we'll have a real device inside a namespace.

Right now I prefer to have minimal set of working changes to finish IP
and upper layers.


core/neighbour.c is a common part between several protocols, especially 
ipv4 and ipv6. If you modify this function just to fit your need in the 
arp that will block me for ipv6 until you make parms default per 
namespace. So please, find another way to do that, perhaps just add a 
helper function.


I suggest you do parms default per namespace first, it is quite small 
and easy :)


Just let me the time to send the copy-parms-default function.

Is it ok ?

  -- Daniel



 -- Daniel
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread Jarek Poplawski

On Tue, Feb 19, 2008 at 10:30:47AM +, Jarek Poplawski wrote:
...
 IMHO, just like I wrote earlier, the main problem is in ppp_generic(),

...or maybe ppp_generic.c? Whatever...

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread Jarek Poplawski

On Tue, Feb 19, 2008 at 09:03:12AM +, James Chapman wrote:
 David Miller wrote:
 From: James Chapman [EMAIL PROTECTED]
 Date: Mon, 18 Feb 2008 22:09:24 +

 Here's a new version of the patch. The patch avoids disabling irqs
 and fixes the sk_dst_get() usage that DaveM mentioned. But even with
 this patch, lockdep still complains if hundreds of ppp sessions are
 inserted into a tunnel as rapidly as possible (lockdep trace is
 below). I can stop these errors by wrapping the call to ppp_input()
 in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is
 a better fix?

 Firstly, let's fix one thing at a time.  Leave the sk_dst_get()
 thing alone until we can prove that it's part of the lockdep
 traces.

 In reproducing the problem, I obtained several lockdep traces that  
 implicated sk_dst_get().

As a matter of fact I missed just that kind information on previous
lockdep report, so if you could send them too this should be still
helpful.

...
 I agree. I'm seeking advice on what the underlying cause is of this new  
 trace.

IMHO, just like I wrote earlier, the main problem is in ppp_generic(),
especially ppp_connect_channel(), where main tx  rx locks are used.
I didn't know enough about this sk_dst_lock traces yet. I hope I could
help with this, but after these changes I need some time to figure
this out again.

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Daniel Lezcano


Denis V. Lunev wrote:

Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event working.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
*lookup_neigh_params(struct neigh_table *tbl,
struct neigh_parms *p;

for (p = tbl-parms; p; p = p-next) {
-   if (p-net != net)
-   continue;
-   if ((p-dev  p-dev-ifindex == ifindex) ||
+   if ((p-dev  p-dev-ifindex == ifindex  p-net == net) ||
(!p-dev  !ifindex))
return p;
}


If the values are:
p-dev == NULL
ifindex == 0
p-net != net

The parms should not be taken into account and the looping must 
continue. But with this modification it is not the case, if we specify 
parms ifindex == 0, the first parms with the dev field set to NULL will 
be taken belonging or not to the right net.


IMO the right test is:

if (p-net == net  ((p-dev  p-dev-ifindex == ifindex) || !p-dev 
 !ifindex)))


I definitively prefer the first notation :)

 -  if (p-net != net)
 -  continue;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.26 1/5][SYSCTL]: Merge equal code in sysctl proc handlers.

2008-02-19 Thread Pavel Emelyanov

The -read and -write callbacks act in a very similar way, so
merge these paths to reduce the number of places to patch later.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 fs/proc/proc_sysctl.c |   50 ++--
 1 files changed, 11 insertions(+), 39 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 614c34b..5e31585 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -165,8 +165,8 @@ out:
return err;
 }
 
-static ssize_t proc_sys_read(struct file *filp, char __user *buf,
-   size_t count, loff_t *ppos)
+static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
+   size_t count, loff_t *ppos, int write)
 {
struct dentry *dentry = filp-f_dentry;
struct ctl_table_header *head;
@@ -190,12 +190,12 @@ static ssize_t proc_sys_read(struct file *filp, char 
__user *buf,
 * and won't be until we finish.
 */
error = -EPERM;
-   if (sysctl_perm(table, MAY_READ))
+   if (sysctl_perm(table, write ? MAY_WRITE : MAY_READ))
goto out;
 
/* careful: calling conventions are nasty here */
res = count;
-   error = table-proc_handler(table, 0, filp, buf, res, ppos);
+   error = table-proc_handler(table, write, filp, buf, res, ppos);
if (!error)
error = res;
 out:
@@ -204,44 +204,16 @@ out:
return error;
 }
 
-static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
+static ssize_t proc_sys_read(struct file *filp, char __user *buf,
size_t count, loff_t *ppos)
 {
-   struct dentry *dentry = filp-f_dentry;
-   struct ctl_table_header *head;
-   struct ctl_table *table;
-   ssize_t error;
-   size_t res;
-
-   table = do_proc_sys_lookup(dentry-d_parent, dentry-d_name, head);
-   /* Has the sysctl entry disappeared on us? */
-   error = -ENOENT;
-   if (!table)
-   goto out;
-
-   /* Has the sysctl entry been replaced by a directory? */
-   error = -EISDIR;
-   if (!table-proc_handler)
-   goto out;
-
-   /*
-* At this point we know that the sysctl was not unregistered
-* and won't be until we finish.
-*/
-   error = -EPERM;
-   if (sysctl_perm(table, MAY_WRITE))
-   goto out;
-
-   /* careful: calling conventions are nasty here */
-   res = count;
-   error = table-proc_handler(table, 1, filp, (char __user *)buf,
-   res, ppos);
-   if (!error)
-   error = res;
-out:
-   sysctl_head_finish(head);
+   return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 0);
+}
 
-   return error;
+static ssize_t proc_sys_write(struct file *filp, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   return proc_sys_call_handler(filp, (void __user *)buf, count, ppos, 1);
 }
 
 
-- 
1.5.3.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Bugme-new] [Bug 9920] New: kernel panic when using ebtables redirect target

2008-02-19 Thread Patrick McHardy


David Miller wrote:

From: Joonwoo Park [EMAIL PROTECTED]
Date: Tue, 19 Feb 2008 11:53:24 +0900


[PATCH] netfilter: fix incorrect use of skb_make_writable

http://bugzilla.kernel.org/show_bug.cgi?id=9920
The function skb_make_writable returns true or false.

Signed-off-by: Joonwoo Park [EMAIL PROTECTED]


I'll let Patrick pull this in, thanks!



Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.26 2/5][SYSCTL]: Clean sysctls from unneeded extern and forward declarations.

2008-02-19 Thread Pavel Emelyanov

The do_sysctl_strategy can be static since it's used in 
kernel/sysctl.c only.

Besides, move it and parse_table above their callers and 
drop the forward declarations.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 include/linux/sysctl.h |5 --
 kernel/sysctl.c|  144 +++-
 2 files changed, 68 insertions(+), 81 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 571f01d..8e50196 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -981,11 +981,6 @@ extern int do_sysctl (int __user *name, int nlen,
  void __user *oldval, size_t __user *oldlenp,
  void __user *newval, size_t newlen);
 
-extern int do_sysctl_strategy (struct ctl_table *table,
-  int __user *name, int nlen,
-  void __user *oldval, size_t __user *oldlenp,
-  void __user *newval, size_t newlen);
-
 extern ctl_handler sysctl_data;
 extern ctl_handler sysctl_string;
 extern ctl_handler sysctl_intvec;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 740e144..c224cc5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -145,12 +145,6 @@ extern int no_unaligned_warning;
 extern int max_lock_depth;
 #endif
 
-#ifdef CONFIG_SYSCTL_SYSCALL
-static int parse_table(int __user *, int, void __user *, size_t __user *,
-   void __user *, size_t, struct ctl_table *);
-#endif
-
-
 #ifdef CONFIG_PROC_SYSCTL
 static int proc_do_cad_pid(struct ctl_table *table, int write, struct file 
*filp,
  void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -1459,6 +1453,74 @@ void register_sysctl_root(struct ctl_table_root *root)
 }
 
 #ifdef CONFIG_SYSCTL_SYSCALL
+/* Perform the actual read/write of a sysctl table entry. */
+static int do_sysctl_strategy (struct ctl_table *table,
+   int __user *name, int nlen,
+   void __user *oldval, size_t __user *oldlenp,
+   void __user *newval, size_t newlen)
+{
+   int op = 0, rc;
+
+   if (oldval)
+   op |= 004;
+   if (newval) 
+   op |= 002;
+   if (sysctl_perm(table, op))
+   return -EPERM;
+
+   if (table-strategy) {
+   rc = table-strategy(table, name, nlen, oldval, oldlenp,
+newval, newlen);
+   if (rc  0)
+   return rc;
+   if (rc  0)
+   return 0;
+   }
+
+   /* If there is no strategy routine, or if the strategy returns
+* zero, proceed with automatic r/w */
+   if (table-data  table-maxlen) {
+   rc = sysctl_data(table, name, nlen, oldval, oldlenp,
+newval, newlen);
+   if (rc  0)
+   return rc;
+   }
+   return 0;
+}
+
+static int parse_table(int __user *name, int nlen,
+  void __user *oldval, size_t __user *oldlenp,
+  void __user *newval, size_t newlen,
+  struct ctl_table *table)
+{
+   int n;
+repeat:
+   if (!nlen)
+   return -ENOTDIR;
+   if (get_user(n, name))
+   return -EFAULT;
+   for ( ; table-ctl_name || table-procname; table++) {
+   if (!table-ctl_name)
+   continue;
+   if (n == table-ctl_name) {
+   int error;
+   if (table-child) {
+   if (sysctl_perm(table, 001))
+   return -EPERM;
+   name++;
+   nlen--;
+   table = table-child;
+   goto repeat;
+   }
+   error = do_sysctl_strategy(table, name, nlen,
+  oldval, oldlenp,
+  newval, newlen);
+   return error;
+   }
+   }
+   return -ENOTDIR;
+}
+
 int do_sysctl(int __user *name, int nlen, void __user *oldval, size_t __user 
*oldlenp,
   void __user *newval, size_t newlen)
 {
@@ -1531,76 +1593,6 @@ int sysctl_perm(struct ctl_table *table, int op)
return test_perm(table-mode, op);
 }
 
-#ifdef CONFIG_SYSCTL_SYSCALL
-static int parse_table(int __user *name, int nlen,
-  void __user *oldval, size_t __user *oldlenp,
-  void __user *newval, size_t newlen,
-  struct ctl_table *table)
-{
-   int n;
-repeat:
-   if (!nlen)
-   return -ENOTDIR;
-   if (get_user(n, name))
-   return -EFAULT;
-   for ( ; table-ctl_name || table-procname; table++) {
-   if (!table-ctl_name)
-   continue;
-   if

[PATCH net-2.6.26 4/5][SYSCTL]: Create the net sysctl root for RO tables.

2008-02-19 Thread Pavel Emelyanov

This root keeps ctl tables in one global list, but doesn't allow
for non-init namespaces to write into tables, stored in it.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 include/net/net_namespace.h |2 ++
 net/sysctl_net.c|   33 +
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 28738b7..2930ae3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -173,6 +173,8 @@ struct ctl_table;
 struct ctl_table_header;
 extern struct ctl_table_header *register_net_sysctl_table(struct net *net,
const struct ctl_path *path, struct ctl_table *table);
+extern struct ctl_table_header *register_init_net_ctl_table(
+   struct ctl_path *path, struct ctl_table *table);
 extern void unregister_net_sysctl_table(struct ctl_table_header *header);
 
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 665e856..42c99e6 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -40,6 +40,30 @@ static struct ctl_table_root net_sysctl_root = {
.lookup = net_ctl_header_lookup,
 };
 
+static LIST_HEAD(net_ro_headers);
+
+static struct list_head *net_ctl_ro_header_lookup(struct ctl_table_root *root,
+   struct nsproxy *namespaces)
+{
+   return net_ro_headers;
+}
+
+static int net_ctl_ro_permissions(struct ctl_table_root *root,
+   struct nsproxy *ns, struct ctl_table *table)
+{
+   int mode;
+
+   mode = table-mode;
+   if (ns-net_ns != init_net)
+   mode = ~0222;
+   return mode;
+}
+
+static struct ctl_table_root net_sysctl_ro_root = {
+   .lookup = net_ctl_ro_header_lookup,
+   .permissions = net_ctl_ro_permissions,
+};
+
 static int sysctl_net_init(struct net *net)
 {
INIT_LIST_HEAD(net-sysctl_table_headers);
@@ -64,6 +88,7 @@ static __init int sysctl_init(void)
if (ret)
goto out;
register_sysctl_root(net_sysctl_root);
+   register_sysctl_root(net_sysctl_ro_root);
 out:
return ret;
 }
@@ -80,6 +105,14 @@ struct ctl_table_header *register_net_sysctl_table(struct 
net *net,
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_table);
 
+struct ctl_table_header *register_init_net_ctl_table(struct ctl_path *path,
+   struct ctl_table *table)
+{
+   return __register_sysctl_paths(net_sysctl_ro_root,
+   init_nsproxy, path, table);
+}
+EXPORT_SYMBOL_GPL(register_net_ro_ctl_table);
+
 void unregister_net_sysctl_table(struct ctl_table_header *header)
 {
return unregister_sysctl_table(header);
-- 
1.5.3.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.26 5/5][SYSCTL]: Move some net.core sysctls to RO root.

2008-02-19 Thread Pavel Emelyanov

There are many tables in net/core/sysctl_net_core.c that are
to be read-only. Current implementation duplicates this array
for each namespace just to clear the write bits in the
permissions mask.

Keep the writable tables to per-net ctl root and move the others
to the read-only one. This saves some memory in run time and
removes the... ugly code, that prepared the tables.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 net/core/sysctl_net_core.c |   35 +--
 1 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 130338f..4e530ce 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -125,14 +125,6 @@ static struct ctl_table net_core_table[] = {
 #endif /* CONFIG_XFRM */
 #endif /* CONFIG_NET */
{
-   .ctl_name   = NET_CORE_SOMAXCONN,
-   .procname   = somaxconn,
-   .data   = init_net.sysctl_somaxconn,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec
-   },
-   {
.ctl_name   = NET_CORE_BUDGET,
.procname   = netdev_budget,
.data   = netdev_budget,
@@ -151,6 +143,18 @@ static struct ctl_table net_core_table[] = {
{ .ctl_name = 0 }
 };
 
+static struct ctl_table netns_core_table[] = {
+   {
+   .ctl_name   = NET_CORE_SOMAXCONN,
+   .procname   = somaxconn,
+   .data   = init_net.sysctl_somaxconn,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec
+   },
+   { .ctl_name = 0 }
+};
+
 static __net_initdata struct ctl_path net_core_path[] = {
{ .procname = net, .ctl_name = CTL_NET, },
{ .procname = core, .ctl_name = NET_CORE, },
@@ -159,23 +163,17 @@ static __net_initdata struct ctl_path net_core_path[] = {
 
 static __net_init int sysctl_core_net_init(struct net *net)
 {
-   struct ctl_table *tbl, *tmp;
+   struct ctl_table *tbl;
 
net-sysctl_somaxconn = SOMAXCONN;
 
-   tbl = net_core_table;
+   tbl = netns_core_table;
if (net != init_net) {
-   tbl = kmemdup(tbl, sizeof(net_core_table), GFP_KERNEL);
+   tbl = kmemdup(tbl, sizeof(netns_core_table), GFP_KERNEL);
if (tbl == NULL)
goto err_dup;
 
-   for (tmp = tbl; tmp-procname; tmp++) {
-   if (tmp-data = (void *)init_net 
-   tmp-data  (void *)(init_net + 1))
-   tmp-data += (char *)net - (char *)init_net;
-   else
-   tmp-mode = ~0222;
-   }
+   tbl[0].data = net-sysctl_somaxconn;
}
 
net-sysctl_core_hdr = register_net_sysctl_table(net,
@@ -209,6 +207,7 @@ static __net_initdata struct pernet_operations 
sysctl_core_ops = {
 
 static __init int sysctl_core_init(void)
 {
+   register_init_net_ctl_table(net_core_path, net_core_table);
return register_pernet_subsys(sysctl_core_ops);
 }
 
-- 
1.5.3.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

On the Top of the World~『Textile Industry』

2008-02-19 Thread 『Taiwan News Express』



 http://www.taiwannews.com.tw/static/express/080219-01.pdf
http://www.taiwannews.com.tw/static/express/080219-02.pdf
http://www.taiwannews.com.tw/static/express/080219-03.pdf
http://www.taiwannews.com.tw/static/express/080219-04.pdf



 


--
Powered by PHPlist, www.phplist.com --



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net/8021q/vlan_dev.c - Use print_mac

2008-02-19 Thread Patrick McHardy

David Miller wrote:

From: David Miller [EMAIL PROTECTED]
Date: Mon, 18 Feb 2008 16:43:05 -0800 (PST)

I think we can fix this easily by using __attribute_const_
on the print_mac() declaration.  Let me play with that.

Actually it seems the 'pure' attribute is more important
here.  Although it's not semantically a perfect match,
what we need to tell the compiler is basically that:

1) the return value depends upon the inputs
2) if the input is not used, it's safe to avoid the call

and 'pure' accomplishes that without any unwanted side-effects.

I think this will not result in any unwanted over-optimization.
Because if the inputs change in any way GCC has to emit the
call.

Any objections?

This seems fine to me, thanks Dave.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] cls_u32 u32_classify()

2008-02-19 Thread jamal

On Mon, 2008-18-02 at 21:46 -0800, David Miller wrote:


 Can some u32 expert review this?

http://marc.info/?l=linux-netdevm=120178638323045w=2

cheers,
jamal

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.26 0/5][SYSCTL]: Make some sysctl RO in net namespaces.

2008-02-19 Thread Pavel Emelyanov

Hi, David.

Some time ago, when I made the net.core.somaxconn ctl per-namespace,
you told that the approach I used to make some ctl tables read-only
in namespace was not very good and said to improve it. After looking
at other code, I decided, that many ctl variables will have to be 
read-only in namespace, so we need some generic way to do this.

So, here's the patchset, that allows to create ctl tables, that are
read-only in some namespace in general (and in some net namespace in 
particular). I tried to make it work the way not to consume extra
memory at run time.

This patchset is related to net namespaces only, but on the other hand
it affects the core sysctl engine. What is your opinion about this set:
should I send these patches (or some of them) to Andrew instead and wait 
till it appears in mainline (and sequentially in net tree) or will you
accept this one in net-2.6.26?

Thanks,
Pavel

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-2.6.26 3/5][SYSCTL]: Add the -permissions callback on the ctl_table_root.

2008-02-19 Thread Pavel Emelyanov

When the table came from some other root, this root may affect
the table's permissions, depending on who is working with the
table.

The core hunk is at the bottom of this patch. All the rest is
just pushing the ctl_table_root argument up to the sysctl_perm
function.

This will be mostly (only?) used in the net sysctls.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---
 fs/proc/proc_sysctl.c  |4 ++--
 include/linux/sysctl.h |7 ++-
 kernel/sysctl.c|   25 ++---
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 5e31585..5acc001 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -190,7 +190,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, 
void __user *buf,
 * and won't be until we finish.
 */
error = -EPERM;
-   if (sysctl_perm(table, write ? MAY_WRITE : MAY_READ))
+   if (sysctl_perm(head-root, table, write ? MAY_WRITE : MAY_READ))
goto out;
 
/* careful: calling conventions are nasty here */
@@ -388,7 +388,7 @@ static int proc_sys_permission(struct inode *inode, int 
mask, struct nameidata *
goto out;
 
/* Use the permissions on the sysctl table entry */
-   error = sysctl_perm(table, mask);
+   error = sysctl_perm(head-root, table, mask);
 out:
sysctl_head_finish(head);
return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8e50196..3239561 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -945,11 +945,14 @@ enum
 /* For the /proc/sys support */
 struct ctl_table;
 struct nsproxy;
+struct ctl_table_root;
+
 extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header 
*prev);
 extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
struct ctl_table_header *prev);
 extern void sysctl_head_finish(struct ctl_table_header *prev);
-extern int sysctl_perm(struct ctl_table *table, int op);
+extern int sysctl_perm(struct ctl_table_root *root,
+   struct ctl_table *table, int op);
 
 typedef struct ctl_table ctl_table;
 
@@ -1049,6 +1052,8 @@ struct ctl_table_root {
struct list_head header_list;
struct list_head *(*lookup)(struct ctl_table_root *root,
   struct nsproxy *namespaces);
+   int (*permissions)(struct ctl_table_root *root,
+   struct nsproxy *namespaces, struct ctl_table *table);
 };
 
 /* struct ctl_table_header is used to maintain dynamic lists of
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c224cc5..8b8b582 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1454,7 +1454,8 @@ void register_sysctl_root(struct ctl_table_root *root)
 
 #ifdef CONFIG_SYSCTL_SYSCALL
 /* Perform the actual read/write of a sysctl table entry. */
-static int do_sysctl_strategy (struct ctl_table *table,
+static int do_sysctl_strategy (struct ctl_table_root *root,
+   struct ctl_table *table,
int __user *name, int nlen,
void __user *oldval, size_t __user *oldlenp,
void __user *newval, size_t newlen)
@@ -1465,7 +1466,7 @@ static int do_sysctl_strategy (struct ctl_table *table,
op |= 004;
if (newval) 
op |= 002;
-   if (sysctl_perm(table, op))
+   if (sysctl_perm(root, table, op))
return -EPERM;
 
if (table-strategy) {
@@ -1491,6 +1492,7 @@ static int do_sysctl_strategy (struct ctl_table *table,
 static int parse_table(int __user *name, int nlen,
   void __user *oldval, size_t __user *oldlenp,
   void __user *newval, size_t newlen,
+  struct ctl_table_root *root,
   struct ctl_table *table)
 {
int n;
@@ -1505,14 +1507,14 @@ repeat:
if (n == table-ctl_name) {
int error;
if (table-child) {
-   if (sysctl_perm(table, 001))
+   if (sysctl_perm(root, table, 001))
return -EPERM;
name++;
nlen--;
table = table-child;
goto repeat;
}
-   error = do_sysctl_strategy(table, name, nlen,
+   error = do_sysctl_strategy(root, table, name, nlen,
   oldval, oldlenp,
   newval, newlen);
return error;
@@ -1538,7 +1540,8 @@ int do_sysctl(int __user *name, int nlen, void __user 
*oldval, size_t __user *ol
for (head = sysctl_head_next(NULL); head;

[PATCH] [NETNS]: Namespace leak in pneigh_lookup.

2008-02-19 Thread Denis V. Lunev

release_net is missed on the error path in pneigh_lookup.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7bb6a9a..174e29e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -507,6 +507,7 @@ struct pneigh_entry * pneigh_lookup(struct neigh_table *tbl,
if (tbl-pconstructor  tbl-pconstructor(n)) {
if (dev)
dev_put(dev);
+   release_net(net);
kfree(n);
n = NULL;
goto out;
-- 
1.5.3.rc5

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NETFILTER]: Introduce nf_inet_address

2008-02-19 Thread David Woodhouse


On Tue, 2008-01-29 at 13:16 +, Linux Kernel Mailing List wrote:
 Commit: 643a2c15a407faf08101a20e1a3461160711899d

 [NETFILTER]: Introduce nf_inet_address
 
 A few netfilter modules provide their own union of IPv4 and IPv6
 address storage. Will unify that in this patch series.
 
 (1/4): Rename union nf_conntrack_address to union nf_inet_addr and
 move it to x_tables.h.
 
 Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
 Signed-off-by: David S. Miller [EMAIL PROTECTED]

 ...

 --- a/include/linux/netfilter.h
 +++ b/include/linux/netfilter.h
 @@ -48,6 +48,12 @@ enum nf_inet_hooks {
   NF_INET_NUMHOOKS
  };
  
 +union nf_inet_addr {
 + u_int32_t   all[4];
 + __be32  ip;
 + __be32  ip6[4];
 +};
 +
  #ifdef __KERNEL__
  #ifdef CONFIG_NETFILTER

This breaks the busybox build:

CC  ipsvd/tcpudp.o
In file included from /usr/include/linux/netfilter_ipv4.h:8,
 from ipsvd/tcpudp.c:33:
/usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list 
before 'u_int32_t'

What is this 'u_int32_t' nonsense anyway?

If a user-visible header is likely to be included by libc directly from
a 'standard' header, it may not require stdint.h. Therefore it should
use the system-specific types such as '__u32'.

If it isn't likely to be included by libc, which is the case for
netfilter, then it might as well just use the proper C types. Those who
are stuck on C89 or earlier might still prefer to use '__u32' even when
there's no need for it, but 'u_int32_t' is just silly. I suspect we
should eradicate it. I couldn't make busybox work with it --
__BIT_TYPES_DEFINED__ is defined in sys/types.h and prevents the
definitions of u_int32_t et al from appearing in linux/types.h. And if
I include linux/types.h first, other things break.

A later commit adds struct in_addr and struct in6_addr to this union
too, which breaks busybox even harder.

How is this supposed to be used in userspace? Or is it even supposed to
be exposed?

-- 
dwmw2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/17 net-2.6.26] [NETFILTER]: Consolidate masq_inet_event and masq_device_event.

2008-02-19 Thread Patrick McHardy


Denis V. Lunev wrote:

They do exactly the same job.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/ipv4/netfilter/ipt_MASQUERADE.c |   14 ++
 1 files changed, 2 insertions(+), 12 deletions(-)


Looks fine.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: protocol 0300 is buggy spam in dmesg when injectingcapturing on same interface

2008-02-19 Thread Pekka Pietikainen

On Mon, Feb 18, 2008 at 05:39:03PM +0200, Pekka Pietikainen wrote:
 When playing with some L2 level fuzzing I started getting lots of
 protocol 0300 is buggy, dev eth3 spew in dmesg. That interface is also
 capturing the traffic that's being sent, that's probably why the
 dev_queue_xmit_nit codepath is getting called in the first place.
 
 Any ideas? Add a If it came from AF_PACKET, don't print out anything to
 that if-statement?
I'm probably just plastering over a bug in af_packet.c with this one,
but the following patch should make it shut up. The printk definately
needs a ntohs for skb2-protocol, took me a while to figure out where
the 0300 even came from :-) 

Signed-off-by: Pekka Pietikainen [EMAIL PROTECTED]

diff -up linux-2.6.24.i686/net/core/dev.c.orig linux-2.6.24.i686/net/core/dev.c
--- linux-2.6.24.i686/net/core/dev.c.orig   2008-02-19 15:22:12.0 
+0200
+++ linux-2.6.24.i686/net/core/dev.c2008-02-19 15:29:37.0 +0200
@@ -1262,10 +1262,11 @@ static void dev_queue_xmit_nit(struct sk
 
if (skb_network_header(skb2)  skb2-data ||
skb2-network_header  skb2-tail) {
-   if (net_ratelimit())
-   printk(KERN_CRIT protocol %04x is 
-  buggy, dev %s\n,
-  skb2-protocol, dev-name);
+   if (skb2-protocol != htons(ETH_P_ALL) 
+   net_ratelimit())
+   pr_crit(protocol %04x is buggy, dev 
%s\n,
+   ntohs(skb2-protocol),
+   dev-name);
skb_reset_network_header(skb2);
}
 


-- 
Pekka Pietikainen
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] [NETLABEL] Minor cleanup: remove unused method definition

2008-02-19 Thread Rami Rosen

Hi,

This patch removes definition of netlbl_cfg_cipsov4_del() method in
netlabel/netlabel_kapi.c and in include/net/netlabel.h as it is not used.


Regards,
Rami Rosen


Signed-off-by: Rami Rosen [EMAIL PROTECTED]
diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index 0ca67d7..911d8c6 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -350,7 +350,6 @@ int netlbl_cfg_cipsov4_add(struct cipso_v4_doi *doi_def,
 int netlbl_cfg_cipsov4_add_map(struct cipso_v4_doi *doi_def,
   const char *domain,
   struct netlbl_audit *audit_info);
-int netlbl_cfg_cipsov4_del(u32 doi, struct netlbl_audit *audit_info);
 
 /*
  * LSM security attribute operations
@@ -408,11 +407,6 @@ static inline int netlbl_cfg_cipsov4_add_map(struct 
cipso_v4_doi *doi_def,
 {
return -ENOSYS;
 }
-static inline int netlbl_cfg_cipsov4_del(u32 doi,
-struct netlbl_audit *audit_info)
-{
-   return -ENOSYS;
-}
 static inline int netlbl_secattr_catmap_walk(
  struct netlbl_lsm_secattr_catmap *catmap,
  u32 offset)
diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 39793a1..02b268d 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -203,21 +203,6 @@ cfg_cipsov4_add_map_failure:
return ret_val;
 }
 
-/**
- * netlbl_cfg_cipsov4_del - Removean existing CIPSOv4 DOI definition
- * @doi: the CIPSO DOI value
- * @audit_info: NetLabel audit information
- *
- * Description:
- * Removes an existing CIPSOv4 DOI definition from the NetLabel subsystem.
- * Returns zero on success, negative values on failure.
- *
- */
-int netlbl_cfg_cipsov4_del(u32 doi, struct netlbl_audit *audit_info)
-{
-   return cipso_v4_doi_remove(doi, audit_info, netlbl_cipsov4_doi_free);
-}
-
 /*
  * Security Attribute Functions
  */

Re: [NETFILTER]: Introduce nf_inet_address

2008-02-19 Thread David Woodhouse

On Tue, 2008-02-19 at 15:01 +0100, Patrick McHardy wrote:
 David Woodhouse wrote:
  +union nf_inet_addr {
  +  u_int32_t   all[4];
  +  __be32  ip;
  +  __be32  ip6[4];
  +};
  +
   #ifdef __KERNEL__
   #ifdef CONFIG_NETFILTER
  
  This breaks the busybox build:
  
  CC  ipsvd/tcpudp.o
  In file included from /usr/include/linux/netfilter_ipv4.h:8,
   from ipsvd/tcpudp.c:33:
  /usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list 
  before 'u_int32_t'
  
  What is this 'u_int32_t' nonsense anyway?
  
  If a user-visible header is likely to be included by libc directly from
  a 'standard' header, it may not require stdint.h. Therefore it should
  use the system-specific types such as '__u32'.
 
 Right, I queued this patch to fix it.

That does the trick -- but are we using u_int32_t elsewhere in
user-visible headers? Does it work there? How?

  A later commit adds struct in_addr and struct in6_addr to this union
  too, which breaks busybox even harder.
 
 Thats odd, the iptables headers have always used struct in_addr and
 struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also
 used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in
 the Debian busybox source.

It's in busybox 1.9.1. Just including netinet/in.h seems to be
sufficient to make it happy again. I wonder if netfilter.h should
include that for itself?

-- 
dwmw2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: AW: AW: Problem receiving multicast with kernel.2.6.24 #3

2008-02-19 Thread Patrick McHardy


Reither Robert wrote:

Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, 
Stand G16




OK, found one solution (but i think, this should not the normal way)

MC joining first the VLAN device (eth0.3) and than the physical (eth0) does the 
trick, 100% success ...

Doing it the opposite way, gives the same problem as before ...

So somehow joining the VLAN interface disrupts the join to the physical one ...

Ideas ?


Does this patch help?
commit 6548b91f39381b2c5f02f99c14734546354bff89
Author: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
Date:   Mon Feb 18 16:02:01 2008 +0100

[NET]: Messed multicast lists after dev_mc_sync/unsync

Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you
introduced a new field da_synced to struct dev_addr_list that is
not properly initialized to 0. So when any of the current users (8021q,
macvlan, mac80211) calls dev_mc_sync/unsync they mess the address
list for both devices.

The attached patch fixed it for me and avoid future problems.

Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/net/core/dev.c b/net/core/dev.c
index 6cfc123..9516105 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int 
*count,
}
}
 
-   da = kmalloc(sizeof(*da), GFP_ATOMIC);
+   da = kzalloc(sizeof(*da), GFP_ATOMIC);
if (da == NULL)
return -ENOMEM;
memcpy(da-da_addr, addr, alen);

Re: [NETFILTER]: Introduce nf_inet_address

2008-02-19 Thread Patrick McHardy


David Woodhouse wrote:

+union nf_inet_addr {
+   u_int32_t   all[4];
+   __be32  ip;
+   __be32  ip6[4];
+};
+
 #ifdef __KERNEL__
 #ifdef CONFIG_NETFILTER


This breaks the busybox build:

CC  ipsvd/tcpudp.o
In file included from /usr/include/linux/netfilter_ipv4.h:8,
 from ipsvd/tcpudp.c:33:
/usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list 
before 'u_int32_t'

What is this 'u_int32_t' nonsense anyway?

If a user-visible header is likely to be included by libc directly from
a 'standard' header, it may not require stdint.h. Therefore it should
use the system-specific types such as '__u32'.


Right, I queued this patch to fix it.


If it isn't likely to be included by libc, which is the case for
netfilter, then it might as well just use the proper C types. Those who
are stuck on C89 or earlier might still prefer to use '__u32' even when
there's no need for it, but 'u_int32_t' is just silly. I suspect we
should eradicate it.


Yes, some more consitency would be nice. So far the consensus was
to not use it in new code, but keep using it in subsystems like
netfilter that (almost) consistently use it everywhere.


I couldn't make busybox work with it --
__BIT_TYPES_DEFINED__ is defined in sys/types.h and prevents the
definitions of u_int32_t et al from appearing in linux/types.h. And if
I include linux/types.h first, other things break.

A later commit adds struct in_addr and struct in6_addr to this union
too, which breaks busybox even harder.


Thats odd, the iptables headers have always used struct in_addr and
struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also
used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in
the Debian busybox source.


How is this supposed to be used in userspace? Or is it even supposed to
be exposed?


Yes, its meant to replace many self-made AF-independant address
representations.
commit 6f2e68f81457d72bdb14a7ead305a1da06e78893
Author: Patrick McHardy [EMAIL PROTECTED]
Date:   Tue Feb 19 14:54:36 2008 +0100

[NETFILTER]: Use __u32 in struct nf_inet_addr

As reported by David Woodhouse [EMAIL PROTECTED], using u_int32_t in
struct nf_inet_addr breaks the busybox build. Fix by using __u32.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index d74e79b..b74b615 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -51,7 +51,7 @@ enum nf_inet_hooks {
 };
 
 union nf_inet_addr {
-   u_int32_t   all[4];
+   __u32   all[4];
__be32  ip;
__be32  ip6[4];
struct in_addr  in;

Re: [ANNOUNCE] ESFQ -- SFQ patches for Linux 2.6.24

2008-02-19 Thread Patrick McHardy


David Miller wrote:

From: Brock Noland [EMAIL PROTECTED]
Date: Sat, 9 Feb 2008 20:30:58 -0600


Is this going to be merged anytime soon?


If it gets submitted to the proper mailing list, it might.
'linux-net' is for user questions, it is not where the networking
developers hang out, 'netdev' is.

And you have to post patches for review, not URL's point to
the patches.  It has to be int he email, in an applyable form
so people can review the thing properly.



Since SFQ is not exactly simple and I needed something like this
myself, I followed Paul's suggestion and added a new scheduler
(DRR) for this with more flexible limits.

I'll rediff against net-2.6.26 within the next days and send
a final version for review (anyone interested is welcome to
already review this version of course :).

commit 13d0cc64d0f7fed945c357cf4ca43330c8f95ad2
Author: Patrick McHardy [EMAIL PROTECTED]
Date:   Mon Feb 18 22:21:55 2008 +0100

[NET_SCHED]: Add DRR scheduler

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index dbb7ac3..2fca9c4 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -482,4 +482,20 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE   8192
 
+/* DRR */
+
+enum
+{
+   TCA_DRR_UNSPEC,
+   TCA_DRR_QUANTUM,
+   __TCA_DRR_MAX
+};
+
+#define TCA_DRR_MAX(__TCA_DRR_MAX - 1)
+
+struct tc_drr_stats
+{
+   s32 deficit;
+};
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 82adfe6..7e1ab99 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -196,6 +196,9 @@ config NET_SCH_NETEM
 
  If unsure, say N.
 
+config NET_SCH_DRR
+   tristate DRR scheduler
+
 config NET_SCH_INGRESS
tristate Ingress Qdisc
depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 1d2b0f7..b055f74 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)+= sch_teql.o
 obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o
 obj-$(CONFIG_NET_SCH_ATM)  += sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)+= sch_netem.o
+obj-$(CONFIG_NET_SCH_DRR)  += sch_drr.o
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
 obj-$(CONFIG_NET_CLS_FW)   += cls_fw.o
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
new file mode 100644
index 000..aa241b5
--- /dev/null
+++ b/net/sched/sch_drr.c
@@ -0,0 +1,534 @@
+/*
+ * net/sched/sch_drr.c Deficit Round Robin scheduler
+ *
+ * Copyright (c) 2008 Patrick McHardy [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ */
+
+#include linux/module.h
+#include linux/init.h
+#include linux/errno.h
+#include linux/netdevice.h
+#include linux/pkt_sched.h
+#include net/sch_generic.h
+#include net/pkt_sched.h
+#include net/pkt_cls.h
+
+struct drr_class {
+   struct hlist_node   hlist;
+   u32 classid;
+   unsigned intrefcnt;
+
+   struct gnet_stats_basic bstats;
+   struct gnet_stats_queue qstats;
+   struct gnet_stats_rate_est  rate_est;
+   struct list_headalist;
+   struct Qdisc *  qdisc;
+
+   u32 quantum;
+   s32 deficit;
+};
+
+#define DRR_HSIZE  16
+
+struct drr_sched {
+   struct list_headactive;
+   struct tcf_proto *  filter_list;
+   unsigned intfilter_cnt;
+   struct hlist_head   clhash[DRR_HSIZE];
+   struct sk_buff *requeue;
+};
+
+static unsigned int drr_hash(u32 h)
+{
+   h ^= h  8;
+   h ^= h  4;
+
+   return h  (DRR_HSIZE - 1);
+}
+
+static struct drr_class *drr_find_class(struct Qdisc *sch, u32 classid)
+{
+   struct drr_sched *q = qdisc_priv(sch);
+   struct drr_class *cl;
+   struct hlist_node *n;
+
+   hlist_for_each_entry(cl, n, q-clhash[drr_hash(classid)], hlist) {
+   if (cl-classid == classid)
+   return cl;
+   }
+   return NULL;
+}
+
+static void drr_purge_queue(struct drr_class *cl)
+{
+   unsigned int len = cl-qdisc-q.qlen;
+
+   qdisc_reset(cl-qdisc);
+   qdisc_tree_decrease_qlen(cl-qdisc, len);
+}
+
+static const struct nla_policy drr_policy[TCA_DRR_MAX + 1] = {
+   [TCA_DRR_QUANTUM]   = { .type = NLA_U32 },
+};
+
+static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+   struct nlattr **tca, unsigned long *arg)
+{
+   struct drr_sched *q = qdisc_priv(sch);
+   struct drr_class *cl = (struct drr_class *)*arg;
+   struct nlattr

AW: AW: Problem receiving multicast with kernel.2.6.24 #3

2008-02-19 Thread Reither Robert

Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, 
Stand G16




OK, found one solution (but i think, this should not the normal way)

MC joining first the VLAN device (eth0.3) and than the physical (eth0) does the 
trick, 100% success ...

Doing it the opposite way, gives the same problem as before ...

So somehow joining the VLAN interface disrupts the join to the physical one ...

Ideas ?

Robert
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NETFILTER]: Introduce nf_inet_address

2008-02-19 Thread Patrick McHardy


David Woodhouse wrote:

On Tue, 2008-02-19 at 15:01 +0100, Patrick McHardy wrote:

David Woodhouse wrote:

+union nf_inet_addr {
+   u_int32_t   all[4];
+   __be32  ip;
+   __be32  ip6[4];
+};
+
 #ifdef __KERNEL__
 #ifdef CONFIG_NETFILTER

This breaks the busybox build:

CC  ipsvd/tcpudp.o
In file included from /usr/include/linux/netfilter_ipv4.h:8,
 from ipsvd/tcpudp.c:33:
/usr/include/linux/netfilter.h:40: error: expected specifier-qualifier-list 
before 'u_int32_t'

What is this 'u_int32_t' nonsense anyway?

If a user-visible header is likely to be included by libc directly from
a 'standard' header, it may not require stdint.h. Therefore it should
use the system-specific types such as '__u32'.

Right, I queued this patch to fix it.


That does the trick -- but are we using u_int32_t elsewhere in
user-visible headers? Does it work there? How?



Its used in nearly every ip_tables header file. Those are most likely
not included by glibc and iptables includes sys/types.h. Besides
iptables I'm only aware of a perl module that uses these files,
which is probably also including sys/types.h.


A later commit adds struct in_addr and struct in6_addr to this union
too, which breaks busybox even harder.

Thats odd, the iptables headers have always used struct in_addr and
struct in6_addr in struct ipt_ip/struct ip6t_ip6, which are also
used by userspace. What is ipsvd/tcpudp.c? I couldn't find it in
the Debian busybox source.


It's in busybox 1.9.1. Just including netinet/in.h seems to be
sufficient to make it happy again. I wonder if netfilter.h should
include that for itself?



That would break iptables compilation, which already includes
linux/in.h in some files. I guess the best fix for now is to
include netinet/in.h in busybox and long-term clean this up
properly.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

AW: AW: AW: Problem receiving multicast with kernel.2.6.24, REAL solution

2008-02-19 Thread Reither Robert

Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, 
Stand G16




Hurray, your patch does the trick for me !!!

Now join/leave und receiving of multicasts work like it should !!

May i kiss you Patrick, this saves my week ;-)

Hope we see a minor kernel release with this patch very soon ...

Greetings
Robert

-Ursprüngliche Nachricht-
Von: Patrick McHardy [mailto:[EMAIL PROTECTED]
Gesendet: Dienstag, 19. Februar 2008 15:03
An: Reither Robert
Cc: David Stevens; netdev@vger.kernel.org
Betreff: Re: AW: AW: Problem receiving multicast with kernel.2.6.24 #3

 
 OK, found one solution (but i think, this should not the normal way)
 
 MC joining first the VLAN device (eth0.3) and than the physical (eth0) does 
 the trick, 100% success ...
 
 Doing it the opposite way, gives the same problem as before ...
 
 So somehow joining the VLAN interface disrupts the join to the physical one 
 ...
 
 Ideas ?

Does this patch help?
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 6/17 net-2.6.26] [NETNS]: Default arp parameters lookup.

2008-02-19 Thread Daniel Lezcano


Daniel Lezcano wrote:

Denis V. Lunev wrote:

On Tue, 2008-02-19 at 10:51 +0100, Daniel Lezcano wrote:

Denis V. Lunev wrote:

On Tue, 2008-02-19 at 10:14 +0100, Daniel Lezcano wrote:

Denis V. Lunev wrote:

Default ARP parameters should be findable regardless of the context.
Required to make inetdev_event working.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
 net/core/neighbour.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index c895ad4..45ed620 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1275,9 +1275,7 @@ static inline struct neigh_parms 
*lookup_neigh_params(struct neigh_table *tbl,

 struct neigh_parms *p;

 for (p = tbl-parms; p; p = p-next) {
-if (p-net != net)
-continue;
-if ((p-dev  p-dev-ifindex == ifindex) ||
+if ((p-dev  p-dev-ifindex == ifindex  p-net == 
net) ||

 (!p-dev  !ifindex))
 return p;
 }

If the values are:
p-dev == NULL
ifindex == 0
p-net != net

The parms should not be taken into account and the looping must 
continue. But with this modification it is not the case, if we 
specify parms ifindex == 0, the first parms with the dev field set 
to NULL will be taken belonging or not to the right net.
They should be taken. In the other case inetdev_event will fail for 
sure

in the middle. You could check.

These are ARP defaults and I do not see a problem for now to get them.
Because there is a parms default per namespace. So several instances 
of them per nd table. That was the initial approach with Eric's 
patchset.




These changes are not in mainstream and I do not want to touch ARP as
this is not a simple thing. In reality ARP will be needed only when
we'll have a real device inside a namespace.

Right now I prefer to have minimal set of working changes to finish IP
and upper layers.


core/neighbour.c is a common part between several protocols, especially 
ipv4 and ipv6. If you modify this function just to fit your need in the 
arp that will block me for ipv6 until you make parms default per 
namespace. So please, find another way to do that, perhaps just add a 
helper function.


I suggest you do parms default per namespace first, it is quite small 
and easy :)


Just let me the time to send the copy-parms-default function.


Ok, so after a long discussion with Denis about this patch, I will 
change the ipv6 code to share the neigh-parms. It is not a problem.
Having the behavior of the neighbour subsystem per namespace is not a 
must-have.


Acked-by: Daniel Lezcano [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[NET]: Messed multicast lists after dev_mc_sync/unsync

2008-02-19 Thread Patrick McHardy

 commit 6548b91f39381b2c5f02f99c14734546354bff89
Author: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
Date:   Mon Feb 18 16:02:01 2008 +0100

[NET]: Messed multicast lists after dev_mc_sync/unsync

Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you
introduced a new field da_synced to struct dev_addr_list that is
not properly initialized to 0. So when any of the current users (8021q,
macvlan, mac80211) calls dev_mc_sync/unsync they mess the address
list for both devices.

The attached patch fixed it for me and avoid future problems.

Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/net/core/dev.c b/net/core/dev.c
index 6cfc123..9516105 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2900,7 +2900,7 @@ int __dev_addr_add(struct dev_addr_list **list, int 
*count,
}
}
 
-   da = kmalloc(sizeof(*da), GFP_ATOMIC);
+   da = kzalloc(sizeof(*da), GFP_ATOMIC);
if (da == NULL)
return -ENOMEM;
memcpy(da-da_addr, addr, alen);

Re: [net-2.6][DRIVER][VETH] fix dev refcount race

2008-02-19 Thread Pavel Emelyanov

Daniel Lezcano wrote:
 Subject: veth fix dev refcount race
 From: Daniel Lezcano [EMAIL PROTECTED]

 When deleting the veth driver, veth_close calls netif_carrier_off
 for the two extremities of the network device. netif_carrier_off on
 the peer device will fire an event and hold a reference on the peer
 device. Just after, the peer is unregistered taking the rtnl_lock while
 the linkwatch_event is scheduled. If __linkwatch_run_queue does not
 occurs before the unregistering, unregister_netdevice will wait for
 the dev refcount to reach zero holding the rtnl_lock and linkwatch_event

Brr... AFAIS the unregister process waits for refcount without
the rtnl_lock. The run-todo was invented for this. No?

 will wait for the rtnl_lock and hold the dev refcount.

 Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
 ---
  drivers/net/veth.c |   53 
 -
  1 file changed, 40 insertions(+), 13 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][SCTP]: Pick up an orphaned sctp_sockets_allocated counter.

2008-02-19 Thread Pavel Emelyanov

This counter is currently write-only.

Drawing an analogy with the similar tcp counter, I think
that this one should be pointed by the sockets_allocated
members of sctp_prot and sctpv6_prot.

Or should it be instead removed at all?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d47d578..44797ad 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6488,6 +6488,7 @@ struct proto sctp_prot = {
.memory_pressure = sctp_memory_pressure,
.enter_memory_pressure = sctp_enter_memory_pressure,
.memory_allocated = sctp_memory_allocated,
+   .sockets_allocated = sctp_sockets_allocated,
REF_PROTO_INUSE(sctp)
 };
 
@@ -6521,6 +6522,7 @@ struct proto sctpv6_prot = {
.memory_pressure = sctp_memory_pressure,
.enter_memory_pressure = sctp_enter_memory_pressure,
.memory_allocated = sctp_memory_allocated,
+   .sockets_allocated = sctp_sockets_allocated,
REF_PROTO_INUSE(sctpv6)
 };
 #endif /* defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Lksctp-developers] [PATCH][SCTP]: Pick up an orphaned sctp_sockets_allocated counter.

2008-02-19 Thread Vlad Yasevich


Pavel Emelyanov wrote:

This counter is currently write-only.

Drawing an analogy with the similar tcp counter, I think
that this one should be pointed by the sockets_allocated
members of sctp_prot and sctpv6_prot.

Or should it be instead removed at all?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]


Ack.  Looks like it got missed.  It should be added.

-vlad



---

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d47d578..44797ad 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6488,6 +6488,7 @@ struct proto sctp_prot = {
.memory_pressure = sctp_memory_pressure,
.enter_memory_pressure = sctp_enter_memory_pressure,
.memory_allocated = sctp_memory_allocated,
+   .sockets_allocated = sctp_sockets_allocated,
REF_PROTO_INUSE(sctp)
 };
 
@@ -6521,6 +6522,7 @@ struct proto sctpv6_prot = {

.memory_pressure = sctp_memory_pressure,
.enter_memory_pressure = sctp_enter_memory_pressure,
.memory_allocated = sctp_memory_allocated,
+   .sockets_allocated = sctp_sockets_allocated,
REF_PROTO_INUSE(sctpv6)
 };
 #endif /* defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Lksctp-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lksctp-developers



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.25-rc2] e100: Trying to free already-free IRQ 11 during suspend ...

2008-02-19 Thread Kok, Auke

Andrew Morton wrote:
 On Sun, 17 Feb 2008 15:36:50 +0300 Andrey Borzenkov [EMAIL PROTECTED] wrote:
 
 ... and possibly reboot/poweroff (it flows by too fast to be legible).

 [ 8803.850634] ACPI: Preparing to enter system sleep state S3
 [ 8803.853141] Suspending console(s)
 [ 8805.287505] serial 00:09: disabled
 [ 8805.291564] Trying to free already-free IRQ 11
 [ 8805.291579] Pid: 6920, comm: pm-suspend Not tainted 2.6.25-rc2-1avb #2
 [ 8805.291628]  [c0152127] free_irq+0xb7/0x130
 [ 8805.291675]  [c024bd80] e100_suspend+0xc0/0x100
 [ 8805.291724]  [c01eaa36] pci_device_suspend+0x26/0x70
 [ 8805.291747]  [c0243674] suspend_device+0x94/0xd0
 [ 8805.291763]  [c02439a3] device_suspend+0x153/0x240
 [ 8805.291784]  [c014314f] suspend_devices_and_enter+0x4f/0xf0
 [ 8805.291808]  [c0143a5f] ? freeze_processes+0x3f/0x80
 [ 8805.291825]  [c01432fa] enter_state+0xaa/0x140
 [ 8805.291840]  [c014341f] state_store+0x8f/0xd0
 [ 8805.291852]  [c0143390] ? state_store+0x0/0xd0
 [ 8805.291866]  [c01d3404] kobj_attr_store+0x24/0x30
 [ 8805.291901]  [c01b547b] sysfs_write_file+0xbb/0x110
 [ 8805.291936]  [c0177d79] vfs_write+0x99/0x130
 [ 8805.291963]  [c01b53c0] ? sysfs_write_file+0x0/0x110
 [ 8805.291979]  [c01782fd] sys_write+0x3d/0x70
 [ 8805.291998]  [c010409a] sysenter_past_esp+0x5f/0xa5
 [ 8805.292038]  ===
 [ 8805.347640] ACPI: PCI interrupt for device :00:06.0 disabled
 [ 8805.361128] ACPI: PCI interrupt for device :00:02.0 disabled
 [ 8805.376670]  hwsleep-0322 [00] enter_sleep_state : Entering sleep 
 state [S3]
 [ 8805.376670] Back to C!

 Interface is unused normally (only for netconsole sometimes). dmesg and 
 config
 attached.
 
 Does reverting this:
 
 commit 8543da6672b0994921f014f2250e27ae81645580
 Author: Auke Kok [EMAIL PROTECTED]
 Date:   Wed Dec 12 16:30:42 2007 -0800
 
 e100: free IRQ to remove warningwhenrebooting
 
 with this patch:
 
 --- a/drivers/net/e100.c~revert-1
 +++ a/drivers/net/e100.c
 @@ -2804,9 +2804,8 @@ static int e100_suspend(struct pci_dev *
   pci_enable_wake(pdev, PCI_D3cold, 0);
   }
  
 - free_irq(pdev-irq, netdev);
 -
   pci_disable_device(pdev);
 + free_irq(pdev-irq, netdev);
   pci_set_power_state(pdev, PCI_D3hot);
  
   return 0;
 @@ -2848,8 +2847,6 @@ static void e100_shutdown(struct pci_dev
   pci_enable_wake(pdev, PCI_D3cold, 0);
   }
  
 - free_irq(pdev-irq, netdev);
 -
   pci_disable_device(pdev);
   pci_set_power_state(pdev, PCI_D3hot);
  }
 _
 
 fix it?
 
 Hmm ... after resume device has disappeared at all ...

 {pts/1}% cat /proc/interrupts
CPU0
   0:1290492XT-PIC-XTtimer
   1:   6675XT-PIC-XTi8042
   2:  0XT-PIC-XTcascade
   3:  2XT-PIC-XT
   4:  2XT-PIC-XT
   5:  3XT-PIC-XT
   7:  4XT-PIC-XTirda0
   8:  0XT-PIC-XTrtc0
   9:583XT-PIC-XTacpi
  10:  2XT-PIC-XT
  11:  31483XT-PIC-XTyenta, yenta, yenta, ohci_hcd:usb1, ALI 
 5451, pcmcia0.0
  12:  28070XT-PIC-XTi8042
  14:  21705XT-PIC-XTide0
  15:  82123XT-PIC-XTide1
 NMI:  0   Non-maskable interrupts
 TRM:  0   Thermal event interrupts
 SPU:  0   Spurious interrupts
 ERR:  0
 
 I hope that's not a separate bug...

I'll take a look at this as well. thanks for reporting.

Auke
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] [2.6.25-rc2] igb: Fix panic with NICs with 1000BASE-SX PHY

2008-02-19 Thread Andy Gospodarek

From: Bill Hayes [EMAIL PROTECTED]

Auke, Jeff and David,

This patch eliminates a kernel panic with the igb driver in 2.6.25-rc2 when 
running on a Intel 82575 Ethernet controller with a 1000BASE-SX PHY.  The 
panic does not happen with the 1000BASE-T PHY, only with a SX connection.  

Signed-off-by: Bill Hayes [EMAIL PROTECTED]
Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]

 drivers/net/igb/igb_main.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d4eb8e2..666c8f0 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -843,7 +843,8 @@ void igb_reset(struct igb_adapter *adapter)
wr32(E1000_VET, ETHERNET_IEEE_VLAN_TYPE);
 
igb_reset_adaptive(adapter-hw);
-   adapter-hw.phy.ops.get_phy_info(adapter-hw);
+   if (adapter-hw.phy.ops.get_phy_info)
+   adapter-hw.phy.ops.get_phy_info(adapter-hw);
igb_release_manageability(adapter);
 }
 
@@ -2083,7 +2084,8 @@ static void igb_set_multi(struct net_device *netdev)
 static void igb_update_phy_info(unsigned long data)
 {
struct igb_adapter *adapter = (struct igb_adapter *) data;
-   adapter-hw.phy.ops.get_phy_info(adapter-hw);
+   if (adapter-hw.phy.ops.get_phy_info)
+   adapter-hw.phy.ops.get_phy_info(adapter-hw);
 }
 
 /**



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.25-rc2, 2.6.24-rc8] page allocation failure...

2008-02-19 Thread Kok, Auke

Andrew Morton wrote:
 On Sun, 17 Feb 2008 13:20:59 + Daniel J Blueman [EMAIL PROTECTED] 
 wrote:
 
 I'm still hitting this with e1000e on 2.6.25-rc2, 10 times again.

are you sure? I don't think that's the case and you're seeing e1000 dumps 
here...

 It's clearly non-fatal, but then do we expect it to occur?

 Daniel

 --- [dmesg]

 [ 1250.822786] swapper: page allocation failure. order:3, mode:0x4020
 [ 1250.822786] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-119 #2
 [ 1250.822786]
 [ 1250.822786] Call Trace:
 [ 1250.822786]  IRQ  [8025fe9e] __alloc_pages+0x34e/0x3a0
 [ 1250.822786]  [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8027acc2] __slab_alloc+0x102/0x3d0
 [ 1250.822786]  [8048c6df] ? __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8027b8cb] __kmalloc_track_caller+0x7b/0xc0
 [ 1250.822786]  [8048b74f] __alloc_skb+0x6f/0x160
 [ 1250.822786]  [8048c6df] __netdev_alloc_skb+0x1f/0x40
 [ 1250.822786]  [8042652d] e1000_alloc_rx_buffers+0x1ed/0x260
 [ 1250.822786]  [80426b5a] e1000_clean_rx_irq+0x22a/0x330
 [ 1250.822786]  [80422981] e1000_clean+0x1e1/0x540
 [ 1250.822786]  [8024b7a5] ? tick_program_event+0x45/0x70
 [ 1250.822786]  [804930ba] net_rx_action+0x9a/0x150
 [ 1250.822786]  [802336b4] __do_softirq+0x74/0xf0
 [ 1250.822786]  [8020c5fc] call_softirq+0x1c/0x30
 [ 1250.822786]  [8020eaad] do_softirq+0x3d/0x80
 [ 1250.822786]  [80233635] irq_exit+0x85/0x90
 [ 1250.822786]  [8020eba5] do_IRQ+0x85/0x100
 [ 1250.822786]  [8020a5b0] ? mwait_idle+0x0/0x50
 [ 1250.822786]  [8020b981] ret_from_intr+0x0/0xa
 [ 1250.822786]  EOI  [8020a5f5] ? mwait_idle+0x45/0x50
 [ 1250.822786]  [80209a92] ? enter_idle+0x22/0x30
 [ 1250.822786]  [8020a534] ? cpu_idle+0x74/0xa0
 [ 1250.822786]  [80527825] ? rest_init+0x55/0x60
 
 They're regularly reported with e1000 too - I don't think aything really
 changed.
 
 e1000 has this crazy problem where because of a cascade of follies (mainly
 borked hardware) it has to do a 32kb allocation for a 9kb(?) packet.  It
 would be sad if that was carried over into e1000e?

can't be, I personally removed that code.

for MTU  1500 e1000e uses a plain normal sized SKB. for anything bigger e1000e
uses pages.

so I don't see how this bug could still be showing up for e1000e at all. The 
large
skb receive code is all gone (literally, removed).

*please* rmmod e1000; modprobe e1000e and show the dumps again so we know for 
sure
that we're not looking at e1000 dumps.

short fix: increase ring size for e1000 with `modprobe e1000 RxDescriptors=4096`
(or use ethtool) and `echo -n 8192  /proc/sys/vm/min_free_kbytes` or something
like that.

what nic hardware is this on? lspci?

Auke
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: e1000: Detected Tx Unit Hang

2008-02-19 Thread Kok, Auke

Bernd Schubert wrote:
 On Saturday 16 February 2008, Kok, Auke wrote:
 Bernd Schubert wrote:
 Hello,

 I can't login to one of our servers and just got this in an ipmi sol
 session:

 [18169.209181] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
 [18169.209183]   Tx Queue 0
 [18169.209184]   TDH  e3
 [18169.209185]   TDT  e3
 [18169.209186]   next_to_use  e3
 [18169.209187]   next_to_cleanbd
 [18169.209188] buffer_info[next_to_clean]
 [18169.209189]   time_stamp   10043e4d2
 [18169.209190]   next_to_watchbe
 [18169.209191]   jiffies  10043e6f6
 [18169.209192]   next_to_watch.status 1
 [18169.256978] e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
 [18169.256979]   Tx Queue 0
 [18169.256980]   TDH  de
 [18169.256982]   TDT  de
 [18169.256983]   next_to_use  de
 [18169.256984]   next_to_cleanbc
 [18169.256985] buffer_info[next_to_clean]
 [18169.256986]   time_stamp   10043e511
 [18169.256987]   next_to_watchbd
 [18169.256988]   jiffies  10043e701
 [18169.256989]   next_to_watch.status 1

 This is with 2.6.22.18. Is there any chance to recover the system? For
 some reasons I would prefer not to reboot now.
 if that's all you have then it was false alarm. there should be a 'netdev
 timeout - link reset' following those messages. can you send some more
 context on those messages?
 
 All I presently know is that there are 20 servers and login doesn't work any 
 more - sysrq+t does show me it hangs in fuse, which is accessing the 
 underlying nfs (we are using unionfs-fuse). While I checked the sysrq-t 
 output suddenly these e1000 messages appeared.
 Thinking a bit about it, it either could be 2.6.22.18 has an e1000 bug, which 
 2.6.22.X didn't have (X=16, I think, but I'm not sure) or someone  
 mis-configured the switch/network environment today. 
 Hmm, now that I think about the last part, there already had been other 
 networking problems today, which were supposed to be fixed several hours ago. 
 Seems they didn't fix it properly.
 
 in real tx hang cases, the hardware is reset within 2 seconds, and
 everything continues as normal.
 
 Thanks, this gives me hope I don't need to reboot the serves (reboot would 
 mean I would need to start 60 md-raid rebuilds...).

my first thought after I read this e-mail is that the tx-hang message is just a
symptom of your system not responding or being spinlocked all the time. These TX
hang issues normally completely do not interfere with normal system operation 
and
unless you have continuous TX resets you would be able to logon perfectly fine.

I think you might have hit another kernel bug here... perhaps even unionfs/fuse
related and that certainly looks plausible from your problem description.

looking at the changelog for 2.6.22.16-2.6.22.18 I can't see anything relevant
(see
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=shortlog),
but there are definately no e1000 driver changes in that range anyway.

I don't suppose you can do a git-bisect? that would certainly help. I don't 
think
we can rule out anything just yet here.

At least try to revert some of your systems to the previous kernel version and 
see
if the problem goes away...

Auke
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] Support arbitrary initial TCP timestamps

2008-02-19 Thread Glenn Griffin

  Adding yet another member to the already bloated tcp_sock structure to
  implement this is too high a cost.
 
 Yes, I was worried that would be deemed too high of a cost, but it was
 the most efficient way I could think to accomplish what I wanted.
 
  I would instead prefer that there be some global random number
  calculated when the first TCP socket is created, and use that as a
  global offset.  You can even recompute it every few hours if you
  like.
 
 That would work fine if my mine purpose was to randomize the tcp
 timestamp to mitigate the leak in information regarding uptime, but
 despite the brief description, that's only a side effect of what I
 intended to do.  What I wanted was a way to be able to choose an initial
 tcp timestamp for a particular connection that was not tied directly to
 jiffies.
 
 The two patches following this show my intended use case.  I intend to
 enhance syncookie support to allow it to support advanced tcp options
 (sack and window scaling).  Normally syncookies encode the bare minimum
 state of a connection in the ISN they choose, but the 32bit ISN isn't
 enough to encode advanced tcp options so you are left with a working but
 crippled tcp stack during a synflood attack.  If in addition to choosing
 an ISN we are able to choose an initial tcp timestamp, we are then able
 to encode an additional 32 bits of information that can contain more of
 the advanced tcp options.

Perhaps I should clarify.  I don't see a way to implement the additional
syncookie enhancements without storing an offset in some type of
per-socket structure.  Given that, is it still deemed too high of a cost?
Is enhancing syncookies not deemed important enough to warrant the
additional 4 bytes?  What if there was an additional config variable to
support arbitrary/random tcp timestamps, and then syncookies only used
the additional options when that setting was chosen?  Is there some
possiblity I'm missing that could get this feature in a form suitable
for inclusion?  Thanks.

--Glenn

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby

Michael Chan wrote:
 On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:

   
 One consequence of Herbert's change is that the chip will see a
 different datastream.  The initial skb-data linear area will be
 smaller, and the transition to the fragmented area of pages will be
 quicker.

 

 I see.  Perhaps when we get to the end of the data-stream, there is a
 tiny frag that the chip cannot handle.  That's the only thing I can
 think of.

 Please try this patch to see if the problem goes away.  This will
 disable SG on 5701 so we always get linear SKBs.

 diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
 index db606b6..bb37e76 100644
 --- a/drivers/net/tg3.c
 +++ b/drivers/net/tg3.c
 @@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev 
 *pdev,
   } else
   tp-tg3_flags = ~TG3_FLAG_RX_CHECKSUMS;
  
 + if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5701)
 + dev-features = ~(NETIF_F_IP_CSUM | NETIF_F_SG);
 +
   /* flow control autonegotiation is default behavior */
   tp-tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
   tp-link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;



   
This patch does appear to fix the data corruption (tested with
2.6.24.2).  However, it results in performance problems with the iSCSI
application that I am trying to run on this machine.

The test program that I described in the previous message still gets
good performance in both directions.  iperf -r gets good performance
in both directions (940 Mbits/s or 117 MB/s).  However, my target-mode
iSCSI application (which obviously generates rx/tx traffic patterns more
complicated than the synthetic tests) gets very poor performance in one
direction but good performance in the other direction.  iSCSI
performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
with light tx, but remains at a decent 115 MB/s when the 3Com NIC is
doing heavy tx with light rx.  When I revert Herbert's patch instead of
applying the patch above, I get 115 MB/s in both cases.  (With a stock
unpatched kernel, the test fails almost immediately because the iSCSI
control PDUs are corrupted, causing the TCP connection to be dropped.)

The SysKonnect NIC that does not exhibit this problem has a chip that
says BCM5411KQM TT0128 P2Q and 56975E.

Tony

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net-2.6][DRIVER][VETH] fix dev refcount race

2008-02-19 Thread Daniel Lezcano



Subject: veth fix dev refcount race
From: Daniel Lezcano [EMAIL PROTECTED]

When deleting the veth driver, veth_close calls netif_carrier_off
for the two extremities of the network device. netif_carrier_off on
the peer device will fire an event and hold a reference on the peer
device. Just after, the peer is unregistered taking the rtnl_lock while
the linkwatch_event is scheduled. If __linkwatch_run_queue does not
occurs before the unregistering, unregister_netdevice will wait for
the dev refcount to reach zero holding the rtnl_lock and linkwatch_event
will wait for the rtnl_lock and hold the dev refcount.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/net/veth.c |   53 -
 1 file changed, 40 insertions(+), 13 deletions(-)

Index: net-2.6/drivers/net/veth.c
===
--- net-2.6.orig/drivers/net/veth.c
+++ net-2.6/drivers/net/veth.c
@@ -244,18 +244,6 @@ static int veth_open(struct net_device *
 	return 0;
 }
 
-static int veth_close(struct net_device *dev)
-{
-	struct veth_priv *priv;
-
-	if (netif_carrier_ok(dev)) {
-		priv = netdev_priv(dev);
-		netif_carrier_off(dev);
-		netif_carrier_off(priv-peer);
-	}
-	return 0;
-}
-
 static int veth_dev_init(struct net_device *dev)
 {
 	struct veth_net_stats *stats;
@@ -286,13 +274,50 @@ static void veth_setup(struct net_device
 	dev-hard_start_xmit = veth_xmit;
 	dev-get_stats = veth_get_stats;
 	dev-open = veth_open;
-	dev-stop = veth_close;
 	dev-ethtool_ops = veth_ethtool_ops;
 	dev-features |= NETIF_F_LLTX;
 	dev-init = veth_dev_init;
 	dev-destructor = veth_dev_free;
 }
 
+static void veth_change_state(struct net_device *dev)
+{
+	struct net_device *peer;
+	struct veth_priv *priv;
+
+	priv = netdev_priv(dev);
+	peer = priv-peer;
+
+	if (netif_carrier_ok(peer)) {
+		if (!netif_carrier_ok(dev))
+			netif_carrier_on(dev);
+	} else {
+		if (netif_carrier_ok(dev))
+			netif_carrier_off(dev);
+	}
+}
+
+static int veth_device_event(struct notifier_block *unused,
+			 unsigned long event, void *ptr)
+{
+	struct net_device *dev = ptr;
+
+	if (dev-open != veth_open)
+		goto out;
+
+	switch (event) {
+	case NETDEV_CHANGE:
+		veth_change_state(dev);
+		break;
+	}
+out:
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block veth_notifier_block __read_mostly = {
+	.notifier_call	= veth_device_event,
+};
+
 /*
  * netlink interface
  */
@@ -454,12 +479,14 @@ static struct rtnl_link_ops veth_link_op
 
 static __init int veth_init(void)
 {
+	register_netdevice_notifier(veth_notifier_block);
 	return rtnl_link_register(veth_link_ops);
 }
 
 static __exit void veth_exit(void)
 {
 	rtnl_link_unregister(veth_link_ops);
+	unregister_netdevice_notifier(veth_notifier_block);
 }
 
 module_init(veth_init);

Re: AW: AW: AW: Problem receiving multicast with kernel.2.6.24, REAL solution

2008-02-19 Thread Patrick McHardy


Reither Robert wrote:

Visit AVD on prolight+sound in Frankfurt from 12.-15. March 2008 - Hall 8.0, 
Stand G16




Hurray, your patch does the trick for me !!!

Now join/leave und receiving of multicasts work like it should !!

May i kiss you Patrick, this saves my week ;-)


The credit goes to Jorge Boncompte, I only broke it :|


Hope we see a minor kernel release with this patch very soon ...


Yes, I'll send it to -stable as soon as it hits upstream.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] [NETLABEL] Minor cleanup: remove unused method definition

2008-02-19 Thread Paul Moore

On Tuesday 19 February 2008 9:25:31 am Rami Rosen wrote:
 Hi,

 This patch removes definition of netlbl_cfg_cipsov4_del() method in
 netlabel/netlabel_kapi.c and in include/net/netlabel.h as it is not
 used.


 Regards,
 Rami Rosen


 Signed-off-by: Rami Rosen [EMAIL PROTECTED]

This was added for use by Smack (and any other LSMs which want to 
configure NetLabel directly) and since this is an area that is 
undergoing a lot of churn at this point I'd prefer if this function was 
left in place for the time being.

At a later date if this function is still unused, I'll gladly ack it's 
removal or do so myself.

-- 
paul moore
linux security @ hp
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] igb: Fix panic with NICs with 1000BASE-SX PHY

2008-02-19 Thread Auke Kok

From: Bill Hayes [EMAIL PROTECTED]

This patch eliminates a kernel panic with the igb driver in 2.6.25-rc2 when 
running on a Intel 82575 Ethernet controller with a 1000BASE-SX PHY.  The 
panic does not happen with the 1000BASE-T PHY, only with a SX connection.  

Signed-off-by: Bill Hayes [EMAIL PROTECTED]
Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]
Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/igb/igb_main.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 3480cc7..6a1f230 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -816,7 +816,8 @@ void igb_reset(struct igb_adapter *adapter)
wr32(E1000_VET, ETHERNET_IEEE_VLAN_TYPE);
 
igb_reset_adaptive(adapter-hw);
-   adapter-hw.phy.ops.get_phy_info(adapter-hw);
+   if (adapter-hw.phy.ops.get_phy_info)
+   adapter-hw.phy.ops.get_phy_info(adapter-hw);
 }
 
 /**
@@ -2052,7 +2053,8 @@ static void igb_set_multi(struct net_device *netdev)
 static void igb_update_phy_info(unsigned long data)
 {
struct igb_adapter *adapter = (struct igb_adapter *) data;
-   adapter-hw.phy.ops.get_phy_info(adapter-hw);
+   if (adapter-hw.phy.ops.get_phy_info)
+   adapter-hw.phy.ops.get_phy_info(adapter-hw);
 }
 
 /**

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan

On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
 iSCSI
 performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
 with light tx,

That's strange.  The patch should only affect TX performance slightly
since we are just turning off SG for TX.  Please take an ethereal trace
to see what's happening and compare with a good trace.

 but remains at a decent 115 MB/s when the 3Com NIC is
 doing heavy tx with light rx.  When I revert Herbert's patch instead
 of
 applying the patch above, I get 115 MB/s in both cases.  (With a stock
 unpatched kernel, the test fails almost immediately because the iSCSI
 control PDUs are corrupted, causing the TCP connection to be dropped.)
 
 The SysKonnect NIC that does not exhibit this problem has a chip that
 says BCM5411KQM TT0128 P2Q and 56975E.
 

I think this is the 5700, but please send me the tg3 output that
identifies the chip and the revision.  Something like this:

eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 
10/100/1000Base-T Ethernet 00:10:18:04:57:0d
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]





--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby

Michael Chan wrote:
 The SysKonnect NIC that does not exhibit this problem has a chip that
 says BCM5411KQM TT0128 P2Q and 56975E.
 I think this is the 5700, but please send me the tg3 output that
 identifies the chip and the revision.  Something like this:

 eth2: Tigon3 [partno(BCM95705) rev 3003 PHY(5705)] (PCI:66MHz:32-bit) 
 10/100/1000Base-T Ethernet 00:10:18:04:57:0d
 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[0] TSOcap[1]
   
Here is the dmesg output for the SysKonnect NIC:

eth0: Tigon3 [partno(SK-9D21) rev 7104 PHY(5411)] (PCI:66MHz:64-bit)
10/100/1000Base-T Ethernet 00:00:5a:9d:0c:4a
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]

Tony

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RESEND/patch 2.6.25-rc2-git] net/enc28j60: low power mode

2008-02-19 Thread David Brownell

Keep enc28j60 chips in low-power mode when they're not in use.
At typically 120 mA, these chips run hot even when idle; this
low power mode cuts that power usage by a factor of around 100.

This version provides a generic routine to poll a register until
its masked value equals some value ... e.g. bit set or cleared.
It's basically what the previous wait_phy_ready() did, but this
version is generalized to support the handshaking needed to
enter and exit low power mode.

Signed-off-by: David Brownell [EMAIL PROTECTED]
Signed-off-by: Claudio Lanconelli [EMAIL PROTECTED]
---
 drivers/net/enc28j60.c |   82 ++---
 1 files changed, 58 insertions(+), 24 deletions(-)

--- a/drivers/net/enc28j60.c
+++ b/drivers/net/enc28j60.c
@@ -400,26 +400,31 @@ enc28j60_packet_write(struct enc28j60_ne
mutex_unlock(priv-lock);
 }
 
-/*
- * Wait until the PHY operation is complete.
- */
-static int wait_phy_ready(struct enc28j60_net *priv)
+static unsigned long msec20_to_jiffies;
+
+static int poll_ready(struct enc28j60_net *priv, u8 reg, u8 mask, u8 val)
 {
-   unsigned long timeout = jiffies + 20 * HZ / 1000;
-   int ret = 1;
+   unsigned long timeout = jiffies + msec20_to_jiffies;
 
/* 20 msec timeout read */
-   while (nolock_regb_read(priv, MISTAT)  MISTAT_BUSY) {
+   while ((nolock_regb_read(priv, reg)  mask) != val) {
if (time_after(jiffies, timeout)) {
if (netif_msg_drv(priv))
-   printk(KERN_DEBUG DRV_NAME
-   : PHY ready timeout!\n);
-   ret = 0;
-   break;
+   dev_dbg(priv-spi-dev,
+   reg %02x ready timeout!\n, reg);
+   return -ETIMEDOUT;
}
cpu_relax();
}
-   return ret;
+   return 0;
+}
+
+/*
+ * Wait until the PHY operation is complete.
+ */
+static int wait_phy_ready(struct enc28j60_net *priv)
+{
+   return poll_ready(priv, MISTAT, MISTAT_BUSY, 0) ? 0 : 1;
 }
 
 /*
@@ -594,6 +599,32 @@ static void nolock_txfifo_init(struct en
nolock_regw_write(priv, ETXNDL, end);
 }
 
+/*
+ * Low power mode shrinks power consumption about 100x, so we'd like
+ * the chip to be in that mode whenever it's inactive.  (However, we
+ * can't stay in lowpower mode during suspend with WOL active.)
+ */
+static void enc28j60_lowpower(struct enc28j60_net *priv, bool is_low)
+{
+   if (netif_msg_drv(priv))
+   dev_dbg(priv-spi-dev, %s power...\n,
+   is_low ? low : high);
+
+   mutex_lock(priv-lock);
+   if (is_low) {
+   nolock_reg_bfclr(priv, ECON1, ECON1_RXEN);
+   poll_ready(priv, ESTAT, ESTAT_RXBUSY, 0);
+   poll_ready(priv, ECON1, ECON1_TXRTS, 0);
+   /* ECON2_VRPS was set during initialization */
+   nolock_reg_bfset(priv, ECON2, ECON2_PWRSV);
+   } else {
+   nolock_reg_bfclr(priv, ECON2, ECON2_PWRSV);
+   poll_ready(priv, ESTAT, ESTAT_CLKRDY, ESTAT_CLKRDY);
+   /* caller sets ECON1_RXEN */
+   }
+   mutex_unlock(priv-lock);
+}
+
 static int enc28j60_hw_init(struct enc28j60_net *priv)
 {
u8 reg;
@@ -612,8 +643,8 @@ static int enc28j60_hw_init(struct enc28
priv-tx_retry_count = 0;
priv-max_pk_counter = 0;
priv-rxfilter = RXFILTER_NORMAL;
-   /* enable address auto increment */
-   nolock_regb_write(priv, ECON2, ECON2_AUTOINC);
+   /* enable address auto increment and voltage regulator powersave */
+   nolock_regb_write(priv, ECON2, ECON2_AUTOINC | ECON2_VRPS);
 
nolock_rxfifo_init(priv, RXSTART_INIT, RXEND_INIT);
nolock_txfifo_init(priv, TXSTART_INIT, TXEND_INIT);
@@ -690,7 +721,7 @@ static int enc28j60_hw_init(struct enc28
 
 static void enc28j60_hw_enable(struct enc28j60_net *priv)
 {
-   /* enable interrutps */
+   /* enable interrupts */
if (netif_msg_hw(priv))
printk(KERN_DEBUG DRV_NAME : %s() enabling interrupts.\n,
__FUNCTION__);
@@ -726,15 +757,12 @@ enc28j60_setlink(struct net_device *ndev
int ret = 0;
 
if (!priv-hw_enable) {
-   if (autoneg == AUTONEG_DISABLE  speed == SPEED_10) {
+   /* link is in low power mode now; duplex setting
+* will take effect on next enc28j60_hw_init().
+*/
+   if (autoneg == AUTONEG_DISABLE  speed == SPEED_10)
priv-full_duplex = (duplex == DUPLEX_FULL);
-   if (!enc28j60_hw_init(priv)) {
-   if (netif_msg_drv(priv))
-   dev_err(ndev-dev,
-   hw_reset() failed\n);
-   ret = -EINVAL;
-

[RESEND/patch 2.6.25-rc2-git] net/enc28j60: section fix

2008-02-19 Thread David Brownell

Minor bugfixes to the enc28j60 driver ... wrong section marking,
indentation, and bogus use of spi_bus_type.  There's still major
overuse of printk; essentially every place it's used should switch
to dev_*() messaging primitives.

Signed-off-by: David Brownell [EMAIL PROTECTED]
Acked-by: Claudio Lanconelli [EMAIL PROTECTED]
---
 drivers/net/enc28j60.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

--- a/drivers/net/enc28j60.c2008-01-29 11:07:14.0 -0800
+++ b/drivers/net/enc28j60.c2008-01-29 12:43:18.0 -0800
@@ -1555,7 +1555,7 @@ error_alloc:
return ret;
 }
 
-static int enc28j60_remove(struct spi_device *spi)
+static int __devexit enc28j60_remove(struct spi_device *spi)
 {
struct enc28j60_net *priv = dev_get_drvdata(spi-dev);
 
@@ -1572,9 +1572,8 @@ static int enc28j60_remove(struct spi_de
 static struct spi_driver enc28j60_driver = {
.driver = {
   .name = DRV_NAME,
-  .bus = spi_bus_type,
   .owner = THIS_MODULE,
-  },
+},
.probe = enc28j60_probe,
.remove = __devexit_p(enc28j60_remove),
 };
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RESEND/patch 2.6.25-rc2-git] net/enc28j60: oops fix

2008-02-19 Thread David Brownell

Prevent oops on enc28j60 packet RX:  make sure buffers are aligned.
Not all architectures support unaligned accesses in kernel space.

Signed-off-by: David Brownell [EMAIL PROTECTED]
Acked-by: Claudio Lanconelli [EMAIL PROTECTED]
---
Seems the enc28j60 patches didn't get picked up, ergo resend...
they're not in net-2.6.git, even this oops fix.

 drivers/net/enc28j60.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/enc28j60.c2008-02-06 09:29:00.0 -0800
+++ b/drivers/net/enc28j60.c2008-02-06 09:30:03.0 -0800
@@ -900,7 +900,7 @@ static void enc28j60_hw_rx(struct net_de
if (RSV_GETBIT(rxstat, RSV_LENCHECKERR))
ndev-stats.rx_frame_errors++;
} else {
-   skb = dev_alloc_skb(len);
+   skb = dev_alloc_skb(len + NET_IP_ALIGN);
if (!skb) {
if (netif_msg_rx_err(priv))
dev_err(ndev-dev,
@@ -908,6 +908,7 @@ static void enc28j60_hw_rx(struct net_de
ndev-stats.rx_dropped++;
} else {
skb-dev = ndev;
+   skb_reserve(skb, NET_IP_ALIGN);
/* copy the packet from the receive buffer */
enc28j60_mem_read(priv, priv-next_pk_ptr + sizeof(rsv),
len, skb_put(skb, len));
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bonding support for eth1394?

2008-02-19 Thread Stefan Richter


Roland Dreier wrote on 2007-10-13:

  The bonding sources have a few occurrences of EOPNOTSUPP.  Unless I
  missed something, they are all related to setting the hardware address
  of the interface.  AFAICS this is impossible with IP over FireWire.  If
  it is crucial to bonding to be able to change the slaves' hardware
  addresses, then you are out of luck.

There are a few changes to the bonding driver pending that will add
support for bonding IP-over-InfiniBand interfaces.  IPoIB also cannot
change its HW address, so the patches address that issue.

Once those patches land, bonding eth1394 interfaces may just work.


Bill Fink wrote on 2007-10-14:

While that might allow multiple eth1394 interfaces to be bonded,
I believe the user wanted to bond an eth1394 interface with a normal
Ethernet interface, and I don't think that will work even with the
IPoIB bonding changes, since bonding of different fundamental types
of network interfaces still won't be supported, and I'm pretty sure
eth1394 is not considered a standard Ethernet interface (different
MAC address format for one thing).



Karl, you could try kernel 2.6.24(.2) which AFAIU features the mentioned 
changes.

--
Stefan Richter
-=-==--- --=- =--==
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Tony Battersby

Michael Chan wrote:
 On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:
   
 iSCSI
 performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
 with light tx,
 

 That's strange.  The patch should only affect TX performance slightly
 since we are just turning off SG for TX.  Please take an ethereal trace
 to see what's happening and compare with a good trace.

   

Update: when I revert Herbert's patch in addition to applying your
patch, the iSCSI performance goes back up to 115 MB/s again in both
directions.  So it looks like turning off SG for TX didn't itself cause
the performance drop, but rather that the performance drop is just
another manifestation of whatever bug is causing the data corruption.

I do not regularly use wireshark or look at network packet dumps, so I
am not really sure what to look for.  Given the above information, do
you still believe that there is value in examining the packet dump?

Tony

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NET]: Messed multicast lists after dev_mc_sync/unsync

2008-02-19 Thread David Miller

From: Patrick McHardy [EMAIL PROTECTED]
Date: Tue, 19 Feb 2008 15:56:59 +0100

 [NET]: Messed multicast lists after dev_mc_sync/unsync

 Commit a0a400d79e3dd7843e7e81baa3ef2957bdc292d0 from you
 introduced a new field da_synced to struct dev_addr_list that is
 not properly initialized to 0. So when any of the current users (8021q,
 macvlan, mac80211) calls dev_mc_sync/unsync they mess the address
 list for both devices.

 The attached patch fixed it for me and avoid future problems.

 Signed-off-by: Jorge Boncompte [DTI2] [EMAIL PROTECTED]
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

I've added the headline description of the referenced commit
to this commit message, and will queue this up to -stable.

Thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()

2008-02-19 Thread Adrian Bunk

The Coverity checker spotted this buggy error handling added by
commit 07db218396650933abff3c5c1ad1e2a6e0cfedeb.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---
6003e3d899a8fd6425ff509363b776f8807df25d diff --git 
a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c
index 2019b4f..094565e 100644
--- a/net/mac80211/ieee80211_sta.c
+++ b/net/mac80211/ieee80211_sta.c
@@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct 
net_device *dev,
/* prepare reordering buffer */
tid_agg_rx-reorder_buf =
kmalloc(buf_size * sizeof(struct sk_buf *), GFP_ATOMIC);
-   if ((!tid_agg_rx-reorder_buf)  net_ratelimit()) {
-   printk(KERN_ERR can not allocate reordering buffer 
-   to tid %d\n, tid);
+   if (!tid_agg_rx-reorder_buf) {
+   if (net_ratelimit())
+   printk(KERN_ERR can not allocate reordering buffer 
+   to tid %d\n, tid);
goto end;
}
memset(tid_agg_rx-reorder_buf, 0,

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread Jarek Poplawski

On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote:
...
 Unfortunately the ISP's syslog stops. But I've been able to borrow
 two Quad Xeon boxes and have reproduced the problem.

 Here's a new version of the patch. The patch avoids disabling irqs
 and fixes the sk_dst_get() usage that DaveM mentioned. But even with
 this patch, lockdep still complains if hundreds of ppp sessions are
 inserted into a tunnel as rapidly as possible (lockdep trace is below).
 I can stop these errors by wrapping the call to ppp_input() in
 pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a
 better fix?

I send here my proposal: it's intended for testing and to check one of
possible solutions here. IMHO your lockdep reports show there is no
use to change anything around sk_dst_lock: it would need the global
change of this lock to fix this problem. So the fix should be done
around pch-upl lock and this means changing ppp_generic.

In the patch below I've used trylock in places which seem to allow
for skipping some things (while config is changed only) or simply
don't need this lock because there is no ppp struct. This could be
modified to add some waiting loop if necessary. Another option is to
change the write side of this lock: it looks like more vulnerable if
something missed because there are more locks involved, but probably
should be enough to solve this problem too.

I think pppol2tp need to be first checked only with hlist_lock bh
patch, unless there were some lockdep reports on these other locks
too. (BTW, I added ppp maintainer to CC - I hope we get Paul's opinion
on this.)

Regards,
Jarek P.

(testing patch #1)
---

 drivers/net/ppp_generic.c |   33 +++--
 1 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 4dc5b4b..5cbc534 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1473,7 +1473,7 @@ void
 ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 {
struct channel *pch = chan-ppp;
-   int proto;
+   int proto, locked;
 
if (!pch || skb-len == 0) {
kfree_skb(skb);
@@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
}
 
proto = PPP_PROTO(skb);
-   read_lock_bh(pch-upl);
-   if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) {
+   /*
+* We use trylock to avoid dependency between soft-irq-safe upl lock
+* and soft-irq-unsafe sk_dst_lock.
+*/
+   local_bh_disable();
+   locked = read_trylock(pch-upl);
+   if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) {
/* put it on the channel queue */
skb_queue_tail(pch-file.rq, skb);
/* drop old frames if queue too long */
@@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
} else {
ppp_do_recv(pch-ppp, skb, pch);
}
-   read_unlock_bh(pch-upl);
+
+   if (locked)
+   read_unlock(pch-upl);
+   local_bh_enable();
 }
 
 /* Put a 0-length skb in the receive queue as an error indication */
@@ -1506,16 +1514,18 @@ ppp_input_error(struct ppp_channel *chan, int code)
if (!pch)
return;
 
-   read_lock_bh(pch-upl);
-   if (pch-ppp) {
+   /* a trylock comment in ppp_input() */
+   local_bh_disable();
+   if (read_trylock(pch-upl)  pch-ppp) {
skb = alloc_skb(0, GFP_ATOMIC);
if (skb) {
skb-len = 0;   /* probably unnecessary */
skb-cb[0] = code;
ppp_do_recv(pch-ppp, skb, pch);
}
+   read_unlock(pch-upl);
}
-   read_unlock_bh(pch-upl);
+   local_bh_enable();
 }
 
 /*
@@ -2044,10 +2054,13 @@ int ppp_unit_number(struct ppp_channel *chan)
int unit = -1;
 
if (pch) {
-   read_lock_bh(pch-upl);
-   if (pch-ppp)
+   /* a trylock comment in ppp_input() */
+   local_bh_disable();
+   if (read_trylock(pch-upl)  pch-ppp) {
unit = pch-ppp-file.index;
-   read_unlock_bh(pch-upl);
+   read_unlock(pch-upl);
+   }
+   local_bh_enable();
}
return unit;
 }
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference

2008-02-19 Thread Adrian Bunk

Unless I miss a guaranteed relation between between f and 
new_fa-fa_info this patch is required for fixing a NULL dereference
introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and 
spotted by the Coverity checker.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 net/ipv4/fib_hash.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6/net/ipv4/fib_hash.c.old   2008-02-19 23:23:14.0 +0200
+++ linux-2.6/net/ipv4/fib_hash.c   2008-02-19 23:38:28.0 +0200
@@ -367,17 +367,18 @@ static struct fib_node *fib_find_node(st
}
 
return NULL;
 }
 
 static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)
 {
struct fn_hash *table = (struct fn_hash *) tb-tb_data;
-   struct fib_node *new_f, *f;
+   struct fib_node *new_f = NULL;
+   struct fib_node *f;
struct fib_alias *fa, *new_fa;
struct fn_zone *fz;
struct fib_info *fi;
u8 tos = cfg-fc_tos;
__be32 key;
int err;
 
if (cfg-fc_dst_len  32)
@@ -491,33 +492,32 @@ static int fn_hash_insert(struct fib_tab
}
 
err = -ENOENT;
if (!(cfg-fc_nlflags  NLM_F_CREATE))
goto out;
 
err = -ENOBUFS;
 
-   new_f = NULL;
if (!f) {
new_f = kmem_cache_zalloc(fn_hash_kmem, GFP_KERNEL);
if (new_f == NULL)
goto out;
 
INIT_HLIST_NODE(new_f-fn_hash);
INIT_LIST_HEAD(new_f-fn_alias);
new_f-fn_key = key;
f = new_f;
}
 
new_fa = f-fn_embedded_alias;
if (new_fa-fa_info != NULL) {
new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
if (new_fa == NULL)
-   goto out_free_new_f;
+   goto out;
}
new_fa-fa_info = fi;
new_fa-fa_tos = tos;
new_fa-fa_type = cfg-fc_type;
new_fa-fa_scope = cfg-fc_scope;
new_fa-fa_state = 0;
 
/*
@@ -535,19 +535,19 @@ static int fn_hash_insert(struct fib_tab
if (new_f)
fz-fz_nent++;
rt_cache_flush(-1);
 
rtmsg_fib(RTM_NEWROUTE, key, new_fa, cfg-fc_dst_len, tb-tb_id,
  cfg-fc_nlinfo, 0);
return 0;
 
-out_free_new_f:
-   kmem_cache_free(fn_hash_kmem, new_f);
 out:
+   if (new_f)
+   kmem_cache_free(fn_hash_kmem, new_f);
fib_release_info(fi);
return err;
 }
 
 
 static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg)
 {
struct fn_hash *table = (struct fn_hash*)tb-tb_data;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference

2008-02-19 Thread Eric Dumazet


Adrian Bunk a écrit :
Unless I miss a guaranteed relation between between f and 
new_fa-fa_info this patch is required for fixing a NULL dereference
introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and 
spotted by the Coverity checker.


Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---

 net/ipv4/fib_hash.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6/net/ipv4/fib_hash.c.old   2008-02-19 23:23:14.0 +0200
+++ linux-2.6/net/ipv4/fib_hash.c   2008-02-19 23:38:28.0 +0200
@@ -367,17 +367,18 @@ static struct fib_node *fib_find_node(st
}
 
 	return NULL;

 }
 
 static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)

 {
struct fn_hash *table = (struct fn_hash *) tb-tb_data;
-   struct fib_node *new_f, *f;
+   struct fib_node *new_f = NULL;
+   struct fib_node *f;
struct fib_alias *fa, *new_fa;
struct fn_zone *fz;
struct fib_info *fi;
u8 tos = cfg-fc_tos;
__be32 key;
int err;
 
 	if (cfg-fc_dst_len  32)

@@ -491,33 +492,32 @@ static int fn_hash_insert(struct fib_tab
}
 
 	err = -ENOENT;

if (!(cfg-fc_nlflags  NLM_F_CREATE))
goto out;
 
 	err = -ENOBUFS;
 
-	new_f = NULL;

if (!f) {
new_f = kmem_cache_zalloc(fn_hash_kmem, GFP_KERNEL);
if (new_f == NULL)
goto out;
 
 		INIT_HLIST_NODE(new_f-fn_hash);

INIT_LIST_HEAD(new_f-fn_alias);
new_f-fn_key = key;
f = new_f;
}
 
 	new_fa = f-fn_embedded_alias;

if (new_fa-fa_info != NULL) {
new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
if (new_fa == NULL)
-   goto out_free_new_f;
+   goto out;
}
new_fa-fa_info = fi;
new_fa-fa_tos = tos;
new_fa-fa_type = cfg-fc_type;
new_fa-fa_scope = cfg-fc_scope;
new_fa-fa_state = 0;
 
 	/*

@@ -535,19 +535,19 @@ static int fn_hash_insert(struct fib_tab
if (new_f)
fz-fz_nent++;
rt_cache_flush(-1);
 
 	rtmsg_fib(RTM_NEWROUTE, key, new_fa, cfg-fc_dst_len, tb-tb_id,

  cfg-fc_nlinfo, 0);
return 0;
 
-out_free_new_f:

-   kmem_cache_free(fn_hash_kmem, new_f);
 out:
+   if (new_f)
+   kmem_cache_free(fn_hash_kmem, new_f);
fib_release_info(fi);
return err;
 }
 
 
 static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg)

 {
struct fn_hash *table = (struct fn_hash*)tb-tb_data;



Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like kfree() 
does.



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-19 Thread Jarek Poplawski

On Wed, Feb 20, 2008 at 12:06:40AM +0100, Jarek Poplawski wrote:
...
 (testing patch #1)

SORRY!!!  take 2 (unlocking fixed)

---

 drivers/net/ppp_generic.c |   39 +--
 1 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 4dc5b4b..c4e3808 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1473,7 +1473,7 @@ void
 ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 {
struct channel *pch = chan-ppp;
-   int proto;
+   int proto, locked;
 
if (!pch || skb-len == 0) {
kfree_skb(skb);
@@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
}
 
proto = PPP_PROTO(skb);
-   read_lock_bh(pch-upl);
-   if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) {
+   /*
+* We use trylock to avoid dependency between soft-irq-safe upl lock
+* and soft-irq-unsafe sk_dst_lock.
+*/
+   local_bh_disable();
+   locked = read_trylock(pch-upl);
+   if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) {
/* put it on the channel queue */
skb_queue_tail(pch-file.rq, skb);
/* drop old frames if queue too long */
@@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
} else {
ppp_do_recv(pch-ppp, skb, pch);
}
-   read_unlock_bh(pch-upl);
+
+   if (locked)
+   read_unlock(pch-upl);
+   local_bh_enable();
 }
 
 /* Put a 0-length skb in the receive queue as an error indication */
@@ -1502,12 +1510,14 @@ ppp_input_error(struct ppp_channel *chan, int code)
 {
struct channel *pch = chan-ppp;
struct sk_buff *skb;
+   int locked;
 
if (!pch)
return;
 
-   read_lock_bh(pch-upl);
-   if (pch-ppp) {
+   /* a trylock comment in ppp_input() */
+   local_bh_disable();
+   if ((locked = read_trylock(pch-upl))  pch-ppp) {
skb = alloc_skb(0, GFP_ATOMIC);
if (skb) {
skb-len = 0;   /* probably unnecessary */
@@ -1515,7 +1525,10 @@ ppp_input_error(struct ppp_channel *chan, int code)
ppp_do_recv(pch-ppp, skb, pch);
}
}
-   read_unlock_bh(pch-upl);
+
+   if (locked)
+   read_unlock(pch-upl);
+   local_bh_enable();
 }
 
 /*
@@ -2044,10 +2057,16 @@ int ppp_unit_number(struct ppp_channel *chan)
int unit = -1;
 
if (pch) {
-   read_lock_bh(pch-upl);
-   if (pch-ppp)
+   int locked;
+
+   /* a trylock comment in ppp_input() */
+   local_bh_disable();
+   if ((locked = read_trylock(pch-upl))  pch-ppp)
unit = pch-ppp-file.index;
-   read_unlock_bh(pch-upl);
+
+   if (locked)
+   read_unlock(pch-upl);
+   local_bh_enable();
}
return unit;
 }
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RTNL]: Add missing link netlink attribute policy definitions

2008-02-19 Thread David Miller

From: Thomas Graf [EMAIL PROTECTED]
Date: Wed, 20 Feb 2008 00:46:06 +0100

 IFLA_LINK is no longer a write-only attribute on the kernel
 side and must thus be validated. Same goes for the newly
 introduced IFLA_LINKINFO.

 Fixes undefined behaviour if either of the attributes are
 not well formed.

 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied, thanks Thomas.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[2.6 patch] net/phy/mdio_bus.c: fix a check-after-use

2008-02-19 Thread Adrian Bunk

This patch fixes a check-after-use spotted by the Coverity checker.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---
6beeb3ac577d74d72b2f91bd654eecb904c3c17e diff --git 
a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 6e9f619..963630c 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -49,13 +49,13 @@ int mdiobus_register(struct mii_bus *bus)
int i;
int err = 0;
 
-   mutex_init(bus-mdio_lock);
-
if (NULL == bus || NULL == bus-name ||
NULL == bus-read ||
NULL == bus-write)
return -EINVAL;
 
+   mutex_init(bus-mdio_lock);
+
if (bus-reset)
bus-reset(bus);
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

kernel BUG at net/core/skbuff.c:95!

2008-02-19 Thread Jim Westfall

Hi

I have a reproducible crash that can be triggered remotely at the layer2 
level.  Example crash outputs is as follows

kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 
data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0
kernel: [ cut here ]
kernel: kernel BUG at net/core/skbuff.c:95!
kernel: invalid opcode:  [#1]
kernel: SMP 
kernel: Modules linked in:
kernel: CPU:0
kernel: EIP:0060:[c052753e]Not tainted VLI
kernel: EFLAGS: 00010286   (2.6.23.16 #1)
kernel: EIP is at skb_over_panic+0x5e/0x70
kernel: eax: 0076   ebx: df600460   ecx: 0086   edx: 
kernel: esi: 03e5   edi: 03e8   ebp: dcf652e0   esp: c087fed0
kernel: ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
kernel: Process swapper (pid: 0, ti=c087f000 task=c076e0e0 task.ti=c081)
kernel: Stack: c0746150 c0541fc7 03e8 03e5 c166ac00 c166ac2f c166b017 
c166ac80 
kernel: c1588000 df600460 deb48418 c0541fcc d246 5429a9b7 c07f6468 c07f645c 
kernel: dcf652e0  c054209b dcf652e0 dcf652e0 0003 deb48033 c0542105 
kernel: Call Trace:
kernel: [c0541fc7] llc_station_ac_send_test_r+0x137/0x190
kernel: [c0541fcc] llc_station_ac_send_test_r+0x13c/0x190
kernel: [c054209b] llc_station_next_state+0x7b/0xc0
kernel: [c0542105] llc_station_state_process+0x25/0x40
kernel: [c053b37a] llc_rcv+0x24a/0x290
kernel: [c05308e2] netif_receive_skb+0x242/0x360
kernel: [c0358563] e100_poll+0x193/0x4a0
kernel: [c0118312] run_rebalance_domains+0x112/0x410
kernel: [c052d6f4] net_rx_action+0x74/0x110
kernel: [c011ff65] __do_softirq+0x75/0xf0
kernel: [c010580b] do_softirq+0x5b/0xb0
kernel: [c011c738] profile_tick+0x38/0x60
kernel: [c013d680] handle_fasteoi_irq+0x0/0xd0
kernel: [c01058da] do_IRQ+0x7a/0xc0
kernel: [c0101750] default_idle+0x0/0x50
kernel: [c01038c3] common_interrupt+0x23/0x30
kernel: [c0101750] default_idle+0x0/0x50
kernel: [c0101784] default_idle+0x34/0x50
kernel: [c01017d4] cpu_idle+0x34/0x80
kernel: [c0817962] start_kernel+0x272/0x350
kernel: [c0817390] unknown_bootoption+0x0/0x1d0
kernel: ===
 

skb_over_panic: text:c055be07 len:1000 put:997 head:de32a800 data:de32a82f 
tail:0xde32ac17 end:0xde32a880 dev:eth0
[ cut here ]
kernel BUG at net/core/skbuff.c:95!
invalid opcode:  [#1] SMP 
Modules linked in:

Pid: 0, comm: swapper Not tainted (2.6.24.2 #1)
EIP: 0060:[c054078e] EFLAGS: 00010282 CPU: 0
EIP is at skb_over_panic+0x5e/0x70
EAX: 0076 EBX: de32daa0 ECX: 0092 EDX: 
ESI: 03e5 EDI: 03e8 EBP: df3cf200 ESP: c08a3ec0
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process swapper (pid: 0, ti=c08a3000 task=c0794180 task.ti=c0833000)
Stack: c076ae10 c055be07 03e8 03e5 de32a800 de32a82f de32ac17 de32a880 
   df9f0800 de32daa0 df199418 c055be0c d246 5429a9b7 c081a8f4 c081a8e8 
   df3cf200  c055bedb df3cf200 df3cf200 0003 df199033 c055bf45 
Call Trace:
 [c055be07] llc_station_ac_send_test_r+0x137/0x190
 [c055be0c] llc_station_ac_send_test_r+0x13c/0x190
 [c055bedb] llc_station_next_state+0x7b/0xc0
 [c055bf45] llc_station_state_process+0x25/0x40
 [c055515a] llc_rcv+0x24a/0x2c0
 [c054a0bb] netif_receive_skb+0x27b/0x3a0
 [c03673c9] e100_poll+0x179/0x410
 [c05470f1] net_rx_action+0x91/0x150
 [c0120275] __do_softirq+0x75/0xf0
 [c0105a3b] do_softirq+0x5b/0xb0
 [c011ca78] profile_tick+0x38/0x60
 [c013ea80] handle_fasteoi_irq+0x0/0xd0
 [c0105b0a] do_IRQ+0x7a/0xc0
 [c0101730] default_idle+0x0/0x50
 [c0103903] common_interrupt+0x23/0x30
 [c0101730] default_idle+0x0/0x50
 [c0101764] default_idle+0x34/0x50
 [c01017c8] cpu_idle+0x48/0xa0
 [c083a942] start_kernel+0x272/0x350
 [c083a370] unknown_bootoption+0x0/0x1d0
 ===
Code: 00 00 89 44 24 14 8b 83 98 00 00 00 89 44 24 10 8b 43 50 89 74 24 0c 89 4c
 24 04 89 44 24 08 c7 04 24 10 ae 76 c0 e8 42 ba bd ff 0f 0b eb fe 8d b4 26 00
 00 00 00 8d bc 27 00 00 00 00 56 53 83 
EIP: [c054078e] skb_over_panic+0x5e/0x70 SS:ESP 0068:c08a3ec0
Kernel panic - not syncing: Fatal exception in interrupt

The issue was originally noticed on 2.6.20.21 machines, when we had ~60 
servers all panic at the same time after a switch went crazy.

To reproduce download linkloop

http://freshmeat.net/projects/linkloop/

then run linkloop -d -s 1000 target mac

thanks
jim  
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24.2
# Tue Feb 19 15:55:53 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set

Re: [RFC: 2.6.25 patch] ipv4/fib_hash.c: fix NULL dereference

2008-02-19 Thread David Miller

From: Eric Dumazet [EMAIL PROTECTED]
Date: Wed, 20 Feb 2008 00:06:14 +0100

 Adrian Bunk a écrit :
  Unless I miss a guaranteed relation between between f and 
  new_fa-fa_info this patch is required for fixing a NULL dereference
  introduced by commit a6501e080c318f8d4467679d17807f42b3a33cd5 and 
  spotted by the Coverity checker.

  Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 ...
 Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like 
 kfree() 
 does.

Applied, thanks everyone.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Michael Chan

On Tue, 2008-02-19 at 17:14 -0500, Tony Battersby wrote:

 
 Update: when I revert Herbert's patch in addition to applying your
 patch, the iSCSI performance goes back up to 115 MB/s again in both
 directions.  So it looks like turning off SG for TX didn't itself cause
 the performance drop, but rather that the performance drop is just
 another manifestation of whatever bug is causing the data corruption.
 
 I do not regularly use wireshark or look at network packet dumps, so I
 am not really sure what to look for.  Given the above information, do
 you still believe that there is value in examining the packet dump?
 

Can you confirm whether you're getting TCP checksum errors on the other
side that is receiving packets from the 5701?  You can just check
statistics using netstat -s.  I suspect that after we turn off SG,
checksum is no longer offloaded and we are getting lots of TCP checksum
errors instead that are slowing the performance.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RTNL]: Add missing link netlink attribute policy definitions

2008-02-19 Thread Thomas Graf

IFLA_LINK is no longer a write-only attribute on the kernel
side and must thus be validated. Same goes for the newly
introduced IFLA_LINKINFO.

Fixes undefined behaviour if either of the attributes are
not well formed.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.26/net/core/rtnetlink.c
===
--- net-2.6.26.orig/net/core/rtnetlink.c2008-02-19 20:30:08.0 
+0100
+++ net-2.6.26/net/core/rtnetlink.c 2008-02-20 00:39:54.0 +0100
@@ -693,10 +693,12 @@
[IFLA_BROADCAST]= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
[IFLA_MAP]  = { .len = sizeof(struct rtnl_link_ifmap) },
[IFLA_MTU]  = { .type = NLA_U32 },
+   [IFLA_LINK] = { .type = NLA_U32 },
[IFLA_TXQLEN]   = { .type = NLA_U32 },
[IFLA_WEIGHT]   = { .type = NLA_U32 },
[IFLA_OPERSTATE]= { .type = NLA_U8 },
[IFLA_LINKMODE] = { .type = NLA_U8 },
+   [IFLA_LINKINFO] = { .type = NLA_NESTED },
[IFLA_NET_NS_PID]   = { .type = NLA_U32 },
 };
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] bluetooth : put hci dev after del conn

2008-02-19 Thread Dave Young

On Feb 19, 2008 12:44 PM, David Miller [EMAIL PROTECTED] wrote:
 From: Dave Young [EMAIL PROTECTED]
 Date: Mon, 18 Feb 2008 15:55:55 +0800

  Move hci_dev_put to del_conn to avoid hci dev going away before hci conn.

 This looks correct so I have applied it.

  Signed-off-by: Dave Young [EMAIL PROTECTED]

 Please remove the extraneous space at the end of your
 signoff line next time :-)

Will do :)

 Also, I reworked the loop in del_conn() so that it no longer
 generates a compile warning, so I had to apply your patch
 by hand.

Thanks a lot.

Regards
dave
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATHCH 1/16] ServerEngines 10Gb NIC driver

2008-02-19 Thread Jay Cliburn

On Mon, 18 Feb 2008 10:00:51 -0800
Subbu Seetharaman [EMAIL PROTECTED] wrote:

 I have one question about bit fields. Several of  
 headers in the common code are  generated by  
 srcgen from f/w source files.  Some of the  structures 
 in these headers  have bit fields (with separate definitions 
 for little endian and big endian hosts).  Are these un-acceptable 
 in Linux driver submissions ?

The netdev maintainer, Jeff Garzik (to whom you should submit your
driver), frowns upon the use of bitfields, and for good reason.  See,
for example, 
http://marc.info/?l=linux-kernelm=118444531031506w=2 
http://lkml.org/lkml/2005/8/5/361
http://lkml.org/lkml/2006/6/20/470
http://en.wikipedia.org/wiki/Bit_field.

You should avoid them if at all possible.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Matt Carlson

On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
 Michael Chan wrote:
  On Tue, 2008-02-19 at 11:16 -0500, Tony Battersby wrote:

  iSCSI
  performance drops to 6 - 15 MB/s when the 3Com NIC is doing heavy rx
  with light tx,
  
 
  That's strange.  The patch should only affect TX performance slightly
  since we are just turning off SG for TX.  Please take an ethereal trace
  to see what's happening and compare with a good trace.
 

 
 Update: when I revert Herbert's patch in addition to applying your
 patch, the iSCSI performance goes back up to 115 MB/s again in both
 directions.  So it looks like turning off SG for TX didn't itself cause
 the performance drop, but rather that the performance drop is just
 another manifestation of whatever bug is causing the data corruption.
 
 I do not regularly use wireshark or look at network packet dumps, so I
 am not really sure what to look for.  Given the above information, do
 you still believe that there is value in examining the packet dump?
 
 Tony

Hi Tony.  Can you give us the output of :

sudo lspci -vvv - -s 03:01.0'

(assuming that is still the correct address of the 3Com NIC.)

Also, after some digging, I found that the 5701 can run into trouble if
a 64-bit DMA read terminates early and then completes as a 32-bit transfer.
The problem is reportedly very rare, but the failure mode looks like a
match.  Can you apply the following patch and see if it helps your
performance / corruption problems?


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..7ad08ce 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -11409,6 +11409,8 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
tp-tg3_flags |= TG3_FLAG_PCI_HIGH_SPEED;
if ((pci_state_reg  PCISTATE_BUS_32BIT) != 0)
tp-tg3_flags |= TG3_FLAG_PCI_32BIT;
+   else if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5701)
+   tp-grc_mode |= GRC_MODE_FORCE_PCI32BIT;
 
/* Chip-specific fixup from Broadcom driver */
if ((tp-pci_chip_rev_id == CHIPREV_ID_5704_A0) 


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Herbert Xu

On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:

 Update: when I revert Herbert's patch in addition to applying your
 patch, the iSCSI performance goes back up to 115 MB/s again in both
 directions.  So it looks like turning off SG for TX didn't itself cause
 the performance drop, but rather that the performance drop is just
 another manifestation of whatever bug is causing the data corruption.

Interesting.  So the workload that regressed is mostly RX with a
little TX traffic? Can you try to reproduce this with something
like netperf to eliminate other variables?

This is all very puzzling since the patch in question shouldn't
change an RX load at all.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()

2008-02-19 Thread Jarek Poplawski

On 19-02-2008 23:58, Adrian Bunk wrote:
...
 --- a/net/mac80211/ieee80211_sta.c
 +++ b/net/mac80211/ieee80211_sta.c
 @@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct 
 net_device *dev,
...
 + printk(KERN_ERR can not allocate reordering buffer 

  + printk(KERN_ERR cannot allocate reordering buffer 

Probably this can be fixed during the commit.

Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Zhang, Yanmin

On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:
 Zhang, Yanmin a �crit :
  On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 
  On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:
 
  I also think __refcnt is the key. I did a new testing by adding 2 
  unsigned long
  pading before lastuse, so the 3 members are moved to next cache line. The 
  performance is
  recovered.
 
  How about below patch? Almost all performance is recovered with the new 
  patch.
 
  Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
  Could you add a comment someplace that says refcnt wants to be on a 
  different
  cache line from input/output/ops or performance tanks badly, to warn some
  future kernel hacker who starts adding new fields to the structure?
  Ok. Below is the new patch.
  
  1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So 
  sizeof(dst_entry)=200
  no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core 
  tigerton by
  moving tclassid to different place. It looks like tclassid could also have 
  impact on
  performance.
  If moving tclassid before metrics, or just don't move tclassid, the 
  performance isn't
  good. So I move it behind metrics.
  
  2) Add comments before __refcnt.
  
  If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better 
  than
  the one without the patch.
  
  If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better 
  than
  the one without the patch.
  
  Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
  
  ---
  
  --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 
  +0800
  +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 
  +0800
  @@ -52,15 +52,10 @@ struct dst_entry
  unsigned short  header_len; /* more space at head required 
  */
  unsigned short  trailer_len;/* space to reserve at tail */
   
  -   u32 metrics[RTAX_MAX];
  -   struct dst_entry*path;
  -
  -   unsigned long   rate_last;  /* rate limiting for ICMP */
  unsigned intrate_tokens;
  +   unsigned long   rate_last;  /* rate limiting for ICMP */
   
  -#ifdef CONFIG_NET_CLS_ROUTE
  -   __u32   tclassid;
  -#endif
  +   struct dst_entry*path;
   
  struct neighbour*neighbour;
  struct hh_cache *hh;
  @@ -70,10 +65,20 @@ struct dst_entry
  int (*output)(struct sk_buff*);
   
  struct  dst_ops *ops;
  -   
  -   unsigned long   lastuse;
  +
  +   u32 metrics[RTAX_MAX];
  +
  +#ifdef CONFIG_NET_CLS_ROUTE
  +   __u32   tclassid;
  +#endif
  +
  +   /*
  +* __refcnt wants to be on a different cache line from
  +* input/output/ops or performance tanks badly
  +*/
  atomic_t__refcnt;   /* client references*/
  int __use;
  +   unsigned long   lastuse;
  union {
  struct dst_entry *next;
  struct rtable*rt_next;
  
  
  
 
 I prefer this patch, but unfortunatly your perf numbers are for 64 bits 
 kernels.
 
 Could you please test now with 32 bits one ?
I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no 
difference
between pure kernel and patched kernel.

New update: On 8-core stoakley, the regression becomes 2~3% with kernel 
2.6.25-rc2. On
tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 
cores+hyperthreading),
the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all regression disappears. On tulsa, only 
about 2%
regression disappears.

So this issue is triggerred with multiple-cpu. Perhaps process scheduler is 
another
factor causing the issue to happen, but it's very hard to change scheduler.


Eric,

I tested your new patch in function loopback_xmit. It has no improvement, while 
it doesn't
introduce new issues. As you tested it on dual-core machine and got 
improvement, how about
merging your patch with mine?

-yanmin


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Eric Dumazet


Zhang, Yanmin a écrit :

On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:

Zhang, Yanmin a �crit :
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 

On Mon, 18 Feb 2008 16:12:38 +0800, Zhang, Yanmin said:


I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
pading before lastuse, so the 3 members are moved to next cache line. The 
performance is
recovered.

How about below patch? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

Could you add a comment someplace that says refcnt wants to be on a different
cache line from input/output/ops or performance tanks badly, to warn some
future kernel hacker who starts adding new fields to the structure?

Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So 
sizeof(dst_entry)=200
no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core 
tigerton by
moving tclassid to different place. It looks like tclassid could also have 
impact on
performance.
If moving tclassid before metrics, or just don't move tclassid, the performance 
isn't
good. So I move it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 
+0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required 
*/
unsigned short  trailer_len;/* space to reserve at tail */
 
-	u32			metrics[RTAX_MAX];

-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE

-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
 	struct neighbour	*neighbour;

struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
 	struct  dst_ops	*ops;

-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+* __refcnt wants to be on a different cache line from
+* input/output/ops or performance tanks badly
+*/
atomic_t__refcnt;   /* client references*/
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;




I prefer this patch, but unfortunatly your perf numbers are for 64 bits kernels.

Could you please test now with 32 bits one ?

I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no 
difference
between pure kernel and patched kernel.

New update: On 8-core stoakley, the regression becomes 2~3% with kernel 
2.6.25-rc2. On
tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 
cores+hyperthreading),
the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all regression disappears. On tulsa, only 
about 2%
regression disappears.

So this issue is triggerred with multiple-cpu. Perhaps process scheduler is 
another
factor causing the issue to happen, but it's very hard to change scheduler.



Thanks very much Yanmin, I think we can apply your patch as is, if no 
regression was found for 32bits.




Eric,

I tested your new patch in function loopback_xmit. It has no improvement, while 
it doesn't
introduce new issues. As you tested it on dual-core machine and got 
improvement, how about
merging your patch with mine?


No, thank you, that was an experiment and is not related to your findings on 
dst_entry.


I am currently working on a 'distributed refcount' infrastructure, to be able 
to spread on several nodes (for NUMA machines) or several cache lines (normal 
SMP machines)  the high pressure we currently have on some refcnt (struct 
dst_entry, struct net_device, and many more refcnts ...)


Instead of NR_CPUS allocations, goal is to be able to restrict to a small 
value like 4, 8 or 16 the number of 32bits entities used to store one refcnt, 
even if NR_CPUS=4096 or so.


atomic_inc(p-refcnt) -  distref_inc(p-refcnt)

distref_inc(struct distref *p)
{
atomic_inc(myptr[p-offset]);
}

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: e1000: Question about polling

2008-02-19 Thread Jarek Poplawski

On 18-02-2008 10:18, Badalian Vyacheslav wrote:
 Hello all.

Hi,

 Interesting think:
 
 Have PC that do NAT. Bandwidth about 600 mbs.
 
 Have  4 CPU (2xCoRe 2 DUO HT OFF 3.2 HZ).
 
 irqbalance in kernel is off.
 
 nat2 ~ # cat /proc/irq/217/smp_affinity
 0001
 nat2 ~ # cat /proc/irq/218/smp_affinity
 0003
 
 Load SI on CPU0 and CPU1 is about 90%
 
 Good... try do
 echo   /proc/irq/217/smp_affinity
 echo   /proc/irq/218/smp_affinity
 
 Get 100% SI at CPU0
 
 Question Why?

I think you should show here /proc/interrupts in all these cases.

 
 I listen that if use IRQ from 1 netdevice to 1 CPU i can get 30% 
 perfomance... but i have 4 CPU... i must get more perfomance if i cat 
   to smp_affinity.
 
 picture looks liks this:
 0-3 CPU get over 50% SI bandwith up 55% SI... bandwith up... 
 100% SI on CPU0
 
 I remember patch to fix problem like it... patched function 
 e1000_clean...  kernel on pc have this patch (2.6.24-rc7-git2)... e1000 
 driver work much better (i up to 1.5-2x bandwidth before i get 100% SI), 
 but i think that it not get 100% that it can =)

If some patch works for you, and you can show here its advantages,
you should probably add here some link and request for merging.

BTW, I wonder if you tried to check if changing CONFIG_HZ makes any
difference here?

Regards,
Jarek P.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

77 matches

Mail list logo