Re: Races in net_rx_action vs netpoll?

2007-07-19 Thread Olaf Kirch
On Thursday 12 July 2007 04:33, David Miller wrote:
 I'll add merge your patch with a target of 2.6.23
 
 If you really want, after this patch has sat in 2.6.23 for a while
 and got some good testing, we can consider a submission for -stable.

Okay, those of you who followed the discussion on lkml will have
read why this patch breaks on e1000.

Short summary: some NIC drivers expect that there is a one-to-one
relation between calls to net_rx_schedule (where we put the device
on the poll list) and netif_rx_complete (where it's supposed to be
taken off the list). The e1000 is such a beast. Not sure if other
drivers make the same assumption re NAPI.

So: should a driver be allowed to rely on this behavior? Or should
I go and look for another fix to the poll_napi issue?

I keep coming back to the question Jarek asked - why does netpoll
want to call dev-poll() anyway? I dug around a little and it
seems the original idea was to do this only if netpoll_poll was
running on the CPU the netdevice was scheduled to.

So one way to fix the problem is to add a dev-poll_cpu field
that tells us on which CPU's poll list it has been added - and
check for this in poll_napi.

Comments?

David, should I submit an updated patch for 2.6.23, or do you
prefer to yank the patch now and try again for 2.6.24?

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Races in net_rx_action vs netpoll?

2007-07-12 Thread Olaf Kirch
Hi Jarek,

On Thursday 12 July 2007 14:59, Jarek Poplawski wrote:

  +#ifdef CONFIG_NETPOLL
  +   /* Prevent race with netpoll - yes, this is a kludge.
  +* But at least it doesn't penalize the non-netpoll
  +* code path. */
 
 Alas, this can penalize those who have it enabled (e.g. by distro),
 but don't use.

Well, the test_bit is actually cheap; it's not atomic, and has no memory
ordering requirements by all I know. The costly thing is set_bit/clear_bit
in poll_napi; and you only ever get there when you *use* netpoll.

 And it looks like _netif_rx_complete should be a better place,
 at least considering such cards as: 8139too, skge, sungem and
 maybe more (according to 2.6.22).

Why?

  +   set_bit(__LINK_STATE_POLL_LIST_FROZEN, np-dev-state);
  npinfo-rx_flags |= NETPOLL_RX_DROP;
 
 I wonder, why this flag cannot be used for this check?

I tried, but it made the patch rather icky. netpoll_info is defined
in netpoll.h, which includes netdevice.h. So you cannot inline the
check, and have to use an out-of-line function instead, along the
lines of

extern int am_i_being_called_by_poll_napi(struct net_device *);

netif_rx_complete(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL
if (unlikely(dev-npinfo  am_i_being_called_by_poll_napi(dev))
return;
#endif
...
}

If you don't mind that, yes - this flag (or better, a newly introduced
NETPOLL_RX_NAPI) may work as well.

One thing I was a little worried about was whether dev-npinfo can
go away all of a sudden. It's really just protected by an rcu_readlock...

 BTW, I'd be very glad if somebody could hint me what is the main
 reason for such troublesome function as poll_napi: if it's about
 performance isn't this enough to increase budget for netpoll in
 net_rx_action?

I think one reason is that you want to get the kernel oops out even
when the machine is so hosed that it doesn't even service softirqs
anymore.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Races in net_rx_action vs netpoll?

2007-07-11 Thread Olaf Kirch
On Wednesday 11 July 2007 07:44, David Miller wrote:
  +#ifdef CONFIG_NETPOLL
  +   /* Prevent race with netpoll - yes, this is a kludge.
  +* But at least it doesn't penalize the non-netpoll
  +* code path. */
  +   if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, dev-state))
  +   return;
  +#endif
  +
  local_irq_save(flags);
  BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, dev-state));
  list_del(dev-poll_list);
 
 That new bit can be set in interrupt context can't it?

It's set and cleared in poll_napi only, and as far as I can tell 
poll_napi will only ever be called from via softirq, but never
from an interrupt handler directly.

I also don't think the test_bit() needs to lock out interrupts.
The only reason we do it for the RX_SCHED bit is that the RX_SCHED
bit and the poll_list change must happen atomically wrt interrupts
from the NIC, right?

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Races in net_rx_action vs netpoll?

2007-07-10 Thread Olaf Kirch
On Tuesday 10 July 2007 00:27, David Miller wrote:
 I'm happy to entertain this kind of solution, but we really
 need to first have an interface to change multiple bits
 at a time in one atomic operation, because by itself this
 patch doubles the number of atomices we do when starting
 a NAPI poll.

Understood. How about the patch below? It takes a similar
approach, but it puts the onus on the netpoll code
path rather than the general NAPI case.

Olaf
--
From: Olaf Kirch [EMAIL PROTECTED]

Keep netpoll/poll_napi from messing with the poll_list.
Only net_rx_action is allowed to manipulate the list.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]
---
 include/linux/netdevice.h |   10 ++
 net/core/netpoll.c|8 
 2 files changed, 18 insertions(+)

Index: iscsi-2.6/include/linux/netdevice.h
===
--- iscsi-2.6.orig/include/linux/netdevice.h
+++ iscsi-2.6/include/linux/netdevice.h
@@ -248,6 +248,8 @@ enum netdev_state_t
__LINK_STATE_LINKWATCH_PENDING,
__LINK_STATE_DORMANT,
__LINK_STATE_QDISC_RUNNING,
+   /* Set by the netpoll NAPI code */
+   __LINK_STATE_POLL_LIST_FROZEN,
 };
 
 
@@ -919,6 +921,14 @@ static inline void netif_rx_complete(str
 {
unsigned long flags;
 
+#ifdef CONFIG_NETPOLL
+   /* Prevent race with netpoll - yes, this is a kludge.
+* But at least it doesn't penalize the non-netpoll
+* code path. */
+   if (test_bit(__LINK_STATE_POLL_LIST_FROZEN, dev-state))
+   return;
+#endif
+
local_irq_save(flags);
BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, dev-state));
list_del(dev-poll_list);
Index: iscsi-2.6/net/core/netpoll.c
===
--- iscsi-2.6.orig/net/core/netpoll.c
+++ iscsi-2.6/net/core/netpoll.c
@@ -123,6 +123,13 @@ static void poll_napi(struct netpoll *np
if (test_bit(__LINK_STATE_RX_SCHED, np-dev-state) 
npinfo-poll_owner != smp_processor_id() 
spin_trylock(npinfo-poll_lock)) {
+   /* When calling dev-poll from poll_napi, we may end up in
+* netif_rx_complete. However, only the CPU to which the
+* device was queued is allowed to remove it from poll_list.
+* Setting POLL_LIST_FROZEN tells netif_rx_complete
+* to leave the NAPI state alone.
+*/
+   set_bit(__LINK_STATE_POLL_LIST_FROZEN, np-dev-state);
npinfo-rx_flags |= NETPOLL_RX_DROP;
atomic_inc(trapped);
 
@@ -130,6 +137,7 @@ static void poll_napi(struct netpoll *np
 
atomic_dec(trapped);
npinfo-rx_flags = ~NETPOLL_RX_DROP;
+   clear_bit(__LINK_STATE_POLL_LIST_FROZEN, np-dev-state);
spin_unlock(npinfo-poll_lock);
}
 }

-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-24 Thread Olaf Kirch
On Saturday 23 June 2007 03:17, Simon Arlott wrote:
 Because then it requires yet another network daemon, RA in 
 the kernel means there's no need for one to manage adding 
 auto-configured IP addresses... what's wrong with doing the 
 same for DNS?

Actually I think an integrated network autoconfiguration
daemon capable of doing DHCP, DHCP6, RDNSS, mDNS, you-name-it
would be a nice thing.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: Make skb_seq_read unmap the last fragment

2007-06-19 Thread Olaf Kirch
From: Olaf Kirch [EMAIL PROTECTED]

Make skb_seq_read unmap the last fragment

Having walked through the entire skbuff, skb_seq_read would leave the
last fragment mapped.  As a consequence, the unwary caller would leak
kmaps, and proceed with preempt_count off by one. The only (kind of
non-intuitive) workaround is to use skb_seq_read_abort.

This patch makes sure skb_seq_read always unmaps frag_data after having
cycled through the skb's paged part.

Signed-off-by: [EMAIL PROTECTED]
---
 net/core/skbuff.c |5 +
 1 file changed, 5 insertions(+)

Index: build-2.6/net/core/skbuff.c
===
--- build-2.6.orig/net/core/skbuff.c
+++ build-2.6/net/core/skbuff.c
@@ -1706,6 +1706,11 @@ next_skb:
st-stepped_offset += frag-size;
}
 
+   if (st-frag_data) {
+   kunmap_skb_frag(st-frag_data);
+   st-frag_data = NULL;
+   }
+
if (st-cur_skb-next) {
st-cur_skb = st-cur_skb-next;
st-frag_idx = 0;

-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: very strange inet_sock corruption with rpc

2007-04-26 Thread Olaf Kirch
On Wednesday 25 April 2007 23:03, Vlad Yasevich wrote:
 It looks like someone is stepping all over the inet_sock.
 We'll continue looking, but if anyone has any ideas of what might
 be going on, I'd appreciate it.

This could be a socket in TIME_WAIT. A socket entering
TIME_WAIT will be replaced by an inet_timewait_sock, which is
a kind of truncated inet_sock.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-14 Thread Olaf Kirch
On Wednesday 14 March 2007 02:15, Linus Torvalds wrote:
 Sure. I'm just saying that some people may use OPEN_MAX the way I know
 people use PATH_MAX - whether it's what you're supposed to or not.

glibc removed OPEN_MAX from its header files several years ago. If you
want to find a piece of code that still uses it, it's most likely something
that hasn't seen a compiler in years - and will likely continue to do
so.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fix for ipv6_setsockopt NULL dereference

2007-03-09 Thread Olaf Kirch
I came across this bug in http://bugzilla.kernel.org/show_bug.cgi?id=8155

Here's a potential fix.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
--
Fix NULL pointer derefence in ipv6_setsockopt, as described in bug #8155.

Signed-off-by: [EMAIL PROTECTED]

---
 net/ipv6/ipv6_sockglue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: build-2.6/net/ipv6/ipv6_sockglue.c
===
--- build-2.6.orig/net/ipv6/ipv6_sockglue.c
+++ build-2.6/net/ipv6/ipv6_sockglue.c
@@ -413,7 +413,7 @@ static int do_ipv6_setsockopt(struct soc
}
 
/* routing header option needs extra check */
-   if (optname == IPV6_RTHDR  opt-srcrt) {
+   if (optname == IPV6_RTHDR  opt  opt-srcrt) {
struct ipv6_rt_hdr *rthdr = opt-srcrt;
switch (rthdr-type) {
case IPV6_SRCRT_TYPE_0:
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 001 of 3] knfsd: Use recv_msg to get peer address for NFSD instead of code-copying

2007-03-05 Thread Olaf Kirch
On Friday 02 March 2007 05:28, NeilBrown wrote:
 The sunrpc server code needs to know the source and destination address
 for UDP packets so it can reply properly.
 It currently copies code out of the network stack to pick the pieces out
 of the skb.
 This is ugly and causes compile problems with the IPv6 stuff.

... and this IPv6 code could never have worked anyway:


   case AF_INET6: {
...
 - rqstp-rq_addrlen = sizeof(struct sockaddr_in);
... this should have been sizeof(sockaddr_in6)...

 - /* Remember which interface received this request */
 - ipv6_addr_copy(rqstp-rq_daddr.addr6,
 - skb-nh.ipv6h-saddr);
 and this should have copied from daddr, not saddr.

But I find using recvmsg just for getting at the addresses
a little awkward too. And I think to be on the safe side, you
should check that you're really looking at a PKTINFO cmsg
rather than something else.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 001 of 3] knfsd: Use recv_msg to get peer address for NFSD instead of code-copying

2007-03-05 Thread Olaf Kirch

Hi Neil,

here's another minor comment:

On Friday 02 March 2007 05:28, NeilBrown wrote:
 +static inline void svc_udp_get_dest_address(struct svc_rqst *rqstp,
 + struct cmsghdr *cmh)
  {
   switch (rqstp-rq_sock-sk_sk-sk_family) {
   case AF_INET: {
 + struct in_pktinfo *pki = CMSG_DATA(cmh);
 + rqstp-rq_daddr.addr.s_addr = pki-ipi_spec_dst.s_addr;
   break;
 + }
...

The daddr that is extracted here will only ever be used to build
another PKTINFO cmsg when sending the reply. So it would be
much easier to just store the raw control message in the svc_rqst,
without looking at its contents, and send it out along with the reply,
unchanged.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question on IOAT

2007-02-05 Thread Olaf Kirch

Hi,

I looked into the IOAT code today as I'm trying to find out how to add support
for it to NFS. I ran into this piece of code, which waits for the DMA
operation to complete:

while (dma_async_memcpy_complete(tp-ucopy.dma_chan,
 tp-ucopy.dma_cookie, done,
 used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */
while ((skb = skb_peek(sk-sk_async_wait_queue)) 
   (dma_async_is_complete(skb-dma_cookie, done,
  used) == DMA_SUCCESS)) {
__skb_dequeue(sk-sk_async_wait_queue);
kfree_skb(skb);
}
}

Nowhere in the dma_async_*complete functions can I see any code
that would sleep if the DMA is not yet complete. Am I missing something,
or are we really busy-waiting on the DMA engine? Wouldn't this kind of
defeat the purpose of freeing up the CPU from the chores of memcpying?

I also checked the code in ioatdma.c - I would have expected there to
be some kind of interrupt handler that kicks the upper layers when a
DMA operation completes. But the interrupt handler seems to be for
error reporting exclusively... 

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] natsemi cable length woes

2007-01-04 Thread Olaf Kirch
Here's a proposed patch that addresses a problem with natsemi
NICs and long cables we've been chasing (*sigh*).

I'm interested in feedback on how to fix this sanely.

Olaf
--
From: Olaf Kirch [EMAIL PROTECTED]
Subject: natsemi: make cable length magic configurable

We had a customer report concerning problems with a Natsemi DP83815-D
and long cables. With 100m cables, the network would be essentially dead,
not a single packet would get through either way. We had to apply the
patch below to make it work.

The patch adds a module parameter named no_cable_magic that does
two things:

 -  Unconditionally set the DSPCFG register to the
fixed value. Without this change, the chip apparently
never completes autonegotiation in the tested configuration.

This has been an unconditional assignment for a long time,
until this was changed in 2.6.11 (there's an interesting
explanation in the ChangeLog, bk commit is
5871b81bf2b5cf188deab0d414dce104fcb69ca6)

 -  skip the bit banging in {,un}do_cable_magic. It seems that
if we write the DSPCFG register as above, a rev D chip will report
all cables as short cables, which do_cable_magic detects, and
trying to be helpful it will fix the attenuation coefficient.

I admit the use of a module parameter is ugly, but I didn't find a sane
way to fix this - especially since the magic registers we're changing
are kind of underdocumented.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 drivers/net/natsemi.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.19/drivers/net/natsemi.c
===
--- linux-2.6.19.orig/drivers/net/natsemi.c
+++ linux-2.6.19/drivers/net/natsemi.c
@@ -72,6 +72,7 @@
 static int debug = -1;
 
 static int mtu;
+static int no_cable_magic;
 
 /* Maximum number of multicast addresses to filter (vs. rx-all-multicast).
This chip uses a 512 element hash table based on the Ethernet CRC.  */
@@ -139,6 +140,7 @@ MODULE_LICENSE(GPL);
 module_param(mtu, int, 0);
 module_param(debug, int, 0);
 module_param(rx_copybreak, int, 0);
+module_param(no_cable_magic, int, 0);
 module_param_array(options, int, NULL, 0);
 module_param_array(full_duplex, int, NULL, 0);
 MODULE_PARM_DESC(mtu, DP8381x MTU (all boards));
@@ -148,6 +150,9 @@ MODULE_PARM_DESC(rx_copybreak,
 MODULE_PARM_DESC(options,
DP8381x: Bits 0-3: media type, bit 17: full duplex);
 MODULE_PARM_DESC(full_duplex, DP8381x full duplex setting(s) (1));
+MODULE_PARM_DESC(no_cable_magic,
+   DP8381x: set no_cable_magic=1 to disable magic workaround for short 
cables 
+   (may help with long cables:-));
 
 /*
Theory of Operation
@@ -1147,7 +1152,7 @@ static void init_phy_fixup(struct net_de
writew(1, ioaddr + PGSEL);
writew(PMDCSR_VAL, ioaddr + PMDCSR);
writew(TSTDAT_VAL, ioaddr + TSTDAT);
-   np-dspcfg = (np-srr = SRR_DP83815_C)?
+   np-dspcfg = (np-srr = SRR_DP83815_C || no_cable_magic)?
DSPCFG_VAL : (DSPCFG_COEF | readw(ioaddr + DSPCFG));
writew(np-dspcfg, ioaddr + DSPCFG);
writew(SDCFG_VAL, ioaddr + SDCFG);
@@ -1511,7 +1516,7 @@ static void do_cable_magic(struct net_de
if (dev-if_port != PORT_TP)
return;
 
-   if (np-srr = SRR_DP83816_A5)
+   if (np-srr = SRR_DP83816_A5 || no_cable_magic)
return;
 
/*
@@ -1556,7 +1561,7 @@ static void undo_cable_magic(struct net_
if (dev-if_port != PORT_TP)
return;
 
-   if (np-srr = SRR_DP83816_A5)
+   if (np-srr = SRR_DP83816_A5 || no_cable_magic)
return;
 
writew(1, ioaddr + PGSEL);

-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc1: Volanomark slowdown

2006-11-09 Thread Olaf Kirch
On Wed, Nov 08, 2006 at 02:07:32PM -0800, Tim Chen wrote:
 In my testing, the CPU utilization is at 100%.  So
 increase in ACKs will cost CPU to devote more
 time to process those ACKs and reduce throughput.

Oh, I see. I would test on a real network with real clients. I doubt
you would observe a noticeable effect there.

Olaf
-- 
Walks like a duck. Quacks like a duck. Must be a chicken.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc1: Volanomark slowdown

2006-11-08 Thread Olaf Kirch
On Wed, Nov 08, 2006 at 04:55:18PM +0100, Arjan van de Ven wrote:
 I wonder if it's an option to use low priority QoS fields for these acks
 (heck I don't even know if ACKs have such fields in their packet) so
 that they can get dropped if there are more packets then there is
 bandwidth 

Is it proven that the number of ACKs actually cause bandwidth problems?
I found Volanomark to exercise the scheduler more than anything else,
so maybe the slowdown, while triggered by an increased number of ACKs,
is caused by something else entirely.

Olaf
-- 
Walks like a duck. Quacks like a duck. Must be a chicken.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.19-rc1: Volanomark slowdown

2006-11-08 Thread Olaf Kirch
On Wed, Nov 08, 2006 at 10:38:52AM -0800, Tim Chen wrote:
 The patch in question affects purely TCP and not the scheduler.  I don't

I know.

 think the scheduler has anything to do with the slowdown seen after
 the patch is applied.

In fixing performance issues, the most obvious explanation isn't always
the right one. It's quite possible you're right, sure.

What I'm saying though is that it doesn't rhyme with what I've seen of
Volanomark - we ran 2.6.16 on a 4p Intel box for instance and it didn't
come close to saturating a Gigabit pipe before it maxed out on CPU load.

 The total number of messages being exchanged around the chatrooms in 
 Volanomark remain unchanged.  But ACKS increase by 3.5 times and
 segments received increase by 38% from netstat.  

 So I think it is reasonable to conclude that the increase in TCP traffic
 reduce the bandwidth and throughput in Volanomark.

You could count the number of outbound packets dropped on the server.

Olaf
-- 
Walks like a duck. Quacks like a duck. Must be a chicken.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix order in inet_init failure path.

2006-09-26 Thread Olaf Kirch
This is just a minor buglet I came across by accident - when inet_init
fails to register raw_prot, it jumps to out_unregister_udp_proto which
should unregister UDP _and_ TCP.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 net/ipv4/af_inet.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: build/net/ipv4/af_inet.c
===
--- build.orig/net/ipv4/af_inet.c
+++ build/net/ipv4/af_inet.c
@@ -1345,10 +1345,10 @@ static int __init inet_init(void)
rc = 0;
 out:
return rc;
-out_unregister_tcp_proto:
-   proto_unregister(tcp_prot);
 out_unregister_udp_proto:
proto_unregister(udp_prot);
+out_unregister_tcp_proto:
+   proto_unregister(tcp_prot);
goto out;
 }
 

-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Olaf Kirch
Hi,

we had bug reports from people seeing lots of spurious messages
like the following:

kernel: peth0: received packet with own address as source address.

and

xenbr0: duplicate address detected!

This is on a Xen enabled machine, with lots of Xen machines on the
same network.

When the Xen code configures the bridge device, this will do IPv6
autoconfiguration for the interface, and since they use synthetic MAC
addresses, there will be DAD collisions.

When the Xen people looked for a way to disable IPv6 autoconf of the
bridge, they didn't find any way to do it without bringing up the
device first (and thereby triggering DAD).

The attached tentative patch makes IPv6 autoconf depend on the
availability of IFF_MULTICAST. This is admittedly a bit of a hack, but
it makes sense, since DAD and router solicitation do rely on multicast.

Any comments?

Thanks,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
Summary: Allow to bring up network interface w/o ipv6 autoconf

When bringing up a xen bridge device, it will always be configured to
use a MAC address of ff:ff:ff:ff:ff:fe.  This greatly confuses IPv6 DAD,
which starts logging lots and lots of useless messages to syslog.

We really want to disable IPv6 on these interfaces, and there doesn't
seem to be a reliable way to do this without bringing the interface
up first (and triggering IPv6 autoconf). 

This patch makes autoconf (DAD and router discovery) depend on the
interface's ability to do multicast. Turning off multicast for an
interface before bringing it up will suppress autoconfiguration.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 net/ipv6/addrconf.c |2 ++
 1 files changed, 2 insertions(+)

Index: build/net/ipv6/addrconf.c
===
--- build.orig/net/ipv6/addrconf.c
+++ build/net/ipv6/addrconf.c
@@ -2462,6 +2462,7 @@ static void addrconf_dad_start(struct in
spin_lock_bh(ifp-lock);
 
if (dev-flags(IFF_NOARP|IFF_LOOPBACK) ||
+   !(dev-flagsIFF_MULTICAST) ||
!(ifp-flagsIFA_F_TENTATIVE)) {
ifp-flags = ~IFA_F_TENTATIVE;
spin_unlock_bh(ifp-lock);
@@ -2546,6 +2547,7 @@ static void addrconf_dad_completed(struc
if (ifp-idev-cnf.forwarding == 0 
ifp-idev-cnf.rtr_solicits  0 
(dev-flagsIFF_LOOPBACK) == 0 
+   (dev-flags  IFF_MULTICAST) 
(ipv6_addr_type(ifp-addr)  IPV6_ADDR_LINKLOCAL)) {
struct in6_addr all_routers;
 


[PATCH IPv6] Fix race condition in ipv6_add_addr

2006-08-29 Thread Olaf Kirch
Here's a patch originally from Keir Fraser, which we included in SLES10,
but which we forgot to submit upstream so far.

During stress testing, machines were frequently crashing in
__ipv6_ifa_notify on dst_hold(ifp-rt.u_dst), with ifp-rt being a
NULL pointer.

The attached patch fixes the problem.

Thanks,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
From: Keir Fraser [EMAIL PROTECTED]
Subject: ipv6_add_addr should install dstentry earlier

ipv6_add_addr allocates a struct inet6_ifaddr and a dstentry, but it
doesn't install the dstentry in ifa-rt until after it releases the
addrconf_hash_lock. This means other CPUs will be able to see the new
address while it hasn't been initialized completely yet.

One possible fix would be to grab the ifp-lock spinlock when
creating the address struct; a simpler fix is to just move the
assignment.

Acked-by: [EMAIL PROTECTED]
Acked-by: [EMAIL PROTECTED]

--- linux-2.6.16.13-old/net/ipv6/addrconf.c 2006-05-02 22:38:44.0 
+0100
+++ linux-2.6.16.13-new/net/ipv6/addrconf.c 2006-06-18 10:16:50.0 
+0100
@@ -549,6 +549,8 @@
ifa-flags = flags | IFA_F_TENTATIVE;
ifa-cstamp = ifa-tstamp = jiffies;
 
+   ifa-rt = rt;
+
ifa-idev = idev;
in6_dev_hold(idev);
/* For caller */
@@ -575,8 +577,6 @@
}
 #endif
 
-   ifa-rt = rt;
-
in6_ifa_hold(ifa);
write_unlock(idev-lock);
 out2:


Re: [RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Olaf Kirch
On Tue, Aug 29, 2006 at 08:39:53PM +1000, Herbert Xu wrote:
 Netfilter is broken for a different reason.  It breaks because packets
 pass through it twice, once going through brigde netfilter and once
 through the Xen netloop interface.  So ideally they'd get rid of the
 netloop device in which case they won't have to disable multicasting
 on the bridge device anymore.

I agree, this would be the right long-term fix.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Olaf Kirch
On Tue, Aug 29, 2006 at 06:34:26PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
  The attached tentative patch makes IPv6 autoconf depend on the
  availability of IFF_MULTICAST. This is admittedly a bit of a hack, but
  it makes sense, since DAD and router solicitation do rely on multicast.
 
 I disagree.  The node MUST NOT assign live address on
 that interface.

I'm not sure I understand. The Xen bridge devices get fe:ff:ff:ff:ff:ff
as MAC address.  Which is a bit hackish, but that seems to be the way
the Xen folks want to do it.

OTOH they do not want to do any IPv6 autoconfiguration with these
addresses, because they generate DAD warnings, and cause one
random machine to obtain fe80::::feff: as link-local
address.

 Further analysis is needed, but one idea is to skip
 addrconf_dev_config() if !(dev-flags  IFF_MULTICAST).

That should work just as well. Do you want me to submit an
updated patch?

Thanks,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Olaf Kirch
On Tue, Aug 29, 2006 at 01:55:28PM +0300, Pekka Savola wrote:
 It's not obvious that IFF_MULTICAST is good enough.  IMHO, you should 
 be able to run addrconf on non-multicast interfaces as well (e.g., 
 point-to-point interfaces, tunnels in particular).

So would it work to use this?
(flags  (IFF_MULTICAST|IFF_POINTOPOINT)) == 0

 It seems that current code already excludes IFF_NOARP interfaces 
 though.

I looked at that - it doesn't help because it just disables DAD, but
still does router solicitation, and I think it also sends a MLD listener
report with the bogus link-layer address. I tend to agree that it's
incorrect to assign an address at all in this case.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Olaf Kirch
On Tue, Aug 29, 2006 at 08:10:21PM +0200, Thomas Graf wrote:
  When the Xen people looked for a way to disable IPv6 autoconf of the
  bridge, they didn't find any way to do it without bringing up the
  device first (and thereby triggering DAD).
 
 They didn't find any because there is no need to disable it. I

Well, as a global statement that may merit a separate debate.
In this specific case, maybe.

First off, I do agree that the use of a constant MAC address across the
board is a bad move :)

OTOH, there are good reasons why you want to turn off autoconf on
specific devices; and the current method of first bringing up
the device and then disabling it doesnt quite cut it.

One could also argue that there's a good reason to not assign addresses
to pure bridge devices at all, regardless of their brokenness.
We don't want to assign IPv4 addresses to a pure bridge, and I think it's
a reasonable expectation that there should be a way to tell the IPv6
stack to keep its hands off that device, too.

 got wrong. Setting dev_addr to -1 is just plain wrong, other
 virtual ethernet devices call random_ether_addr(), it's not a
 new problem at all.

Okay, fine with me - maybe we can convince them to use that
instead.

Thanks for the feedback,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH netdev-2.6 00/19] e1000: driver update (fixes for 2.6.16)

2006-03-03 Thread Olaf Kirch
 otoh, it's likely that SLES10 and RH-whatever will end up shipping with
 those patches as a backport, so we might as well merge it for them?

We're not awfully keen on seeing these in 2.6.16. We're deep into beta
and can only keep tracking mainline rcs right now as long as they're
bugfix only.  Major driver updates at this point would invalidate the
QA that has happened on these already.

 I'll push it into netdev-2.6.git#upstream (and into meta-branch #ALL), 
 which means it is queued for 2.6.17 [2.6.16-git1, really].  Once that 

Fine with me.

Thanks
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange IPsec freeze/partial fix

2006-02-08 Thread Olaf Kirch
On Wed, Feb 08, 2006 at 07:46:48AM +1100, Herbert Xu wrote:
 I suggest that we simply bail out always.  If the dst decides to die
 on us later on, the packet will be dropped anyway.  So there is no
 great urgency to retry here.  Once we have the proper resolution
 queueing, we can then do the retry again.

Yes, that is simpler, and should work as well.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Acked-by: Olaf Kirch [EMAIL PROTECTED]

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Strange IPsec freeze/partial fix

2006-02-07 Thread Olaf Kirch
Hi,

there's a problem with IPsec that has been bugging some of our users
for the last couple of kernel revs. Every now and then, IPsec will
freeze the machine completely. This is with openswan user land,
and with kernels up to and including 2.6.16-rc2.

I managed to debug this a little, and what happens is that we end
up looping in xfrm_lookup, and never get out. With a bit of debug
printks added, I can this happening:

ip_route_output_flow calls xfrm_lookup

xfrm_find_bundle returns NULL (apparently we're in the
middle of negotiating a new SA or something)

We therefore call xfrm_tmpl_resolve. This returns EAGAIN
We go to sleep, waiting for a policy update.
Then we loop back to the top

Apparently, the dst_orig that was passed into xfrm_lookup
has been dropped from the routing table (obsolete=2)
This leads to the endless loop, because we now create
a new bundle, check the new bundle and find it's stale
(stale_bundle - xfrm_bundle_ok - dst_check() return 0)

People have been testing with the patch below, which seems to fix the
problem partially. They still see connection hangs however (things
only clear up when they start a new ping or new ssh). So the patch
is obvsiouly not sufficient, and something else seems to go wrong.

I'm grateful for any hints you may have...

Olaf
-- 
Subject: [XFRM] Fix infinite loop in xfrm_lookup

It seems that the route xfrm_lookup is given on input can go
away when we sleep.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 net/ipv4/route.c   |   25 -
 net/xfrm/xfrm_policy.c |   16 
 2 files changed, 32 insertions(+), 9 deletions(-)

diff -r df2df438c970 net/ipv4/route.c
--- a/net/ipv4/route.c  Mon Feb  6 14:08:26 2006 -0500
+++ b/net/ipv4/route.c  Mon Feb  6 15:52:09 2006 -0500
@@ -2609,18 +2609,25 @@ int ip_route_output_flow(struct rtable *
 {
int err;
 
-   if ((err = __ip_route_output_key(rp, flp)) != 0)
-   return err;
-
-   if (flp-proto) {
-   if (!flp-fl4_src)
-   flp-fl4_src = (*rp)-rt_src;
-   if (!flp-fl4_dst)
-   flp-fl4_dst = (*rp)-rt_dst;
-   return xfrm_lookup((struct dst_entry **)rp, flp, sk, flags);
-   }
-
-   return 0;
+   if (flp-proto == 0) {
+   err = __ip_route_output_key(rp, flp);
+   } else {
+   u32 fl_src = flp-fl4_src, fl_dst = flp-fl4_dst;
+   int repeat = 1;
+
+   do {
+   if ((err = __ip_route_output_key(rp, flp)) != 0)
+   break;
+
+   if (!fl_src)
+   flp-fl4_src = (*rp)-rt_src;
+   if (!fl_dst)
+   flp-fl4_dst = (*rp)-rt_dst;
+   err = xfrm_lookup((struct dst_entry **)rp, flp, sk, 
flags);
+   } while (err == -EAGAIN  repeat--);
+   }
+
+   return err;
 }
 
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
diff -r df2df438c970 net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.cMon Feb  6 14:08:26 2006 -0500
+++ b/net/xfrm/xfrm_policy.cMon Feb  6 15:52:09 2006 -0500
@@ -786,7 +786,22 @@ int xfrm_lookup(struct dst_entry **dst_p
u16 family = dst_orig-ops-family;
u8 dir = policy_to_flow_dir(XFRM_POLICY_OUT);
u32 sk_sid = security_sk_sid(sk, fl, dir);
+   int loops = 0;
+
 restart:
+   if (loops  dst_orig  dst_orig-obsolete  0) {
+   printk(KERN_NOTICE xfrm_lookup: route is stale (obsolete=%d, 
loops=%d)\n,
+   dst_orig-obsolete, loops);
+   err = -EAGAIN;
+   goto error_nopol;
+   }
+   if (unlikely(++loops  10)) {
+   printk(KERN_NOTICE xfrm_lookup bailing out after %d loops\n, 
loops);
+   dump_stack();
+   err = -EHOSTUNREACH;
+   goto error_nopol;
+   }
+
genid = atomic_read(flow_cache_genid);
policy = NULL;
if (sk  sk-sk_policy[1])
@@ -854,6 +869,7 @@ restart:
}
if (nx == -EAGAIN ||
genid != atomic_read(flow_cache_genid)) {
+   printk(KERN_NOTICE xfrm_tmpl_resolve 
says EAGAIN, try again\n);
xfrm_pol_put(policy);
goto restart;
}
@@ -903,8 +919,9 @@ restart:
return 0;
 
 error:
+   xfrm_pol_put(policy);
+error_nopol:
dst_release(dst_orig);
-   xfrm_pol_put(policy);
*dst_p = NULL;
return err;
 }
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe

Re: e100 oops on resume

2006-01-26 Thread Olaf Kirch
On Thu, Jan 26, 2006 at 08:02:37PM +0100, Stefan Seyfried wrote:
 Will be in the next SUSE betas, so if anything breaks, we'll notice
 it.

I doubt it. As Jesse mentioned, e100_hw_init is called from e100_up,
so the call from e100_resume was really superfluous.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100 oops on resume

2006-01-25 Thread Olaf Kirch
On Wed, Jan 25, 2006 at 12:21:42AM +0100, Mattia Dongili wrote:
 I experienced the same today, I was planning to get a photo tomorrow :)
 I'm running 2.6.16-rc1-mm2 and the last working kernel was 2.6.15-mm4
 (didn't try .16-rc1-mm1 being scared of the reiserfs breakage).

I think that's because the latest driver version wants to wait for
the ucode download, and e100_exec_cb_wait before allocating any
control blocks.

static inline int e100_exec_cb_wait(struct nic *nic, struct sk_buff *skb,
void (*cb_prepare)(struct nic *, struct cb *, struct sk_buff *))
{
int err = 0, counter = 50;
struct cb *cb = nic-cb_to_clean;

if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
DPRINTK(PROBE,ERR, ucode cmd failed with error %d\n, err);
/* NOTE: the oops shows that e100_exec_cb fails with ENOMEM,
 * which also means there are no cbs */

/* ... other stuff...
 * and then we die here because cb is NULL: */
while (!(cb-status  cpu_to_le16(cb_complete))) {
msleep(10);
if (!--counter) break;
}

I'm not sure what the right fix would be. e100_resume would probably
have to call e100_alloc_cbs early on, while e100_up should avoid
calling it a second time if nic-cbs_avail != 0. A tentative patch
for testing is attached.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
[PATCH] e100: allocate cbs early on when resuming

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 drivers/net/e100.c |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

Index: build/drivers/net/e100.c
===
--- build.orig/drivers/net/e100.c
+++ build/drivers/net/e100.c
@@ -1298,8 +1298,10 @@ static inline int e100_exec_cb_wait(stru
int err = 0, counter = 50;
struct cb *cb = nic-cb_to_clean;
 
-   if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
+   if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode))) {
DPRINTK(PROBE,ERR, ucode cmd failed with error %d\n, err);
+   return err;
+   }
 
/* must restart cuc */
nic-cuc_cmd = cuc_start;
@@ -1721,9 +1723,11 @@ static int e100_alloc_cbs(struct nic *ni
struct cb *cb;
unsigned int i, count = nic-params.cbs.count;
 
+   /* bail out if we've been here before */
+   if (nic-cbs_avail)
+   return 0;
+
nic-cuc_cmd = cuc_start;
-   nic-cb_to_use = nic-cb_to_send = nic-cb_to_clean = NULL;
-   nic-cbs_avail = 0;
 
nic-cbs = pci_alloc_consistent(nic-pdev,
sizeof(struct cb) * count, nic-cbs_dma_addr);
@@ -2578,6 +2582,8 @@ static int __devinit e100_probe(struct p
nic-pdev = pdev;
nic-msg_enable = (1  debug) - 1;
pci_set_drvdata(pdev, netdev);
+   nic-cb_to_use = nic-cb_to_send = nic-cb_to_clean = NULL;
+   nic-cbs_avail = 0;
 
if((err = pci_enable_device(pdev))) {
DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n);
@@ -2752,6 +2758,8 @@ static int e100_resume(struct pci_dev *p
retval = pci_enable_wake(pdev, 0, 0);
if (retval)
DPRINTK(PROBE,ERR, Error clearing wake events\n);
+   if ((retval = e100_alloc_cbs(nic)))
+   DPRINTK(PROBE,ERR, No memory for cbs\n);
if(e100_hw_init(nic))
DPRINTK(HW, ERR, e100_hw_init failed\n);
 


Re: e100 oops on resume

2006-01-25 Thread Olaf Kirch
On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
 I'm not sure what the right fix would be. e100_resume would probably
 have to call e100_alloc_cbs early on, while e100_up should avoid
 calling it a second time if nic-cbs_avail != 0. A tentative patch
 for testing is attached.

Reportedly, the patch fixes the crash on resume.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100 oops on resume

2006-01-25 Thread Olaf Kirch
On Wed, Jan 25, 2006 at 11:37:40AM -0800, Jesse Brandeburg wrote:
 its an interesting patch, but it raises the question why does
 e100_init_hw need to be called at all in resume?  I looked back
 through our history and that init_hw call has always been there.  I
 think its incorrect, but its taking me a while to set up a system with
 the ability to resume.

I'll ask the folks here to give it a try tomorrow. But I suspect at
least some of it will be needed. For instance I assume you'll
have to reload to ucode when bringing the NIC back from sleep.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] rt2x00core: fix mismatched rwsem calls

2006-01-09 Thread Olaf Kirch
I'm not sure if netdev is the right list to post problems in
this driver to, but here we go...

Subject: rt2x00core: fix mismatched rwsem calls

 rt2x00_update_config pairs a down_write with an up_read call
 on the same semaphore.

 Similarly, rt2x00_link_down pairs a down_read call with an
 up_write call.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 rt2x00core.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: rt2x00-2.0.0-b3/rt2x00core.c
===
--- rt2x00-2.0.0-b3.orig/rt2x00core.c
+++ rt2x00-2.0.0-b3/rt2x00core.c
@@ -93,7 +93,7 @@ rt2x00_update_config(struct _rt2x00_devi
if(likely(update_flags))
device-handler-dev_update_config(device, core-config, 
update_flags);
 
-   up_read(device-rt2x00_sem);
+   up_write(device-rt2x00_sem);
 
clear_bit(DEVICE_CONFIG_UPDATE, device-flags);
 }
@@ -1709,7 +1709,7 @@ rt2x00_link_down(struct _rt2x00_device *
dev_set_promiscuity(device-net_dev, -1);
}
 
-   up_write(device-rt2x00_sem);
+   up_read(device-rt2x00_sem);
 
rt2x00_disconnect(device);
 

-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ipw2200 - do not sleep in ipw_request_direct_scan

2006-01-09 Thread Olaf Kirch
Hi,

We have been seeing frequent deadlocks involving wpa_supplicant on an
ipw2200 card, which rendered the system close to unusable. The scenario
is

 -  wpa_supplicant requests a network scan
 -  For some odd reason, the SCAN_COMPLETED notification never
arrives, possibly because the device loses the beacon and
disassociates, or because something is missing a wake_up
call.
 -  ifconfig etc block in state D
 -  events/0 blocks

Here's my analysis so far:

wpa_supplicant triggers the scan via an ioctl. This calls dev_ioctl, which
takes the rtnl_lock, then the ipw2200 interface's priv-sem.  finds there
is a scan in progress, and goes to sleep until the other scan is complete.

This causes frequent deadlocks: wpa_supplicant triggers the scan, and
blocks. Note that this will first take the rtnl_lock (this is an
ioctl), and then the ipw2200 interface's priv-sem.

The ipw2200 worker thread also tries to do something and blocks on
priv-sem.

events/0 is trying run the linkwatch_queue (net/core/link_watch.c),
and tries to take the rtnl lock but blocks as well. While it does that,
it holds the lock_cpu_hotplug() lock for its CPU, effectively blocking
all sorts of other tasks that try to perform a flush_workqueue on some
work queue.

For sysrq-t ouput and other details, see
https://bugzilla.novell.com/show_bug.cgi?id=133513

What really surprised me was how much of the system a driver can take
down by simply blocking in an ioctl...

And now for a patch - I'm still waiting for feedback on whether this
fixes the problem for good.

-8- cut here 
Subject: ipw2200 - do not sleep in ipw_request_direct_scan

 Drivers should not sleep for very long inside an ioctl -
 so return EAGAIN and let wpa_supplicant handle the problem.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 drivers/net/wireless/ipw2200.c |   14 ++
 1 files changed, 6 insertions(+), 8 deletions(-)

Index: linux-2.6.15/drivers/net/wireless/ipw2200.c
===
--- linux-2.6.15.orig/drivers/net/wireless/ipw2200.c
+++ linux-2.6.15/drivers/net/wireless/ipw2200.c
@@ -8940,14 +8940,12 @@ static int ipw_request_direct_scan(struc
IPW_DEBUG_HC(starting request direct scan!\n);
 
if (priv-status  (STATUS_SCANNING | STATUS_SCAN_ABORTING)) {
-   err = wait_event_interruptible(priv-wait_state,
-  !(priv-
-status  (STATUS_SCANNING |
-  
STATUS_SCAN_ABORTING)));
-   if (err) {
-   IPW_DEBUG_HC(aborting direct scan);
-   goto done;
-   }
+   /* We should not sleep here; otherwise we will block most
+* of the system (for instance, we hold rtnl_lock when we
+* get here).
+*/
+   err = -EAGAIN;
+   goto done;
}
memset(scan, 0, sizeof(scan));
 
-8- cut here 

-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]

2005-11-24 Thread Olaf Kirch
On Thu, Nov 24, 2005 at 03:08:27PM +0100, Harald Welte wrote:
 Jozsef Kadlecsik doesn't recall those patches/changes (even though he's
 our Mr. TCP state tracking and is indicated as the author of one of
 the two patches.
 
 I also don't recall having seen any of those patches before.  But that
 doesn't mean all too much, my brain is like a sieve some times.

Those patches came out of a discussion on netfilter-devel.  Sorry,
I don't know exactly when but looking at our CVS log it was Dec 2004.

 Also, it was indicated that those fixes did go mainline at some point?
 Can soembody please indicate when that was supposed to happen?  I don't
 really recall any of this, and I would be surprised if we deliberately
 backed out such changes at a later point.

I wasn't involved in the netfilter discussion; I just did the initial
debugging. But it seems the first patch went into 2.6.10, the other
into 2.6.11-rc1.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]

2005-11-23 Thread Olaf Kirch
On Wed, Nov 23, 2005 at 12:53:43PM -0500, Trond Myklebust wrote:
 Sorry to be cross-posting, but does this bug ring any bells? I'm having
 trouble seeing how the sunrpc server code could be at fault.

We've seen this previously, and submitted a fix to netfilter which
supposedly went into mainline at some point. It seems to be gone
from 2.6.14 though.

The problem is with conntrack, and filtering on RELATED (I assume
your netfilter config does that)

What happens is that the client reboots, opens a new TCP connection
with the same port as last time (say 800), sends SYN. Server still has
an active TCB for this, and thus replies with an ACK containing
its current sequence numbers. Now the client is supposed to RST the
connection.

Unfortunately, conntrack does not expect a lone ACK in this state
and ignores it. So the client will retransmit the SYN until timeout.
Then it picks a new port, and succeeds (maybe).

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small af_inet6 fix

2005-11-14 Thread Olaf Kirch
On Fri, Nov 11, 2005 at 03:00:10PM -0800, David S. Miller wrote:
 There is another bug I see there in the cleanup path too.
 We don't unregister the inet6 protosw on the failure path.
 But this appears totally harmless since you can't reference
 the inet6 proto switch table once you unload the ipv6
 module.  So I'll leave this alone for now.
 
 So putting your patch and the sock_register() check together
 we end up with the following.

Looks good to me.

Thanks,
Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Small af_inet6 fix

2005-11-11 Thread Olaf Kirch
The patch below fixes a minor issue with inet6_init.
I noticed though that the code explicitly ignores
the error code of sock_register by calling:
(void) sock_register(inet6_family_ops);
Why does it do this?

Olaf
--
Subject: inet6_init: cleanup after failed initialization

  When initialization fails in inet6_init(), we should
  unregister the PF_INET6 socket ops.

  Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

Index: linux-2.6.14/net/ipv6/af_inet6.c
===
--- linux-2.6.14.orig/net/ipv6/af_inet6.c
+++ linux-2.6.14/net/ipv6/af_inet6.c
@@ -797,6 +797,7 @@ icmp_fail:
 #endif
cleanup_ipv6_mibs();
 out_unregister_raw_proto:
+   sock_unregister(PF_INET6);
proto_unregister(rawv6_prot);
 out_unregister_udp_proto:
proto_unregister(udpv6_prot);

-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Prevent oops when printing martian source

2005-07-11 Thread Olaf Kirch
Hi,

In some cases, we may be generating packets with a source address that
qualifies as martian. This can happen when we're in the middle of setting
up or tearing down the network, and netfilter decides to reject a packet
with an RST.  The routing code would detect the martian, and try to
print a warning.  This would oops, because locally generated packets do
not have a valid skb-mac.raw pointer at this point.

I didn't actually investigate why netfilter was generating an RST with
invalid IP source, but from what it looked like, the system was in the
middle of some interface setup/teardown.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
From: Olaf Kirch [EMAIL PROTECTED]
Subject: [IPv4] Prevent oops when printing martian source

In some cases, we may be generating packets with a source address that
qualifies as martian. This can happen when we're in the middle of setting
up the network, and netfilter decides to reject a packet with an RST.
The IPv4 routing code would try to print a warning and oops, because
locally generated packets do not have a valid skb-mac.raw pointer
at this point.

Index: linux-2.6.5/net/ipv4/route.c
===
--- linux-2.6.5.orig/net/ipv4/route.c   2004-03-12 12:17:31.0 +0100
+++ linux-2.6.5/net/ipv4/route.c2005-04-28 11:52:23.0 +0200
@@ -1823,7 +1823,7 @@ martian_source:
printk(KERN_WARNING martian source %u.%u.%u.%u from 
%u.%u.%u.%u, on dev %s\n,
NIPQUAD(daddr), NIPQUAD(saddr), dev-name);
-   if (dev-hard_header_len) {
+   if (dev-hard_header_len  skb-mac.raw) {
int i;
unsigned char *p = skb-mac.raw;
printk(KERN_WARNING ll header: );