from:"Matt Mackall"

Re: [BUG] bad address in twothirdsMD4Transform

2008-02-11 Thread Matt Mackall

On Mon, 2008-02-11 at 20:29 +0100, Jiri Slaby wrote:
 Hi,
 
 I get this with 32 bit Firefox 3b2 and java 1.6.0_03 on 64 bit:
 
 BUG: unable to handle kernel paging request at 8102366213f8
 IP: [803558f4] twothirdsMD4Transform+0xc4/0x3b0

You should mention what kernel you're using. This bug is only in -mm
(you didn't cc: Andrew), and it's fixed here:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24/2.6.24-mm1/hot-fixes/random-clean-up-checkpatch-complaints-fix.patch

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] [RESENDING] netconsole: register cmdline netconsole configs to configfs

2008-02-11 Thread Matt Mackall


On Mon, 2008-02-11 at 18:08 +0900, Joonwoo Park wrote:
 This patch intorduces cmdline netconsole configs to register to
 configfs
 with dynamic netconsole. Satyam Sharma who designed shiny dynamic
 reconfiguration for netconsole, mentioned about this issue already.
 (http://lkml.org/lkml/2007/7/29/360)
 But I think, without separately managing of two kind of netconsole
 target
 objects, it's possible by using config_group instead of
 config_item in the netconsole_target and default_groups feature of
 configfs.
 
 Patch was tested with configuration creation/destruction by kernel and
 module.
 And it makes possible to enable/disable, modify and review netconsole
 target configs from cmdline.

I'm afraid I'm going to have to leave review of this to someone who is
clueful about configfs. But it seems reasonable.

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.24-mm1] TCP/IPv6 connect() oopses at twothirdsMD4Transform()

2008-02-04 Thread Matt Mackall


On Mon, 2008-02-04 at 17:36 -0800, Andrew Morton wrote:
 On Tue, 05 Feb 2008 10:28:43 +0900 Tetsuo Handa [EMAIL PROTECTED] wrote:
 
  Hello.
  
  Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.24-mm1
  
  2.6.24 works fine.

 err, Matt?

random: revert braindamage that snuck into checkpatch cleanup

Signed-off-by: Matt Mackall [EMAIL PROTECTED]

diff -r 50a6e531a9f2 drivers/char/random.c
--- a/drivers/char/random.c Mon Feb 04 20:23:02 2008 -0600
+++ b/drivers/char/random.c Mon Feb 04 20:28:08 2008 -0600
@@ -1306,7 +1306,7 @@
  * Rotation is separate from addition to prevent recomputation
  */
 #define ROUND(f, a, b, c, d, x, s) \
-   (a += f(b, c, d) + in[x], a = (a  s) | (a  (32 - s)))
+   (a += f(b, c, d) + x, a = (a  s) | (a  (32 - s)))
 #define K1 0
 #define K2 013240474631UL
 #define K3 015666365641UL

-- 
Mathematics is the supreme nostalgia of our time.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [1/4] dst: Distributed storage documentation.

2007-12-02 Thread Matt Mackall

On Thu, Nov 29, 2007 at 03:53:23PM +0300, Evgeniy Polyakov wrote:
 
 Distributed storage documentation.
 
 Algorithms used in the system, userspace interfaces
 (sysfs dirs and files), design and implementation details
 are described here.

Can you give us a summary of how this differs from using device mapper
with NBD?

-- 
Mathematics is the supreme nostalgia of our time.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-24 Thread Matt Mackall

Simon, can you test this patch? I think it's the most straightforward
2.6.24 fix.

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.cTue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.cFri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
spin_unlock_irqrestore(skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+   return;
+}
+
 static void zap_completion_queue(void)
 {
unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist-next;
-   if (skb-destructor)
+   if (skb-destructor == netpoll_skb_destroy) {
+   skb-destructor = NULL;
+   __kfree_skb(skb);
+   }
+   else
dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
}
}
 
@@ -252,6 +260,7 @@ repeat:
 
atomic_set(skb-users, 1);
skb_reserve(skb, reserve);
+   skb-destructor = netpoll_skb_destroy;
return skb;
 }
 

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 10:54:11PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
 wrote:
  Here's another thought: move all this logic into the networking core,
  unify it with current softirq zapper, then allow it to be called from
  various other places (like atomic allocators). Then it'll all be in
  central maintained place with more users.
 
 This can be done quite easily - put a check into __kfree_skb() if
 netpoll is compiled-in and we are in hardirq context, then put skb
 into softirq freeing queue. Then zap_completion_queue() can free
 anything without ever knowing about nature of the packet, since this
 will be checked in __kfree_skb() anyway.

What I had in mind was moving the whole zap_completion_queue concept
into net/core/skbuff. So that netpoll (and, say, atomic kmalloc) can
simply call something like clean_completion_queue.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL 
 PROTECTED]) wrote:
  On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL 
  PROTECTED]) wrote:
   Stop, we are trying to free skb without destructor and catch connection
   tracking, so it is not a solution. To fix the problem we need to check
   if it is not netfilter related, kind of this (not tested), Simon please
   give it a try:
  
  And to be really cool we need to bypass skbs with xfrm attached, since
  its freeing also assumes BH context.
 
 What about compile options?

What about my original suggestion that we mark skbs owned by netpoll
and free only those. Much safer, no? Untested:

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.cTue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.cFri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
spin_unlock_irqrestore(skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+   return;
+}
+
 static void zap_completion_queue(void)
 {
unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist-next;
-   if (skb-destructor)
+   if (skb-destructor == netpoll_skb_destroy) {
+   skb-destructor = NULL;
+   __kfree_skb(skb);
+   }
+   else
dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
}
}
 
@@ -252,6 +260,7 @@ repeat:
 
atomic_set(skb-users, 1);
skb_reserve(skb, reserve);
+   skb-destructor = netpoll_skb_destroy;
return skb;
 }
 

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 08:57:57PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 11:07:56AM -0600, Matt Mackall ([EMAIL PROTECTED]) 
 wrote:
  On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote:
   On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL 
   PROTECTED]) wrote:
 [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at 
 kernel/softirq.c:139 local_bh_enable()
 [2059664.620535]  [80120364] local_bh_enable+0x3c/0x97
   
 [2059664.620657]  [8011c205] __call_console_drivers+0x61/0x6d
 [2059664.620669]  [8011c3fc] release_console_sem+0x164/0x1bf
 [2059664.620679]  [8011c81f] vprintk+0x27a/0x2ff

If that trace is to be beieved we're doing nefilter stuff on packets 
which
were sent across netconsole.

This probably isn't anything the netfilter guys have thought about.  And
probably we don't want them to.  Is there some simple way in which we 
can
exempt netconsole from netfilter processing?
   
   This is not about netfilter, but about freeing skb in interrupt context, 
   which is not allowed, and in interrupt skbs are queued to be freed in 
   softirq,
   but netcnsole wants to flush softirq freeing queue. That is a question: 
   why?
  
  My memory here is hazy, but I think this exists to rescue netconsole
  in low-memory situations. This bit originated with Ingo, so maybe he
  can recall.
  
  Netpoll can process an arbitrary number of skbs inside a single
  interrupt. Think sysrq-t at one packet per line or kgdboe where the
  entire trace session can happen inside one very long interrupt.
  
  Perhaps we can refine this to mark netpoll's skbs (perhaps with
  -destructor?) and delete only skbs we own. As these are never passed
  through any of the other route/xfrm/filter code, they should be safe
  to delete even in irq context, yes?
  
   Removing zap_completion_queue() from find_skb() will fix the warning,
   but I'm not sure this is a correct fix. I've added Matt to the Cc list.
  
  Care to try the sysrq-t or OOM message tests?
 
 We basically can not free skbs there - if it is interrupt context and
 we are freeing some skb with destructor we will catch the warning anyway.

Perhaps I'm missing some context here. We don't free skbs with
destructors in irq context in zap_completion_queue. We reinsert them on the
completion list. We do this by calling dev_kfree_skb_any.

So I'd be surprised if that was a problem. But I can imagine having
problems for skbs without destructors which run into one of these in
__kfree_skb:

dst_release
secpath_put
nf_conntrack_put
nf_conntrack_put_reasm
nf_bridge_put

..some or all of which assume a softirq context.

 No matter if we are under memory pressure or whatever - it is not
 allowed - a lot of skbs are supposed to be freed in softirq context,
 that is why dev_kfree_skb_any() exists.

Some skbs we definitely -can- free in irq context. The only ones we
care about are the ones generated by netpoll. If there's a reason you
think netpoll's own skbs can't be freed, please describe it.

 I think we can drop skbs _without_ destructor from the queue though in
 that conditions given that we actually need only one.

Huh?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL PROTECTED]) 
 wrote:
   [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at 
   kernel/softirq.c:139 local_bh_enable()
   [2059664.620535]  [80120364] local_bh_enable+0x3c/0x97
 
   [2059664.620657]  [8011c205] __call_console_drivers+0x61/0x6d
   [2059664.620669]  [8011c3fc] release_console_sem+0x164/0x1bf
   [2059664.620679]  [8011c81f] vprintk+0x27a/0x2ff
  
  If that trace is to be beieved we're doing nefilter stuff on packets which
  were sent across netconsole.
  
  This probably isn't anything the netfilter guys have thought about.  And
  probably we don't want them to.  Is there some simple way in which we can
  exempt netconsole from netfilter processing?
 
 This is not about netfilter, but about freeing skb in interrupt context, 
 which is not allowed, and in interrupt skbs are queued to be freed in softirq,
 but netcnsole wants to flush softirq freeing queue. That is a question: why?

My memory here is hazy, but I think this exists to rescue netconsole
in low-memory situations. This bit originated with Ingo, so maybe he
can recall.

Netpoll can process an arbitrary number of skbs inside a single
interrupt. Think sysrq-t at one packet per line or kgdboe where the
entire trace session can happen inside one very long interrupt.

Perhaps we can refine this to mark netpoll's skbs (perhaps with
-destructor?) and delete only skbs we own. As these are never passed
through any of the other route/xfrm/filter code, they should be safe
to delete even in irq context, yes?

 Removing zap_completion_queue() from find_skb() will fix the warning,
 but I'm not sure this is a correct fix. I've added Matt to the Cc list.

Care to try the sysrq-t or OOM message tests?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 10:15:24PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 12:59:43PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
 wrote:
  So I'd be surprised if that was a problem. But I can imagine having
  problems for skbs without destructors which run into one of these in
  __kfree_skb:
  
  dst_release
  secpath_put
  nf_conntrack_put
  nf_conntrack_put_reasm
  nf_bridge_put
  
  ..some or all of which assume a softirq context.
 
 bridging is ok, others require softirq context.
 I've sent a patch (the last one should be ok) to guard against xfrm and
 connection tracking.
 
   No matter if we are under memory pressure or whatever - it is not
   allowed - a lot of skbs are supposed to be freed in softirq context,
   that is why dev_kfree_skb_any() exists.
  
  Some skbs we definitely -can- free in irq context. The only ones we
  care about are the ones generated by netpoll. If there's a reason you
  think netpoll's own skbs can't be freed, please describe it.
 
 Only some and to distinguish them we can not use destructor - if it is
 set (even empty function) it will fire an alarm.

Yep, please look at the patch I just posted.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall

On Fri, Nov 23, 2007 at 10:32:22PM +0300, Evgeniy Polyakov wrote:
 On Fri, Nov 23, 2007 at 01:11:20PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
 wrote:
  On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
   On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL 
   PROTECTED]) wrote:
On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL 
PROTECTED]) wrote:
 Stop, we are trying to free skb without destructor and catch 
 connection
 tracking, so it is not a solution. To fix the problem we need to check
 if it is not netfilter related, kind of this (not tested), Simon 
 please
 give it a try:

And to be really cool we need to bypass skbs with xfrm attached, since
its freeing also assumes BH context.
   
   What about compile options?
  
  What about my original suggestion that we mark skbs owned by netpoll
  and free only those. Much safer, no? Untested:
 
 This should work if there are netpoll's skbs, but if we are under memory
 pressure we want to free not only netpoll skbs, but at least one, and 
 what if there are no netpoll skbs in the queue?

Yeah, that's a concern (but note that we do have a private reserve and
we only really need the zap when our reserve is depleted). But I worry
that it's too fragile and if we add a new unsafe case, it won't be
noticed for a long time. This is the first report I've seen of this
particular problem, so this has been a latent bug for three or four
years now.

Here's another thought: move all this logic into the networking core,
unify it with current softirq zapper, then allow it to be called from
various other places (like atomic allocators). Then it'll all be in
central maintained place with more users.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole and logging everything from /dev/console

2007-11-16 Thread Matt Mackall

On Fri, Nov 16, 2007 at 06:43:15PM +0100, Martin Michlmayr wrote:
 I'm supporting a number of consumer devices (e.g. small NAS devices)
 in Debian.  They typically don't export the serial console and don't
 have any other output devices.  We perform the installation via SSH
 on such devices and start SSH automatically when the system boots so
 people can login.  This works pretty well but sometimes a device stops
 to boot for no good reason.  This is usually really hard to debug
 since we don't have any logs at all.
 
 I could activate netconsole in order to see kernel messages but then I
 still wouldn't see anything that's printed by the ramdisk and later
 when services are started.  Ideally, I'd like to have an option for
 netconsole to show every message that is also shown on the console.
 
 Do you think that would be possible?

It is, definitely, you just need to wire up a tty struct's write
method to netconsole's and add it to the console registration. But I
haven't had any time to work on this in a while.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] net: avoid race between netpoll and network fast path

2007-10-29 Thread Matt Mackall

On Mon, Oct 29, 2007 at 09:26:11PM -0700, David Miller wrote:
 From: Tina Yang [EMAIL PROTECTED]
 Date: Tue, 16 Oct 2007 22:46:30 -0700
 
  The precise race is
  1) net_rx_action get the dev from poll_list
  2) at the same time, netpoll poll_napi() get a hold of the poll lock
 and calls -poll(), remove dev from the poll list
  3) after it finishes, net_rx_action get the poll lock, and calls
 -poll() the second time, and panic when trying to remove (again)
 the dev from the poll list.
 
 This is trivial to fix.
 
 I'll check the following into 2.6.14 and backport it to
 the -stable trees.
 
 [NET]: Fix race between poll_napi() and net_rx_action()
 
 netpoll_poll_lock() synchronizes the -poll() invocation
 code paths, but once we have the lock we have to make
 sure that NAPI_STATE_SCHED is still set.  Otherwise we
 get:
 
   cpu 0   cpu 1
 
   net_rx_action() poll_napi()
   netpoll_poll_lock() ... spin on -poll_lock
   -poll()
 netif_rx_complete
   netpoll_poll_unlock()   acquire -poll_lock()
   -poll()
netif_rx_complete()
CRASH
 
 Based upon a bug report from Tina Yang.
 
 Signed-off-by: David S. Miller [EMAIL PROTECTED]

Thanks, Dave.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] remove netpoll receive code

2007-10-22 Thread Matt Mackall

[annoyed as ever about never being cc:ed on this stuff]

On Wed, Oct 17, 2007 at 01:21:31PM -0700, Stephen Hemminger wrote:
 The netpoll receive code is:
 1. Not used by any in-tree features, it is used by kgdb-over-ether.

And various crashdump over network tools.

 2. A nice hook for people doing nasty things like private binary network 
 stacks or rootkits.

It's a completely useless hook for a binary network stack. It only
supports UDP and only point to point. And it will have crap
performance. It's much less useful here than, say, TUN/TAP.

It doesn't buy anything for a rootkit either, which will continue to
trivially hide servers in userspace as they already do.

This point is completely FUD.

 3. Unsecured by any of the normal firewalling code.

This is correct. It also applies to the TX side of things. The point,
of course, is to bypass as much of the stack as possible so that when
the kernel crashes, we're more likely to actually get our netpoll
data.

 I propose that we take out all the whole netpoll rx path. If/when
 kgdb gets submitted a better and alternative receive path can be
 added.

Let's hear about this better alternative first, shall we? I for one am
a little skeptical of its existence. Going through a larger fraction
of the network stack, running softirqs, etc., are all big (potentially
fatal) steps backward from the point of view of a debugger.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: netconsole problems

2007-10-04 Thread Matt Mackall

On Thu, Oct 04, 2007 at 10:59:38AM -0700, Tina Yang wrote:
 We recently run into a few problems with netconsole
 in at least 2.6.9, 2.6.18 and 2.6.23.  It either panicked
 at netdevice.h:890 or hung the system, and sometimes depending
 on which NIC we are using, the following console message,
 e1000:
  e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
 tg3:
  NETDEV WATCHDOG: eth4: transmit timed out
  tg3: eth4: transmit timed out, resetting
 
 The postmortem vmcore analysis indicated race between normal
 network stack (net_rx_action) and netpoll, and disabling the
 following code segment cures all the problems.

That doesn't tell us much. Can you provide any more details? Like the
call chains on both sides?
 
 netpoll.c
178 /* Process pending work on NIC */
179 np-dev-poll_controller(np-dev);
180 if (np-dev-poll)
181 poll_napi(np);

There are a couple different places this gets called, and for
different reasons. If we have a -large- netconsole dump (like
sysrq-t), we'll swallow up all of our SKB pool and may get stuck waiting
for the NIC to send them (because it's waiting to hand packets back to
the kernel and has no free buffers for outgoing packets).

 Big or small, there seems to be several race windows in the code,
 and fixing them probably has consequence on overall system performance.

Yes, the networking layer goes to great lengths to avoid having any
locking in its fast paths and we don't want to undo any of that
effort.

 Maybe this code should only run when the machine is single-threaded ?

In the not-very-distant future, such machines will be extremely rare.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.23-rc8: cannot make netconsole work

2007-09-28 Thread Matt Mackall

On Fri, Sep 28, 2007 at 01:27:55PM +0400, Andrey Borzenkov wrote:
 I finally decided to try netconsole in attempt to get some more information 
 why my system does not resume (but that is different story). But I cannot 
 make it work - it does load but I see no traffic flowing ever. This is 
 notebook with e100 driver and it works just fine for normal traffic. 
 netconsole is currently compiled as module, loading it I get:
 
 sudo modprobe netconsole netconsole=@/eth0,@/ 
 
 [ 2395.838094] netconsole: local port 6665
 [ 2395.838126] netconsole: interface eth0
 [ 2395.838132] netconsole: remote port 
 [ 2395.838140] netconsole: remote IP 0.0.0.0
 [ 2395.838150] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
 [ 2395.838175] netconsole: local IP 192.168.1.8
 [ 2395.843798] console [netcon0] enabled
 [ 2395.844722] netconsole: network logging started
 
 but I cannot see any packet flowing via eth0 not can I catch anything on 
 remote side using netcat. This does not work when I add explicitly source and 
 target addresses either:
 
 pts/0}% sudo modprobe netconsole [EMAIL PROTECTED]/eth0,@192.168.1.5/
 {pts/0}% dmesg | tail
 [ 2427.128947] device eth0 left promiscuous mode
 [ 2427.129005] audit(1190970818.972:5): dev=eth0 prom=0 old_prom=256 
 auid=4294967295
 [ 2541.831721] netconsole: local port 6665
 [ 2541.831756] netconsole: local IP 192.168.1.8
 [ 2541.831763] netconsole: interface eth0
 [ 2541.831769] netconsole: remote port 
 [ 2541.831776] netconsole: remote IP 192.168.1.5
 [ 2541.831787] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
 [ 2541.838832] console [netcon0] enabled
 [ 2541.839964] netconsole: network logging started

What is your console log level set to? If the messages don't come out
on the local console, they won't get sent out the network either.
Fedora at least defaults to hiding most messages. Adding 'debug' to
your kernel command line will change that.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-02 Thread Matt Mackall

On Thu, Aug 02, 2007 at 11:00:08AM +0200, Jarek Poplawski wrote:
 On Wed, Aug 01, 2007 at 09:02:19PM -0500, Matt Mackall wrote:
  On Wed, Aug 01, 2007 at 11:59:21AM +0200, Jarek Poplawski wrote:
   On Tue, Jul 31, 2007 at 05:05:00PM +0200, Gabriel C wrote:
Jarek Poplawski wrote:
 On Tue, Jul 31, 2007 at 12:14:36PM +0200, Gabriel C wrote:
 Jarek Poplawski wrote:
 On 28-07-2007 20:42, Gabriel C wrote:
 Andrew Morton wrote:
 On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C [EMAIL PROTECTED] 
 wrote:
 ...
 net/core/netpoll.c:155: error: 'struct net_device' has no member 
 named 'poll_controller'
 ...
 I think is because KGDBOE selects just NETPOLL.

 Looks like it.

 Select went and selected NETPOLL and NETPOLL_TRAP but things like
 CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset.  
 `select'
 remains evil.
 ...
   seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
   still doesn't check for NETDEVICES dependency.
  
  That's odd. Adding Sam to the cc:.
 
 Looks right, but after reading Andrew's opinion about select I'd be
 astonished if he doesn't know this problem already.
 
  
 Now KGDBOE just selects NETPOLL and NETPOLL_TRAP.
 Adding 'select CONFIG_NET_POLL_CONTROLLER' let kgdboe compiles but 
 the question is does it work without any ethernet card ?
 
 Why kgdboe should care what netpoll needs? So, I hope, you are adding
 this select under config NETPOLL. On the other hand, if NETPOLL should
 depend on NET_POLL_CONTROLLER there is probably no reason to have them
 both.

NET_POLL_CONTROLLER has def_bool NETPOLL if NETDEVICES .

Net peoples ping ?:)
  
  How about cc:ing the netpoll maintainer?
 
 Is there a new one or do you suggest possibility of abusing the
 authority of the netpoll's author with such trifles...?!

I'm just subtly suggesting that if you're going to have a discussion
about netpoll, you ought to cc: me.

 There are some notions about other diagnostic tools in some
 net drivers, eg. 3c509.c, so there would be a little bit of
 work if, after changing this, they really exist (and even if
 not - maybe it's reasonable to save such possibility for the
 future?).

I created it for netpoll, only netpoll clients have ever cared.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-01 Thread Matt Mackall

On Wed, Aug 01, 2007 at 11:59:21AM +0200, Jarek Poplawski wrote:
 On Tue, Jul 31, 2007 at 05:05:00PM +0200, Gabriel C wrote:
  Jarek Poplawski wrote:
   On Tue, Jul 31, 2007 at 12:14:36PM +0200, Gabriel C wrote:
   Jarek Poplawski wrote:
   On 28-07-2007 20:42, Gabriel C wrote:
   Andrew Morton wrote:
   On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C [EMAIL PROTECTED] 
   wrote:
  
   Hi,
  
   I got this compile error with a randconfig ( 
   http://194.231.229.228/MM/randconfig-auto-82.broken.netpoll.c ).
  
   ...
  
   net/core/netpoll.c: In function 'netpoll_poll':
   net/core/netpoll.c:155: error: 'struct net_device' has no member 
   named 'poll_controller'
   net/core/netpoll.c:159: error: 'struct net_device' has no member 
   named 'poll_controller'
   net/core/netpoll.c: In function 'netpoll_setup':
   net/core/netpoll.c:670: error: 'struct net_device' has no member 
   named 'poll_controller'
   make[2]: *** [net/core/netpoll.o] Error 1
   make[1]: *** [net/core] Error 2
   make: *** [net] Error 2
   make: *** Waiting for unfinished jobs
  
   ...
  
  
   I think is because KGDBOE selects just NETPOLL.
  
   Looks like it.
  
   Select went and selected NETPOLL and NETPOLL_TRAP but things like
   CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset.  
   `select'
   remains evil.
   ...
   I think there may be a logical issue ( again if I got it right ).
   We need some ethernet card to work with kgdboe right ? but we don't 
   have any if !NETDEVICES  !NET_ETHERNET.
  
   So maybe some ' depends on ...  NETDEVICES!=n  NET_ETHERNET!=n ' 
   is needed too ? 
   IMHO, the only logical issue here is netpoll.c mustn't use 
   CONFIG_NET_POLL_CONTROLLER code without #ifdef if it doesn't
   add this dependency itself.
  
   Well it does if NETDEVICES  if NET_ETHERNET which booth are N when 
   !NETDEVICES is why KGDBOE uses select and not depends on.
   
   does if XXX means may use if XXX.
  
  From what I know means only use if xxx on !xxx everything inside the if 
  xxx is n and depends on something inside the if xxx 
  does not work.
  
  ...
  
  menuconfig FOO
  bool FOO
  depends on BAR
  default y 
  -- help --
  something
  if FOO
  
  config BAZ
  depends on WHATEVR  !NOT_THIS
  
  menuconfig SOMETHING_ELSE
  
  if SOMETHING_ELSE
  
  config BLUBB
  depends on PCI  WHATNOT
  
  endif # SOMETHING_ELSE
  
  config NETPOLL
  def_bool NETCONSOLE
  
  config NETPOLL_TRAP
  bool Netpoll traffic trapping
  default n
  depends on NETPOLL

  config NET_POLL_CONTROLLER
  def_bool NETPOLL
  
  endif # FOO
  
  Now if you set FOO=n all is gone and your driver have to select whatever it 
  needs from there.
 
 Probably not exactly so...
 
 From drivers/net/Kconfig:
 
  # All the following symbols are dependent on NETDEVICES - do not repeat
  # that for each of the symbols.
  if NETDEVICES
 
 So, according to this netpoll could presume NETDEVICES and
 NET_POLL_CONTROLLER are always on.
 
 But, as you've found, it's possible to select NETPOLL and !NETDEVICES,
 so this comment is at least not precise.
 
 On the other side something like this:
 
 ...
 endif # NETDEVICES
 
 config NETPOLL
 depends on NETDEVICES
 def_bool NETCONSOLE
 
 config NETPOLL_TRAP
 bool Netpoll traffic trapping
 default n
 depends on NETPOLL
 
 config NET_POLL_CONTROLLER
 def_bool NETPOLL
 depends on NETPOLL
 
 
 seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
 still doesn't check for NETDEVICES dependency.

That's odd. Adding Sam to the cc:.

   Now KGDBOE just selects NETPOLL and NETPOLL_TRAP.
   Adding 'select CONFIG_NET_POLL_CONTROLLER' let kgdboe compiles but the 
   question is does it work without any ethernet card ?
   
   Why kgdboe should care what netpoll needs? So, I hope, you are adding
   this select under config NETPOLL. On the other hand, if NETPOLL should
   depend on NET_POLL_CONTROLLER there is probably no reason to have them
   both.
  
  NET_POLL_CONTROLLER has def_bool NETPOLL if NETDEVICES .
  
  Net peoples ping ?:)

How about cc:ing the netpoll maintainer?
 
 OK, I wasn't right here: there is no visible reason for both in the
 kernel code, but I can imagine there could be some external users of
 NET_POLL_CONTROLLER without NETPOLL.

I don't know of any. As far as I can tell at this point,
NET_POLL_CONTROLLER == NETPOLL.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 (updated) -mm 4/9] netconsole: Add some useful tips to documentation

2007-07-13 Thread Matt Mackall

On Wed, Jul 11, 2007 at 03:49:29AM +0530, Satyam Sharma wrote:
 [4/9 (updated)] netconsole: Add some useful tips to documentation
 
 Add some useful general-purpose tips. Also suggest solution for the frequent
 problem of console_loglevel set too low numerically (i.e. for high priority
 messages only) on the sender.
 
 Cc: Matt Mackall [EMAIL PROTECTED]
 Cc: Jesper Juhl [EMAIL PROTECTED]
 Signed-off-by: Satyam Sharma [EMAIL PROTECTED]

Thanks.

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.23 PATCH 13/18] dm: netlink

2007-07-12 Thread Matt Mackall

On Wed, Jul 11, 2007 at 02:27:12PM -0700, Andrew Morton wrote:
 On Wed, 11 Jul 2007 22:01:37 +0100
 Alasdair G Kergon [EMAIL PROTECTED] wrote:
 
  From: Mike Anderson [EMAIL PROTECTED]
  
  This patch adds a dm-netlink skeleton support to the Makefile, and the dm
  directory.
  
  ...
   
  +config DM_NETLINK
  +   bool DM netlink events (EXPERIMENTAL)
  +   depends on BLK_DEV_DM  EXPERIMENTAL
  +   ---help---
  +   Generate netlink events for DM events.
 
 Need a dependency on NET there?

It's really sad to make DM dependent on the network layer.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 -mm 1/9] netconsole: Cleanups, codingstyle, prettyfication

2007-07-10 Thread Matt Mackall

On Tue, Jul 10, 2007 at 02:49:25PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [1/9] netconsole: Cleanups, codingstyle, prettyfication
 
 (1) Remove unwanted headers.
 (2) Mark __init and __exit as appropriate.
 (3) Various trivial codingstyle and prettification stuff.

I don't much like what you've done in (3) because I personally think
tabs should never appear anywhere but at the beginning of a line.
Which is why we consistently ask for people not to mix codingstyle
bits with substantive changes.

But what the hell..

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 -mm 2/9] netconsole: Remove bogus check

2007-07-10 Thread Matt Mackall

On Tue, Jul 10, 2007 at 02:49:30PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [2/9] netconsole: Remove bogus check
 
 The (!np.dev) check in write_msg() is bogus (always false), because:
 np.dev is set by netpoll_setup(), which is called by the target init
 code in init_netconsole() _before_ register_console() = write_msg() cannot
 be triggered unless netpoll_setup() returns with success. And that will not
 happen if netpoll_setup() failed to set np.dev. Also np.dev cannot go from
 under us while netconsole is loaded. This is because netpoll_setup() grabs
 a reference for us on that dev. So let's remove the pointless check.
 
 Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
 Cc: Keiichi Kii [EMAIL PROTECTED]

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 -mm 3/9] netconsole: Simplify boot/module option setup logic

2007-07-10 Thread Matt Mackall

On Tue, Jul 10, 2007 at 02:49:36PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [3/9] netconsole: Simplify boot/module option setup logic

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 -mm 4/9] netconsole: Add some useful tips to documentation

2007-07-10 Thread Matt Mackall

On Tue, Jul 10, 2007 at 02:49:41PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]

 [4/9] netconsole: Add some useful tips to documentation

 Add some useful general-purpose tips.

 Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
 Cc: Keiichi Kii [EMAIL PROTECTED]

Acked-by: Matt Mackall [EMAIL PROTECTED]

As long as we're on the subject, I've been meaning to add a note
telling people to set their console log level to something useful, as
having that set too low is the most common problem people encounter.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -mm 5/9] netconsole: Introduce dev_status member

2007-07-04 Thread Matt Mackall

On Wed, Jul 04, 2007 at 04:38:04PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [5/9] netconsole: Introduce dev_status member
 
 Introduce a new member in netconsole_target that tracks the status (up or
 down) of the underlying interface network device that the specific logging
 target netpoll is attached to.
 
 We then join this up with the just-introduced net_device notifier, and
 introduce NETDEV_UP and NETDEV_DOWN notifications. By disabling the target
 when the corresponding local interface is down, we save the overhead of
 unnecessarily disabling interrupts and calling into the netpoll stack in
 console-write().

Yuck. 

 +/*
 + * Why no net_dev_is_up() in netdevice.h? The kernel could lose a lot of
 + * weight if only netdevice.h had the good sense to export such a function.
 + * Oh well ...
 + */
 +static inline int net_dev_is_up(struct net_device *net_dev)
 +{
 + return ((net_dev-flags  IFF_UP) == IFF_UP);
 +}

Why editorialize? Why not just add this to netdevice.h?

 + if (nt-dev_status) {

Why not simply call net_dev_is_up?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -mm 4/9] netconsole: Introduce netconsole_netdev_notifier

2007-07-04 Thread Matt Mackall

On Wed, Jul 04, 2007 at 04:37:59PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [4/9] netconsole: Introduce netconsole_netdev_notifier
 
 To update fields of underlying netpoll structure at runtime on
 corresponding NETDEV_CHANGEADDR or NETDEV_CHANGENAME notifications.

It's not obvious that we want to do that.

I can have a host with DHCP that uses netconsole to log to a private
network address. This breaks that.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH -mm 6/9] netconsole: Update documentation for multiple target support

2007-07-04 Thread Matt Mackall

On Wed, Jul 04, 2007 at 04:38:09PM +0530, Satyam Sharma wrote:
 From: Satyam Sharma [EMAIL PROTECTED]
 
 [6/9] netconsole: Update documentation for multiple target support
 
 ... and add a few useful general purpose tips as well while we're at it.

The tips are fine and should go in their own patch at the beginning of
the series. The docs for multiple interfaces should go either in the
patch that adds that (ideally) or after.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 07/12] use a dynamic pool of sk_buffs to keep up with fast targets

2007-07-03 Thread Matt Mackall

On Mon, Jul 02, 2007 at 09:40:36PM -0700, David Miller wrote:
 From: Andrew Morton [EMAIL PROTECTED]
 Date: Mon, 2 Jul 2007 21:36:36 -0700

  My initial thought is that if there is a legitimate need for this
  new capability then it should be made available to other parts of
  the kernel rather than being private to the AEO driver.

 Absolutely.

 We even used to have something like this on a per-cpu basis but using
 generic SLAB is so much better for caching and NUMA that we got rid of
 that.

 Every sk_buff private quicklist pool implementation you
 see should essentially be NAK'd from the get go, it's
 meaningless and if it's really needed one should investigate
 why SKB allocations become such a problem instead of papering
 over the issue. :-)

This is in the VM write-back path. SLAB is insufficient to avoid
deadlock.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH -mm take5 1/7] marking __init

2007-06-13 Thread Matt Mackall

On Wed, Jun 13, 2007 at 07:25:55PM +0900, Keiichi KII wrote:
 From: Keiichi KII [EMAIL PROTECTED]
 
 This patch contains the following cleanups.
  - add __init for initialization functions(option_setup() and
init_netconsole()).
 
 Acked-by: Matt Mackall [EMAIL PROTECTED]
 Signed-off-by: Keiichi KII [EMAIL PROTECTED]
 Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]

Dave, please take this one. Not sure what to do about the rest, sigh.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned

2007-05-30 Thread Matt Mackall

On Wed, May 30, 2007 at 12:55:51PM -0700, David Miller wrote:
 From: Patrick McHardy [EMAIL PROTECTED]
 Date: Wed, 30 May 2007 20:42:32 +0200

  Stephen Hemminger wrote:
  Index: linux-2.6.22-rc-mm/net/sched/sch_generic.c
  ===
  --- linux-2.6.22-rc-mm.orig/net/sched/sch_generic.c  2007-05-24 
  11:16:03.0 -0700
  +++ linux-2.6.22-rc-mm/net/sched/sch_generic.c   2007-05-25 
  15:10:02.0 -0700
  @@ -224,7 +224,8 @@
   if (dev-tx_timeout) {
   if (dev-watchdog_timeo = 0)
   dev-watchdog_timeo = 5*HZ;
  -if (!mod_timer(dev-watchdog_timer, jiffies + 
  dev-watchdog_timeo))
  +if (!mod_timer(dev-watchdog_timer,
  +   round_jiffies(jiffies + 
  dev-watchdog_timeo)))
   dev_hold(dev);
   }
   }

  Please cc netdev on net patches.

  Again, I worry that if people set the watchdog timeout to, say, 0.1 
  seconds
  then they will get one second, which is grossly different.

  And if they were to set it to 1.5 seconds, they'd get 2.0 which is pretty
  significant, too.

   Alternatively, we could change to a timer that is pushed forward after 
   each
   TX, maybe using hrtimer and hrtimer_forward().  That way the timer would
   never run in normal case.

  It seems wasteful to add per-packet overhead for tx timeouts, which
  should be an exception. Do drivers really care about the exact
  timeout value? Compared to a packet transmission time its incredibly
  long anyways ..

 I agree, this change is absolutely rediculious and is just a blind
 cookie-cutter change made without consideration of what the code is
 doing and what it's requirements are.

What are you agreeing with, Dave?

Are you agreeing that it seems wasteful to add per-packet overhead?
This patch is not doing that.

A quick grep shows that most things are using multi-second timeouts
here. Of the ones that aren't, a number are using .4s, and many more
aren't even in units of HZ. Makes me wonder if the various boards
using 50ms are being overzealous.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] Make net watchdog timers 1 sec jiffy aligned

2007-05-30 Thread Matt Mackall

On Wed, May 30, 2007 at 03:29:39PM -0700, David Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 30 May 2007 17:10:39 -0500
 
  Are you agreeing that it seems wasteful to add per-packet overhead?
  This patch is not doing that.
 
 Yes, and I know that :-)

Is there a real reason for watchdog_timeo to be all over the map?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH repost] netpoll: trapping fix/cleanup

2007-04-28 Thread Matt Mackall

On Sat, Apr 28, 2007 at 04:56:23PM +0400, Sergei Shtylyov wrote:
 Hello.
 
 Matt Mackall wrote:
 
 CONFIG_NETPOLL_TRAP causes the TX queue controls to be completely 
 bypassed in
 the netpoll's trapped mode which easily causes overflows in the drivers 
 with
 short TX queues (most notably, in 8139too with its 4-deep queue).
 Make this option more sensible by only bypassing TX softirq wakeup and 
 remove
 CONFIG_NETPOLL_RX option completely since there is *no* code depending on 
 it.
 
 You've got two unrelated patches here, so that's an automatic NAK.
 
   Come on, killing a long ago no-op option doesn't worth the sepearte 
   patch. :-)
 
 I suppose we can kill the config option.
 
   I've even posted the refs to the commits introducing and killing the 
   #ifdef's.
 
 What did you test the NETPOLL_TRAP test with?
 
   KGDBoE (and maybe also netconsole -- don't remember already).

Ok, KGDBoE is a pretty good test here. Netconsole isn't.

Please resend as two separate patches and add:

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH repost] netpoll: trapping fix/cleanup

2007-04-27 Thread Matt Mackall

On Fri, Apr 27, 2007 at 11:44:00PM +0400, Sergei Shtylyov wrote:
 CONFIG_NETPOLL_TRAP causes the TX queue controls to be completely bypassed in
 the netpoll's trapped mode which easily causes overflows in the drivers with
 short TX queues (most notably, in 8139too with its 4-deep queue).
 Make this option more sensible by only bypassing TX softirq wakeup and remove
 CONFIG_NETPOLL_RX option completely since there is *no* code depending on it.

You've got two unrelated patches here, so that's an automatic NAK.

I suppose we can kill the config option.

What did you test the NETPOLL_TRAP test with?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH -mm take4 2/6] support multiple logging

2007-04-20 Thread Matt Mackall

On Fri, Apr 20, 2007 at 11:15:26AM -0700, Andrew Morton wrote:
 On Fri, 20 Apr 2007 18:51:13 +0900
 Keiichi KII [EMAIL PROTECTED] wrote:
 
   I started to do some cleanups and fixups here, but abandoned it when it 
   was
   all getting a bit large.
   
   Here are some fixes against this patch:
  
  I'm going to fix my patches by following your reviews and send new patches 
  on the LKML and the netdev ML in a few days.
  
 
 Well..  before you can finish this work we need to decide upon what the
 interface to userspace will be.
 
 - The miscdev isn't appropriate
 
 - netlink remains a possibility
 
 - Stephen suggests an ioctl against a socket and davem suggests socket
   options, but it's unclear to me how that socket will get bound to
   netconsole?

Yeah, that's a bit of a head-scratcher.

 either way, I agree with the overall thrust of this work: netconsole is
 useful in production environments, can become more useful and will need
 runtime configurability.
 
 
 I wonder if we're approaching this in the right way, however...
 
 At a high level, netconsole is just a flow of UDP packets between two
 machines.  The kernel already has rich and well-understood ways of creating
 and configuring such flows.
 
 So...  instead of creating a brand new way of configuring such a flow via
 sysfs and ioctl, could we instead create a flow using the existing
 mechanisms (presumably the socket API) and then transfer the information
 from that flow over to netconsole by some means??

We don't really have anything that corresponds to netpoll's
connections at higher levels.

I'm tempted to say we should make this work more like the dummy
network device. ie:

modprobe netconsole -o netcon1 [params]
modprobe netconsole -o netcon2 [params]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] netconsole hangs machine 2.6.20

2007-04-16 Thread Matt Mackall

[cc:ed to netdev]

On Sun, Apr 15, 2007 at 10:45:37PM -0700, Mike Mattie wrote:
 Hello,
 
 netconsole is hanging my box during IDE init.
 
 I am running 2.6.20.7, config is attached from /proc
 
 Without using netconsole the kernel boots fine. I am writing this message 
 from it. 
 
 When I do enable net-console I get from the linux banner to a couple of lines 
 after
 sanitize end on the logging client ( Mac OS X 10.4 , simple nc listener )
 
 The boot continues until hde (handled by the HPT366 driver, hdg is the last 
 drive )
 and then the kernel just hangs.

I don't really have a theory for that. What NIC are you using? Looks
like VIA Rhine? Is it sharing an IRQ with your IDE interface?

Please post your boot messages with debug on the kernel command
line.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH -mm take3 1/6][resend] marking __init

2007-03-20 Thread Matt Mackall

On Tue, Mar 20, 2007 at 09:24:18PM +0900, Keiichi KII wrote:
 From: Keiichi KII [EMAIL PROTECTED]
 
 This patch contains the following cleanups.
  - add __init for initialization functions(option_setup() and
init_netconsole()).
 
 Signed-off-by: Keiichi KII [EMAIL PROTECTED]
 Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
 ---
 Index: linux-mm/drivers/net/netconsole.c
 ===
 --- linux-mm.orig/drivers/net/netconsole.c
 +++ linux-mm/drivers/net/netconsole.c
 @@ -91,7 +91,7 @@ static struct console netconsole = {
   .write = write_msg
  };
  
 -static int option_setup(char *opt)
 +static int __init option_setup(char *opt)
  {
   configured = !netpoll_parse_options(np, opt);
   return 1;
 @@ -99,7 +99,7 @@ static int option_setup(char *opt)
  
  __setup(netconsole=, option_setup);
  
 -static int init_netconsole(void)
 +static int __init init_netconsole(void)
  {
   int err;

This is fine.

Acked-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall

On Tue, Mar 20, 2007 at 09:31:58AM -0700, Jeremy Fitzhardinge wrote:
 Linus Torvalds wrote:
  On Tue, 20 Mar 2007, Eric W. Biederman wrote:

  If that is the case.  In the normal kernel what would
  the the oops, we got an interrupt code do?
  I assume it would leave interrupts disabled when it returns?
  Like we currently do with the delayed disable of normal interrupts?
  
 
  Yeah, disable interrupts, and set a flag that the fake sti can test, and 
  just return without doing anything.
 
  (You may or may not also need to do extra work to Ack the hardware 
  interrupt etc, which may be irq-controller specific. Once the CPU has 
  accepted the interrupt, you may not be able to just leave it dangling)

 
 So it would be something like:
 
 pda.intr_mask = 1;/* disable interrupts */
 ...
 pda.intr_mask = 0;/* enable interrupts */
 if (xchg(pda.intr_pending, 0))   /* check pending */
   asm(sti); /* was pending; isr left cpu interrupts masked 
 */

I don't know that you need an xchg there. If you're still on the same
CPU, it should all be nice and causal even across an interrupt handler.
So it could be:

   pda.intr_mask = 0; /* intr_pending can't get set after this */
   if (unlikely(pda.intr_pending)) {
  pda.intr_pending = 0;
  asm(sti);
   }

(This would actually need a C barrier, but I'll ignore that as this'd
end up being asm...)

But other interesting things could happen. If we never did a real CLI
and we get preempted and switched to another CPU between clearing
intr_mask and checking intr_pending, we get a little confused. 

But perhaps that doesn't matter because we'd by definition have no
pending interrupts on either processor?

Is it expensive to do an STI if interrupts are already enabled?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall

On Tue, Mar 20, 2007 at 03:08:19PM -0800, Zachary Amsden wrote:
 Matt Mackall wrote:
 I don't know that you need an xchg there. If you're still on the same
 CPU, it should all be nice and causal even across an interrupt handler.
 So it could be:
 
pda.intr_mask = 0; /* intr_pending can't get set after this */
   
 
 Why not?  Oh, I see.  intr_mask is inverted form of EFLAGS_IF.

It's not even that. There are two things that can happen:

case 1:

  intr_mask = 1;
interrupt occurs and is deferred
  intr_mask = 0;
  /* intr_pending is already set and CLI is in effect */
  if(intr_pending)

case 2:

  intr_mask = 1;
  intr_mask = 0;
interrupt occurs and is processed
  /* intr_pending remains cleared */
  if(intr_pending)

As this is all about local interrupts, it's all on a single CPU and
out of order issues aren't visible..
 
 (This would actually need a C barrier, but I'll ignore that as this'd
 end up being asm...)

..unless the compiler is doing the reordering, of course.

 But other interesting things could happen. If we never did a real CLI
 and we get preempted and switched to another CPU between clearing
 intr_mask and checking intr_pending, we get a little confused. 
 
 I think Jeremy's idea was to have interrupt handlers leave interrupts 
 disabled on exit if pda.intr_mask was set.  In which case, they would 
 bypass all work and we could never get preempted.

I was actually worrying about the case where the interrupt came in
late. But I don't think it's a problem there either.

 I don't think leaving 
 hardware interrupts disabled for such a long time is good though.

It can only be worse than the current situation by the amount of time
it takes to defer an interrupt once. On average, it'll be a lot
better as most critical sections are -tiny-.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: netconsole system freeze when cable unplugged

2007-03-10 Thread Matt Mackall

On Fri, Mar 09, 2007 at 09:42:43PM +0100, Francois Romieu wrote:
 Simon Arlott [EMAIL PROTECTED] :
  When I unplug the cable the system just stops responding to anything, 
  at all. No message is printed to the console when the cable is plugged 
  back in.
 
 rtl8139_interrupt (spin_lock(tp-lock))
 - rtl8139_weird_interrupt
- rtl_check_media
   - mii_check_media (printk(KERN_INFO %s: link down\n, ...))
  [netpoll stuff here]
  - rtl8139_poll_controller
 - rtl8139_interrupt
*deadlock*

Probably the best thing that can be done here aside from moving
mii_check_media outside interrupt handling is to add
netpoll_plug/unplug(dev) methods that push work off to netpoll's queue
for drivers that have a netpoll recursion problem.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC: net-2.6.20 patch] remove unused exports

2007-03-07 Thread Matt Mackall

On Wed, Mar 07, 2007 at 11:40:59PM +0100, Adrian Bunk wrote:
 This patch removes the following not or no longer used exports:
 - drivers/char/random.c: secure_tcp_sequence_number

This part looks reasonable.

Acked-by: Matt Mackall [EMAIL PROTECTED]
-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 10:58:13AM -0800, Greg KH wrote:
 Ok, how about the following patch.  Is it acceptable to everyone?
 
 -   If you are using a distro that was released in 2006 or later,
 -   it should be safe to say N here.
 +   If you are using an OpenSuSE, Gentoo, Ubuntu, or Fedora
 +   release from 2007 or later, it should be safe to say N here.
 +
 +   If you are using Debian or other distros that are slow to
 +   update HAL, please say Y here.

What HAL version do you think Debian ought to have, pray tell? And
what the hell version do those other distros have?

The last HAL release was 0.5.8 on 11-Sep-2006. It showed up in
Debian/unstable on 2-Oct. There have been six Debian bugfix releases,
the most recent on 12-Feb.

http://people.freedesktop.org/~david/dist/
http://packages.debian.org/changelogs/pool/main/h/hal/hal_0.5.8.1-6.1/changelog

The last NetworkManager is 0.6.4 released 13-Jul-2006. It showed up in
Debian/unstable on 8-Aug. There have been five bugfix releases, the
most recent on 30-Nov.

http://ftp.gnome.org/pub/GNOME/sources/NetworkManager/0.6/
http://packages.debian.org/changelogs/pool/main/n/network-manager/network-manager_0.6.4-6/changelog

Debian is NOT the problem.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 04:46:09PM +0100, Tomasz Torcz wrote:
  That's not the point. The point is that Debian/unstable as of _this
  morning_ doesn't work. For reference, I'm running both the latest
  releases of both hal (0.5.8.1-6.1) and network-manager (0.6.4-6). And
  there are people telling me I need a copy of HAL out of git that
  hasn't even been released for Debian to package. Debian isn't the
  problem here.
 
   hal 0.5.9-rc1 (released, not from git) should work. It will be
 problably released soon and picked by sane distributions. Debian is very
 irritating corner case.

Presumably the -rc1 stands for release candidate. Which means not
yet released. And when did it show up? 04-Mar-2007 at 18:31. That's
right, YESTERDAY. Almost a full month after Greg's commit.

For the last time, DEBIAN IS NOT THE PROBLEM.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 04:07:22PM -0800, Greg KH wrote:
 On Tue, Mar 06, 2007 at 12:40:52AM +0100, Adrian Bunk wrote:
  On Mon, Mar 05, 2007 at 10:58:13AM -0800, Greg KH wrote:
   
   Ok, how about the following patch.  Is it acceptable to everyone?
   
   thanks,
   
   greg k-h
   
   ---
init/Kconfig |   13 +++--
1 file changed, 11 insertions(+), 2 deletions(-)
   
   --- gregkh-2.6.orig/init/Kconfig
   +++ gregkh-2.6/init/Kconfig
   @@ -290,8 +290,17 @@ config SYSFS_DEPRECATED
   that belong to a class, back into the /sys/class heirachy, in
   order to support older versions of udev.

   -   If you are using a distro that was released in 2006 or later,
   -   it should be safe to say N here.
   +   If you are using an OpenSuSE, Gentoo, Ubuntu, or Fedora
   +   release from 2007 or later, it should be safe to say N here.
   +
   +   If you are using Debian or other distros that are slow to
   +   update HAL, please say Y here.
  ...
  
  The sane solution seems to be to enable SYSFS_DEPRECATED unconditionally 
  for all users, and schedule it's removal for mid-2008 (or later).
  
  12 months after the first _release_ of a HAL that can live without seems 
  to be the first time when we can consider getting rid of it, since all 
  distributions with at least one release a year should ship it by then.
  
  Currently, SYSFS_DEPRECATED is only a trap for users.
 
 Huh?
 
 No, again, I've been using this just fine for about 6 months now.
 
 And what about all of the servers not using HAL/NetworkManager?
 And what about all of the embedded systems not using either?
 
 So to not allow this to be turned off by people who might want to (we
 want this for OpenSuSE 10.3, and Fedora 7 also will want this, as will
 other distros released this year), is pretty heavy-handed.
 
 It also will work in OpenSuSE 10.2 which is already released, and I
 think Fedora 6, but I've only limited experience with these.
 
 Oh, and Gentoo works just fine, and has been for the past 6 months.

 I would just prefer to come up with an acceptable set of wording that
 will work to properly warn people.
 
 I proposed one such wording which some people took as a slam against
 Debian, which it really was not at all.
 
 Does someone else want to propose some other wording instead?

Back up a bit. Let's review:

Problem: NetworkManager stopped working with my ipw2200 on Debian/unstable

Theory A: It broke because I'm not running an as-yet-unreleased HAL.

 Then we should revert the patch pronto because it's an unqualified
 regression.

Theory B: It broke because I'm not running relatively recent HAL.

 By all accounts I'm running the latest and greatest HAL and Network
 Manager, more than recent enough to work.

Theory C: It broke because I've got some goofy config.

 My setup passes no arguments to either. The HAL config file is
 completely bare-bones and there's no sign of any configuration files
 for Network Manager.

Theory D: It broke for some nebulous Debian-related reason.

 That's a bunch of unhelpful crap.

Can we come up with an actual theory for what's wrong with my setup, please?
Like, perhaps:

Theory E: There's some undiagnosed new breakage that this introduces
that no else hit until it went into mainline.

 Hmmm, this one sounds more promising.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
 
 Wait, have confirmed that if you enable this config option,
 NetworkManager starts back up again and works properly?

Yep, probably should have mentioned that.

 If so, can you disable the option and strace it to see what program is
 trying to access what?  That will put the
 HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
 quickly :)

Did that a few hours ago, got a very large dump from both programs. No
smoking guns to my eye, but I'll send you the logs later.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
 If so, can you disable the option and strace it to see what program is
 trying to access what?  That will put the
 HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
 quickly :)

Ok, I've got straces of both good and bad (5M each). Filtered out
random pointer values and the like, diffed, and filtered for /sys/,
and the result's still 1.5M. What should I be looking for?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED

2007-03-05 Thread Matt Mackall

On Mon, Mar 05, 2007 at 08:03:50PM -0800, Greg KH wrote:
 On Mon, Mar 05, 2007 at 09:39:47PM -0600, Matt Mackall wrote:
  On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
   If so, can you disable the option and strace it to see what program is
   trying to access what?  That will put the
   HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
   quickly :)
  
  Ok, I've got straces of both good and bad (5M each). Filtered out
  random pointer values and the like, diffed, and filtered for /sys/,
  and the result's still 1.5M. What should I be looking for?
 
 Failures when trying to read from /sys/class/net/
 
 Or opening the directory and iterating over the subdirs in there.  Or
 something like that.
 
 But the /sys/class/net/ stuff should hopefully help narrow it down.

Works:

6857  open(/sys/class/net,
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
6857  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
6857  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
6857  getdents64(13, /* 5 entries */, 4096) = 120
6857  readlink(/sys/class/net/eth1, 0x80a2450, 256) = -1 EINVAL
(Invalid argument)
6857  readlink(/sys/class/net/eth1/device,
../../../devices/pci:00/:00:1e.0/:02:02.0, 256) = 53
6857  readlink(/sys/class/net/lo, 0x80a2450, 256) = -1 EINVAL
(Invalid argument)
6857  readlink(/sys/class/net/lo/device, 0x80a2450, 256) = -1 ENOENT
(No such
file or directory)
6857  readlink(/sys/class/net/eth0, 0x80a2450, 256) = -1 EINVAL
(Invalid argument)
6857  readlink(/sys/class/net/eth0/device,
../../../devices/pci:00/:00:1e.0/:02:01.0, 256) = 53
6857  getdents64(13, /* 0 entries */, 4096) = 0
6857  close(13) = 0

Breaks:

3620  open(/sys/class/net,
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
3620  fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
3620  fcntl64(13, F_SETFD, FD_CLOEXEC)  = 0
3620  getdents64(13, /* 5 entries */, 4096) = 120
3620  readlink(/sys/class/net/eth1,
../../devices/pci:00/:00:1e.0/00\00:02:02.0/eth1, 256) = 55
3620
readlink(/sys/devices/pci:00/:00:1e.0/:02:02.0/eth1/device,
0x809e910, 256) = -1 ENOENT (No such file or directory)
3620  readlink(/sys/class/net/lo, ../../devices/virtual/net/lo,
256) = 28
3620  readlink(/sys/devices/virtual/net/lo/device, 0x809e960, 256) =
-1 ENOEN\T (No such file or directory)
3620  readlink(/sys/class/net/eth0,
../../devices/pci:00/:00:1e.0/00\00:02:01.0/eth0, 256) = 55
3620
readlink(/sys/devices/pci:00/:00:1e.0/:02:01.0/eth0/device,
0x809e960, 256) = -1 ENOENT (No such file or directory)
3620  getdents64(13, /* 0 entries */, 4096) = 0
3620  close(13) = 0

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-04 Thread Matt Mackall

Recent kernels are having troubles with wireless for me. Two seemingly
related problems:

a) NetworkManager seems oblivious to the existence of my IPW2200
b) Manual iwconfig waits for 60s and then reports:

Error for wireless request Set Encode (8B2A) :
SET failed on device eth1 ; Operation not supported.

During this time, my keyboard in X is unresponsive, but everything
else seems to be functioning properly. Queued keypresses eventually
show up. Alt-sysrq-w gives:

ieee80211_crypt: registered algorithm 'WEP'
SysRq : Show Blocked State

 freesibling
  task PCstack   pid father child younger older
events/0  D C0102D3C 0 4  1 5 3
(L-TLB)
   c1d0bf1c 0046 c1d0ac20 c0102d3c  c1d0aa70 0022
   000a
   c1d0aa70 ca8ed618 0034 0cd3 c1d0ab7c 0287 0002
   f581f040
   0002 c1d0bf34 0246 f7b5cb04 c1d0aa70 c038224e c0102c3a
   
Call Trace:
 [c0102d3c] __switch_to+0x11b/0x143
 [c038224e] __mutex_lock_slowpath+0xfb/0x1e2
 [c0102c3a] __switch_to+0x19/0x143
 [f994a291] ipw_bg_link_down+0x19/0xbd [ipw2200]
 [f994a278] ipw_bg_link_down+0x0/0xbd [ipw2200]
 [c0122604] run_workqueue+0x97/0x156
 [c0122bc7] worker_thread+0x105/0x12e
 [c0112399] default_wake_function+0x0/0xc
 [c0122ac2] worker_thread+0x0/0x12e
 [c01251f7] kthread+0xa0/0xc9
 [c0125157] kthread+0x0/0xc9
 [c0103f87] kernel_thread_helper+0x7/0x10
 ===
ipw2200/0 D 0020 0  1985  6  2260  1983
(L-TLB)
   f7981f24 0046 0001 0020 c1cdf8c0  
   000a
   f7d09030 e1fdc8c3 0034 093a f7d0913c 0086 0020
   f7c4a740
   0086 f7981f3c 0246 f7b5cb04 f7d09030 c038224e f7d09030
   c0496550
Call Trace:
 [c038224e] __mutex_lock_slowpath+0xfb/0x1e2
 [c0381233] __sched_text_start+0x4b3/0x56b
 [f99445c6] ipw_bg_gather_stats+0x0/0x27 [ipw2200]
 [f99445dd] ipw_bg_gather_stats+0x17/0x27 [ipw2200]
 [c0122604] run_workqueue+0x97/0x156
 [c0122bc7] worker_thread+0x105/0x12e
 [c0112399] default_wake_function+0x0/0xc
 [c0122ac2] worker_thread+0x0/0x12e
 [c01251f7] kthread+0xa0/0xc9
 [c0125157] kthread+0x0/0xc9
 [c0103f87] kernel_thread_helper+0x7/0x10
 ===
ieee80211_crypt_wep: could not allocate crypto API arc4
eth1: could not initialize WEP: load module ieee80211_crypt_wep
ADDRCONF(NETDEV_UP): eth1: link is not ready

A second attempt to enable WEP via iwconfig succeeds and network
connectivity is normal. However, NetworkManager still ignores the
device at this point.

Bisect with Mercurial points to this patch:

$ hg bisect bad
The first bad revision is:
changeset:   46985:f701b96bb2f7
user:Greg Kroah-Hartman [EMAIL PROTECTED]
date:Wed Feb 07 10:37:11 2007 -0800
summary: Network: convert network devices to use struct device
instead of class_device

which corresponds to 43cb76d91ee85f579a69d42bc8efc08bac560278 in git.

-- 
 Love the dolphins, she advised him. Write by W.A.S.T.E.. 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-04 Thread Matt Mackall

On Mon, Mar 05, 2007 at 12:39:24AM +0100, Johannes Berg wrote:
 [adding linux-wireless to CC]
 
 On Sun, 2007-03-04 at 16:08 -0600, Matt Mackall wrote:
  Recent kernels are having troubles with wireless for me. Two seemingly
  related problems:
 
 I don't think they are related actually.
 
  a) NetworkManager seems oblivious to the existence of my IPW2200
 
 This is due to the recent sysfs restructuring I think. IIRC the fix is
 to upgrade hal to a current git version.

If that's the cause, the fix is to back out whatever was done to break
userspace. Breaking userspace is not ok. Upgrading from 2.6.x to
2.6.x+1 should not entail replacing substantial parts of userspace,
especially with NOT-EVEN-FRAKKING-RELEASED-YET CODE. 

I will try a new HAL when it shows up in Debian/unstable and not a
moment sooner.

  b) Manual iwconfig waits for 60s and then reports:
 
 That one's strange.
 
  A second attempt to enable WEP via iwconfig succeeds and network
  connectivity is normal. However, NetworkManager still ignores the
  device at this point.
 
 I'd think it's a ipw bug but I have no idea if that was even touched
 during this time.
 
  Bisect with Mercurial points to this patch:
  
  $ hg bisect bad
  The first bad revision is:
  changeset:   46985:f701b96bb2f7
  user:Greg Kroah-Hartman [EMAIL PROTECTED]
  date:Wed Feb 07 10:37:11 2007 -0800
  summary: Network: convert network devices to use struct device
  instead of class_device
  
  which corresponds to 43cb76d91ee85f579a69d42bc8efc08bac560278 in git.
 
 Yup, sysfs breakage/hal stuff. Can you try with a recent hal? And maybe
 try to bisect the iwconfig stop thing if you've got enough time...

Will double-check the iwconfig tests. It's been masked by
NetworkManager for a while.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-04 Thread Matt Mackall

On Sun, Mar 04, 2007 at 05:16:25PM -0800, Greg KH wrote:
 On Sun, Mar 04, 2007 at 04:08:57PM -0600, Matt Mackall wrote:
  Recent kernels are having troubles with wireless for me. Two seemingly
  related problems:
  
  a) NetworkManager seems oblivious to the existence of my IPW2200
  b) Manual iwconfig waits for 60s and then reports:
  
  Error for wireless request Set Encode (8B2A) :
  SET failed on device eth1 ; Operation not supported.
 
 Do you have CONFIG_SYSFS_DEPRECATED enabled?  If not, please do as that
 will keep you from having to change any userspace code.

No, it's disabled. Will test once I'm done tracking down the iwconfig
problem. From the help text for SYSFS_DEPRECATED:

  If you are using a distro that was released in 2006 or
  later, it should be safe to say N here.

If we need an as-yet-unreleased HAL without it, I would say the above
should be changed to 2008 or so. If Debian actually cuts a release in
the next few months, you might make that 2010.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Recent wireless breakage (ipw2200, iwconfig, NetworkManager)

2007-03-04 Thread Matt Mackall

On Sun, Mar 04, 2007 at 11:02:48PM -0800, Greg KH wrote:
 On Mon, Mar 05, 2007 at 12:42:29AM -0600, Matt Mackall wrote:
  On Sun, Mar 04, 2007 at 05:16:25PM -0800, Greg KH wrote:
   On Sun, Mar 04, 2007 at 04:08:57PM -0600, Matt Mackall wrote:
Recent kernels are having troubles with wireless for me. Two seemingly
related problems:

a) NetworkManager seems oblivious to the existence of my IPW2200
b) Manual iwconfig waits for 60s and then reports:

Error for wireless request Set Encode (8B2A) :
SET failed on device eth1 ; Operation not supported.
   
   Do you have CONFIG_SYSFS_DEPRECATED enabled?  If not, please do as that
   will keep you from having to change any userspace code.
  
  No, it's disabled. Will test once I'm done tracking down the iwconfig
  problem. From the help text for SYSFS_DEPRECATED:
  
If you are using a distro that was released in 2006 or
later, it should be safe to say N here.
  
  If we need an as-yet-unreleased HAL without it, I would say the above
  should be changed to 2008 or so. If Debian actually cuts a release in
  the next few months, you might make that 2010.
 
 Well, just because Debian has such a slow release cycle, should the rest
 of the world be forced to follow suit?  :)
 
 When I originally wrote that, I thought Debian would have already done
 their release, my mistake...

That's not the point. The point is that Debian/unstable as of _this
morning_ doesn't work. For reference, I'm running both the latest
releases of both hal (0.5.8.1-6.1) and network-manager (0.6.4-6). And
there are people telling me I need a copy of HAL out of git that
hasn't even been released for Debian to package. Debian isn't the
problem here.

If it is indeed the case that HAL needs to be upgraded here, the clock
on deprecating these features can't even start counting until a usable
HAL version is released. And then you need to give it at least a year
after that before you can start recommending people disable it.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH 2.6.19 take2 1/5] marking __init and remove drop initialization

2006-12-21 Thread Matt Mackall

On Thu, Dec 21, 2006 at 07:03:23PM +0900, Keiichi KII wrote:
  - remove drop initialization in the netpoll structure.

Why?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] e1000: fixes for 7.1.9-k2

2006-07-14 Thread Matt Mackall

On Fri, Jul 14, 2006 at 04:24:52PM -0700, Kok, Auke wrote:
 
 Jeff,
 
 Please pull the following patches from
 
 git://lost.foo-projects.org/~ahkok/git/netdev-2.6 upstream-fixes-jgarzik
 (based on 22e1170310ec6afa41e0dc7ac9dfac735d82dcab)
 
 To receive the following fixes for e1000:
 
 [01]: Redo netpoll fix

Have I seen this patch? Is there a way to browse it?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] e1000: fixes for 7.1.9-k2

2006-07-14 Thread Matt Mackall

On Fri, Jul 14, 2006 at 05:15:18PM -0700, Jeff Kirsher wrote:
 Matt,
 The patch you are referring to is in patch set emails that Auke sent out.

Doesn't look like it made it here.

 I have also copied the patch below:

I'm going to assume all the tab damage here is an artifact. Otherwise,
looks good.

Signed-off-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll: break recursive loop in netpoll rx path

2006-06-12 Thread Matt Mackall

On Mon, Jun 12, 2006 at 11:40:29AM -0400, Neil Horman wrote:
 Hey there-
   the netpoll system currently has a rx to tx path via:
 netpoll_rx
  __netpoll_rx
   arp_reply
netpoll_send_skb
 dev-hard_start_tx
 
   This rx-tx loop places network drivers at risk of
 inadvertently causing a deadlock or BUG halt by recursively trying
 to acquire a spinlock that is used in both their rx and tx paths
 (this problem was origionally reported to me in the 3c59x driver,
 which shares a spinlock between the boomerang_interrupt and
 boomerang_start_xmit routines).

Grumble.

 This patch breaks this loop, by queueing arp frames, so that they
 can be responded to after all receive operations have been
 completed. Tested by myself and the reported with successful
 results.

Tested how? kgdb?

 + if (likely(npi))
 + while ((skb = skb_dequeue(npi-arp_tx)) != NULL)
 + arp_reply(skb);
 +

Assignment inside tests is frowned upon. I'd prefer pulling this out
to its own function too.


Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] netpoll: don't spin forever sending to stopped queues

2006-06-11 Thread Matt Mackall

On Thu, Jun 08, 2006 at 07:15:50PM -0700, Jeremy Fitzhardinge wrote:
 Matt Mackall wrote:
 That's odd. Netpoll holds a reference to the device, of course, but so
 does a normal up interface. So that shouldn't be the problem.
 Another possibility is that outgoing packets from printks in the
 driver are causing difficulty. Not sure what can be done about that.
   
 Here's a patch.  I haven't tested it beyond compiling it, and I don't 
 know if it is actually correct.  In this case, it seems pointless to 
 spin waiting for an even which will never happen.  Should 
 netif_poll_disable() cause netpoll_send_skb() (or something) to not even 
 bother trying to send?  netif_poll_disable seems mysteriously simple to me.
 
J

Did this work for you at all?

 When transmitting a skb in netpoll_send_skb(), only retry a limited
 number of times if the device queue is stopped.

Where limited = once?

 Signed-off-by: Jeremy Fitzhardinge [EMAIL PROTECTED]
 
 diff -r aac813f54617 net/core/netpoll.c
 --- a/net/core/netpoll.c  Wed Jun 07 14:53:40 2006 -0700
 +++ b/net/core/netpoll.c  Thu Jun 08 19:00:29 2006 -0700
 @@ -280,15 +280,10 @@ static void netpoll_send_skb(struct netp
* network drivers do not expect to be called if the queue is
* stopped.
*/
 - if (netif_queue_stopped(np-dev)) {
 - np-dev-xmit_lock_owner = -1;
 - spin_unlock(np-dev-xmit_lock);
 - netpoll_poll(np);
 - udelay(50);
 - continue;
 - }
 -
 - status = np-dev-hard_start_xmit(skb, np-dev);
 + status = NETDEV_TX_BUSY;
 + if (!netif_queue_stopped(np-dev))
 + status = np-dev-hard_start_xmit(skb, np-dev);
 +
   np-dev-xmit_lock_owner = -1;
   spin_unlock(np-dev-xmit_lock);
 
 
 

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Using netconsole for debugging suspend/resume

2006-06-09 Thread Matt Mackall

On Fri, Jun 09, 2006 at 07:50:25AM +0200, Andi Kleen wrote:
 On Friday 09 June 2006 07:23, David Miller wrote:
  From: Auke Kok [EMAIL PROTECTED]
  Date: Thu, 08 Jun 2006 22:13:48 -0700

   netconsole should retry. There is no timeout programmed here since that 
   might
   lose important information, and you rather want netconsole to survive an 
   odd
   unplugged cable then to lose vital debugging information when the system 
   is
   busy for instance. (losing link will cause the interface to be down and 
   thus
   the queue to be stopped)

  I completely disagree that netpoll should loop when the ethernet
  cable is plugged out. 

 Currently it is a bit dumb and doesn't distingush the various cases
 well.

 I submitted a patch to loop to be a bit more clever at some point. It can be 
 still
 found in the netdev archives.

Agreed that timeouts should happen.

IIRC, the trouble with your patch was that it a) timed out on far too
short a timescale and b) locked up on my box. Unfortunately, so did my
own patch, which made timeouts approximately 1ms.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] e1000: fix netpoll with NAPI

2006-06-07 Thread Matt Mackall

On Wed, Jun 07, 2006 at 11:05:22AM -0400, Neil Horman wrote:
   
   Matt, any ideas on this?
  
  Not at the moment.
 
 how about this for a solution?  It doesn't make netpoll any more robust, but I
 think in the interests of efficiency it would be fair to require that, when
 netpolled, a driver must receive frames on the same net device for which it 
 was
 polled.  With this patch we detect that condition and handle it accordingly in
 e1000_intr.  This eliminates the need for us to call the clean_rx method from
 the poll_controller when napi is configured, instead allowing the poll method 
 to
 be called from napi_poll, as the netpoll model currently does.  This fixes the
 netdump regression, and eliminates the layering violation and the potential 
 race
 that we've been discussing.  I've just tested it with netdump here and it 
 works
 quite well.
 
 Thoughts appreciated.

This looks pretty reasonable, mostly from the perspective that it
doesn't put any further ugliness in netpoll. We might want to add a
comment somewhere in netpoll of the new rule we're now observing.
I'll let the e1000 guys comment on the particulars of the driver change.

 Signed-off-by: Neil Horman [EMAIL PROTECTED]
Signed-off-by: Matt Mackall [EMAIL PROTECTED]

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] drivers/char/random.c: unexport secure_ipv6_port_ephemeral

2006-04-09 Thread Matt Mackall

On Sun, Apr 09, 2006 at 05:58:22PM +0200, Adrian Bunk wrote:
 This patch removes the unused EXPORT_SYMBOL(secure_ipv6_port_ephemeral).

 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
Acked-by: Matt Mackall [EMAIL PROTECTED]

Adrian appears to be correct that this doesn't break modular ipv6.
-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: netconsole: no IP address for eth0, aborting

2006-03-14 Thread Matt Mackall

On Sun, Mar 12, 2006 at 04:27:51PM -0800, Randy.Dunlap wrote:
 
 hardware: e1000 NIC in ThinkPad T42
 kernel:  2.6.16-rc5-mm3
 
 I always get $subject message.  I changed the delay in
 net/core/netpoll.c from 4 seconds to 9 seconds, but it
 doesn't matter, the e1000 always finds Link is Up immediately
 after $subject message.
 
 I suppose this is a consequence of e1000 using a workqueue for
 some of its init code.  Maybe it would work better on an MP
 machine instead of UP (giving e1000 a CPU to init on?).
 I guess I'll have to switch from netconsole built-in to use a
 loadable module instead.  That could work for some non-kernel-init
 debugging, but this really needs to work for kernel init-time
 debugging as well IMO.

That's an odd message. Are you specifying an IP address on the command
line?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: netconsole: no IP address for eth0, aborting

2006-03-14 Thread Matt Mackall

On Tue, Mar 14, 2006 at 09:31:08AM -0800, Randy.Dunlap wrote:
 On Mon, 13 Mar 2006 22:29:12 -0600 Matt Mackall wrote:
 
  On Sun, Mar 12, 2006 at 04:27:51PM -0800, Randy.Dunlap wrote:
   
   hardware: e1000 NIC in ThinkPad T42
   kernel:  2.6.16-rc5-mm3
   
   I always get $subject message.  I changed the delay in
   net/core/netpoll.c from 4 seconds to 9 seconds, but it
   doesn't matter, the e1000 always finds Link is Up immediately
   after $subject message.
   
   I suppose this is a consequence of e1000 using a workqueue for
   some of its init code.  Maybe it would work better on an MP
   machine instead of UP (giving e1000 a CPU to init on?).
   I guess I'll have to switch from netconsole built-in to use a
   loadable module instead.  That could work for some non-kernel-init
   debugging, but this really needs to work for kernel init-time
   debugging as well IMO.
  
  That's an odd message. Are you specifying an IP address on the command
  line?
 
 Yes, I am specifying a target IP address and a target MAC address.
 Is that OK or should I not specify the IP address?

That's fine, but you also need to specify a _source_ IP address. If
the interface is up and configured, we can use its IP address, but
that obviously doesn't work at boot time.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Matt Mackall

On Wed, Dec 14, 2005 at 09:55:45AM -0800, Sridhar Samudrala wrote:
 On Wed, 2005-12-14 at 10:22 +0100, Andi Kleen wrote:
   I would appreciate any feedback or comments on this approach.
  
  Maybe I'm missing something but wouldn't you need an own critical
  pool (or at least reservation) for each socket to be safe against deadlocks?
  
  Otherwise if a critical sockets needs e.g. 2 pages to finish something
  and 2 critical sockets are active they can each steal the last pages
  from each other and deadlock.
 
 Here we are assuming that the pre-allocated critical page pool is big enough
 to satisfy the requirements of all the critical sockets.

Not a good assumption. A system can have between 1-1000 iSCSI
connections open and we certainly don't want to preallocate enough
room for 1000 connections to make progress when we might only have one
in use.

I think we need a global receive pool and per-socket send pools.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism

2005-12-14 Thread Matt Mackall

On Wed, Dec 14, 2005 at 08:30:23PM -0800, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Wed, 14 Dec 2005 19:39:37 -0800

  I think we need a global receive pool and per-socket send pools.

 Mind telling everyone how you plan to make use of the global receive
 pool when the allocation happens in the device driver and we have no
 idea which socket the packet is destined for?  What should be done for
 non-local packets being routed?  The device drivers allocate packets
 for the entire system, long before we know who the eventually received
 packets are for.  It is fully anonymous memory, and it's easy to
 design cases where the whole pool can be eaten up by non-local
 forwarded packets.

There needs to be two rules:

iff global memory critical flag is set
- allocate from the global critical receive pool on receive
- return packet to global pool if not destined for a socket with an
  attached send mempool

I think this will provide the desired behavior, though only
probabilistically. That is, we can fill the global receive pool with
uninteresting packets such that we're forced to drop critical ACKs,
but the boring packets will eventually be discarded as we walk up the
stack and we'll eventually have room to receive retried ACKs.

 I truly dislike these patches being discussed because they are a
 complete hack, and admittedly don't even solve the problem fully.  I
 don't have any concrete better ideas but that doesn't mean this stuff
 should go into the tree.

Agreed. I'm fairly convinced a full fix is doable, if you make a
couple assumptions (limited fragmentation), but will unavoidably be
less than pretty as it needs to cross some layers.

 I think GFP_ATOMIC memory pools are more powerful than they are given
 credit for.  There is nothing preventing the implementation of dynamic
 GFP_ATOMIC watermarks, and having critical socket behavior kick in
 in response to hitting those water marks.

There are two problems with GFP_ATOMIC. The first is that its users
don't pre-state their worst-case usage, which means sizing the pool to
reliably avoid deadlocks is impossible. The second is that there
aren't any guarantees that GFP_ATOMIC allocations are actually
critical in the needed-to-make-forward-VM-progress sense or will be
returned to the pool in a timely fashion.

So I do think we need a distinct pool if we want to tackle this
problem. Though it's probably worth mentioning that Linus was rather
adamantly against even trying at KS.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] move some code to net/ipx/af_ipx.c

2005-11-18 Thread Matt Mackall

On Fri, Nov 18, 2005 at 06:22:52AM +0100, Adrian Bunk wrote:
  
  This patch isn't bad, but looking closer we could move the contents of 
  p8023.c as well as the contents of at least p8022.c and pe2.c into 
  af_ipx.c.
  
  Is the contents of any of these three files expected to be used
  outside IPX (closest candidate would be appletalk)?
 
 Below is a patch implementing what I was thinking of.

Looks reasonable.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] netpoll is unable to handle skb's using packet split

2005-11-14 Thread Matt Mackall

On Mon, Nov 14, 2005 at 09:41:30PM -0800, David S. Miller wrote:
 From: David S. Miller [EMAIL PROTECTED]
 Date: Mon, 14 Nov 2005 21:39:22 -0800 (PST)

  From: Matt Mackall [EMAIL PROTECTED]
  Date: Mon, 14 Nov 2005 21:23:58 -0800

   What is packet split in this context?

  It's a mode of buffering used by the e1000 driver.

 BTW, the issue is that in packet split mode, the e1000 driver is
 feeding paged based non-linear SKBs into the stack on receive which is
 completely legal but aparently netpoll or something parsing netpoll RX
 packets does not handle it properly.

The bug is in netpoll. It's non-linear ignorant. We probably can't
call skb_linearize because it wants to kmalloc, but we probably can
follow the fragment. Or, worst case, we can manually linearize into an
SKB in our private pool.

Can we make any assumptions about the size and position of fragments.
For instance, will the first N data bytes of a UDP packet all be in
the same fragment?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-07 Thread Matt Mackall

On Tue, Sep 06, 2005 at 09:02:42PM -0700, Eugene Surovegin wrote:
 I don't quite understand, you _already_ have deferred processing in 
 netpoll (btw, how good this will work with kgdboe?).

Yes, it exists as a last ditch fallback for cases where we can detect
deadlock. And it works not at all for kgdboe. Fortunately the fallback
case rarely coincides with a breakpoint unless you're trying to debug
the network stack itself (which is asking for trouble).

 If some drivers cannot handle *additional* restriction (nowhere
 documented, btw), ok, let's have a fallback mode for them, which you
 _already_ have.

If we have to use the fallback mode all the time on a particular device,
netpoll has no value on that device. It can't catch the oopses you can't
already catch most of with syslogd, and it can't be used for kgdb.
At that point, why bother?

 It's amazing, frankly, you insist that this feature works just fine, 

I have in fact said since the first message in the thread that you
guys are right, I'm breaking the rules. But I also claim it works fine
for some hardware. These are not incompatible statements. The second
is a reminder that this stuff is actually useful to people in its
current form and that we shouldn't break that.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-07 Thread Matt Mackall

On Wed, Sep 07, 2005 at 12:17:01PM -0700, Ben Greear wrote:
 Francois Romieu wrote:
 Stephen Hemminger [EMAIL PROTECTED] :
 [...]
 
 It really sounds like netconsole needs to have a different device hook
 instead of start_xmit.  It also probably doesn't want to have allocate
 an skb.  What you want is a synchronous way to send one packet with
 interrupts disabled:
 
 (dev-netpoll_send)(dev, packet, len)
 
 
 I do not see how it will solve the issue raised by David:
 
 hard_start_xmit does not disable irq and takes nearly no lock,
 thus it can be interrupted if a random irq event happens on a
 different device. Assume that the driver of the said different
 device issues a printk through netconsole and the game is over.
 
 Without extra locking, netpoll_send can not know that the
 interrupted hard_start_xmit has left the device/driver in a sane
 state.
 
 You could do some horrible kludge like make netpoll register itself
 as active and then enable locking in both netpoll_send and hard_start_xmit.

We can probably get away with something much lighter weight, as we
only need to exclude netpoll from calling into the driver when it's
already doing softirq processing. There's almost but not quite enough
info for this already in the network scheduling flags.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 03:36:27PM -0700, David S. Miller wrote:
 From: Eugene Surovegin [EMAIL PROTECTED]
 Date: Tue, 6 Sep 2005 15:04:17 -0700

  David, correct me if I'm wrong, but I think there is a major problem
  with current netconsole/netpoll approach.

 You're preaching to the choir.  I think the whole netpoll
 implementation is fundamentally flawed, and the locking problems we
 keep bumping into are merely a symptom.

 People want this thing so badly, that I keep letting them continue to
 patch this thing into quasi-working, even though it's foundations are
 what are so problematic.

Well I agree in some ways and disagree in others.

 It's never going to work %100 reliably, I think, here's why:

 The core issue, and conflict, is that the desire is to have the
 responses be immediate and come at the moment the event occurs.
 Because the situation may be so dire that deferring into a more
 appropriate software IRQ context may not be possible, and thus we'd
 lose the log message or event.

In the case of kgdb-over-ethernet or netdump or several others,
deferring to an IRQ context doesn't even make sense.

 So we try to spit out netconsole messages in hw IRQ context and stuff
 like that, as you stated.  The tg3 driver is susceptible to the
 problem you mention, as is bnx2, because they use purely software
 interrupt spinlocking, and thus their timers will deadlock if any hw
 IRQ context netpoll operations occur.

I'm not aware of the tg3 problem, please describe it in more detail.

 There is a way to fix all of this, deferring all netpoll operations to
 software IRQ context, but you sacrifice reliability when the system is
 in such a bad state that software IRQs are not occuring any more
 or are deadlocked.

At that point, why bother? Just use syslogd. Or more likely, use a
serial cable, which will actually work reliably.

Where I disagree is this: what netpoll is trying to do is not
fundamentally unreasonable. There's nothing magical about interrupts
that should make it impossible to drive network hardware in polled
mode with interrupts disabled. 

Instead, the problem is that the network stack has evolved in a
direction that made a bunch of fairly reasonable assumptions that
netpoll has now broken.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 03:42:07PM -0700, David S. Miller wrote:
 Mentioning the latency of the serial console, in support of
 netpoll's interrupt disabling, is quite a straw man.

No, it's exactly to the point: latency is a secondary concern when
we're printing an oops or other diagnostic. Otherwise it would be
unacceptable in the serial console as well, where the problem is
orders of magnitude worse.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 04:01:24PM -0700, David S. Miller wrote:
 So you cannot call into these drivers with HW interrupts disabled or
 even worse from HW interrupt context.  These drivers use locking
 strategies which are perfectly legal and work until you add netpoll.

And again, I agree.

What I disagree with is the claim that it's netpoll that's broken.
Again, I claim that netpoll is not doing something fundamentally
unreasonable. Driving hardware by polling with interrupts disabled is
nothing magic, it's just something the network stack to date has not
been designed to cope with.

So we can either:

a) have a netpoll that's crippled to softirq-only
b) live with some amount of localized brain damage here
c) attempt to make the stack more netpoll-accomodating

Option a) is a non-starter. It's scarcely better than syslogd for
logging purposes, and completely useless for things like
kgdb-over-ethernet and the like.

Option b) is what we have now. It's ugly as hell, agreed, but it does
work for quite a few scenarios where nothing else suffices.

Option c) is obviously a big project but maybe we can get from here to
there. One possible step in that direction would be exposing a
standard driver lock that netpoll can see and switching drivers that
have trouble (bnx2 and tg3 for starters) over to using it.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 06:37:40PM -0700, Eugene Surovegin wrote:
 On Tue, Sep 06, 2005 at 06:03:38PM -0700, Matt Mackall wrote:
  On Tue, Sep 06, 2005 at 04:01:24PM -0700, David S. Miller wrote:
   So you cannot call into these drivers with HW interrupts disabled or
   even worse from HW interrupt context.  These drivers use locking
   strategies which are perfectly legal and work until you add netpoll.
  
  And again, I agree.
  
  What I disagree with is the claim that it's netpoll that's broken.
  Again, I claim that netpoll is not doing something fundamentally
  unreasonable. Driving hardware by polling with interrupts disabled is
  nothing magic, it's just something the network stack to date has not
  been designed to cope with.
 
 Well, it's not driving hw in polling mode with hw IRQs disabled 
 (in fact, NAPI does exactly this), it's _calling_ driver entry 
 points from illegal context. IMHO this is not the same.

The only reasonable way for core code to manipulate hardware is
calling driver interfaces, so I think you're splitting hairs.

  So we can either:
  
  a) have a netpoll that's crippled to softirq-only
  b) live with some amount of localized brain damage here
  c) attempt to make the stack more netpoll-accomodating
 
 or a') make this a per-driver feature (e.g. NETIF_F_NETPOLL_CHALENGED) 

 In this case, even if driver cannot handle being called from IRQ 
 context, netconsole still can be used, although in a little more 
 limited fashion, but being safe and not causing lock ups, which, as 
 far as I understand you, it was designed to help debugging in the 
 first place :).

Again, there's basically no point to this at all. It won't catch an
appreciable number of diagnostics that are not already caught by
syslog and it won't work with kgdb-over-ethernet, netdump, etc.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 07:37:45PM -0700, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Tue, 6 Sep 2005 18:03:38 -0700

  Option c) is obviously a big project but maybe we can get from here to
  there. One possible step in that direction would be exposing a
  standard driver lock that netpoll can see and switching drivers that
  have trouble (bnx2 and tg3 for starters) over to using it.

 If this means moving them over to hw IRQ locking, this I totally
 cannot agree with.  It's a non-starter.

 I worked way too hard to eliminate all of the hw IRQ locking from the
 tg3 driver.  There is no way we'll have it undone for something like
 netpoll.

My proposal is for drivers to expose enough internal state in a
consistent fashion so that netpoll can avoid lock recursion deadlocks.
I don't particularly care how.

 And as I stated, you absolutely cannot call some drivers
 -hard_start_xmit() routine with hw IRQs disabled.  It's just not
 possible at all.  I even tried this myself last year in order to deal
 with a seperate locking issue, and it failed miserably.  To reiterate:

   Drivers need to be able to let things like jiffies
   advance in their -hard_start_xmit() routine.  Thus
   disabling HW interrupts during -hard_start_xmit()'s
   execution cannot ever be allowed.

That's sounds like a per-driver limitation, albeit likely a very hard
one to work around.

 All of this points to the fact that network stuff needs to be done
 within software IRQ context.

 I also do not agree with your assertion that netpoll cannot be useful
 if run from software IRQ context.  Someone trying to implement block
 device based core dumping would run into the same exact issue in the
 SCSI layer, which runs command completion from sw IRQs and whose
 locking totally depends upon it.

...And that's been tried and does indeed lose most of the time. Which
is why it's been abandoned.

Think upon the kgdb-over-ethernet case, please. The kernel hits a
breakpoint, the kgdb stub stops everything, sends a packet to the
debugging client, waits for a packet back.. This simply can't work if
we delay send/receive to the next softirq processing.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 07:32:38PM -0700, Eugene Surovegin wrote:
 On Tue, Sep 06, 2005 at 07:19:21PM -0700, Matt Mackall wrote:
   or a') make this a per-driver feature (e.g. NETIF_F_NETPOLL_CHALENGED) 
  
   In this case, even if driver cannot handle being called from IRQ 
   context, netconsole still can be used, although in a little more 
   limited fashion, but being safe and not causing lock ups, which, as 
   far as I understand you, it was designed to help debugging in the 
   first place :).
  
  Again, there's basically no point to this at all. It won't catch an
  appreciable number of diagnostics that are not already caught by
  syslog and it won't work with kgdb-over-ethernet, netdump, etc.
 
 Well, you're simplifying here. There are small embedded systems which 
 don't have syslogd, serial cable connected. For such systems even 
 softirq driven netpoll is useful.

I'm fairly familiar with the requirements of embedded systems, thanks.
If you're not getting significantly more functionality than the
syslogd in, say, busybox (about 4k), then why put it in the kernel?
 
 Anyway, it's much more easier for me to just say to my driver users 
 that netpoll is broken and isn't supported because of this, than 
 arguing here. For some reason, I have a wrong impression, that you 
 would want *more* users of your stuff.

Which would I rather have:

netconsole never catches my oopses, it's useless. 

netconsole didn't work with my driver, so I tried another card and it
works great.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 08:19:34PM -0700, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Tue, 6 Sep 2005 20:08:10 -0700

  Think upon the kgdb-over-ethernet case, please. The kernel hits a
  breakpoint, the kgdb stub stops everything, sends a packet to the
  debugging client, waits for a packet back.. This simply can't work if
  we delay send/receive to the next softirq processing.

 Networking is asynchronous, kgdb-over-ethernet wants synchronous
 processing from an asynchronous subsystem.  It is therefore no
 surprise that these two sets of requirements are at odds with each
 other and it simply isn't going to work very well.

Serial is also asynchronous. And yet we have robust polling support
for typical serial hardware that works great for both console and
remote debugging support. Why? Because delivering kernel messages
synchronously has been a requirement for the serial console since day
one.

Kgdboe simply wants the asynchronous subsystem to get out of the
way. For most cards, it works just fine.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole violates dev-hard_start_xmit synch rules

2005-09-06 Thread Matt Mackall

On Tue, Sep 06, 2005 at 08:37:36PM -0700, Eugene Surovegin wrote:
 On Tue, Sep 06, 2005 at 08:22:25PM -0700, Matt Mackall wrote:
  Which would I rather have:
  
  netconsole never catches my oopses, it's useless. 
  
  netconsole didn't work with my driver, so I tried another card and it
  works great.
 
 Well, not all world which uses Linux is PC with PCI slots. In fact, 
 there are much more non-PC devices (where you have SoC with built-in 
 Ethernet) with Linux than PCs with Linux.

Let me try that again:

Which would I rather have:

netconsole never catches my oopses, it's useless. 
 
netconsole didn't work with my driver, but it works with other
drivers just fine.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Matt Mackall

On Sat, Aug 06, 2005 at 09:58:27AM +0200, Ingo Molnar wrote:
 
 btw., the current NR_SKBS 32 in netpoll.c seems quite low, especially 
 e1000 can have a whole lot more skbs queued at once. Might be more 
 robust to increase it to 128 or 256?

Not sure that the card's queueing really makes a difference. It either
eventually releases the queued SKBs or it doesn't. What's more
important is that we be able to survive bursts like the output of
sysrq-t. This seems to work already.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: lockups with netconsole on e1000 on media insertion

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 01:45:55PM +0200, Andi Kleen wrote:
 John B?ckstrand [EMAIL PROTECTED] writes:
 
  I've been trying to hunt down a hard lockup issue with some hardware
  of mine, but I've possibly hit a kernel bug instead. When using
  netconsole on my e1000, if I unplug the cable and then re-plug it, the
  machine locks up hard. It manages to print the link up message on
  the screen, but nothing after that. Now, I wonder if this is supposed
  to be so? I tried this on 4 different configurations, 2.6.13-rc5 and
  2.6.12 with and without noapic acpi=off, same result on all of
  them. I've tried with 1 and 3 other NICs in the machine at the same
  time.
 
 I ran into the same problem some time ago on e1000. The problem was
 that if the link doesn't come up netconsole ends up waiting forever
 for it.

I still don't like this fix. Yes, you're right, it should eventually
give up. But here it gives up way too easily - 5 could easily
translate to 5 microseconds. This is analogous to giving up on serial
transmit if CTS is down for 5 loops.

I'd be much happier if there were some udelay or the like in here so
that we're not giving up on such a short timeframe.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll can lock up on low memory.

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 04:57:00PM -0400, Steven Rostedt wrote:
 On Fri, 2005-08-05 at 13:01 -0700, Matt Mackall wrote:
  On Fri, Aug 05, 2005 at 10:36:31AM -0400, Steven Rostedt wrote:
   Looking at the netpoll routines, I noticed that the find_skb could
   lockup if the memory is low. This is because the allocations are
   called with GFP_ATOMIC (since this is in interrupt context) and if
   it fails, it will continue to fail. This is just by observing the
   code, I didn't have this actually happen. So if this is not the
   case, please let me know how it can get out. Otherwise, please
   accept this patch.
  
  By netpoll_poll() tickling the driver enough to free the currently
  queued outgoing SKBs.
 
 I believe that the e1000 wont free up any outgoing packets since the
 netpoll call doesn't seem to get to the e1000_clean_tx part of the
 e1000_intr, otherwise the system wouldn't lock under the
 netpoll_send_skb when one disconnects the wire and puts it back in.  The
 disconnect would lock it up anyway (with Andi's patch it now doesn't)
 but since it won't come back up after the link is back up, there seems
 to be something wrong with the e1000 netpoll driver.  This is because
 the e1000_netpoll doesn't seem to be cleaning up the tx buffer and start
 the queue back up.

That does seem like a driver problem.

  Also note that by the time we're in this loop, we're ready to take
  desperate measures. We've already exhausted our private queue of SKBs
  so we have no alternative but to keep kicking the driver until
  something happens.
 
 OK, the system is under heavy memory load and starts eating up the
 netpoll packets.  When the last packet is gone, and you have something
 like the e1000 that doesn't clean up its packets with netpoll, then you
 just locked up the system.
 
 The scary part of this loop is that if the netpoll doesn't come up with
 the goods, its game over.  Say we are at desperate measures but it could
 be a case where we need to output more information and lockup here
 before we can go out and free some memory. 

Realistically, we were probably going to crash anyway at this point as
we're apparently failing to recycle SKBs.

Netpoll generally must assume it won't get a second chance, as it's
being called by things like oops() and panic() and used by things like
kgdb. If netpoll fails, the box is dead anyway.

  The netpoll philosophy is to assume that its traffic is an absolute
  priority - it is better to potentially hang trying to deliver a panic
  message than to give up and crash silently.
 
 So even a long timeout would not do?  So you don't even get a message to
 the console?

In general, there's no way to measure time here. And if we're
using netconsole, what makes you think there's any other console?

   Also, as Andi told me, the printk here would probably not show up
   anyway if this happens with netconsole.
  
  That's fine. But in fact, it does show up occassionally - I've seen
  it.
 
 Then maybe what Andi told me is not true ;-)
 
 Oh, and did your machine crash when you saw it?  Have you seen it with
 the e1000 driver?

No and no. Most of my own testing is done with tg3.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll can lock up on low memory.

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 11:26:10PM +0200, Andi Kleen wrote:
 On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
  The netpoll philosophy is to assume that its traffic is an absolute
  priority - it is better to potentially hang trying to deliver a panic
  message than to give up and crash silently.
 
 That would be ok if netpoll was only used to deliver panics. But 
 it is not. It delivers all messages, and you cannot hang the 
 kernel during that. Actually even for panics it is wrong, because often
 it is more important to reboot in a panic than (with a panic timeout) 
 to actually deliver the panic. That's needed e.g. in a failover cluster.
 
 If that was the policy it would be a quite dumb one and make netpoll
 totally unsuitable for production use. I hope it is not.

Suggest you rip __GFP_NOFAIL out of JBD before complaining about this.

 Dave, can you please apply the timeout patch anyways?

Yes, let's go right over the maintainer's objections almost
immediately after he chimes into the thread. I'll spare the list the
colorful language this inspires.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: lockups with netconsole on e1000 on media insertion

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 11:56:50PM +0200, Andi Kleen wrote:
  I still don't like this fix. Yes, you're right, it should eventually
  give up. But here it gives up way too easily - 5 could easily
  translate to 5 microseconds. This is analogous to giving up on serial
  transmit if CTS is down for 5 loops.
  
  I'd be much happier if there were some udelay or the like in here so
  that we're not giving up on such a short timeframe.
 
 Problem is that it could translate to a long aggregate delay
 e.g. when the kernel tries to dump the backlog after console_init.
 That is why I made the delay so short.

But why are we in a hurry to dump the backlog on the floor? Why are we
worrying about the performance of netpoll without the cable plugged in
at all? We shouldn't be optimizing the data loss case.

My primary concern here is that the loop have a non-negligible extent
in time. 5 loops is effectively equal to none. I'd be very surprised
if it was even enough for deglitching.

With serial console, we do polled I/O that runs at the serial rate -
milliseconds per line of output.

 Longer delay would be possible, but then it would need some logic
 to detect down links and don't delay on them and then retry later etc. 
 Would be all far more complicated.

I think we could probably have subsequent failures be much shorter
without too much added complexity. But I'm not sure it matters.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll can lock up on low memory.

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 11:51:18PM +0200, Andi Kleen wrote:
   If that was the policy it would be a quite dumb one and make netpoll
   totally unsuitable for production use. I hope it is not.
  
  Suggest you rip __GFP_NOFAIL out of JBD before complaining about this.
 
 So you're suggesting we should become as bad at handling networking
 errors as we are at handling IO errors?  

No, I'm suggesting that the machine will hang forever sometimes and no
amount of patching up netpoll or JBD, etc. will fix that. A hardware
watchdog is a requirement for robust failover anyway and if you think
otherwise, you're dreaming.

And for reference, both are examples of theoretical
should-never-happen memory allocation failures.
 
   Dave, can you please apply the timeout patch anyways?
  
  Yes, let's go right over the maintainer's objections almost
  immediately after he chimes into the thread. I'll spare the list the
  colorful language this inspires.
 
 Sure when the maintainer has a unreasonable position on something
 I think that's justified. Yours in this case is clearly unreasonable.

What's clear is that you didn't like my position from my very first
post in this thread and immediately went for the nuclear option
without even trying to discuss it.

Are you even aware that the patch we're discussing here is for a problem
that has yet to be observed and that Steven's initial analysis had
missed a couple things?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: lockups with netconsole on e1000 on media insertion

2005-08-05 Thread Matt Mackall

On Sat, Aug 06, 2005 at 01:51:22AM +0200, Andi Kleen wrote:
  But why are we in a hurry to dump the backlog on the floor? Why are we
  worrying about the performance of netpoll without the cable plugged in
  at all? We shouldn't be optimizing the data loss case.
 
 Because a system shouldn't stall for minutes (or forever like right now) 
 at boot just because the network cable isn't plugged in.

Using netconsole without a network cable could well be classified as a
serious configuration error. NFS also is a bit sluggish without a
network cable.

I've already agreed that forever is a problem. Can we work towards
agreeing on a non-trivial timeout, please?

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] netpoll can lock up on low memory.

2005-08-05 Thread Matt Mackall

On Fri, Aug 05, 2005 at 08:23:55PM -0400, Steven Rostedt wrote:
 On Fri, 2005-08-05 at 14:28 -0700, Matt Mackall wrote:
  
  Netpoll generally must assume it won't get a second chance, as it's
  being called by things like oops() and panic() and used by things like
  kgdb. If netpoll fails, the box is dead anyway.
 
 But it is also being called by every printk in the kernel. What happens
 when the printk that causes this lock up is not a panic but just some
 info print.  One would use netconsole when they turn on more verbose
 printing, to keep the output fast, right?  So if the system gets a
 little memory tight, but not to the point of failing, this will cause a
 lock up and no one would know why. 

This doesn't happen, or if it does, it happens far less often than
genuine crashes. This can only happen when netpoll's burned through
its entire pool of SKBs in a single interrupt and the card never
releases them, despite repeated prodding. In other words, you'll get
most of a dump out of the box, but you're probably screwed no matter
what you do.

Also note that _any_ printk can be the kernel's dying breath. This is
why for both serial and video we do polled I/O to be sure we actually
get our message out. Netconsole is no different.

 If you need to really get the data out, then the design should be
 changed.  Have some return value showing the failure, check for
 oops_in_progress or whatever, and try again after turning interrupts
 back on, and getting to a point where the system can free up memory
 (write to swap, etc).  Just a busy loop without ever getting a skb is
 just bad.

Why, pray tell, do you think there will be a second chance after
re-enabling interrupts? How does this work when we're panicking or
oopsing where we most care? How does this work when the netpoll client
is the kernel debugger and the machine is completely stopped because
we're tracing it?

As for busy loops, let me direct you to the poll part of the name.
It is in fact the whole point.

   So even a long timeout would not do?  So you don't even get a message to
   the console?
  
  In general, there's no way to measure time here. And if we're
  using netconsole, what makes you think there's any other console?
 
 Why assume that there isn't another console?  The screen may be used
 with netconsole, you just lose whatever has been scrolled too far.

Yes, there may be another console, but we should by no means depend on
that being the case. We should in fact assume it's not.

 Also, as Andi told me, the printk here would probably not show up
 anyway if this happens with netconsole.

That's fine. But in fact, it does show up occassionally - I've seen
it.
   
   Then maybe what Andi told me is not true ;-)
   
   Oh, and did your machine crash when you saw it?  Have you seen it with
   the e1000 driver?
  
  No and no. Most of my own testing is done with tg3.
  
 
 If you saw the message and the system didn't crash, then that's proof
 that if the driver is not working properly, you would have lock up the
 system, and the system was _not_ in a state that it _had_ to get the
 message out.

Let me be more precise. I've seen it in the middle of an oops dump,
where it complained, then made further progress, and then died. In
other words, the code works. And I've since upped the pool size.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: atheros driver (1/8)

2005-08-04 Thread Matt Mackall

Some initial comments:

- you should include a description of each patch in the message
- you should use more descriptive subjects, eg

[PATCH 1/8] atheros: Kconfig bits

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Netconsole Driver

2005-07-29 Thread Matt Mackall

On Fri, Jul 29, 2005 at 12:13:06PM -0500, Paul Vinson wrote:
   Are you sure eth1 is an e1000? From e1000_main.c:
  
   #ifdef CONFIG_NET_POLL_CONTROLLER
   netdev-poll_controller = e1000_netpoll;
   #endif
  
   That's been there for 17 months.
 
  Yep, it's an e1000:
 
  [EMAIL PROTECTED] cat /proc/modules
  e1000 82612 0 - Live 0xf88f2000
 
  It's quite a mystery.  I've been racking my brains.
 
  Please holler if anything else comes to mind.
 
  Send me your full dmesg output.
  --
 
 Matt,
 
 I had compiled the kernel with 'CONFIG_E1000_NAPI=y'.  Once I disabled
 that netconsole worked fine.

Odd. I don't see any reason why netconsole should fail to work with
that switch on. The original report indicated netdev-poll_controller
wasn't set, but that doesn't seem to be affected by the NAPI config.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-27 Thread Matt Mackall

On Wed, Jul 27, 2005 at 01:19:00PM -0700, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Tue, 26 Jul 2005 19:36:37 -0700

  # HG changeset patch
  # User [EMAIL PROTECTED]
  # Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538
  # Parent  75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762
  Move in_aton from net/ipv4/utils.c to net/core/utils.c

 This patch doesn't apply, in the current 2.6.x GIT tree
 NETCONSOLE does not depend on NETDEVICES.

Odd, gitweb of Linus' tree seems to disagree. I see it depends on
NETDEVICES  INET  EXPERIMENTAL. NETDEVICES has been there since
the beginning of git history and according to my Mercurial import from
BKCVS, it's been dependent on NETDEVICES since I first submitted it.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-26 Thread Matt Mackall

On Tue, Jul 19, 2005 at 02:01:04PM -0700, David S. Miller wrote:
 From: Adrian Bunk [EMAIL PROTECTED]
 Date: Tue, 19 Jul 2005 20:29:19 +0200

  NETCONSOLE=y and INET=n results in the following compile error:

 Also applied, thanks Adrian.

I should have been cc:ed on this.

This problem also exists in PKTGEN. And this fix is incorrect as
neither is dependent on the IP part of the networking stack in any
substantive way. The right fix is to make inet_aton available outside
of CONFIG_INET (preferred) or making private copies of inet_aton.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [2.6 patch] NETCONSOLE must depend on INET

2005-07-26 Thread Matt Mackall

[sch added to cc: as I think he's the effective pktgen maintainer]

On Tue, Jul 26, 2005 at 05:03:49PM -0700, David S. Miller wrote:
 From: Matt Mackall [EMAIL PROTECTED]
 Date: Tue, 26 Jul 2005 16:58:24 -0700
 
  On Tue, Jul 26, 2005 at 04:32:02PM -0700, David S. Miller wrote:
   More seriously, please submit a version of whatever you
   believe to be the more correct fix so it can be reviewed
   and integrated.
  
  Do you have a preferred location to put this function or
  shall I invent one?
 
 I actually can't wait to see where you're going to
 to put a function that transforms IPV4 addresses
 from ascii other than some place protected by CONFIG_INET.
 :-)

# HG changeset patch
# User [EMAIL PROTECTED]
# Node ID 6cdd6f36d53678a016cfbf5ce667cbd91504d538
# Parent  75716ae25f9d87ee2a5ef7c4df2d8f86e0f3f762
Move in_aton from net/ipv4/utils.c to net/core/utils.c

Move in_aton to allow netpoll and pktgen to work without the rest of
the IPv4 stack. Fix whitespace and add comment for the odd placement.

Delete now-empty net/ipv4/utils.c

Re-enable netpoll/netconsole without CONFIG_INET

Signed-off-by: Matt Mackall [EMAIL PROTECTED]

diff -r 75716ae25f9d -r 6cdd6f36d536 drivers/net/Kconfig
--- a/drivers/net/Kconfig   Tue Jul 26 23:21:24 2005
+++ b/drivers/net/Kconfig   Wed Jul 27 02:31:24 2005
@@ -2544,7 +2544,7 @@
 
 config NETCONSOLE
tristate Network console logging support (EXPERIMENTAL)
-   depends on NETDEVICES  INET  EXPERIMENTAL
+   depends on NETDEVICES  EXPERIMENTAL
---help---
If you want to log kernel messages over the network, enable this.
See file:Documentation/networking/netconsole.txt for details.
diff -r 75716ae25f9d -r 6cdd6f36d536 net/core/utils.c
--- a/net/core/utils.c  Tue Jul 26 23:21:24 2005
+++ b/net/core/utils.c  Wed Jul 27 02:31:24 2005
@@ -23,9 +23,9 @@
 #include linux/percpu.h
 #include linux/init.h
 
+#include asm/byteorder.h
 #include asm/system.h
 #include asm/uaccess.h
-
 
 /*
   This is a maximally equidistributed combined Tausworthe generator
@@ -153,3 +153,38 @@
 EXPORT_SYMBOL(net_random);
 EXPORT_SYMBOL(net_ratelimit);
 EXPORT_SYMBOL(net_srandom);
+
+/*
+ * Convert an ASCII string to binary IP.
+ * This is outside of net/ipv4/ because various code that uses IP addresses
+ * is otherwise not dependent on the TCP/IP stack.
+ */
+
+__u32 in_aton(const char *str)
+{
+   unsigned long l;
+   unsigned int val;
+   int i;
+
+   l = 0;
+   for (i = 0; i  4; i++)
+   {
+   l = 8;
+   if (*str != '\0')
+   {
+   val = 0;
+   while (*str != '\0'  *str != '.')
+   {
+   val *= 10;
+   val += *str - '0';
+   str++;
+   }
+   l |= val;
+   if (*str != '\0')
+   str++;
+   }
+   }
+   return(htonl(l));
+}
+
+EXPORT_SYMBOL(in_aton);
diff -r 75716ae25f9d -r 6cdd6f36d536 net/ipv4/Makefile
--- a/net/ipv4/Makefile Tue Jul 26 23:21:24 2005
+++ b/net/ipv4/Makefile Wed Jul 27 02:31:24 2005
@@ -2,7 +2,7 @@
 # Makefile for the Linux TCP/IP (INET) layer.
 #
 
-obj-y := utils.o route.o inetpeer.o protocol.o \
+obj-y := route.o inetpeer.o protocol.o \
 ip_input.o ip_fragment.o ip_forward.o ip_options.o \
 ip_output.o ip_sockglue.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
diff -r 75716ae25f9d -r 6cdd6f36d536 net/ipv4/utils.c
--- a/net/ipv4/utils.c  Tue Jul 26 23:21:24 2005
+++ /dev/null   Wed Jul 27 02:31:24 2005
@@ -1,59 +0,0 @@
-/*
- * INETAn implementation of the TCP/IP protocol suite for the 
LINUX
- * operating system.  INET is implemented using the  BSD Socket
- * interface as the means of communication with the user level.
- *
- * Various kernel-resident INET utility functions; mainly
- * for format conversion and debugging output.
- *
- * Version:$Id: utils.c,v 1.8 2000/10/03 07:29:01 anton Exp $
- *
- * Author: Fred N. van Kempen, [EMAIL PROTECTED]
- *
- * Fixes:
- * Alan Cox:   verify_area check.
- * Alan Cox:   removed old debugging.
- * Andi Kleen  :   add net_ratelimit()  
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include linux/module.h
-#include linux/types.h
-#include asm/byteorder.h
-
-/*
- * Convert an ASCII string to binary IP. 
- */
- 
-__u32 in_aton(const char *str)
-{
-   unsigned long l;
-   unsigned int val;
-   int i;
-
-   l = 0

87 matches

Mail list logo