Re: "protocol 0300 is buggy" spam in dmesg when injecting&capturing on same interface

2008-02-19 Thread Pekka Pietikainen
On Mon, Feb 18, 2008 at 05:39:03PM +0200, Pekka Pietikainen wrote:
> When playing with some L2 level fuzzing I started getting lots of
> "protocol 0300 is buggy, dev eth3" spew in dmesg. That interface is also
> capturing the traffic that's being sent, that's probably why the
> dev_queue_xmit_nit codepath is getting called in the first place.
> 
> Any ideas? Add a "If it came from AF_PACKET, don't print out anything" to
> that if-statement?
I'm probably just plastering over a bug in af_packet.c with this one,
but the following patch should make it shut up. The printk definitely
needs an ntohs for skb2->protocol; it took me a while to figure out where
the 0300 even came from :-)
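
The 0300 is plausibly just ETH_P_ALL (0x0003) stored in network byte order and printed raw on a little-endian machine. A minimal userspace sketch of that effect follows; SKETCH_ETH_P_ALL and format_protocol() are made-up names for illustration, and the constant is assumed to match ETH_P_ALL in <linux/if_ether.h>:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to equal ETH_P_ALL from <linux/if_ether.h>; redefined here so
 * the sketch builds without kernel headers. */
#define SKETCH_ETH_P_ALL 0x0003

/* Format a 16-bit protocol field twice: raw, as the old printk did, and
 * converted with ntohs(), as the patched message does. */
void format_protocol(uint16_t wire_value, char *raw, char *fixed, size_t len)
{
    snprintf(raw, len, "%04x", (unsigned)wire_value);
    snprintf(fixed, len, "%04x", (unsigned)ntohs(wire_value));
}
```

Called with htons(SKETCH_ETH_P_ALL), the converted string is "0003" on any host; on little-endian x86 the raw string is "0300", which matches the dmesg spew.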

Signed-off-by: Pekka Pietikainen <[EMAIL PROTECTED]>

diff -up linux-2.6.24.i686/net/core/dev.c.orig linux-2.6.24.i686/net/core/dev.c
--- linux-2.6.24.i686/net/core/dev.c.orig   2008-02-19 15:22:12.0 +0200
+++ linux-2.6.24.i686/net/core/dev.c   2008-02-19 15:29:37.0 +0200
@@ -1262,10 +1262,11 @@ static void dev_queue_xmit_nit(struct sk
 
 			if (skb_network_header(skb2) < skb2->data ||
 			    skb2->network_header > skb2->tail) {
-				if (net_ratelimit())
-					printk(KERN_CRIT "protocol %04x is "
-					       "buggy, dev %s\n",
-					       skb2->protocol, dev->name);
+				if (skb2->protocol != htons(ETH_P_ALL) &&
+				    net_ratelimit())
+					pr_crit("protocol %04x is buggy, dev %s\n",
+						ntohs(skb2->protocol),
+						dev->name);
 				skb_reset_network_header(skb2);
 			}
 


-- 
Pekka Pietikainen
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


"protocol 0300 is buggy" spam in dmesg when injecting&capturing on same interface

2008-02-18 Thread Pekka Pietikainen
When playing with some L2 level fuzzing I started getting lots of
"protocol 0300 is buggy, dev eth3" spew in dmesg. That interface is also
capturing the traffic that's being sent, that's probably why the
dev_queue_xmit_nit codepath is getting called in the first place.

Tested on 2.6.23-as-shipped-in-F8. I didn't spot any relevant changes in .24,
but I can pretty easily verify there too.

Oh. That printk wasn't very easy to find:

  if (net_ratelimit())
printk(KERN_CRIT "protocol %04x is "
   "buggy, dev %s\n",


and I naturally grepped for "is buggy". 

Any ideas? Add a "If it came from AF_PACKET, don't print out anything" to
that if-statement?




Re: [2.6 patch] remove Documentation/networking/net-modules.txt

2007-10-29 Thread Pekka Pietikainen
On Wed, Oct 24, 2007 at 06:25:03PM +0200, Adrian Bunk wrote:
> According to git, the only one who touched this file during the last
> 5 years was me when removing drivers...

That's not the only obsolete thing there:
>  ncsa-telnet
>   - notes on how NCSA telnet (DOS) breaks with MTU discovery enabled.
And probably others too. Then again, the information there isn't wrong, it's
just totally useless these days :P

-- 
Pekka Pietikainen


Re: tcp bw in 2.6

2007-10-03 Thread Pekka Pietikainen
On Wed, Oct 03, 2007 at 02:23:58PM -0700, Larry McVoy wrote:
> > A few notes to the discussion. I've seen one e1000 "bug" that ended up being
> > a crappy AMD pre-opteron SMP chipset with a totally useless PCI bus
> > implementation, which limited performance quite a bit, totally depending on
> > what you plugged in and in which slot. 10e milk-and-bread-store 
> > 32/33 gige nics actually were better than server-class e1000's 
> > in those, but weren't that great either.
> 
> That could well be my problem, this is a dual processor (not core) athlon
> (not opteron) tyan motherboard if I recall correctly.
If it's AMD760/768MPX, here's some relevant discussion:

http://lkml.org/lkml/2002/7/18/292  
http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1109.html  
http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1154.html  
http://www.ussg.iu.edu/hypermail/linux/kernel/0307.1/1212.html 
http://forums.2cpu.com/showthread.php?s=&threadid=31211

> 
> > Check your interrupt rates for the interface. You shouldn't be getting
> > anywhere near 1 interrupt/packet. If you are, something is badly wrong :).
> 
> The acks (because I'm sending) are about 1.5 packets/interrupt.
> When this box is receiving it's moving about 3x as much data
> and has a _lower_ (absolute, not per packet) interrupt load.
Probably not a problem then, since those acks probably cover many 
sent packets. Current interrupt mitigation schemes are pretty 
dynamic, balancing between latency and bulk performance so the acks
might be fine (thousands vs. tens of thousands/sec)

-- 
Pekka Pietikainen


Re: tcp bw in 2.6

2007-10-03 Thread Pekka Pietikainen
On Tue, Oct 02, 2007 at 02:21:32PM -0700, Larry McVoy wrote:
> More data, sky2 works fine (really really fine, like 79MB/sec) between
> Linux dylan.bitmover.com 2.6.18.1 #5 SMP Mon Oct 23 17:36:00 PDT 2006 i686
> Linux steele 2.6.20-16-generic #2 SMP Sun Sep 23 18:31:23 UTC 2007 x86_64
> 
> So this is looking like a e1000 bug.  I'll try to upgrade the kernel on 
> the ia64 box and see what happens.
A few notes to the discussion. I've seen one e1000 "bug" that ended up being
a crappy AMD pre-opteron SMP chipset with a totally useless PCI bus
implementation, which limited performance quite a bit, totally depending on
what you plugged in and in which slot. 10e milk-and-bread-store 
32/33 gige nics actually were better than server-class e1000's 
in those, but weren't that great either.

One thing worth trying is using recv(..., MSG_TRUNC) on the receiver;
that tests the theoretical sender maximum performance much better (but memory
bandwidth vs. GigE is much higher these days than it was in 2001, so maybe
it's not that useful anymore).

Check your interrupt rates for the interface. You shouldn't be getting
anywhere near 1 interrupt/packet. If you are, something is badly wrong :).
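
As a rough way to put a number on that, one can sample the interface's packet counter and its interrupt counter twice (for instance from /proc/net/dev and /proc/interrupts) and divide the deltas. A small sketch; the function name is made up, and how the counters are read is left out:

```c
#include <stdint.h>

/* Packets handled per interrupt between two samples of the counters.
 * Returns 0.0 when no interrupts fired in the interval. */
double packets_per_interrupt(uint64_t pkts_before, uint64_t pkts_after,
                             uint64_t irqs_before, uint64_t irqs_after)
{
    uint64_t irqs = irqs_after - irqs_before;

    if (irqs == 0)
        return 0.0;
    return (double)(pkts_after - pkts_before) / (double)irqs;
}
```

A result close to 1.0 on a busy link would suggest interrupt mitigation is not doing its job.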

Running getsockopt(...TCP_INFO) every few secs on the socket and printing
that out can be useful too. That gives you both sides' idea on what the
tcp windows etc. are.

My favourite tool is a home-made thing called yantt, btw
( http://www.ee.oulu.fi/~pp/yantt.tgz ). It needs lots of cleanup love:
it mucks with the window sizes by default, since in the 2.4 days you really
had to do that to get any kind of performance, and the help text is wrong.
But it's pretty easy to hack to try out new ideas, use
sendfile/MSG_TRUNC/TCP_INFO etc.

Netperf is the kitchen sink of network benchmark tools. But trying out a few
tiny things with it is not fun at all, I tried and quickly decided to 
write my own tool for my master's thesis work ;-)

Oh. Don't measure CPU usage with top. Use a cyclesoaker (google for
cyclesoak, I included akpm's with yantt) :-)

And yes. TCP stacks do have bugs, especially when things get outside the
equipment most people have. Having a dedicated transatlantic 2.5Gbps
connection found a really fun one a long time ago ;)

-- 
Pekka Pietikainen


Missing VLAN tags in bnx2

2007-02-28 Thread Pekka Pietikainen
Just had to spend some time figuring out why a bnx2 card connected to
a switch monitor port didn't see any vlan tags (when in our scenario the
tags are pretty vital).  Found the following explanation:

[BNX2]: Fix VLAN on ASF

Always set up the device to strip incoming VLAN tags when ASF is
enabled. ASF firmware will not parse packets correctly if VLAN tags
are not stripped

My fix:

#ifdef I_DONT_KNOW_WHAT_ASF_IS_AND_DONT_REALLY_CARE_EITHER
	if (REG_RD_IND(bp, bp->shmem_base + BNX2_PORT_FEATURE) &
	    BNX2_PORT_FEATURE_ASF_ENABLED)
		bp->flags |= ASF_ENABLE_FLAG;
#endif

Any hope of getting this as an ethtool tunable or something similar?
There didn't seem to be a BIOS option for this (Dell PE2900), at least.
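
For anyone checking a capture for the same symptom, here is a small sketch of how a tagged frame can be told apart from a stripped one. It assumes the capture hands over the raw Ethernet frame with the 802.1Q tag in-line; frame_vlan_id() and TPID_8021Q are names made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100   /* EtherType value that marks an 802.1Q tag */

/* Return the VLAN ID (0..4095) carried by an Ethernet frame, or -1 if the
 * frame is too short or has no 802.1Q tag (for example because the NIC
 * stripped it before the sniffer saw the packet). */
int frame_vlan_id(const uint8_t *frame, size_t len)
{
    uint16_t tpid, tci;

    if (len < 18)           /* 6 dst + 6 src + 4 tag + 2 inner EtherType */
        return -1;
    tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return -1;
    tci = (uint16_t)((frame[14] << 8) | frame[15]);
    return tci & 0x0fff;    /* low 12 bits of the TCI are the VLAN ID */
}
```

If this returns -1 for every frame on a port that is known to carry tagged traffic, the tags are being removed somewhere below the capture point.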


Re: TKIP encryption should allocate enough tailroom

2007-01-19 Thread Pekka Pietikainen
On Thu, Jan 18, 2007 at 08:55:37AM -0500, Brandon Craig Rhodes wrote:
> to debugging messages!  In some circumstances, debug messages are
> always produced; in several others, net_ratelimit() is called to
> decide whether to print an error (but why in these cases and not
> others?); and in many cases, nothing is printed at all (is this
> because convention would dictate that the caller discover the error
> and print something out?).
> 
> If I want to generate a patch that festoons the ieee80211 functions
> with informative error messages, what are the guidelines?
My understanding is:

BUG_ON() / BUG() if it's a clear "impossible" condition ("the function calling
me was wrong"), such as null pointers or inconsistent buffer lengths. Might
even be justified in this case?

net_ratelimit() says:
/*
 * All net warning printk()s should be guarded by this function.
 */
int net_ratelimit(void)
{
	return __printk_ratelimit(net_msg_cost, net_msg_burst);
}

Especially important if the code path can be triggered by anyone (local user
or arbitrary packet from the network). Otherwise not that big a deal if it's
buggy code elsewhere in the kernel that causes the message to be printed.
You fix the code and you stop getting thousands of lines of debug
messages/second (which is why net_ratelimit() exists).
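
To illustrate the idea behind a cost/burst limiter, here is a userspace token-bucket sketch. It is not the kernel's __printk_ratelimit() code; the struct and function names are invented for the example, and time is passed in as an abstract tick count:

```c
#include <stdint.h>

/* Token bucket: each message costs `cost` ticks, and at most `burst`
 * messages' worth of ticks can be banked. */
struct ratelimit {
    uint64_t cost;      /* ticks one message costs */
    uint64_t burst;     /* max messages allowed back to back */
    uint64_t tokens;    /* ticks currently banked */
    uint64_t last;      /* tick of the previous call */
    unsigned missed;    /* messages suppressed since the last allowed one */
};

void ratelimit_init(struct ratelimit *rl, uint64_t cost, uint64_t burst,
                    uint64_t now)
{
    rl->cost = cost;
    rl->burst = burst;
    rl->tokens = cost * burst;  /* start with a full bucket */
    rl->last = now;
    rl->missed = 0;
}

/* Returns 1 if the caller may print now, 0 if the message should be dropped. */
int ratelimit_allow(struct ratelimit *rl, uint64_t now)
{
    rl->tokens += now - rl->last;       /* earn one token per elapsed tick */
    rl->last = now;
    if (rl->tokens > rl->cost * rl->burst)
        rl->tokens = rl->cost * rl->burst;
    if (rl->tokens >= rl->cost) {
        rl->tokens -= rl->cost;
        rl->missed = 0;
        return 1;
    }
    rl->missed++;
    return 0;
}
```

With cost 5 and burst 2, two messages go through immediately, further ones are dropped, and one more is allowed after 5 ticks have passed. A caller would guard its print with ratelimit_allow() the same way kernel code guards printk() with net_ratelimit().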

If it's an arbitrary packet from the network, there probably should even
be a sysctl to enable/disable debug output completely. IPv4 has:

static void ip_handle_martian_source(struct net_device *dev,
				     struct in_device *in_dev,
				     struct sk_buff *skb,
				     __be32 daddr,
				     __be32 saddr)
{
	RT_CACHE_STAT_INC(in_martian_src);
#ifdef CONFIG_IP_ROUTE_VERBOSE
	if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit()) {
		/*
		 *	RFC1812 recommendation, if source is martian,
		 *	the only hint is MAC header.
		 */
		printk(KERN_WARNING "martian source %u.%u.%u.%u from "
			"%u.%u.%u.%u, on dev %s\n",
			NIPQUAD(daddr), NIPQUAD(saddr), dev->name);
...

(so there's a #ifdef _and_ a log_martians sysctl to see debug output).
In general #ifdefs should be avoided.



Re: TKIP encryption should allocate enough tailroom

2007-01-18 Thread Pekka Pietikainen
On Wed, Jan 17, 2007 at 11:46:35AM -0500, Brandon Craig Rhodes wrote:
> Having further reviewed my code, I have changed my mind; the
> ieee80211_crypt_tkip routines are not designed to be responsible for
> creating enough headroom and tailroom.  The "hostap" driver should be
> doing this.  In fact, I now see that the "hostap" driver actually
> attempts to create enough headroom and tailroom, but computes them
> incorrectly.
Even then, if ieee80211_tkip_encrypt() didn't produce debug output for the
not-enough-space case, should that be added to catch other
potentially broken drivers?


Re: [Bluetooth] recursive locking in l2cap_sock_accept/bt_accept_dequeue

2006-09-17 Thread Pekka Pietikainen
On Sun, Sep 17, 2006 at 02:18:26PM +0200, Marcel Holtmann wrote:
> Hi Pekka,
> 
> > Got this from a 2.6.18rc7-based fedora-devel kernel:
> > 
> > =
> > [ INFO: possible recursive locking detected ]
> > 2.6.17-1.2647.fc6 #1

> > sdpd/4955 is trying to acquire lock:
> >  (sk_lock-AF_BLUETOOTH){--..}, at: [] bt_accept_dequeue+0x26/0xc6 
> > [bluetooth]
> 
> what were you doing when this happened?
Hiya

Hmm. Plugged my USB BT dongle in, started gnome-obex-server and sent a few
pictures from my cellphone to the box, nothing fancy.. Bluetooth daemons
running obviously at this time...




[Bluetooth] recursive locking in l2cap_sock_accept/bt_accept_dequeue

2006-09-17 Thread Pekka Pietikainen
Hiya!

Got this from a 2.6.18rc7-based fedora-devel kernel:

=
[ INFO: possible recursive locking detected ]
2.6.17-1.2647.fc6 #1
-
sdpd/4955 is trying to acquire lock:
 (sk_lock-AF_BLUETOOTH){--..}, at: [] bt_accept_dequeue+0x26/0xc6 [bluetooth]

but task is already holding lock:
 (sk_lock-AF_BLUETOOTH){--..}, at: [] l2cap_sock_accept+0x41/0x11e [l2cap]

other info that might help us debug this:
1 lock held by sdpd/4955:
 #0:  (sk_lock-AF_BLUETOOTH){--..}, at: [] l2cap_sock_accept+0x41/0x11e [l2cap]

stack backtrace:
 [] show_trace_log_lvl+0x58/0x171
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] __lock_acquire+0x778/0x99c
 [] lock_acquire+0x4b/0x6d
 [] lock_sock+0xac/0xbc
 [] bt_accept_dequeue+0x26/0xc6 [bluetooth]
 [] l2cap_sock_accept+0xe2/0x11e [l2cap]
 [] sys_accept+0xd8/0x179
 [] sys_socketcall+0xa7/0x186
 [] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] __lock_acquire+0x778/0x99c
 [] lock_acquire+0x4b/0x6d
 [] lock_sock+0xac/0xbc
 [] bt_accept_dequeue+0x26/0xc6 [bluetooth]
 [] l2cap_sock_accept+0xe2/0x11e [l2cap]
 [] sys_accept+0xd8/0x179
 [] sys_socketcall+0xa7/0x186
 [] syscall_call+0x7/0xb


Re: airo: inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.

2006-08-14 Thread Pekka Pietikainen
On Tue, Aug 08, 2006 at 02:38:51PM +0300, Pekka Pietikainen wrote:
> Hmm... I retried with a 2.6.18rc4-based rawhide kernel and the warning
> is still there, previous one was rc3-git7.
> 
> Could be http://marc.theaimsgroup.com/?l=linux-netdev&m=115461336523555&w=2
> which isn't upstream yet, right?
The warning is gone in the rawhide kernel-of-the-day and
everything seems happy now, so whatever it was, it's fixed now.





Re: airo: inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.

2006-08-08 Thread Pekka Pietikainen
On Tue, Aug 08, 2006 at 09:16:13PM +1000, Herbert Xu wrote:
> Pekka Pietikainen <[EMAIL PROTECTED]> wrote:
> > Only aironet lockdep related report I could find was 
> > http://marc.theaimsgroup.com/?l=linux-netdev&m=115406279721287&w=2
> > 
> > this looks a bit different:
> > 
> > Linux version 2.6.17-1.2528.fc6 ([EMAIL PROTECTED]) (gcc version 4.1.1 
> > 20060802 (Red Hat 4.1.1-14)) #1 SMP Sun Aug 6 01:43:42 EDT 2006
> 
> If you run the latest upstream kernel this warning will disappear
> as different skb queues now have different lock keys.
Hmm... I retried with a 2.6.18rc4-based rawhide kernel and the warning
is still there, previous one was rc3-git7.

Could be http://marc.theaimsgroup.com/?l=linux-netdev&m=115461336523555&w=2
which isn't upstream yet, right?


airo: inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.

2006-08-08 Thread Pekka Pietikainen
Only aironet lockdep related report I could find was 
http://marc.theaimsgroup.com/?l=linux-netdev&m=115406279721287&w=2

this looks a bit different:

Linux version 2.6.17-1.2528.fc6 ([EMAIL PROTECTED]) (gcc version 4.1.1 20060802 (Red Hat 4.1.1-14)) #1 SMP Sun Aug 6 01:43:42 EDT 2006

=
[ INFO: inconsistent lock state ]
-
inconsistent {hardirq-on-W} -> {in-hardirq-W} usage.
swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 (&list->lock){++..}, at: [] skb_queue_tail+0x14/0x32
{hardirq-on-W} state was registered at:
  [] lock_acquire+0x4b/0x6a
  [] _spin_lock_bh+0x1e/0x2d
  [] udp_poll+0x49/0xce
  [] sock_poll+0x12/0x15
  [] do_sys_poll+0x1f7/0x3c3
  [] sys_poll+0x34/0x36
  [] syscall_call+0x7/0xb
irq event stamp: 516288
hardirqs last  enabled at (516287): [] restore_nocheck+0x12/0x15
hardirqs last disabled at (516288): [] common_interrupt+0x1b/0x2c
softirqs last  enabled at (516272): [] __do_softirq+0xec/0xf2
softirqs last disabled at (516261): [] do_softirq+0x5a/0xbe

other info that might help us debug this:
no locks held by swapper/0.

stack backtrace:
 [] show_trace_log_lvl+0x58/0x159
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] print_usage_bug+0x1ca/0x1d7
 [] mark_lock+0x96/0x353
 [] __lock_acquire+0x3b2/0x997
 [] lock_acquire+0x4b/0x6a
 [] _spin_lock_irqsave+0x22/0x32
 [] skb_queue_tail+0x14/0x32
 [] netlink_broadcast+0x1bf/0x28e
 [] wireless_send_event+0x28b/0x29d
 [] airo_send_event+0x6c/0x76 [airo]
 [] airo_interrupt+0x210/0xff7 [airo]
 [] handle_IRQ_event+0x20/0x4d
 [] __do_IRQ+0x94/0xef
 [] do_IRQ+0xb9/0xcd
 [] common_interrupt+0x25/0x2c
DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Leftover inexact backtrace:
 [] show_trace+0xd/0x10
 [] dump_stack+0x19/0x1b
 [] print_usage_bug+0x1ca/0x1d7
 [] mark_lock+0x96/0x353
 [] __lock_acquire+0x3b2/0x997
 [] lock_acquire+0x4b/0x6a
 [] _spin_lock_irqsave+0x22/0x32
 [] skb_queue_tail+0x14/0x32
 [] netlink_broadcast+0x1bf/0x28e
 [] wireless_send_event+0x28b/0x29d
 [] airo_send_event+0x6c/0x76 [airo]
 [] airo_interrupt+0x210/0xff7 [airo]
 [] handle_IRQ_event+0x20/0x4d
 [] __do_IRQ+0x94/0xef
 [] do_IRQ+0xb9/0xcd
 [] common_interrupt+0x25/0x2c
 [] cpu_idle+0xa7/0xc1
 [] rest_init+0x23/0x26
 [] start_kernel+0x3ab/0x3b3
 [] 0xc0400210


Re: sky2 rx error (was: Re: sky2 freezes the system)

2006-02-03 Thread Pekka Pietikainen
On Tue, Jan 31, 2006 at 01:45:18PM -0800, Stephen Hemminger wrote:
> The driver doesn't handle overlength packets properly. 
> Someone maybe sending you jumbo frames or some other crap.
> 
> The vendor driver doesn't handle over size frames either;
> it just resets itself every 5 seconds so you don't notice.
Indeed, I just set the MTU to 4000 on the interface with problems and
the hangs went away (it's connected to a switch monitor port, so it doesn't
really matter what the MTU is; if a frame is oversized I definitely want to
see what it was :-) ).


Re: sky2 rx error (was: Re: sky2 freezes the system)

2006-01-31 Thread Pekka Pietikainen
On Tue, Jan 24, 2006 at 06:36:29PM +0200, Pekka Pietikainen wrote:
> On Mon, Jan 23, 2006 at 08:03:15PM +0200, Pekka Pietikainen wrote:
> > The box is a Nexcom NSA 1086 with 4x skge ports and 4x sky2. acpi=off made
> > the driver work apparently, haven't used it with much load yet, though.
> Ran the box for a while with some load (the sky2 ports on the box are only 
> used for sniffing, really, so no outbound traffic), and after a while 
> the box stopped receiving traffic on one of the interfaces (the one 
> with the most traffic). Tried rmmod/modprobe sky2; that didn't help, and
> neither did disabling rx checksums with ethtool.
Still happens with 2.6.16-rc1-git4 + sky2-0.15. I did get 

sky2 eth3: rx error, status 0x7ffc2001 length 72
sky2 eth3: rx error, status 0x7ffc2001 length 72
sky2 eth3: rx error, status 0x7ffc2001 length 112
sky2 eth3: rx error, status 0x7ffc2001 length 108
sky2 eth3: rx error, status 0x7ffc2001 length 104
sky2 eth3: rx error, status 0x7ffc2001 length 80
sky2 eth3: rx error, status 0x7ffc2001 length 108
sky2 eth3: rx error, status 0x7ffc2001 length 112
sky2 eth3: rx error, status 0x7ffc2001 length 104

shortly after booting, then the box (NSA 1086) ran ok for quite some time,
several hours. Then it just stopped accepting packets on one of the sky2
interfaces (others continued running). rmmod/modprobe fixed it now. Nothing
in the logs, unfortunately. Ethtool showed link was still up...


Re: skge bridge & hw csum failure (Was: Re: [BUG] sky2 broken for Yukon PCI-E Gigabit Ethernet Controller 11ab:4362 (rev 19))

2006-01-31 Thread Pekka Pietikainen
On Mon, Jan 30, 2006 at 04:15:51PM -0800, Stephen Hemminger wrote:
> Okay, what is the hardware version:
>   dmesg | grep skge
> Maybe that chip rev is no good.
skge 1.3 addr 0xd042c000 irq 169 chip Yukon-Lite rev 7
skge eth0: addr 00:10:f3:06:83:d4
skge 1.3 addr 0xd042 irq 193 chip Yukon-Lite rev 7
skge eth1: addr 00:10:f3:06:83:d5
skge 1.3 addr 0xd0424000 irq 185 chip Yukon-Lite rev 7
skge eth2: addr 00:10:f3:06:83:d6
skge 1.3 addr 0xd0428000 irq 177 chip Yukon-Lite rev 7
skge eth3: addr 00:10:f3:06:83:d7

In related news, the box has been running with a 2.6.16-rc1-git4 based
kernel for 8 hours now without any errors. First kernel+driver version
that does that (apart from vendor-sk98lin on a kernel without slab
debugging ;) ;) ;) ). 


skge bridge & hw csum failure (Was: Re: [BUG] sky2 broken for Yukon PCI-E Gigabit Ethernet Controller 11ab:4362 (rev 19))

2006-01-30 Thread Pekka Pietikainen
On Fri, Jan 27, 2006 at 11:22:42PM +1100, Herbert Xu wrote:
> OK, although we can't rule out sky2/netfilter from the enquiry, I've
> identified two bugs in ppp/pppoe that may be responsible for what you
> are seeing.  So please try the following patch and let us know if the
> problem still exists (or deteriorates/improves).
Borrowing this thread for a related problem: I'm getting lots of those on a
bridge device. This one is running skge; rmmod skge followed by modprobe
sk98lin actually seemed to do it too. I've disabled rx checksums with ethtool
for now. Kernel is a 2.6.15.1-ish Fedora one.
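
For context, "hw csum failure" presumably means the checksum supplied by the NIC did not agree with what the stack computed over the packet. As a reference point, this is a plain sketch of the Internet checksum from RFC 1071, written for packet-sized buffers; it is not the kernel's implementation, and the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes: ones'-complement sum of
 * big-endian 16-bit words, folded to 16 bits and inverted.
 * Intended for packet-sized buffers (the 32-bit accumulator is not
 * folded until the end). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)                    /* odd trailing byte is padded with zero */
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)               /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running it over the example bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) gives 0x220d. Comparing a software result like this against the value the hardware reports is one way to tell whether the NIC or the packet is at fault.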

skge:

bridge-cd: hw csum failure.
 [] __skb_checksum_complete+0x56/0x5c [] icmp_error+0xbf/0x1af [ip_conntrack]
 [] __wake_up+0x32/0x43 [] icmp_error+0x0/0x1af [ip_conntrack]
 [] ip_conntrack_in+0x95/0x2d6 [ip_conntrack] [] __wake_up_bit+0x2e/0x33
 [] end_buffer_async_write+0xbf/0x12a [] nf_iterate+0x60/0x84
 [] br_nf_pre_routing_finish+0x0/0x320 [bridge] [] nf_hook_slow+0x4d/0xf9
 [] br_nf_pre_routing_finish+0x0/0x320 [bridge] [] br_nf_pre_routing+0x2f5/0x431 [bridge]
 [] br_nf_pre_routing_finish+0x0/0x320 [bridge] [] nf_iterate+0x60/0x84
 [] br_handle_frame_finish+0x0/0xe9 [bridge] [] nf_hook_slow+0x4d/0xf9
 [] br_handle_frame_finish+0x0/0xe9 [bridge] [] br_handle_frame+0x164/0x23e [bridge]
 [] br_handle_frame_finish+0x0/0xe9 [bridge] [] netif_receive_skb+0x1ac/0x325
 [] skge_poll+0x3b6/0x4be [skge] [] __mod_timer+0x85/0xa0
 [] net_rx_action+0xb7/0x1bb [] __do_softirq+0x72/0xdc
 [] do_softirq+0x4b/0x4f
 ===
 [] do_IRQ+0x55/0x86 [] smp_apic_timer_interrupt+0xc1/0xca
 [] common_interrupt+0x1a/0x20 [] mwait_idle+0x2a/0x34
 [] cpu_idle+0x6c/0xa7 [] start_kernel+0x173/0x1ca
 [] unknown_bootoption+0x0/0x1b6


and sk98lin 
bridge-cd: hw csum failure.
 [] __skb_checksum_complete+0x56/0x5c [] icmp_error+0xbf/0x1af [ip_conntrack]
 [] __wake_up_bit+0x2e/0x33 [] icmp_error+0x0/0x1af [ip_conntrack]
 [] ip_conntrack_in+0x95/0x2d6 [ip_conntrack] [] mempool_free+0x3a/0x73
 [] end_bio_bh_io_sync+0x0/0x4f [] end_bio_bh_io_sync+0x0/0x4f
 [] nf_iterate+0x60/0x84 [] br_nf_pre_routing_finish+0x0/0x320 [bridge]
 [] nf_hook_slow+0x4d/0xf9 [] br_nf_pre_routing_finish+0x0/0x320 [bridge]
 [] br_nf_pre_routing+0x2f5/0x431 [bridge] [] br_nf_pre_routing_finish+0x0/0x320 [bridge]
 [] nf_iterate+0x60/0x84 [] br_handle_frame_finish+0x0/0xe9 [bridge]
 [] nf_hook_slow+0x4d/0xf9 [] br_handle_frame_finish+0x0/0xe9 [bridge]
 [] br_handle_frame+0x164/0x23e [bridge] [] br_handle_frame_finish+0x0/0xe9 [bridge]
 [] netif_receive_skb+0x1ac/0x325 [] process_backlog+0x83/0x129
 [] net_rx_action+0xb7/0x1bb [] __do_softirq+0x72/0xdc
 [] do_softirq+0x4b/0x4f
 ===
 [] do_IRQ+0x55/0x86 [] common_interrupt+0x1a/0x20
 [] page_waitqueue+0x5/0x32 [] unlock_page+0x1d/0x27
 [] __block_write_full_page+0x1e7/0x354 [] ext3_get_block+0x0/0x90 [ext3]
 [] block_write_full_page+0xe3/0x109 [] ext3_get_block+0x0/0x90 [ext3]
 [] ext3_ordered_writepage+0xe5/0x183 [ext3] [] bget_one+0x0/0x7 [ext3]
 [] mpage_writepages+0x222/0x3ee [] ext3_ordered_writepage+0x0/0x183 [ext3]
 [] __filemap_fdatawrite_range+0x66/0x72 [] filemap_fdatawrite+0x23/0x27
 [] do_fsync+0x55/0xc8 [] syscall_call+0x7/0xb

iptables forward chain is just ACCEPT...


Re: [PATCH 0/9] sky2: version 0.14

2006-01-26 Thread Pekka Pietikainen
On Wed, Jan 25, 2006 at 04:28:44PM -0800, Stephen Hemminger wrote:
> Mostly a collection of bug fixes. The most critical is the pci express
> fix. Also adds Message Signaled Interrupt and entropy support.
Patches seemed to be mangled. Actually having a "version I want
people to test" tarball would be nice (for other drivers too!) for
those too lazy to play around with git :)

Index: sky2-1.0/drivers/net/sky2.c
===
--- sky2-1.0.orig/drivers/net/sky2.c
+++ sky2-1.0/drivers/net/sky2.c
@@ -2003,19 +2003,16 @@ static void sky2_hw_intr(struct sky2_hw 

if (status & Y2_IS_PCI_EXP) {
/* PCI-Express uncorrectable Error occurred */
-   u32 pex_err;
-
-   pci_read_config_dword(hw->pdev, PEX_UNC_ERR_STAT, &pex_err);




sky2 rx error (was: Re: sky2 freezes the system)

2006-01-24 Thread Pekka Pietikainen
On Mon, Jan 23, 2006 at 08:03:15PM +0200, Pekka Pietikainen wrote:
> The box is a Nexcom NSA 1086 with 4x skge ports and 4x sky2. acpi=off made
> the driver work apparently, haven't used it with much load yet, though.
Ran the box for a while with some load (the sky2 ports on the box are only 
used for sniffing, really, so no outbound traffic), and after a while 
the box stopped receiving traffic on one of the interfaces (the one 
with the most traffic). Tried rmmod/modprobe sky2; that didn't help, and
neither did disabling rx checksums with ethtool.

Jan 24 11:06:45 localhost kernel: sky2 eth4: enabling interface
Jan 24 11:06:46 localhost kernel: sky2 eth3: Link is up at 100 Mbps, full duplex, flow control none
Jan 24 11:06:46 localhost kernel: sky2 eth4: Link is up at 100 Mbps, full duplex, flow control none
Jan 24 11:06:48 localhost kernel: sky2 eth1: enabling interface
Jan 24 11:06:48 localhost kernel: sky2 eth2: enabling interface
Jan 24 11:41:30 localhost kernel: sky2 eth3: rx error, status 0x7ffc2001 length 1112
Jan 24 12:05:10 localhost kernel: sky2 eth3: disabling interface
Jan 24 12:05:12 localhost kernel: sky2 eth3: enabling interface
Jan 24 12:05:12 localhost kernel: sky2 unknown status opcode 0xe3
Jan 24 12:05:12 localhost kernel: sky2 eth3: rx error, status 0x9f6c1703 length 58049
Jan 24 12:05:14 localhost kernel: sky2 eth3: Link is up at 100 Mbps, full duplex, flow control none


Re: sky2 freezes the system

2006-01-23 Thread Pekka Pietikainen
> On Wed, 11 Jan 2006 14:00:17 -0800
> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>> On Wed, 11 Jan 2006 21:16:40 +0300
>> 
>> > 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
>> 
>> What is the chip revision, reported on dmesg output.  Something like:
>> 
>> [   27.769459] sky2 v0.12 addr 0xdfcfc000 irq 177 Yukon-EC (0xb6) rev 1
>> [   27.770012] sky2 eth0: addr 00:11:11:ed:c2:2e
>> [  245.515149] sky2 eth0: enabling interface
>> [  247.966639] sky2 eth0: phy interrupt status 0x1c40 0xbc4c
>> [  247.967056] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
>> 
> Still need to know the chip revision..
Mine does the same, lspci says 88E8053 rev 15, modprobe sky2 says

ACPI PCI Interrupt :01:00.0[A] -> Link [LNKA]  -> GSI 7 (level, low) 
sky2 v0.13 addr 0xd002 irq 7 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:10:f3:06:73:d0
sky2 :01:00.0: pci express error (0x0)
printk: 447465 messages supressed.
sky2 :01:00.0: pci express error (0x0)
printk: 447465 messages supressed.
...

and the box dies

The box is a Nexcom NSA 1086 with 4x skge ports and 4x sky2. acpi=off made
the driver work apparently, haven't used it with much load yet, though.

sk98lin (8.23 and 8.28 tried) works ok with ACPI enabled. Admittedly that's
not exactly the same kernel (for sk98lin it's a Fedora updates-testing
2.6.15, for sky2 it's a Fedora 2.6.16rc1-git1, which included
sky2 v0.13).

dmesg (with and without ACPI)

Linux version 2.6.15-1.2003_FC4 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 Thu Jan 19 02:22:28 EST 2006
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f400 (usable)
 BIOS-e820: 0009f400 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3f7f (usable)
 BIOS-e820: 3f7f - 3f7f3000 (ACPI NVS)
 BIOS-e820: 3f7f3000 - 3f80 (ACPI data)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
119MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f4a60
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 260080
  DMA zone: 4096 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 30704 pages, LIFO batch:7
DMI 2.2 present.
ACPI: RSDP (v000 IntelR) @ 0x000f8010
ACPI: RSDT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x) @ 0x3f7f3040
ACPI: FADT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x) @ 0x3f7f30c0
ACPI: MCFG (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x) @ 0x3f7f6fc0
ACPI: MADT (v001 IntelR AWRDACPI 0x42302e31 AWRD 0x) @ 0x3f7f6f00
ACPI: DSDT (v001 INTELR AWRDACPI 0x1000 MSFT 0x010e) @ 0x
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC disabled (-2); pass 'lapic' to re-enable.
LAPIC disabled (-2)
Allocating PCI resources starting at 4000 (gap: 3f80:a080)
Built 1 zonelists
Kernel command line: ro root=LABEL=/ console=ttyS1,115200 console=tty0 vga=ask init=/bin/sh
mapped APIC to d000 ()
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c040d000 soft=c040e000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 3000.605 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x60
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1025912k/1040320k available (2140k kernel code, 13684k reserved, 758k data, 200k init, 122816k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6012.25 BogoMIPS (lpj=12024500)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 0010   441d
CPU: After vendor identify, caps: bfebfbff 0010   441d
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: After all inits, caps: bfebf3ff 0010  0180 441d
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Checking 'hlt' instruction... OK.
ACPI: setting ELCR to 0200 (from 0ea0)
Local APIC disabled by default; use 'lapic' to enable it.
checki

Re: latest netdev tree - (broadcom44 bug?) letting you know...

2005-08-05 Thread Pekka Pietikainen
On Thu, Aug 04, 2005 at 11:51:02PM +0200, Mateusz Berezecki wrote:
> Jeff Garzik wrote:
> 
> >
> >Nothing in that tree has changed the b44 driver...
> >  
> 
> Ok, so I will try to find out what has changed as soon as I have some 
> more time
> 
> also please ignore forwarded e-mail. I just updated the netdev list address.
I heard some rumors that someone recently broke ACPI (interrupt routing or
somesuch) and that broke b44 in some configs. Could be that.