64bit counters: iproute(netlink) vs ifconfig

2008-02-13 Thread Krzysztof Oledzki

Hello,

Just discovered that counters returned by the ip tool are truncated:

# ip -s link show bond0
1: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:1d:09:67:6e:2f brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2485605521 9010211  0       0       0       6
    TX: bytes  packets  errors  dropped carrier collsns
    3023237974 9345397  0       0       0       0

# ifconfig bond0
bond0 Link encap:Ethernet  HWaddr 00:1D:09:67:6E:2F
  inet addr:192.168.152.62  Bcast:192.168.152.255  Mask:255.255.255.0
  inet6 addr: fe80::21d:9ff:fe67:6e2f/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
  RX packets:9010367 errors:0 dropped:351 overruns:0 frame:0
  TX packets:9345521 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:2485631020 (2370.4 Mb)  TX bytes:7318232294 (6979.2 Mb)

Is it possible to get 64-bit counters in ip via netlink? The
rtnl_link_stats struct does not look promising, since it defines
rx_bytes/tx_bytes as __u32.
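
The two TX byte counts above differ by almost exactly 2^32, which is what a 32-bit field produces once the real counter passes 4 GiB: 7318232294 - 4294967296 = 3023264998, within about 27 kB of the 3023237974 that ip printed a moment earlier. A minimal C sketch of that wraparound follows. The helper names truncate_to_u32 and wrap_count are hypothetical and used only for illustration; they are not kernel or iproute2 functions.

```c
#include <stdint.h>

/* Hypothetical helper: models storing a 64-bit byte counter in a
 * 32-bit (__u32-sized) field. The cast keeps the value modulo 2^32. */
static uint32_t truncate_to_u32(uint64_t counter)
{
    return (uint32_t)counter;
}

/* Hypothetical helper: how many times the 64-bit counter has wrapped
 * around a 32-bit field. */
static uint64_t wrap_count(uint64_t counter)
{
    return counter >> 32;
}
```

Feeding the ifconfig value 7318232294 through truncate_to_u32 gives 3023264998, and wrap_count reports a single wrap.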


Best regards,

Krzysztof Olędzki

Re: [PATCH] ipvs: Make the synchronization interval controllable

2008-02-08 Thread Krzysztof Oledzki



On Fri, 8 Feb 2008, Andi Kleen wrote:


Sven Wegener [EMAIL PROTECTED] writes:


The default synchronization interval of 1000 milliseconds is too high for a
heavily loaded director. Collecting the connection information from one second
and then sending it out in a burst will overflow the socket buffer and lead to
synchronization information being dropped. Make the interval controllable by a
sysctl variable so that users can tune it.


It would be better if the defaults just worked under all circumstances.
So why not just lower the default?

Or the code could detect overflowing socket buffers and lower the
value dynamically.


We could also start sending as soon as the amount of queued data reaches a defined level.
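
The suggestion above, flushing when the queued data reaches a set level instead of on a fixed timer, can be sketched as follows. This is an illustrative userspace model only: struct sync_batch, sync_batch_add, and the threshold value are assumptions, not the IPVS synchronization code.

```c
#include <stddef.h>

/* Hypothetical accumulator for pending synchronization data. */
struct sync_batch {
    size_t pending;    /* bytes queued since the last flush */
    size_t threshold;  /* flush as soon as pending reaches this level */
    unsigned flushes;  /* number of flushes triggered so far */
};

/* Queue len bytes. Returns 1 if this call triggered a flush, else 0. */
static int sync_batch_add(struct sync_batch *b, size_t len)
{
    b->pending += len;
    if (b->pending < b->threshold)
        return 0;
    /* A real sender would transmit the batch here. */
    b->pending = 0;
    b->flushes++;
    return 1;
}
```

With a threshold below the socket buffer size, a burst is sent before the buffer can overflow, regardless of how long the timer interval is.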

Best regards,

Krzysztof Olędzki

Re: [patch for 2.6.24? 1/1] bonding: locking fix

2008-01-18 Thread Krzysztof Oledzki



On Thu, 17 Jan 2008, Jay Vosburgh wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:


Andrew Morton [EMAIL PROTECTED] wrote:
[...]

Can we get this bug fixed please?  Today?  It has been known about for more
than two months.


I just reposted the complete fix; it's #1 of the series of 7.


Bad news. :( 2.6.24-rc7 + patch #1 (bonding: fix locking in sysfs
primary/active selection):

[...]

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc7 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c041255a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}


None of the seven patches I posted just a bit ago will fix this
lockdep warning (which is a different thing than the bug Andrew inquired
about); I'm still working on that one.

For that one, I had posted this work in progress patch:


Yes, this one works.


which makes the warning go away, but Herbert Xu pointed out that
there is a potential problem with bond_enslave accessing the mc_lists
without sufficient locking.  It's not the only offender, either, and the
bond->mc_list references really need to be protected by the bond_lock,
and the whole thing probably ought to use dev_mc_sync/unsync instead of
what it does now.

Since the bond_enslave, et al, business isn't a new problem, and
I've never heard of it being hit, I'm thinking now to just leave the
bond_enslave part for 2.6.25, and fix the lockdep warning for 2.6.24.


It is a new problem, as it never happened with <=2.6.23.

Best regards,

Krzysztof Olędzki

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-14 Thread Krzysztof Oledzki



On Sat, 12 Jan 2008, Jay Vosburgh wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:
[...]

Exactly. All I need to do is reboot my server, and I have a 100% probability
of getting the warning.


I wish it were that easy for me; I'm not sure what magic thing
you've got on your server or network that I don't, but I haven't been
able to make this lockdep warning happen at all.


Right. So, what is the final patch? I would like to test it if that's
possible. ;)


Can you test the following and let me know if it triggers the
warning?  I believe this is the minimum locking needed, and based on
input from Herbert, we shouldn't need to hold the lock at _bh.  If this
one works, and nobody sees any other issues with it, then it's the final
patch for this lockdep problem.  I'll add some deep, meaningful comments
to explain the locking a bit (i.e., we're called with rtnl for the
allmulti and promisc cases, so we're ok there without additional locks,
but the later code could be called from anywhere, so it needs locks to
prevent the slave list from changing, but the mc_lists themselves are
covered by the netif_tx_lock that all callers will hold), but this would
be the actual code change.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77d004d..6906dbc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3937,8 +3937,6 @@ static void bond_set_multicast_list(struct net_device *bond_dev)
    struct bonding *bond = bond_dev->priv;
    struct dev_mc_list *dmi;

-   write_lock_bh(&bond->lock);
-
    /*
     * Do promisc before checking multicast_mode
     */
@@ -3959,6 +3957,8 @@ static void bond_set_multicast_list(struct net_device *bond_dev)
        bond_set_allmulti(bond, -1);
    }

+   read_lock(&bond->lock);
+
    bond->flags = bond_dev->flags;

    /* looking for addresses to add to slaves' mc list */
@@ -3979,7 +3979,7 @@ static void bond_set_multicast_list(struct net_device *bond_dev)
    bond_mc_list_destroy(bond);
    bond_mc_list_copy(bond_dev->mc_list, bond, GFP_ATOMIC);

-   write_unlock_bh(&bond->lock);
+   read_unlock(&bond->lock);
 }

 /*




I can confirm that the warning went away.

Tested-by: Krzysztof Piotr Oledzki [EMAIL PROTECTED]

Best regards,

Krzysztof Olędzki

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-12 Thread Krzysztof Oledzki



On Wed, 9 Jan 2008, Andy Gospodarek wrote:

On Wed, Jan 09, 2008 at 09:54:56AM -0800, Jay Vosburgh wrote:

CUT

This should silence the lockdep (if I'm understanding what
everybody's saying), and keep the change set to a minimum.  This might


The lockdep problem is easy to trigger.  The lockdep code does a good
job of noticing problems quickly regardless of how easy the deadlocks
are to create.


Exactly. All I need to do is reboot my server, and I have a 100% probability
of getting the warning.



not even be worth pushing for 2.6.24; I'm not exactly sure how difficult
the lockdep problem would be to trigger.



I'd like to see it go in there (for correctness) and to avoid hearing
about these lockdep issues for the next few months until it makes it
into 2.6.25.


Right. So, what is the final patch? I would like to test it if that's 
possible. ;)


Best regards,

Krzysztof Olędzki

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-09 Thread Krzysztof Oledzki



On Tue, 8 Jan 2008, Jay Vosburgh wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:


Fine. I just wanted to let you know that someone is testing your patches
and that everything works, except for the problem mentioned.


And I appreciate it; I just wanted to make sure our many fans
following along at home didn't misunderstand.

Could you let me know if the patch below makes the lockdep
warning go away?  This applies on top of the previous three, although it
should be trivial to do by hand.

I'm still checking to make sure this is safe with regard to
mutexing the bonding structures, but it would be good to know if it
eliminates the warning.


I can confirm that the warning went away.

Tested-by: Krzysztof Piotr Oledzki [EMAIL PROTECTED]

Best regards,

Krzysztof Olędzki

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-08 Thread Krzysztof Oledzki



On Mon, 7 Jan 2008, Jay Vosburgh wrote:


Following are three fixes to fix locking problems and
silence locking-related warnings in the current 2.6.24-rc.

patch 1: fix locking in sysfs primary/active selection

Call core network functions with expected locks to
eliminate potential deadlock and silence warnings.

patch 2: fix ASSERT_RTNL that produces spurious warnings

Relocate ASSERT_RTNL to remove a false warning; after patch,
ASSERT is located in code that holds only RTNL (additional locks were
causing the ASSERT to trip)

patch 3: fix locking during alb failover and slave removal

Fix all call paths into alb_fasten_mac_swap to hold only RTNL.
Eliminates deadlock and silences warnings.

Patches are against the current netdev-2.6#upstream branch.

Please apply for 2.6.24.


2.6.24-rc7 + patches #1, #2, #3:

bonding: bond0: setting mode to active-backup (1).
bonding: bond0: Setting MII monitoring interval to 100.
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
bonding: bond0: enslaving eth0 as an active interface with an up link.
bonding: bond0: Adding slave eth1.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc7 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c041258e>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
4 locks held by events/0/9:
 #0:  (events){--..}, at: [<c0133d33>] run_workqueue+0x87/0x1b6
 #1:  ((linkwatch_work).work){--..}, at: [<c0133d33>] run_workqueue+0x87/0x1b6
 #2:  (rtnl_mutex){--..}, at: [<c03ac678>] linkwatch_event+0x5/0x22
 #3:  (&ndev->lock){-.-+}, at: [<c0412475>] mld_ifc_timer_expire+0x17/0x1fb

the first lock's dependencies:
-> (&mc->mca_lock){-+..} ops: 10 {
   initial-use  at:
[<c0104ee2>] dump_trace+0x83/0x8d
[<c0142890>] __lock_acquire+0x4ba/0xc07
[<c0109ef2>] save_stack_trace+0x20/0x3a
[<c0142f95>] __lock_acquire+0xbbf/0xc07
[<c0412d66>] ipv6_dev_mc_inc+0x24d/0x31c
[<c0143056>] lock_acquire+0x79/0x93
[<c04129ea>] igmp6_group_added+0x18/0x11d
[<c043a8aa>] _spin_lock_bh+0x3b/0x64
[<c04129ea>] igmp6_group_added+0x18/0x11d
[<c04129ea>] igmp6_group_added+0x18/0x11d
[<c0141f93>] trace_hardirqs_on+0x122/0x14c
[<c0412dbc>] ipv6_dev_mc_inc+0x2a3/0x31c
[<c0412d66>] ipv6_dev_mc_inc+0x24d/0x31c
[<c0412df1>] ipv6_dev_mc_inc+0x2d8/0x31c
[<c0412b19>] ipv6_dev_mc_inc+0x0/0x31c
[<c0402168>] ipv6_add_dev+0x21c/0x24b
[<c040b991>] ndisc_ifinfo_sysctl_change+0x0/0x1ef
[<c05c5ae9>] addrconf_init+0x13/0x193
[<c019a04b>] proc_net_fops_create+0x10/0x21
[<c041a44c>] ip6_flowlabel_init+0x1e/0x20
[<c05c59c9>] inet6_init+0x1f0/0x2ad
[<c05a9499>] kernel_init+0x150/0x2b7
[<c05a9349>] kernel_init+0x0/0x2b7
[<c05a9349>] kernel_init+0x0/0x2b7
[<c0104baf>] kernel_thread_helper+0x7/0x10
[] 0x
   in-softirq-W at:
[<c014197a>] mark_lock+0x64/0x451
[<c0142816>] __lock_acquire+0x440/0xc07
[<c0103f7b>] restore_nocheck+0x12/0x15
[<c0143056>] lock_acquire+0x79/0x93
[<c041258e>] mld_ifc_timer_expire+0x130/0x1fb
[<c041245e>] mld_ifc_timer_expire+0x0/0x1fb
[<c043a8aa>] _spin_lock_bh+0x3b/0x64
[<c041258e>] mld_ifc_timer_expire+0x130/0x1fb
[<c041258e>] mld_ifc_timer_expire+0x130/0x1fb
[<c041245e>] mld_ifc_timer_expire+0x0/0x1fb
[<c0141f7d>] trace_hardirqs_on+0x10c/0x14c
[<c041245e>] mld_ifc_timer_expire+0x0/0x1fb
[<c012e02e>] run_timer_softirq+0xfa/0x15d
[<c012a982>] __do_softirq+0x56/0xdb
[<c0141f7d>] trace_hardirqs_on+0x10c/0x14c
[<c012a994>] __do_softirq+0x68/0xdb
[<c012aa3d>] do_softirq+0x36/0x51
 

Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24

2008-01-08 Thread Krzysztof Oledzki



On Tue, 8 Jan 2008, Jay Vosburgh wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:


On Mon, 7 Jan 2008, Jay Vosburgh wrote:


Following are three fixes to fix locking problems and
silence locking-related warnings in the current 2.6.24-rc.

patch 1: fix locking in sysfs primary/active selection

Call core network functions with expected locks to
eliminate potential deadlock and silence warnings.

patch 2: fix ASSERT_RTNL that produces spurious warnings

Relocate ASSERT_RTNL to remove a false warning; after patch,
ASSERT is located in code that holds only RTNL (additional locks were
causing the ASSERT to trip)

patch 3: fix locking during alb failover and slave removal

Fix all call paths into alb_fasten_mac_swap to hold only RTNL.
Eliminates deadlock and silences warnings.

Patches are against the current netdev-2.6#upstream branch.

Please apply for 2.6.24.


2.6.24-rc7 + patches #1, #2, #3:

bonding: bond0: setting mode to active-backup (1).
bonding: bond0: Setting MII monitoring interval to 100.
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
bonding: bond0: enslaving eth0 as an active interface with an up link.
bonding: bond0: Adding slave eth1.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc7 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c041258e>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.


Just to be clear: the patch set I posted yesterday was not
intended to resolve the lockdep problem; I haven't studied that one yet.


Fine. I just wanted to let you know that someone is testing your patches
and that everything works, except for the problem mentioned.


Best regards,

Krzysztof Olędzki

Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)

2008-01-07 Thread Krzysztof Oledzki



On Wed, 19 Dec 2007, Andy Gospodarek wrote:


On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:



On Wed, 12 Dec 2007, Jay Vosburgh wrote:


Herbert Xu [EMAIL PROTECTED] wrote:


diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix drivers/net/bonding/bond_sysfs.c
--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
+++ a/drivers/net/bonding/bond_sysfs.c
@@ -,8 +,6 @@ static ssize_t bonding_store_primary(str
 out:
    write_unlock_bh(&bond->lock);

-   rtnl_unlock();
-


Looking at the changeset that added this perhaps the intention
is to hold the lock? If so we should add an rtnl_lock to the start
of the function.


Yes, this function needs to hold locks, and more than just
what's there now.  I believe the following should be correct; I haven't
tested it, though (I'm supposedly on vacation right now).

The following change should be correct for the
bonding_store_primary case discussed in this thread, and also corrects
the bonding_store_active case which performs similar functions.

The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock for
write_bh, and no other locks.  This is so that the lower level
mode-specific functions can release locks down to just rtnl in order to
call, e.g., dev_set_mac_address with the locks it expects (rtnl only).

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

-   write_lock_bh(&bond->lock);
+   rtnl_lock();
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
-
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
    struct bonding *bond = to_bond(d);

    rtnl_lock();
-   write_lock_bh(&bond->lock);
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);

    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;
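
The locking rule in the patch description above (rtnl first, then bond->lock for read, then curr_slave_lock for write, released in the reverse order) can be modelled with a small rank checker. This is a userspace sketch under stated assumptions: the lock_id ranks, struct held_locks, take, and drop are hypothetical names, and the strict last-in-first-out release is a simplification that mirrors what the patch does, not a kernel requirement.

```c
/* Hypothetical ranks: a lock may only be taken while every lock
 * already held has a lower rank. */
enum lock_id { LOCK_RTNL, LOCK_BOND, LOCK_CURR_SLAVE, LOCK_COUNT };

struct held_locks {
    enum lock_id stack[LOCK_COUNT]; /* locks held, in acquisition order */
    int depth;                      /* number of locks currently held */
};

/* Take a lock. Fails (-1) if the id is invalid or if a lock of equal
 * or higher rank is already held, i.e. the order would be violated. */
static int take(struct held_locks *h, enum lock_id id)
{
    if (id >= LOCK_COUNT)
        return -1;
    if (h->depth > 0 && h->stack[h->depth - 1] >= id)
        return -1;
    h->stack[h->depth++] = id;
    return 0;
}

/* Release a lock. Fails (-1) unless it is the most recently taken one. */
static int drop(struct held_locks *h, enum lock_id id)
{
    if (h->depth == 0 || h->stack[h->depth - 1] != id)
        return -1;
    h->depth--;
    return 0;
}
```

Taking LOCK_RTNL, LOCK_BOND, and LOCK_CURR_SLAVE in that order succeeds, while asking for LOCK_RTNL again with the others held is rejected, which is the kind of ordering mistake the patch is meant to rule out.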


Vanilla 2.6.24-rc5 plus this patch:

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc5 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.




Grrr, I should have seen that -- sorry.  Try your luck with this instead:

CUT

No luck.




I'm guessing if we go back to using a write-lock for bond->lock this
will go back to working again, but I'm not totally convinced since there
are plenty of places where we used a read-lock with it.


Should I check this patch or rather, based on a future discussion, wait
for another version?



diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..635b857 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

+   rtnl_lock();
    write_lock_bh(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
+   write_unlock_bh(&bond->curr_slave_lock);
    write_unlock_bh(&bond->lock);
-
    rtnl_unlock();

    return count

Re: [PATCH] sky2: Use deferrable timer for watchdog

2007-12-20 Thread Krzysztof Oledzki



On Thu, 20 Dec 2007, Parag Warudkar wrote:


On Dec 20, 2007 2:22 PM, Kok, Auke [EMAIL PROTECTED] wrote:

ok, that's just bad and if there's no user-definable limit to the deferral I
definitely don't like this change.

Can I safely assume that any irq will cause all deferred timers to run?


I think even other causes for wakeup like process related ones will
cause the CPU to go busy and run the timers.
This, coupled with the fact that no one is yet able to reach 0 wakeups
per second makes it pretty unlikely that deferrable timers will be
deferred indefinitely.



If this is the case then for e1000 this patch is still OK since the watchdog
needs to run (1) after a link up/down interrupt or (2) to update statistics.
Those statistics won't increase if there is no traffic of course...



I think it is reasonable for Network driver watchdogs to use a
deferrable timer - if the machine is 100% IDLE there is no one needing
the network to be up.


Please note that being connected to a network means not only sending
but also receiving.
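
The disagreement above turns on when a deferrable timer actually runs. The sketch below is a simplified model, assuming that a deferrable timer which expires while the CPU is idle is serviced only at the next wakeup caused by something else. The function fire_time and its parameters are hypothetical and do not correspond to a kernel API.

```c
#include <stdbool.h>

/* Hypothetical model of when a timer callback runs.
 *   expires            - programmed expiry time
 *   deferrable         - true if the timer may be deferred while idle
 *   cpu_idle_at_expiry - true if the CPU is idle when the timer expires
 *   next_wakeup        - time of the next wakeup from any other source
 * A normal timer, or any timer on a busy CPU, fires at its expiry time.
 * A deferrable timer on an idle CPU waits for the next wakeup. */
static unsigned long fire_time(unsigned long expires, bool deferrable,
                               bool cpu_idle_at_expiry,
                               unsigned long next_wakeup)
{
    if (deferrable && cpu_idle_at_expiry && next_wakeup > expires)
        return next_wakeup;
    return expires;
}
```

In this model a receive interrupt counts as a wakeup source, matching the assumption discussed in the thread that any IRQ lets pending deferred timers run.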


Best regards,

Krzysztof Oledzki


Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)

2007-12-18 Thread Krzysztof Oledzki



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:



On Wed, 12 Dec 2007, Jay Vosburgh wrote:


Herbert Xu [EMAIL PROTECTED] wrote:


diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix drivers/net/bonding/bond_sysfs.c
--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
+++ a/drivers/net/bonding/bond_sysfs.c
@@ -,8 +,6 @@ static ssize_t bonding_store_primary(str
 out:
    write_unlock_bh(&bond->lock);

-   rtnl_unlock();
-


Looking at the changeset that added this perhaps the intention
is to hold the lock? If so we should add an rtnl_lock to the start
of the function.


Yes, this function needs to hold locks, and more than just
what's there now.  I believe the following should be correct; I haven't
tested it, though (I'm supposedly on vacation right now).

The following change should be correct for the
bonding_store_primary case discussed in this thread, and also corrects
the bonding_store_active case which performs similar functions.

The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock for
write_bh, and no other locks.  This is so that the lower level
mode-specific functions can release locks down to just rtnl in order to
call, e.g., dev_set_mac_address with the locks it expects (rtnl only).

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

-   write_lock_bh(&bond->lock);
+   rtnl_lock();
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
-
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
    struct bonding *bond = to_bond(d);

    rtnl_lock();
-   write_lock_bh(&bond->lock);
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);

    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;


Vanilla 2.6.24-rc5 plus this patch:

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc5 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.




Grrr, I should have seen that -- sorry.  Try your luck with this instead:

CUT

No luck.




I'm guessing if we go back to using a write-lock for bond->lock this
will go back to working again, but I'm not totally convinced since there
are plenty of places where we used a read-lock with it.


Should I check this patch or rather, based on a future discussion, wait 
for another version?




diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..635b857 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

+   rtnl_lock();
    write_lock_bh(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
+   write_unlock_bh(&bond->curr_slave_lock);
    write_unlock_bh(&bond->lock);
-
    rtnl_unlock();

    return count;
@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct device *d,

    rtnl_lock();
    write_lock_bh(&bond->lock

Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)

2007-12-14 Thread Krzysztof Oledzki



On Wed, 12 Dec 2007, Jay Vosburgh wrote:


Herbert Xu [EMAIL PROTECTED] wrote:


diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix drivers/net/bonding/bond_sysfs.c
--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
+++ a/drivers/net/bonding/bond_sysfs.c
@@ -,8 +,6 @@ static ssize_t bonding_store_primary(str
 out:
    write_unlock_bh(&bond->lock);

-   rtnl_unlock();
-


Looking at the changeset that added this perhaps the intention
is to hold the lock? If so we should add an rtnl_lock to the start
of the function.


Yes, this function needs to hold locks, and more than just
what's there now.  I believe the following should be correct; I haven't
tested it, though (I'm supposedly on vacation right now).

The following change should be correct for the
bonding_store_primary case discussed in this thread, and also corrects
the bonding_store_active case which performs similar functions.

The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock for
write_bh, and no other locks.  This is so that the lower level
mode-specific functions can release locks down to just rtnl in order to
call, e.g., dev_set_mac_address with the locks it expects (rtnl only).

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

-   write_lock_bh(&bond->lock);
+   rtnl_lock();
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
-
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
    struct bonding *bond = to_bond(d);

    rtnl_lock();
-   write_lock_bh(&bond->lock);
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);

    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;


Vanilla 2.6.24-rc5 plus this patch:

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc5 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
4 locks held by events/0/9:
 #0:  (events){--..}, at: [<c0133c57>] run_workqueue+0x87/0x1b6
 #1:  ((linkwatch_work).work){--..}, at: [<c0133c57>] run_workqueue+0x87/0x1b6
 #2:  (rtnl_mutex){--..}, at: [<c03abd50>] linkwatch_event+0x5/0x22
 #3:  (&ndev->lock){-.-+}, at: [<c0411b61>] mld_ifc_timer_expire+0x17/0x1fb

the first lock's dependencies:
-> (&mc->mca_lock){-+..} ops: 10 {
   initial-use  at:
[<c0104ee2>] dump_trace+0x83/0x8d
[<c014289c>] __lock_acquire+0x4ba/0xc07
[<c0109ef2>] save_stack_trace+0x20/0x3a
[<c0142fa1>] __lock_acquire+0xbbf/0xc07
[<c0412452>] ipv6_dev_mc_inc+0x24d/0x31c
[<c0143062>] lock_acquire+0x79/0x93
[<c04120d6>] igmp6_group_added+0x18/0x11d
[<c0439d62>] _spin_lock_bh+0x3b/0x64
[<c04120d6>] igmp6_group_added+0x18/0x11d
[<c04120d6>] igmp6_group_added+0x18/0x11d
[<c0141f9f>] trace_hardirqs_on+0x122/0x14c
[<c04124a8>] ipv6_dev_mc_inc+0x2a3/0x31c
[<c0412452>] ipv6_dev_mc_inc+0x24d/0x31c
[<c04124dd>] ipv6_dev_mc_inc+0x2d8/0x31c
[<c0412205>] ipv6_dev_mc_inc+0x0/0x31c
[<c0401834>] ipv6_add_dev+0x21c/0x24b
[<c040b07d>] ndisc_ifinfo_sysctl_change+0x0/0x1ef
[<c05c5b40>] addrconf_init+0x13/0x193
[<c0199f63>] proc_net_fops_create+0x10/0x21

Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)

2007-12-14 Thread Krzysztof Oledzki



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:



On Fri, 14 Dec 2007, Andy Gospodarek wrote:


On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:



On Wed, 12 Dec 2007, Jay Vosburgh wrote:


Herbert Xu [EMAIL PROTECTED] wrote:


diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix drivers/net/bonding/bond_sysfs.c
--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
+++ a/drivers/net/bonding/bond_sysfs.c
@@ -,8 +,6 @@ static ssize_t bonding_store_primary(str
 out:
    write_unlock_bh(&bond->lock);

-   rtnl_unlock();
-


Looking at the changeset that added this perhaps the intention
is to hold the lock? If so we should add an rtnl_lock to the start
of the function.


Yes, this function needs to hold locks, and more than just
what's there now.  I believe the following should be correct; I haven't
tested it, though (I'm supposedly on vacation right now).

The following change should be correct for the
bonding_store_primary case discussed in this thread, and also corrects
the bonding_store_active case which performs similar functions.

The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock for
write_bh, and no other locks.  This is so that the lower level
mode-specific functions can release locks down to just rtnl in order to
call, e.g., dev_set_mac_address with the locks it expects (rtnl only).

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
    struct slave *slave;
    struct bonding *bond = to_bond(d);

-   write_lock_bh(&bond->lock);
+   rtnl_lock();
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);
+
    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
               ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
-
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
    struct bonding *bond = to_bond(d);

    rtnl_lock();
-   write_lock_bh(&bond->lock);
+   read_lock(&bond->lock);
+   write_lock_bh(&bond->curr_slave_lock);

    if (!USES_PRIMARY(bond->params.mode)) {
        printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
        }
    }
 out:
-   write_unlock_bh(&bond->lock);
+   write_unlock_bh(&bond->curr_slave_lock);
+   read_unlock(&bond->lock);
    rtnl_unlock();

    return count;


Vanilla 2.6.24-rc5 plus this patch:

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc5 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
 (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
 (&bond->lock){-.--}

and interrupts could create inverse lock ordering between them.




Grrr, I should have seen that -- sorry.  Try your luck with this instead:

CUT

No luck.

bonding: bond0: setting mode to active-backup (1).
bonding: bond0: Setting MII monitoring interval to 100.
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow
Control: RX/TX
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: first active interface up!
bonding: bond0: enslaving eth0 as an active interface with an up link.
bonding: bond0: Adding slave eth1.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready


SNIP


bonding: bond0: enslaving eth1 as a backup interface with a down link.
bonding: bond0: Setting eth0 as primary slave.
bond0: no IPv6 routers present



Based on the console log, I'm guessing your initialization scripts use
sysfs to set eth0 as the primary interface for bond0?  Can you confirm?


Yep, that's correct:

postup() {
    if [[ ${IFACE} == bond0 ]] ; then
        echo -n +eth0 > /sys/class/net/${IFACE}/bonding/slaves
        echo -n +eth1 > /sys/class/net/${IFACE}/bonding/slaves
        echo -n  eth0 > /sys/class/net/${IFACE}/bonding/primary
    fi
}


If you did somehow use sysfs to set the primary device as eth0, I'm
guessing you never see this issue without that line or without this
patch

Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)

2007-12-11 Thread Krzysztof Oledzki



On Tue, 11 Dec 2007, Andrew Morton wrote:


On Tue, 11 Dec 2007 03:20:48 -0800 (PST) [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9543

   Summary: RTNL: assertion failed at net/ipv6/addrconf.c
(2164)/RTNL: assertion failed at net/ipv4/devinet.c
(1055)
   Product: Drivers
   Version: 2.5
 KernelVersion: 2.6.24-rc4-git7
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Network
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: 2.6.23
Distribution: Gentoo

Problem Description:
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
RTNL: assertion failed at net/ipv6/addrconf.c (2164)
Pid: 9, comm: events/0 Not tainted 2.6.24-rc4-git7 #1
 [78402cfb] addrconf_notify+0x5b4/0x7b7
 [7812203a] finish_task_switch+0x0/0x8c
 [781346ff] worker_thread+0x0/0x85
 [78438e23] schedule+0x545/0x55f
 [781408d1] print_lock_contention_bug+0x11/0xd2
 [783bfa72] rt_run_flush+0x43/0x8b
 [783bfa93] rt_run_flush+0x64/0x8b
 [7813ac54] notifier_call_chain+0x2a/0x52
 [7813ac9e] raw_notifier_call_chain+0x17/0x1a
 [783a3471] netdev_state_change+0x18/0x29
 [783ac6a9] __linkwatch_run_queue+0x150/0x17e
 [783ac6f4] linkwatch_event+0x1d/0x22
 [78133cdf] run_workqueue+0xdb/0x1b6
 [78133c8b] run_workqueue+0x87/0x1b6
 [783ac6d7] linkwatch_event+0x0/0x22
 [781346ff] worker_thread+0x0/0x85
 [78134778] worker_thread+0x79/0x85
 [781371ad] autoremove_wake_function+0x0/0x35
 [781370f6] kthread+0x38/0x5e
 [781370be] kthread+0x0/0x5e
 [78104baf] kernel_thread_helper+0x7/0x10
 ===
RTNL: assertion failed at net/ipv6/addrconf.c (1610)



Hopefully this is due to the bug you reported in bug #9542.

Does this patch fix both issues?


Unfortunately not. I just updated bugzilla.

Best regards,

Krzysztof Olędzki

Re: [PATCH 5/6] e1000: Secondary unicast address support

2007-11-13 Thread Krzysztof Oledzki



On Tue, 13 Nov 2007, Auke Kok wrote:


From: Patrick McHardy [EMAIL PROTECTED]

Add support for configuring secondary unicast addresses. Unicast
addresses take precedence over multicast addresses when filling
the exact address filters to avoid going to promiscuous mode.
When more unicast addresses are present than filter slots,
unicast filtering is disabled and all slots can be used for
multicast addresses.


Is there any easy way to use it for VRRP? It would be really great to 
have two IP addresses on the same interface, each with a different hw 
address.


Best regards,

 Krzysztof Olędzki

Re: [PATCH 5/6] e1000: Secondary unicast address support

2007-11-13 Thread Krzysztof Oledzki



On Tue, 13 Nov 2007, Ben Greear wrote:


Krzysztof Oledzki wrote:



On Tue, 13 Nov 2007, Auke Kok wrote:


From: Patrick McHardy [EMAIL PROTECTED]

Add support for configuring secondary unicast addresses. Unicast
addresses take precedence over multicast addresses when filling
the exact address filters to avoid going to promiscuous mode.
When more unicast addresses are present than filter slots,
unicast filtering is disabled and all slots can be used for
multicast addresses.


Is there any easy way to use it for VRRP? It would be really great to have 
two IP addresses on the same interface, each with a different hw address.


mac-vlans should do this for you, with or without the driver patch.


I'm afraid mac-vlans is not a solution here. Having 2x more interfaces 
(e.g. 2000 instead of 1000) makes everything (especially routing, 
firewalling and QoS) much more complicated. It would be nice to have 
something like ip addr add a.b.c.d/24 dev vlan32 hwaddress 
aa:bb:cc:dd:ee:ff.


BTW: is it possible to stack mac-vlans on top of .1Q vlans?

Best regards,

Krzysztof Olędzki

Re: [PATCH 5/6] e1000: Secondary unicast address support

2007-11-13 Thread Krzysztof Oledzki



On Tue, 13 Nov 2007, Ben Greear wrote:


Krzysztof Oledzki wrote:

I'm afraid mac-vlans is not a solution here. Having 2x more interfaces (e.g. 
2000 instead of 1000) makes everything (especially routing, firewalling and 
QoS) much more complicated. It would be nice to have something like ip 
addr add a.b.c.d/24 dev vlan32 hwaddress aa:bb:cc:dd:ee:ff.


I'll take your word for it, though I have had good luck using mac-vlans
in my own app.  They are nice because they are full-fledged interfaces,
so you can treat them basically as .1q vlans or ethernet devices, including
all the routing and firewalling tricks.


OK. But in my situation it is going to be:

vlan1 (.1q) - real MAC
vlan1a (mac-vlan) - VRRP MAC
(...)
vlan999 (.1q) - real MAC
vlan999a (mac-vlan) - VRRP MAC

... with packets for the same destination coming in and out over both 
interfaces depending on a src ip address.



BTW: is it possible to stack mac-vlans on top of .1Q vlans?


I believe it will work fine.  You could probably also stack .1q
VLANs on top of mac-vlans so long as you use the same MAC for the VLANs as
for the mac-vlan dev.


So this is exactly what I don't want to do, as I need two different 
MAC addresses. ;)


Best regards,

Krzysztof Olędzki

ISNs and 2.6.22, Was: Re: haproxy linux firewall (netfilter)

2007-10-20 Thread Krzysztof Oledzki



On Sat, 20 Oct 2007, Willy Tarreau wrote:
CUT


What is very strange is that linux uses random increments, so your ISNs
should not wrap in a matter of a few seconds.


Good point. I need to investigate this.


netcat is very convenient for such tests. It's easy to bind it to a
source port for consecutive tests while you run tcpdump in the background :

 $ echo bla | nc -p 1234 192.168.1.2 80
 $ echo bla | nc -p 1234 192.168.1.2 80

Also, please try this with tcp_timestamps enabled and disabled to see if it
changes anything.


Interesting... :|

2.6.20:
18:52:33.558379 IP 192.168.0.33.  212.77.100.101.80: S 3708509816:3708509816(0) 
win 5840 mss 1460,sackOK,timestamp 1884090256 0,nop,wscale 1
18:52:33.882129 IP 192.168.0.33.  212.77.100.101.80: S 3708833567:3708833567(0) 
win 5840 mss 1460,sackOK,timestamp 1884090580 0,nop,wscale 1
18:52:34.084000 IP 192.168.0.33.  212.77.100.101.80: S 3709035437:3709035437(0) 
win 5840 mss 1460,sackOK,timestamp 1884090782 0,nop,wscale 1

2.6.21:
18:58:36.074969 IP 192.168.0.66.  212.77.100.101.80: S 110585153:110585153(0) 
win 5840 mss 1460,sackOK,timestamp 112007046 0,nop,wscale 5
18:58:36.440084 IP 192.168.0.66.  212.77.100.101.80: S 110950271:110950271(0) 
win 5840 mss 1460,sackOK,timestamp 112007412 0,nop,wscale 5
18:58:36.830141 IP 192.168.0.66.  212.77.100.101.80: S 111340328:111340328(0) 
win 5840 mss 1460,sackOK,timestamp 112007802 0,nop,wscale 5

2.6.22:
18:59:34.525097 IP 192.168.0.7.  212.77.100.101.80: S 3303295586:3303295586(0) 
win 5840 mss 1460,sackOK,timestamp 842 0,nop,wscale 6
18:59:34.942104 IP 192.168.0.7.  212.77.100.101.80: S 3720303240:3720303240(0) 
win 5840 mss 1460,sackOK,timestamp 1112259 0,nop,wscale 6
18:59:35.412229 IP 192.168.0.7.  212.77.100.101.80: S 4190427367:4190427367(0) 
win 5840 mss 1460,sackOK,timestamp 1112729 0,nop,wscale 6

2.6.22+tcp_timestamps=0:
19:00:38.285554 IP 192.168.0.7.  212.77.100.101.80: S 2639244549:2639244549(0) 
win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:39.448675 IP 192.168.0.7.  212.77.100.101.80: S 3802363348:3802363348(0) 
win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:43.003850 IP 192.168.0.7.  212.77.100.101.80: S 3062574559:3062574559(0) 
win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:45.950863 IP 192.168.0.7.  212.77.100.101.80: S 1714619373:1714619373(0) 
win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6

So it seems that ISNs are not randomly incremented but rather randomly 
generated. Adding netdev@vger.kernel.org to the CC list.


Best regards,

Krzysztof Olędzki

Re: ISNs and 2.6.22, Was: Re: haproxy linux firewall (netfilter)

2007-10-20 Thread Krzysztof Oledzki



On Sat, 20 Oct 2007, Krzysztof Oledzki wrote:




On Sat, 20 Oct 2007, Willy Tarreau wrote:
CUT


What is very strange is that linux uses random increments, so your ISNs
should not wrap in a matter of a few seconds.


Good point. I need to investigate this.


netcat is very convenient for such tests. It's easy to bind it to a
source port for consecutive tests while you run tcpdump in the background :

 $ echo bla | nc -p 1234 192.168.1.2 80
 $ echo bla | nc -p 1234 192.168.1.2 80

Also, please try this with tcp_timestamps enabled and disabled to see if it
changes anything.


Interesting... :|

2.6.20:
18:52:33.558379 IP 192.168.0.33.  212.77.100.101.80: S 
3708509816:3708509816(0) win 5840 mss 1460,sackOK,timestamp 1884090256 
0,nop,wscale 1
18:52:33.882129 IP 192.168.0.33.  212.77.100.101.80: S 
3708833567:3708833567(0) win 5840 mss 1460,sackOK,timestamp 1884090580 
0,nop,wscale 1
18:52:34.084000 IP 192.168.0.33.  212.77.100.101.80: S 
3709035437:3709035437(0) win 5840 mss 1460,sackOK,timestamp 1884090782 
0,nop,wscale 1


2.6.21:
18:58:36.074969 IP 192.168.0.66.  212.77.100.101.80: S 
110585153:110585153(0) win 5840 mss 1460,sackOK,timestamp 112007046 
0,nop,wscale 5
18:58:36.440084 IP 192.168.0.66.  212.77.100.101.80: S 
110950271:110950271(0) win 5840 mss 1460,sackOK,timestamp 112007412 
0,nop,wscale 5
18:58:36.830141 IP 192.168.0.66.  212.77.100.101.80: S 
111340328:111340328(0) win 5840 mss 1460,sackOK,timestamp 112007802 
0,nop,wscale 5


2.6.22:
18:59:34.525097 IP 192.168.0.7.  212.77.100.101.80: S 
3303295586:3303295586(0) win 5840 mss 1460,sackOK,timestamp 842 
0,nop,wscale 6
18:59:34.942104 IP 192.168.0.7.  212.77.100.101.80: S 
3720303240:3720303240(0) win 5840 mss 1460,sackOK,timestamp 1112259 
0,nop,wscale 6
18:59:35.412229 IP 192.168.0.7.  212.77.100.101.80: S 
4190427367:4190427367(0) win 5840 mss 1460,sackOK,timestamp 1112729 
0,nop,wscale 6


2.6.22+tcp_timestamps=0:
19:00:38.285554 IP 192.168.0.7.  212.77.100.101.80: S 
2639244549:2639244549(0) win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:39.448675 IP 192.168.0.7.  212.77.100.101.80: S 
3802363348:3802363348(0) win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:43.003850 IP 192.168.0.7.  212.77.100.101.80: S 
3062574559:3062574559(0) win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6
19:00:45.950863 IP 192.168.0.7.  212.77.100.101.80: S 
1714619373:1714619373(0) win 5840 mss 1460,nop,nop,sackOK,nop,wscale 6


So it seems that ISNs are not randomly incremented but rather randomly 
generated. Adding netdev@vger.kernel.org to the CC list.


Eh, I was in a bit too much of a hurry this time. They were not randomly 
generated but incremented by too big a value. This patch fixes my problem:


http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=queue-2.6.22/fix-tcp-initial-sequence-number-selection.patch;h=05b9167d68ecde1e6088f58c55e2906b768420ed;hb=HEAD

Looking forward to the next -stable release. ;)
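
As a rough cross-check of the traces above, the ISN growth rate can be estimated from two consecutive SYNs per kernel. This is only a sketch: the helper names (to_us, isn_rate) are made up for illustration, and it assumes the tcpdump timestamps are accurate to the microsecond and that the ISN did not wrap more than once between the two samples.

```python
# Estimate how fast the ISN advances, using the first two SYNs
# of each tcpdump trace quoted above (timestamp, ISN).
TRACES = {
    "2.6.20": [("18:52:33.558379", 3708509816), ("18:52:33.882129", 3708833567)],
    "2.6.21": [("18:58:36.074969", 110585153), ("18:58:36.440084", 110950271)],
    "2.6.22": [("18:59:34.525097", 3303295586), ("18:59:34.942104", 3720303240)],
}


def to_us(hms):
    """Convert an HH:MM:SS.uuuuuu tcpdump timestamp to integer microseconds."""
    hours, minutes, rest = hms.split(":")
    seconds, micros = rest.split(".")
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1_000_000 + int(micros)


def isn_rate(samples):
    """Return ISN increments per microsecond between two samples."""
    (t0, isn0), (t1, isn1) = samples
    delta_isn = (isn1 - isn0) % 2**32  # sequence numbers are 32-bit
    return delta_isn / (to_us(t1) - to_us(t0))


for kernel, samples in TRACES.items():
    rate = isn_rate(samples)
    wrap_seconds = 2**32 / rate / 1e6  # time until the 32-bit space wraps
    print(f"{kernel}: {rate:.3f} per us, wraps after about {wrap_seconds:.1f} s")
```

With the numbers above this comes out at roughly 1 per microsecond for 2.6.20 and 2.6.21 and roughly 1000 per microsecond for 2.6.22, so the 32-bit sequence space wraps in about 4.3 seconds instead of about 71 minutes.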

Best regards,

Krzysztof Olędzki

Re: TCP port randomization

2007-10-18 Thread Krzysztof Oledzki



On Wed, 17 Oct 2007, Stephen Hemminger wrote:


On Thu, 18 Oct 2007 00:31:13 +0200 (CEST)
Krzysztof Oledzki [EMAIL PROTECTED] wrote:




On Wed, 17 Oct 2007, Stephen Hemminger wrote:


On Wed, 17 Oct 2007 23:15:48 +0200 (CEST)
Krzysztof Oledzki [EMAIL PROTECTED] wrote:


Hello,

Is it normal that TCP port randomization (tested with 2.6.22) works only
when explicitly binding to an IP address:


--- cut here ---
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:11.896126 IP 192.168.129.2.37839 > 192.168.129.28.11: S
23:11:12.146573 IP 192.168.129.2.37840 > 192.168.129.28.11: S
23:11:12.396488 IP 192.168.129.2.37841 > 192.168.129.28.11: S
--- cut here ---


--- cut here ---
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:31.704391 IP 192.168.129.2.57204 > 192.168.129.28.11: S
23:11:34.400048 IP 192.168.129.2.14512 > 192.168.129.28.11: S
23:11:34.606707 IP 192.168.129.2.20117 > 192.168.129.28.11: S
--- cut here ---

Best regards,

Krzysztof Olędzki


It is an expected side effect.


So it is not possible to use randomization without binding to a specific
srcip?


The starting point for the search
is based on hash(srcaddr, dstaddr, dstport, secret).
You are using same source, dest and port so yes it will stay
the same until rekeying occurs.
The secret only changes every 5 min, same as for the TCP initial sequence number.
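
To illustrate the scheme described above, here is a rough model, not the kernel code. The SHA-256 hash is only a stand-in for the real keyed hash, the port range is an assumed default, and the "hint" counter that moves the starting point forward after each pick is an assumption made to reproduce the consecutive ports seen in the first trace. All names are hypothetical.

```python
import hashlib
import socket
import struct

# Assumed ephemeral port range; the real range is a tunable setting.
LOW, HIGH = 32768, 61000


def start_offset(src, dst, dport, secret):
    """Stand-in for hash(srcaddr, dstaddr, dstport, secret)."""
    data = (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!H", dport) + secret)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


class PortPicker:
    def __init__(self, secret):
        self.secret = secret
        self.hint = 0        # assumed: advances with every successful pick
        self.in_use = set()  # ports that the search has to skip

    def pick(self, src, dst, dport):
        span = HIGH - LOW
        offset = self.hint + start_offset(src, dst, dport, self.secret)
        # Walk the range once, starting right after the hashed offset.
        for i in range(1, span + 1):
            port = LOW + (i + offset) % span
            if port not in self.in_use:
                self.hint += i
                return port
        raise OSError("no free port")


picker = PortPicker(secret=b"example-secret")
ports = [picker.pick("192.168.129.2", "192.168.129.28", 11) for _ in range(3)]
print(ports)  # same tuple and same secret: consecutive ports
```

In this model, the same source, destination and port keep hitting the same hashed starting point, so successive connections simply walk upward from it until the secret changes.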


If I get it right, even with an explicitly selected constant srcaddr, port
numbers should simply increase? This is not what I observed.



When you set srcaddr, it calls bind, and bind does randomization always
independent of address.

This existing behavior may seem odd, but it shouldn't present a security
problem.


Right. Thank you very much for the explanation.

Best regards,
Krzysztof Olędzki

Re: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000b0

2007-10-18 Thread Krzysztof Oledzki



On Wed, 17 Oct 2007, Eric Dumazet wrote:


Krzysztof Oledzki a écrit :



On Wed, 17 Oct 2007, Eric Dumazet wrote:


Krzysztof Oledzki a écrit :



On Wed, 17 Oct 2007, Eric Dumazet wrote:


Krzysztof Oledzki a écrit :

Hello,

Today I found in my logs:

BUG: unable to handle kernel NULL pointer dereference at virtual 
address 000000b0

 printing eip:
78395f65
*pde = 
Oops:  [#1]
PREEMPT SMP
CPU:0
EIP:0060:[78395f65]Not tainted VLI
EFLAGS: 00210286   (2.6.22.9 #1)
EIP is at __ip_route_output_key+0x412/0x722
eax: 8000   ebx:    ecx: 5dd2b1c3   edx: 
esi:    edi: d44c7e30   ebp: ec8c4980   esp: d44c7ddc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process smtpd (pid: 12479, ti=d44c6000 task=9e759510 task.ti=d44c6000)
Stack: d44c7e7c d44c7e7c d44c7eb8  d44c7e7c   
0005
     5dd2b1c3     

        0003  
d44c7e7c

Call Trace:
 [78396280] ip_route_output_flow+0xb/0x3e
 [783b2b29] ip4_datagram_connect+0x1c9/0x308
 [783ba70a] inet_dgram_connect+0x45/0x4e
 [7837135e] sys_connect+0x72/0x9c
 [78371607] sock_map_fd+0x41/0x4a
 [7840d1b1] _spin_lock+0x33/0x3e
 [7840d623] _spin_unlock+0x25/0x3b
 [78371607] sock_map_fd+0x41/0x4a
 [78372792] sys_socketcall+0x8f/0x242
 [7813e99c] trace_hardirqs_on+0x122/0x14c
 [78103dc6] sysenter_past_esp+0x8f/0x99
 [78103d96] sysenter_past_esp+0x5f/0x99
 ===
Code: fa e0 00 00 00 75 07 c6 44 24 56 05 eb 14 81 fa f0 00 00 00 0f 84 
e1 02 00 00 84 c0 0f 84 d9 02 00 00 8b 44 24 0c 0d 00 00 00 80 f6 86 
b0 00 00 00 08 0f 44 44 24 0c 89 44 24 0c b8 01 00 00 00
EIP: [78395f65] __ip_route_output_key+0x412/0x722 SS:ESP 
0068:d44c7ddc


Shortly before it there was:
Oct 17 07:17:55 cougar postfix/master[3400]: warning: process 
/usr/lib/postfix/smtpd pid 12479 killed by signal 11


Best regards,


Krzysztof Olędzki


Hello Krzysztof

Could you give us some details about this ? kernel version at least.


Yes, I was in a bit too much of a hurry sending this bug report. Anyway, it is 
2.6.22.9, as mentioned in the oops: EFLAGS: 00210286 (2.6.22.9 #1)



(you could for example take a look at REPORTING-BUGS, or run
scripts/ver_linux)


Linux cougar 2.6.22.9 #1 SMP PREEMPT Wed Oct 3 10:24:19 CEST 2007 i686 
Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux


Gnu C  4.1.2
Gnu make   3.81
binutils   2.17
util-linux 2.12r
mount  2.12r
module-init-tools  3.2.2
e2fsprogs  1.40.2
Linux C Library libc.2.5
Dynamic linker (ldd)   2.5
Procps 3.2.7
Net-tools  1.60
Kbd1.12
Sh-utils   6.9



Yes indeed, version was on your initial report.

It seems this kernel is unusual (VMSPLIT_2G_OPT instead of standard 
VMSPLIT_3G), any chance you could provide the full .config?


Attached, both .config and dmesg.



Hum, you are using IPT_TPROXY thing, which is not in linux-2.6.22.9


It is only compiled in, not used at the moment.

I have no idea how this can taint the kernel, since you provide no 
information.


Try to reproduce the problem with a genuine kernel.


OK. Thank you.

Best regards,

Krzysztof Olędzki

Re: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000b0

2007-10-18 Thread Krzysztof Oledzki



On Thu, 18 Oct 2007, Patrick McHardy wrote:


Krzysztof Oledzki wrote:

Hum, you are using IPT_TPROXY thing, which is not in linux-2.6.22.9


It is only compiled in, not used at the moment.


But at least the previous version (before those patches posted a week
ago) touches the routing code in exactly that function.


Right. Thank you.

Best regards,

Krzysztof Olędzki

BUG: unable to handle kernel NULL pointer dereference at virtual address 000000b0

2007-10-17 Thread Krzysztof Oledzki

Hello,

Today I found in my logs:

BUG: unable to handle kernel NULL pointer dereference at virtual address 
000000b0
 printing eip:
78395f65
*pde = 
Oops:  [#1]
PREEMPT SMP
CPU:0
EIP:0060:[78395f65]Not tainted VLI
EFLAGS: 00210286   (2.6.22.9 #1)
EIP is at __ip_route_output_key+0x412/0x722
eax: 8000   ebx:    ecx: 5dd2b1c3   edx: 
esi:    edi: d44c7e30   ebp: ec8c4980   esp: d44c7ddc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process smtpd (pid: 12479, ti=d44c6000 task=9e759510 task.ti=d44c6000)
Stack: d44c7e7c d44c7e7c d44c7eb8  d44c7e7c   0005
     5dd2b1c3     
        0003  d44c7e7c
Call Trace:
 [78396280] ip_route_output_flow+0xb/0x3e
 [783b2b29] ip4_datagram_connect+0x1c9/0x308
 [783ba70a] inet_dgram_connect+0x45/0x4e
 [7837135e] sys_connect+0x72/0x9c
 [78371607] sock_map_fd+0x41/0x4a
 [7840d1b1] _spin_lock+0x33/0x3e
 [7840d623] _spin_unlock+0x25/0x3b
 [78371607] sock_map_fd+0x41/0x4a
 [78372792] sys_socketcall+0x8f/0x242
 [7813e99c] trace_hardirqs_on+0x122/0x14c
 [78103dc6] sysenter_past_esp+0x8f/0x99
 [78103d96] sysenter_past_esp+0x5f/0x99
 ===
Code: fa e0 00 00 00 75 07 c6 44 24 56 05 eb 14 81 fa f0 00 00 00 0f 84 e1 02 00 00 
84 c0 0f 84 d9 02 00 00 8b 44 24 0c 0d 00 00 00 80 f6 86 b0 00 00 00 08 0f 44 
44 24 0c 89 44 24 0c b8 01 00 00 00
EIP: [78395f65] __ip_route_output_key+0x412/0x722 SS:ESP 0068:d44c7ddc

Shortly before it there was:
Oct 17 07:17:55 cougar postfix/master[3400]: warning: process 
/usr/lib/postfix/smtpd pid 12479 killed by signal 11

Best regards,


Krzysztof Olędzki

Re: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000b0

2007-10-17 Thread Krzysztof Oledzki



On Wed, 17 Oct 2007, Eric Dumazet wrote:


Krzysztof Oledzki a écrit :

Hello,

Today I found in my logs:

BUG: unable to handle kernel NULL pointer dereference at virtual address 
000000b0

 printing eip:
78395f65
*pde = 
Oops:  [#1]
PREEMPT SMP
CPU:0
EIP:0060:[78395f65]Not tainted VLI
EFLAGS: 00210286   (2.6.22.9 #1)
EIP is at __ip_route_output_key+0x412/0x722
eax: 8000   ebx:    ecx: 5dd2b1c3   edx: 
esi:    edi: d44c7e30   ebp: ec8c4980   esp: d44c7ddc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process smtpd (pid: 12479, ti=d44c6000 task=9e759510 task.ti=d44c6000)
Stack: d44c7e7c d44c7e7c d44c7eb8  d44c7e7c   
0005
     5dd2b1c3     

        0003  
d44c7e7c

Call Trace:
 [78396280] ip_route_output_flow+0xb/0x3e
 [783b2b29] ip4_datagram_connect+0x1c9/0x308
 [783ba70a] inet_dgram_connect+0x45/0x4e
 [7837135e] sys_connect+0x72/0x9c
 [78371607] sock_map_fd+0x41/0x4a
 [7840d1b1] _spin_lock+0x33/0x3e
 [7840d623] _spin_unlock+0x25/0x3b
 [78371607] sock_map_fd+0x41/0x4a
 [78372792] sys_socketcall+0x8f/0x242
 [7813e99c] trace_hardirqs_on+0x122/0x14c
 [78103dc6] sysenter_past_esp+0x8f/0x99
 [78103d96] sysenter_past_esp+0x5f/0x99
 ===
Code: fa e0 00 00 00 75 07 c6 44 24 56 05 eb 14 81 fa f0 00 00 00 0f 84 e1 
02 00 00 84 c0 0f 84 d9 02 00 00 8b 44 24 0c 0d 00 00 00 80 f6 86 b0 00 
00 00 08 0f 44 44 24 0c 89 44 24 0c b8 01 00 00 00

EIP: [78395f65] __ip_route_output_key+0x412/0x722 SS:ESP 0068:d44c7ddc

Shortly before it there was:
Oct 17 07:17:55 cougar postfix/master[3400]: warning: process 
/usr/lib/postfix/smtpd pid 12479 killed by signal 11


Best regards,


Krzysztof Olędzki


Hello Krzysztof

Could you give us some details about this ? kernel version at least.


Yes, I was in a bit too much of a hurry sending this bug report. Anyway, it is 
2.6.22.9, as mentioned in the oops: EFLAGS: 00210286 (2.6.22.9 #1)



(you could for example take a look at REPORTING-BUGS, or run
scripts/ver_linux)


Linux cougar 2.6.22.9 #1 SMP PREEMPT Wed Oct 3 10:24:19 CEST 2007 i686 Intel(R) 
Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux

Gnu C  4.1.2
Gnu make   3.81
binutils   2.17
util-linux 2.12r
mount  2.12r
module-init-tools  3.2.2
e2fsprogs  1.40.2
Linux C Library libc.2.5
Dynamic linker (ldd)   2.5
Procps 3.2.7
Net-tools  1.60
Kbd1.12
Sh-utils   6.9


Best regards,

Krzysztof Olędzki

TCP port randomization

2007-10-17 Thread Krzysztof Oledzki

Hello,

Is it normal that TCP port randomization (tested with 2.6.22) works only 
when explicitly binding to an IP address:



--- cut here ---
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:11.896126 IP 192.168.129.2.37839 > 192.168.129.28.11: S
23:11:12.146573 IP 192.168.129.2.37840 > 192.168.129.28.11: S
23:11:12.396488 IP 192.168.129.2.37841 > 192.168.129.28.11: S
--- cut here ---


--- cut here ---
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:31.704391 IP 192.168.129.2.57204 > 192.168.129.28.11: S
23:11:34.400048 IP 192.168.129.2.14512 > 192.168.129.28.11: S
23:11:34.606707 IP 192.168.129.2.20117 > 192.168.129.28.11: S
--- cut here ---

Best regards,

Krzysztof Olędzki

Re: TCP port randomization

2007-10-17 Thread Krzysztof Oledzki



On Wed, 17 Oct 2007, Stephen Hemminger wrote:


On Wed, 17 Oct 2007 23:15:48 +0200 (CEST)
Krzysztof Oledzki [EMAIL PROTECTED] wrote:


Hello,

Is it normal that TCP port randomization (tested with 2.6.22) works only
when explicitly binding to an IP address:


--- cut here ---
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:11.896126 IP 192.168.129.2.37839 > 192.168.129.28.11: S
23:11:12.146573 IP 192.168.129.2.37840 > 192.168.129.28.11: S
23:11:12.396488 IP 192.168.129.2.37841 > 192.168.129.28.11: S
--- cut here ---


--- cut here ---
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused
[EMAIL PROTECTED]:~# nc -s 192.168.129.2 192.168.129.28 11
(UNKNOWN) [192.168.129.28] 11 (systat) : Connection refused

23:11:31.704391 IP 192.168.129.2.57204 > 192.168.129.28.11: S
23:11:34.400048 IP 192.168.129.2.14512 > 192.168.129.28.11: S
23:11:34.606707 IP 192.168.129.2.20117 > 192.168.129.28.11: S
--- cut here ---

Best regards,

Krzysztof Olędzki


It is an expected side effect.


So it is not possible to use randomization without binding to a specific 
srcip?



The starting point for the search
is based on hash(srcaddr, dstaddr, dstport, secret).
You are using same source, dest and port so yes it will stay
the same until rekeying occurs.
The secret only changes every 5min same as TCP initial sequence number.


If I get it right, even with an explicitly selected constant srcaddr, port 
numbers should simply increase? This is not what I observed.


Thanks.

Best regards,

Krzysztof Olędzki

Re: incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-10-02 Thread Krzysztof Oledzki



On Tue, 2 Oct 2007, Herbert Xu wrote:


On Mon, Sep 24, 2007 at 11:44:19AM +0200, Krzysztof Oledzki wrote:


So, with DR mode, a packet goes through the lo device (with a bad checksum) and
then gets redirected outside. Unfortunately, when it leaves the host it has a bad
checksum, too. :(


Did you check this by taking a tcpdump on an external host?

Yes.


Doing a local tcpdump doesn't work as tcpdump won't show the
correct checksum if checksum offload is enabled.

Indeed, I'm aware of this.


If it's really sending a bogus checksum then it's a bug in
LVS.


I'm not sure if we should call it a bug. LVS does not support such a 
configuration by default - it requires kernel patching. However, it worked 
with older kernels, so that's why I asked whether it is possible to force full 
TCP/UDP checksum calculation.


Thank you.

Best regards,

Krzysztof Olędzki

Upgrading 2.6.21.7-2.6.22.9 kills my network (sky2): sky2 eth0: rx error, status 0x402300 length 60

2007-09-28 Thread Krzysztof Oledzki

Hello,

After upgrading my kernel from 2.6.21.7 to 2.6.22.9 my 88E8053 no longer 
works:


sky2 :02:00.0: v1.14 addr 0xcfffc000 irq 17 Yukon-EC (0xb6) rev 1
sky2 eth0: addr 00:11:d8:50:f6:28
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402300 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402500 length 60
sky2 eth0: rx error, status 0x402500 length 60
(...)

I also compared lspci output from both 2.6.21/2.6.22 and it is the same:

02:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88E8053 PCI-E 
Gigabit Ethernet Controller [11ab:4362] (rev 15)
Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet 
controller PCIe (Asus) [1043:8142]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
MAbort- SERR- PERR-
Latency: 0, Cache Line Size: 16 bytes
Interrupt: pin A routed to IRQ 221
Region 0: Memory at cfffc000 (64-bit, non-prefetchable) [size=16K]
Region 2: I/O ports at d800 [size=256]
Expansion ROM at cffc [disabled] [size=128K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA 
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 
Enable+
Address: fee0300c  Data: 41c9
Capabilities: [e0] Express Legacy Endpoint IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s unlimited, L1 unlimited
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
Link: Latency L0s 256ns, L1 unlimited
Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x1
00: ab 11 62 43 07 04 10 00 15 00 00 02 04 00 00 00
10: 04 c0 ff cf 00 00 00 00 01 d8 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 42 81
30: 00 00 fc cf 48 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 14
50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
60: 0c 30 e0 fe 00 00 00 00 c9 41 00 00 00 00 00 00
70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 10 00 11 00 c0 0f 00 00 00 24 1b 00 11 a4 03 00
f0: 08 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00


It is quite strange, as on another similar system (differing only in rev 1 
vs. rev 2), the sky2 driver from this 2.6.22 kernel solved my problem 
(network hangs):


sky2 :03:00.0: v1.14 addr 0xf100 irq 16 Yukon-EC (0xb6) rev 2
sky2 eth0: addr 00:16:e6:5f:64:24
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both

Best regards,

Krzysztof Olędzki

Re: Upgrading 2.6.21.7-2.6.22.9 kills my network (sky2): sky2 eth0: rx error, status 0x402300 length 60

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Krzysztof Oledzki wrote:




On Fri, 28 Sep 2007, Krzysztof Oledzki wrote:


Hello,

After upgrading my kernel from 2.6.21.7 to 2.6.22.9 my 88E8053 no longer 
works:


Small update: 2.6.22.9 with sky2.c/sky2.h from 2.6.22.4 works without any 
problems.


Final update.

Reverting this patch: 
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=commitdiff_plain;h=8c07a8e30ba8a2e0831da4b134202598435f8358

solved my problem.

I also found this one:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=d6532232cd3de79c852685823a9c52f723816d0a

Could it go into the next -stable ASAP, please? It seems that 
2.6.22.5-2.6.22.9 kernels have a broken sky2 if used with vlans. :( Such a 
regression in a -stable kernel isn't nice. :(


Best regards,

Krzysztof Olędzki

Re: Upgrading 2.6.21.7-2.6.22.9 kills my network (sky2): sky2 eth0: rx error, status 0x402300 length 60

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Krzysztof Oledzki wrote:


Hello,

After upgrading my kernel from 2.6.21.7 to 2.6.22.9 my 88E8053 no longer 
works:


Small update: 2.6.22.9 with sky2.c/sky2.h from 2.6.22.4 works without any 
problems.


Best regards,


Krzysztof Olędzki

Re: [stable] Upgrading 2.6.21.7-2.6.22.9 kills my network (sky2): sky2 eth0: rx error, status 0x402300 length 60

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Greg KH wrote:


On Fri, Sep 28, 2007 at 01:11:27PM +0200, Krzysztof Oledzki wrote:



On Fri, 28 Sep 2007, Krzysztof Oledzki wrote:




On Fri, 28 Sep 2007, Krzysztof Oledzki wrote:


Hello,
After upgrading my kernel from 2.6.21.7 to 2.6.22.9 my 88E8053 no longer
works:


Small update: 2.6.22.9 with sky2.c/sky2.h from 2.6.22.4 works without any
problems.


Final update.

Reverting this patch:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=commitdiff_plain;h=8c07a8e30ba8a2e0831da4b134202598435f8358
solved my problem.

I also found this one:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=d6532232cd3de79c852685823a9c52f723816d0a

Could it go to a next -stable ASAP, please? It seems that 2.6.22.5-2.6.22.9
kernels have broken sky2 if used with vlans. :( Such regression in a
-stable kernel isn't nice. :(


So should we just apply the second patch?  I'll let Stephen tell us what
we should do :)



The second patch works for me, so IMHO yes. I forgot to mention that earlier, 
sorry. Of course, this should be the maintainer's decision; this is only my 
vote. :)

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] sky2: fix VLAN receive processing

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Stephen Hemminger wrote:


The length check for truncated frames was not correctly handling
the case where VLAN acceleration had already read the tag.
Also, the Yukon EX has some features that use high bit of status
as security tag.



Thank you.



Best regards

Krzysztof Oledzki


Re: incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-09-24 Thread Krzysztof Oledzki



On Mon, 24 Sep 2007, Herbert Xu wrote:


On Sun, Sep 23, 2007 at 11:18:58PM +0200, Krzysztof Oledzki wrote:


Thank you for the information. Is there any easy way to turn them on? I
need it for LVS.


Do you really need it?


Yes. I would like to use a LVS redirector as both a client and a director:
 
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.LVS-DR.html#director_as_client_in_LVS-DR


The packets should be checksummed at the point where they physically
leave the host.


So, in DR mode, a packet goes through the lo device (with a bad checksum) and 
then gets redirected outside. Unfortunately, when it leaves the host it still 
has a bad checksum. :(


Best regards,

Krzysztof Olędzki

Re: incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-09-23 Thread Krzysztof Oledzki



On Sun, 23 Sep 2007, Herbert Xu wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:


It seems that after some not very recent changes, UDP and TCP packets
carrying data sent over loopback have an incorrect checksum:


This is correct.  The loopback interface has the no-checksum flag
set, so we only provide a partial checksum on output (i.e., the
pseudoheader without the payload).

We even export this to user-space via a flag.  So you should fix
tcpdump to read this flag and ignore the checksum.
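
To illustrate why a sniffer flags such packets, here is a minimal sketch of the standard UDP checksum (RFC 768 with the RFC 1071 one's-complement sum). It is not the kernel's code. The addresses match the loopback capture in this thread, but the destination port 9999 and all function names are assumptions made for the example. A datagram whose checksum field holds only the pseudo-header sum fails verification, while one with the full checksum passes.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the folded 16-bit one's-complement sum (RFC 1071), not inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src: str, dst: str, sport: int, dport: int,
              payload: bytes, partial: bool = False) -> bytes:
    """Build a UDP datagram with a full checksum, or with only the
    pseudo-header sum in the checksum field when partial=True."""
    length = 8 + len(payload)
    pseudo = udp_pseudo_header(src, dst, length)
    if partial:
        csum = ones_complement_sum(pseudo)
    else:
        header = struct.pack("!HHHH", sport, dport, length, 0)
        csum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
        if csum == 0:
            csum = 0xFFFF  # RFC 768: a computed zero is transmitted as all ones
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def checksum_ok(src: str, dst: str, datagram: bytes) -> bool:
    """A receiver sums pseudo-header plus datagram and expects 0xFFFF."""
    pseudo = udp_pseudo_header(src, dst, len(datagram))
    return ones_complement_sum(pseudo + datagram) == 0xFFFF


if __name__ == "__main__":
    full = build_udp("127.0.0.1", "127.0.0.1", 49512, 9999, b"test\n")
    part = build_udp("127.0.0.1", "127.0.0.1", 49512, 9999, b"test\n", partial=True)
    print("full checksum valid:   ", checksum_ok("127.0.0.1", "127.0.0.1", full))
    print("partial checksum valid:", checksum_ok("127.0.0.1", "127.0.0.1", part))
```

Whether the kernel stores exactly this value in the field is not shown in the thread; the sketch only demonstrates that a sum covering the pseudo-header alone cannot validate against the payload.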


Thank you for the information. Is there any easy way to turn them on? I 
need it for LVS.


Best regards,

Krzysztof Olędzki

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Krzysztof Oledzki



On Fri, 21 Sep 2007, Denys Vlasenko wrote:


On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:

On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:


I plan to use gzip compression on following drivers' firmware,
if patches will be accepted:

   text    data     bss     dec     hex filename
  17653  109968     240  127861   1f375 drivers/net/acenic.o
   6628  120448       4  127080   1f068 drivers/net/dgrs.o
         ^^^^^^


Should this be redone to use the existing firmware loading framework to
load the firmware instead?


Not in every case.

For example, bnx2 maintainer says that driver and
firmware are closely tied for his driver. IOW: you upgrade kernel
and your NIC is not working anymore.


Firmware may come with the kernel. We already have an install-modules step; 
we can also add an install-firmware one.



Another argument is to make kernel be able to bring up NICs
without needing firmware images in initramfs/initrd/hard drive.


It is not possible to bring up things like FC or WiFi without firmware, so 
what is so special about classic NICs?


Best regards,

Krzysztof Olędzki

incorrect cksum with tcp/udp on lo with 2.6.20/2.6.21/2.6.22

2007-09-13 Thread Krzysztof Oledzki

Hello,

It seems that after some not very recent changes, UDP and TCP packets 
carrying data sent over loopback have an incorrect checksum:


UDP:
# echo test|nc -u 127.0.0.1 

# tcpdump -i lo -n -v -v port 
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
19:43:39.340576 IP (tos 0x0, ttl  64, id 15179, offset 0, flags [DF], proto: UDP 
(17), length: 33) 127.0.0.1.49512  127.0.0.1.: [bad udp cksum 174c!] UDP, 
length 5

TCP:
# echo test|nc -u 127.0.0.1 

tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
*Correct:
19:44:27.692614 IP (tos 0x0, ttl  64, id 32100, offset 0, flags [DF], proto: TCP (6), 
length: 60) 127.0.0.1.53804  127.0.0.1.: S, cksum 0xfd54 (correct), 
3426125135:3426125135(0) win 32792 mss 16396,sackOK,timestamp 1912797227 
0,nop,wscale 7
19:44:27.692674 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto: TCP (6), 
length: 60) 127.0.0.1.  127.0.0.1.53804: S, cksum 0xea3f (correct), 
3427916955:3427916955(0) ack 3426125136 win 32768 mss 16396,sackOK,timestamp 
1912797227 1912797227,nop,wscale 7
19:44:27.692711 IP (tos 0x0, ttl  64, id 32101, offset 0, flags [DF], proto: TCP (6), 
length: 52) 127.0.0.1.53804  127.0.0.1.: ., cksum 0xd263 (correct), 1:1(0) ack 1 
win 257 nop,nop,timestamp 1912797227 1912797227

*Incorrect:
19:44:27.692831 IP (tos 0x0, ttl  64, id 32102, offset 0, flags [DF], proto: TCP (6), 
length: 57) 127.0.0.1.53804  127.0.0.1.: P, cksum 0xfe2d (incorrect (- 0xe07c), 
1:6(5) ack 1 win 257 nop,nop,timestamp 1912797227 1912797227

*Correct:
19:44:27.692859 IP (tos 0x0, ttl  64, id 9399, offset 0, flags [DF], proto: TCP (6), 
length: 52) 127.0.0.1.  127.0.0.1.53804: ., cksum 0xd25f (correct), 1:1(0) ack 6 
win 256 nop,nop,timestamp 1912797227 1912797227

Tested on:
 - 2.6.22.6
 - 2.6.21.7
 - 2.6.20.11

Best regards,


Krzysztof Olędzki

sky2: workaround for lost IRQ and 2.6.22-stable

2007-08-07 Thread Krzysztof Oledzki

Hello,

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.21.y.git;a=commitdiff;h=fe1fe7c982f86624c692644e8ed05e132f4753cc

Is this fix going to be included in the next 2.6.22-stable release or is 
it not needed any more?


Best regards,

Krzysztof Olędzki

Re: Network card IRQ balancing with Intel 5000 series chipsets

2006-12-28 Thread Krzysztof Oledzki



On Wed, 27 Dec 2006, jamal wrote:


On Wed, 2006-27-12 at 09:09 +0200, Robert Iakobashvili wrote:



My scenario is treatment of RTP packets in kernel space with a single network
card (both Rx and Tx). The default of the Intel 5000 series chipset is
affinity of each
network card to a certain CPU. Currently, neither with irqbalance nor
with kernel
irq-balancing (MSI and io-apic attempted) I do not find a way to
balance that irq.


In the near future, when the NIC vendors wake up[1] because CPU vendors
- including big bad Intel -  are going to be putting out a large number
of hardware threads, you should be able to do more clever things with
such a setup. At the moment, just tie it to a single CPU and have your
other processes that are related running/bound on the other cores so you
can utilize them. OTOH, you say you are only using 30% of the one CPU,
so it may not be a big deal to tie your single NIC to one CPU.


Anyway, it seems that with more advanced firewalls/routers the kernel spends 
most of its time in IPSec/crypto code, netfilter conntrack, iptables 
rules/extensions, routing lookups, etc., and not in the hardware IRQ handler. 
So, it would be nice if this part could be done by all CPUs.


Best regards,


Krzysztof Olędzki

Re: gratuitous arp

2006-11-26 Thread Krzysztof Oledzki



On Sun, 26 Nov 2006, James Courtier-Dutton wrote:


dean gaudet wrote:

On Sun, 26 Nov 2006, James Courtier-Dutton wrote:


dean gaudet wrote:

hi...

i ran into some problems recently which would have been avoided if my box
did a gratuitous arp as it brought up all interfaces (the router took
forever to timeout the ARP entries for interface aliases).  so i set about
looking to see why that wasn't happening.

...

Are you 100% sure about this?
Have you done a packet sniff on the network?
A lot of routers ignore gratuitous arp for security reasons.


yeah i've done some packet sniffing to verify this.

here's what happened (twice now):  i upgraded a (normally busy) box, so the 
MAC address changed.  the router is a cisco (not managed by me).


debian reboot sequence at some point brings up the primary eth0 address and 
very soon thereafter there will be an arp who-has $default_gw tell 
$primary_addr.  that's sufficient to get the cisco to update its ARP cache 
for $primary_addr.  this isn't gratuitous arp, but does the trick for the 
$primary_addr.


but there's no gratuitous arp for any eth0:N aliased interfaces... and the 
cisco ARP cache on this ISP router seems to be set to a long timeout.  i 
could reach eth0:N from local net, but couldn't get outside local net from 
eth0:N.


issuing arping -I eth0 -s $secondary_addr $default_gw for each secondary 
address updated the cisco ARP cache and i could then reach eth0:N remotely.


so... that may not be exactly gratuitous arp, but basically i was stuck 
until i forced the cisco to update its ARP cache for each of the secondary 
addrs...
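
For reference, a gratuitous ARP announcement is just an ARP request in which the sender and target protocol addresses are the same. The sketch below only builds the 42-byte Ethernet frame; it does not send it (that would need a raw socket and root privileges). The MAC and IP values and the function name are made up for illustration and do not come from this thread.

```python
import socket
import struct


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    ether = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to the sender IP.
    arp += hw + addr + b"\x00" * 6 + addr
    return ether + arp


if __name__ == "__main__":
    # Hypothetical alias address and locally administered MAC.
    frame = gratuitous_arp_frame("02:00:00:00:00:01", "192.0.2.10")
    print(len(frame), frame.hex())
```

One such frame per aliased address would be needed, which is what the per-address arping calls described above achieve in a slightly different way.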


it seems to me it'd be nice for the init sequence to take care of this, so 
that other folks don't have to spend time debugging similar problems.  i 
just wanted to ask if i'm missing something obvious before i go open a 
debian bug.  (i'm tempted to see if fedora does anything differently.)


thanks
-dean


Ok, I think it is better to just do gratuitous arp on the primary interface.
If one starts doing it on secondary interfaces, one would then have to also 
do it for all proxy-arp addresses (if used), and things could start getting 
rather messy.


BTW: There is no such thing as secondary interfaces. What you use 
(ethX:X) is an emulation of interface aliases that was necessary for Linux 
2.2.x, more than 5 years ago. Currently (2.4/2.6) it is possible to add 
many addresses to one interface; all you need is the iproute2 package 
and its utility called ip.


Best regards,

Krzysztof Olędzki

Re: Zero checksum in netconsole/netdump packets

2006-11-07 Thread Krzysztof Oledzki



On Tue, 7 Nov 2006, Gerrit Renker wrote:


Quoting Chris Lalancette:
|  Hello,
|   I realized that all of the packets that go from the crashing machine to
|   the netdump server have a zero checksum.
snip
|   Assuming that this is just an oversight, attached is a simple patch to
|   compute the UDP checksum in netpoll_send_udp.
|
|  Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
|
RFC 768 allows not computing the checksum by leaving uh->check at 0, hence it 
is not illegal.


BTW: leaving UDP checksum at 0 is only valid for IPv4, with IPv6 we _have 
to_ compute a checksum.


Best regards,

Krzysztof Olędzki

Re: [Bugme-new] [Bug 7421] New: Oops, EIP is at atalk_sendmsg

2006-10-26 Thread Krzysztof Oledzki



On Thu, 26 Oct 2006, Andrew Morton wrote:


On Thu, 26 Oct 2006 04:08:36 -0700
[EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=7421

   Summary: Oops, EIP is at atalk_sendmsg
Kernel Version: 2.6.18.1
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Distribution: Debian sarge
Hardware Environment: i386

Problem Description:

Oct 26 10:01:03 localhost papd[3120]: restart (2.0.3)
Oct 26 10:01:07 localhost kernel: BUG: unable to handle kernel NULL pointer \
dereference at virtual address 
Oct 26 10:01:07 localhost kernel:  printing eip:
Oct 26 10:01:07 localhost kernel: d0c16a8a
Oct 26 10:01:07 localhost kernel: *pde = 
Oct 26 10:01:07 localhost kernel: Oops:  [#1]
Oct 26 10:01:07 localhost kernel: Modules linked in: appletalk psnap llc ipv6 \
pcmcia_core af_packet parport_pc parport floppy pcspkr sn d_maestro3
snd_ac97_codec \
snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd soundcore intel_agp uhci_hcd \
usbcore 3c59x mii agpgart mous edev tsdev joydev psmouse ide_cd cdrom rtc 
reiserfs \
ext3 jbd ide_disk ide_generic siimage aec62xx trm290 alim15x3 hpt34x hpt366
cmd64x  \
piix rz1000 slc90e66 generic cs5530 cs5520 sc1200 triflex atiixp pdc202xx_old \
pdc202xx_new opti621 ns87415 cy82c693 amd74xx sis5513 via 82cxxx serverworks
ide_core \
unix
Oct 26 10:01:07 localhost kernel: CPU:0
Oct 26 10:01:07 localhost kernel: EIP:0060:[pg0+277633674/1070257152]
Not \
tainted VLI
Oct 26 10:01:07 localhost kernel: EFLAGS: 00010286   (2.6.17.14.2006-10-25 #1)
Oct 26 10:01:07 localhost kernel: EIP is at atalk_sendmsg+0x15b/0x4e4 
[appletalk]
Oct 26 10:01:07 localhost kernel: eax:    ebx: 002f   ecx:  
  \
edx: 
Oct 26 10:01:07 localhost kernel: esi: cadcb600   edi:    ebp: cc9d7eec 
  \
esp: cc9d7d6c
Oct 26 10:01:07 localhost kernel: ds: 007b   es: 007b   ss: 0068
Oct 26 10:01:07 localhost kernel: Process afpd (pid: 3118, threadinfo=cc9d6000 \
task=cfe205d0)
Oct 26 10:01:07 localhost kernel: Stack:  c02b32c0  cc9d7ee8
cffbc500 \
 d0c16f05 cffbc500
Oct 26 10:01:07 localhost kernel:cffbc500 cc9d7ec8 cadcb600 
 \
0400 cc9d7f48 001b
Oct 26 10:01:07 localhost kernel:cc9d7ec8 cc9d7e1c cc9d7ee8 c01fe97a
cc9d7e1c \
ca252600 cc9d7ec8 001b
Oct 26 10:01:07 localhost kernel: Call Trace:
Oct 26 10:01:07 localhost kernel:  d0c16f05 atalk_recvmsg+0xf2/0x105
[appletalk]  \
c01fe97a sock_sendmsg+0xd0/0xeb
Oct 26 10:01:07 localhost kernel:  c0157bfd touch_atime+0xb4/0xbb  c0198b22 
\
copy_from_user+0x34/0x5a
Oct 26 10:01:07 localhost kernel:  c012383e autoremove_wake_function+0x0/0x3a 
 \
c0198b22 copy_from_user+0x34/0x5a
Oct 26 10:01:07 localhost kernel:  c01fe490 move_addr_to_kernel+0x24/0x39  \
c01ffaaa sys_sendto+0xe9/0x10d
Oct 26 10:01:07 localhost kernel:  c01fe67e sock_attach_fd+0x72/0xd2  
c0143d52 \
get_empty_filp+0x3b/0xe4
Oct 26 10:01:07 localhost kernel:  c0143d7b get_empty_filp+0x64/0xe4  
c0198ae4 \
copy_to_user+0x32/0x3c
Oct 26 10:01:07 localhost kernel:  c02001de sys_socketcall+0xf2/0x180
c0102a03 \
syscall_call+0x7/0xb
Oct 26 10:01:07 localhost kernel: Code: 0c 83 c0 04 eb 15 c6 44 24 1a 00 0f b7
86 26 \
01 00 00 66 89 44 24 18 8d 44 24 18 50 e8 e0 eb ff  ff 89 44 24 04 85 f6 5d 8b
14 24 \
8b 12 89 54 24 04 74 1b 8b 86 84 00 00 00 f6 c4 04 74 10 52 53
Oct 26 10:01:07 localhost kernel: EIP: [pg0+277633674/1070257152] \
atalk_sendmsg+0x15b/0x4e4 [appletalk] SS:ESP 0068:cc9d7d6c
Oct 26 10:01:21 localhost atalkd[3106]: as_timer gateway 8000.100 down



Steps to reproduce:
restart the machine, start papd after network initializing has finished
a second start of papd works fine

appletalk is loaded as a module

same behaviour with 2.6.17.14


Something similar here, too:

Unable to handle kernel NULL pointer dereference at virtual address 
 printing eip:
c036b1ef
*pde = 
Oops:  [#1]
PREEMPT
Modules linked in: bonding
CPU:0
EIP:0060:[c036b1ef]Not tainted VLI
EFLAGS: 00010286   (2.6.15.1)
EIP is at atalk_sendmsg+0x158/0x557
eax: d468fee4   ebx: 0017   ecx: d468fd20   edx: 
esi:    edi: d7e88200   ebp: bfa7c480   esp: d468fd68
ds: 007b   es: 007b   ss: 0068
Process atalkd (pid: 551, threadinfo=d468e000 task=d6f55090)
Stack:  d468ff40  d468fee0 d70d20a0 0003 c036b6e0 d70d20a0
   d70d20a0 d468fec0 d7e88200   0400 d468ff40 0003
   d468fec0 d468fe18 bfa7c480 c02e2d5e d468fe18 d7194540 d468fec0 0003
Call Trace:
 [c036b6e0] atalk_recvmsg+0xf2/0x105
 [c02e2d5e] sock_sendmsg+0xce/0xe9
 [c01212c2] 

Probably e1000 related Oops in 2.6.18-rc5

2006-08-31 Thread Krzysztof Oledzki

Hello,

My testing workstation running 2.6.18-rc5 Oopsed. It has a dualport e1000 
card with bonding and vlans.


All I have is three photos taken with a digital camera: 
http://www.ans.pl/Oops/1/


Hope it is enough.

Best regards,

Krzysztof Olędzki


Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-16 Thread Krzysztof Oledzki



On Mon, 14 Aug 2006, David Miller wrote:


From: Jay Vosburgh [EMAIL PROTECTED]
Date: Thu, 03 Aug 2006 18:01:35 -0700


In this case (bond0.555 above bond0 above eth0,eth1,etc),
skb_bond doesn't suppress duplicates because skb_bond is called with the
skb->dev set to the bond0.555 dev, not the ethX dev.  Non-accelerated
VLAN devices don't do this; they'll come in with skb->dev set to ethX
and will go through skb_bond as expected.


Ok, since __vlan_hwaccel_rx() bypasses the netif_receive_skb()
that would normally occur, we have to duplicate the bonding
drop checks.

The submitted patch put skb_bond() into if_vlan.h which is
definitely the wrong thing to do.  This is a generic operation
and therefore belongs in linux/netdevice.h at best.

Furthermore, we're only interested in the packet drop check,
so that's the only part of the logic we need to export,
the rest can stay private to skb_bond() in net/core/dev.c

Can the folks who can reproduce this try this patch?


Works for me, thank you.

Acked-by: Krzysztof Piotr Oledzki [EMAIL PROTECTED]

Best regards,

Krzysztof Olędzki

Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-10 Thread Krzysztof Oledzki



On Thu, 3 Aug 2006, Krzysztof Oledzki wrote:




On Wed, 2 Aug 2006, David Miller wrote:
CUT

Finally, I'm still a little stumped about why this change is necessary
still, to be honest.


If I understand it correctly, this patch fixes the "[PATCH] bonding: suppress 
duplicate packets" patch:


http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8f903c708fcc2b579ebf16542bf6109bad593a1d;hp=ebe19a4ed78d4a11a7e01cdeda25f91b7f2fcb5a

It seems that the original patch does not work properly in a VLAN-accelerated 
environment, which I reported on 31 Mar 2006:

http://marc.theaimsgroup.com/?l=bonding-devel&m=114381240718113&w=2

Anyway, I haven't tested this patch yet, but I'm going to do it ASAP.


OK, this patch really solves the bug from my report. Is there any chance 
of a similar fix in net-2.6.19.git?


Best regards,

Krzysztof Olędzki

Re: PATCH Fix bonding active-backup behavior for VLAN interfaces

2006-08-03 Thread Krzysztof Oledzki



On Wed, 2 Aug 2006, David Miller wrote:
CUT

Finally, I'm still a little stumped about why this change is necessary
still, to be honest.


If I understand it correctly, this patch fixes the "[PATCH] bonding: 
suppress duplicate packets" patch:


http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8f903c708fcc2b579ebf16542bf6109bad593a1d;hp=ebe19a4ed78d4a11a7e01cdeda25f91b7f2fcb5a

It seems that the original patch does not work properly in a VLAN-accelerated 
environment, which I reported on 31 Mar 2006:

 http://marc.theaimsgroup.com/?l=bonding-devel&m=114381240718113&w=2

Anyway, I haven't tested this patch yet, but I'm going to do it ASAP.

Best regards,

Krzysztof Olędzki

Re: problems with e1000 and jumboframes

2006-08-03 Thread Krzysztof Oledzki



On Thu, 3 Aug 2006, Benjamin LaHaise wrote:


On Thu, Aug 03, 2006 at 03:48:39PM +0200, Arnd Hannemann wrote:

However the box is a VIA Epia MII12000 with 1 GB of Ram and 1 GB of swap
enabled, so there should be plenty of memory available. HIGHMEM support
is off. The e1000 nic seems to be an 82540EM, which to my knowledge
should support jumboframes.



However I can't always reproduce this on a freshly booted system, so
someone else may be the culprit and leaking pages?

Any ideas how to debug this?


This is memory fragmentation, and all you can do is work around it until
the e1000 driver is changed to split jumbo frames up on rx.  Here are a
few ideas that should improve things for you:

- switch to a 2GB/2GB split to recover the memory lost to highmem
  (see Processor Type and Features / Memory split)


With 1 GB of RAM, the full 1GB/3GB split (CONFIG_VMSPLIT_3G_OPT) seems to be 
enough...



- increase /proc/sys/vm/min_free_kbytes -- more free memory will
  improve the odds that enough unfragmented memory is available for
  incoming network packets


True. IMO, 65535 is a good starting point.

Best regards,
Krzysztof Olędzki

Re: problems with e1000 and jumboframes

2006-08-03 Thread Krzysztof Oledzki



On Thu, 3 Aug 2006, Benjamin LaHaise wrote:


On Thu, Aug 03, 2006 at 04:49:15PM +0200, Krzysztof Oledzki wrote:

With 1 GB of RAM full 1GB/3GB (CONFIG_VMSPLIT_3G_OPT) seems to be
enough...


Nope, you lose ~128MB of RAM for vmalloc space.


I'm not so sure:

Linux version 2.6.17.7 ([EMAIL PROTECTED]) (gcc version 3.4.6) #1 SMP PREEMPT 
Fri Jul 28 18:05:40 CEST 2006
BIOS-provided physical RAM map:
 BIOS-e820:  - 000a (usable)
 BIOS-e820: 0010 - 3ffc (usable)
 BIOS-e820: 3ffc - 3ffcfc00 (ACPI data)
 BIOS-e820: 3ffcfc00 - 3000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec9 (reserved)
 BIOS-e820: fed0 - fed00400 (reserved)
 BIOS-e820: fee0 - fee1 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
1023MB LOWMEM available.
found SMP MP-table at 000fe710
On node 0 totalpages: 262080
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 257984 pages, LIFO batch:31
(...)

$ zcat /proc/config.gz |grep VMSPLIT
# CONFIG_VMSPLIT_3G is not set
CONFIG_VMSPLIT_3G_OPT=y
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_1G is not set


Best regards,

Krzysztof Olędzki

Re: problems with e1000 and jumboframes

2006-08-03 Thread Krzysztof Oledzki



On Thu, 3 Aug 2006, Evgeniy Polyakov wrote:
CUT

Why? After your explanation that makes sense for me. The driver needs
one contiguous chunk for those 9k packet buffer and thus requests a
3-order page of 16k. Or do i still do not understand this?


Correct, except that it wants 32k.
e1000 logic is following:
align frame size to power-of-two,

16K?


then skb_alloc adds a little
(sizeof(struct skb_shared_info)) at the end, and this ends up
in 32k request just for 9k jumbo frame.


Strange, why can't this skb_shared_info be added before the first alignment? 
And what about smaller frames like 1500, does this driver behave similarly 
(first align then add)?
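
The arithmetic described above can be sketched as follows. This is an illustration, not the driver's code: the 256-byte value stands in for sizeof(struct skb_shared_info), the 9018-byte frame assumes a 9000-byte MTU plus a 14-byte Ethernet header and 4-byte FCS, and the allocator is assumed to round every request up to a power of two. All names are hypothetical.

```python
SHARED_INFO = 256  # assumed stand-in for sizeof(struct skb_shared_info)


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def align_then_add(frame_len: int, extra: int = SHARED_INFO) -> int:
    """Round the frame up first, then append the trailer; the allocator
    rounds the combined request up to a power of two again."""
    return next_pow2(next_pow2(frame_len) + extra)


def add_then_align(frame_len: int, extra: int = SHARED_INFO) -> int:
    """Hypothetical alternative: append the trailer before any rounding."""
    return next_pow2(frame_len + extra)


if __name__ == "__main__":
    for frame in (1518, 9018):
        print(frame, align_then_add(frame), add_then_align(frame))
```

Under these assumptions the 9018-byte frame costs 32768 bytes when aligned first and 16384 bytes when the trailer is added first, and a 1518-byte frame costs 4096 versus 2048, which is the effect the question above is probing.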


Best regards,

Krzysztof Olędzki

Re: skge driver oops

2006-07-30 Thread Krzysztof Oledzki



On Sun, 23 Jul 2006, Krzysztof Oledzki wrote:




On Fri, 26 May 2006, Stephen Hemminger wrote:


Please give this a try, it rearranges the transmit buffer management,
and may avoid issues with partial completions causing SKB reuse.


CUT

Please excuse me, I overlooked this patch. Anyway, it seems that this fix went 
into the 2.6.16 kernel, which is already on the server that caused problems 
(http://bugzilla.kernel.org/show_bug.cgi?id=6142). I'll disable my workaround 
(/usr/sbin/ethtool -K eth1 tx off) and let you know about the results.


Strange, I re-enabled tx csum and there were no problems for about one 
week. Yesterday I upgraded my kernel to 2.6.17.7 and after one 
day, about 3 hours ago, my system crashed with the following log:


 782b6fe4 skge_xmit_frame+0x121/0x2ea  781249b6 
raise_softirq_irqoff+0xe/0x59
 7833b9b7 qdisc_restart+0xc4/0x16b  78332352 net_tx_action+0x97/0xbd
 7812484d __do_softirq+0x59/0xc0  781248e4 do_softirq+0x30/0x35
 78124947 local_bh_enable+0x5e/0x7e  78332194 dev_queue_xmit+0x1b6/0x1bd
 7834ab2c ip_output+0x1b5/0x1eb  7834af00 ip_queue_xmit+0x39e/0x3e6
 78191f3e __ext3_get_inode_loc+0x53/0x201  7819df94 
journal_dirty_metadata+0x1d1/0x1eb
 7811bafb __wake_up+0x27/0x3b  7819e3dc journal_stop+0x1bd/0x1c9
 781963d0 __ext3_journal_stop+0x19/0x37  78192b58 ext3_dirty_inode+0x5d/0x63
 78359652 tcp_transmit_skb+0x38e/0x3af  7816d122 touch_atime+0x97/0x9d
 7835a89c tcp_write_xmit+0x1ad/0x212  7835a924 
__tcp_push_pending_frames+0x23/0x80
 78352732 do_tcp_setsockopt+0x12e/0x2f3  7832cd3c 
sock_common_setsockopt+0x1e/0x22
 7832ac7b sys_setsockopt+0x61/0x81  7832b242 sys_socketcall+0x164/0x1a4
 7815765d sys_sendfile+0x5d/0x84  78102c93 sysenter_past_esp+0x54/0x75
Bad page state in process 'swapper'
page:7985eb20 flags:0x80010008 mapping:e25867a0 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
 78140e43 bad_page+0x43/0x6c  781415e5 free_hot_cold_page+0x5b/0x123
 7832d700 skb_release_data+0x50/0x86  7832d741 kfree_skbmem+0xb/0x70
 78355b41 tcp_clean_rtx_queue+0x225/0x3e6  783560b1 tcp_ack+0x151/0x27b
 78358116 tcp_rcv_established+0x544/0x5ed  7835e972 tcp_v4_do_rcv+0x1f/0xb4
 7835ee8e tcp_v4_rcv+0x487/0x6de  7833f4ef nf_hook_slow+0xb3/0xce
 78347aac ip_local_deliver+0x11b/0x1ab  78348086 ip_rcv+0x40c/0x446
 783324e7 netif_receive_skb+0x16f/0x1a7  782b79a0 skge_poll+0x307/0x3e8
 78332661 net_rx_action+0x5c/0xd3  7812484d __do_softirq+0x59/0xc0
 781248e4 do_softirq+0x30/0x35  7812499d irq_exit+0x36/0x41
 78104edc do_IRQ+0x20/0x28  7810101c default_idle+0x0/0x55
 7810373e common_interrupt+0x1a/0x20  7810101c default_idle+0x0/0x55
 78101048 default_idle+0x2c/0x55  78101132 cpu_idle+0xad/0xda

I know it is incomplete (this is all I was able to find in my logs), 
but it looks _very_ similar to the one from:

http://bugzilla.kernel.org/show_bug.cgi?id=6142

BTW: During normal operation the skge driver still logs (about 10 times per 
hour) messages about a hardware error. However, the message changed slightly; 
in 2.6.16 it was:

 skge hardware error detected (status 0x400)
but in 2.6.17 it is:
 skge :00:0b.0: PCI error cmd=0x7 status=0x82b0
 skge :00:0b.0: PCI error cmd=0x147 status=0xc2b0
 skge :00:0b.0: PCI error cmd=0x147 status=0xc2b0
 skge :00:0b.0: PCI error cmd=0x147 status=0xc2b0
 skge :00:0b.0: PCI error cmd=0x147 status=0xc2b0
 skge :00:0b.0: PCI error cmd=0x147 status=0xc2b0
(...)

Anyway, everything works fine. I don't know if it is somehow related to the 
crashes mentioned above.


Best regards,

Krzysztof Oledzki


Re: skge driver oops

2006-07-22 Thread Krzysztof Oledzki



On Fri, 26 May 2006, Stephen Hemminger wrote:


Please give this a try, it rearranges the transmit buffer management,
and may avoid issues with partial completions causing SKB reuse.


CUT

Please excuse me, I overlooked this patch. Anyway, it seems that this fix 
went into the 2.6.16 kernel, which is already on the server that caused 
problems (http://bugzilla.kernel.org/show_bug.cgi?id=6142). I'll disable 
my workaround (/usr/sbin/ethtool -K eth1 tx off) and let you know about 
the results.


Thank you.

Best regards,

Krzysztof Olędzki

Re: [e1000]: flow control on by default - good idea really?

2006-07-05 Thread Krzysztof Oledzki



On Wed, 5 Jul 2006, Auke Kok wrote:


jamal wrote:

On Tue, 2006-04-07 at 13:11 -0400, jamal wrote:

I have a device connected to a e1000 that was erroneously advertising
both tx/rx flow control but wasnt properly reacting to it. The default 
setup on the e1000 has rx flow control turned on.

I was sending at wire rate gige from the device - which is about
1.48Mpps. The e1000 was in turn sending me flow control packets
as per default/expected behavior. Unfortunately, it was sending
a very large amount of packets. At one point i was seeing upto
1Mpps and on average, the flow control packets were consuming
60-70% of the bandwidth. Even when i fixed this behavior to act
properly, allowing flow control on consumed up to 15% of the bandwidth.
Clearly, this is a bad thing. Yes, the device in the first instance was
at fault. But i have argued in the past that NAPI does just fine without
flow control being turned on, so even chewing 5% of bandwidth on flow
control is a bad thing..

As a compromise, can we declare flow control as an advanced feature
and turn it off by default? People who feel it is valuable and know
what they are doing can turn it off.


I meant turn it on.

BTW, As an addendum this default behavior changed around 2.6.16 it
seems.


Flow Control is using the EEPROM provided value, the module driver itself 
does not choose a default:


e1000_param.c:

/* User Specified Flow Control Override
*
* Valid Range: 0-3
*  - 0 - No Flow Control
*  - 1 - Rx only, respond to PAUSE frames but do not generate them
*  - 2 - Tx only, generate PAUSE frames but ignore them on receive
*  - 3 - Full Flow Control Support
*
* Default Value: Read flow control settings from the EEPROM
*/
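
Read as a bit field, the override values in the comment above decompose into two independent switches. The following sketch merely restates that table; the function and key names are invented for illustration and are not part of the driver or of ethtool.

```python
# Bit meanings taken from the comment above: bit 0 = Rx (respond to PAUSE
# frames), bit 1 = Tx (generate PAUSE frames). The names are illustrative only.
RX_PAUSE = 1
TX_PAUSE = 2


def describe_flow_control(value: int) -> dict:
    """Decode a 0-3 flow control override into its Rx/Tx components."""
    if not 0 <= value <= 3:
        raise ValueError("flow control override must be in the range 0-3")
    return {
        "respond_to_pause": bool(value & RX_PAUSE),
        "generate_pause": bool(value & TX_PAUSE),
    }


if __name__ == "__main__":
    for v in range(4):
        print(v, describe_flow_control(v))
```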

Turning flow control off usually (i.e. almost always) causes (significantly) 
_degraded_ performance. We should really leave it the way it is (as per 
eeprom setting), and this is best for most if not all people. The card itself 
has this value programmed, which makes it possible for the user to turn 
on/off flowcontrol per card consistently, which makes much more sense to me. 
Also considering e1000 hardware varies significantly.


I was never able to find such a tool for Linux, or at least for DOS. Where 
should I look for it?


Best regards,

Krzysztof Olędzki

Re: [e1000]: flow control on by default - good idea really?

2006-07-05 Thread Krzysztof Oledzki



On Wed, 5 Jul 2006, Auke Kok wrote:


David Miller wrote:

From: jamal [EMAIL PROTECTED]
Date: Tue, 04 Jul 2006 15:20:39 -0400


BTW, As an addendum this default behavior changed around 2.6.16 it
seems.


Flow control has been on by default in the tg3 driver since the
beginning, maybe e1000 only recently started to behave that way
but it's the right thing to do IMHO.


As said earlier, e1000 always honors the EEPROM setting for this, which has 
been _on_ by default for all cards (AFAIK, that is).


I'm not sure:

[EMAIL PROTECTED]:~# mii-tool -v eth0
eth0: negotiated 100baseTx-FD, link ok
  product info: vendor 00:aa:00, model 56 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

[EMAIL PROTECTED]:~# ethtool -d eth0|grep flow
  Receive flow control:  disabled
  Transmit flow control: disabled

[EMAIL PROTECTED]:~# uname -r
2.6.14.3

[EMAIL PROTECTED]:~# mii-tool -v eth0
eth0: negotiated 100baseTx-FD flow-control, link ok
  product info: vendor 00:aa:00, model 56 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

[EMAIL PROTECTED]:~# ethtool -d eth0|grep flow
  Receive flow control:  enabled
  Transmit flow control: enabled

[EMAIL PROTECTED]:~# uname -r
2.6.16.19

This is exactly the same hardware; only the kernel was recently upgraded on 
the r2.


Best regards,


Krzysztof Olędzki

Re: [PATCH UPDATE netdev-2.6.git] bonding: suppress duplicate packets

2006-04-01 Thread Krzysztof Oledzki



On Fri, 31 Mar 2006, Jay Vosburgh wrote:


Krzysztof Oledzki [EMAIL PROTECTED] wrote:
[...]

I took this patch from linux-2.6 using git tree and applied to 2.6.16.1
together with recent link status fix. Unfortunately broadcast packet
duplication still occurs.


I am unable to induce any duplicate packets using the current
netdev-2.6.git upstream branch (which should be the same bonding driver
as you're using).  I tried it with and without VLANs, using ping to
various addresses (unicast, subnet broadcast, all-1s broadcast).  I'm
using a Cisco switch, and I'm issuing the IOS command clear mac
address-table dynamic to induce it to (briefly) flood traffic to all
ports.

The only duplicates I see are ping pointing out duplicate
returns from the multiple stations on the network.  I don't see bonding
delivering two copies of the same packet.

Using the unmodified 2.6.16.1 kernel, I do see multiple copies
of the same packet from a ping to the broadcast address using the method
I describe above.


Thank you for your tests and fast response.


I am using a different network device (tg3), although I'm not
sure how that would affect this.


This is probably not related.


Under what circumstances are you seeing duplicates, and what
type of traffic is it?


If I set net.ipv4.icmp_echo_ignore_broadcasts=0 I'm seeing duplicates 
while pinging broadcast address:


# ping 192.168.149.255 -b
WARNING: pinging broadcast address
PING 192.168.149.255 (192.168.149.255) 56(84) bytes of data.
64 bytes from 192.168.149.21: icmp_seq=1 ttl=128 time=0.159 ms
64 bytes from 192.168.149.2: icmp_seq=1 ttl=64 time=0.267 ms (DUP!)
64 bytes from 192.168.149.11: icmp_seq=1 ttl=128 time=0.279 ms (DUP!)
64 bytes from 192.168.149.2: icmp_seq=1 ttl=64 time=0.288 ms (DUP!)
64 bytes from 192.168.149.10: icmp_seq=1 ttl=128 time=0.295 ms (DUP!)

Please notice that 192.168.149.2 responded two times.
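
A minimal sketch of how the repeated responder can be picked out of the ping output automatically. It assumes the reply line format shown above; the function name repeated_responders is made up for this example and is not part of any tool.

```python
import re
from collections import Counter

# Reply lines copied from the ping transcript above.
PING_OUTPUT = """\
64 bytes from 192.168.149.21: icmp_seq=1 ttl=128 time=0.159 ms
64 bytes from 192.168.149.2: icmp_seq=1 ttl=64 time=0.267 ms (DUP!)
64 bytes from 192.168.149.11: icmp_seq=1 ttl=128 time=0.279 ms (DUP!)
64 bytes from 192.168.149.2: icmp_seq=1 ttl=64 time=0.288 ms (DUP!)
64 bytes from 192.168.149.10: icmp_seq=1 ttl=128 time=0.295 ms (DUP!)
"""

REPLY_RE = re.compile(r"bytes from (\S+): icmp_seq=(\d+)")


def repeated_responders(text):
    """Return {(host, icmp_seq): count} for hosts that answered one
    echo request more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = REPLY_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (host, seq), n in sorted(repeated_responders(PING_OUTPUT).items()):
        print(f"{host} answered icmp_seq={seq} {n} times")
```

When pinging a broadcast address, ping marks every reply after the first as (DUP!), so only a host that appears twice for the same icmp_seq points at real packet duplication.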

If I run tcpdump on 192.168.149.2 it shows:

[EMAIL PROTECTED]:~# tcpdump -i vlan19 -n icmp -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan19, link-type EN10MB (Ethernet), capture size 96 bytes

15:41:07.512007 00:14:22:b0:cb:52 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.168.149.3 > 192.168.149.255: ICMP echo request, id 27686, seq 1, length 64
15:41:07.512111 00:14:22:b0:c9:f9 > 00:14:22:b0:cb:52, ethertype IPv4 (0x0800), length 98: 192.168.149.2 > 192.168.149.3: ICMP echo reply, id 27686, seq 1, length 64
15:41:07.512139 00:14:22:b0:cb:52 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 98: 192.168.149.3 > 192.168.149.255: ICMP echo request, id 27686, seq 1, length 64
15:41:07.512160 00:14:22:b0:c9:f9 > 00:14:22:b0:cb:52, ethertype IPv4 (0x0800), length 98: 192.168.149.2 > 192.168.149.3: ICMP echo reply, id 27686, seq 1, length 64

So it seems I must have done something wrong, but I have no idea what. 
The wrong patch? I'm using exactly this one:

ftp://ftp.ans.pl/pub/patches/0140-bonding_suppress_duplicate_packets.patch

Best regards,

Krzysztof Olędzki

Re: [PATCH UPDATE netdev-2.6.git] bonding: suppress duplicate packets

2006-03-31 Thread Krzysztof Oledzki



On Tue, 21 Feb 2006, Jay Vosburgh wrote:



Originally submitted by Kenzo Iwami; his original description is:

The current bonding driver receives duplicate packets when broadcast/
multicast packets are sent by other devices or packets are flooded by the
switch. In this patch, new flags are added in priv_flags of net_device
structure to let the bonding driver discard duplicate packets in
dev.c:skb_bond().

Modified by Jay Vosburgh to change a define name, update some
comments, rearrange the new skb_bond() for clarity, clear all bonding
priv_flags on slave release, and update the driver version.

Signed-off-by: Kenzo Iwami [EMAIL PROTECTED]
Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]


CUT

I took this patch from linux-2.6 using git tree and applied to 2.6.16.1 
together with recent link status fix. Unfortunately broadcast packet 
duplication still occurs.


I have two e1000 NICs:

02:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet 
Controller (rev 05)
04:03.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet 
Controller (rev 05)

My configuration follows:

# echo -n 1 > /sys/class/net/bond0/bonding/mode
# echo -n 100 > /sys/class/net/bond0/bonding/miimon
# /sbin/ifconfig bond0 up
# ifenslave bond0 eth0 eth1
# echo -n eth0 > /sys/class/net/bond0/bonding/primary

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 7
Permanent HW addr: 00:14:22:b0:c9:f9

Slave Interface: eth1
MII Status: up
Link Failure Count: 7
Permanent HW addr: 00:14:22:b0:c9:fa
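
For anyone who wants to check this status from a script, here is a rough sketch that splits the /proc/net/bonding/bond0 text into bond-level and per-slave fields. It only assumes the "Key: value" layout shown above; parse_bonding_status is a name invented for this example.

```python
# Status text copied from the /proc/net/bonding/bond0 output above.
BOND_STATUS = """\
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 7
Permanent HW addr: 00:14:22:b0:c9:f9

Slave Interface: eth1
MII Status: up
Link Failure Count: 7
Permanent HW addr: 00:14:22:b0:c9:fa
"""


def parse_bonding_status(text):
    """Split the status text into (bond_fields, list_of_slave_fields)."""
    bond = {}
    slaves = []
    current = bond  # fields go here until the first "Slave Interface" line
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


if __name__ == "__main__":
    bond, slaves = parse_bonding_status(BOND_STATUS)
    print("active slave:", bond["Currently Active Slave"])
    for slave in slaves:
        print(slave["name"], slave["MII Status"], slave["Link Failure Count"])
```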


[EMAIL PROTECTED]:~# /sbin/ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:14:22:B0:C9:F9
  inet6 addr: fe80::214:22ff:feb0:c9f9/64 Scope:Link
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
  RX packets:612084 errors:0 dropped:0 overruns:0 frame:0
  TX packets:720804 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:65497223 (62.4 Mb)  TX bytes:193356963 (184.3 Mb)
  Base address:0xecc0 Memory:fe9e-fea0

[EMAIL PROTECTED]:~# /sbin/ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:14:22:B0:C9:F9
  inet6 addr: fe80::214:22ff:feb0:c9f9/64 Scope:Link
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
  RX packets:85134 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1161 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:10730231 (10.2 Mb)  TX bytes:93436 (91.2 Kb)
  Base address:0xdcc0 Memory:fe5e-fe60

I'm using 802.1Q VLANs over the bonding interface:

# cat /proc/net/vlan/config
VLAN Dev name| VLAN ID
Name-Type: VLAN_NAME_TYPE_PLUS_VID_NO_PAD
vlan1  | 1  | bond0
vlan2  | 2  | bond0
vlan3  | 3  | bond0
vlan4  | 4  | bond0
vlan5  | 5  | bond0
vlan6  | 6  | bond0
vlan7  | 7  | bond0
vlan18 | 18  | bond0
vlan19 | 19  | bond0
vlan33 | 33  | bond0
vlan34 | 34  | bond0
vlan37 | 37  | bond0
vlan66 | 66  | bond0


Any ideas?

Best regards,

Krzysztof Olędzki

Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...

2006-03-30 Thread Krzysztof Oledzki



On Wed, 29 Mar 2006, Brandeburg, Jesse wrote:


Hi all, I've identified you as people who have at some point in the past
emailed one of the Linux lists with problems with e1000 and
sk_forward_alloc.  It seems to be fairly widespread, but only seems to
have appeared with recent kernel changes (after 2.6.12...)

What I need from you is a reproducible test, and some information.  I
have never been able to reproduce this, and I'm trying to isolate the
problem a bit.  What motherboards are you using?

RIOWORKS/PDRCA.

# lspci
00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
00:04.0 I2O: Adaptec (formerly DPT) SmartRAID V Controller (rev 01)
00:04.1 PCI bridge: Adaptec (formerly DPT) PCI Bridge (rev 01)
00:06.0 Ethernet controller: D-Link System Inc DL2000-based Gigabit Ethernet 
(rev 0c)
00:08.0 SCSI storage controller: Initio Corporation INI-A100U2W (rev 01)
00:0e.0 IDE interface: ServerWorks CSB6 IDE Controller (rev a0)
00:0f.0 Host bridge: ServerWorks CSB6 South Bridge (rev a0)
00:0f.1 IDE interface: ServerWorks CSB6 RAID/IDE Controller (rev a0)
00:0f.2 USB Controller: ServerWorks CSB6 OHCI USB Controller (rev 05)
00:0f.3 ISA bridge: ServerWorks GCLE-2 Host Bridge


What seems to cause this problem?

I don't know, as this problem occurs only occasionally.


Are you all using iptables?

Yes, this is a www proxy server with -j REDIRECT.


 Are you all routing?

Sort of, as this is a transparent www proxy.

From the reports I assume none of you are using an 82571/2/3 (pci 
express)



00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 02)
Subsystem: Rioworks: Unknown device 3011
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
Memory at d000 (32-bit, non-prefetchable) [size=128K]
I/O ports at 2c00 [size=64]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device.
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

Thank you.

Best regards,

Krzysztof Olędzki

Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...

2006-03-30 Thread Krzysztof Oledzki



On Thu, 30 Mar 2006, Mark Nipper wrote:


On 29 Mar 2006, Brandeburg, Jesse wrote:

What I need from you is a reproducible test, and some information.  I
have never been able to reproduce this, and I'm trying to isolate the
problem a bit.  What motherboards are you using?  What seems to cause
this problem?  Are you all using iptables?  Are you all routing? From
the reports I assume none of you are using an 82571/2/3 (pci express)


   Unfortunately, my problem machine is a remote, leased
server, so I'd have to ask my provider for information on the
motherboard.


You can probably check this with the dmidecode tool.

Best regards,

Krzysztof Olędzki

Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...

2006-03-30 Thread Krzysztof Oledzki



On Thu, 30 Mar 2006, Phil Oester wrote:


On 29 Mar 2006, Brandeburg, Jesse wrote:
What I need from you is a reproducible test, and some information.  I


From all the reports which have come in thus far, it seems everyone
has > 1 e1000.  One person even reported that removing one of the two
nics solved the problem for him.  Does this help narrow down the search
area?


I have only one. Anyway, this message happens _very_ occasionally in my
case.


Best regards,

Krzysztof Olędzki

Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-03-10 Thread Krzysztof Oledzki



On Fri, 10 Mar 2006, David S. Miller wrote:


From: Ian McDonald [EMAIL PROTECTED]
Date: Fri, 10 Feb 2006 08:37:48 +1300


On 2/10/06, Boris B. Zhmurov [EMAIL PROTECTED] wrote:

Hello, Ian McDonald.

On 09.02.2006 22:25 you said the following:


Is it possible for you to download 2.6.16-rc2 or similar and see if it
goes away?


It would be better if I could get only the patch that fixes that problem, not all of 2.6.16-rc2.


Oops I didn't read Jesse's message earlier properly.

That patch which probably fixed it is (from his message):
I think the commit id that is missing from 2.6.14.X is
fb5f5e6e0cebd574be737334671d1aa8f170d5f3


This patch is in the linux-2.6.14 stable tree, I just
verified this.


So it must be another problem: I had this message with 2.6.15.2:

KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)

Best regards,

Krzysztof Olędzki

Re: Fw: [Bugme-new] [Bug 5946] New: KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)

2006-01-24 Thread Krzysztof Oledzki



On Tue, 24 Jan 2006, Andrew Morton wrote:




Begin forwarded message:

Date: Tue, 24 Jan 2006 00:11:51 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 5946] New: KERNEL: assertion (!sk->sk_forward_alloc) 
failed at net/core/stream.c (279)


http://bugzilla.kernel.org/show_bug.cgi?id=5946

  Summary: KERNEL: assertion (!sk->sk_forward_alloc) failed at
   net/core/stream.c (279)
   KERNEL: assertion (!sk->sk_forward_alloc) failed at
   net/ipv4/af_inet.c (148)
   Kernel Version: 2.6.15.1
   Status: NEW
 Severity: normal
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: 2.6.13
Distribution: Gentoo
Hardware Environment: P4 3.2GHz 2x e1000 driver 4GB RAM 8 SCSI DISCS
Software Environment: Squid
Problem Description: dmesg shows :

KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)


Just found the same message on my logs, so me too.

/var/log/old/syslog.19:Jan  4 16:20:57 bizon kernel: KERNEL: assertion 
(!sk->sk_forward_alloc) failed at net/core/stream.c (279)
/var/log/old/syslog.19:Jan  4 16:20:58 bizon kernel: KERNEL: assertion 
(!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)

This happened only once, with the 2.6.14.2 kernel.

It is a dual Xeon server with HT (4 logical CPUs in total) running Slackware 
(NPTL) with apache, mysql, squid, sendmail, amavis, clamav, spamassassin, 
pop3/imap (courier).


# zcat /proc/config.gz |grep PRE
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_DEBUG_PREEMPT=y

Best regards,

Krzysztof Olędzki

Re: [Ipsec-tools-devel] Re: [PATCH]: Re: SA switchover

2005-12-19 Thread Krzysztof Oledzki



On Sun, 18 Dec 2005, David S. Miller wrote:


From: David S. Miller [EMAIL PROTECTED]
Date: Sun, 18 Dec 2005 13:20:19 -0800 (PST)


From: Krzysztof Oledzki [EMAIL PROTECTED]
Date: Sun, 18 Dec 2005 17:49:50 +0100 (CET)


At 17:31:26 kernel executed the one from xfrm_state_add() (Ole #2) but it
didn't help. :(


Thanks for testing, I'll try to figure out what might be going
on.


Ok, xfrm_flush_bundles() isn't pruning the bundles because they
still look valid.

We fix this by adding a xfrm_flush_all_bundles() that doesn't
do the validity check and simply flushes everything.

Please give this new version of the patch a try, thanks.


OK. With this patch kernel switches to new SA immediately, but only for 
ping. TCP (ssh) session between Cisco and Linux is still protected by the 
old SA.


Tested by running two tests simultaneously:
 - while true ; do echo -ne . ; sleep 1; done over ssh
 - ping
Both protected by the same ipsec policy.

ssh:
10:21:58.376530 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x00648e34,seq=0x17c)
10:21:58.376856 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x17c)

ping:
10:21:58.943229 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x17d)
10:21:58.947768 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x00648e34,seq=0x17d)

ssh:
10:21:59.396334 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x00648e34,seq=0x17e)
10:21:59.396664 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x17e)

ping:
10:21:59.944079 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x17f)
10:21:59.971934 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x00648e34,seq=0x17f)

* New SA was negotiated:
Dec 19 10:22:00 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.24[0]->192.168.0.7[0] spi=228316027(0xd9bd37b)
Dec 19 10:22:00 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.7[0]->192.168.0.24[0] spi=3587656557(0xd5d74b6d)

* Cisco switched to the new SA immediately, Linux switched only partially:
ssh:
10:22:00.416215 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x1)
10:22:00.416607 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x180)

ping:
10:22:00.944950 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0xd5d74b6d,seq=0x1)
10:22:00.949622 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x2)

ssh:
10:22:01.436183 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x3)
10:22:01.436523 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x1acf2fac,seq=0x181)

ping:
10:22:01.945777 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0xd5d74b6d,seq=0x2)
10:22:01.950323 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x4)

(...)

* Executed ip route flush cache:
ssh:
10:22:16.743559 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x21)
10:22:16.744028 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0xd5d74b6d,seq=0x11)

ping:
10:22:16.959512 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0xd5d74b6d,seq=0x12)
10:22:16.964147 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0d9bd37b,seq=0x22)


Best regards,


Krzysztof Olędzki

Re: [PATCH]: Re: SA switchover

2005-12-19 Thread Krzysztof Oledzki



On Mon, 19 Dec 2005, jamal wrote:


On Mon, 2005-19-12 at 13:57 -0800, David S. Miller wrote:

From: jamal [EMAIL PROTECTED]
Date: Mon, 19 Dec 2005 08:17:19 -0500


Just an addendum: If this works it should be sysctl controlled i hope.


There is absolutely no reason for that, so no :)



Well, we went from "use old SA" to "use new SA" policy ;->


No, we went from "use both new and old SA" to "always use the same (new) 
SA". Adding a sysctl to keep the kernel buggy is totally wrong.



best regards,

Krzysztof Olędzki

Re: [Ipsec-tools-devel] Re: [PATCH]: Re: SA switchover

2005-12-19 Thread Krzysztof Oledzki



On Mon, 19 Dec 2005, David S. Miller wrote:


From: Krzysztof Oledzki [EMAIL PROTECTED]
Date: Mon, 19 Dec 2005 10:37:14 +0100 (CET)


OK. With this patch kernel switches to new SA immediately, but only for
ping. TCP (ssh) session between Cisco and Linux is still protected by the
old SA.


Ok, we're making progress :-)

When the bundles get flushed, xfrm_prune_bundles() accumulates all
the per-policy bundles into a list and runs dst_free() on each
and every one.

Unless marked obsolete already (these dst's should not be marked
obsolete), it invokes __dst_free() which marks the dst as obsolete
and this in turn should trigger the cached socket route check here
in __sk_dst_check().

static inline struct dst_entry *
__sk_dst_check(struct sock *sk, u32 cookie)
{
	struct dst_entry *dst = sk->sk_dst_cache;

	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
		sk->sk_dst_cache = NULL;
		dst_release(dst);
		return NULL;
	}

	return dst;
}

Oh, that's the bug, dst->ops->check() is xfrm_dst_check().  That tests
validity using stable_bundle() which thinks the dst is still
valid.  Please add these two lines:

	if (dst->obsolete)
		return NULL;

at the beginning of xfrm_dst_check() and all should be fine.
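
The effect of those two lines can be illustrated with a small toy model. This is only a sketch of the control flow described above, not kernel code: the Dst class, the bundle_ok flag (standing in for stable_bundle()) and the two check functions are invented for the illustration.

```python
class Dst:
    """Toy stand-in for a cached dst entry."""

    def __init__(self):
        self.obsolete = False   # set when the bundle has been flushed
        self.bundle_ok = True   # stands in for stable_bundle() saying "valid"


def check_before_fix(dst):
    # Mirrors a check that only asks whether the bundle still looks valid.
    return dst if dst.bundle_ok else None


def check_after_fix(dst):
    # Same check with the proposed early return for obsolete entries.
    if dst.obsolete:
        return None
    return dst if dst.bundle_ok else None


def sk_dst_check(sock, check):
    """Toy version of the socket route check quoted above."""
    dst = sock.get("dst_cache")
    if dst is not None and dst.obsolete and check(dst) is None:
        sock["dst_cache"] = None
        return None
    return dst


if __name__ == "__main__":
    for name, check in (("before", check_before_fix), ("after", check_after_fix)):
        sock = {"dst_cache": Dst()}
        sock["dst_cache"].obsolete = True  # the flush marked it obsolete
        kept = sk_dst_check(sock, check) is not None
        print(f"{name} fix: socket keeps the old dst = {kept}")
```

In this model the socket keeps its flushed entry as long as the check ignores the obsolete flag, which matches the behaviour seen with the ssh session.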


Yes, it works now perfectly:

06:19:09.363154 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x03456676,seq=0x145)
06:19:09.363548 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4fd702b2,seq=0x166)
06:19:09.736632 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4fd702b2,seq=0x167)
06:19:09.741256 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x03456676,seq=0x146)

Dec 20 06:19:10 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.24[0]->192.168.0.7[0] spi=72688259(0x4552283)
Dec 20 06:19:10 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.7[0]->192.168.0.24[0] spi=671780776(0x280a8fa8)

06:19:10.382903 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x04552283,seq=0x1)
06:19:10.383364 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x280a8fa8,seq=0x1)
06:19:10.737511 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x280a8fa8,seq=0x2)
06:19:10.742083 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x04552283,seq=0x2)


Dziekuje bardzo for all of your testing so far Krzysztof.


Dziekuje bardzo ;)

Best regards,

Krzysztof Olędzki

Re: [Ipsec-tools-devel] Re: [PATCH]: Re: SA switchover

2005-12-18 Thread Krzysztof Oledzki



On Thu, 15 Dec 2005, David S. Miller wrote:


From: David S. Miller [EMAIL PROTECTED]
Date: Thu, 15 Dec 2005 17:52:54 -0800 (PST)


diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 7cf48aa..25dd8f4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c


Sorry, that patch was incomplete, please try this one instead:


It does not work. :(

192.168.0.7 - Linux
192.168.0.24 - Cisco

Tested it by running ping directly from Linux IPSec gateway:

17:31:22.830181 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x57)
17:31:22.834761 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0a91a2ae,seq=0x57)
17:31:23.830997 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x58)
17:31:23.835811 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0a91a2ae,seq=0x58)
17:31:24.831855 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x59)
17:31:24.836430 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0a91a2ae,seq=0x59)
17:31:25.832692 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5a)
17:31:25.837190 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0a91a2ae,seq=0x5a)

New IPsec-SA was negotiated:
Dec 18 17:31:26 chochlik racoon: INFO: respond new phase 2 negotiation: 
192.168.0.7[0]<=>192.168.0.24[0]
Dec 18 17:31:26 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.24[0]->192.168.0.7[0] spi=132988380(0x7ed3ddc)
Dec 18 17:31:26 chochlik racoon: INFO: IPsec-SA established: ESP/Tunnel 
192.168.0.7[0]->192.168.0.24[0] spi=1929290090(0x72fea16a)

Cisco switched to the new SA immediately:
17:31:26.833579 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5b)
17:31:26.838184 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x1)
17:31:27.834389 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5c)
17:31:27.839044 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x2)
17:31:28.835245 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5d)
17:31:28.839843 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x3)
17:31:29.836088 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5e)
17:31:29.840708 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x4)

Executed ip route flush cache, linux switched to the new SA:
17:31:30.837009 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x72fea16a,seq=0x1)
17:31:30.841616 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x5)
17:31:31.837779 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x72fea16a,seq=0x2)
17:31:31.842349 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x6)
17:31:32.838647 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x72fea16a,seq=0x3)
17:31:32.843224 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x7)
17:31:33.839475 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x72fea16a,seq=0x4)
17:31:33.985697 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x8)
(...)
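
Reading the switchover moment out of such a trace by eye is tedious, so here is a rough sketch that reports when each direction starts using a different SPI. It assumes tcpdump ESP lines of the form shown above (the pattern also accepts lines where the ">" separator is missing); spi_switches is a name made up for this example.

```python
import re

# A few lines taken from the trace above (192.168.0.7 is Linux, .24 is Cisco).
TRACE = """\
17:31:25.832692 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5a)
17:31:25.837190 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x0a91a2ae,seq=0x5a)
17:31:26.833579 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5b)
17:31:26.838184 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x1)
17:31:29.836088 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x4ca5896a,seq=0x5e)
17:31:29.840708 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x4)
17:31:30.837009 IP 192.168.0.7 > 192.168.0.24: ESP(spi=0x72fea16a,seq=0x1)
17:31:30.841616 IP 192.168.0.24 > 192.168.0.7: ESP(spi=0x07ed3ddc,seq=0x5)
"""

ESP_RE = re.compile(
    r"^(\d\d:\d\d:\d\d)\.\d+ IP (\S+)\s+(?:>\s+)?(\S+): "
    r"ESP\(spi=(0x[0-9a-f]+),seq=0x[0-9a-f]+\)"
)


def spi_switches(text):
    """Return (time, src, dst, old_spi, new_spi) for every SPI change
    seen in one direction."""
    last_spi = {}
    events = []
    for line in text.splitlines():
        match = ESP_RE.match(line.strip())
        if not match:
            continue
        ts, src, dst, spi = match.groups()
        key = (src, dst)
        if key in last_spi and last_spi[key] != spi:
            events.append((ts, src, dst, last_spi[key], spi))
        last_spi[key] = spi
    return events


if __name__ == "__main__":
    for ts, src, dst, old, new in spi_switches(TRACE):
        print(f"{ts} {src} -> {dst}: {old} replaced by {new}")
```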

I also added two printks to check if schedule_work is executed:


diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 7cf48aa..f255e97 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -431,6 +431,9 @@ void xfrm_state_insert(struct xfrm_state
 	spin_lock_bh(&xfrm_state_lock);
 	__xfrm_state_insert(x);
 	spin_unlock_bh(&xfrm_state_lock);
+

+ printk("Ole #1\n");

+	xfrm_state_gc_flush_bundles = 1;
+	schedule_work(&xfrm_state_gc_work);
 }
 EXPORT_SYMBOL(xfrm_state_insert);

@@ -478,6 +481,11 @@ out:
 	spin_unlock_bh(&xfrm_state_lock);
 	xfrm_state_put_afinfo(afinfo);

+	if (err == 0) {

+ printk("Ole #2\n");

+		xfrm_state_gc_flush_bundles = 1;
+		schedule_work(&xfrm_state_gc_work);
+	}
+
 	if (x1) {
 		xfrm_state_delete(x1);
 		xfrm_state_put(x1);



At 17:31:26 kernel executed the one from xfrm_state_add() (Ole #2) but it 
didn't help. :(


Sorry, it took me so long but now I have everything ready so I can make 
more tests.


Best regards,

Krzysztof Olędzki

Re: [PATCH]: Re: SA switchover

2005-12-16 Thread Krzysztof Oledzki



On Thu, 15 Dec 2005, David S. Miller wrote:


From: David S. Miller [EMAIL PROTECTED]
Date: Thu, 15 Dec 2005 17:52:54 -0800 (PST)


diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 7cf48aa..25dd8f4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c


Sorry, that patch was incomplete, please try this one instead:

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 7cf48aa..f255e97 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c

CUT

Thank you! I will test ASAP. I need a day or two to reassemble my IPsec 
netlab. ;)


Best regards,

Krzysztof Olędzki

Re: [Ipsec-tools-devel] Re: SA switchover

2005-12-15 Thread Krzysztof Oledzki



On Thu, 15 Dec 2005, jamal wrote:

Agree. It is a _workaround_;- A good one in my opinion. Given that it
works for CISCOs, a very large piece of the problem is resolved, no?

No, again, it does not help. I explained it in my previous mail.


It will help 100% of the time _if you know_ you have CISCOs on the other
end and you configure racoon with that in mind. In other words it doesn't
matter who the initiator/responder is in this case.

It does matter. This problem does not exist when Cisco acts as responder, 
and it does not exist when Linksys acts as initiator.



Do you disagree with this?

Yes ;)


Other people who have tried the patch dont seem to agree with your
thesis.


Not sure:

-- Forwarded message BEGIN --
Date: Mon, 21 Nov 2005 15:31:54 +0100
From: [EMAIL PROTECTED]
To: jamal [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [Ipsec-tools-devel] Forcing SA soft limit?


I could give you a patch that forces the soft limit. I have not tested
it and have not seen interest from the racoon folks to incorporate it.
Talk to me privately.


Unfortunately setting the soft limit does not solve my problem. I
tried recompiling from the sources setting my wanted soft limit.

The problem turned out to be the peer (a cheapo DrayTek Vigor
2500 router) which discard the old SA before the hard limit
expires and without agreeing the revoke with Racoon.
-- Forwarded message END --

Very simple and accurate explanation.


You are the one who opened the bug - have you tried the patch? ;-

It adds very useful feature but does not solve my problem.


Well, as strange as this may sound, actually it may not be that
unreasonable to dynamically make the policy decision;- we know which
devices have problems.

Really? I don't think so.



OK, let me ask you this: When you configure use new SA - are you
making assumption about what is on the other end? in other words, you
have knowledge of the end device to assume it will start accepting the
new SA immediately.

Sometimes yes, sometimes no. Generally: no.

Is there no way in racoon that someone will get the vendor id in an 
external C program or shell script and then they reconfigure IKE 
parameters for that peer only? If yes, then one could create a little 
script or C program that sets the soft time for ciscos.

But what if the same problem exists in other IPsec implementations that 
cannot be detected by vendor ID?


then you will need to use the admins brain as a last resort i.e.
no different than you making the assumption that the other end is
respecting use new SA.

In any case - what we need to do is fix this issue and not argue
semantics of the RFC. IMO, its a screw up in the RFC definition.


True. I can accept any fix, as long as it is going to _solve_ the problem. 
For me both kernel and racoon fixes are totally fine. But please note 
that dirty workarounds are not going to fix this, and I already have one 
(echo -ne -1 > /proc/sys/net/ipv4/route/flush after negotiating each new 
SA). It works and it is ugly. Very ugly. ;)


Best regards,

Krzysztof Olędzki

Re: [Ipsec-tools-devel] Re: SA switchover

2005-12-15 Thread Krzysztof Oledzki



On Thu, 15 Dec 2005, David S. Miller wrote:


1) I don't understand how a routing cache flush fixes the problem.
  The routing cache flush only marks non-IPSEC cached routes as
  invalid, not IPSEC ones.


A new IPsec SA is used for communication between a new (previously 
unseen) src/dst pair even if the old SA exists. Only communication for a 
src/dst pair that was previously active is stuck with the old SA.


I was also surprised that routing cache flush helps but it really works 
and I have used this workaround for more than three months.


It looks like XFRM caches that information so that the kernel does not need 
to search the whole SADB for each packet, and this is the reason why usage 
of the old SA is observed. This is only my theory; someone who wrote XFRM 
probably knows this for sure.


Best regards,

Krzysztof Olędzki

Re: [PATCH 1/1] [NETFILTER] ip_conntrack: fix ftp/irc/tftp helpers on ports >= 32768

2005-11-18 Thread Krzysztof Oledzki



On Thu, 17 Nov 2005, David S. Miller wrote:


From: Harald Welte [EMAIL PROTECTED]
Date: Tue, 15 Nov 2005 11:03:51 +0100


[NETFILTER] ip_conntrack: fix ftp/irc/tftp helpers on ports >= 32768

Since we've converted the ftp/irc/tftp helpers to use the new
module_parm_array() some time ago, we were accidentally using signed data
types - thus preventing those modules from being used on ports >= 32768.

This patch fixes it by using 'ushort' module parameters.

Thanks to Jan Nijs for reporting this bug.

Signed-off-by: Harald Welte [EMAIL PROTECTED]
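
The 32768 boundary follows from 16-bit two's complement arithmetic. A small illustration using ctypes from the Python standard library (the helper names are made up, and c_short is assumed to be 16 bits wide, as on common platforms):

```python
import ctypes


def as_signed_short(port):
    """Value seen when the port number is stored in a signed 16-bit type."""
    return ctypes.c_short(port).value


def as_unsigned_short(port):
    """Value seen when the port number is stored in an unsigned 16-bit type."""
    return ctypes.c_ushort(port).value


if __name__ == "__main__":
    for port in (21, 32767, 32768, 65535):
        print(port, as_signed_short(port), as_unsigned_short(port))
```

Any port from 32768 upwards wraps to a negative number in the signed type, while the unsigned type covers the whole 0..65535 port range.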


Applied, thanks.

I think this is definitely a 2.6.14-stable candidate?


What about patch that fixes vlan with bonding?

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3babcd69ec0fde874838e276eb0b211c6a5647

I think we should fix this in 2.6.14-stable also.

Best regards,

Krzysztof Olędzki

Re: [PATCH 1/1] [NETFILTER] ip_conntrack: fix ftp/irc/tftp helpers on ports >= 32768

2005-11-18 Thread Krzysztof Oledzki



On Fri, 18 Nov 2005, David S. Miller wrote:


From: Krzysztof Oledzki [EMAIL PROTECTED]
Date: Fri, 18 Nov 2005 09:43:27 +0100 (CET)


What about patch that fixes vlan with bonding?

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8e3babcd69ec0fde874838e276eb0b211c6a5647

I think we should fix this in 2.6.14-stable also.


That's network device stuff, ask Jeff Garzik and the patch
submitter,

I already did that (15 Nov 2005), but it seems I was ignored.


not me.

OK, please excuse me.

Best regards,

Krzysztof Olędzki

Re: [PATCH 2.6.14] bonding: fix feature consolidation

2005-11-15 Thread Krzysztof Oledzki



On Fri, 4 Nov 2005, Jay Vosburgh wrote:



This should resolve http://bugzilla.kernel.org/show_bug.cgi?id=5519

The current feature computation loses bits that it doesn't know
about, resulting in an inability to add VLANs and possibly other havoc.
Rewrote function to preserve bits it doesn't know about, remove an
unneeded state variable, and simplify the code.

-J
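
To illustrate the general idea of "preserving bits it doesn't know about", here is a small sketch. It is not the bonding code: the flag values, INTERSECT_MASK and both function names are hypothetical, and the real NETIF_F_* constants and the actual computation in the driver differ.

```python
# Hypothetical bit values, for illustration only.
F_SG = 0x01
F_IP_CSUM = 0x02
F_HW_VLAN = 0x80  # stands in for a bit the intersect logic does not handle
INTERSECT_MASK = F_SG | F_IP_CSUM


def lossy_features(bond_features, slave_features):
    """Intersect slave features and drop every bit outside the mask."""
    result = INTERSECT_MASK
    for features in slave_features:
        result &= features
    return result


def preserving_features(bond_features, slave_features):
    """Intersect only the masked bits and keep all other bond bits."""
    common = INTERSECT_MASK
    for features in slave_features:
        common &= features
    return (bond_features & ~INTERSECT_MASK) | common


if __name__ == "__main__":
    bond = F_SG | F_IP_CSUM | F_HW_VLAN
    slaves = [F_SG | F_IP_CSUM, F_SG]
    print(hex(lossy_features(bond, slaves)))
    print(hex(preserving_features(bond, slaves)))
```

In the lossy variant the VLAN-related bit disappears from the result even though no slave asked for it to be cleared, which is the kind of loss the description above refers to.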


Could we have this fix in next -stable for 2.6.14, please?

Best regards,


Krzysztof Olędzki

Re: [PATCH 2.6.14 0/18] Yet Another Bonding Sysfs patchset

2005-11-11 Thread Krzysztof Oledzki



On Wed, 9 Nov 2005, Mitch Williams wrote:


Jay says he's finally ready to take this patch.  So here we go again.

Rebased against 2.6.14 final.  Which turned out to be way more work than I
expected.


Is this patchset going to be included in 2.6.15?

Best regards,

Krzysztof Olędzki

Re: Fw: [Bugme-new] [Bug 5194] New: IPSec related OOps in 2.6.13

2005-09-06 Thread Krzysztof Oledzki



On Tue, 6 Sep 2005, Herbert Xu wrote:


On Tue, Sep 06, 2005 at 04:08:56AM -0700, Andrew Morton wrote:


Problem Description:

Oops:  [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c01f562c>]    Not tainted VLI
EFLAGS: 00010216   (2.6.13)
EIP is at sha1_update+0x7c/0x160


Thanks for the report.  Matt LaPlante had exactly the same problem
a couple of days ago.  I've tracked down now to my broken crypto
cipher wrapper functions which will step over a page boundary if
it's not aligned correctly.


[CRYPTO] Fix boundary check in standard multi-block cipher processors


Thanks. I patched my kernel, recompiled, and am waiting. So far it is OK.

Should this patch be merged into 2.6.13.1?

Best regards,

Krzysztof Olędzki


Re: Fw: Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-08-02 Thread Krzysztof Oledzki



On Tue, 2 Aug 2005, Patrick McHardy wrote:


Krzysztof Oledzki wrote:



On Mon, 1 Aug 2005, Herbert Xu wrote:


On Mon, Aug 01, 2005 at 05:46:26AM +0200, Krzysztof Oledzki wrote:



Any new patches to test? ;)



As I said in an earlier message, you should patch racoon to delete
the old *outbound* SA when the new SA has been negotiated.



Did not receive this one, sorry :(. However, the same question was asked
of the racoon developers and the answer was that it is the kernel's job.
They even pointed out that the KAME IPsec stack can be tuned to prefer (or
not to prefer) the old SA.


The kernel's job is to use a valid SA.


Again... RFC 2408 says: "A protocol implementation SHOULD begin using the 
newly created SA for outbound traffic and SHOULD continue to support 
incoming traffic on the old SA until it is deleted or until traffic is 
received under the protection of the newly created SA." (Section 4.3)
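
The outbound half of that rule can be sketched as a selection policy: among the SAs that are still valid, use the most recently created one. This is a toy model only; the SA class, its fields and pick_outbound are invented for the illustration and do not correspond to kernel or racoon structures.

```python
from dataclasses import dataclass


@dataclass
class SA:
    spi: int
    created: float        # time the SA was established
    hard_expires: float   # time after which the SA must not be used


def pick_outbound(sas, now):
    """Prefer the newest SA that has not reached its hard lifetime."""
    valid = [sa for sa in sas if sa.hard_expires > now]
    return max(valid, key=lambda sa: sa.created, default=None)


if __name__ == "__main__":
    old = SA(spi=0x1, created=0.0, hard_expires=100.0)
    new = SA(spi=0x2, created=50.0, hard_expires=150.0)
    chosen = pick_outbound([old, new], now=60.0)
    print(hex(chosen.spi))
```

Both SAs are valid at the moment of the choice; the policy only decides which of them outbound traffic should use, while inbound traffic on the old SA would still be accepted.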



In this case both are valid and the peer is buggy.


The problem is the word SHOULD and IMHO both Linux and the peer are buggy.

So I think the suggestion to work around this in the keying daemons is 
not unreasonable.


There is no need to work around this on *BSD (KAME stack) and the keying 
daemon is exactly the same for both Linux and *BSD.



Best regards,

Krzysztof Olędzki

Re: [Bugme-new] [Bug 4952] New: IPSec incompabilty. Linux kernel waits to long to start using new SA for outbound traffic.

2005-08-02 Thread Krzysztof Oledzki



On Tue, 2 Aug 2005, Herbert Xu wrote:


On Mon, Aug 01, 2005 at 10:41:33AM +0200, Krzysztof Oledzki wrote:


RFC 2408 says: "A protocol implementation SHOULD begin using the newly
created SA for outbound traffic and SHOULD continue to support incoming
traffic on the old SA until it is deleted or until traffic is received
under the protection of the newly created SA." (Section 4.3)

The problem is the word SHOULD and IMHO both Linux and peer are buggy.


The protocol implementation is made up of a kernel component as well as
a user-space component.  IMHO this should be done where it's easiest.


IMHO userland is not supposed to solve kernel issues.

Best regards,

Krzysztof Olędzki