s390 allmodconfig

2007-03-02 Thread Andrew Morton

Not sure who to blame for all of this...

net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refer to undefined symbol 'HID'
net/mac80211/Kconfig:17:warning: 'select' used by config symbol 'MAC80211_LEDS' refer to undefined symbol 'NEW_LEDS'
net/mac80211/Kconfig:18:warning: 'select' used by config symbol 'MAC80211_LEDS' refer to undefined symbol 'LEDS_TRIGGERS'
drivers/net/Kconfig:1435:warning: 'select' used by config symbol 'B44' refer to undefined symbol 'SSB'
drivers/net/wireless/bcm43xx/Kconfig:5:warning: 'select' used by config symbol 'BCM43XX' refer to undefined symbol 'HW_RANDOM'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:13:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCI' refer to undefined symbol 'SSB_PCIHOST'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:14:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCI' refer to undefined symbol 'SSB_DRIVER_PCICORE'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:27:warning: 'select' used by config symbol 'BCM43XX_MAC80211_PCMCIA' refer to undefined symbol 'SSB_PCMCIAHOST'
drivers/net/wireless/mac80211/bcm43xx/Kconfig:5:warning: 'select' used by config symbol 'BCM43XX_MAC80211' refer to undefined symbol 'SSB'
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to incomplete type 'struct led_trigger'
net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete type
net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 'led_trigger_register'
net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to incomplete type 'struct led_trigger'
net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete type
net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 'led_trigger_unregister'

akpm2:/usr/src/25> grep LED .config
CONFIG_NF_CONNTRACK_ENABLED=m
CONFIG_MAC80211_LEDS=y

Probably related to the Kconfig problems.


Re: Extensible hashing and RCU

2007-03-02 Thread Evgeniy Polyakov
On Sat, Feb 17, 2007 at 04:13:02PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
  I noticed in an LCA talk mention that apparently extensible hashing
  with RCU access is an unsolved problem.  Here's an idea for solving it.
  
  
  Yes, I have been playing around with the same idea for
  doing dynamic resizing of the TCP hashtable.
  
  Did a prototype toy implementation, and I have a
  half-finished patch which resizes the TCP hashtable
  at runtime. Hmmm, your mail may be the impetus to get
  me to finally finish this thing
 
 Why does nobody want to use a trie? For socket-like loads it has
 exactly constant search/insert/delete time and scales as hell.

Ok, I've run an analysis of linked list and trie traversals and found
that (at least on x86) one step of an optimized list traversal is about
4 (!) times faster than one bit lookup in a trie traversal (or actually
one lookup in a binary tree-like structure). That is because trie
traversal needs more instructions per lookup, and at least one
additional branch which cannot be predicted.

Tests with rdtsc show that one bit lookup in a trie (actually any
lookup in binary tree structures) is about 3-4 times slower than one
lookup in a linked list.

Since a hash table usually has up to 4 elements in each hash entry, a
competing binary tree/trie structure must get an entry in one lookup,
which is essentially impossible with the usual tree/trie implementations.

Things change dramatically when a linked list becomes too long, but that
should not happen with proper resizing of the hash table. A wildcard
implementation also introduces additional requirements, which cannot be
easily solved in hash tables.

So I take back my words about using a tree/trie implementation instead
of a hash table for socket lookup.

Interested readers can find more details on the tests, asm outputs and
conclusions at:
http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01
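
A minimal sketch of the two lookup shapes being compared, assuming a
chained hash bucket and a binary trie that consumes one key bit per
level. The type and function names (chain_node, trie_node, chain_lookup,
trie_lookup) are assumptions made for illustration; they are not taken
from the tests linked above or from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry in one hash bucket's chain. */
struct chain_node {
    struct chain_node *next;
    uint32_t key;
};

/* Hypothetical binary trie node: one key bit is consumed per level. */
struct trie_node {
    struct trie_node *child[2];
    int is_leaf;
    uint32_t key;
};

/* One step per chain element: load the key, compare, follow next. */
static struct chain_node *chain_lookup(struct chain_node *head, uint32_t key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head;
    return NULL;
}

/* One step per key bit: the child to follow depends on the key itself. */
static struct trie_node *trie_lookup(struct trie_node *root, uint32_t key,
                                     unsigned int bits)
{
    struct trie_node *n = root;
    unsigned int i;

    for (i = 0; n != NULL && !n->is_leaf && i < bits; i++)
        n = n->child[(key >> i) & 1];
    return (n != NULL && n->is_leaf && n->key == key) ? n : NULL;
}
```

With a chain of at most 4 entries the list walk ends after at most 4
steps, while a 32-bit key can need up to 32 trie steps, which is the
imbalance the measurements above are about.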

-- 
Evgeniy Polyakov


Re: need some help on a backport of r8169

2007-03-02 Thread Pascal GREGIS
[EMAIL PROTECTED] wrote, on Thu 01 Mar 2007 at 10:57:11AM:
 Hello Ueimor,
 [...] 
  Once you have logged the ifconfig/ethtool dump, you can try the series
  or the patch at:
  
  http://www.fr.zoreil.com/people/francois/backport/r8169/20070228-00
 Hum... ok I might have enough time to check it, not sure though, I
 have a meeting with my boss this morning.
Indeed, I wasn't able to test it yesterday, and I won't be able to today
either, since the hardware is needed for other tests. But don't worry, I
haven't forgotten you; I'll test it as soon as I can, probably next week.

 
  
  Btw:
  
  [...dmesg dump...]
   Enabling fast FPU save and restore... done.
   Enabling unmasked SIMD FPU exception support... done.
   Checking 'hlt' instruction... OK.
   ACPI: setting ELCR to 0200 (from 0c08)
   NET: Registered protocol family 16
   PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
   PCI: Using MMCONFIG
  
  Please disable MMCONFIG.
 In the BIOS?
 
  
  If you have any PCI latency option in your bios, set it to 64.
 I'm not the BIOS-master, I'll suggest it.
 
  
  -- 
  Ueimor
 
 Sigerg


Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Pekka Savola

On Thu, 1 Mar 2007, Stephen Hemminger wrote:

What about implementing the unused arp_announce flag on the inetdevice?
Something like the following.  Totally untested...

Looks like it either was there (and got removed) or was planned but
never implemented.


If something like this goes in, it wouldn't hurt to do similar with 
IPv6 (RFC2461 section 7.2.6).


There are very popular hardware-based routers which refresh their NDP 
caches only every 24 hours or 20 minutes (depending on the software 
version).  Sending unsolicited NAs would eliminate traffic 
blackholing.



diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e10794d..cefc339 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1089,6 +1089,16 @@ static int inetdev_event(struct notifier
}
}
ip_mc_up(in_dev);
+   /* fallthru */
+
+   case NETDEV_CHANGEADDR:
+   /* Send gratuitous ARP in case of address change or new device */
+   if (IN_DEV_ARP_ANNOUNCE(in_dev))
+       arp_send(ARPOP_REQUEST, ETH_P_ARP,
+                in_dev->ifa_list->ifa_address, dev,
+                in_dev->ifa_list->ifa_address, NULL,
+                dev->dev_addr, NULL);
+
break;
case NETDEV_DOWN:
ip_mc_down(in_dev);
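
What makes the request in the patch above "gratuitous" is that the same
address is passed as both source and destination IP. A user-space sketch
of the resulting ARP payload, assuming IPv4 over Ethernet; the struct and
helper names (arp_ipv4, put_be16, build_gratuitous_arp) are assumptions
for illustration and not kernel API:

```c
#include <stdint.h>
#include <string.h>

#define ARP_HTYPE_ETHER 1
#define ARP_PTYPE_IPV4  0x0800
#define ARP_OP_REQUEST  1

/* ARP payload for IPv4 over Ethernet, all fields in network byte order. */
struct arp_ipv4 {
    uint8_t htype[2];
    uint8_t ptype[2];
    uint8_t hlen;
    uint8_t plen;
    uint8_t oper[2];
    uint8_t sha[6];   /* sender hardware address */
    uint8_t spa[4];   /* sender protocol (IP) address */
    uint8_t tha[6];   /* target hardware address */
    uint8_t tpa[4];   /* target protocol (IP) address */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Fill pkt with a gratuitous ARP request announcing ip at mac. */
static void build_gratuitous_arp(struct arp_ipv4 *pkt,
                                 const uint8_t mac[6], const uint8_t ip[4])
{
    memset(pkt, 0, sizeof(*pkt));
    put_be16(pkt->htype, ARP_HTYPE_ETHER);
    put_be16(pkt->ptype, ARP_PTYPE_IPV4);
    pkt->hlen = 6;
    pkt->plen = 4;
    put_be16(pkt->oper, ARP_OP_REQUEST);
    memcpy(pkt->sha, mac, 6);
    memcpy(pkt->spa, ip, 4);
    /* tha is left zeroed, mirroring the NULL target_hw in the patch. */
    memcpy(pkt->tpa, ip, 4);  /* target IP == sender IP: gratuitous */
}
```

Neighbours that already cache the address can refresh their entry from
such a frame; the unsolicited Neighbor Advertisement mentioned above
plays the same role for IPv6.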




--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


Re: Extensible hashing and RCU

2007-03-02 Thread Eric Dumazet
On Friday 02 March 2007 09:52, Evgeniy Polyakov wrote:

 Ok, I've run an analysis of linked list and trie traversals and found
 that (at least on x86) one step of an optimized list traversal is about
 4 (!) times faster than one bit lookup in a trie traversal (or actually
 one lookup in a binary tree-like structure). That is because trie
 traversal needs more instructions per lookup, and at least one
 additional branch which cannot be predicted.

 Tests with rdtsc show that one bit lookup in a trie (actually any
 lookup in binary tree structures) is about 3-4 times slower than one
 lookup in a linked list.

 Since a hash table usually has up to 4 elements in each hash entry, a
 competing binary tree/trie structure must get an entry in one lookup,
 which is essentially impossible with the usual tree/trie implementations.

 Things change dramatically when a linked list becomes too long, but that
 should not happen with proper resizing of the hash table. A wildcard
 implementation also introduces additional requirements, which cannot be
 easily solved in hash tables.

 So I take back my words about using a tree/trie implementation instead
 of a hash table for socket lookup.

 Interested readers can find more details on the tests, asm outputs and
 conclusions at:
 http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01

Thank you for this report. (It still avoids studying cache misses, although
they obviously are the limiting factor.)

Anyway, if the data is in cache and you want optimum performance from your CPU,
you may try an algorithm without conditional branches
(well, 4 in this case for the whole 32-bit test):

gcc -O2 -S -march=i686 test1.c

struct node {
    struct node *left;
    struct node *right;
    int value;
};
struct node *head;
int v1;

#define PASS2(bit) \
    n2 = n1->left; \
    right = n1->right; \
    if (value & (1<<bit)) \
        n2 = right; \
    n1 = n2->left; \
    right = n2->right; \
    if (value & (2<<bit)) \
        n1 = right;

main()
{
    int j;
    unsigned int value = v1;
    struct node *n1 = head, *n2, *right;
    for (j=0; j<4; ++j) {
        PASS2(0)
        PASS2(2)
        PASS2(4)
        PASS2(6)
        value >>= 8;
    }
    printf("result=%p\n", n1);
}

        .file   "test1.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "result=%p\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        xorl    %ebx, %ebx
        pushl   %ecx
        subl    $16, %esp
        movl    v1, %ecx
        movl    head, %edx
        .p2align 4,,7
.L2:
        movl    4(%edx), %eax
        testb   $1, %cl
        cmove   (%edx), %eax
        testb   $2, %cl
        movl    4(%eax), %edx
        cmove   (%eax), %edx
        testb   $4, %cl
        movl    4(%edx), %eax
        cmove   (%edx), %eax
        testb   $8, %cl
        movl    4(%eax), %edx
        cmove   (%eax), %edx
        testb   $16, %cl
        movl    4(%edx), %eax
        cmove   (%edx), %eax
        testb   $32, %cl
        movl    4(%eax), %edx
        cmove   (%eax), %edx
        testb   $64, %cl
        movl    4(%edx), %eax
        cmove   (%edx), %eax
        testb   %cl, %cl
        movl    4(%eax), %edx
        cmovns  (%eax), %edx
        addl    $1, %ebx
        cmpl    $4, %ebx
        je      .L19
        shrl    $8, %ecx
        jmp     .L2
        .p2align 4,,7
.L19:
        movl    %edx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        addl    $16, %esp
        popl    %ecx
        popl    %ebx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .comm   head,4,4
        .comm   v1,4,4
        .ident  "GCC: (GNU) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)"
        .section        .note.GNU-stack,"",@progbits
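
The same idea can be restated as a smaller sketch: load both children
unconditionally, then pick one with a conditional expression. A compiler
may turn that select into a conditional move, as in the listing above,
but it is not required to. The function name walk_bits and its nbits
parameter are assumptions made for this illustration.

```c
#include <stddef.h>

struct node {
    struct node *left;
    struct node *right;
    int value;
};

/*
 * Descend nbits levels, taking the right child when the current low
 * bit of value is set. Both children are loaded before the select, so
 * the choice does not have to be expressed as a jump.
 * The caller must guarantee the tree is at least nbits levels deep.
 */
static struct node *walk_bits(struct node *n, unsigned int value,
                              unsigned int nbits)
{
    unsigned int i;

    for (i = 0; i < nbits; i++) {
        struct node *left = n->left;
        struct node *right = n->right;

        n = (value & 1u) ? right : left;
        value >>= 1;
    }
    return n;
}
```

Whether this wins in practice depends on the CPU and on the data being
cached; inspecting the output of gcc -O2 -S is the way to confirm that
no branch was emitted for the select.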


Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-03-02 Thread Eric Dumazet
On Friday 02 March 2007 10:26, John wrote:
 Eric Dumazet wrote:

  Anyway, if you want to play, you can apply this patch on top of
  linux-2.6.21-rc2  (nanosecond resolution infrastructure needs 2.6.21)
  I let you do the adjustments for rt kernel.

 Why does it require 2.6.21?

Well, this patch was done on top of the latest kernel for obvious practical 
reasons, but you probably can adapt it on the kernel of your choice.



Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-03-02 Thread John

Eric Dumazet wrote:


John wrote:


Consider an idle Linux 2.6.20-rt8 system, equipped with a single PCI-E
gigabit Ethernet NIC, running on a modern CPU (e.g. Core 2 Duo E6700).
All this system does is time stamp 1000 packets per second.

Are you claiming that this platform *cannot* handle most packets within
less than 1 microsecond of their arrival?


Yes I claim it. You expect too much of this platform, unless most means
10 % for you ;)


By most I meant more than 50%.

Has someone tried to measure interrupt latency in Linux? I'd like to 
plot the distribution of network IRQ to interrupt handler latencies.


If you replace 1 us by 50 us, then yes, it probably can do it, if most 
means 99%, (not 99.999 %)


I think we need cold, hard numbers at this point :-)
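
One way to collect such numbers is to bucket measured latencies by
decade, so that "most within 1 us" and "99% within 50 us" can be read
off a histogram. A minimal user-space sketch, assuming both timestamps
come from clock_gettime(CLOCK_MONOTONIC, ...); the helper names and the
bucket layout are assumptions, and this measures what user space sees,
not IRQ-to-handler latency inside the kernel.

```c
#include <stdint.h>
#include <time.h>

#define NBUCKETS 8

/* Nanoseconds from start to end (both assumed CLOCK_MONOTONIC). */
static int64_t ts_diff_ns(const struct timespec *start,
                          const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
           + (end->tv_nsec - start->tv_nsec);
}

/* Decade buckets: 0 is under 1 us, 1 is under 10 us, 2 is under 100 us,
 * and so on; the last bucket collects everything larger. */
static unsigned int latency_bucket(int64_t ns)
{
    int64_t limit = 1000;
    unsigned int b = 0;

    while (b < NBUCKETS - 1 && ns >= limit) {
        limit *= 10;
        b++;
    }
    return b;
}

/* Account one measured interval in the histogram. */
static void record_latency(unsigned long hist[NBUCKETS],
                           const struct timespec *start,
                           const struct timespec *end)
{
    hist[latency_bucket(ts_diff_ns(start, end))]++;
}
```

Dividing each bucket count by the total number of samples gives the
fraction of packets handled within each bound.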

Anyway, if you want to play, you can apply this patch on top of 
linux-2.6.21-rc2  (nanosecond resolution infrastructure needs 2.6.21)

I let you do the adjustments for rt kernel.


Why does it require 2.6.21?


This patch converts sk_buff timestamp to use new nanosecond infra
(added in 2.6.21)


Is this mentioned somewhere in the 2.6.21-rc1 ChangeLog?
http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.21-rc1

Regards.


Re: s390 allmodconfig

2007-03-02 Thread Richard Purdie
On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
 net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
 net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to 
 incomplete type 'struct led_trigger' 
 net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete 
 type
 net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 
 'led_trigger_register'
 net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to 
 incomplete type 'struct led_trigger' 
 net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete 
 type
 net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
 net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 
 'led_trigger_unregister'
 
 akpm2:/usr/src/25 grep LED .config
 CONFIG_NF_CONNTRACK_ENABLED=m
 CONFIG_MAC80211_LEDS=y
 
 Probably related to the Kconfig problems.

Almost certainly. Someone is building some LED trigger/driver without
the LED core enabled which is what that Kconfig warning was about.

Nobody's ever mentioned this driver to me...

Richard
(LED Maintainer)
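
The "incomplete type" errors follow from the caller being built while
the LED core is configured out. A self-contained mock of that
situation, assuming the struct definition is only visible when the core
is enabled: the two #define lines stand in for .config, and
led_trigger_register here is a local stand-in, not the kernel function.
wifi_led_init and rx_trigger are hypothetical names.

```c
#include <stddef.h>

/* Stand-ins for Kconfig output; the real values come from .config. */
#define CONFIG_MAC80211_LEDS 1
#define CONFIG_LEDS_TRIGGERS 1   /* remove to model the s390 case */

#ifdef CONFIG_LEDS_TRIGGERS
/* Mock of what the LED core provides when it is enabled. */
struct led_trigger {
    const char *name;
};

static int led_trigger_register(struct led_trigger *trig)
{
    return trig->name != NULL ? 0 : -1;
}
#else
/* Only a forward declaration: sizeof and member access cannot compile. */
struct led_trigger;
#endif

#if defined(CONFIG_MAC80211_LEDS) && defined(CONFIG_LEDS_TRIGGERS)
static struct led_trigger rx_trigger;

static int wifi_led_init(void)
{
    rx_trigger.name = "rx";
    return led_trigger_register(&rx_trigger);
}
#else
/* Stub keeps callers building when either symbol is off. */
static int wifi_led_init(void)
{
    return 0;
}
#endif
```

Guarding the user on both symbols, or making one Kconfig symbol depend
on the other, keeps the combination "user enabled, core disabled" from
ever reaching the compiler.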







Re: Extensible hashing and RCU

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 10:56:23AM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
 On Friday 02 March 2007 09:52, Evgeniy Polyakov wrote:
 
  Ok, I've run an analysis of linked list and trie traversals and found
  that (at least on x86) one step of an optimized list traversal is about
  4 (!) times faster than one bit lookup in a trie traversal (or actually
  one lookup in a binary tree-like structure). That is because trie
  traversal needs more instructions per lookup, and at least one
  additional branch which cannot be predicted.
 
  Tests with rdtsc show that one bit lookup in a trie (actually any
  lookup in binary tree structures) is about 3-4 times slower than one
  lookup in a linked list.
 
  Since a hash table usually has up to 4 elements in each hash entry, a
  competing binary tree/trie structure must get an entry in one lookup,
  which is essentially impossible with the usual tree/trie implementations.
 
  Things change dramatically when a linked list becomes too long, but that
  should not happen with proper resizing of the hash table. A wildcard
  implementation also introduces additional requirements, which cannot be
  easily solved in hash tables.
 
  So I take back my words about using a tree/trie implementation instead
  of a hash table for socket lookup.
 
  Interested readers can find more details on the tests, asm outputs and
  conclusions at:
  http://tservice.net.ru/~s0mbre/blog/2007/03/01#2007_03_01
 
 Thank you for this report. (It still avoids studying cache misses, although
 they obviously are the limiting factor.)
 
 Anyway, if the data is in cache and you want optimum performance from your CPU,
 you may try an algorithm without conditional branches
 (well, 4 in this case for the whole 32-bit test):

Tests were always for the no-cache-miss case.
I also ran them in kernel mode (to eliminate TLB flushes on rescheduling
and to take into account that the kernel TLB covers 8 MB while userspace
covers only 4 KB), but the results were essentially the same (modulo
several percent). I only tested the trie; in my implementation its memory
usage is smaller than that of a hash table for 2^20 entries.

 gcc -O2 -S -march=i686 test1.c
 

 struct node {
     struct node *left;
     struct node *right;
     int value;
 };
 struct node *head;
 int v1;
 
 #define PASS2(bit) \
     n2 = n1->left; \
     right = n1->right; \
     if (value & (1<<bit)) \
         n2 = right; \
     n1 = n2->left; \
     right = n2->right; \
     if (value & (2<<bit)) \
         n1 = right;
 
 main()
 {
     int j;
     unsigned int value = v1;
     struct node *n1 = head, *n2, *right;
     for (j=0; j<4; ++j) {
         PASS2(0)
         PASS2(2)
         PASS2(4)
         PASS2(6)
         value >>= 8;
     }
     printf("result=%p\n", n1);
 }

This one resulted in 10*4 instructions and 2*4 branches per loop
iteration, so in total 32 branches (instead of 64 in the simpler code)
and 160 instructions (instead of 128 in the simpler code).
Given that, according to the tests, a branch takes twice as long to
execute (a rather strange statement, I must admit, since I have not read
the x86 processor manual at all, only the ppc32 one), we do not get any
gain for a 32-bit value (32 lookups): 64*2+128 in the old case, 32*2+160
in the new one.

I also have an advanced trie implementation, which caches values in nodes
if there are no child entries, and it _greatly_ decreases the number of
lookups and the memory usage for smaller sets. In the long run, with a
huge number of entries in the trie, it does not matter, since only the
lowest layer caches values.

-- 
Evgeniy Polyakov


Re: s390 allmodconfig

2007-03-02 Thread Johannes Berg
On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:

 Probably related to the Kconfig problems.

Yeah, it is. s390 is funny, it doesn't include drivers/Kconfig; I don't
think any of us would have suspected that.

There doesn't seem to be a reason why it shouldn't have drivers/leds
though. drivers/ssb I don't know about, does s390 have pci or pcmcia?
And the bluetooth stuff is also plain weird, I suppose s390 really
should include drivers/hid/Kconfig :)

Same with drivers/char that includes hw_random.

Is there any reason it isn't including drivers/Kconfig? 


I can offer the patch below to fix the LED trigger problem; it's probably
cleaner to depend on LEDS_TRIGGERS rather than selecting it and
NEW_LEDS.

johannes

--- wireless-dev.orig/net/mac80211/Kconfig  2007-03-02 11:18:45.464333268 +0100
+++ wireless-dev/net/mac80211/Kconfig   2007-03-02 11:33:24.534333268 +0100
@@ -13,9 +13,7 @@ config MAC80211
 
 config MAC80211_LEDS
    bool "Enable LED triggers"
-   depends on MAC80211
-   select NEW_LEDS
-   select LEDS_TRIGGERS
+   depends on MAC80211 && LEDS_TRIGGERS
    ---help---
      This option enables a few LED triggers for different
      packet receive/transmit events.




Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 10:32:32 +0000 Richard Purdie [EMAIL PROTECTED] wrote:

 On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
  net/mac80211/ieee80211_led.c: In function 'ieee80211_led_init':
  net/mac80211/ieee80211_led.c:38: error: invalid application of 'sizeof' to 
  incomplete type 'struct led_trigger' 
  net/mac80211/ieee80211_led.c:43: error: dereferencing pointer to incomplete 
  type
  net/mac80211/ieee80211_led.c:44: warning: implicit declaration of function 
  'led_trigger_register'
  net/mac80211/ieee80211_led.c:49: error: invalid application of 'sizeof' to 
  incomplete type 'struct led_trigger' 
  net/mac80211/ieee80211_led.c:54: error: dereferencing pointer to incomplete 
  type
  net/mac80211/ieee80211_led.c: In function 'ieee80211_led_exit':
  net/mac80211/ieee80211_led.c:64: warning: implicit declaration of function 
  'led_trigger_unregister'
  
  akpm2:/usr/src/25 grep LED .config
  CONFIG_NF_CONNTRACK_ENABLED=m
  CONFIG_MAC80211_LEDS=y
  
  Probably related to the Kconfig problems.
 
 Almost certainly. Someone is building some LED trigger/driver without
 the LED core enabled which is what that Kconfig warning was about.
 
 Nobody's ever mentioned this driver to me...
 

It's a mountain of new wireless code in the just-released 2.6.21-rc2-mm1.



Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 11:38:24 +0100 Johannes Berg [EMAIL PROTECTED] wrote:

 On Fri, 2007-03-02 at 00:25 -0800, Andrew Morton wrote:
 
  Probably related to the Kconfig problems.
 
 Yeah, it is. s390 is funny, it doesn't include drivers/Kconfig; I don't
 think any of us would have suspected that.
 
 There doesn't seem to be a reason why it shouldn't have drivers/leds
 though. drivers/ssb I don't know about, does s390 have pci or pcmcia?

No, s390 doesn't have PCI.

 And the bluetooth stuff is also plain weird, I suppose s390 really
 should include drivers/hid/Kconfig :)
 
 Same with drivers/char that includes hw_random.
 
 Is there any reason it isn't including drivers/Kconfig? 
 

s390 is weird ;)   There's no way it'll support any of the hardware which you're
working on (until they release the s390 laptop).  So all we really want to
do here is to avoid breaking s390 allmodconfig.
 
 I can offer the patch below to fix the LED trigger problem; it's probably
 cleaner to depend on LEDS_TRIGGERS rather than selecting it and
 NEW_LEDS.
 
 johannes
 
 --- wireless-dev.orig/net/mac80211/Kconfig  2007-03-02 11:18:45.464333268 +0100
 +++ wireless-dev/net/mac80211/Kconfig   2007-03-02 11:33:24.534333268 +0100
 @@ -13,9 +13,7 @@ config MAC80211
  
  config MAC80211_LEDS
     bool "Enable LED triggers"
 -   depends on MAC80211
 -   select NEW_LEDS
 -   select LEDS_TRIGGERS
 +   depends on MAC80211 && LEDS_TRIGGERS
     ---help---
       This option enables a few LED triggers for different
       packet receive/transmit events.

OK, I'll try that, thanks.


Re: s390 allmodconfig

2007-03-02 Thread Johannes Berg
On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:

 No, s390 doesn't have PCI.

Ok.

 s390 is weird ;)   There's no way it'll support any of the hardware which 
 you're
 working on (until they release the s390 laptop).  So all we really want to
 do here is to avoid breaking s390 allmodconfig.

Alright. I think we'll probably have to make bcm43xx and b44 depend on
SSB instead of selecting it like the LED trigger stuff below.

But I don't see why s390 can't include hw random, led trigger or even
hid, those are all software features afaict.
 

 OK, I'll try that, thanks.

Not that it'll actually help get the compile through... bcm43xx will
drop fail and bluetooth probably as well.

johannes




Re: s390 allmodconfig

2007-03-02 Thread Andrew Morton
On Fri, 02 Mar 2007 12:11:48 +0100 Johannes Berg [EMAIL PROTECTED] wrote:

 On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:
 
  No, s390 doesn't have PCI.
 
 Ok.
 
  s390 is weird ;)   There's no way it'll support any of the hardware which 
  you're
  working on (until they release the s390 laptop).  So all we really want to
  do here is to avoid breaking s390 allmodconfig.
 
 Alright. I think we'll probably have to make bcm43xx and b44 depend on
 SSB instead of selecting it like the LED trigger stuff below.
 
 But I don't see why s390 can't include hw random, led trigger or even
 hid, those are all software features afaict.
  
 
  OK, I'll try that, thanks.
 
 Not that it'll actually help get the compile through... bcm43xx will
 drop fail and bluetooth probably as well.
 

OK, thanks.

fwiw, http://userweb.kernel.org/~akpm/cross-compilers/ has an s390
cross-compiler binary.



[PATCH] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread Ilpo Järvinen
Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. This clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dc221a3..bdd6172 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2592,9 +2592,12 @@ static void tcp_ratehalving_spur_to_resp
    tp->high_seq = tp->frto_highmark;   /* Smoother w/o this? - ij */
 }
 
-static void tcp_undo_spur_to_response(struct sock *sk)
+static void tcp_undo_spur_to_response(struct sock *sk, int flag)
 {
-   tcp_undo_cwr(sk, 1);
+   if (flag&FLAG_ECE)
+   tcp_ratehalving_spur_to_response(sk);
+   else
+   tcp_undo_cwr(sk, 1);
 }
 
 /* F-RTO spurious RTO detection algorithm (RFC4138)
@@ -2680,7 +2683,7 @@ static int tcp_process_frto(struct sock 
return 1;
} else /* frto_counter == 2 */ {
switch (sysctl_tcp_frto_response) {
-   case 2: tcp_undo_spur_to_response(sk); break;
+   case 2: tcp_undo_spur_to_response(sk, flag); break;
case 1: tcp_conservative_spur_to_response(tp); break;
default: tcp_ratehalving_spur_to_response(sk); break;
}
-- 
1.4.2

[PATCH v2] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread Ilpo Järvinen
Undoing ssthresh is disabled in fastretrans_alert whenever
FLAG_ECE is set by clearing prior_ssthresh. The clearing does
not protect FRTO because FRTO operates before fastretrans_alert.
Moving the clearing of prior_ssthresh earlier seems to be a
suboptimal solution to the FRTO case because then FLAG_ECE will
cause a second ssthresh reduction in try_to_open (the first
occurred when FRTO was entered). So instead, FRTO falls back
immediately to the rate halving response, which switches TCP to
CA_CWR state preventing the latter reduction of ssthresh.

If the first ECE arrived before the ACK after which FRTO is able
to decide RTO as spurious, prior_ssthresh is already cleared.
Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
set also in the following ACKs resulting in rate halving response
that sees TCP is already in CA_CWR, which again prevents an extra
ssthresh reduction on that round-trip.

If the first ECE arrived before RTO, ssthresh has already been
adapted and prior_ssthresh remains cleared on entry because TCP
is in CA_CWR (the same applies also to a case where FRTO is
entered more than once and ECE comes in the middle).

High_seq must not be touched after tcp_enter_cwr because CWR
round-trip calculation depends on it.

I believe that after this patch, FRTO should be ECN-safe and
even able to take advantage of synergy benefits.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---

Of course I forgot to fix also the high_seq thing I had in mind last 
evening, so here is this again now with it too.


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index dc221a3..6b268dc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2587,14 +2587,15 @@ static void tcp_conservative_spur_to_res
  */
 static void tcp_ratehalving_spur_to_response(struct sock *sk)
 {
-   struct tcp_sock *tp = tcp_sk(sk);
tcp_enter_cwr(sk, 0);
-   tp->high_seq = tp->frto_highmark;   /* Smoother w/o this? - ij */
 }
 
-static void tcp_undo_spur_to_response(struct sock *sk)
+static void tcp_undo_spur_to_response(struct sock *sk, int flag)
 {
-   tcp_undo_cwr(sk, 1);
+   if (flag&FLAG_ECE)
+   tcp_ratehalving_spur_to_response(sk);
+   else
+   tcp_undo_cwr(sk, 1);
 }
 
 /* F-RTO spurious RTO detection algorithm (RFC4138)
@@ -2680,7 +2681,7 @@ static int tcp_process_frto(struct sock 
return 1;
} else /* frto_counter == 2 */ {
switch (sysctl_tcp_frto_response) {
-   case 2: tcp_undo_spur_to_response(sk); break;
+   case 2: tcp_undo_spur_to_response(sk, flag); break;
case 1: tcp_conservative_spur_to_response(tp); break;
default: tcp_ratehalving_spur_to_response(sk); break;
}
-- 
1.4.2
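
The selection logic that the patch introduces can be modelled in
isolation. This is a sketch of the decision only, not the kernel code:
the enum, the function name pick_spur_response and the FLAG_ECE value
are assumptions made so that the example is self-contained.

```c
#define FLAG_ECE 0x40   /* value assumed for this sketch */

enum spur_response {
    RESP_RATEHALVING,
    RESP_CONSERVATIVE,
    RESP_UNDO
};

/*
 * Choose the response to a spurious RTO. frto_response models the
 * sysctl_tcp_frto_response setting, flag the ACK flags seen by FRTO.
 */
static enum spur_response pick_spur_response(int frto_response, int flag)
{
    switch (frto_response) {
    case 2:
        /* Undoing ssthresh is not safe once ECE was seen, so fall
         * back to the rate halving response instead. */
        return (flag & FLAG_ECE) ? RESP_RATEHALVING : RESP_UNDO;
    case 1:
        return RESP_CONSERVATIVE;
    default:
        return RESP_RATEHALVING;
    }
}
```

Only the undo response (setting 2) changes behaviour with ECE; the other
two settings were already conservative enough not to need the check.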


Re: Network activity LED trigger

2007-03-02 Thread Florian Fainelli
Hi All,

Some more thoughts. The IDE activity LED trigger is currently fired when a
function is called in the IDE read/write routines.

In a similar way, we could call the trigger function in net/core/dev.c, in
netif_receive_skb and netif_rx?

I was also thinking that some NICs already have LEDs, so for those models it
is not necessary to overload the user with lights everywhere.
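
A user-space model of what such an activity trigger could look like,
assuming a hook called from the receive path and a periodic timer that
turns the LED off again. All names here (net_led, net_led_activity,
net_led_tick, BLINK_MS) are hypothetical and not part of the LED API.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLINK_MS 30u   /* how long one packet keeps the LED lit (assumed) */

/* Hypothetical per-LED trigger state. */
struct net_led {
    bool on;
    uint64_t off_at_ms;   /* time at which the LED should go dark */
};

/* Would be called from the receive path for every incoming packet. */
static void net_led_activity(struct net_led *led, uint64_t now_ms)
{
    led->on = true;
    led->off_at_ms = now_ms + BLINK_MS;
}

/* Would be called from a periodic timer. */
static void net_led_tick(struct net_led *led, uint64_t now_ms)
{
    if (led->on && now_ms >= led->off_at_ms)
        led->on = false;
}
```

The hook itself only writes two fields, which keeps the cost added to
the per-packet path small.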

Regards, Florian

On Thursday 1 March 2007, Florian Fainelli wrote:
 Hi All,

 I have been talking a bit with Richard, who is the LED API maintainer, and
 a LED trigger based on network activity would be something great.

 There are some things that concern the network stack:

 - should we specify if the network driver is allowed to contribute to
 the LED activity, just like it is done for random generation, at compile
 time

 - I would like to trigger the LED based on one or several network
 interfaces, maybe specify via sysfs which interface triggers which LED,
 and also maybe differentiate the layer-2 activity from the layer-3
 activity for instance

 - An LED driver could by default be bound to a network driver, or an
 interface name

 As it could be very intrusive in the network stack, you might want to
 specify a bit more how you imagine a network activity trigger.

 Thanks


Re: Network activity LED trigger

2007-03-02 Thread jamal

Where are these LEDs typically located? Are you talking about LEDs on a
network card for example? can you light them up in different colors?

cheers,
jamal

On Fri, 2007-02-03 at 13:58 +0100, Florian Fainelli wrote:
 Hi All,
 
 Some more thoughts. The IDE activity LED trigger is currently triggered when 
 a 
 function is called in the IDE writing/reading routines.
 
 In a similar way, we could call the trigger function in net/core/dev.c in 
 netif_receive_skb and netif_rx ?
 
 I was also thinking that some network NIC already have LEDs, so it is not 
 necessary for those models to overload the user with lights everywhere.
 
 R



Re: Network activity LED trigger

2007-03-02 Thread Florian Fainelli
Hi,

On Friday 2 March 2007, jamal wrote:
 Where are these LEDs typically located? Are you talking about LEDs on a
 network card for example? can you light them up in different colors?

Those LEDs are typically driven by GPIO lines and are visible on the front of 
the device. This is mostly targeted at embedded devices, for which you do not 
necessarily want to assign an LED to a given network interface.


 cheers,
 jamal

 On Fri, 2007-02-03 at 13:58 +0100, Florian Fainelli wrote:
  Hi All,
 
  Some more thoughts. The IDE activity LED trigger is currently triggered
  when a function is called in the IDE writing/reading routines.
 
  In a similar way, we could call the trigger function in net/core/dev.c in
  netif_receive_skb and netif_rx ?
 
  I was also thinking that some network NIC already have LEDs, so it is not
  necessary for those models to overload the user with lights everywhere.
 
  R




-- 
Regards, Florian Fainelli
-
5, rue Charles Fourier
Chambre 1202
91011 Evry
http://www.alphacore.net
(+33) 01 60 76 64 21
(+33) 06 09 02 64 95
-
Association MiNET
http://www.minet.net
-
Institut National des Télécommunication
http://www.int-evry.fr/telecomint
-


[PATCH 2/2] tc35815 driver update (part 2)

2007-03-02 Thread Atsushi Nemoto
More updates for tc35815 driver, including:

* TX4939 support.
* NETPOLL support.
* NAPI support. (disabled by default)
* Reduce memcpy on receiving.
* PM support.
* Many cleanups and bugfixes.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
---
 drivers/net/tc35815.c   |  827 +++---
 include/linux/pci_ids.h |1 
 2 files changed, 632 insertions(+), 196 deletions(-)

diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
index 0cf1f87..ec888db 100644
--- a/drivers/net/tc35815.c
+++ b/drivers/net/tc35815.c
@@ -38,9 +38,33 @@
 *		Add workaround for 100MHalf HUB.
 *	1.22	Minor fix.
 *	1.23	Minor cleanup.
+ *	1.24	Remove tc35815_setup since new stype option
+ *		(tc35815.speed=10, etc.) can be used for 2.6 kernel.
+ *	1.25	TX4939 support.
+ *	1.26	Minor cleanup.
+ *	1.27	Move TX4939 PCFG.SPEEDn control code out from this driver.
+ *		Cleanup init_dev_addr. (NETDEV_REGISTER event notifier
+ *		can overwrite dev_addr)
+ *		support ETHTOOL_GPERMADDR.
+ *	1.28	Minor cleanup.
+ *	1.29	support netpoll.
+ *	1.30	Minor cleanup.
+ *	1.31	NAPI support. (disabled by default)
+ *		Use DMA_RxAlign_2 if possible.
+ *		Do not use PackedBuffer.
+ *		Cleanup.
+ *	1.32	Fix free buffer management on non-PackedBuffer mode.
+ *	1.33	Fix netpoll build.
+ *	1.34	Fix netpoll locking.  BH rule for NAPI is not enough with
+ *		netpoll, hard_start_xmit might be called from irq context.
+ *		PM support.
  */
 
-#define DRV_VERSION	"1.23"
+#ifdef TC35815_NAPI
+#define DRV_VERSION	"1.34-NAPI"
+#else
+#define DRV_VERSION	"1.34"
+#endif
 static const char *version = "tc35815.c:v" DRV_VERSION "\n";
 #define MODNAME			"tc35815"
 
@@ -71,23 +95,27 @@ static const char *version = tc35815.c:
 #define GATHER_TXINT   /* On-Demand Tx Interrupt */
 #define WORKAROUND_LOSTCAR
 #define WORKAROUND_100HALF_PROMISC
+/* #define TC35815_USE_PACKEDBUFFER */
 
 typedef enum {
 	TC35815CF = 0,
 	TC35815_NWU,
+	TC35815_TX4939,
 } board_t;
 
 /* indexed by board_t, above */
-static struct {
+static const struct {
 	const char *name;
 } board_info[] __devinitdata = {
 	{ "TOSHIBA TC35815CF 10/100BaseTX" },
 	{ "TOSHIBA TC35815 with Wake on LAN" },
+	{ "TOSHIBA TC35815/TX4939" },
 };
 
-static struct pci_device_id tc35815_pci_tbl[] = {
-	{PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815CF, PCI_ANY_ID, PCI_ANY_ID, 0, 0, TC35815CF },
-	{PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_NWU, PCI_ANY_ID, PCI_ANY_ID, 0, 0, TC35815_NWU },
+static const struct pci_device_id tc35815_pci_tbl[] = {
+	{PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815CF), .driver_data = TC35815CF },
+	{PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_NWU), .driver_data = TC35815_NWU },
+	{PCI_DEVICE(PCI_VENDOR_ID_TOSHIBA_2, PCI_DEVICE_ID_TOSHIBA_TC35815_TX4939), .driver_data = TC35815_TX4939 },
 	{0,}
 };
 MODULE_DEVICE_TABLE (pci, tc35815_pci_tbl);
@@ -140,6 +168,11 @@ struct tc35815_regs {
  * Bit assignments
  */
 /* DMA_Ctl bit asign --- */
+#define DMA_RxAlign	0x00c0 /* 1:Reception Alignment   */
+#define DMA_RxAlign_1	0x0040
+#define DMA_RxAlign_2	0x0080
+#define DMA_RxAlign_3	0x00c0
+#define DMA_M66EnStat	0x0008 /* 1:66MHz Enable State*/
 #define DMA_IntMask	0x0004 /* 1:Interupt mask */
 #define DMA_SWIntReq	0x0002 /* 1:Software Interrupt request*/
 #define DMA_TxWakeUp	0x0001 /* 1:Transmit Wake Up  */
@@ -351,6 +384,8 @@ struct BDesc {
Int_SSysErrEn  | Int_RMasAbtEn | Int_RTargAbtEn | \
Int_STargAbtEn | \
Int_BLExEn  | Int_FDAExEn) /* maybe 0xb7f*/
+#define DMA_CTL_CMD	DMA_BURST_SIZE
+#define HAVE_DMA_RXALIGN(lp)	likely((lp)->boardtype != TC35815CF)
 
 /* Tuning parameters */
 #define DMA_BURST_SIZE 32
@@ -358,12 +393,28 @@ struct BDesc {
 #define TX_THRESHOLD_MAX 1536   /* used threshold with packet max byte for low pci transfer ability.*/
 #define TX_THRESHOLD_KEEP_LIMIT 10  /* setting threshold max value when overrun error occured this count. */
 
+/* 16 + RX_BUF_NUM * 8 + RX_FD_NUM * 16 + TX_FD_NUM * 32 <= PAGE_SIZE*FD_PAGE_NUM */
+#ifdef TC35815_USE_PACKEDBUFFER
 #define FD_PAGE_NUM 2
-#define FD_PAGE_ORDER 1
-/* 16 + RX_BUF_PAGES * 8 + RX_FD_NUM * 16 + TX_FD_NUM * 32 <= PAGE_SIZE*2 */
-#define RX_BUF_PAGES	8	/* >= 2 */
+#define RX_BUF_NUM	8	/* >= 2 */
 #define RX_FD_NUM	250	/* >= 32 */
 #define TX_FD_NUM	128
+#define RX_BUF_SIZE	PAGE_SIZE
+#else /* TC35815_USE_PACKEDBUFFER */
+#define FD_PAGE_NUM 4
+#define RX_BUF_NUM   

[PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Eric Dumazet
We currently use a special structure (struct skb_timeval) and plain 'struct 
timeval' to store packet timestamps in sk_buffs and struct sock.

This has some drawbacks:
- Resolution is fixed at one microsecond.
- Space is wasted on 64-bit platforms, where sizeof(struct timeval) == 16.

I suggest using ktime_t, a nice abstraction of the high-resolution time 
services that is currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits an 8-byte 
shrink of this structure on 64-bit architectures. Some other structures also 
benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
frag_queue in ipv6/reassembly.c, ...)
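
To make the size and resolution argument concrete, here is a small userspace sketch. It assumes a timestamp stored as a single signed 64-bit count of nanoseconds, which is only a stand-in for ktime_t; the names kt_ns_t, kt_from_timeval and kt_to_timeval are hypothetical, and negative timestamps are not handled.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for ktime_t: nanoseconds since the epoch. */
typedef int64_t kt_ns_t;

#define KT_NSEC_PER_SEC  1000000000LL
#define KT_NSEC_PER_USEC 1000LL

/* Widen a microsecond-resolution timeval into a nanosecond count. */
static inline kt_ns_t kt_from_timeval(struct timeval tv)
{
    return (kt_ns_t)tv.tv_sec * KT_NSEC_PER_SEC +
           (kt_ns_t)tv.tv_usec * KT_NSEC_PER_USEC;
}

/* Narrow back to a timeval; sub-microsecond digits are dropped.
 * Assumes kt >= 0. */
static inline struct timeval kt_to_timeval(kt_ns_t kt)
{
    struct timeval tv;

    tv.tv_sec  = kt / KT_NSEC_PER_SEC;
    tv.tv_usec = (kt % KT_NSEC_PER_SEC) / KT_NSEC_PER_USEC;
    return tv;
}
```

The 64-bit value always occupies 8 bytes, and the conversion to struct timeval is needed only at the point where a microsecond timestamp is handed to user space.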


Once this ktime infrastructure is adopted, we can more easily provide nanosecond 
resolution on top of it (ioctl SIOCGSTAMPNS and/or 
SO_TIMESTAMPNS/SCM_TIMESTAMPNS).

Note: this patch includes a bug fix in compat_sock_get_timestamp(), 
where an "err = 0;" was missing (so this syscall returned -ENOENT instead of 
0).

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
CC: Stephen Hemminger [EMAIL PROTECTED]
CC: John find [EMAIL PROTECTED]

 include/linux/skbuff.h  |   26 --
 include/net/sock.h  |   18 +++
 net/bridge/netfilter/ebt_ulog.c |6 +++--
 net/compat.c|   15 
 net/core/dev.c  |   19 +++-
 net/core/sock.c |   16 +++--
 net/econet/af_econet.c  |2 -
 net/ipv4/ip_fragment.c  |6 ++---
 net/ipv4/netfilter/ip_queue.c   |6 +++--
 net/ipv4/netfilter/ipt_ULOG.c   |8 --
 net/ipv6/exthdrs.c  |2 -
 net/ipv6/netfilter/ip6_queue.c  |6 +++--
 net/ipv6/netfilter/nf_conntrack_reasm.c |6 ++---
 net/ipv6/reassembly.c   |6 ++---
 net/ipx/af_ipx.c|4 +--
 net/netfilter/nfnetlink_log.c   |8 +++---
 net/netfilter/nfnetlink_queue.c |8 +++---
 net/packet/af_packet.c  |8 --
 18 files changed, 80 insertions(+), 90 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..24dcbb3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -27,6 +27,7 @@ #include <linux/textsearch.h>
 #include <net/checksum.h>
 #include <linux/rcupdate.h>
 #include <linux/dmaengine.h>
+#include <linux/hrtimer.h>
 
 #define HAVE_ALLOC_SKB		/* For the drivers to know */
 #define HAVE_ALIGNABLE_SKB	/* Ditto 8)		   */
@@ -156,11 +157,6 @@ struct skb_shared_info {
 #define SKB_DATAREF_SHIFT 16
 #define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)
 
-struct skb_timeval {
-	u32	off_sec;
-	u32	off_usec;
-};
-
 
 enum {
 	SKB_FCLONE_UNAVAILABLE,
@@ -233,7 +229,7 @@ struct sk_buff {
 	struct sk_buff		*prev;
 
 	struct sock		*sk;
-	struct skb_timeval	tstamp;
+	ktime_t			tstamp;
 	struct net_device	*dev;
 	struct net_device	*input_dev;
 
@@ -1360,26 +1356,14 @@ extern void skb_add_mtu(int mtu);
  */
 static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp)
 {
-	stamp->tv_sec  = skb->tstamp.off_sec;
-	stamp->tv_usec = skb->tstamp.off_usec;
+	*stamp = ktime_to_timeval(skb->tstamp);
 }
 
-/**
- * skb_set_timestamp - set timestamp of a skb
- * @skb: skb to set stamp of
- * @stamp: pointer to struct timeval to get stamp from
- *
- * Timestamps are stored in the skb as offsets to a base timestamp.
- * This function converts a struct timeval to an offset and stores
- * it in the skb.
- */
-static inline void skb_set_timestamp(struct sk_buff *skb, const struct timeval *stamp)
+static inline void __net_timestamp(struct sk_buff *skb)
 {
-	skb->tstamp.off_sec  = stamp->tv_sec;
-	skb->tstamp.off_usec = stamp->tv_usec;
+	skb->tstamp = ktime_get_real();
 }
 
-extern void __net_timestamp(struct sk_buff *skb);
 
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 2c7d60c..19f6540 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -244,7 +244,7 @@ #define sk_prot __sk_common.skc_prot
 	struct sk_filter	*sk_filter;
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
-	struct timeval		sk_stamp;
+	ktime_t			sk_stamp;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
 	struct page		*sk_sndmsg_page;
@@ -1307,19 +1307,19 @@ static inline int sock_intr_errno(long t
 static __inline__ void
 sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 {
-	struct timeval stamp;
+	ktime_t kt = skb->tstamp;
 
-   

Re: Network activity LED trigger

2007-03-02 Thread jamal
On Fri, 2007-02-03 at 15:16 +0100, Florian Fainelli wrote:
 Hi,
 
 On Friday 2 March 2007, jamal wrote:
  Where are these LEDs typically located? Are you talking about LEDs on a
  network card for example? can you light them up in different colors?
 
 Those LEDS are typically controlled by GPIO lines visible in front of the 
 device. It is mostly targeted to embedded devices for which you do not 
 necessarily want to assign a LED to a given network interface
 

Ah, ok - I've worked with a not-so-embedded board that had something that
was accessible via the ICH; I recall writing a user-space program to
handle it. So instead of calling this just LED, probably find a more
descriptive name for it; for example, GPIO-LED.

Those things are tricky to have in generic code though, no? I.e. each
chipset/board will have different address mappings for where to
read/write for a specific LED. So you need to deal with that problem
without requiring a kernel change every time an address changes.
I actually found an exactly similar board (some manufacturer) but the
firmware was slightly different.

Here's my view of what would be useful:
Have them accessible via the kernel, but also have an API from user
space. This way user space apps can control the LED, but if I wanted to
do it from the kernel I could as well. In my case I was actually
monitoring the health of a daemon; the LED would be off if the daemon was
not running, green if it was happy, yellow if semi-healthy and red if it
was in trouble.

here are some operations/messages i can see that are useful which you
probably already have in your API:

turn on LED at #x color somecolor
turn off LED at #y
query LED info at #x
dump all LEDs on board - think of this as a discovery
flicker LED at #z at frequency y color green
maybe even: I am a wireless card with no LED, I claim LED #x
which is matched by tell me if anyone owns LED code

In other words, just provide the mechanisms and let people write the
policies.
This way, if I wanted to tie it to my eth0, I could.

Hope that helps.

cheers,
jamal




Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
Andrew Morton [EMAIL PROTECTED] writes:

 However, in 
 drivers/net/wan/hdlc_cisco.c, in function static int cisco_ioctl(struct 
 net_device *dev, struct ifreq *ifr), where dev->hard_header is assigned a 
 valid 
 function, and dev->hard_header_cache is assigned a known value (NULL), 
 dev->header_cache_update is not set to a known value:

Right, it seems I was never aware that dev->header_cache_update existed.
I wonder where the non-NULL value comes from. Never mind.

 diff -puN drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update drivers/net/wan/hdlc_cisco.c
 --- a/drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update
 +++ a/drivers/net/wan/hdlc_cisco.c
 @@ -366,6 +366,7 @@ static int cisco_ioctl(struct net_device
   dev->hard_start_xmit = hdlc->xmit;
   dev->hard_header = cisco_hard_header;
   dev->hard_header_cache = NULL;
 + dev->header_cache_update = NULL;
   dev->type = ARPHRD_CISCO;
   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
   dev->addr_len = 0;
 _

ACK, I think it's the best place.

Is it OK to leave this (and hard_header_cache) set to a random value
if dev->hard_header = NULL (as with other protocols)?
-- 
Krzysztof Halasa


Re: Network activity LED trigger

2007-03-02 Thread Richard Purdie
On Fri, 2007-03-02 at 10:16 -0500, jamal wrote:
 Heres my view of what would be useful:
 Have them accessible via the kernel, but also have an API from user
 space. This way user space apps can control the LED, but if i wanted to
 do it from the kernel i could as well. In my case i was actually
 monitoring the health of a daemon; it would show off if the daemon was
 not running, green if it was happy, yellow if semi-healthy and Red if it
 was in trouble.

We already have this API, see drivers/leds ;-)

 here are some operations/messages i can see that are useful which you
 probably already have in your API:
 
 turn on LED at #x color somecolor
 turn off LED at #y
 query LED info at #x
 dump all LEDs on board - think of this as a discovery
 flicker LED at #z at frequency y color green
 maybe even: I am a wireless card with no LED, I claim LED #x
 which is matched by tell me if anyone owns LED code
 
 In other words, if you just provide mechanims let people write the
 policies.
 This way if i wanted to tie it to my eth0 i can. 

We have LEDs which show up in sysfs and can be controlled by userspace
from there. They can also choose to be controlled by kernel LED
'triggers'. For example, we have an IDE disk trigger which shows
activity on IDE disks. Florian would like to see a network trigger.
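
As an illustration of the userspace side, here is a hedged C sketch that writes a value to an LED attribute file. It assumes the LED class layout of one directory per LED under /sys/class/leds containing "brightness" and "trigger" files; the helper name led_write_attr and the LED name used below are hypothetical, and the base directory is a parameter so the helper can be exercised outside sysfs.

```c
#include <stdio.h>

/*
 * Write `value` to <base>/<led>/<attr>.
 * Returns 0 on success, -1 if the path is too long or the write fails.
 * Hypothetical helper, not part of any existing library.
 */
int led_write_attr(const char *base, const char *led,
                   const char *attr, const char *value)
{
    char path[256];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/%s/%s", base, led, attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

On a board with a suitable LED driver, led_write_attr("/sys/class/leds", "myboard:status", "brightness", "255") would switch the (hypothetically named) LED on, and writing "ide-disk" to its "trigger" attribute would hand it over to the IDE disk trigger, provided that trigger is built in.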

The LED trigger code is quite generic and designed to have little impact
on the subsystem it's added to, at least in terms of code. As always,
there will be some runtime overhead though. Ultimately it depends on how
complex you make the trigger (e.g. how many options it has) and where and
how you hook it into the network subsystem. I know little about the
network subsystem, so this is something others will have to advise on.

Cheers,

Richard 
(LED Maintainer)




Re: [git patches] net driver fixes

2007-03-02 Thread Linus Torvalds


On Thu, 1 Mar 2007, Kok, Auke wrote:

 Linus Torvalds wrote:
  
  Ok, here's an interesting one: my e1000 card no longer worked for a while.
  
  The green link-light blinks on/off once a second, and in time to that, my
  dmesg fills up with an endless supply of
  
  e1000: eth0: e1000_watchdog: NIC Link is Down
  e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex, Flow 
  Control: None
  e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
  
  and networking obviously doesn't actually work.
 
 Just out of curiosity, which e1000 chipset+motherboard are you running this
 on?

The kernel prints out:

e1000: :00:19.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 
00:16:76:c7:eb:fe
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

and lspci says:

00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network 
Connection (rev 02)
Subsystem: Intel Corporation Unknown device 0001
Flags: bus master, fast devsel, latency 0, IRQ 506
Memory at e040 (32-bit, non-prefetchable) [size=128K]
Memory at e0424000 (32-bit, non-prefetchable) [size=4K]
I/O ports at 20c0 [size=32]
Capabilities: access denied
00: 86 80 4a 10 07 04 10 00 02 00 00 02 00 00 00 00
10: 00 00 40 e0 00 40 42 e0 c1 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 01 00
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00

It's an Intel system (Host bridge: Intel Corporation 82Q963/Q965) with 
integrated graphics: PCI ID 8086:2990 (rev 02) for the host bridge.

DMI info isn't very interesting, but it's an all-Intel board:

OEM-specific Type
Strings:
Intel_ASF
Intel_ASF_001
..
Base Board Information
Manufacturer: Intel Corporation
Product Name: DQ965GF
Version: AAD41676-305
Serial Number: BQGF635009R2
...
BIOS Information
Vendor: Intel Corp.
Version: CO96510J.86A.4462.2006.0804.2059
Release Date: 08/04/2006

so it's all-intel chipset, all-intel board, and all-intel BIOS ;)

 there have been problems reported with AMT2 on several chipsets (AMT2 is
 not supported under linux, unlike AMT1), and having it enabled in the BIOS
 produces this phenomenon.

Is there some way to at least disable AMT2 from the Linux driver (ie I 
assume this is some issue of Intel not documenting it all - but maybe you 
can add a "turn off that bit" to the affected chip).

If I'm not the only one to see it, it's obviously not just my personal 
ethernet switch bug, but apparently the e1000 becoming confused by some 
link detection event (and powering down the switch probably just gets it 
out of its confusion).

Linus


Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread Paul Moore
On Wednesday, February 28 2007 3:01:31 pm Paul Moore wrote:
 The current CIPSO engine has a problem where it does not verify that the
 given sensitivity level has a valid CIPSO mapping when the std CIPSO DOI
 type is used.  The end result is that bad packets are sent on the wire
 which should have never been sent in the first place.  This patch corrects
 this problem by verifying the sensitivity level mapping similar to what is
 done with the category mapping.  This patch also changes the returned error
 code in this case to -EPERM to better match what the category mapping
 verification code returns.

 Signed-off-by: Paul Moore [EMAIL PROTECTED]
 ---
  net/ipv4/cipso_ipv4.c |7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

I probably should have been more clear in the original patch posting ... this 
is a bugfix patch which I believe should go into 2.6.21 (as well as 
the -stable tree, but I know they like to see it hit Linus' tree first).

-- 
paul moore
linux security @ hp


Re: [git patches] net driver fixes

2007-03-02 Thread Kok, Auke

Linus Torvalds wrote:

> On Thu, 1 Mar 2007, Kok, Auke wrote:
> and lspci says:
>
> 00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network 
> Connection (rev 02)
>
> DMI info isn't very interesting, but it's an all-Intel board:
>
> so it's all-intel chipset, all-intel board, and all-intel BIOS ;)


It's like the devil plays with it. We just discussed adding a piece of text 
about this issue to our README.



>> there have been problems reported with AMT2 on several chipsets (AMT2 is
>> not supported under linux, unlike AMT1), and having it enabled in the BIOS
>> produces this phenomenon.
>
> Is there some way to at least disable AMT2 from the Linux driver (ie I 
> assume this is some issue of Intel not documenting it all - but maybe you 
> can add a turn off that bit to the affected chip).


Our suggestion is (IOW will be in the README) to turn AMT2 off completely in the 
BIOS, but I'll investigate if your suggestion is possible. It may be another 
workaround but this one indeed hurts.


> If I'm not the only one to see it, it's obviously not just my personal 
> ethernet switch bug, but apparently the e1000 becoming confused by some 
> link detection event (and powering down the switch probably just gets it 
> out of its confusion).


No, this fits the description perfectly of this issue. I'll get right on it and 
owe you a patch for the `e1000: not ready for irq` problem too, which seems to 
hold out after tests...


Cheers,

Auke


Re: Network activity LED trigger

2007-03-02 Thread jamal
On Fri, 2007-02-03 at 16:03 +, Richard Purdie wrote:
 On Fri, 2007-03-02 at 10:16 -0500, jamal wrote:

 We already have this API, see drivers/leds ;-)

Very cool ;-> I was not aware of the existence of this API.
Actually I don't think it was available around 2.6.10.

 We have LEDs which show up in sysfs and can be controlled by userspace
 from there. They can also choose to be controlled by kernel LED
 'triggers', for example. we have an IDE disk trigger which shows up
 activity on IDE disks. Florian would like to see a network trigger.
 

This literally covers most of what I wanted; it may be too late to get
rid of that user space program, but it is something I see you already
support ;->

 The LED trigger code is quite generic and designed to have little impact
 on the subsystem its added to, at least in terms of code. As always,
 there will be some runtime overhead though. Ultimately it depends how
 complex you make the trigger (eg. how many options it has) and where 


Well, give me pointers and i will send you a patch for a board i
currently use:
http://download.intel.com/design/telecom/techspec/9635.pdf
which has GPIO LED.
I take it i would have to write a driver using your API?

 and how you hook it into the network subsystem. 
 I know little about the
 network subsystem so this is something others will have to advise on.

Other people may have different opinions: I can't think of something
useful from a network perspective, mostly because you can't make it
generic enough, i.e. some boards will have LEDs for their NICs and some
won't. Just as some boards have activity LEDs for their IDE disks. IOW, I
think general purpose LEDs will probably be very dependent on the
shipping product.

other than that, great work!

cheers,
jamal



Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:38:41 +0100
Eric Dumazet [EMAIL PROTECTED] wrote:

 We currently use a special structure (struct skb_timeval) and plain 'struct 
 timeval' to store packet timestamps in sk_buffs and struct sock.
 
 This has some drawbacks :
 - Fixed resolution of micro second.
 - Waste of space on 64bit platforms where sizeof(struct timeval)=16
 
 I suggest using ktime_t that is a nice abstraction of high resolution time 
 services, currently capable of nanosecond resolution.
 
 As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 
 byte 
 shrink of this structure on 64bit architectures. Some other structures also 
 benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
 frag_queue in ipv6/reassembly.c, ...)

This is even better. Also comparing ktime_t's is easier if some code needs
to do that.


SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
Hello,

this is a rare corner case met by one of HP partners on 2.4.20 on IA64. 
Inspecting the sources of the latest 2.6.20.1 (net/ipv4/tcp_output.c) we can 
see that the bug is still there.

Here is a description of the bug and the suggested fix.

The problem occurs when the remote host (not necessarily Linux - in our case 
it was Solaris) does not implement SWS avoidance on the sender side. If the 
Linux connection socket has rcvbuf < mtu, we can potentially advertise a small 
rcv_wnd for a long time (SWS).

The problem is due to SWS avoidance as implemented in __tcp_select_window(). 
Everything works fine when rcvbuf > mtu. But if we use a small rcvbuf (set by 
SO_RCVBUF), we can go into SWS mode. Let us for simplicity look only at the 
case when we don't have WS enabled. If we have free_space above full_space/2, 
we reach the following section:


	/* Don't do rounding if we are using window scaling, since the
	 * scaled window will not line up with the MSS boundary anyway.
	 */
	window = tp->rcv_wnd;
	if (tp->rx_opt.rcv_wscale) {
		<snip>
	} else {
		/* Get the largest window that is a nice multiple of mss.
		 * Window clamp already applied above.
		 * If our current window offering is within 1 mss of the
		 * free space we just keep it. This prevents the divide
		 * and multiply from happening most of the time.
		 * We also don't do any window rounding when the free space
		 * is too small.
		 */
(1)		if (window <= free_space - mss || window > free_space)
			window = (free_space/mss)*mss;
	}

	return window;

What happens if we have a small tp->rcv_wnd and rcvbuf = mss? In this case 
condition (1) is almost always false and as a result we'll return the 
unmodified 'window' set to tp->rcv_wnd. If tp->rcv_wnd is small, it can be 
reused over and over again.

For the case rcvbuf = mss, __tcp_select_window() returns:

  0             if we have free_space < full_space/2    OK
  mss           if rcvbuf is empty                      OK
  tp->rcv_wnd   in other cases                          Bad


If there is no SWS avoidance on the sender side, we can see Linux advertising 
the same small rcv_wnd over and over again. The problem here is that we never 
advertise one-half the receiver's buffer space, as described e.g. in
TCP/IP Illustrated by Stevens (v.1, Chapter 22.3):

"The normal algorithm is for the receiver not to advertise a larger window 
than it is currently advertising (which can be 0) until the window can be 
increased by either one full-sized segment (i.e. the MSS being received) or by 
one-half the receiver's buffer space, whichever is smaller"

The fix.


We have not been able to reproduce the problem inside HP, as it is unclear what 
conditions are needed to bring the system into SWS mode (this needs very special 
event timing). The HP customer was seeing it every 2-3 days while running a 
custom application (Solaris-Linux) with low priority on a busy host running 
other custom applications with SCHED_RR. After going into SWS mode, the 
application stayed in it until restarted.

We provided the customer with a fix for 2.4.20 only (the version the customer 
uses in production) by adding another test and returning rcvbuf/2 when needed:

--- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
+++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
@@ -641,6 +641,7 @@
  * Note, we don't adjust for TIMESTAMP or SACK option bytes.
  * Regular options like TIMESTAMP are taken into account.
  */
+static const char *SWS_id_string="@#SWS-fix-2";
 u32 __tcp_select_window(struct sock *sk)
 {
 	struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
@@ -682,6 +683,9 @@
 	window = tp->rcv_wnd;
 	if (window <= free_space - mss || window > free_space)
 		window = (free_space/mss)*mss;
+	/* A fix for small rcvbuf [EMAIL PROTECTED] */
+	else if (mss == full_space && window < full_space/2)
+		window = full_space/2;
 
 	return window;
 }


The customer has confirmed that this resolves the problem and decreases the CPU 
usage of his custom application, even when there is no SWS.
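
The effect of the extra branch can be checked with a small userspace model of the non-window-scaling path. This is only a sketch under stated assumptions: sws_select_window and its apply_fix flag are hypothetical names, the rcv_ssthresh clamp and memory-pressure handling of the real function are left out, and mss is assumed to be positive.

```c
/*
 * Userspace model of the tail of __tcp_select_window() without window
 * scaling. cur_wnd plays the role of tp->rcv_wnd. When apply_fix is
 * non-zero, the extra branch from the 2.4.20 patch is included.
 * Assumes mss > 0 and full_space > 0.
 */
int sws_select_window(int cur_wnd, int free_space, int full_space,
                      int mss, int apply_fix)
{
    int window = cur_wnd;

    /* mss is never allowed to exceed the buffer size. */
    if (mss > full_space)
        mss = full_space;

    /* Less than half the buffer free and no room for a segment. */
    if (free_space < full_space / 2 && free_space < mss)
        return 0;

    /* Condition (1): round to a multiple of mss, or keep the old window. */
    if (window <= free_space - mss || window > free_space)
        window = (free_space / mss) * mss;
    else if (apply_fix && mss == full_space && window < full_space / 2)
        window = full_space / 2;   /* never stay below half the buffer */

    return window;
}
```

For example, with a 1000-byte buffer, a 1460-byte MSS, 900 bytes free and a current window of 100, the model without the fix keeps returning 100, while with the fix it returns 500.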


This is a rare corner case and most users will never meet it. But as the fix 
is trivial, I think it makes sense to include it in upstream sources. 

Regards,
Alex

-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: [PATCH] spidernet: Fix problem sending IP fragments

2007-03-02 Thread Norbert Eicker
On Fri 2.3.2007 00:34, Linas Vepstas wrote:
 On Thu, Mar 01, 2007 at 04:52:54PM -0600, Chris Engel wrote:
  I tried to apply this patch to 2.6.21-rc2 and CHECKSUM_HW appears
  to be changed to CHECKSUM_COMPLETE

Oops. I did not test this on the actual 2.6.21-rc2 before sending it.
It worked fine for me on 2.6.18.

In the meantime I tested the patch below on 2.6.21.

 The use of CHECKSUM_HW was replaced by CHECKSUM_PARTIAL and
 CHECKSUM_COMPLETE on a case-by-case basis, in the patch series leading
 up to 2.6.19.  In this case, I'm not sure which should have been
 used.

In fact CHECKSUM_COMPLETE seems to be used on the receiving side, while
CHECKSUM_PARTIAL is the one to be used when sending frames. Thus the
latter is the one to choose.
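
The resulting transmit-side rule can be written down as a tiny standalone predicate. This is a sketch only: the enum and the function tx_wants_hw_csum are hypothetical stand-ins for the kernel's skb->ip_summed values and the check in the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's skb->ip_summed states. */
enum csum_state {
    CSUM_NONE,
    CSUM_UNNECESSARY,
    CSUM_COMPLETE,   /* receive side: hardware summed the whole packet */
    CSUM_PARTIAL     /* transmit side: the stack left the checksum to hardware */
};

/* Ask the hardware for a TCP/UDP checksum only for IPv4 frames that the
 * stack explicitly marked as partially checksummed. */
static inline bool tx_wants_hw_csum(bool is_ipv4, enum csum_state ip_summed)
{
    return is_ipv4 && ip_summed == CSUM_PARTIAL;
}
```

Frames that are not marked this way, such as IP fragments whose checksum was already computed in software, are then sent without asking the hardware to touch the checksum.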

 Norbert, can you resubmit a patch that applies to a more recent
 kernel? p.s. your emailer replaced tabs by spaces ...

so here's the new one:

Fix problem sending IP fragments on spidernet.

Signed-off-by: Norbert Eicker [EMAIL PROTECTED]
---
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 3b91af8..e3019d5 100644
--- a/drivers/net/spider_net.c
+++ b/drivers/net/spider_net.c
@@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
 			SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
 	spin_unlock_irqrestore(&chain->lock, flags);
 
-	if (skb->protocol == htons(ETH_P_IP))
+	if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
 		switch (skb->nh.iph->protocol) {
 		case IPPROTO_TCP:
 			hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;



Re: s390 allmodconfig

2007-03-02 Thread Martin Schwidefsky
On Fri, 2007-03-02 at 12:11 +0100, Johannes Berg wrote:
 On Fri, 2007-03-02 at 03:06 -0800, Andrew Morton wrote:
  s390 is weird ;)   There's no way it'll support any of the hardware which 
  you're
  working on (until they release the s390 laptop).  So all we really want to
  do here is to avoid breaking s390 allmodconfig.

Well, I would not say weird but different. None of the usual device
attachments is present on a s390. That includes memory mapped i/o (!).

 Alright. I think we'll probably have to make bcm43xx and b44 depend on
 SSB instead of selecting it like the LED trigger stuff below.
 
 But I don't see why s390 can't include hw random, led trigger or even
 hid, those are all software features afaict.

True. I'm still sitting on a couple of patches that make s390 use the
standard drivers/Kconfig. The downside of these patches is that I have
to add a lot of depends on !S390 all over the place.

  OK, I'll try that, thanks.
 
 Not that it'll actually help get the compile through... bcm43xx will
 drop fail and bluetooth probably as well.

No bcm43xx, no bluetooth on s390..

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.




Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Ben Greear

Pekka Savola wrote:

On Thu, 1 Mar 2007, Stephen Hemminger wrote:

What about implementing the unused arp_announce flag on the inetdevice?
Something like the following.  Totally untested...

Looks like it either was there (and got removed) or was planned but
never implemented.

IN_DEV_ARP_ANNOUNCE is in 2.6.18, at least..used in arp_solicit in arp.c

I really hope this didn't get removed because I find it very useful!

But, you could certainly add another sysctl...

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner

Alex Sidorenko wrote:
[snip]

--- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
+++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
@@ -641,6 +641,7 @@
  * Note, we don't adjust for TIMESTAMP or SACK option bytes.
  * Regular options like TIMESTAMP are taken into account.
  */
+static const char *SWS_id_string="@#SWS-fix-2";
 u32 __tcp_select_window(struct sock *sk)
 {
 	struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
@@ -682,6 +683,9 @@
 	window = tp->rcv_wnd;
 	if (window <= free_space - mss || window > free_space)
 		window = (free_space/mss)*mss;
+/* A fix for small rcvbuf [EMAIL PROTECTED] */
+	else if (mss == full_space && window < full_space/2)
+		window = full_space/2;
 
 	return window;
 }


Good analysis of the problem, but the patch does not look quite right. 
In particular, you can't ever announce a zero window. :)


I think this attached patch does the correct SWS avoidance.

Thanks,
  -John

Do receiver-side SWS avoidance for rcvbuf < MSS.

Signed-off-by: John Heffner [EMAIL PROTECTED]

---
commit 38d33181c93a28cf7fb2f9f3377305a04636c054
tree 503f8a9de6e78694bae9fc2eb1c9dd5d26a0b5ed
parent 562aa1d4c6a874373f9a48ac184f662fbbb06a04
author John Heffner [EMAIL PROTECTED] Fri, 02 Mar 2007 13:47:44 -0500
committer John Heffner [EMAIL PROTECTED] Fri, 02 Mar 2007 13:47:44 -0500

 net/ipv4/tcp_output.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dc15113..688b955 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1607,6 +1607,9 @@ u32 __tcp_select_window(struct sock *sk)
 		 */
 		if (window <= free_space - mss || window > free_space)
 			window = (free_space/mss)*mss;
+		else if (mss == full_space &&
+		         free_space > window + full_space/2)
+			window = free_space;
 	}
 
 	return window;
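
To make the rule in this hunk easier to follow, here is a rough userspace model of the non-window-scaling path of __tcp_select_window() with the extra branch added. It is a sketch only: select_window_model() and its parameters are hypothetical names, memory-pressure handling and window scaling are left out, and mss is assumed to be positive.

```c
#include <stdint.h>

/*
 * Simplified model of receive-window selection without window scaling.
 *   free_space   - bytes currently free in the receive buffer
 *   full_space   - total usable receive buffer
 *   mss          - receive MSS estimate (assumed > 0)
 *   rcv_ssthresh - current window clamp
 *   window       - window currently advertised
 */
static uint32_t select_window_model(int free_space, int full_space, int mss,
				    int rcv_ssthresh, int window)
{
	if (mss > full_space)
		mss = full_space;	/* rcvbuf smaller than one segment */

	if (free_space < full_space / 2) {
		if (free_space < mss)
			return 0;	/* zero window until space frees up */
	}

	if (free_space > rcv_ssthresh)
		free_space = rcv_ssthresh;

	if (window <= free_space - mss || window > free_space)
		window = (free_space / mss) * mss;
	else if (mss == full_space &&
		 free_space > window + full_space / 2)
		window = free_space;	/* the branch added by the patch */

	return (uint32_t)window;
}
```

For a 708-byte buffer (the size seen in the customer trace later in this thread), the model only grows a small advertised window once more than half the buffer beyond it is free, and it can still return zero when free space drops below both half the buffer and the clamped MSS.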


Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread David Miller
From: Krzysztof Halasa [EMAIL PROTECTED]
Date: Fri, 02 Mar 2007 16:29:06 +0100

 Andrew Morton [EMAIL PROTECTED] writes:
 
  However, in drivers/net/wan/hdlc_cisco.c, in function
  static int cisco_ioctl(struct net_device *dev, struct ifreq *ifr),
  where dev->hard_header is assigned a valid function, and
  dev->hard_header_cache is assigned a known value (NULL),
  dev->header_cache_update is not set to a known value:
 
 Right, it seems I was never aware of dev->header_cache_update existence.
 I wonder where does the non-NULL value come from? Nevermind.
 
  diff -puN drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update drivers/net/wan/hdlc_cisco.c
  --- a/drivers/net/wan/hdlc_cisco.c~cisco_ioctl-initialise-header_cache_update
  +++ a/drivers/net/wan/hdlc_cisco.c
  @@ -366,6 +366,7 @@ static int cisco_ioctl(struct net_device
   			dev->hard_start_xmit = hdlc->xmit;
   			dev->hard_header = cisco_hard_header;
   			dev->hard_header_cache = NULL;
  +			dev->header_cache_update = NULL;
   			dev->type = ARPHRD_CISCO;
   			dev->flags = IFF_POINTOPOINT | IFF_NOARP;
   			dev->addr_len = 0;
  _
 
 ACK, I think it's the best place.

I disagree, you can't leave dangling references to functions
which are potentially inside of unloaded modules, as this code
does.

Rather, HDLC Cisco should implement a proper protocol destructor
method to clean up these function pointers.
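
The shape of such a destructor can be sketched in userspace C. Everything here is hypothetical (struct netdev_model, model_proto_attach(), model_proto_detach()); it is not the HDLC API, only an illustration of the idea that whoever installs function pointers on the device is also responsible for clearing them before the code behind them can go away.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the relevant net_device hooks. */
struct netdev_model {
	int  (*hard_header)(void);
	int  (*hard_header_cache)(void);
	void (*header_cache_update)(void);
};

/* Stand-in for a header builder living in a loadable protocol module. */
static int model_cisco_hard_header(void)
{
	return 0;
}

/* Protocol attach: install our hooks, set the rest to known values. */
static void model_proto_attach(struct netdev_model *dev)
{
	dev->hard_header = model_cisco_hard_header;
	dev->hard_header_cache = NULL;
	dev->header_cache_update = NULL;
}

/* Protocol detach ("destructor"): leave no pointers into our code. */
static void model_proto_detach(struct netdev_model *dev)
{
	dev->hard_header = NULL;
	dev->hard_header_cache = NULL;
	dev->header_cache_update = NULL;
}
```

With a detach step like this, the next protocol to attach never inherits a stale pointer, so it does not need to know which fields its predecessor touched.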



Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 11:12:12 -0500

 On Wednesday, February 28 2007 3:01:31 pm Paul Moore wrote:
  The current CIPSO engine has a problem where it does not verify that the
  given sensitivity level has a valid CIPSO mapping when the std CIPSO DOI
  type is used.  The end result is that bad packets are sent on the wire
  which should have never been sent in the first place.  This patch corrects
  this problem by verifying the sensitivity level mapping similar to what is
  done with the category mapping.  This patch also changes the returned error
  code in this case to -EPERM to better match what the category mapping
  verification code returns.
 
  Signed-off-by: Paul Moore [EMAIL PROTECTED]
  ---
   net/ipv4/cipso_ipv4.c |7 ---
   1 file changed, 4 insertions(+), 3 deletions(-)
 
 I probably should have been more clear in the original patch posting ... this 
 is a bugfix patch which I believe should go into 2.6.21 (as well as 
 the -stable tree, but I know they like to see it hit Linus' tree first).

I realize this and plan to apply the patch, I'm just backlogged
at the moment.


Re: SWS for rcvbuf < MTU

2007-03-02 Thread David Miller
From: Alex Sidorenko [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 11:28:28 -0500

 Customer has confirmed that this resolves the problem and decreases
 CPU usage by his custom application - even when there is no SWS.

There is rarely ever a reason to set explicit socket receive
buffer sizes, since the kernel dynamically sizes them based
upon how the connection is used.

Why do they set it so low?

It is just as easy to fix their performance bug by simply removing
SO_RCVBUF setting in the application.


Re: Netem tfifo implementation

2007-03-02 Thread Patrick McHardy
Ritesh Kumar wrote:
 Hi,
I recently saw the qdisc tfifo in the netem module
 (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
 to 2.6.20. As I understand, tfifo helps in keeping the queue of
 packets sorted according to their time_to_send. [tfifo was not
 present in 2.6.14 perhaps because arrival order of packets was always
 equal to the departure order]. However, tfifo uses a linear search in
 the packet queue to find where to enqueue the packet.
Quite some time ago (2.6.14 era), I needed a similar functionality
 from the netem module and I ended up coding a pointer based min-heap
 for the same. I was wondering if the community was interested in using
 the min-heap implementation to replace the linear search
 implementation. I have tested the min-heap quite a few times and it
 seems to work.
The implementation is slightly non-trivial because it uses
 pointers to maintain the heap structure instead of using good old
 fixed size arrays. I did this mainly so that the limit of the netem
 qdisc could be changed on the fly. However, because every sk_buff now
 needs two pointers for its children nodes, I added an extra
 (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
 be changed to a pointer inside netem_skb_cb.  Also, because I needed
 this for personal work and 2.6.14 didn't contain tfifo, I basically
 removed the embedded qdisc and made netem a classless qdisc with my
 min heap as the native queue (sorry again! :) )

The tfifo qdisc has a limit, why not just allocate a fixed-size heap
based on that?
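
For reference, a fixed-size array min-heap keyed on the send time could look roughly like the following. This is a userspace sketch under stated assumptions, not netem code: struct tq_heap, tq_push() and tq_pop() are hypothetical names, TQ_LIMIT stands in for the qdisc limit, and the packet is just an opaque pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define TQ_LIMIT 1000	/* hypothetical stand-in for the qdisc limit */

struct tq_entry {
	uint64_t time_to_send;
	void *pkt;		/* opaque packet handle */
};

struct tq_heap {
	size_t len;
	struct tq_entry slot[TQ_LIMIT];
};

static void tq_init(struct tq_heap *h)
{
	h->len = 0;
}

/* Insert in O(log n); returns -1 when the heap is at its limit. */
static int tq_push(struct tq_heap *h, uint64_t time_to_send, void *pkt)
{
	size_t i;

	if (h->len >= TQ_LIMIT)
		return -1;

	i = h->len++;
	while (i > 0) {			/* sift up */
		size_t parent = (i - 1) / 2;

		if (h->slot[parent].time_to_send <= time_to_send)
			break;
		h->slot[i] = h->slot[parent];
		i = parent;
	}
	h->slot[i].time_to_send = time_to_send;
	h->slot[i].pkt = pkt;
	return 0;
}

/* Remove the earliest entry in O(log n); returns -1 when empty. */
static int tq_pop(struct tq_heap *h, struct tq_entry *out)
{
	struct tq_entry last;
	size_t i = 0;

	if (h->len == 0)
		return -1;

	*out = h->slot[0];
	last = h->slot[--h->len];
	for (;;) {			/* sift down */
		size_t child = 2 * i + 1;

		if (child >= h->len)
			break;
		if (child + 1 < h->len &&
		    h->slot[child + 1].time_to_send < h->slot[child].time_to_send)
			child++;
		if (last.time_to_send <= h->slot[child].time_to_send)
			break;
		h->slot[i] = h->slot[child];
		i = child;
	}
	if (h->len > 0)
		h->slot[i] = last;
	return 0;
}
```

One caveat of this design: a plain binary heap is not stable, so two packets with the same time_to_send may leave in a different order than they arrived unless a sequence number is added to the key.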



Re: [RFC] Arp announce (for Xen)

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 10:29:53 -0800
Ben Greear [EMAIL PROTECTED] wrote:

 Pekka Savola wrote:
  On Thu, 1 Mar 2007, Stephen Hemminger wrote:
  What about implementing the unused arp_announce flag on the inetdevice?
  Something like the following.  Totally untested...
 
  Looks like it either was there (and got removed) or was planned but
  never implemented.
 IN_DEV_ARP_ANNOUNCE is in 2.6.18, at least..used in arp_solicit in arp.c
 
 I really hope this didn't get removed because I find it very useful!
 
 But, you could certainly add another sysctl...
 
 Thanks,
 Ben
 

Yeah, something new like arp_notify? Or arp_gratuitous?

There are other drivers that do their own arp, they need to be fixed.

-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
On March 2, 2007 02:25:42 pm David Miller wrote:
 From: Alex Sidorenko [EMAIL PROTECTED]
 Date: Fri, 2 Mar 2007 11:28:28 -0500

  Customer has confirmed that this resolves the problem and decreases
  CPU usage by his custom application - even when there is no SWS.

 There is rarely ever a reason to set explicit socket receive
 buffer sizes, since the kernel dynamically sizes them based
 upon how the connection is used.

 Why do they set it so low?

 It is just as easy to fix their performance bug by simply removing
 SO_RCVBUF setting in the application.

Hi David,

they told us that they use small rcvbuf to throttle bandwidth for this 
application. I explained it would be better to use TC for this purpose. They 
agreed and will probably redesign their application in the future, but they 
cannot do it right now. For the same reason they have to use the old 2.4.20 
for a while - in big companies the important production software cannot be 
changed quickly. 

The fix I suggested is trivial and should have no impact in the case of 
rcvbuf > MTU, so I think it makes sense to include it in the upstream kernel.

Regards,
Alex


-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: Network access fails unless tcpdump is running?

2007-03-02 Thread Andy Gospodarek
On Thu, Mar 01, 2007 at 06:27:18PM -0500, Marc D Ronell wrote:
 That's correct. It's the wired interface, eth0, which is having the
 problem.  I have turned the wireless interface, eth2 off with both
 ifconfig and ifdown, and still, the connection to the outside only
 works when tcpdump is running.
 

Good to know.

  Can you post the output from `ethtool -i ethX` (where ethX is the wired
  interface).  I ask because that tells me what version of the b44/ipw3945
  driver you are using.
 
 
 
 # ethtool -i eth0
 driver: b44
 version: 1.01
 firmware-version:
 bus-info: :03:00.0
 
 
 The system was originally working fine, but something changed,
 perhaps through a Debian aptitude update.

Any chance you can boot back to the old kernel (the one where it was
working) and run ethtool -i eth0 on that one to see what version of
the driver was used there?  It's hard to know what may have changed
between the 2 versions of the driver since I don't know the starting
point.

It's also hard to know if this is fixed already since you aren't running
the latest upstream kernel.  Downloading, building, and testing the
latest from kernel.org would be a good way to know if this is already
fixed.



Re: SWS for rcvbuf < MTU

2007-03-02 Thread Alex Sidorenko
On March 2, 2007 01:54:45 pm John Heffner wrote:
 Alex Sidorenko wrote:
 [snip]

  --- net/ipv4/tcp_output.c.orig  Wed May  3 20:40:43 2006
  +++ net/ipv4/tcp_output.c   Tue Jan 30 14:24:56 2007
  @@ -641,6 +641,7 @@
     * Note, we don't adjust for TIMESTAMP or SACK option bytes.
     * Regular options like TIMESTAMP are taken into account.
     */
  +static const char *SWS_id_string="@#SWS-fix-2";
   u32 __tcp_select_window(struct sock *sk)
   {
   	struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
  @@ -682,6 +683,9 @@
   	window = tp->rcv_wnd;
   	if (window <= free_space - mss || window > free_space)
   		window = (free_space/mss)*mss;
  +/* A fix for small rcvbuf [EMAIL PROTECTED] */
  +	else if (mss == full_space && window < full_space/2)
  +		window = full_space/2;
   
   	return window;
   }

 Good analysis of the problem, but the patch does not look quite right.
 In particular, you can't ever announce a zero window. :)

Hi John,

in the case when (free_space < full_space/2) we do not reach the modified
code and we will return zero:

	if (free_space < full_space/2) {
		icsk->icsk_ack.quick = 0;

		if (tcp_memory_pressure)
			tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U*tp->advmss);

		if (free_space < mss)
			return 0;
	}

Here is how windows look with the fixed kernel (from customer's test):

20:59:45.320758 Node1.logical.40171 > 11.0.0.1.39909: win = 708
20:59:45.322758 Node1.logical.40171 > 11.0.0.1.39909: win = 288
20:59:45.714567 Node1.logical.40171 > 11.0.0.1.39909: win = 354
20:59:45.717110 Node1.logical.40171 > 11.0.0.1.39909: win = 0
20:59:45.719110 Node1.logical.40171 > 11.0.0.1.39909: win = 708
...

Regards,
Alex

 I think this attached patch does the correct SWS avoidance.

 Thanks,
-John



-- 
--
Alexandre Sidorenko email: [EMAIL PROTECTED]
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
--


Re: SWS for rcvbuf < MTU

2007-03-02 Thread David Miller
From: Alex Sidorenko [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 15:21:58 -0500

 they told us that they use small rcvbuf to throttle bandwidth for this 
 application. I explained it would be better to use TC for this purpose. They 
 agreed and will probably redesign their application in the future, but they 
 cannot do it right now. For the same reason they have to use the old 2.4.20 
 for a while - in big companies the important production software cannot be 
 changed quickly. 
 
 The fix I suggested is trivial and should have no impact in the case of 
 rcvbuf > MTU, so I think it makes sense to include it in the upstream kernel.

I have no objection to the fix, especially John's version.

I was just curious about the app, thanks for the info :)


Re: Extensible hashing and RCU

2007-03-02 Thread Michael K. Edwards

On 3/2/07, Eric Dumazet [EMAIL PROTECTED] wrote:

Thank you for this report. (Still avoiding cache misses studies, while they
obviously are the limiting factor)


1)  The entire point of going to a tree-like structure would be to
allow the leaves to age out of cache (or even forcibly evict them)
when the structure bloats (generally under DDoS attack), on the theory
that most of them are bogus and won't be referenced again.  It's not
about the speed of the data structure -- it's about managing its
impact on the rest of the system.

2)  The other entire point of going to a tree-like structure is that
they're drastically simpler to RCU than hashes, and more generally
they don't involve individual atomic operations (RCU reaping passes,
resizing, etc.) that cause big latency hiccups and evict a bunch of
other stuff from cache.

3)  The third entire point of going to a tree-like structure is to
have a richer set of efficient operations, since you can give them a
second priority-type index and have pluck-highest-priority-item,
three-sided search, and bulk delete operations.  These aren't that
much harder to RCU than the basic modify-existing-node operation.

Now can we give these idiotic micro-benchmarks a rest until Robert's
implementation is tuned and ready for stress-testing?

Cheers,
- Michael


Re: Fix bugs in "Whether sock accept queue is full" checking

2007-03-02 Thread David Miller
From: weidong [EMAIL PROTECTED]
Date: Wed, 14 Feb 2007 11:30:57 -0500

 diff -ruN old/include/net/sock.h new/include/net/sock.h
 --- old/include/net/sock.h2007-02-03 08:38:21.0 -0500
 +++ new/include/net/sock.h2007-02-03 08:38:30.0 -0500
 @@ -426,7 +426,7 @@
  
  static inline int sk_acceptq_is_full(struct sock *sk)
  {
 -	return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
 +	return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
  }
  
  /*

I've applied this patch, and also fixed a similar case
I spotted in AF_UNIX after doing a quick audit.

Thank you.

commit 626d548a8d145a032cff9237245f8ac9d9056ac1
Author: David S. Miller [EMAIL PROTECTED]
Date:   Fri Mar 2 12:49:23 2007 -0800

[AF_UNIX]: Test against sk_max_ack_backlog properly.

This brings things in line with the sk_acceptq_is_full() bug
fix.  The limit test should be x >= sk_max_ack_backlog.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6069716..51ca438 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -934,7 +934,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo)
 
 	sched = !sock_flag(other, SOCK_DEAD) &&
 		!(other->sk_shutdown & RCV_SHUTDOWN) &&
-		(skb_queue_len(&other->sk_receive_queue) >
+		(skb_queue_len(&other->sk_receive_queue) >=
 		 other->sk_max_ack_backlog);
 
 	unix_state_runlock(other);
@@ -1008,7 +1008,7 @@ restart:
 	if (other->sk_state != TCP_LISTEN)
 		goto out_unlock;
 
-	if (skb_queue_len(&other->sk_receive_queue) >
+	if (skb_queue_len(&other->sk_receive_queue) >=
 	    other->sk_max_ack_backlog) {
 		err = -EAGAIN;
 		if (!timeo)
@@ -1381,7 +1381,7 @@ restart:
 		}
 
 		if (unix_peer(other) != sk &&
-		    (skb_queue_len(&other->sk_receive_queue) >
+		    (skb_queue_len(&other->sk_receive_queue) >=
 		     other->sk_max_ack_backlog)) {
 			if (!timeo) {
 				err = -EAGAIN;
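
The difference between the two comparisons is a classic off-by-one, which a small userspace model makes visible. The names here (struct acceptq_model, admitted_before_full()) are hypothetical and nothing below is kernel code; the sketch simply counts how many entries get queued before each version of the test reports "full".

```c
#include <stdbool.h>

/* Hypothetical model of the two counters involved in the test. */
struct acceptq_model {
	unsigned int backlog;		/* entries currently queued */
	unsigned int max_backlog;	/* configured limit */
};

/* Old test: full only once the limit has been exceeded. */
static bool acceptq_is_full_gt(const struct acceptq_model *q)
{
	return q->backlog > q->max_backlog;
}

/* New test: full as soon as the limit is reached. */
static bool acceptq_is_full_ge(const struct acceptq_model *q)
{
	return q->backlog >= q->max_backlog;
}

typedef bool (*acceptq_full_fn)(const struct acceptq_model *);

/* Queue entries one by one until the given test says "full". */
static unsigned int admitted_before_full(unsigned int max_backlog,
					 acceptq_full_fn is_full)
{
	struct acceptq_model q = { 0, max_backlog };

	while (!is_full(&q))
		q.backlog++;
	return q.backlog;
}
```

With a limit of 5, the ">" form admits six entries and the ">=" form admits five. Which of the two is the intended meaning of the limit is a question about backlog semantics that this model does not settle.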


Re: Network access fails unless tcpdump is running?

2007-03-02 Thread Marc D Ronell
Andy Gospodarek [EMAIL PROTECTED] writes:


 Any chance you can boot back to the old kernel (the one where it was
 working) and run ethtool -i eth0 on that one to see what version of
 the driver was used there?  It's hard to know what may have changed
 between the 2 versions of the driver since I don't know the starting
 point.

 It's also hard to know if this is fixed already since you aren't running
 the latest upstream kernel.  Downloading, building, and testing the
 latest from kernel.org would be a good way to know if this is already
 fixed.


I had already loaded,  compiled, and tested linux-2.6.20.1.  There was
no change with the newer kernel.  Network connections only worked when
tcpdump was running.

Similar for booting with an  older kernel 2.6.17.  I think the problem
is not with the kernel, but with other system software.  It could take
a while to debug, so I am just rebuilding.

Thanks for your help.

marc




Re: Netem tfifo implementation

2007-03-02 Thread Ritesh Kumar

On 3/2/07, Patrick McHardy [EMAIL PROTECTED] wrote:

Ritesh Kumar wrote:
 Hi,
I recently saw the qdisc tfifo in the netem module
 (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
 to 2.6.20. As I understand, tfifo helps in keeping the queue of
 packets sorted according to their time_to_send. [tfifo was not
 present in 2.6.14 perhaps because arrival order of packets was always
 equal to the departure order]. However, tfifo uses a linear search in
 the packet queue to find where to enqueue the packet.
Quite some time ago (2.6.14 era), I needed a similar functionality
 from the netem module and I ended up coding a pointer based min-heap
 for the same. I was wondering if the community was interested in using
 the min-heap implementation to replace the linear search
 implementation. I have tested the min-heap quite a few times and it
 seems to work.
The implementation is slightly non-trivial because it uses
 pointers to maintain the heap structure instead of using good old
 fixed size arrays. I did this mainly so that the limit of the netem
 qdisc could be changed on the fly. However, because every sk_buff now
 needs two pointers for its children nodes, I added an extra
 (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
 be changed to a pointer inside netem_skb_cb.  Also, because I needed
 this for personal work and 2.6.14 didn't contain tfifo, I basically
 removed the embedded qdisc and made netem a classless qdisc with my
 min heap as the native queue (sorry again! :) )

The tfifo qdisc has a limit, why not just allocate a fixed-size heap
based on that?




The tfifo queue limit itself can be changed and that creates the
problem. If we use a fixed heap (say implemented using a fixed size
array) then we will have to copy over all pointers from the first
array to a reallocated array whenever the queue limit is changed.
In retrospect, moving just a few 10s of kilobytes of data doesn't seem
that much of a problem... now I feel stupid having put so much effort
:).

Ritesh


Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:38:41 +0100
Eric Dumazet [EMAIL PROTECTED] wrote:

 We currently use a special structure (struct skb_timeval) and plain 'struct 
 timeval' to store packet timestamps in sk_buffs and struct sock.
 
 This has some drawbacks :
 - Fixed resolution of micro second.
 - Waste of space on 64bit platforms where sizeof(struct timeval)=16
 
 I suggest using ktime_t that is a nice abstraction of high resolution time 
 services, currently capable of nanosecond resolution.
 
 As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 
 byte 
 shrink of this structure on 64bit architectures. Some other structures also 
 benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
 frag_queue in ipv6/reassembly.c, ...)
 
 

You missed a couple of spots.

--- tcp-2.6.orig/net/sunrpc/svcsock.c	2007-03-02 12:50:45.0 -0800
+++ tcp-2.6/net/sunrpc/svcsock.c	2007-03-02 12:58:28.0 -0800
@@ -805,16 +805,9 @@
 		/* possibly an icmp error */
 		dprintk("svc: recvfrom returned error %d\n", -err);
 	}
-	if (skb->tstamp.off_sec == 0) {
-		struct timeval tv;
 
-		tv.tv_sec = xtime.tv_sec;
-		tv.tv_usec = xtime.tv_nsec / NSEC_PER_USEC;
-		skb_set_timestamp(skb, &tv);
-		/* Don't enable netstamp, sunrpc doesn't
-		   need that much accuracy */
-	}
-	skb_get_timestamp(skb, &svsk->sk_sk->sk_stamp);
+	svsk->sk_sk->sk_stamp = (skb->tstamp.tv64 != 0) ? skb->tstamp
+						: ktime_get_real();
 	set_bit(SK_DATA, &svsk->sk_flags); /* there may be more data... */
 
 	/*
--- tcp-2.6.orig/kernel/time.c	2007-03-02 12:59:55.0 -0800
+++ tcp-2.6/kernel/time.c	2007-03-02 13:00:08.0 -0800
@@ -469,6 +469,8 @@
 
 	return tv;
 }
+EXPORT_SYMBOL(ns_to_timeval);
+
 
 /*
  * Convert jiffies to milliseconds and back.
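
The space and resolution argument quoted above can be illustrated outside the kernel. The sketch below assumes a two-long timeval layout (struct timeval_model) and represents the ktime_t-style value as a plain signed 64-bit nanosecond count; the helper names are hypothetical, and only non-negative times are handled.

```c
#include <stdint.h>

/* Assumed layout: two longs, i.e. 16 bytes on a typical 64-bit ABI. */
struct timeval_model {
	long tv_sec;
	long tv_usec;
};

#define MODEL_NSEC_PER_SEC  1000000000LL
#define MODEL_NSEC_PER_USEC 1000LL

/* Pack seconds + microseconds into one 8-byte nanosecond count. */
static int64_t timeval_to_ns_model(struct timeval_model tv)
{
	return (int64_t)tv.tv_sec * MODEL_NSEC_PER_SEC +
	       (int64_t)tv.tv_usec * MODEL_NSEC_PER_USEC;
}

/* Unpack again; sub-microsecond precision is truncated (ns >= 0 only). */
static struct timeval_model ns_to_timeval_model(int64_t ns)
{
	struct timeval_model tv;

	tv.tv_sec = (long)(ns / MODEL_NSEC_PER_SEC);
	tv.tv_usec = (long)((ns % MODEL_NSEC_PER_SEC) / MODEL_NSEC_PER_USEC);
	return tv;
}
```

Storing the single 64-bit value keeps nanosecond resolution in 8 bytes, and the conversion back to a timeval is only needed at the points where user space expects that format.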



-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: Netem tfifo implementation

2007-03-02 Thread Stephen Hemminger
On Fri, 2 Mar 2007 15:56:54 -0500
Ritesh Kumar [EMAIL PROTECTED] wrote:

 On 3/2/07, Patrick McHardy [EMAIL PROTECTED] wrote:
  Ritesh Kumar wrote:
   Hi,
  I recently saw the qdisc tfifo in the netem module
   (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
   to 2.6.20. As I understand, tfifo helps in keeping the queue of
   packets sorted according to their time_to_send. [tfifo was not
   present in 2.6.14 perhaps because arrival order of packets was always
   equal to the departure order]. However, tfifo uses a linear search in
   the packet queue to find where to enqueue the packet.
  Quite some time ago (2.6.14 era), I needed a similar functionality
   from the netem module and I ended up coding a pointer based min-heap
   for the same. I was wondering if the community was interested in using
   the min-heap implementation to replace the linear search
   implementation. I have tested the min-heap quite a few times and it
   seems to work.
  The implementation is slightly non-trivial because it uses
   pointers to maintain the heap structure instead of using good old
   fixed size arrays. I did this mainly so that the limit of the netem
   qdisc could be changed on the fly. However, because every sk_buff now
   needs two pointers for its children nodes, I added an extra
   (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
   be changed to a pointer inside netem_skb_cb.  Also, because I needed
   this for personal work and 2.6.14 didn't contain tfifo, I basically
   removed the embedded qdisc and made netem a classless qdisc with my
   min heap as the native queue (sorry again! :) )
 
  The tfifo qdisc has a limit, why not just allocate a fixed-size heap
  based on that?
 
 
 
 The tfifo queue limit itself can be changed and that creates the
 problem. If we use a fixed heap (say implemented using a fixed size
 array) then we will have to copy over all pointers from the first
 array to a reallocated array whenever the queue limit is changed.
 In retrospect, moving just a few 10s of kilobytes of data doesn't seem
 that much of a problem... now I feel stupid having put so much effort
 :).
 

Tfifo is a special case because:
  * timestamps are stored in skb->cb, so it is only really usable inside
    netem, which adds the timestamps.
  * insertions are cheap because it walks backwards and netem usually has
    tnext > tlast.  Only if you have a huge jitter which causes massive
    reordering, and that is unrealistic, would you see a problem.

You can always make a new qdisc and since netem is classless use yours.
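
The "walks backwards" point can be shown with a small userspace list. This is a sketch with hypothetical names (struct tnode, tlist_insert()), not the tfifo code: it keeps a circular doubly linked list sorted by send time and inserts by scanning from the tail, returning how many nodes it had to step over.

```c
#include <stddef.h>
#include <stdint.h>

struct tnode {
	uint64_t time_to_send;
	struct tnode *prev, *next;
};

/* Circular list with a sentinel head; head.prev is the tail. */
struct tlist {
	struct tnode head;
};

static void tlist_init(struct tlist *l)
{
	l->head.prev = l->head.next = &l->head;
}

/*
 * Insert n so the list stays sorted by time_to_send, scanning from the
 * tail. Returns the number of nodes walked past. Equal timestamps keep
 * arrival order because the scan stops at the first node that is not
 * later than n.
 */
static unsigned int tlist_insert(struct tlist *l, struct tnode *n)
{
	struct tnode *pos = l->head.prev;
	unsigned int steps = 0;

	while (pos != &l->head && pos->time_to_send > n->time_to_send) {
		pos = pos->prev;
		steps++;
	}

	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
	return steps;
}
```

When each new packet is due later than the current tail, the loop body never runs and the insert is constant time; the cost only grows with the amount of reordering, which matches the argument above.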


-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: SWS for rcvbuf < MTU

2007-03-02 Thread John Heffner

David Miller wrote:

From: Alex Sidorenko [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 15:21:58 -0500

they told us that they use small rcvbuf to throttle bandwidth for this 
application. I explained it would be better to use TC for this purpose. They 
agreed and will probably redesign their application in the future, but they 
cannot do it right now. For the same reason they have to use the old 2.4.20 
for a while - in big companies the important production software cannot be 
changed quickly. 

The fix I suggested is trivial and should have no impact in the case of 
rcvbuf > MTU, so I think it makes sense to include it in the upstream kernel.


I have no objection to the fix, especially John's version.

I was just curious about the app, thanks for the info :)


Please don't apply the patch I sent.  I've been thinking about this a 
bit harder, and it may not fix this particular problem.  (Hard to say 
without knowing exactly what it is.)  As the comment above 
__tcp_select_window() states, we do not do full receive-side SWS 
avoidance because of header prediction.


Alex, you're right I missed that special zero-window case.  I'm still 
not quite sure I'm completely happy with this patch.  I'd like to think 
about this a little bit harder...


Thanks,
  -John


Re: [PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface

2007-03-02 Thread David Miller
From: Florian Zumbiehl [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 13:38:44 +0100

 As no one seems to have an opinion on this: Here is a patch that does
 work for me and that should solve the problem as far as that is easily
 possible. It is based on the assumption that an interface's ifindex is
 basically an alias for a local MAC address, so incoming packets now are
 matched to sockets based on remote MAC, session id, and ifindex of the
 interface the packet came in on/the socket was bound to by connect().

I agree with your analysis and have applied your patch.

Another way to implement this would have been to store the
pre-computed ifindex on the kernel side sockaddr.
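
The matching rule described in the quoted text (remote MAC, session id, and ifindex) amounts to a three-part lookup key. A minimal userspace sketch, with a hypothetical struct pppoe_key_model and helper that are not the PPPoE driver's own types:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key: all three parts must agree for a match. */
struct pppoe_key_model {
	uint8_t  remote[6];	/* peer MAC address */
	uint16_t sid;		/* PPPoE session id */
	int      ifindex;	/* interface the packet arrived on, or the
				 * one the socket was bound to */
};

static bool pppoe_key_match(const struct pppoe_key_model *a,
			    const struct pppoe_key_model *b)
{
	return a->sid == b->sid &&
	       a->ifindex == b->ifindex &&
	       memcmp(a->remote, b->remote, sizeof(a->remote)) == 0;
}
```

Two sessions that share a peer MAC and session id but live on different interfaces no longer compare equal, which is the collision the patch is meant to rule out.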

Thanks.


Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-03-02 Thread David Miller
From: James Morris [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 15:45:07 -0500 (EST)

 On Wed, 28 Feb 2007, Paul Moore wrote:
 
  The current CIPSO engine has a problem where it does not verify that the 
  given
  sensitivity level has a valid CIPSO mapping when the std CIPSO DOI type is
  used.  The end result is that bad packets are sent on the wire which should
  have never been sent in the first place.  This patch corrects this problem 
  by
  verifying the sensitivity level mapping similar to what is done with the
  category mapping.  This patch also changes the returned error code in this 
  case
  to -EPERM to better match what the category mapping verification code 
  returns.
  
  Signed-off-by: Paul Moore [EMAIL PROTECTED]
 
 [removed redhat-lspp, which is subscriber only]
 
 Acked-by: James Morris [EMAIL PROTECTED]

Applied, thanks everyone.

If -stable inclusion is desired, please submit this patch there.
You can add my signoff if you want:

Signed-off-by: David S. Miller [EMAIL PROTECTED]


Re: PATCH: Second try at vlan mailing list patch.

2007-03-02 Thread David Miller
From: Ben Greear [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 15:25:59 -0800

 Hopefully, by attaching it as a file it will not screw up the tabs & spaces.
 
 Signed-off-by:  Ben Greear [EMAIL PROTECTED]

Nope still doesn't apply.

I can guess that you didn't try emailing the patch to yourself and
applying it?  If so I'm basically still your guinea pig each time you
correct this problem.  How nice that is :-/


Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 17:18:46 -0800

 I was measuring bridging/routing performance and noticed this.
 
 The current code runs the all packet type handlers before calling the
 bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
 this means that each received packet gets run through the Berkeley Packet 
 Filter
 code in sk_run_filter (slow).

I know we closed this out by saying that even though performance
sucks, we can't really apply this without breaking things.

What would be broken is if the DHCP client isn't specifying
a device ifindex when it binds the AF_PACKET socket.  That
would be an easy way to fix this performance problem at the
application level.

The DHCP client should only care about a particular interface's
traffic, the one it wants to listen on.


Re: [PATCH 1/2] [TCP]: Add two new spurious RTO responses to FRTO

2007-03-02 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 1 Mar 2007 13:30:20 +0200 (EET)

 [PATCH] [TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr)
 
 A local variable for icsk was created but this change was
 missing. Spotted by Jarek Poplawski.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied to tcp-2.6, thank you.


Re: [RFC PATCH] [TCP]: Move clearing of the prior_ssthresh due to ECE earlier

2007-03-02 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 1 Mar 2007 22:26:57 +0200 (EET)

 I think that doing it in the response is better than this approach,
 since it knows that the ssthresh has been halved already within that
 round-trip, so there is no need to do that again... I'll submit the
 patch tomorrow... With this prior_ssthresh clearing move alone, the 
 ssthresh ends up being halved twice if I thought it right (first in 
 tcp_enter_frto and then again in tcp_enter_cwr that is called from 
 fastretrans_alert)... So please, drop this patch.

Ok.


Re: [PATCH v2] [TCP]: FRTO undo response falls back to ratehalving one if ECEd

2007-03-02 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 14:34:36 +0200 (EET)

 Undoing ssthresh is disabled in fastretrans_alert whenever
 FLAG_ECE is set by clearing prior_ssthresh. The clearing does
 not protect FRTO because FRTO operates before fastretrans_alert.
 Moving the clearing of prior_ssthresh earlier seems to be a
 suboptimal solution to the FRTO case because then FLAG_ECE will
 cause a second ssthresh reduction in try_to_open (the first
 occurred when FRTO was entered). So instead, FRTO falls back
 immediately to the rate halving response, which switches TCP to
 CA_CWR state preventing the latter reduction of ssthresh.
 
 If the first ECE arrived before the ACK after which FRTO is able
 to decide RTO as spurious, prior_ssthresh is already cleared.
 Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be
 set also in the following ACKs resulting in rate halving response
 that sees TCP is already in CA_CWR, which again prevents an extra
 ssthresh reduction on that round-trip.
 
 If the first ECE arrived before RTO, ssthresh has already been
 adapted and prior_ssthresh remains cleared on entry because TCP
 is in CA_CWR (the same applies also to a case where FRTO is
 entered more than once and ECE comes in the middle).
 
 High_seq must not be touched after tcp_enter_cwr because CWR
 round-trip calculation depends on it.
 
 I believe that after this patch, FRTO should be ECN-safe and
 even able to take advantage of synergy benefits.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied, but I had to apply this by hand, you did not generate
this diff against tcp-2.6

And I'm very angry about this specific case because I told you
EXPLICITLY that I reformatted the switch() statement when I applied
the earlier FRTO patches.

Not only are people expected to patch against tcp-2.6, BUT I TOLD
YOU specifically that I modified your patch in this specific area.

What else do I need to do in order for people to generate clean
patches? :-(  Tell me, I'll do it!!!



Re: SWS for rcvbuf MTU

2007-03-02 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Fri, 02 Mar 2007 16:16:39 -0500

 Please don't apply the patch I sent.  I've been thinking about this a 
 bit harder, and it may not fix this particular problem.  (Hard to say 
 without knowing exactly what it is.)  As the comment above 
 __tcp_select_window() states, we do not do full receive-side SWS 
 avoidance because of header prediction.
 
 Alex, you're right I missed that special zero-window case.  I'm still 
 not quite sure I'm completely happy with this patch.  I'd like to think 
 about this a little bit harder...

Ok


Re: [PATCH] vlan net drivers: avoid a 4-order allocation]

2007-03-02 Thread David Miller
From: Dan Aloni [EMAIL PROTECTED]
Date: Thu, 1 Mar 2007 12:02:17 +0200

 This patch splits the vlan_group struct into a multi-allocated struct. On
 x86_64, the size of the original struct is a little more than 32KB, causing
 a 4-order allocation, which is prone to problems caused by buddy-system 
 external fragmentation conditions.
 
 I couldn't just use vmalloc() because vfree() cannot be called in the
 softirq context of the RCU callback.
 
 Signed-off-by: Dan Aloni [EMAIL PROTECTED]

No objections, this really needs to be fixed, applied.

Thank you.
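
To make the arithmetic in the quoted description concrete: 4096 VLAN ids times
an 8-byte pointer is 32768 bytes, so with 4 KB pages any extra field pushes a
single allocation past order 3 and into order 4. A user-space sketch of the
split-table idea follows. It assumes 8 chunks of 512 pointers; the names
vlan_table, SPLIT_PARTS and PART_LEN are made up for illustration and are not
taken from the applied patch.

```c
#include <stdlib.h>

#define VLAN_IDS    4096                      /* 12-bit VLAN id space */
#define SPLIT_PARTS 8                         /* assumed number of chunks */
#define PART_LEN    (VLAN_IDS / SPLIT_PARTS)  /* 512 pointers per chunk */

/* Hypothetical two-level table: each chunk is its own small allocation. */
struct vlan_table {
    void **parts[SPLIT_PARTS];
};

static void vlan_table_free(struct vlan_table *t)
{
    if (!t)
        return;
    for (int i = 0; i < SPLIT_PARTS; i++)
        free(t->parts[i]);      /* free(NULL) is a no-op */
    free(t);
}

static struct vlan_table *vlan_table_alloc(void)
{
    struct vlan_table *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    for (int i = 0; i < SPLIT_PARTS; i++) {
        t->parts[i] = calloc(PART_LEN, sizeof(void *));
        if (!t->parts[i]) {
            vlan_table_free(t);
            return NULL;
        }
    }
    return t;
}

/* Callers must pass id < VLAN_IDS. */
static void *vlan_table_get(const struct vlan_table *t, unsigned int id)
{
    return t->parts[id / PART_LEN][id % PART_LEN];
}

static void vlan_table_set(struct vlan_table *t, unsigned int id, void *dev)
{
    t->parts[id / PART_LEN][id % PART_LEN] = dev;
}
```

On a 64-bit machine each chunk here is 512 * 8 = 4096 bytes, so no single
allocation is larger than one 4 KB page.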


[PATCH] udp: whitespace fixes

2007-03-02 Thread Stephen Hemminger

The udp code is full of bad indenting, extra whitespace and other
style confusion.  It makes no sense to declare functions that are used
outside the current file (extern) as inline.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/ipv4/udp.c |  402 -
 net/ipv6/udp.c |  175 +---
 2 files changed, 295 insertions(+), 282 deletions(-)

--- tcp-2.6.orig/net/ipv4/udp.c 2007-03-02 12:08:06.0 -0800
+++ tcp-2.6/net/ipv4/udp.c  2007-03-02 12:37:38.0 -0800
@@ -120,8 +120,8 @@
struct hlist_node *node;
 
sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
-   if (sk->sk_hash == num)
-   return 1;
+   if (sk->sk_hash == num)
+   return 1;
return 0;
 }
 
@@ -136,13 +136,13 @@
  */
 int __udp_lib_get_port(struct sock *sk, unsigned short snum,
   struct hlist_head udptable[], int *port_rover,
-  int (*saddr_comp)(const struct sock *sk1,
-const struct sock *sk2 ))
+  int (*saddr_comp) (const struct sock * sk1,
+ const struct sock * sk2))
 {
struct hlist_node *node;
struct hlist_head *head;
struct sock *sk2;
-   int    error = 1;
+   int error = 1;
 
write_lock_bh(&udp_hash_lock);
if (snum == 0) {
@@ -160,8 +160,9 @@
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
-   ((result - sysctl_local_port_range[0]) &
-(UDP_HTABLE_SIZE - 1));
+   ((result -
+ sysctl_local_port_range[0]) &
+(UDP_HTABLE_SIZE - 1));
goto gotit;
}
size = 0;
@@ -175,12 +176,13 @@
;
}
result = best;
-   for(i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++, result += UDP_HTABLE_SIZE) {
+   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE;
+i++, result += UDP_HTABLE_SIZE) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0]
-   + ((result - sysctl_local_port_range[0]) &
-  (UDP_HTABLE_SIZE - 1));
-   if (! __udp_lib_lport_inuse(result, udptable))
+   + ((result - sysctl_local_port_range[0]) &
+  (UDP_HTABLE_SIZE - 1));
+   if (!__udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
@@ -191,13 +193,13 @@
head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
 
sk_for_each(sk2, node, head)
-   if (sk2->sk_hash == snum &&
-   sk2 != sk &&
-   (!sk2->sk_reuse || !sk->sk_reuse) &&
-   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
-|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (*saddr_comp)(sk, sk2) )
-   goto fail;
+   if (sk2->sk_hash == snum &&
+   sk2 != sk &&
+   (!sk2->sk_reuse || !sk->sk_reuse) &&
+   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
+|| sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
+   (*saddr_comp) (sk, sk2))
+   goto fail;
}
inet_sk(sk)->num = snum;
sk->sk_hash = snum;
@@ -212,19 +214,19 @@
return error;
 }
 
-__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
-   int (*scmp)(const struct sock *, const struct sock *))
+int udp_get_port(struct sock *sk, unsigned short snum,
+int (*scmp) (const struct sock *, const struct sock *))
 {
-   return  __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+   return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
 }
 
-inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
+int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
 {
-   struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
+   const struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
 
- 

[RFC 2/2] bridge: per device promiscuous taps

2007-03-02 Thread Stephen Hemminger
Part of the next set of bridge patches includes this.

It allows packet capture by interface on a bridge:
tcpdump -i eth0

will work as expected.

@@ -128,34 +125,45 @@ static inline int is_link_local(const un
 int br_handle_frame(struct net_bridge_port *p, struct sk_buff **pskb)
 {
struct sk_buff *skb = *pskb;
+   struct sk_buff *skb2 = NULL;
const unsigned char *dest = eth_hdr(skb)->h_dest;
 
if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
goto err;

if (unlikely(is_link_local(dest))) {
skb->pkt_type = PACKET_HOST;
return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb->dev,
   NULL, br_handle_local_finish) != 0;
}
+
+   if (unlikely(p->dev->promiscuity > 1))
+   skb2 = skb_clone(skb, GFP_ATOMIC);
 
-   if (p->state == BR_STATE_FORWARDING || p->state == BR_STATE_LEARNING) {
+   switch (p->state) {
+   case BR_STATE_FORWARDING:
if (br_should_route_hook) {
-   if (br_should_route_hook(pskb))
+   if (br_should_route_hook(pskb)) {
+   kfree_skb(skb2);
return 0;
+   }
skb = *pskb;
dest = eth_hdr(skb)->h_dest;
}
 
if (!compare_ether_addr(p->br->dev->dev_addr, dest))
skb->pkt_type = PACKET_HOST;
+   /* fall thru */
 
+   case BR_STATE_LEARNING:
NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
br_handle_frame_finish);
-   return 1;
+   break;
+
+   default:
+   kfree_skb(skb);
}
 
-err:
-   kfree_skb(skb);
-   return 1;
+   if (likely(!skb2))
+   return 1;
+
+   *pskb = skb2;
+   return 0;
 }


[RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Wed, 28 Feb 2007 17:18:46 -0800
 
  I was measuring bridging/routing performance and noticed this.
  
  The current code runs the all packet type handlers before calling the
  bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
  this means that each received packet gets run through the Berkeley Packet 
  Filter
  code in sk_run_filter (slow).
 
 I know we closed this out by saying that even though performance
 sucks, we can't really apply this without breaking things.

wrong.

 What would be broken is if the DHCP client isn't specifying
 a device ifindex when it binds the AF_PACKET socket.  That
 would be an easy way to fix this performance problem at the
 application level.
 
 The DHCP client should only care about a particular interface's
 traffic, the one it wants to listen on.


My assumption is that when bridging, the normal stack path only has
to receive those packets that it would receive if it was not doing
bridging.

A better version of the patch is:
==

The current code runs the all packet type handlers before calling the
bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
this means that each received packet gets run through the Berkeley Packet Filter
code in sk_run_filter. This is significant overhead.

By moving the bridging hook to run first, the packets flowing through
the bridge get filtered out there first. This results in a 14%
improvement in performance.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/core/dev.c |   24 
 1 file changed, 12 insertions(+), 12 deletions(-)

--- netem.orig/net/core/dev.c
+++ netem/net/core/dev.c
@@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge
unsigned char *addr);
 void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent);
 
-static __inline__ int handle_bridge(struct sk_buff **pskb,
-   struct packet_type **pt_prev, int *ret,
-   struct net_device *orig_dev)
+/*
+ * If bridge module is loaded call bridging hook.
+ * when it returns 1, this is a non-local packet
+ */
+int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) __read_mostly;
+static int handle_bridge(struct sk_buff **pskb)
 {
struct net_bridge_port *port;
 
@@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru
(port = rcu_dereference((*pskb)->dev->br_port)) == NULL)
return 0;
 
-   if (*pt_prev) {
-   *ret = deliver_skb(*pskb, *pt_prev, orig_dev);
-   *pt_prev = NULL;
-   }
-
return br_handle_frame_hook(port, pskb);
 }
 #else
-#define handle_bridge(skb, pt_prev, ret, orig_dev) (0)
+#define handle_bridge(pskb) 0
 #endif
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk
}
 #endif
 
+   if (handle_bridge(&skb))
+   goto out;
+
list_for_each_entry_rcu(ptype, &ptype_all, list) {
if (!ptype->dev || ptype->dev == skb->dev) {
if (pt_prev)
@@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk
 ncls:
 #endif
 
-   if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
-   goto out;
-
type = skb->protocol;
list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
if (ptype->type == type &&



-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [PATCH] NET : convert network timestamps to ktime_t

2007-03-02 Thread Eric Dumazet

Stephen Hemminger wrote:

On Fri, 2 Mar 2007 15:38:41 +0100
Eric Dumazet [EMAIL PROTECTED] wrote:

We currently use a special structure (struct skb_timeval) and plain 'struct 
timeval' to store packet timestamps in sk_buffs and struct sock.


This has some drawbacks :
- Fixed resolution of one microsecond.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution time 
services, currently capable of nanosecond resolution.


As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte 
shrink of this structure on 64bit architectures. Some other structures also 
benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct 
frag_queue in ipv6/reassembly.c, ...)





You missed a couple of spots.


Arg yes...



--- tcp-2.6.orig/net/sunrpc/svcsock.c   2007-03-02 12:50:45.0 -0800
+++ tcp-2.6/net/sunrpc/svcsock.c    2007-03-02 12:58:28.0 -0800
@@ -805,16 +805,9 @@
/* possibly an icmp error */
dprintk("svc: recvfrom returned error %d\n", -err);
}
-   if (skb->tstamp.off_sec == 0) {
-   struct timeval tv;
 
-   tv.tv_sec = xtime.tv_sec;
-   tv.tv_usec = xtime.tv_nsec / NSEC_PER_USEC;
-   skb_set_timestamp(skb, &tv);
-   /* Don't enable netstamp, sunrpc doesn't
-  need that much accuracy */
-   }
-   skb_get_timestamp(skb, &svsk->sk_sk->sk_stamp);
+   svsk->sk_sk->sk_stamp = (skb->tstamp.tv64 != 0) ? skb->tstamp
+   : ktime_get_real();


Well, if we want to stay in the spirit of old code, we probably want to use 
current_kernel_time() (+ timespec_to_ktime()), because it's less expensive.


And also setting the skb tstamp, no ?
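
As a side note on the size and resolution argument in the proposal quoted
above, here is a small user-space sketch. It assumes a plain signed 64-bit
nanosecond count as a stand-in for ktime_t; ns_stamp_t, timeval_to_ns() and
ns_to_timeval() are hypothetical names, not kernel API, and only non-negative
timestamps are considered.

```c
#include <stdint.h>
#include <sys/time.h>

/* Stand-in for a scalar ktime_t: signed 64-bit count of nanoseconds. */
typedef int64_t ns_stamp_t;

/* Widen a microsecond-resolution timeval into nanoseconds. */
static ns_stamp_t timeval_to_ns(struct timeval tv)
{
    return (ns_stamp_t)tv.tv_sec * 1000000000LL +
           (ns_stamp_t)tv.tv_usec * 1000LL;
}

/* Narrow back to a timeval; sub-microsecond digits are dropped. */
static struct timeval ns_to_timeval(ns_stamp_t ns)
{
    struct timeval tv;

    tv.tv_sec  = ns / 1000000000LL;
    tv.tv_usec = (ns % 1000000000LL) / 1000LL;
    return tv;
}
```

The 8-byte scalar holds everything the timeval did, plus three more decimal
digits of resolution, which is where the per-structure saving on 64-bit
platforms comes from.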




Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 14:09:29 -0800

 On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
 David Miller [EMAIL PROTECTED] wrote:
 
  From: Stephen Hemminger [EMAIL PROTECTED]
  Date: Wed, 28 Feb 2007 17:18:46 -0800
  
   I was measuring bridging/routing performance and noticed this.
   
   The current code runs the all packet type handlers before calling the
   bridge hook.  If an application (like some DHCP clients) is using 
   AF_PACKET,
   this means that each received packet gets run through the Berkeley Packet 
   Filter
   code in sk_run_filter (slow).
  
  I know we closed this out by saying that even though performance
  sucks, we can't really apply this without breaking things.
 
 wrong.

I disagree, and your patch is still broken because as Jamal
pointed out (which you didn't address in any way) this breaks
traffic classification of bridged traffic as well.

If someone wants their network tap to hear all traffic, they do mean
all traffic, and this includes potentially seeing it multiple times
when things like bridging and virtual devices decap incoming frames.

We can't apply this.

Back to a workable solution, why doesn't DHCP specify a specific
device?  It would fix this performance problem completely, at the
application level.


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)

 Back to a workable solution, why doesn't DHCP specify a specific
 device?  It would fix this performance problem completely, at the
 application level.

Since nobody seems to be able to be bothered to actually look
at what DHCP clients are doing, I actually did and it's no
surprise that broken stuff is happening here.

Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:

	struct sockaddr sa;
	...
	/* Bind to the interface name */
	memset (&sa, 0, sizeof sa);
	sa.sa_family = AF_PACKET;
	strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data);
	if (bind (sock, &sa, sizeof sa)) {
		if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
		    errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
		    errno == EAFNOSUPPORT || errno == EINVAL) {
			log_error ("socket: %m - make sure");
			log_error ("CONFIG_PACKET (Packet socket) %s",
				   "and CONFIG_FILTER");
			log_error ("(Socket Filtering) are enabled %s",
				   "in your kernel");
			log_fatal ("configuration!");
		}
		log_fatal ("Bind socket to interface: %m");
	}

So it puts a string into the sockaddr data, and there
is no mention of sockaddr_ll, which is what is supposed to be
provided as the socket address here, in the entire DHCP tree.

I'm tempted to say I must be missing something here, since I can't see
how this could possibly work at all.  The string passed in should
be interpreted as the ifindex value, and thus trigger a -ENODEV
return from AF_PACKET's bind() implementation.

My suspicions are confirmed by the patch here:

http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Really, this bogus bind() explains everything.
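
For comparison, a hedged sketch of a per-interface bind as packet(7) documents
it for PF_PACKET sockets of type SOCK_RAW or SOCK_DGRAM: the address is a
struct sockaddr_ll and the interface is selected through sll_ifindex.
fill_packet_addr() and bind_packet_socket() are hypothetical helper names, and
this is not a patch against the DHCP tree.

```c
#include <arpa/inet.h>        /* htons */
#include <net/if.h>           /* if_nametoindex */
#include <netpacket/packet.h> /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Fill a sockaddr_ll that restricts a packet socket to one interface.
 * Returns 0 on success, -1 if the interface name cannot be resolved. */
static int fill_packet_addr(struct sockaddr_ll *sll, const char *ifname,
                            unsigned short ethertype)
{
    unsigned int idx = if_nametoindex(ifname);

    if (idx == 0)
        return -1;
    memset(sll, 0, sizeof(*sll));
    sll->sll_family   = AF_PACKET;
    sll->sll_protocol = htons(ethertype);  /* network byte order */
    sll->sll_ifindex  = (int)idx;          /* the device to listen on */
    return 0;
}

/* Bind an already-open packet socket to ifname only. */
static int bind_packet_socket(int sock, const char *ifname,
                              unsigned short ethertype)
{
    struct sockaddr_ll sll;

    if (fill_packet_addr(&sll, ifname, ethertype) < 0)
        return -1;
    return bind(sock, (struct sockaddr *)&sll, sizeof(sll));
}
```

A client would call something like bind_packet_socket(sock, "eth0", 0x0800)
right after socket(); the interface name and ethertype here are examples only.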




Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 15:18:03 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: David Miller [EMAIL PROTECTED]
 Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)
 
  Back to a workable solution, why doesn't DHCP specify a specific
  device?  It would fix this performance problem completely, at the
  application level.
 
 Since nobody seems to be able to be bothered to actually look
 at what DHCP clients are doing, I actually did and it's no
 surprise that broken stuff is happening here.

I was in the middle of checking that...

 Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:
 
 	struct sockaddr sa;
 	...
 	/* Bind to the interface name */
 	memset (&sa, 0, sizeof sa);
 	sa.sa_family = AF_PACKET;
 	strncpy (sa.sa_data, (const char *)info -> ifp, sizeof sa.sa_data);
 	if (bind (sock, &sa, sizeof sa)) {
 		if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
 		    errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
 		    errno == EAFNOSUPPORT || errno == EINVAL) {
 			log_error ("socket: %m - make sure");
 			log_error ("CONFIG_PACKET (Packet socket) %s",
 				   "and CONFIG_FILTER");
 			log_error ("(Socket Filtering) are enabled %s",
 				   "in your kernel");
 			log_fatal ("configuration!");
 		}
 		log_fatal ("Bind socket to interface: %m");
 	}
 
 So it puts a string into the sockaddr data, and there
 is no mention of sockaddr_ll, which is what is supposed to be
 provided as the socket address here, in the entire DHCP tree.
 
 I'm tempted to say I must be missing something here, since I can't see
 how this could possibly work at all.  The string passed in should
 be interpreted as the ifindex value, and thus trigger a -ENODEV
 return from AF_PACKET's bind() implementation.
 
 My suspicions are confirmed by the patch here:
 
 http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Can you get FC fixed?

 Really, this bogus bind() explains everything.

Should we add a warning to the kernel log, to make distros fix it?

It might make sense to add a per-device ptype_dev list in network device?



-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
Switching HDLC devices from Ethernet-framing mode caused stale ethernet
function assignments within net_device.

Signed-off-by: Krzysztof Halasa [EMAIL PROTECTED]

diff --git a/drivers/net/wan/hdlc.c b/drivers/net/wan/hdlc.c
index db354e0..f6e6b63 100644
--- a/drivers/net/wan/hdlc.c
+++ b/drivers/net/wan/hdlc.c
@@ -38,7 +38,7 @@
 #include <linux/hdlc.h>
 
 
-static const char* version = "HDLC support module revision 1.20";
+static const char* version = "HDLC support module revision 1.21";
 
 #undef DEBUG_LINK
 
@@ -222,19 +222,30 @@ int hdlc_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
return -EINVAL;
 }
 
+static void hdlc_setup_dev(struct net_device *dev)
+{
+/* Re-init all variables changed by HDLC protocol drivers,
+   including ether_setup() called from hdlc_raw_eth.c. */
+   dev->get_stats   = hdlc_get_stats;
+   dev->flags   = IFF_POINTOPOINT | IFF_NOARP;
+   dev->mtu = HDLC_MAX_MTU;
+   dev->type= ARPHRD_RAWHDLC;
+   dev->hard_header_len = 16;
+   dev->addr_len= 0;
+   dev->hard_header = NULL;
+   dev->rebuild_header  = NULL;
+   dev->set_mac_address = NULL;
+   dev->hard_header_cache   = NULL;
+   dev->header_cache_update = NULL;
+   dev->change_mtu  = hdlc_change_mtu;
+   dev->hard_header_parse   = NULL;
+}
+
 void hdlc_setup(struct net_device *dev)
 {
hdlc_device *hdlc = dev_to_hdlc(dev);
 
-   dev->get_stats = hdlc_get_stats;
-   dev->change_mtu = hdlc_change_mtu;
-   dev->mtu = HDLC_MAX_MTU;
-
-   dev->type = ARPHRD_RAWHDLC;
-   dev->hard_header_len = 16;
-
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-
+   hdlc_setup_dev(dev);
hdlc->carrier = 1;
hdlc->open = 0;
spin_lock_init(&hdlc->state_lock);
@@ -294,6 +305,7 @@ void detach_hdlc_protocol(struct net_device *dev)
}
kfree(hdlc->state);
hdlc->state = NULL;
+   hdlc_setup_dev(dev);
 }
 
 
diff --git a/drivers/net/wan/hdlc_cisco.c b/drivers/net/wan/hdlc_cisco.c
index b0bc5dd..c9664fd 100644
--- a/drivers/net/wan/hdlc_cisco.c
+++ b/drivers/net/wan/hdlc_cisco.c
@@ -365,10 +365,7 @@ static int cisco_ioctl(struct net_device *dev, struct ifreq *ifr)
memcpy(&state(hdlc)->settings, &new_settings, size);
dev->hard_start_xmit = hdlc->xmit;
dev->hard_header = cisco_hard_header;
-   dev->hard_header_cache = NULL;
dev->type = ARPHRD_CISCO;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
netif_dormant_on(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index b45ab68..c6c3c75 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1289,10 +1289,7 @@ static int fr_ioctl(struct net_device *dev, struct ifreq *ifr)
memcpy(&state(hdlc)->settings, &new_settings, size);
 
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_FRAD;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
return 0;
 
case IF_PROTO_FR_ADD_PVC:
diff --git a/drivers/net/wan/hdlc_ppp.c b/drivers/net/wan/hdlc_ppp.c
index e9f7170..4591437 100644
--- a/drivers/net/wan/hdlc_ppp.c
+++ b/drivers/net/wan/hdlc_ppp.c
@@ -127,9 +127,7 @@ static int ppp_ioctl(struct net_device *dev, struct ifreq *ifr)
if (result)
return result;
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_PPP;
-   dev->addr_len = 0;
netif_dormant_off(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_raw.c b/drivers/net/wan/hdlc_raw.c
index fe3cae5..e23bc66 100644
--- a/drivers/net/wan/hdlc_raw.c
+++ b/drivers/net/wan/hdlc_raw.c
@@ -88,10 +88,7 @@ static int raw_ioctl(struct net_device *dev, struct ifreq *ifr)
return result;
memcpy(hdlc->state, &new_settings, size);
dev->hard_start_xmit = hdlc->xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_RAWHDLC;
-   dev->flags = IFF_POINTOPOINT | IFF_NOARP;
-   dev->addr_len = 0;
netif_dormant_off(dev);
return 0;
}
diff --git a/drivers/net/wan/hdlc_x25.c b/drivers/net/wan/hdlc_x25.c
index e4bb9f8..cd7b22f 100644
--- a/drivers/net/wan/hdlc_x25.c
+++ b/drivers/net/wan/hdlc_x25.c
@@ -215,9 +215,7 @@ static int x25_ioctl(struct net_device *dev, struct ifreq *ifr)
   x25_rx, 0)) != 0)
return result;
dev->hard_start_xmit = x25_xmit;
-   dev->hard_header = NULL;
dev->type = ARPHRD_X25;
-   dev->addr_len = 

Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread Krzysztof Halasa
David Miller [EMAIL PROTECTED] writes:

 I disagree, you can't leave dangling references to functions
 which are potentially inside of unloaded modules, as this code
 does.

All such pointers were thought to be initialized by all HDLC protocol
handlers before device activation, but they were actually used by the
hdlc* code, and this one doesn't seem to...

 Rather, HDLC Cisco should implement a proper protocol destructor
 method to clean up these function pointers.

No, it wouldn't work - hdlc_cisco doesn't use it at all, it's just
a victim. But now I think there may be other victims.

It seems the only way for it to become non-NULL is through ether_setup()
from hdlc_raw_eth.c (Ethernet framing over HDLC).

I think it's best to NULLify it and the like in hdlc.c
unconditionally; it's a slow path and we don't need more useless
EXPORT_SYMBOL()s. It would fix all such problems forever.

Compile-tested only but it seems pretty obvious and of course I check
if the packets still flow after regular kernel upgrades (and I run
automatic tests checking all protos except X.25 from time to time as
well).

(the patch is in the next message).

Not sure if 2.6.21 material.
-- 
Krzysztof Halasa


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 15:34:14 -0800

 Can you get FC fixed?

I am not the DHCP package maintainer. :-)

I'm up to my ears already dealing with people trying
to slip broken patches into the kernel networking code that paper
over application bugs. ;)

 Should we add a warning to the kernel log, to make distros fix it?

Unfortunately it looks like a properly formed sockaddr_ll,
the ifindex is in fact zero, so there is nothing we can do
to warn about this case.

The sockaddr_ll sits after the first sockaddr string in the ifreq, and
the rest remains initialized to zeros, thus the bind() succeeds.


Re: [Bugme-new] [Bug 8107] New: dev->header_cache_update has a random value

2007-03-02 Thread David Miller
From: Krzysztof Halasa [EMAIL PROTECTED]
Date: Sat, 03 Mar 2007 00:38:05 +0100

 Switching HDLC devices from Ethernet-framing mode caused stale ethernet
 function assignments within net_device.
 
 Signed-off-by: Krzysztof Halasa [EMAIL PROTECTED]

This looks good to me, I think I'll apply it :-)

Thanks!


Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 14:04:49 -0800

 
 The udp code is full of bad indenting, extra whitespace and other
 style confusion.  It makes no sense to declare functions that are used
 outside the current file (extern) as inline.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
 ---
  net/ipv4/udp.c |  402 
 -
  net/ipv6/udp.c |  175 +---
  2 files changed, 295 insertions(+), 282 deletions(-)
 
 --- tcp-2.6.orig/net/ipv4/udp.c   2007-03-02 12:08:06.0 -0800
 +++ tcp-2.6/net/ipv4/udp.c2007-03-02 12:37:38.0 -0800
 @@ -120,8 +120,8 @@
   struct hlist_node *node;
  
   sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
 - if (sk->sk_hash == num)
 - return 1;
 + if (sk->sk_hash == num)
 + return 1;

This turns tabs into spaces, it cannot be correct.

Yoshifuji fixed all the whitespace problems under net/ already
for 2.6.21


tree plans...

2007-03-02 Thread David Miller

I plan to cut a net-2.6.22 tree after I finish pushing the
current round of 2.6.21 networking bug fixes to Linus.

I'll load the tcp-2.6 tree changes into net-2.6.22, and then
we'll do all non-bug-fix development in the net-2.6.22 tree.

It may take some time for me to push out today's bug fixes:
because of the VLAN group allocation fix, I need to do an
exhaustive build test with allmodconfig and the like to
make sure no drivers were accidentally broken by that change.

Thanks!


Re: [PATCH] net: Convert xtime.tv_sec to get_seconds()

2007-03-02 Thread David Miller
From: James Morris [EMAIL PROTECTED]
Date: Tue, 27 Feb 2007 16:24:49 -0500 (EST)

 Where appropriate, convert references to xtime.tv_sec to the
 get_seconds() helper function.
 
 Signed-off-by: James Morris [EMAIL PROTECTED]

This looks great James, I'll apply it to net-2.6.22 once I set
that tree up.

Thanks again.


Re: [PATCH 4/4] pktgen: fix device name handling

2007-03-02 Thread David Miller
From: Robert Olsson [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 18:07:09 +0100

 Yes, it seems to handle dev name changes. So configuration scripts should
 use ifindex now :)
 
 Signed-off-by: Robert Olsson [EMAIL PROTECTED]

I will apply all 4 of these patches to net-2.6.22, thanks everyone.


Re: Netem tfifo implementation

2007-03-02 Thread Ritesh Kumar

On 3/2/07, Stephen Hemminger [EMAIL PROTECTED] wrote:

On Fri, 2 Mar 2007 15:56:54 -0500
Ritesh Kumar [EMAIL PROTECTED] wrote:

 On 3/2/07, Patrick McHardy [EMAIL PROTECTED] wrote:
  Ritesh Kumar wrote:
   Hi,
  I recently saw the qdisc tfifo in the netem module
   (net/sched/sch_netem.c) when I migrated some of my patches from 2.6.14
   to 2.6.20. As I understand, tfifo helps in keeping the queue of
   packets sorted according to their time_to_send. [tfifo was not
   present in 2.6.14 perhaps because arrival order of packets was always
   equal to the departure order]. However, tfifo uses a linear search in
   the packet queue to find where to enqueue the packet.
  Quite some time ago (2.6.14 era), I needed a similar functionality
   from the netem module and I ended up coding a pointer based min-heap
   for the same. I was wondering if the community was interested in using
   the min-heap implementation to replace the linear search
   implementation. I have tested the min-heap quite a few times and it
   seems to work.
  The implementation is slightly non-trivial because it uses
   pointers to maintain the heap structure instead of using good old
   fixed size arrays. I did this mainly so that the limit of the netem
   qdisc could be changed on the fly. However, because every sk_buff now
   needs two pointers for its children nodes, I added an extra
   (sk_buff*)next2 to struct sk_buff (sorry!). However, this can probably
   be changed to a pointer inside netem_skb_cb.  Also, because I needed
   this for personal work and 2.6.14 didn't contain tfifo, I basically
   removed the embedded qdisc and made netem a classless qdisc with my
   min heap as the native queue (sorry again! :) )
 
  The tfifo qdisc has a limit, why not just allocate a fixed-size heap
  based on that?
 
 

 The tfifo queue limit itself can be changed and that creates the
 problem. If we use a fixed heap (say implemented using a fixed size
 array) then we will have to copy over all pointers from the first
 array to a reallocated array whenever the queue limit is changed.
 In retrospect, moving just a few 10s of kilobytes of data doesn't seem
 that much of a problem... now I feel stupid having put so much effort
 :).


Tfifo is a special case because:
  * timestamps are stored in skb->cb so it is only really usable inside
netem that adds timestamps.
  * insertions are cheap because it walks backwards and netem usually has
tnext > tlast.   Only with a huge jitter that causes massive reordering,
which is unrealistic, would you see a problem.
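
The backwards walk described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the
sch_netem.c code: struct pkt, struct tfifo, tfifo_init() and
tfifo_insert() are made-up names, and the comparison ignores jiffies
wraparound (kernel code would use time_after() or similar).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet node; stands in for an sk_buff with a timestamp. */
struct pkt {
    unsigned long time_to_send;
    struct pkt *prev, *next;
};

/* Circular doubly linked list with a sentinel head, sorted by time. */
struct tfifo {
    struct pkt head;
    size_t len;
};

static void tfifo_init(struct tfifo *q)
{
    q->head.prev = q->head.next = &q->head;
    q->len = 0;
}

/*
 * Insert p keeping the list sorted by time_to_send.  The search starts
 * at the tail and walks backwards, so the common case (new timestamp
 * not earlier than the last one) takes zero steps.  Returns the number
 * of nodes walked past.
 */
static size_t tfifo_insert(struct tfifo *q, struct pkt *p)
{
    struct pkt *pos = q->head.prev;
    size_t steps = 0;

    while (pos != &q->head && pos->time_to_send > p->time_to_send) {
        pos = pos->prev;
        steps++;
    }

    /* Link p right after pos. */
    p->prev = pos;
    p->next = pos->next;
    pos->next->prev = p;
    pos->next = p;
    q->len++;
    return steps;
}
```

With in-order timestamps every insert costs zero steps; only a packet
that must go ahead of many queued ones pays for the linear walk, which
is the heavy-reordering case mentioned above.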



You are right. A huge jitter inside a given flow is unrealistic in
real networks. It can also cause artificial reordering. However, in
our lab we use netem (with my changes) to enable per-flow delays. The
per-flow delays that we use vary a lot and hence we have to go through
some optimizations.

Thanks for all the feedback.

Ritesh


Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread Stephen Hemminger
Resend with less garbage...

The udp code is full of bad indenting, extra whitespace and other
style confusion.  It makes no sense to declare functions that are used
outside the current file (extern) as inline.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/ipv4/udp.c |  312 +++--
 net/ipv6/udp.c |  153 +++
 2 files changed, 236 insertions(+), 229 deletions(-)

--- tcp-2.6.orig/net/ipv4/udp.c 2007-03-02 16:25:12.0 -0800
+++ tcp-2.6/net/ipv4/udp.c  2007-03-02 16:41:04.0 -0800
@@ -136,13 +136,13 @@
  */
 int __udp_lib_get_port(struct sock *sk, unsigned short snum,
   struct hlist_head udptable[], int *port_rover,
-  int (*saddr_comp)(const struct sock *sk1,
-const struct sock *sk2 ))
+  int (*saddr_comp)(const struct sock * sk1,
+const struct sock * sk2))
 {
struct hlist_node *node;
struct hlist_head *head;
struct sock *sk2;
-   int    error = 1;
+   int error = 1;
 
write_lock_bh(&udp_hash_lock);
if (snum == 0) {
@@ -160,7 +160,8 @@
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
-   ((result - sysctl_local_port_range[0]) &
+   ((result -
+ sysctl_local_port_range[0]) &
 (UDP_HTABLE_SIZE - 1));
goto gotit;
}
@@ -175,12 +176,13 @@
;
}
result = best;
-   for(i = 0; i < (1 << 16) / UDP_HTABLE_SIZE; i++, result += UDP_HTABLE_SIZE) {
+   for (i = 0; i < (1 << 16) / UDP_HTABLE_SIZE;
+i++, result += UDP_HTABLE_SIZE) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
   (UDP_HTABLE_SIZE - 1));
-   if (! __udp_lib_lport_inuse(result, udptable))
+   if (!__udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
@@ -194,9 +196,8 @@
if (sk2->sk_hash == snum &&
sk2 != sk &&
(!sk2->sk_reuse|| !sk->sk_reuse) &&
-   (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if
 || sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (*saddr_comp)(sk, sk2) )
+   (*saddr_comp)(sk, sk2))
goto fail;
}
inet_sk(sk)->num = snum;
@@ -212,19 +213,19 @@
return error;
 }
 
-__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
-   int (*scmp)(const struct sock *, const struct sock *))
+int udp_get_port(struct sock *sk, unsigned short snum,
+int (*scmp)(const struct sock *, const struct sock *))
 {
-   return  __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+   return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
 }
 
-inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
+int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
 {
-   struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
+   const struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
 
-   return  ( !ipv6_only_sock(sk2)  &&
- (!inet1->rcv_saddr || !inet2->rcv_saddr ||
-  inet1->rcv_saddr == inet2->rcv_saddr  ));
+   return !ipv6_only_sock(sk2) &&
+   (!inet1->rcv_saddr || !inet2->rcv_saddr ||
+inet1->rcv_saddr == inet2->rcv_saddr);
 }
 
 static inline int udp_v4_get_port(struct sock *sk, unsigned short snum)
@@ -253,27 +254,27 @@
if (inet->rcv_saddr) {
if (inet->rcv_saddr != daddr)
continue;
-   score+=2;
+   score += 2;
}
if (inet->daddr) {
if (inet->daddr != saddr)
continue;
-   score+=2;
+   score += 2;
   

Re: [PATCH] udp: whitespace fixes

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 16:47:19 -0800

 Resend with less garbage...
 
 The udp code is full of bad indenting, extra whitespace and other
 style confusion.  It makes no sense to declare functions that are used
 outside the current file (extern) as inline.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Looks good, I'll try to apply this when I cut the net-2.6.22
tree.

Thanks.


Re: [PATCH 1/3]: Updates, removal of unsupported features and minor bug fixes.

2007-03-02 Thread Jeff Garzik

Linsys Contractor Mithlesh Thukral wrote:

NetXen: Updates, removal of unsupported features and minor bug fixes.

Signed-off-by: Mithlesh Thukral [EMAIL PROTECTED]

---
 netxen_nic.h  |4 +
 netxen_nic_ethtool.c  |  144 +-
 netxen_nic_main.c |4 -
 netxen_nic_phan_reg.h |3 +
 4 files changed, 34 insertions(+), 121 deletions(-)


applied patches 1-2 of 3




Re: [PATCH 3/3] NetXen: Make driver use multi PCI functions

2007-03-02 Thread Jeff Garzik

Linsys Contractor Mithlesh Thukral wrote:

NetXen: Make driver use multi PCI functions.

Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]

---

 netxen_nic.h  |  126 +---
 netxen_nic_ethtool.c  |   80 +++
 netxen_nic_hdr.h  |8 
 netxen_nic_hw.c   |  213 +++-

 netxen_nic_hw.h   |   18 -
 netxen_nic_init.c |  115 +++---
 netxen_nic_isr.c  |   80 +++
 netxen_nic_main.c |  523 +-
 netxen_nic_niu.c  |   27 +-
 netxen_nic_phan_reg.h |  125 ---
 10 files changed, 631 insertions(+), 684 deletions(-)


all three patches in this patchset contained nothing but one-line 
summaries of the changes included in them, and are overall very poorly 
and vaguely described.


This patch is far too big, with far too little description and 
justification to go along with it.


If you are not going to make the effort to write a paragraph or two 
describing such huge changes, then I'm not going to make the effort to 
review and apply it.  NAK.





Re: [NET] Add support for Seeq 8003 on Challenge S Mezz board.

2007-03-02 Thread Jeff Garzik

Ralf Baechle wrote:

From: Ladislav Michl [EMAIL PROTECTED]

Thanks to Jö Fahlke for donating hardware.

Signed-off-by: Ladislav Michl [EMAIL PROTECTED]

Forward porting of Ladis' 2.4 patch.

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]


applied to #upstream (2.6.22)




Re: [PATCH 1/2] tc35815 driver update (part 1)

2007-03-02 Thread Jeff Garzik

Atsushi Nemoto wrote:

Current tc35815 driver is very obsolete and less maintained for a long
time.  Replace it with a new driver based on one from CELF patch
archive.  It was for 2.6.10 kernel so some adjustment and cleanup are
added. (remove config.h, SA_ to IRQF_ conversion, etc.)

Major advantages are:

* Independent of JMR3927.
  (Actually independent of MIPS, but AFAIK the chip is used only on
   MIPS platforms)
* TX4938 support.
* 64-bit proof.
* Asynchronous and on-demand auto negotiation.
* High performance on non-coherent architecture.
* ethtool support.
* Many bugfixes and cleanups.

And next patch add further improvements/bugfixes/cleanups.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
---
This is a patch against current linux-mips.org git-tree.

 drivers/net/Kconfig |3 
 drivers/net/tc35815.c   | 2070 +++---
 include/linux/pci_ids.h |1 
 3 files changed, 1440 insertions(+), 634 deletions(-)


Would you be kind enough to

a) provide a URL to a .c file (or post it, if it's under 100K) so that 
we may more easily review this


b) combine both patches into a single patch.  might as well, since it's 
a rewrite.


c) rediff your patch against linux-2.6.git + Ralf's killall removal 
patch, and resend.  There were some minor conflicting changes that 
appeared, though these changes will certainly become irrelevant once 
your new driver is merged.





Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread Andi Kleen
David Miller [EMAIL PROTECTED] writes:
 
 And in fact that effectively makes the new socket option
 pointless, since it doesn't buy us anything since we have
 to support the old stuff fully anyways.

I don't think it's pointless because it would still allow
newer DHCP clients to have less impact on other packets
when they are active. 

This can matter when you have a system with multiple
interfaces where DHCP doesn't get an address on one.

That's pretty common with many x86 server boards because
they come with two NICs by default but most people only
plug the cable into one. However the distro installers
run DHCP on all.

When this happens all packets are always forced through
ptype_all chains before being rejected by AF_PACKETs device
bind, which adds some overhead to them. 

-Andi


Re: [PATCH] qla3xxx: bugfix for line omitted in previous patch.

2007-03-02 Thread Jeff Garzik

Ron Mercer wrote:

From 01751a39d7327acc28dabf4f68930b7e20b279d1 Mon Sep 17 00:00:00 2001

From: Ron Mercer [EMAIL PROTECTED]
Date: Wed, 28 Feb 2007 16:42:17 -0800
Subject: [PATCH] [PATCH] qla3xxx: bugfix for line omitted in previous patch.

This missing line caused transmit errors on the Qlogic 4032 chip.

Signed-off-by: Ron Mercer [EMAIL PROTECTED]


applied




Re: Network activity LED trigger

2007-03-02 Thread Andi Kleen
Florian Fainelli [EMAIL PROTECTED] writes:

 Hi All,
 
 I have been talking a bit with Richard, who is the LED API maintainer, and a 
 LED trigger based on network activity would be something great.

You should be aware that normally the kernel doesn't see all packets
on an Ethernet unless promiscuous mode is enabled (which it normally
is not). That is because the hardware filters out all packets
not for this host. A software controlled LED wouldn't be equivalent
to the activity LEDs you normally have on network cards,
but only show local traffic.

That said if you want to get events for any in/outgoing packets
you can use the same hooks as PF_PACKET uses for sniffing;
using dev_add_pack with ETH_P_ALL.
That will get you all incoming and outgoing packets that 
are local.

And when someone runs tcpdump it will suddenly see all packets, which
might be unexpected.

-Andi


Re: [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jeff Garzik

Jay Vosburgh wrote:

The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]


I seem to have lost the context of this.  Did this get discussed, and 
need further revision?


The three patches from 2/28/2007 look OK to me, and I just wanted to 
make sure before applying them.


Jeff





[PATCH] div64_64 consolidate (rev3)

2007-03-02 Thread Stephen Hemminger
Here is the current version of the 64 bit divide common code.
Since it is used three times by networking code, can we put it in the
net-2.6.22 tree?

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/asm-arm/div64.h  |2 ++
 include/asm-generic/div64.h  |7 +++
 include/asm-i386/div64.h |2 ++
 include/asm-m68k/div64.h |1 +
 include/asm-mips/div64.h |2 ++
 include/asm-um/div64.h   |1 +
 include/asm-xtensa/div64.h   |4 
 lib/Makefile |5 +++--
 lib/div64.c  |   22 ++
 net/ipv4/tcp_cubic.c |   23 ---
 net/ipv4/tcp_yeah.c  |   21 -
 net/ipv4/tcp_yeah.h  |1 +
 net/netfilter/xt_connbytes.c |   16 
 13 files changed, 45 insertions(+), 62 deletions(-)

--- tcp-2.6.orig/include/asm-arm/div64.h 2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-arm/div64.h 2007-03-02 17:22:38.0 -0800
@@ -223,4 +223,6 @@
 
 #endif
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
--- tcp-2.6.orig/include/asm-generic/div64.h 2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-generic/div64.h 2007-03-02 17:22:38.0 -0800
@@ -30,6 +30,11 @@
__rem;  \
  })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
@@ -49,6 +54,8 @@
__rem;  \
  })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #else /* BITS_PER_LONG == ?? */
 
 # error do_div() does not yet support the C64
--- tcp-2.6.orig/include/asm-i386/div64.h   2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-i386/div64.h2007-03-02 17:22:38.0 -0800
@@ -45,4 +45,6 @@
return dum2;
 
 }
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
--- tcp-2.6.orig/include/asm-m68k/div64.h   2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-m68k/div64.h2007-03-02 17:22:38.0 -0800
@@ -23,4 +23,5 @@
__rem;  \
 })
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* _M68K_DIV64_H */
--- tcp-2.6.orig/include/asm-mips/div64.h   2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-mips/div64.h2007-03-02 17:22:38.0 -0800
@@ -78,6 +78,8 @@
__quot = __quot << 32 | __low; \
(n) = __quot; \
__mod; })
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif /* (_MIPS_SZLONG == 32) */
 
 #if (_MIPS_SZLONG == 64)
--- tcp-2.6.orig/include/asm-um/div64.h 2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-um/div64.h  2007-03-02 17:22:38.0 -0800
@@ -3,4 +3,5 @@
 
 #include asm/arch/div64.h
 
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
 #endif
--- tcp-2.6.orig/include/asm-xtensa/div64.h 2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/include/asm-xtensa/div64.h  2007-03-02 17:22:38.0 -0800
@@ -16,4 +16,8 @@
n /= (unsigned int) base; \
__res; })
 
+static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   return dividend / divisor;
+}
 #endif
--- tcp-2.6.orig/lib/Makefile   2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/lib/Makefile2007-03-02 17:22:38.0 -0800
@@ -4,7 +4,7 @@
 
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 rbtree.o radix-tree.o dump_stack.o \
-idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
+idr.o int_sqrt.o bitmap.o extable.o prio_tree.o \
 sha1.o irq_regs.o reciprocal_div.o
 
 lib-$(CONFIG_MMU) += ioremap.o
@@ -12,7 +12,8 @@
 
 lib-y  += kobject.o kref.o kobject_uevent.o klist.o
 
-obj-y += sort.o parser.o halfmd4.o debug_locks.o random32.o bust_spinlocks.o
+obj-y += div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
+bust_spinlocks.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
--- tcp-2.6.orig/lib/div64.c2007-03-02 17:21:27.0 -0800
+++ tcp-2.6/lib/div64.c 2007-03-02 17:22:38.0 -0800
@@ -58,4 +58,26 @@
 
 EXPORT_SYMBOL(__div64_32);
 
+/* 64bit divisor, dividend and result. dynamic precision */
+uint64_t div64_64(uint64_t dividend, uint64_t divisor)
+{
+   uint32_t d = divisor;
+
+   if (divisor > 0xffffffffULL) {
+   unsigned int shift = fls(divisor >> 32);
+
+   d = divisor >> shift;
+   dividend >>= shift;
+   }
+
+   /* avoid 64 bit division if possible */
+   if (dividend >> 32)
+   do_div(dividend, d);
+   else
+   dividend = (uint32_t) dividend / d;
+
+   return dividend;
+}
+EXPORT_SYMBOL(div64_64);
+
 #endif /* BITS_PER_LONG == 32 */
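
For readers following the arithmetic, here is a rough userspace sketch
of the same shifting idea.  It is an illustration only, not the kernel
code: div64_64_sketch() and fls32() are hypothetical names, fls32()
stands in for the kernel's fls(), and a plain C division replaces
do_div().  As the "dynamic precision" comment in the patch hints, when
the divisor needs more than 32 bits both operands are shifted right, so
the low bits of the divisor are dropped and the quotient can be an
approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, 1-based; 0 when x == 0. */
static unsigned int fls32(uint32_t x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Divide two 64-bit values using only a 32-bit divisor.
 * The caller must not pass divisor == 0.
 */
static uint64_t div64_64_sketch(uint64_t dividend, uint64_t divisor)
{
    uint32_t d = (uint32_t)divisor;

    if (divisor > 0xffffffffULL) {
        /* Scale both operands down until the divisor fits in 32 bits. */
        unsigned int shift = fls32((uint32_t)(divisor >> 32));

        d = (uint32_t)(divisor >> shift);
        dividend >>= shift;
    }

    /* In the kernel this is where do_div() handles the 64/32 case. */
    return dividend / d;
}
```

When the divisor fits in 32 bits no shifting happens and the result is
exact; the scaling only kicks in for larger divisors.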

Re: [PATCH] [USBNET] DM9501: Add Corega FEther USB-TXC support.

2007-03-02 Thread Jeff Garzik

YOSHIFUJI Hideaki / 吉藤英明 wrote:

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
---
 drivers/usb/net/dm9601.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
index 4a932e1..c0bc52b 100644
--- a/drivers/usb/net/dm9601.c
+++ b/drivers/usb/net/dm9601.c
@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
 
 static const struct usb_device_id products[] = {

{
+USB_DEVICE(0x07aa, 0x9601),/* Corega FEther USB-TXC */
+.driver_info = (unsigned long)&dm9601_info,
+},
+   {



ACK the patch, though I wonder if this shouldn't instead go to Greg.

Honestly, I would prefer that the USB net drivers were moved into 
drivers/net with the other net drivers, /then/ I would merge such 
patches.  We don't add drivers for PCI-based hardware to 
drivers/pci/net, after all...




Re: [Bonding-devel] [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jay Vosburgh
Jeff Garzik [EMAIL PROTECTED] wrote:

Jay Vosburgh wrote:
  The ARP validation code only needs ARPs for the bonding device.
 
 Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]

I seem to have lost the context of this.  Did this get discussed, and 
need further revision?

The further discussion can be (loosely) paraphrased as:

Andy Gospodarek [EMAIL PROTECTED]: Hey, this no workee with IPv6.

Me: True, but bonding no workee with IPv6 at all.

Andy: Oh, ok.  Ack.

After which followed some preliminary yakkage about fixing up
said non-workee IPv6 support.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [Bonding-devel] [PATCH 2/3] bonding: only receive ARPs for us

2007-03-02 Thread Jeff Garzik

Jay Vosburgh wrote:

Jeff Garzik [EMAIL PROTECTED] wrote:


Jay Vosburgh wrote:

The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
I seem to have lost the context of this.  Did this get discussed, and 
need further revision?


The further discussion can be (loosely) paraphrased as:

Andy Gospodarek [EMAIL PROTECTED]: Hey, this no workee with IPv6.

Me: True, but bonding no workee with IPv6 at all.

Andy: Oh, ok.  Ack.

After which followed some preliminary yakkage about fixing up
said non-workee IPv6 support.


thanks :)  I'll make sure the 3 patches go into #upstream-fixes




Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: 03 Mar 2007 03:14:29 +0100

 That's pretty common with many x86 server boards because 
 they come with two NICs by default but most people only
 plug the cable into one. However the distro installers
 run DHCP on all.

Nope, that's not what I've seen them do, instead they run dhcp on
interfaces that report a link being present.


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote:
 
 I'm tempted to say I must be missing something here, since I can't see
 how this could possibly work at all.  The string passed in should
 be interpreted as the ifindex value, and thus trigger a -ENODEV
 return from AF_PACKET's bind() implementation.

This is using packet_bind_spkt which uses a name instead of ifindex.

As you may recall, I've made a patch to convert it to use the new
(actually it's not-so-new anymore) AF_PACKET interface.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 03 Mar 2007 16:38:45 +1100

 This is using packet_bind_spkt which uses a name instead of ifindex.

So it should be just fine, it should be binding to a specific
device (by name instead of ifindex) and therefore it should
only trigger the pt_all hook when the packet arrives on that
specific device.

 As you may recall, I've made a patch to convert it to use the new
 (actually it's not-so-new anymore) AF_PACKET interface.

That's right.

So it's still a mystery why dhcp is causing bridge devices
to trigger the network tap paths on Stephen's machine.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ppp and routing table rules.

2007-03-02 Thread Bill Fink
On Thu, 01 Mar 2007, Ben Greear wrote:

 Ben Greear wrote:
 
 I am sending udp packets through ppp400, and I see them appear on ppp401 
 as expected.
 
 The thing that is bothering me is that all I see on rddVR4 (172.1.2.1) 
 is arps for 172.1.2.2, but the 'tell' IP is that of the
 originating ppp400 link, not the IP of rddVR4, as I expected:
 
 21:47:16.119640 arp who-has 172.1.2.2 tell 11.1.1.3
 21:47:17.119371 arp who-has 172.1.2.2 tell 11.1.1.3
 21:47:18.119254 arp who-has 172.1.2.2 tell 11.1.1.3
 21:47:19.273118 arp who-has 172.1.2.2 tell 11.1.1.3
 
 Unless I'm missing something dumb, a similar setup with all ethernet-ish 
 network devices
 works fine.
 
 I have also enabled arp filtering:
 # Only answer ARPs if it is for the IP on our own interface.
 echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore
 and for every device used in these routing tables:
 echo 1 > /proc/sys/net/ipv4/conf/[dev]/arp_filter
 
 Any idea what I need to do in order to make  the source IP for the ARP 
 packet correct?

Wouldn't that be controlled by arp_announce?

arp_announce - INTEGER
Define different restriction levels for announcing the local
source IP address from IP packets in ARP requests sent on
interface:
0 - (default) Use any local address, configured on any interface
1 - Try to avoid local addresses that are not in the target's
subnet for this interface. This mode is useful when target
hosts reachable via this interface require the source IP
address in ARP requests to be part of their logical network
configured on the receiving interface. When we generate the
request we will check all our subnets that include the
target IP and will preserve the source address if it is from
such subnet. If there is no such subnet we select source
address according to the rules for level 2.
2 - Always use the best local address for this target.
In this mode we ignore the source address in the IP packet
and try to select local address that we prefer for talks with
the target host. Such local address is selected by looking
for primary IP addresses on all our subnets on the outgoing
interface that include the target IP address. If no suitable
local address is found we select the first local address
we have on the outgoing interface or on all other interfaces,
with the hope we will receive reply for our request and
even sometimes no matter the source IP address we announce.

The max value from conf/{all,interface}/arp_announce is used.

Increasing the restriction level gives more chance for
receiving answer from the resolved target while decreasing
the level announces more valid sender's information.
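
To make the three levels above concrete, here is a rough userspace sketch of the selection rules as the text describes them. It is not the kernel code: the function names, the argument layout, and the /24 netmask used in the usage note are assumptions made for illustration.

```python
import ipaddress


def effective_arp_announce(conf_all, conf_dev):
    """The max value from conf/{all,interface}/arp_announce is used."""
    return max(conf_all, conf_dev)


def pick_arp_source(level, packet_src, target, out_addrs, other_addrs=()):
    """Return the sender IP announced in an ARP request (illustrative only).

    level       -- effective arp_announce value (0, 1 or 2)
    packet_src  -- source address of the IP packet that triggered the ARP
    target      -- IP address being resolved
    out_addrs   -- addresses with prefix length on the outgoing interface,
                   primary address first (hypothetical input format)
    other_addrs -- addresses configured on all other interfaces
    """
    src_ip = ipaddress.ip_address(packet_src)
    target_ip = ipaddress.ip_address(target)
    out_ifaces = [ipaddress.ip_interface(a) for a in out_addrs]
    other_ifaces = [ipaddress.ip_interface(a) for a in other_addrs]

    if level == 0:
        # Level 0: any local address will do, so keep the packet's source.
        return str(src_ip)

    if level == 1:
        # Level 1: keep the packet's source only if it sits in one of our
        # subnets on this interface that also contains the target.
        for iface in out_ifaces:
            if src_ip in iface.network and target_ip in iface.network:
                return str(src_ip)
        # No such subnet: fall through to the level 2 rules.

    # Level 2: prefer an address on the outgoing interface whose subnet
    # includes the target.
    for iface in out_ifaces:
        if target_ip in iface.network:
            return str(iface.ip)

    # Last resort: first address on the outgoing interface, then elsewhere.
    candidates = out_ifaces + other_ifaces
    return str(candidates[0].ip) if candidates else str(src_ip)
```

With the addresses from the original report (packet source 11.1.1.3, target 172.1.2.2, outgoing interface 172.1.2.1, netmask assumed to be /24), level 0 announces 11.1.1.3 while levels 1 and 2 announce 172.1.2.1.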

-Bill


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-03-02 Thread David Miller
From: Baruch Even [EMAIL PROTECTED]
Date: Thu, 1 Mar 2007 20:13:40 +0200

 If you take this approach it makes sense to also remove the sorting of
 SACKs, the traversal of the SACK blocks will not start from the
 beginning anyway which was the reason for this sorting in the first
 place.
 
 One drawback for this approach is that you now walk the entire sack
 block when you advance one packet. If you consider a 10,000 packet queue
 which had several losses at the beginning and a large sack block that
 advances from the middle to the end you'll walk a lot of packets for
 that one last stretch of a sack block.
 
 One way to handle that is to use the still existing sack fast path to
 detect this case and calculate what is the sequence number to search
 for. Since you know what was the end_seq that was handled last, you can
 search for it as the start_seq and go on from there. Does it make sense?

Thanks for the feedback and these great ideas.

BTW, I think I figured out a way to get rid of
lost_{skb,cnt}_hint.  The fact of the matter in this case is that
the setting of the tag bits always propagates from front of the queue
onward.  We don't get holes mid-way.

So what we can do is search the RB-tree for high_seq and walk
backwards.  Once we hit something with TCPCB_TAGBITS set, we
stop processing as there are no earlier SKBs which we'd need
to do anything with.

Do you see any problems with that idea?
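
A toy model of that backward walk, only to pin down the termination condition. A sorted Python list stands in for the RB-tree, TAGGED stands in for "any of TCPCB_TAGBITS is set", and sequence-number wraparound is ignored; all names here are hypothetical.

```python
from bisect import bisect_left

TAGGED = 0x1  # placeholder for "one of the TCPCB_TAGBITS is set"


def walk_back_from_high_seq(queue, high_seq):
    """Tag untagged entries below high_seq, walking towards the queue head.

    queue is a list of dicts with 'seq' (ascending) and 'flags'.
    Returns the sequence numbers that were tagged, in visiting order.
    """
    seqs = [skb["seq"] for skb in queue]
    # Index of the last entry whose seq is strictly below high_seq.
    i = bisect_left(seqs, high_seq) - 1
    visited = []
    while i >= 0 and not (queue[i]["flags"] & TAGGED):
        queue[i]["flags"] |= TAGGED
        visited.append(queue[i]["seq"])
        i -= 1
    # Either the head was reached or an already tagged entry was hit;
    # since tags propagate from the front, nothing earlier needs work.
    return visited
```

The walk is bounded by the number of untagged entries below high_seq, which is the point of dropping the hint: the cost no longer depends on where a cached pointer happened to be left.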

scoreboard_skb_hint is a little bit trickier, but it is a similar
case to the tcp_lost_skb_hint case.  Except here the termination
condition is a relative timeout instead of a sequence number and
packet count test.

Perhaps for that we can remember some state from the
tcp_mark_head_lost() we do first.  In fact, we can start
the queue walk from the latest packet which tcp_mark_head_lost()
marked with a tag bit.

Basically these two algorithms are saying:

1) Mark up to smallest of 'lost' or tp->high_seq.
2) Mark packets after those processed in #1 which have
   timed out.

Right?
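
Spelled out as a simplified model (not the kernel functions), the two steps might look like the sketch below. Per-skb packet counts, sequence wraparound, and the real timeout computation are left out, and the dict keys 'seq', 'sent', and 'lost' are made up for the example.

```python
def mark_head_lost(queue, lost, high_seq):
    """Step 1: tag from the queue head until 'lost' packets are tagged
    or high_seq is reached. Returns the index of the first packet that
    was not processed, which is where step 2 starts."""
    i = 0
    while i < len(queue) and i < lost and queue[i]["seq"] < high_seq:
        queue[i]["lost"] = True
        i += 1
    return i


def mark_timed_out(queue, start, now, timeout):
    """Step 2: continue after step 1 and tag packets that have timed out.
    The walk stops at the first packet that has not timed out."""
    marked = []
    for skb in queue[start:]:
        if now - skb["sent"] <= timeout:
            break
        skb["lost"] = True
        marked.append(skb["seq"])
    return marked
```

Usage is `start = mark_head_lost(q, lost, high_seq)` followed by `mark_timed_out(q, start, now, timeout)`, which mirrors the idea of remembering where the first pass stopped instead of keeping a separate hint.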


Re: [PATCH] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger

David Miller wrote:

From: Andi Kleen [EMAIL PROTECTED]
Date: 03 Mar 2007 03:14:29 +0100

  
That's pretty common with many x86 server boards because
they come with two NICs by default but most people only
plug the cable into one. However the distro installers
run DHCP on all.

Nope, that's not what I've seen them do, instead they run dhcp on
interfaces that report a link being present.

Actually, it may be even simpler... I start the bridge with a script, and
there was still a dhclient left over running on the original interface.
It was an interesting exercise, and I have new tools to help, but still
no magic bullet to get up to full line rate.


Re: ppp and routing table rules.

2007-03-02 Thread Ben Greear

Bill Fink wrote:

On Thu, 01 Mar 2007, Ben Greear wrote:

  

Ben Greear wrote:

I am sending udp packets through ppp400, and I see them appear on ppp401
as expected.

The thing that is bothering me is that all I see on rddVR4 (172.1.2.1)
is arps for 172.1.2.2, but the 'tell' IP is that of the
originating ppp400 link, not the IP of rddVR4, as I expected:

21:47:16.119640 arp who-has 172.1.2.2 tell 11.1.1.3
21:47:17.119371 arp who-has 172.1.2.2 tell 11.1.1.3
21:47:18.119254 arp who-has 172.1.2.2 tell 11.1.1.3
21:47:19.273118 arp who-has 172.1.2.2 tell 11.1.1.3

Unless I'm missing something dumb, a similar setup with all ethernet-ish
network devices works fine.

I have also enabled arp filtering:
# Only answer ARPs if it is for the IP on our own interface.
echo 2 > /proc/sys/net/ipv4/conf/all/arp_ignore
and for every device used in these routing tables:
echo 1 > /proc/sys/net/ipv4/conf/[dev]/arp_filter

Any idea what I need to do in order to make the source IP for the ARP
packet correct?



Wouldn't that be controlled by arp_announce?
  


Yes, after trawling through the code I found that one, and setting it to '2'
seems to have fixed everything.
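
For anyone scripting the same change, a minimal sketch of writing the sysctl from Python. The helper name and the `root` parameter are made up for the example (the parameter only exists so the function can be pointed at a scratch directory); on a real system this needs root and writes under /proc/sys.

```python
import os


def set_arp_announce(level, dev="all", root="/proc/sys"):
    """Write arp_announce for one device (or 'all') and return the path."""
    path = os.path.join(root, "net/ipv4/conf", dev, "arp_announce")
    with open(path, "w") as f:
        f.write(f"{level}\n")
    return path
```

Calling `set_arp_announce(2)` corresponds to echoing 2 into conf/all/arp_announce; since the max of the 'all' and per-interface values is used, setting it on 'all' covers every device.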

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com





Re: [PATCH -mm 3/5] Blackfin: on-chip ethernet MAC controller driver

2007-03-02 Thread Wu, Bryan
On Thu, 2007-03-01 at 10:52 -0500, Stephen Hemminger wrote:
 
 Please do not use mixed case for function or structure member names
 (see  
 Coding Style)
 

Here is the updated version of this driver. 
a) Change to follow kernel coding style
b) rename some functions and structures
c) change '//' to '/* */'
d) use pr_debug()

[PATCH] Blackfin: on-chip ethernet MAC controller driver

This patch implements the driver necessary to use the Analog Devices
Blackfin processor's on-chip ethernet MAC controller.

Signed-off-by: Bryan Wu [EMAIL PROTECTED]
---

 drivers/net/Kconfig    |   44 ++
 drivers/net/Makefile   |    1
 drivers/net/bfin_mac.c |  981 +
 drivers/net/bfin_mac.h |  147 +
 4 files changed, 1173 insertions(+)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig  2007-03-02 13:39:00.0 +0800
+++ linux-2.6/drivers/net/Kconfig   2007-03-02 13:39:00.0 +0800
@@ -836,6 +836,50 @@
  module, say M here and read <file:Documentation/modules.txt> as well
  as <file:Documentation/networking/net-modules.txt>.
 
+config BFIN_MAC
+   tristate "Blackfin 536/537 on-chip mac support"
+   depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
+   select CRC32
+   select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
+   help
+ This is the driver for blackfin on-chip mac device. Say Y if you want it
+ compiled into the kernel. This driver is also available as a module
+ ( = code which can be inserted in and removed from the running kernel
+ whenever you want). The module will be called bfin_mac.
+
+config BFIN_MAC_USE_L1
+   bool "Use L1 memory for rx/tx packets"
+   depends on BFIN_MAC && BF537
+   default y
+   help
+ To get maximum network performance, you should use L1 memory as rx/tx buffers.
+ Say N here if you want to reserve L1 memory for other uses.
+
+config BFIN_TX_DESC_NUM
+   int "Number of transmit buffer packets"
+   depends on BFIN_MAC
+   range 6 10 if BFIN_MAC_USE_L1
+   range 10 100
+   default 10
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_RX_DESC_NUM
+   int "Number of receive buffer packets"
+   depends on BFIN_MAC
+   range 20 100 if BFIN_MAC_USE_L1
+   range 20 800
+   default 20
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_MAC_RMII
+   bool "RMII PHY Interface (EXPERIMENTAL)"
+   depends on BFIN_MAC && EXPERIMENTAL
+   default n
+   help
+ Use Reduced PHY MII Interface
+
 config SMC9194
    tristate "SMC 9194 support"
    depends on NET_VENDOR_SMC && (ISA || MAC && BROKEN)
Index: linux-2.6/drivers/net/Makefile
===
--- linux-2.6.orig/drivers/net/Makefile 2007-03-02 13:38:59.0 +0800
+++ linux-2.6/drivers/net/Makefile  2007-03-02 13:39:00.0 +0800
@@ -195,6 +195,7 @@
 obj-$(CONFIG_MYRI10GE) += myri10ge/
 obj-$(CONFIG_SMC91X) += smc91x.o
 obj-$(CONFIG_SMC911X) += smc911x.o
+obj-$(CONFIG_BFIN_MAC) += bfin_mac.o
 obj-$(CONFIG_DM9000) += dm9000.o
 obj-$(CONFIG_FEC_8XX) += fec_8xx/
 obj-$(CONFIG_PASEMI_MAC) += pasemi_mac.o
Index: linux-2.6/drivers/net/bfin_mac.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/net/bfin_mac.c    2007-03-02 13:52:39.0 +0800
@@ -0,0 +1,981 @@
+/*
+ * File: drivers/net/bfin_mac.c
+ * Based on:
+ * Author:   Luke Yang [EMAIL PROTECTED]
+ *
+ * Created:
+ * Description:
+ *
+ * Modified:
+ *   Copyright 2004-2006 Analog Devices Inc.
+ *
+ * Bugs: Enter bugs at http://blackfin.uclinux.org/
+ *
+ * This program is free software ;  you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ;  either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ;  without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program ;  see the file COPYING.
+ * If not, write to the Free Software Foundation,
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/timer.h>
+#include <linux/errno.h>
+#include <linux/ioport.h>
+#include <linux/crc32.h>
+#include <linux/device.h>
+#include <linux/spinlock.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+

Very slow routing table modification if RTA_FLOW is set

2007-03-02 Thread NetArt - Grzegorz Nosek
Hello all,

I have noticed that using realm patch for quagga
http://vcalinus.gemenii.ro/quaggarealms.html causes the kernel to
spend a lot more time processing rtnetlink messages.

If routes added to the kernel are not tagged with a realm number, the
time from sending a netlink cmd to receiving an ack is mostly stable
at several dozen microseconds or less.

However, if I add route tagging with 'neighbor X.X.X.X realm
origin-as', the time spent in kernel:
1. seems to increase with the number of FIB entries
2. is much more jittery

The net result is that after adding about 100k routes, the time between
cmd and ack is usually around 4 _milli_seconds, but sometimes the
route is added immediately (i.e. after 20 us or so), just like when
the table is nearly empty. Overall, the process of receiving a full
routing table slows down from a minute to about 11.

The kernel is 2.6.18.6. I have tried using both FIB_HASH and FIB_TRIE.
I'll try to collect some results from oprofile next and if anything
pops out at me, I'll let you know.

The core of the quagga patch with regard to the kernel is:

  if (rib->realmto)
    addattr32 (&req.n, sizeof req, RTA_FLOW, rib->realmto);

while constructing the netlink packet.
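
For reference, what that call appends to the message is just one 8-byte attribute. A small sketch of the byte layout follows, assuming the usual struct rtattr header (16-bit rta_len, 16-bit rta_type, host byte order, 4-byte alignment) and RTA_FLOW = 11 from linux/rtnetlink.h. The helper names are made up and this is not the quagga code.

```python
import struct

RTA_FLOW = 11      # attribute type, as defined in linux/rtnetlink.h
RTA_ALIGNTO = 4    # attributes are padded to 4-byte boundaries
RTA_HDRLEN = 4     # u16 rta_len + u16 rta_type


def rta_align(length):
    """Round length up to the next multiple of RTA_ALIGNTO."""
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def pack_rtattr_u32(rta_type, value):
    """Encode one route attribute carrying a 32-bit value."""
    payload = struct.pack("=I", value)
    rta_len = RTA_HDRLEN + len(payload)
    attr = struct.pack("=HH", rta_len, rta_type) + payload
    # Pad to the alignment boundary (a no-op for a 4-byte payload).
    return attr.ljust(rta_align(rta_len), b"\0")
```

So the request itself only grows by 8 bytes per route, which suggests the extra time is spent in how the kernel handles routes carrying the attribute rather than in parsing the message.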

Is this a known problem? Can anything be done about it?

Please CC as I'm not subscribed to the list.

Best regards,
 Grzegorz Nosek


