Re: Packets lost: Softirq

2007-05-19 Thread Eric Dumazet
Vasantha Kumar Puttappa wrote: Hi All, Please somebody guide me here. I desperately need help regarding this issue. (Please do reply to all.) I am tracking all UDP packets (in particular, SIP-based UDP packets) that go through iptables using the LOG mechanism. I use the following command,

Re: UDP packet loss when running lsof

2007-05-22 Thread Eric Dumazet
John Miller wrote: Hi Eric, I CCed netdev since this stuff is about network and not lkml. Ok, dropped the CC... What kind of machine do you have? SMP or not? It's an HP system with two dual-core CPUs at 3GHz, the storage system is connected through QLogic FC-HBA. It should really be

Re: UDP packet loss when running lsof

2007-05-22 Thread Eric Dumazet
Eric Dumazet wrote: John Miller wrote: Hi Eric, I CCed netdev since this stuff is about network and not lkml. Ok, dropped the CC... What kind of machine do you have? SMP or not? It's an HP system with two dual-core CPUs at 3GHz, the storage system is connected through QLogic FC

Re: TCP_MD5 and Intel e1000

2007-05-22 Thread Eric Dumazet
On Tue, 22 May 2007 09:33:29 +0200 Marc Donner [EMAIL PROTECTED] wrote: Hi, I have tried to set up quagga with tcp-md5 support from kernel. All seems ok with an Intel e100 NIC, but when I tested with an Intel e1000 NIC the TCP packets have an invalid MD5 digest. If I run tcpdump on the

Re: [PATCH netdev] wrong timeout value in sk_wait_data()

2007-05-23 Thread Eric Dumazet
Vasily Averin wrote: sys_setsockopt() does not properly check timeout values for SO_RCVTIMEO/SO_SNDTIMEO; for example, it is possible to set negative timeout values. POSIX does not define the behaviour of sys_setsockopt for negative timeouts, but requires that setsockopt() shall fail with -EDOM
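
The kind of range check this snippet asks for could look roughly like the code below. It is a userspace sketch, not the patch from the thread: the helper name `check_sock_timeout` is hypothetical, and returning `-EDOM` for every out-of-range value is an assumption drawn from the snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>

#define DEMO_USEC_PER_SEC 1000000L

/* Hypothetical helper: validate a timeout before it is stored for
 * SO_RCVTIMEO/SO_SNDTIMEO. Returns 0 if acceptable, -EDOM otherwise. */
static int check_sock_timeout(const struct timeval *tv)
{
    assert(tv != NULL);
    if (tv->tv_sec < 0)
        return -EDOM;               /* negative seconds */
    if (tv->tv_usec < 0 || tv->tv_usec >= DEMO_USEC_PER_SEC)
        return -EDOM;               /* microseconds out of [0, 1s) */
    return 0;
}
```

Rejecting the value up front keeps a negative timeout from ever reaching the code that sleeps on it.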

Re: [PATCH]: Make XFRM_ACQ_EXPIRES tweakable

2007-05-24 Thread Eric Dumazet
David Miller wrote: I've had several requests for the capability to change this timeout, which I think is perfectly reasonable. So I intend to merge the following upstream unless I hear some objections :-) commit 7191f131aff4797f2a906495c7b285d8adf47da2 Author: David S. Miller [EMAIL

Re: [Bugme-new] [Bug 8536] New: Kernel drops UDP packets silently when reading from certain proc file entries

2007-05-25 Thread Eric Dumazet
Herbert Xu wrote: Andrew Morton [EMAIL PROTECTED] wrote: It is possible to introduce UDP packet losses by reading the proc file entry /proc/net/tcp. The really strange thing is that the error counters for packet drops are not increased. Please try this patch and let us know if it helps.

Re: [Bugme-new] [Bug 8536] New: Kernel drops UDP packets silently when reading from certain proc file entries

2007-05-25 Thread Eric Dumazet
Herbert Xu wrote: On Fri, May 25, 2007 at 08:50:20AM +0200, Eric Dumazet wrote: If this patch really helps, this means cond_resched_softirq() doesn't work at all and should be fixed, or just zapped as it is seldom used. cond_resched_softirq lets other threads run if they want to. It doesn't

[PATCH] TCP : use LIMIT_NETDEBUG in tcp_retransmit_timer()

2007-06-04 Thread Eric Dumazet
LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0 > /proc/sys/net/core/warnings). The "TCP: Treason uncloaked!" message can use this facility. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index e613401..e9b151b

Re: [PATCH] TCP : use LIMIT_NETDEBUG in tcp_retransmit_timer()

2007-06-05 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Mon, 04 Jun 2007 09:13:40 +0200 LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0 > /proc/sys/net/core/warnings). The "TCP: Treason uncloaked!" message can use this facility. Signed-off-by: Eric Dumazet

[BUG] UDP : bind() checks are not complete

2007-06-05 Thread Eric Dumazet
David, I discovered one big problem with UDP binding in 2.6.22-rc4: Consider you have eth0 with addr 192.168.0.1. Consider one UDP socket was bound to 192.168.0.1:32769. It will be stored on a slot != 1. Another UDP socket is created and bound to (0.0.0.0:0). __udp_lib_get_port() is called

Re: [2.6.22-rc4] UDP's local port assignment not working correctly.

2007-06-07 Thread Eric Dumazet
On Thu, 7 Jun 2007 20:40:39 +0900 Tetsuo Handa [EMAIL PROTECTED] wrote: Hello. Same local ports are assigned to multiple sockets. The following program should print different local port number. - Start of program - #include <stdio.h> #include <unistd.h> #include <sys/socket.h>

Re: [PATCH] Age Entry For IPv4 Route Table

2007-06-25 Thread Eric Dumazet
On Mon, 25 Jun 2007 10:28:38 +0530 Varun Chandramohan [EMAIL PROTECTED] wrote: According to the RFC 4292 (IP Forwarding Table MIB) there is a need for an age entry for all the routes in the routing table. The entry in the RFC is inetCidrRouteAge and oid is inetCidrRouteAge.1.10. Many snmp

Re: Beyond 64K TCP connections limit per IP-address

2007-07-04 Thread Eric Dumazet
On Wed, 4 Jul 2007 11:40:48 +0200 Robert Iakobashvili [EMAIL PROTECTED] wrote: On 7/4/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote: On Wed, Jul 04, 2007 at 09:50:31AM +0200, Robert Iakobashvili ([EMAIL PROTECTED]) wrote: If I am correct, a TCP server can make up to 64K accepts for a

[RFC] Idea to speedup tcp lookups

2005-08-02 Thread Eric Dumazet
Original message. Subject: [RFC] Idea to speedup tcp lookups Date: Tue, 02 Aug 2005 11:53:12 +0200 From: Eric Dumazet [EMAIL PROTECTED] To: David S. Miller davem@redhat.com Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL

Re: [PATCH] make use of ->private_data in sockfd_lookup

2005-08-17 Thread Eric Dumazet
Andi Kleen wrote: David, do you think we could place file->private_data in the same cache line as file->f_count and file->f_op, so that sockfd_lookup() can access all the needed information (f_count, f_op, private_data) using one L1_CACHE_LINE only? You mean for 32-byte cache lines? Not

[PATCH] struct file cleanup : the very large file_ra_state is now allocated only on demand.

2005-08-17 Thread Eric Dumazet
sockfd_lookups Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.13-rc6/include/linux/fs.h 2005-08-07 20:18:56.0 +0200 +++ linux-2.6.13-rc6-ed/include/linux/fs.h 2005-08-18 01:33:04.0 +0200 @@ -586,20 +586,19 @@ struct dentry *f_dentry

Re: [PATCH] struct file cleanup : the very large file_ra_state is now allocated only on demand.

2005-08-18 Thread Eric Dumazet
Coywolf Qi Hunt wrote: On 8/18/05, Eric Dumazet [EMAIL PROTECTED] wrote: Andi Kleen wrote: (because of the insane struct file_ra_state f_ra. I wish this structure were dynamically allocated only for files that really use it) How about you submit a patch for that instead? -Andi

Re: [PATCH] struct file cleanup : the very large file_ra_state is now allocated only on demand.

2005-08-18 Thread Eric Dumazet
David S. Miller wrote: From: Andi Kleen [EMAIL PROTECTED] Date: Thu, 18 Aug 2005 03:05:25 +0200 I would just set the ra pointer to a single global structure if the allocation fails. Then you can avoid all the other checks. It will slow down things and trash some state, but not fail and

[PATCH] Put the very large file_ra_state outside of 'struct file'

2005-08-18 Thread Eric Dumazet
to f_count and f_op fields to speedup sockfd_lookups Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.13-rc6/include/linux/fs.h 2005-08-07 20:18:56.0 +0200 +++ linux-2.6.13-rc6-ed/include/linux/fs.h 2005-08-18 10:30:35.0 +0200 @@ -586,20 +586,18

Perf problem with qdisc ? dev_queue_xmit_nit() can be called many times for the same packet

2005-08-23 Thread Eric Dumazet
Hi all I have strange numbers on a 4 way SMP Opteron machine, with a single tg3 NIC, linux-2.6.13-rc6 I have about 12000 requeues per second. oprofile data show high numbers for these related functions : qdisc_restart() 2.6452 % dev_queue_xmit() 0.9599 % pfifo_fast_dequeue() 0.7094 %

Re: Perf problem with qdisc ? dev_queue_xmit_nit() can be called many times for the same packet

2005-08-23 Thread Eric Dumazet
David S. Miller wrote: No, all of your cpus are racing to get the transmit lock of the tg3 driver. Whoever wins the race gets to queue the packet, the others have to back off. I believe tg3_tx() holds the tx_lock for too long, in the case 200 or so skbs are delivered. Maybe adding a

[PATCH] [NET] use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers

2005-08-25 Thread Eric Dumazet
. Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- net-2.6.14/net/ipv4/tcp.c 2005-08-26 02:14:00.0 +0200 +++ net-2.6.14-ed/net/ipv4/tcp.c2005-08-26 02:20:08.0 +0200 @@ -269,7 +269,7 @@ int sysctl_tcp_fin_timeout = TCP_FIN_TIMEOUT; -DEFINE_SNMP_STAT(struct

Re: [PATCH] [NET] use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers

2005-08-26 Thread Eric Dumazet
David S. Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 26 Aug 2005 03:07:06 +0200 On one of my production machines, tcp_statistics was sitting in a heavily modified cache line, so *every* SNMP update had to force a reload. But I disagree that statistics belong

Re: [PATCH] [NET] use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers

2005-08-26 Thread Eric Dumazet
Benjamin LaHaise wrote: On Fri, Aug 26, 2005 at 09:11:14AM +0200, Eric Dumazet wrote: The patch I suggested only changed the root pointer, moving it to the read_mostly section because it is really write-once at boot, then read-only. This is the same with slab pointers: they are hot objects (read

Re: Perf problem with qdisc ? dev_queue_xmit_nit() can be called many times for the same packet

2005-08-27 Thread Eric Dumazet
David S. Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 24 Aug 2005 01:10:44 +0200 Looking at tg3_tx() more closely, I am not convinced it really needs to lock tp->tx_lock during the loop. tp->tx_cons (swidx) is changed in this function only, and could be changed

Re: Perf problem with qdisc ? dev_queue_xmit_nit() can be called many times for the same packet

2005-08-28 Thread Eric Dumazet
jamal wrote: On Sat, 2005-27-08 at 22:38 +0200, Eric Dumazet wrote: (So about 360 requeues per second, much better than before (12000 / second)) I suspect what you are doing is shoving a lot more packets than the wire can handle. That's why you are getting the backpressure. I read back

Re: [PATCH] make use of ->private_data in sockfd_lookup

2005-09-05 Thread Eric Dumazet
of Benjamin patch ? Avoid touching file->f_dentry on sockets, since file->private_data directly gives us the socket pointer. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6/net/socket.c 2005-09-06 01:20:25.0 +0200 +++ linux-2.6-ed/net/socket.c 2005-09-06 01:35:02.0

Re: [PATCH] [RFT] ip_tables NUMA optimization

2005-11-17 Thread Eric Dumazet
[cpu]); + } + kfree(info); It should probably use vfree() like : + for_each_cpu(cpu) { + if (info->size <= PAGE_SIZE) + kfree(info->entries[cpu]); + else + vfree(info->entries[cpu]); + } See you Eric Dumazet

Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Eric Dumazet
On Thursday 05 October 2006 10:57, Evgeniy Polyakov wrote: Well, it is possible to create /sys/proc entry for that, and even now userspace can grow mapping ring until it is forbidden by kernel, which means limit is reached. No need for yet another /sys/proc entry. Right now, I (for example)

Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Eric Dumazet
On Thursday 05 October 2006 12:55, Evgeniy Polyakov wrote: On Thu, Oct 05, 2006 at 12:45:03PM +0200, Eric Dumazet ([EMAIL PROTECTED]) What is missing or not obvious is : If events are skipped because of overflows, What happens ? Connections stuck forever ? Hope that everything

[RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Eric Dumazet
Hi David While browsing net/ipv4/route.c I discovered compare_keys() function, and a potential bug in it. static inline int compare_keys(struct flowi *fl1, struct flowi *fl2) { return memcmp(&fl1->nl_u.ip4_u, &fl2->nl_u.ip4_u, sizeof(fl1->nl_u.ip4_u)) == 0 && fl1->oif ==

Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-11 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 15:11:18 +0200 Using memcmp(ptr1, ptr2, sizeof(SOMEFIELD)) is dangerous because sizeof(SOMEFIELD) can be larger than the underlying object, because of alignment constraints. In this case, sizeof(fl1
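
To illustrate the concern raised in this snippet, here is a small userspace sketch. The struct `demo_key` and both helpers are hypothetical and unrelated to the kernel's `struct flowi`; the point is only that `sizeof` of a struct can cover padding bytes that no field owns, so a field-by-field comparison is the safer choice when padding is not known to be zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative key: on common ABIs the compiler inserts padding bytes
 * after 'tos' so that 'daddr' is 4-byte aligned. */
struct demo_key {
    uint8_t  tos;
    uint32_t daddr;
};

/* Compare field by field, so padding bytes never take part. */
static int demo_key_equal(const struct demo_key *a, const struct demo_key *b)
{
    assert(a != NULL && b != NULL);
    return a->tos == b->tos && a->daddr == b->daddr;
}

/* Number of bytes in the struct that belong to no field. */
static size_t demo_key_padding(void)
{
    return sizeof(struct demo_key) - (sizeof(uint8_t) + sizeof(uint32_t));
}
```

A `memcmp()` over `sizeof(struct demo_key)` bytes would also compare those padding bytes, whose contents are not guaranteed to match for two keys with equal fields.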

Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-12 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 12 Oct 2006 07:48:20 +0200 Not on my gcc here (gcc version 3.4.4): It won't zero out the padding bytes. Patrick just proved this too :) Well, on this machine I have these oprofile numbers : rt_intern_hash

Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 10:08, Martin Schiller wrote: Hi! I'm searching for a solution to suppress / delay the SYN-ACK packet of a listening server (-application) until he has decided (e.g. analysed the requesting ip-address or checked if the corresponding other end of a connection is

Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 12:13, Martin Schiller wrote: On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote: Well, it is already possible to delay the 'third packet' of an outgoing connection with a little hack. But AFAIK not the SYNACK of incoming connection. It could be cool

Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 12:31, Evgeniy Polyakov wrote: On Thu, Oct 12, 2006 at 12:13:26PM +0200, Martin Schiller ([EMAIL PROTECTED]) wrote: On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote: Well, it is already possible to delay the 'third packet' of an outgoing connection

[PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet
to delete entries for at most one timer tick. CPUS are going faster, hard limits are becoming useless... Similar thing is done in net/ipv4/route.c garbage collector. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.18/include/net/inetpeer.h Wed Sep 20 05:42:06 2006 +++ linux-2.6.18-ed

Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 12 Oct 2006 22:14:12 +0200 1) shrink struct inet_peer on 64-bit platforms. I noticed sizeof(struct inet_peer) was 64+8 on x86_64. As we don't really need 64-bit timestamps

Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
Rick Jones wrote: More to the point, on what basis would the application be rejecting a connection request based solely on the SYN? True, it isn't like there would suddenly be any call user data as in XTI/TLI. DATA payload could be included in the SYN packet. TCP specs allow this AFAIK.

Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 13 Oct 2006 05:56:43 +0200 2^31 is 2147483648. That's a *lot* of timer ticks, an inet_peer entry should not stay in unused_list for more than 10 minutes. My bad, I thought the time was compared to the creation time
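
The argument in this exchange (32-bit timestamps are enough as long as the compared values are less than 2^31 ticks apart) can be sketched as follows. Both helpers are hypothetical userspace illustrations, not code from the patch, and they assume the usual two's-complement conversion from `uint32_t` to `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* True if tick count 'a' is later than 'b', assuming the two values are
 * less than 2^31 ticks apart. The difference is taken modulo 2^32, so a
 * wrapped counter still compares correctly. */
static int ticks_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical expiry check: has an entry stamped at 'stamp' been idle
 * for at least 'ttl' ticks at time 'now'? */
static int entry_expired(uint32_t now, uint32_t stamp, uint32_t ttl)
{
    assert(ttl < (1u << 31));
    return (uint32_t)(now - stamp) >= ttl;
}
```

With a tick counter that wraps from 0xffffffff to 0, `ticks_after(2, 0xfffffff0)` still reports that 2 is the later value, which is why halving the timestamp width is safe for entries that live only minutes.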

Re: Suppress / delay SYN-ACK

2006-10-13 Thread Eric Dumazet
Rick Jones wrote: Eric Dumazet wrote: Rick Jones wrote: More to the point, on what basis would the application be rejecting a connection request based solely on the SYN? True, it isn't like there would suddenly be any call user data as in XTI/TLI. DATA payload could be included

[PATCH] NET : Suspicious locking in reqsk_queue_hash_req()

2006-10-16 Thread Eric Dumazet
Hi David While browsing include/net/request_sock.h I found this suspicious locking protecting the SYN table hash table. I think this patch is necessary. Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.18/include/net/request_sock.h.orig2006-10-16 10:53

[PATCH] NET : Suspicious locking in reqsk_queue_hash_req()

2006-10-16 Thread Eric Dumazet
(Sorry, patch inlined this time) Hi David While browsing include/net/request_sock.h I found this suspicious locking protecting the SYN table hash table. I think this patch is necessary. Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.18/include/net/request_sock.h.orig

Re: [PATCH] NET : Suspicious locking in reqsk_queue_hash_req()

2006-10-16 Thread Eric Dumazet
On Monday 16 October 2006 18:16, Arnaldo Carvalho de Melo wrote: On 10/16/06, Eric Dumazet [EMAIL PROTECTED] wrote: (Sorry, patch inlined this time) Hi David While browsing include/net/request_sock.h I found this suspicious locking protecting the SYN table hash table. I think

Re: [PATCH] NET : Suspicious locking in reqsk_queue_hash_req()

2006-10-16 Thread Eric Dumazet
On Monday 16 October 2006 18:56, Eric Dumazet wrote: On Monday 16 October 2006 18:16, Arnaldo Carvalho de Melo wrote: On 10/16/06, Eric Dumazet [EMAIL PROTECTED] wrote: (Sorry, patch inlined this time) Hi David While browsing include/net/request_sock.h I found this suspicious

Re: PATCH zero-copy send completion callback

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 02:53, Eric Barton wrote: If so, do you have any ideas about how to do it more economically? It's 2 pointers rather than 1 to avoid forcing an unnecessary packet boundary between successive zero-copy sends. But I guess that might not be hugely significant since

Re: Suppress / delay SYN-ACK

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 14:04, Martin Schiller wrote: On Monday, October 16, 2006 9:02 AM, Lennert Buytenhek wrote: I wrote something like this a couple of years ago: http://marc.theaimsgroup.com/?l=linux-netdev&m=103666165629419&w=2

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 12:39, Evgeniy Polyakov wrote: I can add such notification, but its existence _is_ the broken design. After such condition happened, all new events will disappear (although they are still accessible through usual queue) from mapped buffer. While writing this I have

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 15:42, Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 03:19:36PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: On Tuesday 17 October 2006 12:39, Evgeniy Polyakov wrote: I can add such notification, but its existence _is_ the broken design. After

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 16:07, Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 03:52:34PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: What about the case, which I described in other e-mail, when in case of the full ring buffer, no new events are written there, and when userspace

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 17:09, Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 04:25:00PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: On Tuesday 17 October 2006 16:07, Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 03:52:34PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 18:01, Evgeniy Polyakov wrote: Ok, there is one apologist for mmap buffer implementation, who forced me to create first implementation, which was dropped due to absence of remote mental reading abilities. Ulrich, does above approach sound good for you? I actually

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
On Tuesday 17 October 2006 18:35, Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 06:26:04PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: On Tuesday 17 October 2006 18:01, Evgeniy Polyakov wrote: Ok, there is one apologist for mmap buffer implementation, who forced me to create first

Re: [take19 1/4] kevent: Core files.

2006-10-17 Thread Eric Dumazet
Evgeniy Polyakov wrote: On Tue, Oct 17, 2006 at 06:45:54PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: I am not sure I understand what you wrote, English is not our native language. I think many people gave you feedback. I feel that all feedback on this mailing list is constructive

[PATCH] [NET] reduce sizeof(struct flow)

2006-10-17 Thread Eric Dumazet
) As many routers are based on PIII (L1_CACHE_SIZE=32), this saves one cache line per rtable entry. Thank you Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.19-rc2/include/net/flow.h 2006-10-18 06:03:08.0 +0200 +++ linux-2.6.19-rc2-ed/include/net/flow.h 2006-10-18 06:56

Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-17 Thread Eric Dumazet
YOSHIFUJI Hideaki wrote: In article [EMAIL PROTECTED] (at Wed, 18 Oct 2006 07:08:07 +0200), Eric Dumazet [EMAIL PROTECTED] says: +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) struct { struct in6_addr daddr

Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-17 Thread Eric Dumazet
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 18 Oct 2006 07:08:07 +0200 Each route entry includes a 'struct flow'. This structure has a current size of 80 bytes. This patch makes a size reduction depending on CONFIG_IPV6/CONFIG_IPV6_MODULE/CONFIG_DECNET

[PATCH] [NET] inet_peer : group together avl_left, avl_right, v4daddr to speedup lookups on some CPUS

2006-10-18 Thread Eric Dumazet
Hi David, A lot of routers still use CPUs with 32-byte cache lines (Intel PIII). It makes sense to make sure fields used at lookup time are in the same cache line, to reduce cache footprint and speed up lookups. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux/include/net/inetpeer.h
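
The layout idea in this snippet can be sketched in plain C. The struct below is only loosely modelled on the three fields named in the subject line; the remaining fields, the names `demo_peer` and `demo_lookup`, and the 32-byte line size are assumptions for illustration, and the sketch also assumes each node is allocated on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 32  /* assumed line size, as on the PIII mentioned above */

struct demo_peer {
    /* Fields read during a lookup come first, so they share one line. */
    struct demo_peer *avl_left;
    struct demo_peer *avl_right;
    uint32_t          v4daddr;
    /* Fields touched only after a hit follow. */
    uint32_t          refcnt;
    unsigned long     dtime;
};

/* Fail the build if a lookup field ever spills past the first line. */
static_assert(offsetof(struct demo_peer, v4daddr) + sizeof(uint32_t)
              <= DEMO_CACHE_LINE, "lookup fields must share a cache line");

/* Walk the tree; only the three leading fields of each node are read. */
static struct demo_peer *demo_lookup(struct demo_peer *root, uint32_t daddr)
{
    while (root != NULL && root->v4daddr != daddr)
        root = daddr < root->v4daddr ? root->avl_left : root->avl_right;
    return root;
}
```

The compile-time check is the useful part of the design: a later field reordering that pushes a lookup field out of the first line breaks the build instead of silently costing an extra cache miss per visited node.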

Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-18 Thread Eric Dumazet
On Wednesday 18 October 2006 10:20, Steven Whitehouse wrote: Hi, On Tue, Oct 17, 2006 at 11:53:36PM -0700, David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 18 Oct 2006 07:42:17 +0200 How many people are using DECNET and want to pay the price of this 20 bytes

Re: [PATCH] [NET] reduce sizeof(struct flow)

2006-10-18 Thread Eric Dumazet
On Wednesday 18 October 2006 14:42, Steven Whitehouse wrote: Hi, It's not used at the moment[*], but would be required for any kind of flow tracking. The objnum field could be folded into the objname field, I guess, on the basis that objnamel == 0 means objname[0] represents the

[PATCH, resent] [NET] reduce per cpu ram used for loopback stats

2006-10-18 Thread Eric Dumazet
too in loopback_xmit() not updating 4 fields, but 2. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux/drivers/net/loopback.c 2006-10-18 17:28:20.0 +0200 +++ linux-ed/drivers/net/loopback.c 2006-10-18 18:26:41.0 +0200 @@ -58,7 +58,11 @@ #include <linux/tcp.h>

[PATCH] [NET] reduce per cpu ram used for loopback stats

2006-10-18 Thread Eric Dumazet
We don't need a full struct net_device_stats (currently 23 longs: 184 bytes on x86_64) per possible CPU, but only two counters: bytes and packets. We save a few CPU cycles too in loopback_xmit() by updating 2 fields instead of 4. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux/drivers/net
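
A userspace model of the two-counter idea described here might look like the sketch below. The names `demo_lstats`, `demo_account_xmit` and `demo_get_stats` and the fixed CPU count are assumptions for illustration; the real patch works on kernel per-CPU data, which this plain array only imitates.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4  /* assumed CPU count for the sketch */

/* Only the two counters that loopback traffic actually needs. */
struct demo_lstats {
    unsigned long packets;
    unsigned long bytes;
};

static struct demo_lstats demo_stats[DEMO_NR_CPUS];

/* Transmit path: each CPU updates only its own slot. */
static void demo_account_xmit(unsigned int cpu, size_t len)
{
    assert(cpu < DEMO_NR_CPUS);
    demo_stats[cpu].packets += 1;
    demo_stats[cpu].bytes += len;
}

/* Read path: fold all per-CPU slots into one total on demand. */
static struct demo_lstats demo_get_stats(void)
{
    struct demo_lstats sum = { 0, 0 };
    for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        sum.packets += demo_stats[cpu].packets;
        sum.bytes += demo_stats[cpu].bytes;
    }
    return sum;
}
```

The hot path touches two words per packet, and the cost of summing is paid only when someone asks for the statistics.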

Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Eric Dumazet
Michael Tokarev wrote: Any idea how to force sending FIN-with-data? int flag_on = 1; setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int)); send(fd, data, datalen, 0); close(fd); Eric Dumazet - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message

Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Eric Dumazet
Michael Tokarev wrote: Eric Dumazet wrote: Michael Tokarev wrote: Any idea how to force sending FIN-with-data? int flag_on = 1; setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int)); send(fd, data, datalen, 0); close(fd); That produces two packets - one (or more - depending

Re: [PATCH 11/11] ixgb: Add prefetch

2006-04-22 Thread Eric Dumazet
Jeff Kirsher wrote: - This patch is to improve performance by adding prefetch to the ixgb driver - Add driver comments Signed-off-by: Jeff Kirsher [EMAIL PROTECTED] Signed-off-by: Jesse Brandeburg [EMAIL PROTECTED] Signed-off-by: John Ronciak [EMAIL PROTECTED] ---

[x86_64, NET] smp_rmb() in dst_destroy() seems very expensive, ditto in kfree_skb()

2006-05-05 Thread Eric Dumazet
On a dual opteron box, I noticed high oprofile numbers in net/core/dst.c, function dst_destroy(struct dst_entry * dst). It appears the smp_rmb() done at the beginning of dst_destroy() is the killer (this is a lfence machine instruction, that apparently is doing a *lot* of things... may be IO

Very long list of struct dst_entry in dst_garbage_list

2006-05-05 Thread Eric Dumazet
I noticed that after a 'ip route flush cache' (manual or timer triggered) on a busy server, X entries are added to dst_garbage_list. (X depends on the number of established sockets) Every 1/10th second (DST_GC_MIN) , net/core/dst.c::dst_run_gc() is fired, and try to free some entries

[PATCH] netfilter : zap get_cpu()/put_cpu() calls from ip_tables

2005-11-25 Thread Eric Dumazet
raw_smp_processor_id() were appropriate. Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

[PATCH (resent with the attachment !)] netfilter : zap get_cpu()/put_cpu() calls from ip_tables

2005-11-25 Thread Eric Dumazet
raw_smp_processor_id() were appropriate. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- net-2.6.16-orig/net/ipv4/netfilter/ip_tables.c 2005-11-25 10:24:02.0 +0100 +++ net-2.6.16/net/ipv4/netfilter/ip_tables.c 2005-11-25 11:44:40.0 +0100 @@ -988,11 +988,14 @@ { unsigned

Re: more on PMTU issues

2005-12-01 Thread Eric Dumazet
David S. Miller wrote: This gives further credence to BSD's hostcache which makes it use PMTU metrics only learned by TCP. I still dislike the reduced granularity of such a scheme, since as we all know ipsec routes can have wildly different metrics and can be keyed by things like port

Re: [IPv4] Fix issue reported by Coverity in net/ipv4/udp.c

2005-12-01 Thread Eric Dumazet
Herbert Xu wrote: Jayachandran C. [EMAIL PROTECTED] wrote: diff -ur linux-2.6.15-rc3-git1.clean/net/ipv4/udp.c linux-2.6.15-rc3-git1/net/ipv4/udp.c --- linux-2.6.15-rc3-git1.clean/net/ipv4/udp.c Wed Nov 30 21:55:27 2005 +++ linux-2.6.15-rc3-git1/net/ipv4/udp.c Thu Dec 1 05:23:40

Re: [IPv4] Fix issue reported by Coverity in net/ipv4/udp.c

2005-12-02 Thread Eric Dumazet
! Acked-by: Eric Dumazet [EMAIL PROTECTED]

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-02 Thread Eric Dumazet
Ronciak, John wrote: In this combination of hardware and in this forwarding test copybreak is bad but prefetching helps. e1000 vanilla 1150 kpps e1000 6.2.15 1084 e1000 6.2.15 copybreak disabled

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-03 Thread Eric Dumazet
David S. Miller wrote: I agree with the analysis, but I truly hate knobs. Every new one we add means it's even more true that you need to be a wizard to get a Linux box performing optimally. [rant mode] Well, I suspect this is the reason why various hash tables (IP route cache, TCP

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
in cases like small packet routing is being done. I am no longer sure that your results on copybreak for host bound packets can be trusted anymore. All your copybreak was doing was making the prefetch look good according to my tests. Eric Dumazet [EMAIL PROTECTED] theorized there may be some value

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
David S. Miller wrote: From: jamal [EMAIL PROTECTED] Date: Wed, 07 Dec 2005 16:37:10 -0500 I think there is value for prefetch - just not the way the current patch has it. Something less adventurous as suggested by Robert would make a lot more sense. Looking at the e1000 patch in

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-07 Thread Eric Dumazet
John Ronciak wrote: On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: Keyword, this box. We don't disagree and never have with this. It's why we were asking the question of find us a case where the prefetch shows a detriment to performance. I think Jesse's data and recommendation of

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Eric Dumazet
Robert Olsson wrote: David S. Miller writes: For the host bound case, copybreak is always a way due to how socket buffer accounting works. If you use a 1500 byte SKB for 64 bytes of data, this throws off all of the socket buffer accounting because you're consuming more of the socket

Re: Resend [PATCH netdev-2.6 2/8] e1000: Performance Enhancements

2005-12-08 Thread Eric Dumazet
Jesse Brandeburg wrote: On 12/7/05, David S. Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 08 Dec 2005 04:47:05 +0100 #4#5 as proposed in the patch can not be a win + prefetch(next_skb); + prefetch(next_skb->data - NET_IP_ALIGN

[PATCH] [NET] : move struct proto_ops to const

2005-12-17 Thread Eric Dumazet
of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.15-rc5/include/net/protocol.h 2005-12-04 06:10:42.0 +0100 +++ linux-2.6.15-rc5-ed/include/net/protocol.h 2005-12-17 11:21:22.0 +0100 @@ -65,7 +65,7

[PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
entry of this queue. I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest entry cannot still be in use by another CPU. This might sound like a violation of RCU rules (I'm not an RCU expert), but seems quite reasonable. Signed-off-by: Eric Dumazet [EMAIL PROTECTED

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency (Version 2), HOTPLUG_CPU fix

2006-01-06 Thread Eric Dumazet
stress tests, I could not reproduce OOM anymore after applying this patch. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.15/kernel/rcupdate.c 2006-01-03 04:21:10.0 +0100 +++ linux-2.6.15-edum/kernel/rcupdate.c 2006-01-06 13:32:02.0 +0100 @@ -71,14 +71,14

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
Andi Kleen wrote: On Friday 06 January 2006 11:17, Eric Dumazet wrote: I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest entry cannot still be in use by another CPU. This might sound like a violation of RCU rules (I'm not an RCU expert), but seems quite reasonable

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
Alan Cox wrote: On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote: I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest entry cannot still be in use by another CPU. This might sound like a violation of RCU rules (I'm not an RCU expert), but seems quite reasonable

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
Paul E. McKenney wrote: On Fri, Jan 06, 2006 at 01:37:12PM +, Alan Cox wrote: On Gwe, 2006-01-06 at 11:17 +0100, Eric Dumazet wrote: I assume that if a CPU queued 10.000 items in its RCU queue, then the oldest entry cannot still be in use by another CPU. This might sound like a violation

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
Andi Kleen wrote: I always disliked the per chain spinlocks even for other hash tables like TCP/UDP multiplex - it would be much nicer to use a much smaller separately hashed lock table and save cache. In this case the special case of using a one entry only lock hash table makes sense.

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-06 Thread Eric Dumazet
David S. Miller a écrit : From: Eric Dumazet [EMAIL PROTECTED] Date: Sat, 07 Jan 2006 08:34:35 +0100 I agree, I do use a hashed spinlock array on my local tree for TCP, mainly to reduce the hash table size by a factor of 2. So what do you think about going to a single spinlock for the routing

Re: [PATCH, RFC] RCU : OOM avoidance and lower latency

2006-01-07 Thread Eric Dumazet
David S. Miller a écrit : Eric, how important do you honestly think the per-hashchain spinlocks are? That's the big barrier from making rt_secret_rebuild() a simple rehash instead of flushing the whole table as it does now. No problem for me in going to a single spinlock. I did the hashed

Re: [E1000-devel] Transmit timeout with E1000

2006-01-11 Thread Eric Dumazet
Rogier Wolff a écrit : On Wed, Jan 11, 2006 at 02:43:49PM +0100, Erik Mouw wrote: The system only recovers after the Netdev watchdog found out that the transmit timed out. However, the e1000 register dump starts about 4 to 5 seconds earlier: a possible workaround would be to trigger the timeout

[PATCH] [IPV4] : rt_cache_stat can be statically defined

2006-01-17 Thread Eric Dumazet
of a too big increase of bss (in UP mode) or static per_cpu data for SMP (PERCPU_ENOUGH_ROOM is currently 32768 bytes) Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- net-2.6/net/ipv4/route.c 2006-01-17 10:51:24.0 +0100 +++ net-2.6-ed/net/ipv4/route.c 2006-01-17 11:25:33.0

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Ravikiran G Thirumalai a écrit : Change the atomic_t sockets_allocated member of struct proto to a per-cpu counter. Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED] Signed-off-by: Ravikiran Thirumalai [EMAIL PROTECTED] Signed-off-by: Shai Fultheim [EMAIL PROTECTED] Hi Ravikiran If I

Re: [patch 2/4] net: Percpufy frequently used variables -- struct proto.memory_allocated

2006-01-27 Thread Eric Dumazet
Ravikiran G Thirumalai a écrit : Change struct proto->memory_allocated to a batching per-CPU counter (percpu_counter) from an atomic_t. A batching counter is better than a plain per-CPU counter as this field is read often. Signed-off-by: Pravin B. Shelar [EMAIL PROTECTED] Signed-off-by:

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Ravikiran G Thirumalai a écrit : On Fri, Jan 27, 2006 at 12:16:02PM -0800, Andrew Morton wrote: Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: which can be assumed as not frequent. At sk_stream_mem_schedule(), read_sockets_allocated() is invoked only in certain conditions, under memory

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Ravikiran G Thirumalai a écrit : On Fri, Jan 27, 2006 at 11:30:23PM +0100, Eric Dumazet wrote: There are several issues here : alloc_percpu() current implementation is a waste of RAM. (because it uses slab allocations that have a minimum size of 32 bytes) Oh there was a solution

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Andrew Morton a écrit : Eric Dumazet [EMAIL PROTECTED] wrote: Ravikiran G Thirumalai a écrit : On Fri, Jan 27, 2006 at 12:16:02PM -0800, Andrew Morton wrote: Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: which can be assumed as not frequent. At sk_stream_mem_schedule

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Eric Dumazet a écrit : Andrew Morton a écrit : Eric Dumazet [EMAIL PROTECTED] wrote: Ravikiran G Thirumalai a écrit : On Fri, Jan 27, 2006 at 12:16:02PM -0800, Andrew Morton wrote: Ravikiran G Thirumalai [EMAIL PROTECTED] wrote: which can be assumed as not frequent

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Andrew Morton a écrit : Eric Dumazet [EMAIL PROTECTED] wrote: An advantage of retaining a spinlock in percpu_counter is that if accuracy is needed at a low rate (say, /proc reading) we can take the lock and then go spill each CPU's local count into the main one. It would need to be a very low
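
The spill-under-lock idea described above can be illustrated with a small userspace simulation. This is a sketch under loud assumptions, not the kernel's `percpu_counter`: the type `sketch_counter`, the functions `sketch_init`, `sketch_add`, `sketch_read` and `sketch_read_accurate`, and the constants `SKETCH_NR_CPUS` and `SKETCH_BATCH` are hypothetical, the "CPU" is passed in as a plain index, and the per-CPU slots are not protected against concurrent writers as they would need to be in real code.

```c
#include <pthread.h>

#define SKETCH_NR_CPUS 4   /* hypothetical number of simulated CPUs */
#define SKETCH_BATCH   32  /* a local delta this large is folded into the total */

struct sketch_counter {
    pthread_mutex_t lock;        /* protects count */
    long count;                  /* approximate global value */
    long local[SKETCH_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

static void sketch_init(struct sketch_counter *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->count = 0;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        c->local[cpu] = 0;
}

/* Fast path: touch only the caller's slot; lock only when the batch fills. */
static void sketch_add(struct sketch_counter *c, int cpu, long amount)
{
    c->local[cpu] += amount;
    if (c->local[cpu] >= SKETCH_BATCH || c->local[cpu] <= -SKETCH_BATCH) {
        pthread_mutex_lock(&c->lock);
        c->count += c->local[cpu];
        c->local[cpu] = 0;
        pthread_mutex_unlock(&c->lock);
    }
}

/* Cheap read: may lag by up to SKETCH_BATCH - 1 per CPU in either direction. */
static long sketch_read(const struct sketch_counter *c)
{
    return c->count;
}

/* Slow, low-rate read: take the lock and spill every local delta first. */
static long sketch_read_accurate(struct sketch_counter *c)
{
    long res;

    pthread_mutex_lock(&c->lock);
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        c->count += c->local[cpu];
        c->local[cpu] = 0;
    }
    res = c->count;
    pthread_mutex_unlock(&c->lock);
    return res;
}
```

In this sketch the accurate read costs one lock acquisition plus a pass over all slots, which is why it only suits infrequent callers such as a statistics dump.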

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-27 Thread Eric Dumazet
Ravikiran G Thirumalai a écrit : On Sat, Jan 28, 2006 at 01:35:03AM +0100, Eric Dumazet wrote: Eric Dumazet a écrit : Andrew Morton a écrit : Eric Dumazet [EMAIL PROTECTED] wrote: long percpu_counter_read_accurate(struct percpu_counter *fbc) { long res = 0; int cpu

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-28 Thread Eric Dumazet
Benjamin LaHaise a écrit : On Sat, Jan 28, 2006 at 01:28:20AM +0100, Eric Dumazet wrote: We might use atomic_long_t only (and no spinlocks) Something like this ? Erk, complex and slow... Try using local_t instead, which is substantially cheaper on the P4 as it doesn't use the lock prefix
