Re: [PATCH][RFC] Re: high latency with TCP connections
Hello!

> transactions to data segments is fubar. That issue is also why I wonder
> about the setting of tcp_abc.

Yes, switching ABC on/off has visible impact on amount of segments. When
ABC is off, amount of segments is almost the same as number of
transactions. When it is on, ~1.5% are merged. But this is invisible in
numbers of throughput/cpu usage.

Here are the numbers. 1Gig link. The first column is b. "-" separates
runs of netperf in the backward direction.

Run #1. One host is slower. The four result lines per value of b are, in
order: old,abc=0; new,abc=0; new,abc=1; old,abc=1.

b=2	23652.00  6.31 21.11 10.665  8.924
	23622.16  6.47 21.01 10.951  8.893
	23625.05  6.21 21.01 10.512  8.891
	23725.12  6.46 20.31 10.898  8.559
-	23594.87 21.90  6.44  9.283 10.912
	23631.52 20.30  6.36  8.592 10.766
	23609.55 21.00  6.26  8.896 10.599
	23633.75 21.10  5.44  8.929  9.206
b=4	36349.11  8.71 31.21  9.584  8.585
	36461.37  8.65 30.81  9.492  8.449
	36723.72  8.22 31.31  8.949  8.526
	35801.24  8.58 30.51  9.589  8.521
-	35127.34 33.80  8.43  9.621  9.605
	36165.50 30.90  8.48  8.545  9.381
	36201.45 31.10  8.31  8.592  9.185
	35269.76 30.00  8.58  8.507  9.732
b=8	41148.23 10.39 42.30 10.101 10.281
	41270.06 11.04 31.31 10.698  7.585
	41181.56  5.66 48.61  5.496 11.803
	40372.37  9.68 56.50  9.591 13.996
-	40392.14 47.00 11.89 11.637 11.775
	40613.80 36.90  9.16  9.086  9.019
	40504.66 53.60  7.73 13.234  7.639
	40388.99 48.70 11.93 12.058 11.814
b=16	67952.27 16.27 43.70  9.576  6.432
	68031.40 10.56 53.70  6.206  7.894
	6.95     12.81 46.90  7.559  6.920
	67814.41 16.13 46.50  9.517  6.857
-	68031.46 51.30 11.53  7.541  6.781
	68044.57 40.70  8.48  5.982  4.986
	67808.13 39.60 15.86  5.840  9.355
	67818.32 52.90 11.51  7.801  6.791
b=32	90445.09 15.41  99.90  6.817 11.045
	90210.34 16.11 100.00  7.143 11.085
	90221.84 17.31  98.90  7.676 10.962
	90712.78 18.41  99.40  8.120 10.958
-	89155.51 99.90  12.89 11.205  5.782
	90058.54 99.90  16.16 11.093  7.179
	90092.31 98.60  15.41 10.944  6.840
	88688.96 99.00  17.59 11.163  7.933
b=64	89983.76 13.66 100.00  6.071 11.113
	90504.24 17.54 100.00  7.750 11.049
	92043.36 17.44  99.70  7.580 10.832
	90979.29 16.01  99.90  7.038 10.981
-	88615.27 99.90  14.91 11.273  6.729
	89316.13 99.90  17.28 11.185  7.740
	90622.85 99.90  16.81 11.024  7.420
	89084.85 99.90  17.51 11.214  7.861

Run #2. Slower host is replaced with a better one. ABC=0. No runs in the
backward direction. The two result lines per value of b are: new; old.

b=2	 24009.73  8.80  6.49 3.667 10.806
	 24008.43  8.00  6.32 3.334 10.524
b=4	 40012.53 18.30  8.79 4.574  8.783
	 3.84     19.40  8.86 4.851  8.857
b=8	 60500.29 26.30 12.78 4.348  8.452
	 60397.79 26.30 11.73 4.355  7.769
b=16	 69619.95 39.80 14.03 5.717  8.063
	 70528.72 24.90 14.43 3.531  8.184
b=32	132522.01 53.20 21.28 4.015  6.424
	132602.93 57.70 22.59 4.351  6.813
b=64	145738.83 60.30 25.01 4.138  6.865
	143129.55 73.20 24.19 5.114  6.759
b=128	148184.21 69.70 24.96 4.704  6.739
	148143.47 71.00 25.01 4.793  6.753
b=256	144798.91 69.40 25.01 4.793  6.908
	144086.01 73.00 24.61 5.067  6.832

Frankly, I do not see any statistically valid correlations.

> that linux didn't seem to be doing the same thing. Hence my tweaking
> when seeing this patch come along...]

netperf does not catch this. :-) Even with this patch linux does not ack
each second segment dumbly, it waits for some conditions, mostly read()
emptying the receive queue. To model this it is necessary to insert some
gaps between bursted segments or to use a slow network.

I have no doubts it is easy to model a situation when we send lots of
useless ACKs. F.e. inserting 20ms gaps between requests. To see the
effect on throughput/cpu, we could start enough connections doing the
same thing.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] Re: high latency with TCP connections
Alexey Kuznetsov wrote:
> Hello!
>
> > transactions to data segments is fubar. That issue is also why I
> > wonder about the setting of tcp_abc.
>
> Yes, switching ABC on/off has visible impact on amount of segments.
> When ABC is off, amount of segments is almost the same as number of
> transactions. When it is on, ~1.5% are merged. But this is invisible in
> numbers of throughput/cpu usage.

Hmm, that would seem to suggest that for "new" the netperf/netserver
were being fast enough that the code didn't perceive the receipt of
back-to-back sub-MSS segments? (Is that even possible once -b is fairly
large?) Otherwise, with "new" I would have expected the segment count to
be meaningfully less than the transaction count?

> [netperf result tables snipped]
>
> Frankly, I do not see any statistically valid correlations.

Does look like it jumps around quite a bit - for example the run #2 with
-b 16 had the CPU util all over the map on the netperf side. That wasn't
by any chance an SMP system?

> > that linux didn't seem to be doing the same thing. Hence my tweaking
> > when seeing this patch come along...]
>
> netperf does not catch this. :-)

Nope :( One of these days I need to teach netperf how to extract TCP
statistics from as many platforms as possible. Meantime it relies as
always on the kindness of benchmarkers :) (My apologies to Tennessee
Williams :)

> Even with this patch linux does not ack each second segment dumbly, it
> waits for some conditions, mostly read() emptying the receive queue.

Good. HP-UX is indeed dumb about this, but I'm assured it will be
changing. I
Re: [PATCH][RFC] Re: high latency with TCP connections
On Mon, 18 Sep 2006 06:56:55 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote:

> From: Alexey Kuznetsov [EMAIL PROTECTED]
> Date: Mon, 18 Sep 2006 14:37:05 +0400
>
> > > It looks perfectly fine to me, would you like me to apply it Alexey?
> >
> > Yes, I think it is safe.
>
> Ok, I'll put this into net-2.6.19 for now. Thanks.

Did you try this on a desktop system? Something is wrong with net-2.6.19
-- basic web browsing seems slower.

-- 
Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 20 Sep 2006 15:44:06 -0700

> On Mon, 18 Sep 2006 06:56:55 -0700 (PDT)
> David Miller [EMAIL PROTECTED] wrote:
>
> > Ok, I'll put this into net-2.6.19 for now. Thanks.
>
> Did you try this on a desktop system? Something is wrong with
> net-2.6.19 -- basic web browsing seems slower.

It might be due to other changes, please verify that it's truly caused
by Alexey's change by backing it out and retesting.

Note that I had to use an updated version of Alexey's change, which he
sent me privately, because the first version didn't compile :)
Re: [PATCH][RFC] Re: high latency with TCP connections
On Wed, 20 Sep 2006 15:47:56 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Wed, 20 Sep 2006 15:44:06 -0700
>
> > On Mon, 18 Sep 2006 06:56:55 -0700 (PDT)
> > David Miller [EMAIL PROTECTED] wrote:
> >
> > > Ok, I'll put this into net-2.6.19 for now. Thanks.
> >
> > Did you try this on a desktop system? Something is wrong with
> > net-2.6.19 -- basic web browsing seems slower.
>
> It might be due to other changes, please verify that it's truly caused
> by Alexey's change by backing it out and retesting.
>
> Note that I had to use an updated version of Alexey's change, which he
> sent me privately, because the first version didn't compile :)

It might be something else... there are a lot of changes from 2.6.18 to
net-2.6.19.

-- 
Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Mon, 4 Sep 2006 20:00:45 +0400

> Try enclosed patch. I have no idea why 9.997 sec is so magic, but I get
> exactly this number on my notebook. :-)
>
> =
>
> This patch enables sending ACKs each 2d received segment. It does not
> affect either mss-sized connections (obviously) or connections
> controlled by Nagle (because there is only one small segment in
> flight).
>
> The idea is to record the fact that a small segment arrives on a
> connection, where one small segment has already been received and
> still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
> receive buffer.
>
> In other words, it is a soft each-2d-segment ACK, which is enough to
> preserve ACK clock even when ABC is enabled.
>
> Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]

This looks exactly like the kind of patch I tried to formulate, very
unsuccessfully, last time this topic came up a year or so ago.

It looks perfectly fine to me, would you like me to apply it Alexey?
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Rick Jones [EMAIL PROTECTED]
Date: Tue, 05 Sep 2006 10:55:16 -0700

> Is this really necessary? I thought that the problems with ABC were in
> trying to apply byte-based heuristics from the RFC(s) to a
> packet-oriented cwnd in the stack?

This is receiver side, and helps a sender who does congestion control
based upon packet counting like Linux does.

It really is less related to ABC than Alexey implies, we've always had
this kind of problem as I mentioned in previous talks in the past on
this issue.
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello!

> It looks perfectly fine to me, would you like me to apply it Alexey?

Yes, I think it is safe.

Theoretically, there is one place where it can be not so good. A nicely
nagling TCP connection, which makes lots of small write()s, will send
MSS-sized frames due to delayed ACKs. But if we ACK every other segment,
more segments will come out incomplete, which could result in some
decrease of throughput.

But the trap for this case was set 6 years ago. For unidirectional
sessions, ACKs were sent not even each second segment, but for each
small segment. :-) This did not show any problems for those 6 years. I
guess it means that the problem does not exist.

Alexey
Re: [PATCH][RFC] Re: high latency with TCP connections
David Miller wrote:
> From: Rick Jones [EMAIL PROTECTED]
> Date: Tue, 05 Sep 2006 10:55:16 -0700
>
> > Is this really necessary? I thought that the problems with ABC were
> > in trying to apply byte-based heuristics from the RFC(s) to a
> > packet-oriented cwnd in the stack?
>
> This is receiver side, and helps a sender who does congestion control
> based upon packet counting like Linux does.
>
> It really is less related to ABC than Alexey implies, we've always had
> this kind of problem as I mentioned in previous talks in the past on
> this issue.

For a connection receiving nothing but sub-MSS segments this is going to
non-trivially increase the number of ACKs sent, no? I would expect an
unpleasant increase in service demands on something like a burst-enabled
(./configure --enable-burst) netperf TCP_RR test:

	netperf -t TCP_RR -H foo -- -b N   # N > 1

to increase as a result. Pipelined HTTP would be like that, some NFS
over TCP stuff too, maybe X traffic, other transactional workloads as
well - maybe Tuxedo.

rick jones
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello!

Of course, the number of ACKs increases. It is the goal. :-)

> unpleasant increase in service demands on something like a
> burst-enabled (./configure --enable-burst) netperf TCP_RR test:
>
> netperf -t TCP_RR -H foo -- -b N   # N > 1

foo=localhost

b	patched		orig
2	105874.83	105143.71
3	114208.53	114023.07
4	120493.99	120851.27
5	128087.48	128573.33
10	151328.48	151056.00

Probably, the test is done wrong. But I see no difference.

> to increase as a result. Pipelined HTTP would be like that, some NFS
> over TCP stuff too, maybe X traffic,

X will be excited about better latency. As for protocols not interested
in latency, they will be a little happier if transactions are processed
asynchronously.

But actually, it is not about increasing/decreasing the number of ACKs.
It is about killing that pain in the ass which we used to have because
we pretended to be too smart.

Alexey
Re: [PATCH][RFC] Re: high latency with TCP connections
Alexey Kuznetsov wrote:
> Hello!
>
> Of course, the number of ACKs increases. It is the goal. :-)
>
> > unpleasant increase in service demands on something like a
> > burst-enabled (./configure --enable-burst) netperf TCP_RR test:
> >
> > netperf -t TCP_RR -H foo -- -b N   # N > 1
>
> foo=localhost

There isn't any sort of clever short-circuiting in loopback is there? I
do like the convenience of testing things over loopback, but always fret
about not including drivers and actual hardware interrupts etc.

> b	patched		orig
> 2	105874.83	105143.71
> 3	114208.53	114023.07
> 4	120493.99	120851.27
> 5	128087.48	128573.33
> 10	151328.48	151056.00
>
> Probably, the test is done wrong. But I see no difference.

Regardless, kudos for running the test. The only thing missing is the -c
and -C options to enable the CPU utilization measurements which will
then give the service demand on a CPU time per transaction basis. Or was
this a UP system that was taken to CPU saturation?

> > to increase as a result. Pipelined HTTP would be like that, some NFS
> > over TCP stuff too, maybe X traffic,
>
> X will be excited about better latency. As for protocols not interested
> in latency, they will be a little happier if transactions are processed
> asynchronously.

What I'm thinking about isn't so much the latency as it is the aggregate
throughput a system can do with lots of these protocols/connections
going at the same time. Hence the concern about increases in service
demand.

> But actually, it is not about increasing/decreasing the number of ACKs.
> It is about killing that pain in the ass which we used to have because
> we pretended to be too smart.

:)

rick jones
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello!

> There isn't any sort of clever short-circuiting in loopback is there?

No, not that I know of.

> I do like the convenience of testing things over loopback, but always
> fret about not including drivers and actual hardware interrupts etc.

Well, if the test is right, it should show the cost of redundant ACKs.

> Regardless, kudos for running the test. The only thing missing is the
> -c and -C options to enable the CPU utilization measurements which will
> then give the service demand on a CPU time per transaction basis. Or
> was this a UP system that was taken to CPU saturation?

It is my notebook. :-) Of course, cpu consumption is 100%. (Actually,
netperf shows 100.10 :-))

I will redo the test on a real network. What range of -b should I test?

> What i'm thinking about isn't so much about the latency

I understand. Actually, I did those tests ages ago for a pure throughput
case, when nothing goes in the opposite direction. I did not find a
difference that time. And nobody even noticed that Linux sends ACKs for
_each_ small segment on unidirectional connections, for all those
years. :-)

Alexey
Re: [PATCH][RFC] Re: high latency with TCP connections
> > Regardless, kudos for running the test. The only thing missing is the
> > -c and -C options to enable the CPU utilization measurements which
> > will then give the service demand on a CPU time per transaction
> > basis. Or was this a UP system that was taken to CPU saturation?
>
> It is my notebook. :-) Of course, cpu consumption is 100%. (Actually,
> netperf shows 100.10 :-))

Gotta love the accuracy. :)

> I will redo the test on a real network. What range of -b should I test?

I suppose that depends on your patience :) In theory, as you increase
(eg double) the -b setting you should reach a point of diminishing
returns wrt transaction rate. If you see that, and see the service
demand flattening-out, I'd say it is probably time to stop.

I'm also not quite sure if abc needs to be disabled or not.

I do know that I left out one very important netperf option. The command
line should be:

	netperf -t TCP_RR -H foo -- -b N -D

where -D is added to set TCP_NODELAY. Otherwise, the ratio of
transactions to data segments is fubar. That issue is also why I wonder
about the setting of tcp_abc.

[I have this quixotic pipedream about being able to --enable-burst, set
-D and say that the number of TCP segments exchanged on the network is
2X the transaction count when request and response size are < MSS. The
raison d'etre for this pipe dream is maximizing PPS with TCP_RR tests
without _having_ to have hundreds if not thousands of simultaneous
netperfs/connections - say with just as many netperfs/connections as
there are CPUs or threads/strands in the system. It was while trying to
make this pipe dream a reality I first noticed that HP-UX 11i, which
normally has a very nice ACK avoidance heuristic, would send an
immediate ACK if it received back-to-back sub-MSS segments - thus
ruining my pipe dream when it came to HP-UX testing. Happily, I noticed
that linux didn't seem to be doing the same thing. Hence my tweaking
when seeing this patch come along...]

> > What i'm thinking about isn't so much about the latency
>
> I understand. Actually, I did those tests ages ago for a pure
> throughput case, when nothing goes in the opposite direction. I did not
> find a difference that time. And nobody even noticed that Linux sends
> ACKs for _each_ small segment on unidirectional connections, for all
> those years. :-)

Not everyone looks very closely (alas, sometimes myself included). If
all anyone does is look at throughput, until they CPU saturate they
wouldn't notice. Heck, before netperf and TCP_RR tests, and sadly even
still today, most people just look at how fast a single-connection,
unidirectional data transfer goes and leave it at that :( Thankfully,
the set of "most people" and netdev aren't completely overlapping.

rick jones
Re: [PATCH][RFC] Re: high latency with TCP connections
Alexey Kuznetsov wrote:
> Hello!
>
> > Some people reported that this program runs in 9.997 sec when run on
> > FreeBSD.
>
> Try enclosed patch. I have no idea why 9.997 sec is so magic, but I get
> exactly this number on my notebook. :-)
>
> Alexey
>
> =
>
> This patch enables sending ACKs each 2d received segment. It does not
> affect either mss-sized connections (obviously) or connections
> controlled by Nagle (because there is only one small segment in
> flight).
>
> The idea is to record the fact that a small segment arrives on a
> connection, where one small segment has already been received and
> still not-ACKed. In this case ACK is forced after tcp_recvmsg() drains
> receive buffer.
>
> In other words, it is a soft each-2d-segment ACK, which is enough to
> preserve ACK clock even when ABC is enabled.

Is this really necessary? I thought that the problems with ABC were in
trying to apply byte-based heuristics from the RFC(s) to a
packet-oriented cwnd in the stack?

rick jones
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello!

> Is this really necessary?

No, of course. We lived for ages without this, and would live for
another age.

> I thought that the problems with ABC were in trying to apply byte-based
> heuristics from the RFC(s) to a packet-oriented cwnd in the stack?

It was just the last drop. Even with disabled ABC, that test shows some
gaps in latency summing up to ~300 msec. Almost invisible, but not good.

Too aggressive delack has many other issues. Even without ABC we have
quadratically suppressed cwnd on TCP_NODELAY connections compared to
BSD: at sender side we suppress it by counting cwnd in packets, at
receiver side by ACKing by byte counter. Each time another victim sees
artificial latencies introduced by aggressive delayed acks, even though
he requested TCP_NODELAY, our best argument is "Stupid, you do all
wrong, how could you get a decent performance?" :-).

Probably, we stand for a feature which really is not worth standing for
and causes nothing but permanent pain in the ass.

Alexey
Re: high latency with TCP connections
Hello!

> At least for slow start it is safe, but experiments with atcp for
> netchannels showed that it is better not to send excessive number of
> acks when slow start is over,

If this thing is done from tcp_cleanup_rbuf(), it should not affect
performance too much.

Note, that with ABC and other pathological cases, which do not allow
sending more than a fixed amount of segments [ we have lots of them,
f.e. sending tiny segments, we can hit sndbuf limit ], we deal with a
case when slow start is _never_ over.

> instead we can introduce some tricky ack avoidance scheme and ack at
> least 2-3-4 packets or full MSS instead of two mss-sized frames.

One smart scheme was used at some stage (2000, probably never merged in
this form to mainstream): tcp counted amount of unacked small segments
in ack.rcv_small and kept threshold in ack.rcv_thresh.

+
+	/* If we ever saw N > 1 small segments from peer, it has
+	 * enough of send buffer to send N packets and does not nagle.
+	 * Hence, we may delay acks more aggressively.
+	 */
+	if (tp->ack.rcv_small > tp->ack.rcv_thresh+1)
+		tp->ack.rcv_thresh = tp->ack.rcv_small-1;
+	tp->ack.rcv_small = 0;

That was too much of trouble for such a simple thing. So, eventually it
was replaced with a much dumber scheme. Look at current
tcp_cleanup_rbuf(). It forces an ACK each time it sees that some small
segment was received. It survived for 6 years, so I guess it did not
hurt anybody. :-)

What I would suggest to do now is to replace:

		    (copied > 0 &&
		     (icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
		     !icsk->icsk_ack.pingpong &&
		     !atomic_read(&sk->sk_rmem_alloc)))
			time_to_ack = 1;

with:

		    (copied > 0 &&
		     (icsk->icsk_ack.unacked > 1 ||
		      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
		       !icsk->icsk_ack.pingpong)) &&
		     !atomic_read(&sk->sk_rmem_alloc)))
			time_to_ack = 1;

I would not hesitate even a minute, if the variable "unacked" could be
calculated using some existing state variables.
Alexey
[PATCH][RFC] Re: high latency with TCP connections
Hello!

> Some people reported that this program runs in 9.997 sec when run on
> FreeBSD.

Try enclosed patch. I have no idea why 9.997 sec is so magic, but I get
exactly this number on my notebook. :-)

Alexey

=

This patch enables sending ACKs each 2d received segment. It does not
affect either mss-sized connections (obviously) or connections
controlled by Nagle (because there is only one small segment in flight).

The idea is to record the fact that a small segment arrives on a
connection, where one small segment has already been received and still
not-ACKed. In this case ACK is forced after tcp_recvmsg() drains receive
buffer.

In other words, it is a soft each-2d-segment ACK, which is enough to
preserve ACK clock even when ABC is enabled.

Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 9bf73fe..de4e83b 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -147,7 +147,8 @@ extern struct sock *inet_csk_clone(struc
 enum inet_csk_ack_state_t {
 	ICSK_ACK_SCHED	= 1,
 	ICSK_ACK_TIMER	= 2,
-	ICSK_ACK_PUSHED	= 4
+	ICSK_ACK_PUSHED	= 4,
+	ICSK_ACK_PUSHED2 = 8
 };
 
 extern void inet_csk_init_xmit_timers(struct sock *sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..4f3b76f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -955,8 +955,11 @@ #endif
 	 * receive buffer and there was a small segment
 	 * in queue.
 	 */
-		    (copied > 0 && (icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
-		     !icsk->icsk_ack.pingpong && !atomic_read(&sk->sk_rmem_alloc)))
+		    (copied > 0 &&
+		     ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) ||
+		      ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
+		       !icsk->icsk_ack.pingpong)) &&
+		     !atomic_read(&sk->sk_rmem_alloc)))
 			time_to_ack = 1;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 111ff39..5877920 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -156,6 +156,8 @@ static void tcp_measure_rcv_mss(struct s
 			return;
 		}
 	}
+	if (icsk->icsk_ack.pending & ICSK_ACK_PUSHED)
+		icsk->icsk_ack.pending |= ICSK_ACK_PUSHED2;
 	icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
 }
Re: high latency with TCP connections
From: Pekka Savola [EMAIL PROTECTED]
Date: Fri, 1 Sep 2006 12:44:48 +0300 (EEST)

> On Thu, 31 Aug 2006, David Miller wrote:
> ...
> > > Probably, aspect 1 of ABC just should be disabled. And the first my
> > > suggestion looks working too.
> >
> > I'm ready to rip out ABC entirely, to be honest. Or at least turn it
> > off by default.
>
> Just as a curious observer: do you think these issues are due to the
> ABC implementation, or due to the ABC specification?

It simply doesn't apply to us, as Alexey explained, because we prevent
ACK division already when we apply the ACK to the retransmit queue
purging loop. If we didn't free any whole packets, we don't advance the
congestion window.

The other bit, dealing with delayed ACKs, we could handle another way.

ABC is a very BSD specific algorithm, as Alexey also mentioned.
Re: high latency with TCP connections
On Fri, Sep 01, 2006 at 01:47:15PM +0400, Alexey Kuznetsov ([EMAIL PROTECTED]) wrote:
> Hello!
>
> > problem. The problem is really at the receiver because we only ACK
> > every other full sized frame. I had the idea to ACK every 2 frames,
> > regardless of size,
>
> This would solve lots of problems.

At least for slow start it is safe, but experiments with atcp for
netchannels showed that it is better not to send an excessive number of
ACKs when slow start is over; instead we can introduce some tricky ACK
avoidance scheme and ACK at least 2-3-4 packets or a full MSS instead of
two mss-sized frames.

> Alexey

-- 
	Evgeniy Polyakov
Re: high latency with TCP connections
Alexander Vodomerov wrote:
> On Wed, Aug 30, 2006 at 02:39:55PM -0700, David Miller wrote:
>
> > Expecting any performance with one byte writes is silly. This is
> > absolutely true. TCP_NODELAY can only save you when you are sending a
> > small amount of data in aggregate, such as in an SSH or telnet
> > session, whereas in the case being shown here a large amount of data
> > is being sent in small chunks which will always get bad performance.
>
> Information is sent with one byte writes because it is not available at
> the moment of sending (it may be read from a hardware device or user).
> If I change 1 to 10 or 100, nothing changes. I'm afraid there is a bit
> of misunderstanding here. Only a very small amount of data is being
> sent over the network. The total traffic for the example I sent is only
> 10 bytes/s. After every 10th packet the program does usleep(10) to
> simulate a pause before the next available data.
>
> There are really 3 factors:
>   1) total size of information is small
>   2) data for transferring arrives in small portions from an external
>      source
>   3) it is very important that any portion should be delivered to the
>      receiver as soon as possible.
>
> Is TCP a good choice for such transfers, or is some other protocol
> better suited?

If message boundary preservation is a useful feature for your app, you
could try SCTP. You should be able to do this by replacing IPPROTO_TCP
with IPPROTO_SCTP and TCP_NODELAY with SCTP_NODELAY.

Thanks
Sridhar

> With best regards,
>    Alexander.
Re: high latency with TCP connections
> The word performance in this list seems to always mean "throughput".
> It seems though that there could be some knob to tweak for those of us
> who don't care so much about throughput but care a great deal about
> latency.

SCTP has been mentioned. There is also DCCP -
http://www.read.cs.ucla.edu/dccp/

-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: high latency with TCP connections
Hello! 2) a way to take delayed ACKs into account for cwnd growth This part is OK now, right? 1) protection against ACK division But Linux never had this problem... The congestion window was increased only when a whole skb was ACKed, flag FLAG_DATA_ACKED. (TSO could break this, but should not.) Otherwise, such an ACK just advanced snd_una and nothing more. This aspect of ABC is crucial for BSD. TCP_NODELAY sockets did not obey congestion control there. From the very beginning, before slow start, they could send thousands of 1-byte segments. The only too-aggressive problem Linux had was that we could develop a large cwnd while sending small segments, and then switch to sending mss-sized segments. It does not look scary, to be honest. :-) Linux had troubles with slow start even before ABC. Actually, some applications can suffer from the same syndrome even with ABC disabled. With ABC it becomes TROUBLE; cwnd has no chance to develop at all. Probably, aspect 1 of ABC just should be disabled. And my first suggestion looks workable too. Alexey
Re: high latency with TCP connections
From: Alexey Kuznetsov [EMAIL PROTECTED] Date: Fri, 1 Sep 2006 03:29:23 +0400 2) a way to take delayed ACKs into account for cwnd growth This part is OK now, right? This part of ABC is not on by default, and was broken until last week :-) The test in tcp_slow_start() used to be: tp->bytes_acked > 2*tp->mss_cache but now it is the correct: tp->bytes_acked >= 2*tp->mss_cache It allows making two congestion-window increases from one ACK when noticing a delayed ACK. The non-ABC code did not do this, but could figure this kind of thing out while scanning the retransmit queue. 1) protection against ACK division But Linux never had this problem... The congestion window was increased only when a whole skb was ACKed, flag FLAG_DATA_ACKED. (TSO could break this, but should not.) Otherwise, such an ACK just advanced snd_una and nothing more. Ugh, I missed this. :-/ The TSO code is careful to only trim TSO skbs on proper boundaries, and this ensures proper setting of FLAG_DATA_ACKED too. So no problems here. The only too-aggressive problem Linux had was that we could develop a large cwnd while sending small segments, and then switch to sending mss-sized segments. It does not look scary, to be honest. :-) Agreed. Linux had troubles with slow start even before ABC. Actually, some applications can suffer from the same syndrome even with ABC disabled. With ABC it becomes TROUBLE; cwnd has no chance to develop at all. I've discussed that very issue here before, some time ago, with John Heffner. It was in response to a user reporting a similar problem. The problem is really at the receiver, because we only ACK every other full-sized frame. I had the idea to ACK every 2 frames, regardless of size, but that might have other problems. There is an asymmetry between how we do congestion control on sending (packet counting) and our ACK policy on receive (packet-size based). Probably, aspect 1 of ABC just should be disabled. And my first suggestion looks workable too.
I'm ready to rip out ABC entirely, to be honest. Or at least turn it off by default.
Re: high latency with TCP connections
On Thu, 31 Aug 2006 16:57:01 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Alexey Kuznetsov [EMAIL PROTECTED] Date: Fri, 1 Sep 2006 03:29:23 +0400 2) a way to take delayed ACKs into account for cwnd growth This part is OK now, right? This part of ABC is not on by default, and was broken until last week :-) The test in tcp_slow_start() used to be: tp->bytes_acked > 2*tp->mss_cache but now it is the correct: tp->bytes_acked >= 2*tp->mss_cache It allows making two congestion-window increases from one ACK when noticing a delayed ACK. The non-ABC code did not do this, but could figure this kind of thing out while scanning the retransmit queue. 1) protection against ACK division But Linux never had this problem... The congestion window was increased only when a whole skb was ACKed, flag FLAG_DATA_ACKED. (TSO could break this, but should not.) Otherwise, such an ACK just advanced snd_una and nothing more. Ugh, I missed this. :-/ The TSO code is careful to only trim TSO skbs on proper boundaries, and this ensures proper setting of FLAG_DATA_ACKED too. So no problems here. The only too-aggressive problem Linux had was that we could develop a large cwnd while sending small segments, and then switch to sending mss-sized segments. It does not look scary, to be honest. :-) Agreed. Linux had troubles with slow start even before ABC. Actually, some applications can suffer from the same syndrome even with ABC disabled. With ABC it becomes TROUBLE; cwnd has no chance to develop at all. I've discussed that very issue here before, some time ago, with John Heffner. It was in response to a user reporting a similar problem. The problem is really at the receiver, because we only ACK every other full-sized frame. I had the idea to ACK every 2 frames, regardless of size, but that might have other problems. There is an asymmetry between how we do congestion control on sending (packet counting) and our ACK policy on receive (packet-size based). Probably, aspect 1 of ABC just should be disabled.
And my first suggestion looks workable too. I'm ready to rip out ABC entirely, to be honest. Or at least turn it off by default. Turn it off by default for 2.6.18, then evaluate more for 2.6.19
Re: high latency with TCP connections
I'm ready to rip out ABC entirely, to be honest. Or at least turn it off by default. Turn it off by default for 2.6.18, then evaluate more for 2.6.19 If it goes out in 2.6.18, there could probably be a good argument for it going into the stable tree as well... to stop the likes of the JVM-type issues that users keep hitting (which is fixed, or going to be fixed, by Sun). -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand
Re: high latency with TCP connections
On Wed, 30 Aug 2006 14:07:34 +0400 Alexander Vodomerov [EMAIL PROTECTED] wrote: Hello! I'm writing an application that works over TCP. Total traffic is very low (~ 10 kb/sec), but performance is very bad. I've tried to investigate the problem with tcpdump and strace, and it shows that the application does multiple writes, but TCP buffers them and sends after some delay (about 40 msec). Due to the nature of my application, it is essential to send any available data ASAP (decreased bandwidth is not important). I've set the TCP_NODELAY option on the socket, but it doesn't help. Linux TCP implements Appropriate Byte Counting (ABC), and this penalizes applications that do small sends. The problem is that the other side may be delaying acknowledgments. If the receiver doesn't acknowledge, the sender will limit itself to the congestion window. If the flow is light, then you will be limited to 4 packets. We've written a simple program to reproduce the effect. It sends 10 small packets, then sleeps for 0.1 sec. Another node tries to receive the data. Strace shows that 2 packets are sent immediately and the other 8 are grouped together and delayed by 40 msec. It is interesting that this effect can be seen not only on Ethernet links, but also on loopback (with the same magic constant of 40 msec). Here is a test run: server (should be run first): $ ./a.out 1 5000 Server: begin send_all Server: total time 14.216441 client: $ ./a.out 2 5000 localhost Client: connected to localhost:5000 Client: begin receive_all Client: total time 14.223265 The expected time is 10.0 sec (instead of 14.0 sec). If packets are received more often (the DELIM constant is set to 1 or 2), then the effect disappears. Is this desired behaviour? How can I specify that packets should be sent really immediately after write? Some people reported that this program runs in 9.997 sec on FreeBSD. Please cc me on replies, as I'm not subscribed to the mailing list. With best regards, Alexander.
void send_all(unsigned long delay) { int i; char buf[1024]; printf("Server: begin send_all\n"); for (i = 1; i < TOTAL_SENDS; ++i) { write(sock, buf, 1); Expecting any performance with one-byte writes is silly.
Re: high latency with TCP connections
From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 30 Aug 2006 10:27:27 -0700 Linux TCP implements Appropriate Byte Counting (ABC), and this penalizes applications that do small sends. The problem is that the other side may be delaying acknowledgments. If the receiver doesn't acknowledge, the sender will limit itself to the congestion window. If the flow is light, then you will be limited to 4 packets. Right. However, it occurred to me the other day that ABC could be made smarter. If we sent small frames, ABC should account for that. The problem with ABC is that it prevents CWND growth not just during ACK division, but also when we truly are sending smaller-sized frames. In fact, for chatty protocols, the real load on a router for the small packets is much less than that for full-sized frames. So it is in fact these small-frame sending cases for which we can be less conservative, whatever that means here. So my suggestion is that ABC should go look in the retransmit queue and see how many real packets are being fully ACK'd, rather than assuming the send queue is composed of MSS-sized frames. I also think we should seriously consider changing the ABC default to 2 rather than 1. Expecting any performance with one-byte writes is silly. This is absolutely true. TCP_NODELAY can only save you when you are sending a small amount of data in aggregate, such as in an SSH or telnet session, whereas in the case being shown here a large amount of data is being sent in small chunks, which will always get bad performance.
Re: high latency with TCP connections
On Wed, 30 Aug 2006 14:39:55 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 30 Aug 2006 10:27:27 -0700 Linux TCP implements Appropriate Byte Counting (ABC), and this penalizes applications that do small sends. The problem is that the other side may be delaying acknowledgments. If the receiver doesn't acknowledge, the sender will limit itself to the congestion window. If the flow is light, then you will be limited to 4 packets. Right. However, it occurred to me the other day that ABC could be made smarter. If we sent small frames, ABC should account for that. The problem with ABC is that it prevents CWND growth not just during ACK division, but also when we truly are sending smaller-sized frames. In fact, for chatty protocols, the real load on a router for the small packets is much less than that for full-sized frames. So it is in fact these small-frame sending cases for which we can be less conservative, whatever that means here. So my suggestion is that ABC should go look in the retransmit queue and see how many real packets are being fully ACK'd, rather than assuming the send queue is composed of MSS-sized frames. I also think we should seriously consider changing the ABC default to 2 rather than 1. That would be a good simple first step. It can't hurt and seems reasonable. -- Stephen Hemminger [EMAIL PROTECTED]
Re: high latency with TCP connections
David Miller wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Wed, 30 Aug 2006 10:27:27 -0700 Linux TCP implements Appropriate Byte Counting (ABC), and this penalizes applications that do small sends. The problem is that the other side may be delaying acknowledgments. If the receiver doesn't acknowledge, the sender will limit itself to the congestion window. If the flow is light, then you will be limited to 4 packets. Right. However, it occurred to me the other day that ABC could be made smarter. If we sent small frames, ABC should account for that. Is that part of the problem of applying a byte-based RFC to a packet-counting cwnd? rick jones