Re: [9fans] fossil+venti performance question
> * the SYN-ACK needs to send the local mss, not echo the remote mss. asymmetry is fine on the other side, even if ip/tcp.c isn't smart enough to keep tx and rx mss separate. (scare quotes = untested, there may be some performance niggles if the sender is sending legal packets larger than tcb->mss.)

that is what it already does as far as i can see. on the server side, we receive a SYN, put it in limbo and reply with SYN|ACK (sndsynack()), sending our local mss straight from tcpmtu(), no adjustment. at this point there's no connection or tcb, as everything is still in limbo. only once we receive the ACK does tcpincoming() get called, which pulls the info we got so far (including the mss sent by the client in the SYN packet) out of limbo and sets up a connection with its tcb.

to summarize what happens on the server for an incoming connection (sketched in code after this message):

1.a) tcpiput() gets a SYN packet for a Listening connection, calls limbo().
1.b) limbo() saves the info (including mss) from the SYN in the limbo database and calls sndsynack().
1.c) sndsynack() sends a SYN|ACK packet with the mss option set from tcpmtu(), without any adjustment.
2.a) tcpiput() gets an ACK packet for the Listening connection, calls tcpincoming().
2.b) tcpincoming() looks in limbo, finds lp, and makes the new connection.
3.c) initialize our connection's tcb->mss.

> * the setting of tcb->mss in tcpincoming is not correct, the tcp mss is set by SYN, not by ACK, and may not be reset. (see snoopy below.)

you say we shouldn't initialize tcb->mss in 3.c and not use the mss from the initial SYN to adjust it. i don't understand why not, as i don't see where it would be initialized otherwise. it appears that was what the initial patch from david was meant to fix, which made sense to me.

as far as i can see, procsyn() is unrelated to server-side incoming connections. it only gets called on behalf of a client's outgoing connect, when the connection is in Syn_sent state, and processes the SYN|ACK that was generated by the process described in 1.c above.

-- cinap
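[A runnable toy model of the sequence cinap lists. The function names (limbo, sndsynack, tcpincoming) follow ip/tcp.c, but the bodies only model who knows which mss at each step; the constants are illustrative, not taken from the kernel.]

/* toy model, not kernel code: mss flow in the server-side handshake */
#include <stdio.h>

typedef struct { int mss; } Limbo;	/* state parked by limbo() at 1.b */
typedef struct { int mss; } Tcpctl;	/* exists only after tcpincoming(), 2.b */

enum { Localmss = 16384, Clientmss = 1460 };	/* made-up values */

/* 1.c: the SYN|ACK advertises our own mtu-derived mss, unadjusted */
int
sndsynack(void)
{
	return Localmss;
}

/* 2.b + 3.c: the ACK arrives, the tcb is created, and the disputed
   step clamps tcb->mss to the mss the client's SYN offered */
Tcpctl
tcpincoming(Limbo *lp)
{
	Tcpctl tcb;

	tcb.mss = Localmss;
	if(lp->mss != 0 && lp->mss < tcb.mss)
		tcb.mss = lp->mss;
	return tcb;
}

int
main(void)
{
	Limbo lp = { Clientmss };	/* 1.a/1.b: SYN arrives, mss parked in limbo */
	Tcpctl tcb;

	printf("SYN|ACK advertises mss %d\n", sndsynack());
	tcb = tcpincoming(&lp);		/* 2.a: ACK arrives */
	printf("tcb.mss after 3.c: %d\n", tcb.mss);
	return 0;
}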
Re: [9fans] fossil+venti performance question
> 2.a) tcpiput() gets an ACK packet for the Listening connection, calls tcpincoming().
> 2.b) tcpincoming() looks in limbo, finds lp, and makes the new connection.
> 3.c) initialize our connection's tcb->mss.
>
>> * the setting of tcb->mss in tcpincoming is not correct, the tcp mss is set by SYN, not by ACK, and may not be reset. (see snoopy below.)
>
> you say we shouldn't initialize tcb->mss in 3.c and not use the mss from the initial SYN to adjust it. i don't understand why not, as i don't see where it would be initialized otherwise. it appears that was what the initial patch from david was meant to fix, which made sense to me.

that was the opposite of what i was saying. the issue was that i misread tcpincoming().

- erik
Re: [9fans] fossil+venti performance question
how is this the opposite? your patch shows the tcb->mss init being removed completely from tcpincoming():

-	/* our sending max segment size cannot be bigger than what he asked for */
-	if(lp->mss != 0 && lp->mss < tcb->mss) {
-		tcb->mss = lp->mss;
-		tpriv->stats[Mss] = tcb->mss;
-	}
+	/* per rfc, we can't set the mss any more */
+	//tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

-- cinap
Re: [9fans] fossil+venti performance question
On Sun May 10 14:36:15 PDT 2015, cinap_len...@felloff.net wrote:

> how is this the opposite? your patch shows the tcb->mss init being removed completely from tcpincoming():
>
> -	/* our sending max segment size cannot be bigger than what he asked for */
> -	if(lp->mss != 0 && lp->mss < tcb->mss) {
> -		tcb->mss = lp->mss;
> -		tpriv->stats[Mss] = tcb->mss;
> -	}
> +	/* per rfc, we can't set the mss any more */
> +	// tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

i haven't updated the patch.

- erik
Re: [9fans] fossil+venti performance question
On Sun May 10 10:58:55 PDT 2015, 0in...@gmail.com wrote:

>> however, after fixing things so the initial cwind isn't hosed, i get a little better story:
>>
>> so, actually, i think this is the root cause. the initial cwind is mis-set for loopback. i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when performance sucks. evidently there is a backoff bug in sources' tcp, too.
>
> What is your cwind change?

the patch is here: /n/atom/patch/tcpmss

note i applied a patch to nettest(8) to simulate an rpc-style protocol. i still get ~500MB/s with my test machine simulating rpc-style transactions, or 15µs per 8k transaction. we're at least an order of magnitude off the performance mark for this. a similar test using pipe(2) shows a latency of 5.7µs (!) for a pipe-based rpc, which limits us to about 1.4 GB/s for 8k pipe-based ping-pong rpc.

- erik
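[The arithmetic behind those two figures is just payload over latency; a quick runnable check, with the values copied from the message above.]

#include <stdio.h>

int
main(void)
{
	double payload = 8192;			/* bytes per rpc transaction */
	double tcpus = 15.0, pipeus = 5.7;	/* measured latencies, in µs */

	/* bytes per microsecond is the same number as MB/s */
	printf("tcp:  %.0f MB/s\n", payload/tcpus);		/* ≈546, erik's ~500MB/s */
	printf("pipe: %.2f GB/s\n", payload/pipeus/1000);	/* ≈1.44, erik's ~1.4 GB/s */
	return 0;
}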
Re: [9fans] fossil+venti performance question
> however, after fixing things so the initial cwind isn't hosed, i get a little better story:
>
> so, actually, i think this is the root cause. the initial cwind is mis-set for loopback. i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when performance sucks. evidently there is a backoff bug in sources' tcp, too.

What is your cwind change?

-- David du Colombier
Re: [9fans] fossil+venti performance question
> however, after fixing things so the initial cwind isn't hosed, i get a little better story:

so, actually, i think this is the root cause. the initial cwind is mis-set for loopback. i bet that the symptom folks will see is that /net/tcp/stats shows fragmentation when performance sucks. evidently there is a backoff bug in sources' tcp, too.

i'd love confirmation of this.

- erik
Re: [9fans] fossil+venti performance question
2015-05-09 10:35 GMT-07:00 Lyndon Nerenberg lyn...@orthanc.ca:

> On May 9, 2015, at 10:30 AM, Devon H. O'Dell devon.od...@gmail.com wrote:
>> Or when your client is on a cell phone. Cell networks are the worst.
>
> Really? Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU. Both here (across western Canada) and in the UK.

There are lots of hacks all over the Internet to deal with various brokenness on the carrier-carrier side of things where one end is a cell network. Haven't seen anything come up super recently, but I had to help debug some brokenness as recently as a year and a half ago that turned out to be some cell network with really old hardware that didn't do PMTU correctly, causing TLS connections to drop or die. IIRC this particular case was in France, but I also seem to recall the same issue in northern England and perhaps Ireland.
Re: [9fans] fossil+venti performance question
for what it's worth, the tcp from the original newreno work does not have the mtu bug. on an 8 processor system i have around here i get

bwc; while() nettest -a 127.1
tcp!127.0.0.1!40357 count 10; 819200000 bytes in 1.505948 s @ 519 MB/s (0ms)
tcp!127.0.0.1!47983 count 10; 819200000 bytes in 1.377984 s @ 567 MB/s (0ms)
tcp!127.0.0.1!53197 count 10; 819200000 bytes in 1.299967 s @ 601 MB/s (0ms)
tcp!127.0.0.1!61569 count 10; 819200000 bytes in 1.418073 s @ 551 MB/s (0ms)

however, after fixing things so the initial cwind isn't hosed, i get a little better story:

bwc; while() nettest -a 127.1
tcp!127.0.0.1!54261 count 10; 819200000 bytes in .5947659 s @ 1.31e+03 MB/s (0ms)

boo yah! not bad for trying to clean up some constants.

- erik
Re: [9fans] fossil+venti performance question
yes, but i was not referring to the adjusting, which isn't changed here, only the tcpmtu() call that got added. yes, it *should* not make any difference, but maybe we're missing something. at worst it makes the code more confusing and causes bugs in the future, because one of the initializations of mss is a lie without any effect.

-- cinap
Re: [9fans] fossil+venti performance question
On Fri May 8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:

> do we really need to initialize tcb->mss to tcpmtu() in procsyn()? as i see it, procsyn() is called only when tcb->state is Syn_sent, which only should happen for client connections doing a connect, in which case tcpsndsyn() would have initialized tcb->mss already, no?

i think there was a subtle reason for this, but i don't recall. a real reason for setting it here is that it makes the code easier to reason about, imo.

there are a couple problems with the patch as it stands. they are inherited from previous mistakes.

* the setting of tpriv->stats[Mss] is bogus. it's not shared between connections. it is also v4 only.
* so, mss should be added to each tcp connection's status file.
* the setting of tcb->mss in tcpincoming is not correct, the tcp mss is set by SYN, not by ACK, and may not be reset. (see snoopy below.)
* the SYN-ACK needs to send the local mss, not echo the remote mss. asymmetry is fine on the other side, even if ip/tcp.c isn't smart enough to keep tx and rx mss separate. (scare quotes = untested, there may be some performance niggles if the sender is sending legal packets larger than tcb->mss.)

my patch to nix is below. i haven't submitted it yet.

- erik

---

005319 ms
ether(s=a0369f1c3af7 d=0cc47a328da4 pr=0800 ln=62)
ip(s=10.1.1.8 d=10.1.1.9 id=ee54 frag=0000 ttl=255 pr=6 ln=48)
tcp(s=38903 d=17766 seq=3552109414 ack=0 fl=S win=65535 ck=d68e ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
005320 ms
ether(s=0cc47a328da4 d=a0369f1c3af7 pr=0800 ln=62)
ip(s=10.1.1.9 d=10.1.1.8 id=54d3 frag=0000 ttl=255 pr=6 ln=48)
tcp(s=17766 d=38903 seq=441373010 ack=3552109415 fl=AS win=65535 ck=eadc ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)

---

/n/dump/2015/0509/sys/src/nix/ip/tcp.c:491,501 - /sys/src/nix/ip/tcp.c:491,502
  	s = (Tcpctl*)(c->ptcl);

  	return snprint(state, n,
- 		"%s qin %d qout %d rq %d.%d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
+ 		"%s qin %d qout %d rq %d.%d mss %d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
  		tcpstates[s->state],
  		c->rq ? qlen(c->rq) : 0,
  		c->wq ? qlen(c->wq) : 0,
  		s->nreseq, s->reseqlen,
+ 		s->mss,
  		s->srtt, s->mdev, s->ssthresh,
  		s->cwind, s->snd.wnd, s->rcv.scale, s->rcv.wnd, s->snd.scale, s->qscale,
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:843,854 - /sys/src/nix/ip/tcp.c:844,857
  /* mtu (- TCP + IP hdr len) of 1st hop */
  static int
- tcpmtu(Proto *tcp, uchar *addr, int version, uint *scale)
+ tcpmtu(Proto *tcp, uchar *addr, int version, uint reqmss, uint *scale)
  {
+ 	Tcppriv *tpriv;
  	Ipifc *ifc;
  	int mtu;

  	ifc = findipifc(tcp->f, addr, 0);
+ 	tpriv = tcp->priv;
  	switch(version){
  	default:
  	case V4:
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:855,865 - /sys/src/nix/ip/tcp.c:858,870
  		mtu = DEF_MSS;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP4_PKT + TCP4_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu;
  		break;
  	case V6:
  		mtu = DEF_MSS6;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP6_PKT + TCP6_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu + (TCP6_PKT + TCP6_HDRSIZE) - (TCP4_PKT + TCP4_HDRSIZE);
  		break;
  	}

  	/*
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:868,873 - /sys/src/nix/ip/tcp.c:873,882
  	 */
  	*scale = Defadvscale;

+ 	/* our sending max segment size cannot be bigger than what he asked for */
+ 	if(reqmss != 0 && reqmss < mtu)
+ 		mtu = reqmss;
+
  	return mtu;
  }
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1300,1307 - /sys/src/nix/ip/tcp.c:1309,1314
  static void
  tcpsndsyn(Conv *s, Tcpctl *tcb)
  {
- 	Tcppriv *tpriv;
-
  	tcb->iss = (nrand(1<<16)<<16)|nrand(1<<16);
  	tcb->rttseq = tcb->iss;
  	tcb->snd.wl2 = tcb->iss;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1314,1322 - /sys/src/nix/ip/tcp.c:1321,1327
  	tcb->sndsyntime = NOW;

  	/* set desired mss and scale */
- 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, &tcb->scale);
- 	tpriv = s->p->priv;
- 	tpriv->stats[Mss] = tcb->mss;
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, 0, &tcb->scale);
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1492,1498 - /sys/src/nix/ip/tcp.c:1497,1503
  	seg.ack = lp->irs+1;
  	seg.flags = SYN|ACK;
  	seg.urg = 0;
- 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, &scale);
+ 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, 0, &scale);	/* send our mss, not lp->mss */
  	seg.wnd
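[With the mss field added to the status format by the first hunk above, a connection's status line would presumably read something like the following; the connection and its values are hypothetical, invented to match the loopback discussion elsewhere in the thread.]

cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 mss 16384 srtt 80 mdev 40 sst 1048560 cwin 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427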
Re: [9fans] fossil+venti performance question
On Fri May 8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:

> do we really need to initialize tcb->mss to tcpmtu() in procsyn()? as i see it, procsyn() is called only when tcb->state is Syn_sent, which only should happen for client connections doing a connect, in which case tcpsndsyn() would have initialized tcb->mss already, no?

yes, we should. the bug is that we confuse send mss and receive mss. the sender's mss is the one we need to respect here. tcpsndsyn() should not set the mss; the mss it calculates is for rx.

- erik
Re: [9fans] fossil+venti performance question
> Looking at the first few bytes in each dir of the initial TCP handshake (with tcpdump) I see:
>
> 0x0000: 4500 0030 24da 0000 = from plan9 to freebsd
> 0x0000: 4500 0030 d249 4000 = from freebsd to plan9
>
> Looks like FreeBSD always sets the DF (don't fragment) bit (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00). Maybe plan9 should set the DF (don't fragment) bit in the IP header and try to do path MTU discovery? Either by default or under some ctl option.

easy enough until one encounters devices that don't send icmp responses because it's not implemented, or somehow considered secure that way.

- erik
Re: [9fans] fossil+venti performance question
2015-05-09 10:25 GMT-07:00 Lyndon Nerenberg lyn...@orthanc.ca:

> On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:
>> easy enough until one encounters devices that don't send icmp responses because it's not implemented, or somehow considered secure that way.
>
> Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them.

Or when your client is on a cell phone. Cell networks are the worst.
Re: [9fans] fossil+venti performance question
On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:

> easy enough until one encounters devices that don't send icmp responses because it's not implemented, or somehow considered secure that way.

Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them.
Re: [9fans] fossil+venti performance question
On May 9, 2015, at 10:30 AM, Devon H. O'Dell devon.od...@gmail.com wrote:

> Or when your client is on a cell phone. Cell networks are the worst.

Really? Quite often I slave my laptop to my phone's LTE connection, and I never have problems with PMTU. Both here (across western Canada) and in the UK.
Re: [9fans] fossil+venti performance question
On May 9, 2015, at 10:25 AM, Lyndon Nerenberg lyn...@orthanc.ca wrote:

> On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:
>> easy enough until one encounters devices that don't send icmp responses because it's not implemented, or somehow considered secure that way.
>
> Oddly enough, I don't see this 'problem' in the real world. And FreeBSD is far from being alone in the always-set-DF bit. The only place this bites is when you run into tiny shops with homegrown firewalls configured by people who don't understand networking or security. Me, I consider it a feature that these sites self-select themselves off the network. I'm certainly no worse off for not being able to talk to them.

Network admins not understanding ICMP was far more common 20 years ago. Now the game has changed. At any rate, there's no harm in trying PMTU discovery as an option (other than a SMOP).
Re: [9fans] fossil+venti performance question
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()? as i see it, procsyn() is called only when tcb->state is Syn_sent, which only should happen for client connections doing a connect, in which case tcpsndsyn() would have initialized tcb->mss already, no?

tcb->mss may still need to be adjusted at this point, as it is when

	/* our sending max segment size cannot be bigger than what he asked for */

so at worst this does no harm that I can see. Of course, I'm probably least qualified to pick these nits.

Lucio.
Re: [9fans] fossil+venti performance question
I've enabled tcp, tcpwin and tcprxmt logs, but there isn't anything very interesting:

tcpincoming s 127.0.0.1!53150/127.0.0.1!53150 d 127.0.0.1!17034/127.0.0.1!17034 v 4/4

Also, the issue is definitely related to the loopback. There is no problem when using an address on /dev/ether0.

cpu% cat /net/tcp/3/local
192.168.0.100!43125
cpu% cat /net/tcp/3/remote
192.168.0.100!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 396560 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2106

-- David du Colombier
Re: [9fans] fossil+venti performance question
On 8 May 2015 at 17:13, David du Colombier 0in...@gmail.com wrote:

> Also, the issue is definitely related to the loopback. There is no problem when using an address on /dev/ether0.

oh. possibly the queue isn't big enough, given the window size. it's using qpass on a Queue with Qmsg, and if the queue is full, Blocks will be discarded.
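[A toy model of the qpass() drop path Charles describes. The real code lives in /sys/src/9/port/qio.c; this sketch only models the full-queue behaviour (discard the Block rather than block the caller), and the limit and block size are invented.]

#include <stdio.h>
#include <stdlib.h>

typedef struct Block Block;
struct Block {
	Block	*next;
	int	len;
};

typedef struct {
	Block	*head;
	Block	*tail;
	int	len;		/* bytes queued */
	int	limit;		/* set when the queue is opened */
	int	dropped;
} Queue;

/* toy qpass: append unless the queue is over its limit, else discard */
int
qpass(Queue *q, Block *b)
{
	if(q->len >= q->limit){
		q->dropped++;
		free(b);
		return -1;
	}
	b->next = NULL;
	if(q->tail == NULL)
		q->head = b;
	else
		q->tail->next = b;
	q->tail = b;
	q->len += b->len;
	return b->len;
}

int
main(void)
{
	Queue q = { NULL, NULL, 0, 64*1024, 0 };	/* queue limit smaller than the tcp window */
	int i;

	for(i = 0; i < 100; i++){	/* a burst the window permits but the queue can't hold */
		Block *b = malloc(sizeof *b);
		b->len = 1500;
		qpass(&q, b);
	}
	printf("queued %d bytes, dropped %d blocks\n", q.len, q.dropped);
	return 0;
}

[Everything past the limit is silently dropped and left for retransmission to recover, which is consistent with the large-window loopback slowdown being discussed.]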
Re: [9fans] fossil+venti performance question
I've finally figured out the issue. The slowness issue only appears on the loopback because it provides a 16384 MTU.

There is an old bug in the Plan 9 TCP stack, where the TCP MSS doesn't take the MTU into account for incoming connections.

I originally fixed this issue in January 2015 for the Plan 9 port on Google Compute Engine. On GCE, there is an unusual 1460 MTU. The Plan 9 TCP stack defines a default 1460 MSS, corresponding to a 1500 MTU. The MSS is then adjusted according to the MTU for outgoing connections, but not for incoming connections. On GCE, this issue leads to IP fragmentation, but GCE didn't handle IP fragmentation properly, so the connections were dropped.

On the loopback medium, I suppose this is the opposite issue. Since the TCP stack didn't fix up the MSS on the incoming connection, the programs sent multiple small 1500-byte IP packets instead of large 16384-byte IP packets, but I don't know why it leads to such a slowdown.

Here is the patch for the Plan 9 kernel:
http://9legacy.org/9legacy/patch/9-tcp-mss.diff

And Charles' 9k kernel:
http://9legacy.org/9legacy/patch/9k-tcp-mss.diff

-- David du Colombier
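[The diffs aren't quoted in the thread, but from David's description, and from the clamp quoted out of tcpincoming() earlier in this thread, the incoming-connection fix is presumably along these lines; a sketch under those assumptions, not the literal 9legacy patch.]

/* in tcpincoming(), once the new connection's tcb exists: derive its
   mss from the interface mtu, as tcpsndsyn() already does for
   outgoing connections... */
tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, &tcb->scale);
/* ...then clamp to what the peer's SYN offered */
if(lp->mss != 0 && lp->mss < tcb->mss)
	tcb->mss = lp->mss;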
Re: [9fans] fossil+venti performance question
On Fri, 08 May 2015 21:24:13 +0200 David du Colombier 0in...@gmail.com wrote:

> On the loopback medium, I suppose this is the opposite issue. Since the TCP stack didn't fix up the MSS on the incoming connection, the programs sent multiple small 1500-byte IP packets instead of large 16384-byte IP packets, but I don't know why it leads to such a slowdown.

Looking at the first few bytes in each dir of the initial TCP handshake (with tcpdump) I see:

0x0000: 4500 0030 24da 0000 = from plan9 to freebsd
0x0000: 4500 0030 d249 4000 = from freebsd to plan9

Looks like FreeBSD always sets the DF (don't fragment) bit (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00). Maybe plan9 should set the DF (don't fragment) bit in the IP header and try to do path MTU discovery? Either by default or under some ctl option.
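[The DF claim is easy to verify from the dumps: in an IPv4 header, byte 6 carries the flags, and 0x40 is DF. A small standalone checker over the two headers quoted above; the array contents are copied from the dumps, everything else is illustrative.]

#include <stdio.h>

int
main(void)
{
	/* first 8 header bytes from the tcpdump output above */
	unsigned char p9[]   = { 0x45,0x00, 0x00,0x30, 0x24,0xda, 0x00,0x00 };	/* plan9 -> freebsd */
	unsigned char fbsd[] = { 0x45,0x00, 0x00,0x30, 0xd2,0x49, 0x40,0x00 };	/* freebsd -> plan9 */

	printf("plan9   DF=%d\n", (p9[6] & 0x40) != 0);		/* 0: fragmentation allowed */
	printf("freebsd DF=%d\n", (fbsd[6] & 0x40) != 0);	/* 1: don't fragment */
	return 0;
}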
Re: [9fans] fossil+venti performance question
I confirm - my old performance is back. Thanks very much David. -Steve
Re: [9fans] fossil+venti performance question
do we really need to initialize tcb->mss to tcpmtu() in procsyn()? as i see it, procsyn() is called only when tcb->state is Syn_sent, which only should happen for client connections doing a connect, in which case tcpsndsyn() would have initialized tcb->mss already, no?

-- cinap
Re: [9fans] fossil+venti performance question
> NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer that can't go backwards on intel processors. this limits the timer's resolution to HZ, which on 9atom is 1000, and 100 on pretty much anything else. further limiting the resolution are the tcp retransmit timers, which according to presotto are /* bounded twixt 0.3 and 64 seconds */ so i really doubt the retransmit timers are resending anything. if someone has a system that isn't working right, please post /net/tcp/connectionno/^(local remote status). i'd like to have a look.

The Venti listener:

cpu% cat /net/tcp/2/local
::!17034
cpu% cat /net/tcp/2/remote
::!0
cpu% cat /net/tcp/2/status
Listen qin 0 qout 0 rq 0.0 srtt 4000 mdev 0 sst 65535 cwin 1460 swin 0>>0 rwin 65535>>0 qscale 0 timer.start 10 timer.count 0 rerecv 0 katimer.start 2400 katimer.count 0

The TCP connection from Fossil to Venti on the loopback:

cpu% cat /net/tcp/3/local
127.0.0.1!57796
cpu% cat /net/tcp/3/remote
127.0.0.1!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

-- David du Colombier
Re: [9fans] fossil+venti performance question
> cpu% cat /net/tcp/3/local
> 127.0.0.1!57796
> cpu% cat /net/tcp/3/remote
> 127.0.0.1!17034
> cpu% cat /net/tcp/3/status
> Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

hmm... large rtt, which suggests that someone is not servicing the queues fast enough. this is for the 1gbe machine in the room with me:

11/status: Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 2920 cwin 61390 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 2101

i would suggest turning on netlog for tcp while booting and capturing the output. sorry for the short investigation. gotta run.

- erik
Re: [9fans] fossil+venti performance question
On 6 May 2015 at 22:28, David du Colombier 0in...@gmail.com wrote:

> Since the problem only happens when Fossil or vacfs are running on the same machine as Venti, I suppose this is somewhat related to how TCP behaves with the loopback.

Interesting. That would explain the clock-like delays. Possibly it's nearly zero RTT in initial exchanges, and then when venti has to do some work, things time out. You'd think it would only lead to needless retransmissions, not increased latency, but perhaps some calculation doesn't work properly with tiny values, causing one side to back off incorrectly.
Re: [9fans] fossil+venti performance question
Definitely interesting, and explains why I've never seen the regression (I switched to a dedicated venti server a couple of years ago). Were these the changes that erik submitted? ISTR him working on reno bits somewhere around there...

On Wed, May 6, 2015 at 4:28 PM, David du Colombier 0in...@gmail.com wrote:

> Since the problem only happens when Fossil or vacfs are running on the same machine as Venti, I suppose this is somewhat related to how TCP behaves with the loopback.
>
> -- David du Colombier
Re: [9fans] fossil+venti performance question
On 6 May 2015 at 23:35, Steven Stallion sstall...@gmail.com wrote:

> Were these the changes that erik submitted?

I don't think so. Someone else submitted a different set of tcp changes independently much earlier.
Re: [9fans] fossil+venti performance question
Just to be sure, I tried again, and the issue is not related to the lock change on 2013-09-19. However, now I'm sure the issue was caused by a kernel change in 2013. There is no problem when running a kernel from early 2013. -- David du Colombier
Re: [9fans] fossil+venti performance question
Since the problem only happens when Fossil or vacfs are running on the same machine as Venti, I suppose this is somewhat related to how TCP behaves with the loopback.

-- David du Colombier
Re: [9fans] fossil+venti performance question
On 6 May 2015 at 21:55, David du Colombier 0in...@gmail.com wrote:

> However, now I'm sure the issue was caused by a kernel change in 2013. There is no problem when running a kernel from early 2013.

Welly, welly, welly, well. That is interesting.
Re: [9fans] fossil+venti performance question
I got it! The regression was caused by the NewReno TCP change on 2013-01-24. https://github.com/0intro/plan9/commit/e8406a2f44 -- David du Colombier
Re: [9fans] fossil+venti performance question
On Wed May 6 15:30:24 PDT 2015, charles.fors...@gmail.com wrote:

> On 6 May 2015 at 22:28, David du Colombier 0in...@gmail.com wrote:
>> Since the problem only happens when Fossil or vacfs are running on the same machine as Venti, I suppose this is somewhat related to how TCP behaves with the loopback.
>
> Interesting. That would explain the clock-like delays. Possibly it's nearly zero RTT in initial exchanges, and then when venti has to do some work, things time out. You'd think it would only lead to needless retransmissions, not increased latency, but perhaps some calculation doesn't work properly with tiny values, causing one side to back off incorrectly.

i don't think that's possible. NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer that can't go backwards on intel processors. this limits the timer's resolution to HZ, which on 9atom is 1000, and 100 on pretty much anything else. further limiting the resolution are the tcp retransmit timers, which according to presotto are

	/* bounded twixt 0.3 and 64 seconds */

so i really doubt the retransmit timers are resending anything. if someone has a system that isn't working right, please post /net/tcp/connectionno/^(local remote status). i'd like to have a look.

quoting steve stallion:

> Definitely interesting, and explains why I've never seen the regression (I switched to a dedicated venti server a couple of years ago). Were these the changes that erik submitted? ISTR him working on reno bits somewhere around there...
>
> I don't think so. Someone else submitted a different set of tcp changes independently much earlier.

just for the record, the earlier changes were an incorrect partial implementation of reno. i implemented newreno from the specs and added corrected window scaling and removed the problem of window slamming. we spent a month going over cases from 50µs to 100ms rtt latency and showed that we got near the theoretical max for all those cases. (big thanks to bruce wong for putting up with early, buggy versions.)

during the investigation of this i found that loopback *is* slow for reasons i don't completely understand. part of this was the terrible scheduler. as part of the gsoc work, we were able to make the nix scheduler not howlingly terrible for 1-8 cpus. this improvement depends on the goodness of mcs locks. i developed a version of this, but ended up using charles' much cleaner version.

there remain big problems with the tcp and ip stack. it's really slow. i can't get 400MB/s on ethernet. it seems that the 3-way interaction between tcp:tx, tcp:rx and the user-space queues is the issue. queue locking is very wasteful as well. i have some student code that addresses part of the latter problem, but it smells to me like ip/tcp.c's direct calls between tx and rx are the real issue.

- erik
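[A quick illustration of the granularity point: with NOW read from MACHP(0)->ticks, nothing shorter than one tick (1/HZ seconds) is measurable, so loopback-scale rtts round to zero under either HZ. A runnable toy; the HZ values come from the message above, the 80µs rtt is an assumed loopback-scale figure.]

#include <stdio.h>

int
main(void)
{
	double rttus = 80;		/* assumed loopback-scale rtt, µs */
	int hzs[] = { 1000, 100 };	/* 9atom; pretty much anything else */
	int i;

	for(i = 0; i < 2; i++){
		double tickus = 1e6/hzs[i];	/* one tick, in µs */
		printf("HZ=%4d: tick=%5.0fµs, %gµs rtt measures as %d ticks\n",
			hzs[i], tickus, rttus, (int)(rttus/tickus));
	}
	return 0;
}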
Re: [9fans] fossil+venti performance question
On Tue May 5 15:54:45 PDT 2015, ara...@mgk.ro wrote:

> It's pretty interesting that at least three people all got exactly 150kB/s on vastly different machines, both real and virtual. Maybe the number comes from some tick frequency?

i might suggest altering HZ and seeing if there is a throughput change in the same ratio.

- erik
Re: [9fans] fossil+venti performance question
On Wed May 6 14:28:03 PDT 2015, 0in...@gmail.com wrote:

> I got it! The regression was caused by the NewReno TCP change on 2013-01-24.
> https://github.com/0intro/plan9/commit/e8406a2f44

if you have proof, i'd be interested in reproduction of the issue from the original source, or perhaps just nix. let me know if i can help.

- erik
Re: [9fans] fossil+venti performance question
It's pretty interesting that at least three people all got exactly 150kB/s on vastly different machines, both real and virtual. Maybe the number comes from some tick frequency? -- Aram Hăvărneanu
Re: [9fans] fossil+venti performance question
Thanks Anthony.

> I bet if you re-run the same test twice in a row, you’re going to see dramatically improved performance.

I tried re-running ‘iostats md5sum /386/9pcf’. The second read is very fast: the first read result is 152KB/s, the second read result is 232MB/s.

> Your write performance in that test isn’t really relevant: they’re not hitting the file system at all.

I think this writes 1GB of data to the filesystem:

iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024

The write result of dd is 31MB/s. But this test may just write to fossil; it may not write to venti.

> I’m not sure why you’d see a difference in a fossil+venti setup of a different size, but the partition size relationships, and the in-memory cache size relationships, are what’s mostly important.

My hardware has 2GB memory. The Plan 9 configuration is almost default (except /dev/sdC0/bloom). Increasing the memory size is difficult, because the memory size is determined by the public QEMU/KVM service plan.

— kadota
Re: [9fans] fossil+venti performance question
Thanks Aram.

> I have spent some time debugging this, but unfortunately, I couldn't find the root cause, and I just stopped using fossil.

I tried to measure the performance effect of replacing components:

1) mbr or GRUB
2) pbs or pbslba
3) sdata or sdvirtio (sdvirtio is imported from 9legacy)
4) kernel configurations (9pcf, 9pccpuf, 9pcauth, etc)

Unfortunately, none of the above had any performance effect.

— kadota
Re: [9fans] fossil+venti performance question
I too see this, and feel, no proof, that things used to be better. I.e. the first time I read a file from venti it is very, very slow. Subsequent reads from the ram cache are quick. I think venti used to be faster a few years ago.

Maybe another effect of this is the boot time seems slower than it used to be. Sorry to be vague.

-Steve
Re: [9fans] fossil+venti performance question
Hello!

imho placing fossil, venti, isect, bloom and swap on a single drive is a bad idea. As written in http://plan9.bell-labs.com/sys/doc/venti/venti.html:

> The prototype Venti server is implemented for the Plan 9 operating system in about 10,000 lines of C. The server runs on a dedicated dual 550Mhz Pentium III processor system with 2 Gbyte of memory and is accessed over a 100Mbs Ethernet network. The data log is stored on a 500 Gbyte MaxTronic IDE Raid 5 Array and the index resides on a string of 8 Seagate Cheetah 18XL 9 Gbyte SCSI drives.

A good idea is to store isect on multiple SSD drives :) to speed up search.

My small installation: 80Gb (PATA) 9fat, fossil, swap + 40Gb isect, bloom drive (PATA) + 1Tb SATA as arenas. No RAID.

2015-05-04 21:51 GMT+03:00 David du Colombier 0in...@gmail.com:

> I'm experiencing the same issue as well. When I launch vacfs on the same machine as Venti, reading is very slow. When I launch vacfs on another Plan 9 or Unix machine, reading is fast.
>
> I've just made some measurements when reading a file:
>
> Vacfs running on the same machine as Venti: 151 KB/s
> Vacfs running on another machine: 5131 KB/s

-- With best regards,
Zhilkin Sergey
Re: [9fans] fossil+venti performance question
On 4 May 2015 at 19:51, David du Colombier 0in...@gmail.com wrote:

> I've just made some measurements when reading a file:
>
> Vacfs running on the same machine as Venti: 151 KB/s
> Vacfs running on another machine: 5131 KB/s

How many times do you time it on each machine?
Re: [9fans] fossil+venti performance question
>> I've just made some measurements when reading a file:
>>
>> Vacfs running on the same machine as Venti: 151 KB/s
>> Vacfs running on another machine: 5131 KB/s
>
> How many times do you time it on each machine?

Maybe ten times. The results are always the same ±5%. Also, I restarted vacfs between each try.

It's easy to reproduce this issue with vacfs. I think anyone running Venti on Plan 9 can observe this problem.

-- David du Colombier
Re: [9fans] fossil+venti performance question
Yes, I'm pretty sure it's not related to Fossil, since it happens with vacfs as well. Also, Venti was pretty much unchanged during the last few years. I suspected it was related to the lock change on 2013-09-19. https://github.com/0intro/plan9/commit/c4d045a91e But I remember I tried to revert this change and the problem was still present. Maybe I should try again to be sure. -- David du Colombier
Re: [9fans] fossil+venti performance question
On 5 May 2015 at 16:38, David du Colombier 0in...@gmail.com wrote:

>> How many times do you time it on each machine?
>
> Maybe ten times. The results are always the same ±5%. Also, I restarted vacfs between each try.

It was the effect of the ram caches that prompted the question.

My experience is similar to Steve's: it was faster, and now it's initially very very slow. I looked at changes from that version of venti to this, and I didn't see anything that would cause that. (The problem could be outside venti, but I looked at some possibly relevant kernel changes too.) Note that the raw drive speed on my venti machine is fine (no doubt it could be better, but it's fine).

I convinced myself through experiments that the problem was with venti, not fossil. I used some debugging code in venti and had the impression that it took a surprisingly long time to handle each request: that the time was in venti. The effect was similar to that of a lost interrupt for a device driver. I used ratrace on it, but didn't spot an obvious culprit. I was tempted to rip out or disable the drive scheduling code in venti to see what happened, but not for the first time I ran out of time and had to get back to some other work.

One thing I didn't know was that the results were different when fossil was on a different machine. I thought I'd tried that with vacfs myself, but apparently not, or mine was as slow as when on the same machine.
Re: [9fans] fossil+venti performance question
> I too see this, and feel, no proof, that things used to be better. I.e. the first time I read a file from venti it is very, very slow. Subsequent reads from the ram cache are quick. I think venti used to be faster a few years ago. Maybe another effect of this is the boot time seems slower than it used to be. Sorry to be vague.

I'm pretty sure this issue started something like two years ago. It looks like a regression somewhere.

-- David du Colombier
[9fans] fossil+venti performance question
Hello, fans.

I’m running Plan 9 (labs) on a public QEMU/KVM service. My Plan 9 system has a slow read performance problem. I ran 'iostats md5sum /386/9pcf', DMA is on, and the read result is 150KB/s, but write performance is fast.

My Plan 9 system has a 200GB HDD, formatted with fossil+venti. The disk layout is:

- 9fat   100MB
- nvram  512B
- fossil 31.82GB
- arenas 159.11GB
- isect  7.95GB
- bloom  512MB
- swap   512MB

I also tried other installations:

1) 200GB HDD with fossil only.
2) 100GB HDD with fossil+venti.

Read performance is fast (about 15MB/s) in both installations. Could you tell me the reason?
Re: [9fans] fossil+venti performance question
The reason, in general: in a fossil+venti setup, fossil runs (basically) as a cache for venti. If your access just hits fossil, it’ll be quick; if not, you hit the (significantly slower) venti.

I bet if you re-run the same test twice in a row, you’re going to see dramatically improved performance. Try it. If that’s true, the question is really one of venti performance; if not, you may have another system config issue. There are various changes you can make to how venti uses disk/memory that can speed things up, but I don’t have a good handle on which to suggest first.

Your write performance in that test isn’t really relevant: they’re not hitting the file system at all.

I’m not sure why you’d see a difference in a fossil+venti setup of a different size, but the partition size relationships, and the in-memory cache size relationships, are what’s mostly important.

a
Re: [9fans] fossil+venti performance question
I'm experiencing the same issue as well. When I launch vacfs on the same machine as Venti, reading is very slow. When I launch vacfs on another Plan 9 or Unix machine, reading is fast.

I've just made some measurements when reading a file:

Vacfs running on the same machine as Venti: 151 KB/s
Vacfs running on another machine: 5131 KB/s

-- David du Colombier