Re: [9fans] fossil+venti performance question

2015-05-10 Thread cinap_lenrek
 * the SYN-ACK needs to send the local mss, not echo the remote mss.
 asymmetry is fine on the other side, even if ip/tcp.c isn't smart enough to
 keep tx and rx mss separate.  (scare quotes = untested, there may be
 some performance niggles if the sender is sending legal packets larger than
 tcb->mss.)

that is what it already does as far as i can see. on the server side, we
receive a SYN, put it in limbo and reply with SYN|ACK (sndsynack()) sending
our local mss straight from tcpmtu(), no adjust. at this point there's no
connection or tcb as everything is still in limbo. only once we receive the
ACK does tcpincoming() get called, which pulls the info we got so far
(including the mss sent by the client in the SYN packet) out of limbo and
sets up a connection with its tcb.

to summarize what happens on the server for an incoming connection (rough sketch below):

1.a) tcpiput() gets a SYN packet for Listening connection, calls limbo().
1.b) limbo() saves the info (including mss) from SYN in limbo database and 
calls sndsynack().
1.c) sndsynack() sends SYN|ACK packet with mss option set from tcpmtu() without 
any adjust.

2.a) tcpiput() gets an ACK packet for Listening connection, calls tcpincoming().
2.b) tcpincoming() looks in limbo, finds lp, and makes the new connection.
3.c) initialize our connection's tcb->mss.
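
a rough sketch of that path (shape only, not the actual kernel source; the
names follow ip/tcp.c but the surrounding code is elided):

	/* in tcpiput(), for a conversation in Listen state */
	if(seg.flags & SYN)
		/* 1.a/1.b: stash the client's mss in limbo, reply via sndsynack() */
		limbo(s, source, dest, &seg, version);
	else if(seg.flags & ACK){
		/* 2.a/2.b: find the limbo entry (lp) and build the new conversation */
		new = tcpincoming(s, &seg, source, dest, version);
		/* 3.c: tcpincoming() initializes the new conversation's tcb, including tcb->mss */
	}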

 * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
 set by SYN, not by ACK, and may not be reset.  (see snoopy below.)

you say we shouldn't initialize tcb->mss in 3.c and shouldn't use the mss from the
initial SYN to adjust it. i don't understand why not, as i don't see where it
would be initialized otherwise. it appears that is what the initial patch
from david was meant to fix, which made sense to me.

as far as i can see, procsyn() is unrelated to server-side incoming
connections. it only gets called on behalf of a client's outgoing connect,
when the connection is in Syn_sent state, and processes the SYN|ACK that
was generated by the process described in 1.c above.
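
a sketch of the call site in question (shape only, assuming the usual state
switch in tcpiput(); not the literal kernel source):

	switch(tcb->state){
	...
	case Syn_sent:
		/* client side: our SYN from tcpsndsyn() is outstanding;
		 * the arriving segment is the server's SYN|ACK from 1.c */
		procsyn(s, &seg);
		...
		break;
	...
	}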

--
cinap



Re: [9fans] fossil+venti performance question

2015-05-10 Thread erik quanstrom
 2.a) tcpiput() gets an ACK packet for Listening connection, calls tcpincoming().
 2.b) tcpincoming() looks in limbo, finds lp, and makes the new connection.
 3.c) initialize our connection's tcb->mss.
 
  * the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
  set by SYN, not by ACK, and may not be reset.  (see snoopy below.)
 
 you say we shouldn't initialize tcb->mss in 3.c and shouldn't use the mss from the
 initial SYN to adjust it. i don't understand why not, as i don't see where it
 would be initialized otherwise. it appears that is what the initial patch
 from david was meant to fix, which made sense to me.

that was the opposite of what i was saying.  the issue was i misread 
tcpincoming().

- erik



Re: [9fans] fossil+venti performance question

2015-05-10 Thread cinap_lenrek
how is this the opposite? your patch shows the tcb->mss init being removed
completely from tcpincoming().
  
-	/* our sending max segment size cannot be bigger than what he asked for */
-	if(lp->mss != 0 && lp->mss < tcb->mss) {
-		tcb->mss = lp->mss;
-		tpriv->stats[Mss] = tcb->mss;
-	}
+	/* per rfc, we can't set the mss any more */
+//	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

--
cinap



Re: [9fans] fossil+venti performance question

2015-05-10 Thread erik quanstrom
On Sun May 10 14:36:15 PDT 2015, cinap_len...@felloff.net wrote:
 how is this the opposite? your patch shows the tcb->mss init being removed
 completely from tcpincoming().
 
 -	/* our sending max segment size cannot be bigger than what he asked for */
 -	if(lp->mss != 0 && lp->mss < tcb->mss) {
 -		tcb->mss = lp->mss;
 -		tpriv->stats[Mss] = tcb->mss;
 -	}
 +	/* per rfc, we can't set the mss any more */
 +//	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, lp->mss, &tcb->scale);

i haven't updated the patch.

- erik



Re: [9fans] fossil+venti performance question

2015-05-10 Thread erik quanstrom
On Sun May 10 10:58:55 PDT 2015, 0in...@gmail.com wrote:
  however, after fixing things so the initial cwind isn't hosed, i get a 
  little better story:
 
  so, actually, i think this is the root cause.  the initial cwind is misset 
  for loopback.
  i bet that the symptom folks will see is that /net/tcp/stats shows 
  fragmentation when
  performance sucks.  evidently there is a backoff bug in sources' tcp, too.
 
 What is your cwind change?
 

the patch is here: /n/atom/patch/tcpmss

note i applied a patch to nettest(8) to simulate an rpc-style protocol.  i still
get ~500MB/s with my test machine simulating rpc-style transactions, or 15µs
per 8k transaction.

we're at least an order of magnitude off the performance mark for this.

a similar test using pipe(2) shows a latency of 5.7µs (!) for a pipe-based rpc,
which limits us to about 1.4 GB/s for 8k pipe-based ping-pong rpc.
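
(the arithmetic behind those figures, for reference: 8192 bytes / 15µs ≈ 546 MB/s,
and 8192 bytes / 5.7µs ≈ 1.4 GB/s.)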

- erik



Re: [9fans] fossil+venti performance question

2015-05-10 Thread David du Colombier
 however, after fixing things so the initial cwind isn't hosed, i get a 
 little better story:

 so, actually, i think this is the root cause.  the initial cwind is misset for 
 loopback.
 i bet that the symptom folks will see is that /net/tcp/stats shows 
 fragmentation when
 performance sucks.  evidently there is a backoff bug in sources' tcp, too.

What is your cwind change?

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-09 Thread erik quanstrom
 however, after fixing things so the initial cwind isn't hosed, i get a little 
 better story:

so, actually, i think this is the root cause.  the initial cwind is misset for 
loopback.
i bet that the symptom folks will see is that /net/tcp/stats shows 
fragmentation when
performance sucks.  evidently there is a backoff bug in sources' tcp, too.

i'd love confirmation of this.  

- erik



Re: [9fans] fossil+venti performance question

2015-05-09 Thread Devon H. O'Dell
2015-05-09 10:35 GMT-07:00 Lyndon Nerenberg lyn...@orthanc.ca:

 On May 9, 2015, at 10:30 AM, Devon H. O'Dell devon.od...@gmail.com wrote:

 Or when your client is on a cell phone. Cell networks are the worst.

 Really?  Quite often I slave my laptop to my phone's LTE connection, and I 
 never have problems with PMTU.  Both here (across western Canada) and in the 
 UK.

There are lots of hacks all over the Internet to deal with various
brokenness on the carrier-carrier side of things where one end is a
cell network. Haven't seen anything come up super recently, but had to
help debug some brokenness as recently as a year and a half ago that
turned out to be some cell network with really old hardware that
didn't do PMTU correctly, causing TLS connections to drop or die. IIRC
this particular case was in France, but I also seem to recall the same
issue in northern England and perhaps Ireland.



Re: [9fans] fossil+venti performance question

2015-05-09 Thread erik quanstrom
for what it's worth, the original newreno work tcp does not have the mtu
bug.  on a 8 processor system i have around here i get

bwc; while() nettest -a 127.1
tcp!127.0.0.1!40357 count 10; 81920 bytes in 1.505948 s @ 519 MB/s (0ms)
tcp!127.0.0.1!47983 count 10; 81920 bytes in 1.377984 s @ 567 MB/s (0ms)
tcp!127.0.0.1!53197 count 10; 81920 bytes in 1.299967 s @ 601 MB/s (0ms)
tcp!127.0.0.1!61569 count 10; 81920 bytes in 1.418073 s @ 551 MB/s (0ms)

however, after fixing things so the initial cwind isn't hosed, i get a little 
better story:

bwc; while() nettest -a 127.1
tcp!127.0.0.1!54261 count 10; 81920 bytes in .5947659 s @ 1.31e+03 MB/s 
(0ms)

boo yah!  not bad for trying to clean up some constants.

- erik



Re: [9fans] fossil+venti performance question

2015-05-09 Thread cinap_lenrek
yes, but i was not referring to the adjusting, which isn't changed here, only
the tcpmtu() call that got added.

yes, it *should* not make any difference, but maybe we're missing
something. at worst it makes the code more confusing and causes bugs in
the future, because one of the initializations of mss is a lie without
any effect.

--
cinap



Re: [9fans] fossil+venti performance question

2015-05-09 Thread erik quanstrom
On Fri May  8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:
 do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
 as i see it, procsyn() is called only when tcb->state is Syn_sent,
 which should only happen for client connections doing a connect, in
 which case tcpsndsyn() would have initialized tcb->mss already, no?

i think there was a subtle reason for this, but i don't recall.  a real
reason for setting it here is because it makes the code easier to reason
about, imo.

there are a couple problems with the patch as it stands.  they are
inherited from previous mistakes.

* the setting of tpriv->stats[Mss] is bogus.  it's not shared between
connections.
it is also v4 only.  

* so, mss should be added to each tcp connection's status file.

* the setting of tcb->mss in tcpincoming is not correct, tcp->mss is
set by SYN, not by ACK, and may not be reset.  (see snoopy below.)

* the SYN-ACK needs to send the local mss, not echo the remote mss.
asymmetry is fine on the other side, even if ip/tcp.c isn't smart enough to
keep tx and rx mss separate.  (scare quotes = untested, there may be
some performance niggles if the sender is sending legal packets larger than
tcb->mss.)

my patch to nix is below.  i haven't submitted it yet.

- erik

---
005319 ms 
ether(s=a0369f1c3af7 d=0cc47a328da4 pr=0800 ln=62)
ip(s=10.1.1.8 d=10.1.1.9 id=ee54 frag= ttl=255 pr=6 ln=48)
tcp(s=38903 d=17766 seq=3552109414 ack=0 fl=S win=65535 ck=d68e ln=0 
opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
005320 ms 
ether(s=0cc47a328da4 d=a0369f1c3af7 pr=0800 ln=62)
ip(s=10.1.1.9 d=10.1.1.8 id=54d3 frag= ttl=255 pr=6 ln=48)
tcp(s=17766 d=38903 seq=441373010 ack=3552109415 fl=AS win=65535 
ck=eadc ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)

---

/n/dump/2015/0509/sys/src/nix/ip/tcp.c:491,501 - /sys/src/nix/ip/tcp.c:491,502
  	s = (Tcpctl*)(c->ptcl);
  
  	return snprint(state, n,
- 		"%s qin %d qout %d rq %d.%d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
+ 		"%s qin %d qout %d rq %d.%d mss %d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
  		tcpstates[s->state],
  		c->rq ? qlen(c->rq) : 0,
  		c->wq ? qlen(c->wq) : 0,
  		s->nreseq, s->reseqlen,
+ 		s->mss,
  		s->srtt, s->mdev, s->ssthresh,
  		s->cwind, s->snd.wnd, s->rcv.scale, s->rcv.wnd, s->snd.scale,
  		s->qscale,
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:843,854 - /sys/src/nix/ip/tcp.c:844,857
  
  /* mtu (- TCP + IP hdr len) of 1st hop */
  static int
- tcpmtu(Proto *tcp, uchar *addr, int version, uint *scale)
+ tcpmtu(Proto *tcp, uchar *addr, int version, uint reqmss, uint *scale)
  {
+ 	Tcppriv *tpriv;
  	Ipifc *ifc;
  	int mtu;
  
  	ifc = findipifc(tcp->f, addr, 0);
+ 	tpriv = tcp->priv;
  	switch(version){
  	default:
  	case V4:
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:855,865 - /sys/src/nix/ip/tcp.c:858,870
  		mtu = DEF_MSS;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP4_PKT + TCP4_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu;
  		break;
  	case V6:
  		mtu = DEF_MSS6;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP6_PKT + TCP6_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu + (TCP6_PKT + TCP6_HDRSIZE) - (TCP4_PKT + TCP4_HDRSIZE);
  		break;
  	}
  	/*
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:868,873 - /sys/src/nix/ip/tcp.c:873,882
  	 */
  	*scale = Defadvscale;
  
+ 	/* our sending max segment size cannot be bigger than what he asked for */
+ 	if(reqmss != 0 && reqmss < mtu)
+ 		mtu = reqmss;
+ 
  	return mtu;
  }
  
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1300,1307 - /sys/src/nix/ip/tcp.c:1309,1314
  static void
  tcpsndsyn(Conv *s, Tcpctl *tcb)
  {
- 	Tcppriv *tpriv;
- 
  	tcb->iss = (nrand(1<<16)<<16)|nrand(1<<16);
  	tcb->rttseq = tcb->iss;
  	tcb->snd.wl2 = tcb->iss;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1314,1322 - /sys/src/nix/ip/tcp.c:1321,1327
  	tcb->sndsyntime = NOW;
  
  	/* set desired mss and scale */
- 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, &tcb->scale);
- 	tpriv = s->p->priv;
- 	tpriv->stats[Mss] = tcb->mss;
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, 0, &tcb->scale);
  }
  
  void
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1492,1498 - /sys/src/nix/ip/tcp.c:1497,1503
  	seg.ack = lp->irs+1;
  	seg.flags = SYN|ACK;
  	seg.urg = 0;
- 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, &scale);
+ 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, 0, &scale);	/* send our mss, not lp->mss */
  	seg.wnd 

Re: [9fans] fossil+venti performance question

2015-05-09 Thread erik quanstrom
On Fri May  8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:
 do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
 as i see it, procsyn() is called only when tcb->state is Syn_sent,
 which should only happen for client connections doing a connect, in
 which case tcpsndsyn() would have initialized tcb->mss already, no?

yes, we should.  the bug is that we confuse send mss and receive mss.
the sender's mss is the one we need to respect here.
tcpsndsyn() should not set the mss; the mss it calculates is for rx.

- erik



Re: [9fans] fossil+venti performance question

2015-05-09 Thread erik quanstrom
 Looking at the first few bytes in each dir of the initial TCP
 handshake (with tcpdump) I see:
 
 0x:  4500 0030 24da   = from plan9 to freebsd
 
 0x:  4500 0030 d249 4000  = from freebsd to plan9
 
 Looks like FreeBSD always sets the DF (don't fragment) bit
 (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00).
 
 May be plan9 should set the DF (don't fragment) bit in the IP
 header and try to do path MTU discovery? Either by default or
 under some ctl option.

easy enough until one encounters devices that don't send icmp
responses because it's not implemented, or somehow considered
secure that way.  

- erik



Re: [9fans] fossil+venti performance question

2015-05-09 Thread Devon H. O'Dell
2015-05-09 10:25 GMT-07:00 Lyndon Nerenberg lyn...@orthanc.ca:


 On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:

  easy enough until one encounters devices that don't send icmp
  responses because it's not implemented, or somehow considered
  secure that way.

 Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is 
 far from being alone in the always-set-DF bit.

 The only place this bites is when you run into tiny shops with homegrown 
 firewalls configured by people who don't understand networking or security.  
 Me, I consider it a feature that these sites self-select themselves off the 
 network.  I'm certainly no worse off for not being able to talk to them.

Or when your client is on a cell phone. Cell networks are the worst.



Re: [9fans] fossil+venti performance question

2015-05-09 Thread Lyndon Nerenberg

On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:

 easy enough until one encounters devices that don't send icmp
 responses because it's not implemented, or somehow considered
 secure that way.

Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is far 
from being alone in the always-set-DF bit.

The only place this bites is when you run into tiny shops with homegrown 
firewalls configured by people who don't understand networking or security.  
Me, I consider it a feature that these sites self-select themselves off the 
network.  I'm certainly no worse off for not being able to talk to them.




Re: [9fans] fossil+venti performance question

2015-05-09 Thread Lyndon Nerenberg

On May 9, 2015, at 10:30 AM, Devon H. O'Dell devon.od...@gmail.com wrote:

 Or when your client is on a cell phone. Cell networks are the worst.

Really?  Quite often I slave my laptop to my phone's LTE connection, and I 
never have problems with PMTU.  Both here (across western Canada) and in the UK.





Re: [9fans] fossil+venti performance question

2015-05-09 Thread Bakul Shah


 On May 9, 2015, at 10:25 AM, Lyndon Nerenberg lyn...@orthanc.ca wrote:
 
 
 On May 9, 2015, at 7:43 AM, erik quanstrom quans...@quanstro.net wrote:
 
 easy enough until one encounters devices that don't send icmp
 responses because it's not implemented, or somehow considered
 secure that way.
 
 Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is 
 far from being alone in the always-set-DF bit.
 
 The only place this bites is when you run into tiny shops with homegrown 
 firewalls configured by people who don't understand networking or security.  
 Me, I consider it a feature that these sites self-select themselves off the 
 network.  I'm certainly no worse off for not being able to talk to them.

Network admins not understanding ICMP was far more common 20 years ago. Now the 
game has changed. At any rate no harm in trying PMTU discovery as an option 
(other than a SMOP).


Re: [9fans] fossil+venti performance question

2015-05-09 Thread lucio
 do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
 as i see it, procsyn() is called only when tcb->state is Syn_sent,
 which should only happen for client connections doing a connect, in
 which case tcpsndsyn() would have initialized tcb->mss already, no?

tcb->mss may still need to be adjusted at this point, as it is when

	/* our sending max segment size cannot be bigger than what he asked for */

so at worst this does no harm that I can see.

Of course, I'm probably least qualified to pick these nits.

Lucio.




Re: [9fans] fossil+venti performance question

2015-05-08 Thread David du Colombier
I've enabled tcp, tcpwin and tcprxmt logs, but there isn't
anything very interesting.

tcpincoming s 127.0.0.1!53150/127.0.0.1!53150 d
127.0.0.1!17034/127.0.0.1!17034 v 4/4

Also, the issue is definitely related to the loopback.
There is no problem when using an address on /dev/ether0.

cpu% cat /net/tcp/3/local
192.168.0.100!43125
cpu% cat /net/tcp/3/remote
192.168.0.100!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 1048560 cwin 396560
swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10
rerecv 0 katimer.start 2400 katimer.count 2106

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-08 Thread Charles Forsyth
On 8 May 2015 at 17:13, David du Colombier 0in...@gmail.com wrote:

 Also, the issue is definitely related to the loopback.
 There is no problem when using an address on /dev/ether0.


oh. possibly the queue isn't big enough, given the window size. it's using
qpass on a Queue with Qmsg
and if the queue is full, Blocks will be discarded.


Re: [9fans] fossil+venti performance question

2015-05-08 Thread David du Colombier
I've finally figured out the issue.

The slowness issue only appears on the loopback, because
it provides a 16384 MTU.

There is an old bug in the Plan 9 TCP stack, where the TCP
MSS doesn't take into account the MTU for incoming connections.

I originally fixed this issue in January 2015 for the Plan 9
port on Google Compute Engine. On GCE, there is an unusual
1460 MTU.

The Plan 9 TCP stack defines a default 1460 MSS corresponding
to a 1500 MTU. Then, the MSS is fixed according to the MTU
for outgoing connections, but not for incoming connections.

On GCE, this issue leads to IP fragmentation, but GCE didn't
handle IP fragmentation properly, so the connections
were dropped.

On the loopback medium, I suppose this is the opposite issue.
Since the TCP stack didn't fix the MSS on the incoming
connection, the programs sent multiple small 1500-byte
IP packets instead of large 16384-byte IP packets, but I don't
know why it leads to such a slowdown.
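
A sketch of the kind of adjustment involved (not the actual diff; see the
patches linked below for the real change): recompute the MSS in
tcpincoming() from the interface MTU and clamp it to what the peer
announced in its SYN.

	/* incoming connection: base the mss on our first-hop mtu ... */
	tcb->mss = tcpmtu(s->p, lp->laddr, lp->version, &tcb->scale);
	/* ... but never send segments larger than the peer's SYN mss */
	if(lp->mss != 0 && lp->mss < tcb->mss)
		tcb->mss = lp->mss;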

Here is the patch for the Plan 9 kernel:

http://9legacy.org/9legacy/patch/9-tcp-mss.diff

And Charles' 9k kernel:

http://9legacy.org/9legacy/patch/9k-tcp-mss.diff

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-08 Thread Bakul Shah
On Fri, 08 May 2015 21:24:13 +0200 David du Colombier 0in...@gmail.com wrote:
 On the loopback medium, I suppose this is the opposite issue.
 Since the TCP stack didn't fix the MSS in the incoming
 connection, the programs sent multiple small 1500-byte
 IP packets instead of large 16384-byte IP packets, but I don't
 know why it leads to such a slowdown.

Looking at the first few bytes in each dir of the initial TCP
handshake (with tcpdump) I see:

0x:  4500 0030 24da   = from plan9 to freebsd

0x:  4500 0030 d249 4000  = from freebsd to plan9

Looks like FreeBSD always sets the DF (don't fragment) bit
(0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00).

Maybe plan9 should set the DF (don't fragment) bit in the IP
header and try to do path MTU discovery? Either by default or
under some ctl option.
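
For reference, DF is the 0x40 bit of byte 6 of the IPv4 header (the
flags/fragment-offset field), which matches the dump above. A minimal check
over a raw header buffer (hypothetical helper, not from the thread) would be:

	/* DF is bit 0x40 of IPv4 header byte 6 */
	int
	hasdf(uchar *ip)
	{
		return (ip[6] & 0x40) != 0;
	}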



Re: [9fans] fossil+venti performance question

2015-05-08 Thread Steve Simon
I confirm - my old performance is back.

Thanks very much David.

-Steve



Re: [9fans] fossil+venti performance question

2015-05-08 Thread cinap_lenrek
do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
as i see it, procsyn() is called only when tcb->state is Syn_sent,
which should only happen for client connections doing a connect, in
which case tcpsndsyn() would have initialized tcb->mss already, no?

--
cinap



Re: [9fans] fossil+venti performance question

2015-05-07 Thread David du Colombier
 NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer
 that can't go backwards on intel processors.  this limits the timer's
 resolution to HZ, which on 9atom is 1000, and 100 on pretty much anything
 else.  further limiting the resolution is the tcp retransmit timers which
 according to presotto are
 	/* bounded twixt 0.3 and 64 seconds */
 so i really doubt the retransmit timers are resending anything.  if someone
 has a system that isn't working right, please post
 /net/tcp/<connection no>/^(local remote status)
 i'd like to have a look.

The Venti listenner:

cpu% cat /net/tcp/2/local
::!17034
cpu% cat /net/tcp/2/remote
::!0
cpu% cat /net/tcp/2/status
Listen qin 0 qout 0 rq 0.0 srtt 4000 mdev 0 sst 65535 cwin 1460 swin
0>>0 rwin 65535>>0 qscale 0 timer.start 10 timer.count 0 rerecv 0
katimer.start 2400 katimer.count 0

The TCP connection from Fossil to Venti on the loopback:

cpu% cat /net/tcp/3/local
127.0.0.1!57796
cpu% cat /net/tcp/3/remote
127.0.0.1!17034
cpu% cat /net/tcp/3/status
Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin
258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10
timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-07 Thread erik quanstrom
 cpu% cat /net/tcp/3/local
 127.0.0.1!57796
 cpu% cat /net/tcp/3/remote
 127.0.0.1!17034
 cpu% cat /net/tcp/3/status
 Established qin 0 qout 0 rq 0.0 srtt 80 mdev 40 sst 1048560 cwin
 258192 swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10
 timer.count 10 rerecv 0 katimer.start 2400 katimer.count 427

hmm... large rtt, which suggests that someone is not servicing the queues
fast enough.  

this is for the 1gbe machine in the room with me

11/status:Established qin 0 qout 0 rq 0.0 srtt 0 mdev 0 sst 2920 cwin 61390 
swin 1048560>>4 rwin 1048560>>4 qscale 4 timer.start 10 timer.count 10 rerecv 0 
katimer.start 2400 katimer.count 2101

i would suggest turning on netlog for tcp while booting and capturing the output.

sorry for the short investigation.  gotta run.

- erik



Re: [9fans] fossil+venti performance question

2015-05-06 Thread Charles Forsyth
On 6 May 2015 at 22:28, David du Colombier 0in...@gmail.com wrote:

 Since the problem only happen when Fossil or vacfs are running
 on the same machine as Venti, I suppose this is somewhat related
 to how TCP behaves with the loopback.


Interesting. That would explain the clock-like delays.
Possibly it's nearly zero RTT in initial exchanges and then when venti has
to do some work,
things time out. You'd think it would only lead to needless retransmissions
not increased latency
but perhaps some calculation doesn't work properly with tiny values,
causing one side to back off
incorrectly.


Re: [9fans] fossil+venti performance question

2015-05-06 Thread Steven Stallion
Definitely interesting, and explains why I've never seen the regression (I
switched to a dedicated venti server a couple of years ago). Were these the
changes that erik submitted? ISTR him working on reno bits somewhere around
there...

On Wed, May 6, 2015 at 4:28 PM, David du Colombier 0in...@gmail.com wrote:

 Since the problem only happen when Fossil or vacfs are running
 on the same machine as Venti, I suppose this is somewhat related
 to how TCP behaves with the loopback.

 --
 David du Colombier




Re: [9fans] fossil+venti performance question

2015-05-06 Thread Charles Forsyth
On 6 May 2015 at 23:35, Steven Stallion sstall...@gmail.com wrote:

 Were these the changes that erik submitted?


I don't think so. Someone else submitted a different set of tcp changes
independently much earlier.


Re: [9fans] fossil+venti performance question

2015-05-06 Thread David du Colombier
Just to be sure, I tried again, and the issue is not related
to the lock change on 2013-09-19.

However, now I'm sure the issue was caused by a kernel
change in 2013.

There is no problem when running a kernel from early 2013.

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-06 Thread David du Colombier
Since the problem only happens when Fossil or vacfs are running
on the same machine as Venti, I suppose this is somewhat related
to how TCP behaves with the loopback.

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-06 Thread Charles Forsyth
On 6 May 2015 at 21:55, David du Colombier 0in...@gmail.com wrote:

 However, now I'm sure the issue was caused by a kernel
 change in 2013.

 There is no problem when running a kernel from early 2013.


Welly, welly, welly, well. That is interesting.


Re: [9fans] fossil+venti performance question

2015-05-06 Thread David du Colombier
I got it!

The regression was caused by the NewReno TCP
change on 2013-01-24.

https://github.com/0intro/plan9/commit/e8406a2f44

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-06 Thread erik quanstrom
On Wed May  6 15:30:24 PDT 2015, charles.fors...@gmail.com wrote:

 On 6 May 2015 at 22:28, David du Colombier 0in...@gmail.com wrote:
 
  Since the problem only happen when Fossil or vacfs are running
  on the same machine as Venti, I suppose this is somewhat related
  to how TCP behaves with the loopback.
 
 
 Interesting. That would explain the clock-like delays.
 Possibly it's nearly zero RTT in initial exchanges and then when venti has
 to do some work,
 things time out. You'd think it would only lead to needless retransmissions
 not increased latency
 but perhaps some calculation doesn't work properly with tiny values,
 causing one side to back off
 incorrectly.

i don't think that's possible.

NOW is defined as MACHP(0)->ticks, so this is a pretty coarse timer
that can't go backwards on intel processors.  this limits the timer's
resolution to HZ, which on 9atom is 1000, and 100 on pretty much anything
else.  further limiting the resolution is the tcp retransmit timers which
according to presotto are
	/* bounded twixt 0.3 and 64 seconds */
so i really doubt the retransmit timers are resending anything.  if someone
has a system that isn't working right, please post
/net/tcp/<connection no>/^(local remote status)
i'd like to have a look.

quoting steve stallion ...

  Definitely interesting, and explains why I've never seen the regression (I
  switched to a dedicated venti server a couple of years ago). Were these the
  changes that erik submitted? ISTR him working on reno bits somewhere around
  there...

 I don't think so. Someone else submitted a different set of tcp changes
 independently much earlier.

just for the record, the earlier changes were an incorrect partial 
implementation of
reno.  i implemented newreno from the specs and added corrected window scaling
and removed the problem of window slamming.  we spent a month going over cases
from 50µs to 100ms rtt latency and showed that we got near the theoretical max 
for
all those cases.  (big thanks to bruce wong for putting up with early, buggy 
versions.)

during the investigation of this i found that loopback *is* slow for reasons i 
don't
completely understand.  part of this was the terrible scheduler.  as part of 
the gsoc
work, we were able to make the nix scheduler not howlingly terrible for 1-8 
cpus.  this
improvement depends on the goodness of mcs locks.  i developed a version of 
this,
but ended up using charles' much cleaner version.  there remain big problems 
with
the tcp and ip stack.  it's really slow.  i can't get 400MB/s on ethernet.  it 
seems
that the 3-way interaction between tcp:tx, tcp:rx and the user-space queues is 
the issue.
queue locking is very wasteful as well.  i have some student code that 
addresses part
of the latter problem, but it smells to me like ip/tcp.c's direct calls between 
tx and rx
are the real issue.

- erik



Re: [9fans] fossil+venti performance question

2015-05-06 Thread erik quanstrom
On Tue May  5 15:54:45 PDT 2015, ara...@mgk.ro wrote:
 It's pretty interesting that at least three people all got exactly
 150kB/s on vastly different machines, both real and virtual. Maybe the
 number comes from some tick frequency?

i might suggest altering HZ and seeing if there is a throughput change in the 
same ratio.
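
(rough back-of-the-envelope, assuming HZ=100 and the default 1460-byte mss:
150 kB/s / 100 ticks/s ≈ 1.5 kB per tick, i.e. about one segment per tick,
which is what a tick-paced stall would look like.)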

- erik



Re: [9fans] fossil+venti performance question

2015-05-06 Thread erik quanstrom
On Wed May  6 14:28:03 PDT 2015, 0in...@gmail.com wrote:
 I got it!
 
 The regression was caused by the NewReno TCP
 change on 2013-01-24.
 
 https://github.com/0intro/plan9/commit/e8406a2f44

if you have proof, i'd be interested in reproduction of the issue from the 
original source, or
perhaps just nix.  let me know if i can help.

- erik



Re: [9fans] fossil+venti performance question

2015-05-05 Thread Aram Hăvărneanu
It's pretty interesting that at least three people all got exactly
150kB/s on vastly different machines, both real and virtual. Maybe the
number comes from some tick frequency?

-- 
Aram Hăvărneanu



Re: [9fans] fossil+venti performance question

2015-05-05 Thread KADOTA Kyohei
Thanks Anthony.

 I bet if you re-run the same test twice in a
 row, you’re going to see dramatically improved
 performance.

I tried re-running ‘iostats md5sum /386/9pcf’.
The repeated read is very fast.

The first read result is 152KB/s.
The second read result is 232MB/s.

 Your write performance in that test isn’t really
 relevant: they’re not hitting the file system at all.

To write 1GB of data to the filesystem, I ran:

iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024

Write result of dd is 31MB/s.
But this test may just write to fossil. It may not write to venti.

 I’m not sure why you’d see a difference in a
 fossil+venti setup of a different size, but the
 partition size relationships, and the in-memory
 cache size relationships, are what’s mostly important.

My hardware has 2GB memory.
Plan 9 configurations are almost default. (except /dev/sdC0/bloom)
Increasing the memory size is difficult,
because the memory size is determined by the public QEMU/KVM service plan.

—
kadota


Re: [9fans] fossil+venti performance question

2015-05-05 Thread KADOTA Kyohei
Thanks Aram.

 I have spent some time
 debugging this, but unfortunately, I couldn't find the root cause, and
 I just stopped using fossil.

I tried to measure the performance effect of replacing each component:

1) mbr or GRUB
2) pbs or pbslba
3) sdata or sdvirtio (sdvirtio is imported from 9legacy)
4) kernel configurations (9pcf, 9pccpuf, 9pcauth, etc)

unfortunately, none of the above had any performance effect.

—
kadota


Re: [9fans] fossil+venti performance question

2015-05-05 Thread st...@quintile.net
I too see this, and feel, no proof, that things used to be better. I.e. the 
first time I read a file from venti it is very, very slow. subsequent reads 
from the ram cache are quick.

I think venti used to be faster a few years ago. maybe another effect of this 
is the boot time seems slower than it used to be.

sorry to be vague.

-Steve





 On 5 May 2015, at 15:47, KADOTA Kyohei lu...@me.com wrote:
 
 Thanks Anthony.
 
 I bet if you re-run the same test twice in a
 row, you’re going to see dramatically improved
 performance.
 
 I try to re-run ‘iostats md5sum /386/9pcf’.
 Read result is very fast.
 
 first read result is 152KB/s.
 second read result is 232MB/s.
 
 Your write performance in that test isn’t really
 relevant: they’re not hitting the file system at all.
 
 I think to write 1GB data to filesystem:
 
iostats dd -if /dev/zero -of output -ibs 1024k -obs 1024k -count 1024
 
 Write result of dd is 31MB/s.
 But this test may just write to fossil. It may not write to venti.
 
 I’m not sure why you’d see a difference in a
 fossil+venti setup of a different size, but the
 partition size relationships, and the in-memory
 cache size relationships, are what’s mostly important.
 
 My hardware has 2GB memory.
 Plan 9 configurations are almost default. (except /dev/sdC0/bloom)
 To increase memory size is difficult,
 because memory size is determined by public QEMU/KVM service plan.
 
 —
 kadota



Re: [9fans] fossil+venti performance question

2015-05-05 Thread Sergey Zhilkin
Hello!

imho placing fossil, venti, isect, bloom and swap on a single drive is a bad
idea.
As written in in http://plan9.bell-labs.com/sys/doc/venti/venti.html - The
prototype Venti server is implemented for the Plan 9 operating system in
about 10,000 lines of C. The server runs on a dedicated dual 550Mhz Pentium
III processor system with 2 Gbyte of memory and is accessed over a 100Mbs
Ethernet network. The data log is stored on a 500 Gbyte MaxTronic IDE Raid
5 Array and the index resides on a string of 8 Seagate Cheetah 18XL 9 Gbyte
SCSI drives.

A good idea is to store the isect on multiple SSD drives :) to speed up search.

My small installation - 80Gb (PATA) 9fat, fossil, swap + 40Gb isect, bloom
drive (PATA) + 1Tb SATA as arenas. No RAID.

2015-05-04 21:51 GMT+03:00 David du Colombier 0in...@gmail.com:

 I'm experiencing the same issue as well.

 When I launch vacfs on the same machine as Venti,
 reading is very slow. When I launch vacfs on another
 Plan 9 or Unix machine, reading is fast.

 I've just made some measurements when reading a file:

 Vacfs running on the same machine as Venti: 151 KB/s
 Vacfs running on another machine: 5131 KB/s

 --
 David du Colombier




-- 
With best regards
Zhilkin Sergey


Re: [9fans] fossil+venti performance question

2015-05-05 Thread Charles Forsyth
On 4 May 2015 at 19:51, David du Colombier 0in...@gmail.com wrote:


 I've just made some measurements when reading a file:

 Vacfs running on the same machine as Venti: 151 KB/s
 Vacfs running on another machine: 5131 KB/s


How many times do you time it on each machine?


Re: [9fans] fossil+venti performance question

2015-05-05 Thread David du Colombier
 I've just made some measurements when reading a file:

 Vacfs running on the same machine as Venti: 151 KB/s
 Vacfs running on another machine: 5131 KB/s


 How many times do you time it on each machine?

Maybe ten times. The results are always the same, within ~5%.
Also, I restarted vacfs between each try.

It's easy to reproduce this issue with vacfs. I think
anyone running Venti on Plan 9 can observe this problem.

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-05 Thread David du Colombier
Yes, I'm pretty sure it's not related to Fossil, since it happens with
vacfs as well.
Also, Venti was pretty much unchanged during the last few years.

I suspected it was related to the lock change on 2013-09-19.

https://github.com/0intro/plan9/commit/c4d045a91e

But I remember I tried to revert this change and the problem
was still present. Maybe I should try again to be sure.

-- 
David du Colombier



Re: [9fans] fossil+venti performance question

2015-05-05 Thread Charles Forsyth
On 5 May 2015 at 16:38, David du Colombier 0in...@gmail.com wrote:

  How many times do you time it on each machine?

 Maybe ten times. The results are always the same ~5%.
 Also, I restarted vacfs between each try.


It was the effect of the ram caches that prompted the question.

My experience is similar to Steve's: it was faster, and now it's initially
very very slow.
I looked at changes from that version of venti to this, and I didn't see
anything that would cause that.
(The problem could be outside venti, but I looked at some possibly relevant
kernel changes too.)

Note that the raw drive speed on my venti machine is fine (no doubt it
could be better, but it's fine).
I convinced myself through experiments that the problem was with venti, not
fossil.
I used some debugging code in venti and had the impression that it took a
surprisingly long time
to handle each request: that the time was in venti. The effect was similar
to that of a lost
interrupt for a device driver. I used ratrace on it, but didn't spot an
obvious culprit.
I was tempted to rip out or disable the drive scheduling code in venti to
see what happened, but
not for the first time I ran out of time and had to get back to some other
work.

One thing I didn't know was that the results were different when fossil was
on a different machine.
I thought I'd tried that with vacfs myself, but apparently not or mine was
as slow as when on the same
machine.


Re: [9fans] fossil+venti performance question

2015-05-05 Thread David du Colombier
 I too see this, and feel, no proof, that things used to be better. I.e. the 
 first time I read a file from venti it is very, very slow. subsequent reads 
 from the ram cache are quick.

 I think venti used to be faster a few years ago. maybe another effect of this 
 is the boot time seems slower than it used to be.

 sorry to be vague.

I'm pretty sure this issue started something like two years ago.
It looks like a regression somewhere.

-- 
David du Colombier



[9fans] fossil+venti performance question

2015-05-04 Thread KADOTA Kyohei
Hello, fans.

I’m running Plan 9(labs) on public QEMU/KVM service.
My Plan 9 system has a slow read performance problem.
I ran ‘iostats md5sum /386/9pcf’; DMA is on, and the read result is 150KB/s,
but write performance is fast.

My Plan 9 system has a 200GB HDD, formatted with fossil+venti.
disk layout is:

- 9fat  100MB
- nvram 512B
- fossil31.82GB
- arenas159.11GB
- isect 7.95GB
- bloom 512MB
- swap  512MB

Also, I tried two other installations:

1)200GB HDD with fossil only.
2)100GB HDD with fossil+venti.

Read performance is fast (about 15MB/s) in both installations.

Could you tell me the reason?


Re: [9fans] fossil+venti performance question

2015-05-04 Thread Anthony Sorace
The reason, in general:
In a fossil+venti setup, fossil runs (basically) as a
cache for venti. If your access just hits fossil, it’ll
be quick; if not, you hit the (significantly slower)
venti. I bet if you re-run the same test twice in a
row, you’re going to see dramatically improved
performance. Try it. If that’s true, the question is
really one of venti performance; if not, you may
have another system config issue.

There are various changes you can make to how
venti uses disk/memory that can speed things up,
but I don’t have a good handle on which to
suggest first.

Your write performance in that test isn’t really
relevant: they’re not hitting the file system at all.

I’m not sure why you’d see a difference in a
fossil+venti setup of a different size, but the
partition size relationships, and the in-memory
cache size relationships, are what’s mostly important.

a




Re: [9fans] fossil+venti performance question

2015-05-04 Thread David du Colombier
I'm experiencing the same issue as well.

When I launch vacfs on the same machine as Venti,
reading is very slow. When I launch vacfs on another
Plan 9 or Unix machine, reading is fast.

I've just made some measurements when reading a file:

Vacfs running on the same machine as Venti: 151 KB/s
Vacfs running on another machine: 5131 KB/s

-- 
David du Colombier