Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
On 10/15/20 3:14 PM, David Goulet wrote:
> This is where we need your help. We would like you to notify us on this
> thread about any noticeable changes in CPU, RAM, or BW usage. In other
> words, anything that changes from the "average" you've been seeing is
> worth informing us.

Maybe completely unrelated, but at first glance it seems that

    iftop -B -P -N -n -b -i enp4s0 -F 5.9.158.75/32 -F [2a01:4f8:190:514a::2]/64

on a host running 2 Tor relays at 0.4.5.0-alpha-dev now shows more IPv6
connections among the top ones (with respect to throughput).

--
Toralf
___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
Okay. Is there a page for critters with wee brains? An ELI5 or even ELI3
would be great.

nifty

> This is the page of what we planned to work on.
>
> https://gitlab.torproject.org/tpo/core/team/-/wikis/NetworkTeam/Sponsor61/PerformanceExperiments
>
> We are still very early on the KIST experiment here so will update the page
> with the latest today or very soon once we are starting to see/measure the
> effects.
>
> David
Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
On 15 Oct (19:26:09), Roger Dingledine wrote:
> On Thu, Oct 15, 2020 at 11:40:34PM +0200, nusenu wrote:
> > since it is in effect by now
> > https://consensus-health.torproject.org/#consensusparams
> > could you publish the exact timestamp when it came into effect?
>
> One can learn this from the recent consensus documents, e.g. at
> https://collector.torproject.org/recent/relay-descriptors/consensuses/
>
> And I agree that we should have a central experiment page (e.g. on gitlab)
> that lists the experiments, when we ran them, when the network changes
> occurred, what we expected to find, and what we *did* find.
>
> David or Mike, can you make sure that page happens?

This is the page of what we planned to work on.

https://gitlab.torproject.org/tpo/core/team/-/wikis/NetworkTeam/Sponsor61/PerformanceExperiments

We are still very early on the KIST experiment here so will update the page
with the latest today or very soon once we are starting to see/measure the
effects.

David
Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
On Thu, Oct 15, 2020 at 11:40:34PM +0200, nusenu wrote:
> since it is in effect by now
> https://consensus-health.torproject.org/#consensusparams
> could you publish the exact timestamp when it came into effect?

One can learn this from the recent consensus documents, e.g. at
https://collector.torproject.org/recent/relay-descriptors/consensuses/

And I agree that we should have a central experiment page (e.g. on gitlab)
that lists the experiments, when we ran them, when the network changes
occurred, what we expected to find, and what we *did* find.

David or Mike, can you make sure that page happens?

> I noticed some unusual things today (exits having a non-zero guard
> probability),
> did you change more parameters than this one or was this the only one?

No, that was the only change.

We had a good discussion with Florentin et al on #tor-dev just now, where
we concluded that yes, we're still in "case 3be (E scarce)", but the math
still allows a little bit of use of exits for other roles: check out the
networkstatus_compute_bw_weights_v10() function in
src/feature/dirauth/dirvote.c.

So as far as we can tell so far, we are still in the "exit scarce" case
of Mike's weight voodoo, but his math allows exits to be used a little
bit in non-exit roles even in this case.

    Wed = (weight_scale*(D - 2*E + G + M))/(3*D);
    Wgd = (weight_scale - Wed)/2;

And Wed in this case is 9849 rather than 1.

So, to say it much more plainly, we are just barely on the other side of
the line from "exit capacity is so scarce that exits will only ever be
used for exiting."

Mike was expecting some rebalancing to be done by the bwauths, once we
shifted the KIST interval, but I don't know whether we're seeing that
rebalancing or if this is a coincidence.

--Roger
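For readers who want to plug numbers into the two formulas Roger quotes, here is a minimal Python sketch. It is not the code from dirvote.c: it reproduces only those two lines, uses integer division to mimic the C arithmetic, and assumes weight_scale is 10000. The function names and the bandwidth totals passed in the first call are made up for illustration, not measured network values. The only figure taken from the thread is Wed = 9849.

```python
# Sketch of the two "exit scarce" formulas quoted above.
# ASSUMPTION: weight_scale is 10000; the thread does not state its value.
WEIGHT_SCALE = 10000


def wed_exit_scarce(G, M, E, D, weight_scale=WEIGHT_SCALE):
    """Weight of Guard+Exit relays in the exit position (Wed).

    G, M, E, D are total bandwidths of guard-only, middle-only,
    exit-only and guard+exit relays. Integer division mimics C.
    """
    return (weight_scale * (D - 2 * E + G + M)) // (3 * D)


def wgd_from_wed(wed, weight_scale=WEIGHT_SCALE):
    """Weight of Guard+Exit relays in the guard position (Wgd)."""
    return (weight_scale - wed) // 2


# Hypothetical bandwidth totals, chosen only to exercise the formula.
wed_example = wed_exit_scarce(G=4000, M=1000, E=1500, D=3000)
wgd_example = wgd_from_wed(wed_example)
print(wed_example, wgd_example)

# The Wed value reported in the thread, under the assumed scale.
wgd_reported = wgd_from_wed(9849)
print(wgd_reported)
```

Under the assumed scale of 10000, a Wed of 9849 leaves a small but non-zero Wgd, which is consistent with exits showing a non-zero guard probability.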
Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
On 15 Oct (23:40:34), nusenu wrote:
> > KISTSchedRunInterval=2
> >
> > We are still missing 1 authority to enable this param for it to take effect
> > network wide. Hopefully, it should be today in the coming hours/day.
>
> since it is in effect by now
> https://consensus-health.torproject.org/#consensusparams
> could you publish the exact timestamp when it came into effect?

The consensus on October 15th at 16:00 UTC was the first one that was voted
with the change.

> I noticed some unusual things today (exits having a non-zero guard
> probability),
> did you change more parameters than this one or was this the only one?

We did not. That's worth looking into!?

David
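To check this against the archived consensus documents Roger points to, one can read the "valid-after" and "params" lines near the top of each document. Below is a minimal Python sketch of that check. The function name is hypothetical, and the sample text is a hand-written, heavily abbreviated excerpt for illustration, not a copy of a real consensus.

```python
def parse_consensus_header(text):
    """Return (valid_after, params) from the text of a consensus document.

    Only the "valid-after" and "params" lines are inspected; everything
    else in the document is ignored.
    """
    valid_after = None
    params = {}
    for line in text.splitlines():
        if line.startswith("valid-after "):
            valid_after = line[len("valid-after "):].strip()
        elif line.startswith("params "):
            # The params line is a space-separated list of key=value pairs.
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                params[key] = int(value)
    return valid_after, params


# ASSUMPTION: illustrative excerpt only, not a real consensus document.
SAMPLE = """\
network-status-version 3
vote-status consensus
valid-after 2020-10-15 16:00:00
params KISTSchedRunInterval=2 bwweightscale=10000
"""

valid_after, params = parse_consensus_header(SAMPLE)
print(valid_after, params.get("KISTSchedRunInterval"))
```

Running this over consecutive hourly consensuses and noting the first one whose params include KISTSchedRunInterval=2 gives the moment the change took effect.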
Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
> KISTSchedRunInterval=2
>
> We are still missing 1 authority to enable this param for it to take effect
> network wide. Hopefully, it should be today in the coming hours/day.

since it is in effect by now
https://consensus-health.torproject.org/#consensusparams
could you publish the exact timestamp when it came into effect?

I noticed some unusual things today (exits having a non-zero guard
probability),
did you change more parameters than this one or was this the only one?
[tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020
Greetings relay operators!

Tor has now embarked on a 2 year long scalability project aimed, in part, at
improving the network performance.

The first steps will be to measure performance on the public network in order
to come up with a baseline. We'll likely be adjusting circuit window size,
cell scheduling (KIST) and circuit build timeout (CBT) parameters over the
next months in both contained experiments and on the live network.

This announcement is about the parameters of KIST, our cell scheduler.

Roughly a year ago, we discovered that all tor clients are capped at ~3MB/sec
in maximum outbound throughput due to how the scheduler operates. I won't get
into the details but if you are curious, they are here:

https://gitlab.torproject.org/tpo/core/tor/-/issues/29427

We now believe that the entire network, not only clients, is actually capped
at 3MB/sec per channel (a channel is a connection between client -> relay or
relay -> relay, also called an OR connection).

We've recently conducted experiments with chutney [1], which operates on the
loopback interface, and we indeed hit those limits.

KIST has a parameter named KISTSchedRunInterval which is currently set to 10
msec, and that is our culprit. By lowering it to 2 msec, our experiment showed
that the cap goes from 3MB/sec to ~5MB/sec, with bursts a bit higher.

Now, the question is: why was it set to 10 msec in the first place? Again,
without getting into the technical details of the KIST paper [2], our cell
scheduler requires a "grace period" in order to be able to accumulate cells
and then prioritize over many circuits using an EWMA algorithm, which tor has
been using for a long time now. Without this, one can clog the pipes (at the
TCP level) with a very loud transfer by always being scheduled and filling the
TCP buffers, leaving nothing for the quieter circuits.
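To make the EWMA idea concrete, here is a minimal Python sketch of how an exponentially decaying cell count can favour quiet circuits. It is a toy model, not tor's scheduler: the class, the function names, the 30 second half-life and the traffic pattern are all assumptions chosen for illustration.

```python
# ASSUMPTION: illustrative half-life, not taken from the thread.
HALFLIFE_S = 30.0


class Circuit:
    """Toy circuit that tracks a decaying count of recently sent cells."""

    def __init__(self, name):
        self.name = name
        self.ewma = 0.0          # decayed count of recently sent cells
        self.last_update = 0.0   # time of the last update, in seconds

    def decay_to(self, now):
        # Halve the stored count for every HALFLIFE_S seconds elapsed.
        elapsed = now - self.last_update
        self.ewma *= 0.5 ** (elapsed / HALFLIFE_S)
        self.last_update = now

    def record_cells(self, now, count):
        self.decay_to(now)
        self.ewma += count


def pick_next(circuits, now):
    """Pick the circuit with the lowest decayed cell count (the quietest)."""
    for circ in circuits:
        circ.decay_to(now)
    return min(circuits, key=lambda circ: circ.ewma)


bulk = Circuit("bulk-http")
ssh = Circuit("ssh")
for tick in range(10):            # loud transfer: 100 cells every 0.1 s
    bulk.record_cells(tick / 10, 100)
for tick in range(2):             # quiet session: 1 cell every 0.5 s
    ssh.record_cells(tick / 2, 1)

chosen = pick_next([bulk, ssh], now=1.0)
print(chosen.name)
```

The point of the "grace period" is visible here: the scheduler can only choose between circuits whose cells have already accumulated, so a shorter interval gives it fewer candidates to compare on each run.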
It is important to note that the goal of EWMA in tor is to prioritize quiet
circuits: for example, an SSH session will be prioritized over a bulk HTTP
transfer. This is so that "likely" interactive connections are not delayed and
stay snappy.

But lowering this to 2 msec means less time to accumulate cells and, in
theory, worse cell prioritization.

However, we think this will not be a problem because we believe the network is
underloaded. And because of this 3MB/sec cap per channel, tor is sending
bursts of cells instead of a constant stream of cells, and thus it is
processing less than it possibly could at the relay side. Again, all this in
theory.

All in all, going to 2 msec should improve speed at the very least and not
make the network worse. We want to test that and measure it for a couple of
weeks, then transition to a higher value, repeating until we get back to 10
msec so we can clearly compare the effect on EWMA priority and performance.
One possibility is a 2 msec, 5 msec, 10 msec transition.

Yesterday, a request was made to our 9 directory authorities to set this
consensus parameter:

    KISTSchedRunInterval=2

We are still missing 1 authority to enable this param for it to take effect
network wide. Hopefully, that should happen today, in the coming hours/day.

This is where we need your help. We would like you to notify us on this thread
about any noticeable changes in CPU, RAM, or BW usage. In other words,
anything that changes from the "average" you've been seeing is worth reporting
to us.

We do NOT expect big changes for your relay(s), but there could reasonably be
a change in bandwidth throughput, and thus some of you could see a traffic
increase; this is unclear at the moment.

Huge thanks to everyone here! We will carefully monitor this change and if
things go bad, we'll revert it as fast as we can! Thus, your help is extremely
important!

Cheers!
David

[1] https://git.torproject.org/chutney.git/
[2] https://arxiv.org/pdf/1709.01044.pdf