Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-16 Thread Toralf Förster

On 10/15/20 3:14 PM, David Goulet wrote:

> This is where we need your help. We would like you to notify us on this thread
> about any noticeable changes in CPU, RAM, or BW usage. In other words,
> anything that changes from the "average" you've been seeing is worth
> reporting to us.


Maybe completely unrelated, but at first glance it seems that

	iftop -B -P -N -n -b -i enp4s0 -F 5.9.158.75/32 -F [2a01:4f8:190:514a::2]/64

on a host running 2 Tor relays at 0.4.5.0-alpha-dev now shows more IPv6
connections among the top ones (wrt throughput).


--
Toralf


___
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays


Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-16 Thread niftybunny
Okay. Is there a page for critters with wee brains? An ELI5 or even ELI3 would 
be great.

nifty


> 
> This is the page of what we planned to work on.
> 
> https://gitlab.torproject.org/tpo/core/team/-/wikis/NetworkTeam/Sponsor61/PerformanceExperiments
> 
> We are still very early on the KIST experiment here so will update the page
> with the latest today or very soon once we are starting to see/measure the
> effects.
> 
> David
> 
> --
> 7h1/NAPdaaGpI8WG6X4FtryAZZ4EhnznUVVLqIf/04A=
> 
> 





Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-16 Thread David Goulet
On 15 Oct (19:26:09), Roger Dingledine wrote:
> On Thu, Oct 15, 2020 at 11:40:34PM +0200, nusenu wrote:
> > since it is in effect by now 
> > https://consensus-health.torproject.org/#consensusparams
> > could you publish the exact timestamp when it came into effect?
> 
> One can learn this from the recent consensus documents, e.g. at
> https://collector.torproject.org/recent/relay-descriptors/consensuses/
> 
> And I agree that we should have a central experiment page (e.g. on gitlab)
> that lists the experiments, when we ran them, when the network changes
> occurred, what we expected to find, and what we *did* find.
> 
> David or Mike, can you make sure that page happens?

This is the page of what we planned to work on.

https://gitlab.torproject.org/tpo/core/team/-/wikis/NetworkTeam/Sponsor61/PerformanceExperiments

We are still very early on the KIST experiment here so will update the page
with the latest today or very soon once we are starting to see/measure the
effects.

David

-- 
7h1/NAPdaaGpI8WG6X4FtryAZZ4EhnznUVVLqIf/04A=




Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-15 Thread Roger Dingledine
On Thu, Oct 15, 2020 at 11:40:34PM +0200, nusenu wrote:
> since it is in effect by now 
> https://consensus-health.torproject.org/#consensusparams
> could you publish the exact timestamp when it came into effect?

One can learn this from the recent consensus documents, e.g. at
https://collector.torproject.org/recent/relay-descriptors/consensuses/
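
As a rough illustration of how one could read the effective time out of a
consensus document: each consensus carries a "valid-after" line and a "params"
line of key=value pairs. The sketch below scans a document for both. The
function name and the SAMPLE excerpt are hypothetical and heavily truncated;
a real consensus from the URL above contains far more lines.

```python
from typing import Optional, Tuple


def kist_interval_from_consensus(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (valid-after timestamp, KISTSchedRunInterval value or None)."""
    valid_after = None
    interval = None
    for line in text.splitlines():
        if line.startswith("valid-after "):
            valid_after = line[len("valid-after "):].strip()
        elif line.startswith("params "):
            # The params line is a space-separated list of key=value pairs.
            for pair in line.split()[1:]:
                key, _, value = pair.partition("=")
                if key == "KISTSchedRunInterval":
                    interval = int(value)
    return valid_after, interval


# Hypothetical, truncated excerpt for illustration only.
SAMPLE = """network-status-version 3
vote-status consensus
valid-after 2020-10-15 16:00:00
params bwweightscale=10000 KISTSchedRunInterval=2
"""

print(kist_interval_from_consensus(SAMPLE))
```

Running this over consecutive consensus files, in order, and noting the first
one where the value is present would give the timestamp being asked about.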

And I agree that we should have a central experiment page (e.g. on gitlab)
that lists the experiments, when we ran them, when the network changes
occurred, what we expected to find, and what we *did* find.

David or Mike, can you make sure that page happens?

> I noticed some unusual things today (exits having a non-zero guard 
> probability),
> did you change more parameters than this one or was this the only one?

No, that was the only change.

We had a good discussion with Florentin et al on #tor-dev just now,
where we concluded that yes, we're still in "case 3be (E scarce)", but
the math still allows a little bit of use of exits for other roles:
check out the networkstatus_compute_bw_weights_v10() function in
src/feature/dirauth/dirvote.c.

So as far as we can tell so far, we are still in the "exit scarce"
case of Mike's weight voodoo, but his math allows exits to be used
a little bit in non-exit roles even in this case.

Wed = (weight_scale*(D - 2*E + G + M))/(3*D);

Wgd = (weight_scale - Wed)/2;

And Wed in this case is 9849 rather than 1.
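
To make the arithmetic concrete, here is a small sketch of the two formulas
above. It assumes weight_scale is 10000 (the usual default of the
bwweightscale consensus parameter) and uses integer division, as the C code
does. The function names are made up for this example, and the bandwidth
inputs to exit_weights() are left to the caller because the thread does not
give the real G, M, E, D values.

```python
WEIGHT_SCALE = 10000  # assumption: default bwweightscale


def wgd_from_wed(wed: int, weight_scale: int = WEIGHT_SCALE) -> int:
    """Wgd = (weight_scale - Wed)/2, with integer division."""
    return (weight_scale - wed) // 2


def exit_weights(G: int, M: int, E: int, D: int,
                 weight_scale: int = WEIGHT_SCALE) -> tuple:
    """Return (Wed, Wgd) using the two formulas quoted above.

    Only meaningful while the numerator is non-negative; for negative
    values Python's // rounds differently from C integer division.
    """
    wed = (weight_scale * (D - 2 * E + G + M)) // (3 * D)
    return wed, wgd_from_wed(wed, weight_scale)


wed = 9849  # value reported in this thread
wgd = wgd_from_wed(wed)
print(wgd)                             # 75
print(f"{wgd / WEIGHT_SCALE:.2%}")     # 0.75%
```

So with Wed at 9849 out of 10000, Wgd comes out at 75 out of 10000, which is
small but not zero.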

So, to say it much more plainly, we are just barely on the other side
of the line from "exit capacity is so scarce that exits will only ever
be used for exiting."

Mike was expecting some rebalancing to be done by the bwauths, once
we shifted the KIST interval, but I don't know whether we're seeing
that rebalancing or if this is a coincidence.

--Roger



Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-15 Thread David Goulet
On 15 Oct (23:40:34), nusenu wrote:
> >   KISTSchedRunInterval=2
> > 
> > We are still missing 1 authority to enable this param for it to take effect
> > network wide. Hopefully, it should be today in the coming hours/day.
> 
> since it is in effect by now 
> https://consensus-health.torproject.org/#consensusparams
> could you publish the exact timestamp when it came into effect?

The consensus on October 15th at 16:00 UTC was the first one that was voted
with the change.

> I noticed some unusual things today (exits having a non-zero guard 
> probability),
> did you change more parameters than this one or was this the only one?

We did not.

That's worth looking into!?

David

-- 
p/At8/mqdd1XtA1xdO8tCHLN779O3bdI7LSsRdGRpLA=




Re: [tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-15 Thread nusenu
>   KISTSchedRunInterval=2
> 
> We are still missing 1 authority to enable this param for it to take effect
> network wide. Hopefully, it should be today in the coming hours/day.

since it is in effect by now 
https://consensus-health.torproject.org/#consensusparams
could you publish the exact timestamp when it came into effect?

I noticed some unusual things today (exits having a non-zero guard probability),
did you change more parameters than this one or was this the only one?






[tor-relays] Network Performance Experiment - KISTSchedRunInterval - October 2020

2020-10-15 Thread David Goulet
Greetings relay operators!

Tor has now embarked on a two-year scalability project aimed, in part, at
improving network performance.

The first steps will be to measure performance on the public network in order
to come up with a baseline. We'll likely be adjusting circuit window size,
cell scheduling (KIST) and circuit build timeout (CBT) parameters over the
next months in both contained experiments and on the live network.

This announcement is about KIST parameters, our cell scheduler.

Roughly a year ago, we discovered that all tor clients are capped at
~3MB/sec of maximum outbound throughput because of how the scheduler operates.
I won't get into the details, but if you are curious, they are here:

  https://gitlab.torproject.org/tpo/core/tor/-/issues/29427

We now believe that the entire network, not only clients, is actually capped
at 3MB/sec per channel (a channel is a connection between client -> relay or
relay -> relay, also called an OR connection).

We've recently conducted experiments with chutney [1], which operates on the
loopback interface, and we indeed hit those limits.

KIST has a parameter named KISTSchedRunInterval, which is currently set to 10
msec, and that is our culprit. By lowering it to 2 msec, our experiment showed
that the cap goes from 3MB/sec to ~5MB/sec, with bursts a bit higher.
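
A back-of-the-envelope sketch of what these numbers imply per scheduler run.
It assumes a deliberately naive model in which a channel sends a fixed amount
of data each time the scheduler runs, and it reads "MB" as 10**6 bytes; both
are assumptions for illustration, not a description of how tor behaves.

```python
MB = 1_000_000  # assumption: "MB" read as 10**6 bytes


def bytes_per_run(cap_bytes_per_sec: float, interval_msec: float) -> float:
    """Bytes a channel would send per scheduler run at a given cap."""
    return cap_bytes_per_sec * interval_msec / 1000.0


def cap_for_interval(per_run_bytes: float, interval_msec: float) -> float:
    """Throughput cap if a fixed amount is sent on every run."""
    return per_run_bytes * 1000.0 / interval_msec


old_per_run = bytes_per_run(3 * MB, 10)      # 30000.0 bytes per run at 10 msec
naive_cap = cap_for_interval(old_per_run, 2)  # 15000000.0 bytes/sec at 2 msec
new_per_run = bytes_per_run(5 * MB, 2)       # 10000.0 bytes per run at 2 msec

print(old_per_run, naive_cap, new_per_run)
```

Under this naive model the cap would have risen to 15MB/sec, not the ~5MB/sec
measured, so the amount sent per run evidently does not stay constant when the
interval shrinks. The sketch does not say why.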

Now, the question is: why was it set to 10 msec in the first place? Again,
without getting into the technical details of the KIST paper [2], our cell
scheduler requires a "grace period" in order to accumulate cells and then
prioritize across many circuits using the EWMA algorithm, which tor has been
using for a long time now. Without this, a very loud transfer can clog the
pipes (at the TCP level) by always being scheduled and filling the TCP
buffers, leaving nothing for the quieter circuits.

It is important to note that the goal of EWMA in tor is to prioritize quiet
circuits: for example, an SSH session will be prioritized over a bulk HTTP
transfer. This is so that "likely" interactive connections are not delayed and
stay snappy.
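
For readers who want a picture of the idea, here is a toy sketch of
EWMA-style prioritization: each circuit keeps a count of recently sent cells
that decays over time, and the circuit with the lowest count goes first. The
class, the function, and the 30-second halflife are all made up for this
example; this is not tor's actual implementation.

```python
class Circuit:
    def __init__(self, name: str, halflife: float = 30.0):
        self.name = name
        self.halflife = halflife  # seconds; hypothetical value
        self.ewma = 0.0           # decayed count of recently sent cells
        self.last_update = 0.0

    def decay_to(self, now: float) -> None:
        """Halve the count for every halflife that has elapsed."""
        elapsed = now - self.last_update
        self.ewma *= 0.5 ** (elapsed / self.halflife)
        self.last_update = now

    def record_cells(self, n: int, now: float) -> None:
        self.decay_to(now)
        self.ewma += n


def pick_next(circuits, now: float) -> Circuit:
    """The quietest circuit (lowest decayed cell count) is served first."""
    for c in circuits:
        c.decay_to(now)
    return min(circuits, key=lambda c: c.ewma)


bulk, ssh = Circuit("bulk-http"), Circuit("ssh")
for t in range(10):
    bulk.record_cells(100, now=float(t))  # loud: 100 cells every second
ssh.record_cells(1, now=0.0)              # quiet: two single cells
ssh.record_cells(1, now=5.0)

print(pick_next([bulk, ssh], now=10.0).name)  # ssh
```

The scheduler can only make this choice among cells that have already
accumulated, which is why the run interval matters for prioritization.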

But lowering this to 2 msec means less time to accumulate cells and, in
theory, worse cell prioritization.

However, we think this will not be a problem because we believe the network is
underloaded. And, because of this 3MB/sec cap per channel, tor is sending
bursts of cells instead of a constant stream of cells, and thus relays are
processing less than they possibly could. Again, all this is in theory.

All in all, going to 2 msec should improve speed at the very least and not make
the network worse.

We want to test that, measure it for a couple of weeks, then transition to a
higher value, and repeat until we get back to 10 msec so we can clearly
compare the effect on EWMA priority and performance.

One possibility is a 2 msec, 5 msec, 10 msec transition schedule.

Yesterday, a request was made to our 9 directory authorities to set this
consensus parameter:

  KISTSchedRunInterval=2

We are still missing 1 authority to enable this param for it to take effect
network wide. Hopefully, it should be today in the coming hours/day.

This is where we need your help. We would like you to notify us on this thread
about any noticeable changes in CPU, RAM, or BW usage. In other words,
anything that changes from the "average" you've been seeing is worth
reporting to us.

We do NOT expect big changes for your relay(s), but there could reasonably be a
change in bandwidth throughput, and thus some of you could see a traffic
increase; this is unclear at the moment.

Huge thanks to everyone here! We will carefully monitor this change and if
things go bad, we'll revert it as fast as we can! Thus, your help becomes
extremely important!

Cheers!
David

[1] https://git.torproject.org/chutney.git/
[2] https://arxiv.org/pdf/1709.01044.pdf


-- 
7h1/NAPdaaGpI8WG6X4FtryAZZ4EhnznUVVLqIf/04A=

