Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits

David Goulet Tue, 07 Apr 2020 07:21:25 -0700

On 06 Apr (17:08:12), Mike Perry wrote:

[snip]


> > I think you'll be bound by the amount of data a connection inbuf can take
> > which has an upper bound of 32 cells each read event.
> > 
> > Then tor will have to empty at once the inbuf, queue all INTRODUCE2 cells 
> > (at
> > most 32) in that priority queue and once done, we would process it until we
> > return to handling the connection inbuf.
> > 
> > In other words, the queue size, with tor's architecture, is bound to the
> > number of cells upper bound you can get when doing a recv() pass which is 32
> > cells.
> > 
> > Nevertheless, that limit is weirdly hardcoded in tor so you should 
> > definitely
> > think of a way to upper bound the queue and just drop the rest. A good
> > starting point would be that 32 cells number?
> 
> dgoulet and I think the following will work better than always doing 32
> cells at a time. Basically, the idea is to split our INTRO2 handling
> into "top-half" and "bottom-half" handlers.
> 
> Top-half handling:
>  1) read 32 cells off inbuf as usual
>  2) do AES relay cell decryption as usual
>  3) Parse relay headers, handle all cells as usual, except:
>     a) in hs_service_receive_introduce2(), add to pqueue
>        and return without further processing of those
>  4) Return to rest of mainloop

Agree with the above. It is trivial to do this today so very low engineering
cost.

> 
> Then, separately, also in mainloop, do the bottom half. (TODO: What
> group priority?)
> 
> Bottom-half handling:
>   I) pop a single intro2 off of pqueue (ie: max difficulty in queue)
>  II) Compare this difficulty to desc difficulty. If lower, lower
>      desc difficulty
> III) Parse it and launch RP circuit (as per current bottom half of
>      hs_service_receive_introduce2())
>  IV) trim pqueue elements, if queue "too big" (TODO: how to trim?)
>   V) Compare trim point difficulty to descriptor difficulty, if trim
>      point was higher than descriptor value, raise desc difficulty
>  VI) return to libevent/mainloop again

I would maybe try to convince you that we could dequeue more than 1 cell here
because this behavior is changing quite a bit the current state of HS.

Right now, we would get 32 cells out of the inbuf and one at a time, process
it then go back to mainloop.

This new algorithm means that we would process 1 single cell at each mainloop
event instead of 32. This is quite a decrease. Ok, it is not that exact ration
because maybe dequeue the inbuf without processing the decrypted INTRO2 is
fast but it is still a full mainloop run per cell is clearly slower than right
now.

We need some sort of performance measurements here to make an informed
decision but my guts feeling tells me that we might want to don't know process
5 or 10 cells instead of 1 per mainloop round.

We _should_ run timing measurement here to see how much delaying INTRO2
processing to another mainloop event affects the overall rate of introduction.

But let say a full mainloop run takes 100msec, we will process 50
introductions per second... that looks quite low? But could be already what we
do now, unknown.

> 
> 
> The astute reader will note that even without PoW, the above can provide
> almost the exact same functionality as the naive rate limiting currently
> done at intropoints - just cut the queue arbitrarily. Basically PoW and
> the pqueue just gives us a smarter way to decide who to reply to.
> 
> However, it still has the following potential issues:
>   A) AES will bottleneck us at ~100Mbit-300Mbit at #2 in top-half above
>   B) Extra mainloop() iterations for INTRO2s may be expensive (or not?)

Possibly, from the above, some analysis should happen. I can easily do that
once we get the tracing API upstream.

> 
> For A, onionbalance will still help by adding new back-end instances
> providing intropoints via separate back-end Tor daemons, either on the
> same box or different boxes.
> 
> But it will only help up to a point. A HSDesc maxes out at 30k, so at
> some point we'll run out of space to list more intropoints in a single
> descriptor. At that point, we can still list different intropoints at
> each HSDir position, but after that, we are screwed.

Small correctino. HSDesc max at 50k for v3 and 20k for v2. But lets just
consider v3 for the forseable future :D.

[snip]

> > I would _love_ to but could be too early for that if we consider that we are
> > still unsure that this defense will be useful or not (according to Mike as a
> > discussion on IRC).
> 
> As described above, multithreading still provides a multiplier in the
> AES bottleneck case, even over onionbalance.
> 
> But, there may be more bottlenecks than just AES crypto, so this is a
> further argument for not jumping the gun just yet, and trying v1 first
> (or even a simple prototype without pow, that just cuts the queue
> arbitrarily), and getting some more profiling data.

As an initial step, I agree. Onionbalance provides an easy way for service to
outsource client introduction to more CPUs.

But, in my opinion, onionbalance is a power user solution and thus usually a
small percentage of our .onion users that can take advantage of it. As .onion
move more and more in the mobile sphere and client to client applications
(onionshare, ricochet), it ain't much of an option :S.

Without using more CPUs in Tor, I have a _hard_ time seeing tor scale over
time especially for services. As long as we keep that in mind with our
designs, I'm good :).

> 
> Next steps (not necessarily in order):
> 
> a) pqueue plan review + more detailed description
> b) Figure out pqueue trim mechanism - can we do better than O(n)?

Initially, we could simply go with an upper limit and just drop cells as you
queue them if you are above limit? As in drop back() if you reach the limit
everytime you queue?

Else, we can get into more complicated schemes with queueing rate versus
processing rate and come down with a golden number to strike a memory and CPU
balance...?

> c) test using randomx as a hash function in various ways, esp wrt
>    key usage, cache setup, and VM infos
> d) test pqueue refactoring, maybe without pow involved yet
> e) specify a v1.5 that works at intropoint (to avoid AES bottleneck)
> f) Merge this thread back into a single proposal document

We could put the engineering details could be in an Annexe of the proposal if
we don't want to loose track of it and not dispersed on a mailing list :)?

I'm happy to help with this, let me know.

> g) other stuff we forgot, XXX's, TODOs, etc.

Cheers!
David

-- 
6Kt/Zhrgtx/7LnJzUvEZzDZSIq5yophZyqpI6PgAsv4=

signature.asc
Description: PGP signature

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits

Reply via email to