Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits

Mike Perry Mon, 06 Apr 2020 15:09:13 -0700

Trimming to stuff that I just want to reply to; I otherwise agree.

Note: in a couple places I replied directly to asn's OP, because I
noticed some more questions that I could answer.

On 4/2/20 2:30 PM, David Goulet wrote:
>
> On 02 Apr (18:54:59), George Kadianakis wrote:
>> 2.2.2. Dynamic PoW
>>
>>   DoS is a dynamic problem where the attacker's capabilities constantly 
>> change,
>>   and hence we want our proof-of-work system to be dynamic and not stuck 
>> with a
>>   static difficulty setting. Hence, instead of forcing clients to go below a
>>   static target like in Bitcoin to be successful, we ask clients to "bid" 
>> using
>>   their PoW effort. Effectively, a client gets higher priority the higher
>>   effort they put into their proof-of-work. This is similar to how
>>   proof-of-stake works but instead of staking coins, you stake work.
> 
> So this means that desktop users will be prioritized over mobile users
> basically unless I make my phone use X% of battery?

Yes. We should be clear that this is not meant to be on all the time,
and that yes, it is likely to sacrifice access by mobile users,
depending on the current attack volume/difficulty level.

The Tor Browser Android UI could inform users the service is under
attack and direct them try on their desktop instead.

>>   If a client receives a descriptor with "pow-params", it should assume that
>>   the service is expecting a PoW input as part of the introduction protocol.
> 
> What happens with clients _without_ PoW support? They basically won't be able
> to connect I suppose? Or be put in the prio queue at the service at the very
> hand with work done = 0 ?

The work_done=0 will be better. Fake work with actual work_done=0 is
just as easy to create as omitting work, for an attacker.

>> 3.4.1. PoW verification
>>
>>    To verify the client's proof-of-work the service extracts (hash_output,
>>    seed, nonce) from the INTRODUCE1 cell and MUST do the following steps:
>>
>>    1) Make sure that the client's seed is identical to the active seed.
>>    2) Check the client's nonce for replays (see [REPLAY_PROTECTION] section).
>>    3) Verify that 'hash_output =?= argon2(seed, nonce)
> 
> So wait, the service also has to do the PoW for each client by computing the
> Argon2 hash for each cell? Or am I mis-understanding?

Yes, but the service side only has to run the hash once, which should be
fast.

The client/attacker is hashing many times, to search for a nonce that
will satisfy the target hash value comparison.

But oops! We forgot to list that directly above, which might have caused
the confusion:

0) Check that hash_output <= target_level

>> 3.4.2. The Introduction Queue  [INTRO_QUEUE]
>>
>> 3.4.2.1. Adding introductions to the introduction queue
>>
>>   When PoW is enabled and a verified introduction comes through, the service
>>   instead of jumping straight into rendezvous, queues it and prioritizes it
>>   based on how much effort was devoted by the client to PoW. This means that
>>   introduction requests with high effort should be prioritized over those 
>> with
>>   low effort.
>>
>>   To do so, the service maintains an "introduction priority queue" data
>>   structure. Each element in that priority queue is an introduction request,
>>   and its priority is the effort put into its PoW:
>>
>>   When a verified introduction comes through, the service interprets the PoW
>>   hash as a 32-byte big-endian integer 'hash_int' and based on that integer 
>> it
>>   inserts it into the right position of the priority_queue: The smallest
>>   'hash_int' goes forward in the queue. If two elements have the same value,
>>   the older one has priority over the newer one.
>>   {XXX: Is this operation with 32-bytes integers expensive? How to make 
>> cheaper?}

One option: either subtract or and-off the target, so we're comparing
difficulty relative to the target - ie a much smaller integer space.

In blockchain world, typically the difficulty target is expressed as a
bitmask anyway, probably for reasons like this.

>>   {TODO: PARAM_TUNING: If the priority queue is only ordered based on the
>>    effort what attacks can happen in various scenarios? Do we want to order 
>> on
>>    time+effort?  Which scenarios and attackers should we examine here?}
>>
>>   {TODO: PARAM_TUNING: What's the max size of the queue? How do we trim it? 
>> Can we
>>    use WRED usefully?}
> 
> I think you'll be bound by the amount of data a connection inbuf can take
> which has an upper bound of 32 cells each read event.
> 
> Then tor will have to empty at once the inbuf, queue all INTRODUCE2 cells (at
> most 32) in that priority queue and once done, we would process it until we
> return to handling the connection inbuf.
> 
> In other words, the queue size, with tor's architecture, is bound to the
> number of cells upper bound you can get when doing a recv() pass which is 32
> cells.
> 
> Nevertheless, that limit is weirdly hardcoded in tor so you should definitely
> think of a way to upper bound the queue and just drop the rest. A good
> starting point would be that 32 cells number?

dgoulet and I think the following will work better than always doing 32
cells at a time. Basically, the idea is to split our INTRO2 handling
into "top-half" and "bottom-half" handlers.

Top-half handling:
 1) read 32 cells off inbuf as usual
 2) do AES relay cell decryption as usual
 3) Parse relay headers, handle all cells as usual, except:
    a) in hs_service_receive_introduce2(), add to pqueue
       and return without further processing of those
 4) Return to rest of mainloop

Then, separately, also in mainloop, do the bottom half. (TODO: What
group priority?)

Bottom-half handling:
  I) pop a single intro2 off of pqueue (ie: max difficulty in queue)
 II) Compare this difficulty to desc difficulty. If lower, lower
     desc difficulty
III) Parse it and launch RP circuit (as per current bottom half of
     hs_service_receive_introduce2())
 IV) trim pqueue elements, if queue "too big" (TODO: how to trim?)
  V) Compare trim point difficulty to descriptor difficulty, if trim
     point was higher than descriptor value, raise desc difficulty
 VI) return to libevent/mainloop again

The astute reader will note that even without PoW, the above can provide
almost the exact same functionality as the naive rate limiting currently
done at intropoints - just cut the queue arbitrarily. Basically PoW and
the pqueue just gives us a smarter way to decide who to reply to.

However, it still has the following potential issues:
  A) AES will bottleneck us at ~100Mbit-300Mbit at #2 in top-half above
  B) Extra mainloop() iterations for INTRO2s may be expensive (or not?)

For A, onionbalance will still help by adding new back-end instances
providing intropoints via separate back-end Tor daemons, either on the
same box or different boxes.

But it will only help up to a point. A HSDesc maxes out at 30k, so at
some point we'll run out of space to list more intropoints in a single
descriptor. At that point, we can still list different intropoints at
each HSDir position, but after that, we are screwed.

We can alternatively avoid the AES bottleneck by moving this whole
system to the intropoint tor relay. Basically, we split
hs_intro_received_introduce1() into a top and bottom half in almost
exactly the same way we split hs_service_receive_introduce2(), and we
use the unencrypted extension field instead of the encrypted one.

This is a very straight-forward v1.5 change, but it requires intropoints
on the network to upgrade for it to work.

>> 3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
>>
>>   The service should handle introductions by pulling from the introduction
>>   queue.
>>
>>   Similar to how our cell scheduler works, the onion service subsystem will
>>   poll the priority queue every 100ms tick and process the first 20 cells 
>> from
>>   the priority queue (if they exist). The service will perform the rendezvous
>>   and the rest of the onion service protocol as normal.
>>
>>   With this tempo, we can process 200 introduction cells per second.
> 
> As I described above, I think we might want to do something like that for
> simplicity at first which is "empty inbuf by priority queuing all INTRODUCE2"
> and once done, process them.
> 
> Thus, it won't be like the cell scheduler that accumulates until a certain
> tick (10msec) and then process it all.
> 
>>   {XXX: Is this good?}
>>
>>   {TODO: PARAM_TUNING: STRAWMAN: This needs hella tuning. Processing 20 cells
>>   per 100ms is probably unmaintainable, since each cell is quite expensive:
>>   doing so involving path selection, crypto and making circuits. We will need
>>   to profile this procedure and see how we can do this scheduling better.}
> 
> With the above, we should be within the same performance as we have right now
> since we just deferring the processing of INTRODUCE2 cell after the inbuf is
> emptied.
> 
>>
>>   {XXX: This might be a nice place to promote multithreading. Queues and 
>> pools
>>   are nice objects to do multithreading since you can have multiple threads
>>   pull from the queue, or leave stuff on the queue. Not sure if this should 
>> be
>>   in the proposal tho.}
> 
> I would _love_ to but could be too early for that if we consider that we are
> still unsure that this defense will be useful or not (according to Mike as a
> discussion on IRC).

As described above, multithreading still provides a multiplier in the
AES bottleneck case, even over onionbalance.

But, there may be more bottlenecks than just AES crypto, so this is a
further argument for not jumping the gun just yet, and trying v1 first
(or even a simple prototype without pow, that just cuts the queue
arbitrarily), and getting some more profiling data.

Next steps (not necessarily in order):

a) pqueue plan review + more detailed description
b) Figure out pqueue trim mechanism - can we do better than O(n)?
c) test using randomx as a hash function in various ways, esp wrt
   key usage, cache setup, and VM infos
d) test pqueue refactoring, maybe without pow involved yet
e) specify a v1.5 that works at intropoint (to avoid AES bottleneck)
f) Merge this thread back into a single proposal document
g) other stuff we forgot, XXX's, TODOs, etc.

-- 
Mike Perry

signature.asc
Description: OpenPGP digital signature

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

Re: [tor-dev] [RFC] Proposal: A First Take at PoW Over Introduction Circuits

Reply via email to