Re: [tor-dev] Load Balancing in 2.7 series - incompatible with OnionBalance ?

Alec Muffett Thu, 22 Oct 2015 09:32:08 -0700

i...@tvdw.eu wrote:

> Hi Alec,

Hi Tom! I love your proposal, BTW. :-)

> Most of what you said sounds right, and I agree that caching needs TTLs (not 
> just here, all caches need to have them, always).

Thank you!

> However, you mention that one DC going down could cause a bad experience for 
> users. In most HA/DR setups I've seen there should be enough capacity if 
> something fails, is that not the case for you? Can a single data center not 
> serve all Tor traffic?

It's not the datacentre which worries me - we already know how to deal with 
those - it's the failure-based resource contention for the limited 
introduction-point space that is afforded by a maximum (?) of six descriptors,
each of which cites 10 introduction points.

A cap of 60 IPs is a clear protocol bottleneck which - even with your excellent 
idea - could break a service deployment.

Yes, in the meantime the proper solution is to split the service three ways, or
even four, but that's an administrative burden which less well-resourced
organisations might struggle with.

Many (most?) will have a primary site and a single failover site, and it seems 
perverse that they could bounce just ONE of those sites and automatically lose 
50% of their Onion capacity for up to 24 hours UNLESS they also take down the 
OTHER site for long enough to invalidate the OnionBalance descriptors.

Such is not the description of a high-availability (HA) service, and it might 
put people off.

> If that is a problem, I would suggest adding more data centers to the pool. 
> That way if one fails, you don't lose half of the capacity, but a third (if 
> N=3) or even a tenth (if N=10).

...but you lose it for anywhere from 1 to 24 hours, even if you simply reboot the Tor daemon.
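
To put numbers on the 60-IP cap and on the 1/N figures, here is a minimal Python sketch. It assumes the 60 slots are split evenly across N equally sized sites, and the helper name is hypothetical. It only models the fraction of capacity lost. Per the message, that loss can persist for up to 24 hours.

```python
from fractions import Fraction

# Figures from the discussion: up to six descriptors (the maximum itself is
# flagged "(?)" above), each citing 10 introduction points.
MAX_DESCRIPTORS = 6
INTROS_PER_DESCRIPTOR = 10
INTRO_POINT_CAP = MAX_DESCRIPTORS * INTROS_PER_DESCRIPTOR  # 60


def stale_after_one_bounce(sites: int) -> tuple[int, Fraction]:
    """Hypothetical helper: (stale slots, fraction of capacity lost) when one
    of `sites` equally sized sites restarts. Slots per site are rounded down
    if `sites` does not divide the cap evenly."""
    if sites < 1:
        raise ValueError("need at least one site")
    slots_per_site = INTRO_POINT_CAP // sites
    return slots_per_site, Fraction(1, sites)


for n in (2, 3, 10):
    slots, frac = stale_after_one_bounce(n)
    print(f"N={n}: {slots} of {INTRO_POINT_CAP} slots stale ({frac} of capacity)")
```

A primary/failover pair (N=2) loses half its slots. Going to N=10 shrinks the loss to a tenth but does not shorten how long it lasts.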

> Anyway, such a thing is probably off-topic. To get back to the point about 
> TTLs, I just want to note that retrying failed nodes until all fail is scary:

I find that worrying, also. I'm not sure what I think about it yet, though.

> what will happen if all ten nodes get a 'rolling restart' throughout the day? 
> Wouldn't you eventually end up with all the traffic on a single node, as it's 
> the only one that hadn't been restarted yet?

Precisely.
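
The rolling-restart hole can be shown with a toy model. This is a sketch under stated assumptions, not tor's client logic. It assumes clients keep a cached descriptor with no TTL, skip introduction points that fail, and spread evenly over whatever still answers. The node names are hypothetical.

```python
# Toy model, NOT tor's client code: clients hold a cached descriptor with no
# TTL, skip introduction points that fail, and spread evenly over the rest.
# A node restarted since the descriptor was cached has new introduction
# points, so its cached entries are dead for those clients.


def load_share_per_node(nodes: list[str], restarted: set[str]) -> dict[str, float]:
    """Share of stale-cache client traffic each still-reachable node receives."""
    alive = [n for n in nodes if n not in restarted]
    if not alive:
        return {}  # every cached intro point fails: nobody can connect
    return {n: 1.0 / len(alive) for n in alive}


nodes = [f"node{i}" for i in range(10)]  # hypothetical node names
restarted: set[str] = set()
for node in nodes[:-1]:  # rolling restart of all but the last node
    restarted.add(node)
    shares = load_share_per_node(nodes, restarted)
    print(f"{len(restarted)} restarted: each live node carries {max(shares.values()):.0%}")
# After nine restarts only node9 is left, carrying 100% of that traffic.
```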

> As far as I can see the only thing that can avoid holes like that is a TTL, 
> either hard coded to something like an hour, or just specified in the 
> descriptor. Then, if you do a rolling restart, make sure you don't do it all 
> within one TTL length, but at least two or three depending on capacity.

Concur.
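
A TTL-bounded client cache, plus the "spread the restart over two or three TTLs" rule, might look roughly like the sketch below. Everything here is hypothetical: `DescriptorCache`, `restart_spacing`, and the one-hour default are illustrations. The TTL could equally be hard-coded or read from the descriptor.

```python
import time
from typing import Callable, Optional


class DescriptorCache:
    """Hypothetical client-side cache whose entries expire `ttl` seconds after
    fetch. `ttl` stands in for a hard-coded value (e.g. 3600 s) or one carried
    in the descriptor itself."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[list[str], float]] = {}

    def store(self, onion: str, intro_points: list[str]) -> None:
        self._entries[onion] = (list(intro_points), self.clock())

    def get(self, onion: str) -> Optional[list[str]]:
        """Return cached intro points, or None if absent or expired (refetch)."""
        entry = self._entries.get(onion)
        if entry is None:
            return None
        intro_points, fetched_at = entry
        if self.clock() - fetched_at >= self.ttl:
            del self._entries[onion]  # stale: drop it so the caller refetches
            return None
        return intro_points


def restart_spacing(nodes: int, ttl: float, ttls_to_span: float = 2.0) -> float:
    """Gap between consecutive node restarts so that the whole rolling restart
    (first node to last) spans at least `ttls_to_span` TTLs."""
    if nodes < 2:
        return 0.0
    return ttls_to_span * ttl / (nodes - 1)
```

For ten nodes and a one-hour TTL, `restart_spacing(10, 3600)` gives 800 seconds between restarts, so the rolling restart takes two hours end to end.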

desnac...@riseup.net wrote:

> Please see rend_client_get_random_intro_impl(). Clients will pick a random 
> intro point from the descriptor which seems to be the proper behavior here.

That looks great!
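
tor's actual selection lives in rend_client_get_random_intro_impl() (C). The Python below is only a hypothetical illustration of the same idea and is not a port of that function. It picks uniformly at random among the introduction points a client has not yet seen fail. The `failed` set and the function name are assumptions.

```python
import random
from typing import Optional, Sequence


def pick_random_intro(intro_points: Sequence[str], failed: set[str],
                      rng: Optional[random.Random] = None) -> Optional[str]:
    """Hypothetical: choose uniformly among intro points not known to have failed."""
    rng = rng or random.Random()
    usable = [ip for ip in intro_points if ip not in failed]
    if not usable:
        return None  # all intro points failed: caller must refetch or give up
    return rng.choice(usable)
```

A uniform choice spreads clients across every listed introduction point instead of piling them onto the first entry. That is the load-balancing benefit raised further down the thread.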

> I can see how a TTL might be useful in high availability scenarios like the 
> one you described. However, it does seem like something with potential 
> security implications (like, set TTL to 1 second for all your descriptors, 
> and now you have your clients keep on making directory circuits to fetch your 
> descs).

Okay, so, how about:

IDEA: if ANY descriptor introduction point connection fails AND the 
descriptor's ttl has been exceeded THEN refetch the descriptor before trying 
again?

It strikes me (though I may be wrong?) that the degenerate case for this would 
be someone with an onion killing their IP in order to force the user to refetch 
a descriptor - which is what I think would happen anyway?

At the very least this proposal would add a work factor.
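
The IDEA above reduces to a single predicate. Here is a minimal sketch, assuming the client records when it fetched the descriptor and which TTL it honours. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DescriptorState:
    fetched_at: float  # when the client last fetched the descriptor (seconds)
    ttl: float         # lifetime the client honours for it (seconds)


def should_refetch(state: DescriptorState, intro_failed: bool, now: float) -> bool:
    """Refetch only if an intro point connection failed AND the TTL is exceeded."""
    expired = (now - state.fetched_at) > state.ttl
    return intro_failed and expired
```

Because of the AND, a service that publishes a 1-second TTL causes no extra directory fetches unless its introduction points actually fail. Forcing a refetch means killing an intro point first, and that is the added work factor.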

> For this reason I'd be interested to see this specified in a formal Tor 
> proposal (or even as a patch to prop224). It shouldn't be too big! :)

I would hesitate to add it to Prop 224, which strikes me as rather large and
distant. I'd love to see this by Christmas :-P

teor2...@gmail.com wrote:

> Do we connect to introduction points in the order they are listed in the 
> descriptor? If so, that's not ideal, there are surely benefits to a random 
> choice (such as load balancing).

Apparently not (re: George) :-)

> That said, we believe that rendezvous points are the bottleneck in the 
> rendezvous protocol, not introduction points.

Currently, and in most current deployments, yes.

> However, if you were to use proposal #255 to split the introduction and 
> rendezvous to separate tor instances, you would then be limited to:
> - 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 
> different introduction points from different tor instances, and N failover 
> instances of this infrastructure competing to post descriptors. (Where N = 1, 
> 2, 3.)
> - a virtually unlimited number of tor servers doing the rendezvous and 
> exchanging data (say 1 server per M clients, where M is perhaps 100 or so, 
> but ideally dynamically determined based on load/response time).
> In this scenario, you could potentially overload the introduction points.

Exactly my concern, especially when combined with overlong lifetimes of 
mostly-zombie descriptors.
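
The arithmetic of the proposal #255 scenario, as a quick sketch. It takes the 6 HSDirs, 10 introduction points, N failover instances and "perhaps 100" clients per rendezvous server from the quoted text. The 25,000-client figure in the usage note is purely illustrative.

```python
import math

HSDIRS = 6             # HSDirs receiving descriptors, per the quoted scenario
INTROS_PER_HSDIR = 10  # distinct introduction points per descriptor


def max_intro_points(failover_instances: int) -> int:
    """6 * 10 * N introduction points for N failover instances."""
    return HSDIRS * INTROS_PER_HSDIR * failover_instances


def rendezvous_servers(clients: int, clients_per_server: int = 100) -> int:
    """One rendezvous server per M clients (M = 100 is the 'perhaps' figure)."""
    return math.ceil(clients / clients_per_server)
```

With N = 3 that is 180 introduction points, while 25,000 clients at M = 100 already call for 250 rendezvous servers. The introduction layer grows only with the handful of failover instances, but the rendezvous layer grows with the client count.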

- alec

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
