Re: How to Run High Capacity Tor Relays

2010-09-01 Thread Jacob Appelbaum
On 09/01/2010 02:28 PM, John Case wrote:
> 
>> Also, afaik, zero people in the wild are actively running Tor with any
>> crypto accelerator. May be a very painful process... I'm not really
>> interested in documenting it unless it's proven to scale by actual use.
>> I want this document to end up with tested and reproduced results
>> only. You know, Science. Not computerscience ;)
> 
> 
> There was a _very_ interesting, long and detailed discussion of this
> about 1 year ago on this list.
> 
> I really do think some subset of that discussion should be included in
> your "lore", at the very least the parts pertaining to the built-in
> crypto acceleration included in recent sparc CPUs, which appear to be
> the only non-painful way to make this work.
> 
> My impression was that a significant boost could be had by accelerating
> openssl using these on-chip features...

If you're using a fast CPU, it's almost not worth the trouble to bother
with hardware acceleration.

All the best,
Jacob
***
To unsubscribe, send an e-mail to majord...@torproject.org with
unsubscribe or-talk in the body. http://archives.seul.org/or/talk/


Re: How to Run High Capacity Tor Relays

2010-09-01 Thread coderman
On Wed, Sep 1, 2010 at 2:28 PM, John Case wrote:
>...
> I really do think some subset of that discussion should be included in your
> "lore", at the very least the parts pertaining to the built-in crypto
> acceleration included in recent sparc CPUs, which appear to be the only
> non-painful way to make this work.

if you're running a high capacity relay you likely don't need hw
acceleration because:

a. you're on a fast server with relatively modern processor to get
into the high capacity game. assembly optimized crypto is pretty fast
on these systems.

b. the compression, buffer management, and other aspects of Tor are
just as significant as the crypto specific parts on such a server.

c. the crypto hw needed to be effective is expensive, at least a
grand, or inside specialized server processors you're unlikely to have
in your dedicated / leased server hardware.


this is not to say it isn't useful. it's useful in all kinds of ways
ranging from efficiency improvements, side channel attack resistance,
to entropy sources for strong session key / nonce generation.

however, i doubt hardware crypto will prove useful for anyone in the
top tier of relay capacity to drastically improve their throughput or
efficiency overall given the current architecture of Tor itself.

and, as mentioned, there have been a number of threads on the subject,
and widely expanded OpenSSL engine support added since last year for
those interested in experimenting with hw acceleration.
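
For anyone who does want to experiment, a minimal sketch of how a software baseline could be compared against an engine. It only prints the benchmark commands for review. The cipher name and the "pkcs11" engine ID are placeholders chosen for illustration, not anything named in this thread; `openssl engine -t` lists the engine IDs actually available on a given system.

```shell
# Print a benchmark command for one cipher, optionally through an engine.
# Both arguments are illustrative; pick the cipher and engine ID you care about.
speed_cmd() {
    cipher=$1
    engine=${2:-}
    if [ -n "$engine" ]; then
        echo "openssl speed -engine $engine -evp $cipher"
    else
        echo "openssl speed -evp $cipher"
    fi
}

speed_cmd aes-128-cbc            # software baseline
speed_cmd aes-128-cbc pkcs11     # "pkcs11" is a placeholder engine ID
```

Running both printed commands on the same machine gives a rough per-cipher comparison; it says nothing about the non-crypto costs listed in (b) above.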

best regards,


Re: How to Run High Capacity Tor Relays

2010-09-01 Thread John Case



> Also, afaik, zero people in the wild are actively running Tor with any
> crypto accelerator. May be a very painful process... I'm not really
> interested in documenting it unless it's proven to scale by actual use.
> I want this document to end up with tested and reproduced results
> only. You know, Science. Not computerscience ;)



There was a _very_ interesting, long and detailed discussion of this about 
1 year ago on this list.


I really do think some subset of that discussion should be included in 
your "lore", at the very least the parts pertaining to the built-in crypto 
acceleration included in recent sparc CPUs, which appear to be the only 
non-painful way to make this work.


My impression was that a significant boost could be had by accelerating
openssl using these on-chip features...



Re: How to Run High Capacity Tor Relays

2010-08-25 Thread Mike Perry
I should have said this in my first post, but I believe that all
subsequent replies should go to tor-relays. This should be the last
post discussing technical details of relay operation on or-talk.


Thus spake coderman (coder...@gmail.com):

> > net.ipv4.tcp_keepalive_time = 1200
> 
> ^- who uses keepalive? :)

Hrmm, Tor does its own application-level keepalive. Perhaps that's how
this got merged in by confusion. Or maybe, like many of these, it was
just a blanket cut-and-paste move out of desperation to try to
increase capacity. The whole superset of voodoo thing.
 
> > net.netfilter.nf_conntrack_tcp_timeout_established=7200
> > net.netfilter.nf_conntrack_checksum=0
> > net.netfilter.nf_conntrack_max=131072
> > net.netfilter.nf_conntrack_tcp_timeout_syn_sent=15
> 
> ^- best to just disable conntrack altogether if you can. -j NOTRACK in
> the raw table as appropriate.
> you're going to eat up lots of memory with a decent nf|ip_conntrack_max
> ( check /proc/sys/net/ipv4/netfilter/ip_conntrack_max , etc )

Will this remove the ability to do PREROUTING DNAT rules? I know a lot
of Tor nodes forward ports and even IPs around.

Good suggestion though. Perhaps we should mention both options in the
final draft.

> > [...]
> some dupes in here?
> 
> > net.ipv4.ip_forward=1
> > ...
> > net.ipv4.conf.default.forwarding=1
> > net.ipv4.conf.default.proxy_arp = 1
> 
> ^- BAD! this should not be enabled by default unless you're actually
> routing specifically to guest vm's or between interfaces or something.
> if you enable forwarding by default, someone may use you to relay some
> malicious traffic.

Oh shit, that is a relic of Moritz's config. He is also planning to
provide VPN and VPS services. Good catch.

Also, does DNAT count as forwarding for the ip_forward option? 
 
> > == Did I leave anything out? ==
> >
> > Well, did I?
> 
> i'd love to see an sca6000 accelerated node.  been working with these
> recently but unfortunately they're allocated for other work...
> (most of the other crypto hw is going to be bus / implementation
> limited to less than what a beefy 64bit modern server can provide, so
> of little utility in this context.)

I'd love to hear Roger and Nick's comments on this, but isn't it
possible this might also bottleneck well before 1Gbit? I am worried it
may depend largely on the architecture of the card and our use of
openssl. Their docs claim "up to 1Gbit" but this could be using highly
parallelized processing, which tor cannot really do, as I understand
it.

Personally I think the hyperthreading option is the lowest hanging
fruit for maxing out a single Tor relay process for lowest cost.

Also, afaik, zero people in the wild are actively running Tor with any
crypto accelerator. May be a very painful process... I'm not really
interested in documenting it unless it's proven to scale by actual use.
I want this document to end up with tested and reproduced results
only. You know, Science. Not computerscience ;)


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: How to Run High Capacity Tor Relays

2010-08-24 Thread coderman
On Tue, Aug 24, 2010 at 8:27 AM, Mike Perry wrote:
> ...
> # Set the hard limit of open file descriptors really high.
> # Tor will also potentially run out of ports.
> ulimit -SHn 65000

typically in /etc/security/limits.conf. i like to append:
*   soft    nofile  4096
*   hard    nofile  65535

but on big servers use .25mm as hard limit. (Tor not this fd hungry,
64k is fine)
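
To confirm the limits actually took effect for the account Tor runs under, a small check like the following could be run from that account's shell. This is a sketch, not part of anyone's config here; the helper name is made up and the 65535 threshold is simply the hard limit suggested above.

```shell
# Succeed if a limit value meets a wanted minimum; "unlimited" always passes.
# limit_at_least is a hypothetical helper name used only in this sketch.
limit_at_least() {
    limit=$1
    wanted=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$wanted" ]
}

soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft nofile: $soft, hard nofile: $hard"

if limit_at_least "$hard" 65535; then
    echo "hard limit is at least 65535"
else
    echo "hard limit is below 65535; check /etc/security/limits.conf"
fi
```

Limits from limits.conf are applied at login via PAM, so a daemon started by an init script may not inherit them; checking from the daemon's own environment is the safer test.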


> # Load an amalgam of gigabit-tuning sysctls from:
> ...
> # We have no idea which of these are needed yet for our actual use
> # case, but they do help (especially the nf-conntrack ones):

you probably want to save in /etc/sysctl.conf , then sysctl -p
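
After loading the file, it can help to read the running values back and compare them with what was intended. A read-only sketch, assuming the usual mapping of sysctl keys onto /proc/sys paths (dots become slashes); the function names are made up for this example.

```shell
# Map a sysctl key to its /proc/sys path (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the running value of each key, or note that it is absent.
show_running() {
    for key in "$@"; do
        path=$(sysctl_path "$key")
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            printf '%s (not present on this kernel)\n' "$key"
        fi
    done
}

show_running net.ipv4.tcp_rmem net.ipv4.tcp_wmem \
             net.core.rmem_max net.core.wmem_max
```

The nf_conntrack keys only appear once the conntrack module is loaded, so "not present" for those is not necessarily an error.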


> ...
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.core.netdev_max_backlog = 2500
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.core.rmem_max = 1048575
> net.core.wmem_max = 1048575

^- these are important and useful



> net.ipv4.ip_local_port_range = 1025 61000

^- that's a little aggressive, better to set FIN timeout lower. i like
5000 to 65535 ephemeral port range
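
For a sense of scale, the two ranges can be counted directly. A quick sketch (port_count is a made-up helper); it also reads the running range if the kernel exposes it.

```shell
# Count the ports in an inclusive range.
port_count() {
    echo $(( $2 - $1 + 1 ))
}

echo "1025-61000: $(port_count 1025 61000) ports"
echo "5000-65535: $(port_count 5000 65535) ports"

# Show the running range, if this kernel exposes it.
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    echo "current $low-$high: $(port_count "$low" "$high") ports"
fi
```

The two suggestions come out at 59976 and 60536 ports, so they differ mainly in where the range starts rather than in how many ports it provides.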


> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_fin_timeout = 30

^- i like a fin timeout of 3-4 seconds on a busy server, otherwise
you've got lots of resources tied up in sockets waiting to die...  Tor
not quite so volatile as some services, so perhaps 30 is fine.


> net.ipv4.tcp_keepalive_time = 1200

^- who uses keepalive? :)


> net.netfilter.nf_conntrack_tcp_timeout_established=7200
> net.netfilter.nf_conntrack_checksum=0
> net.netfilter.nf_conntrack_max=131072
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent=15

^- best to just disable conntrack altogether if you can. -j NOTRACK in
the raw table as appropriate.
you're going to eat up lots of memory with a decent nf|ip_conntrack_max
( check /proc/sys/net/ipv4/netfilter/ip_conntrack_max , etc )
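
A sketch of what the raw-table rules might look like for a single listening port. It prints the commands for review instead of running them, since applying them needs root and should be checked against the existing ruleset first. Port 9001 is an assumption (substitute your own ORPort), and the function name is made up.

```shell
# Print (not run) raw-table rules that exempt one TCP port from conntrack.
# Inbound packets to the port and replies from it are both left untracked.
notrack_rules() {
    port=$1
    echo "iptables -t raw -A PREROUTING -p tcp --dport $port -j NOTRACK"
    echo "iptables -t raw -A OUTPUT -p tcp --sport $port -j NOTRACK"
}

notrack_rules 9001   # 9001 is an assumed ORPort, not taken from this thread
```

Two caveats: this covers only connections to the listening port, not the relay's outbound connections, and netfilter NAT is built on connection tracking, so untracked flows should not be expected to match DNAT rules.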


> [...]
some dupes in here?

> net.ipv4.ip_forward=1
> ...
> net.ipv4.conf.default.forwarding=1
> net.ipv4.conf.default.proxy_arp = 1

^- BAD! this should not be enabled by default unless you're actually
routing specifically to guest vm's or between interfaces or something.
if you enable forwarding by default, someone may use you to relay some
malicious traffic.

were these cut and paste errors?  remember to disable forwarding
first, before tuning other parameters, as changing this value will
reset some others back to defaults. (!!)
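
A read-only way to see where a box currently stands on the three settings quoted above before changing anything. This is only a sketch; describe_flag is a made-up helper, and files that do not exist on the running kernel are skipped.

```shell
# Translate a 0/1 kernel flag into words.
describe_flag() {
    case "$1" in
        0) echo "off" ;;
        1) echo "on" ;;
        *) echo "unknown ($1)" ;;
    esac
}

for f in /proc/sys/net/ipv4/ip_forward \
         /proc/sys/net/ipv4/conf/default/forwarding \
         /proc/sys/net/ipv4/conf/default/proxy_arp; do
    if [ -r "$f" ]; then
        echo "$f: $(describe_flag "$(cat "$f")")"
    fi
done
```

If forwarding shows "on" and the box is not meant to route, set it to 0 first and then re-apply the remaining sysctls, for the reason given above.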


> net.ipv4.tcp_syncookies = 1

^- not usually worth the overhead?


> net.ipv4.conf.all.rp_filter = 1

^- note that you need to be precise with your routing metrics and such
for multi-homed with rp_filter enabled. also, this costs resources,
and if you can avoid it, do so.


> net.ipv4.conf.default.send_redirects = 1
> net.ipv4.conf.all.send_redirects = 0

^- don't know if these are too useful either. i prefer to limit ICMP
beyond this. (perhaps related to forwarding defaults above.) Ex:
echo "1" > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
echo "1" > /proc/sys/net/ipv4/icmp_echo_ignore_all
echo "0" > /proc/sys/net/ipv4/conf/all/accept_redirects
echo "1" > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
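
Values echoed into /proc do not survive a reboot. To keep them, the same four settings can be expressed as sysctl.conf lines. The sketch below derives the key names from the paths above (slashes become dots) and prints the lines; proc_to_key is a made-up helper.

```shell
# Convert a /proc/sys path to the matching sysctl key (slashes become dots).
proc_to_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr / .
}

# Emit sysctl.conf lines for the four ICMP settings quoted above.
while read -r value path; do
    printf '%s = %s\n' "$(proc_to_key "$path")" "$value"
done <<'EOF'
1 /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
1 /proc/sys/net/ipv4/icmp_echo_ignore_all
0 /proc/sys/net/ipv4/conf/all/accept_redirects
1 /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
EOF
```

Note that icmp_echo_ignore_all = 1 makes the host stop answering ping entirely, which may confuse external monitoring.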





> == Did I leave anything out? ==
>
> Well, did I?

i'd love to see an sca6000 accelerated node.  been working with these
recently but unfortunately they're allocated for other work...
(most of the other crypto hw is going to be bus / implementation
limited to less than what a beefy 64bit modern server can provide, so
of little utility in this context.)

best regards,