Re: [Tor-BSD] Recognizing Randomness Exhaustion
* Libertas liber...@mykolab.com [2015-01-02 06:25]:
> I've tuned PF parameters in the past, but it doesn't seem to be the
> issue. My current pfctl and netstat -m outputs suggest that there are
> more than enough available resources and no reported failures.

Just a sidenote: it is safe to bump the default state limit, very far
even, on anything semi-modern. The default limit of 10k states is good
for workstations and the like, or tiny embedded-style deployments. I've
gone up to 2M; things get a bit slow if your state table really is that
big, but everything keeps working.

> I remember someone on tor-...@list.nycbug.org suggesting that it could
> be at least partially due to PF being slower than other OS's firewalls.

I feel offended :)

Pretty certainly not.

> However, we're now finding that a profusion of gettimeofday() syscalls
> may be the issue. It was independently discovered by the operator of
> IPredator, the highest-bandwidth Tor relay:
> https://ipredator.se/guide/torserver#performance
> My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
> along with hundreds of clock_gettime() calls.

Those aren't all that cheap...

--
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS. Virtual Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/
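Syscall counts like the ones quoted above can be measured on OpenBSD with ktrace(1)/kdump(1). The sketch below counts time-related syscalls in kdump-style output; the embedded sample lines are invented for illustration (in live use you would run `ktrace -p <tor-pid> -t c`, wait a while, run `ktrace -C`, and pipe real `kdump` output into the awk script).

```shell
# Count gettimeofday()/clock_gettime() calls in a kdump-style trace.
# The sample trace below is fabricated; pipe real `kdump` output in practice.
sample=' 4242 tor      CALL  gettimeofday(0x7f7ffffce6a0,0)
 4242 tor      RET   gettimeofday 0
 4242 tor      CALL  clock_gettime(3,0x7f7ffffce6b0)
 4242 tor      CALL  gettimeofday(0x7f7ffffce6a0,0)'
counts=$(printf '%s\n' "$sample" | awk '
  $3 == "CALL" && $4 ~ /^gettimeofday/  { g++ }
  $3 == "CALL" && $4 ~ /^clock_gettime/ { c++ }
  END { printf "%d %d", g+0, c+0 }')
echo "gettimeofday: ${counts% *}  clock_gettime: ${counts#* }"
```

Dividing the counts by the trace duration gives the calls-per-second figure the thread is discussing.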
Re: [Tor-BSD] Recognizing Randomness Exhaustion
On 2015-01-01, Miod Vallat m...@online.fr wrote:
>> I should have also specified that I didn't just go ahead and enable
>> them because I wasn't sure if they're considered safe. I like abiding
>> by OpenBSD's crypto best practices when possible. Is there any reason
>> why they're disabled by default?
>
> Compiler bugs generate incorrect code for 128 bit integers.
>
> In slightly more words: we have tried enabling this code, and found
> out the hard way that, when compiled by the system compiler under
> OpenBSD, it would generate slightly wrong code and cause computations
> to be subtly wrong. Until someone spends enough time checking the
> various compiler versions around to see which are safe to use and
> which are not, this code will remain disabled in LibreSSL.

The specific failure we saw was in openssh: key_parse_private_pem: bad
ECDSA key when reading a saved id_ecdsa.
Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)
teor teor2...@gmail.com writes:
> Tor 0.2.6.2-alpha (just in the process of being released) has some
> changes to queuing behaviour using the KIST algorithm. The KIST
> algorithm keeps the queues inside tor, and makes prioritisation
> decisions from there, rather than writing as much as possible to the
> OS TCP queues. I'm not sure how functional it is on *BSDs, but Nick
> Mathewson should be able to comment on that. (I've cc'd tor-dev and
> Nick.)

From skimming the KIST paper (I will read it in detail when I find
time), it seems they are claiming an increase in throughput of around
10%, with the main benefit being lower latency. So while that sounds
great, it doesn't seem like lack of KIST is the reason for the apparent
3x slowdown observed on OpenBSD.

Does anyone have experience to report on any platform other than Linux
or OS X?
Re: [Tor-BSD] Recognizing Randomness Exhaustion
Libertas liber...@mykolab.com writes:
> Some of the people at tor-...@lists.nycbug.org and I are trying to
> figure out why Tor relays under-perform when running on OpenBSD. Many
> such relays aren't even close to being network-bound,
> file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
> 33-50% less traffic than would be expected of a Linux machine in the
> same situation.

I'm more familiar with NetBSD, but hopefully my comments are helpful.

> For those not familiar, a Tor relay will eventually have an open TCP
> connection for each of the other 6,000 active relays, and (if it
> allows exit traffic) must make outside TCP connections for the user's
> requests, so it's pretty file-hungry and crypto-intensive.

It may also have something to do with TCP. A few thoughts:

* Run netstat -f inet and look at the send queues. That's not really
  cleanly diagnostic, but if they are all huge, it's a clue.

* Run netstat -m and vmstat -m (not sure how those map from NetBSD).
  Look for running out of mbufs and mbuf clusters. Perhaps bump up
  NMBCLUSTERS in the kernel if it's not dynamic.

* Take a critical look at your TCP performance. This is not that easy,
  but it's very informative. Get and install xplot: http://www.xplot.org/
  Take traces of v4 TCP traffic with

    tcpdump -wTCP -i wm0 ip and tcp

  and then

    tcpdump -r TCP -tt -n -S | tcpdump2xplot

  Then you'll need to read all the xplot READMEs (see the source). This
  will show you TCP transmitted segments, SACK blocks, the ack line,
  dup acks, and other TCP behavior. It's not that easy to follow, but
  if you understand TCP you'll be able to spot odd behavior far faster
  than by reading text traces.

  It's possible that tcpdump2xplot may mishandle OpenBSD's tcpdump
  output - it's perl to turn text back into bits, and it's broken over
  the years with tcpdump upgrades. You may well not want to send me a
  trace, but if you send me the binary pcap, the text version above, or
  the tcpdump2xplot files, I can take a look.
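The first suggestion above can be mechanized. This sketch flags sockets with large send queues; the netstat sample lines and the 16 KB threshold are assumptions chosen for illustration (in live use, pipe real `netstat -f inet` output into the awk filter).

```shell
# Flag connections whose Send-Q exceeds an (arbitrary) 16 KB threshold.
# The sample mimics `netstat -f inet` output; use the real command in practice.
sample='Proto Recv-Q Send-Q  Local Address    Foreign Address    (state)
tcp         0  65535  10.0.0.1.9001    203.0.113.5.443    ESTABLISHED
tcp         0     12  10.0.0.1.9001    203.0.113.9.443    ESTABLISHED'
big=$(printf '%s\n' "$sample" | awk 'NR > 1 && $3 > 16384 { print $4 }')
echo "large send queues on: $big"
```

If nearly every row trips the threshold, that is the "all huge" clue Greg describes.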
> One possible explanation is that its randomness store gets exhausted.
> I once saw errors like this in my Tor logs, but I don't know how to
> test if it's a chronic problem. I also couldn't find anything online.
> Is there any easy way to test if this is the bottleneck?

On NetBSD, there is rndctl -s. I would expect the same or similar in
OpenBSD, and you can look every second to see if there are bits still
in the pool.

I don't think this will turn out to be the issue, though. If you're
seeing 30% of what you think you should, I would expect the performance
hit due to running out of bits to be much bigger.

Greg
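Greg's rndctl suggestion as a sketch. The status-line format here is an assumption modeled on NetBSD's `rndctl -s` output, and the sample is embedded so the block is self-contained; live use would be roughly `while sleep 1; do rndctl -s | grep bits; done`.

```shell
# Extract the bit count from an rndctl-style status line (format assumed).
sample='      4096 bits currently stored in pool (max 4096)'
bits=$(printf '%s\n' "$sample" | awk '/currently stored/ { print $1 }')
echo "entropy pool: ${bits} bits"
```

Watching that number once a second shows whether the pool ever drains to zero under load.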
Re: [Tor-BSD] Recognizing Randomness Exhaustion
On Wed, Dec 31, 2014 at 19:42, Libertas wrote:
> Thanks for this! I should have also specified that I didn't just go
> ahead and enable them because I wasn't sure if they're considered
> safe. I like abiding by OpenBSD's crypto best practices when possible.
> Is there any reason why they're disabled by default?

Compiler bugs generate incorrect code for 128 bit integers.
Re: [Tor-BSD] Recognizing Randomness Exhaustion
> I should have also specified that I didn't just go ahead and enable
> them because I wasn't sure if they're considered safe. I like abiding
> by OpenBSD's crypto best practices when possible. Is there any reason
> why they're disabled by default?

Compiler bugs generate incorrect code for 128 bit integers.

In slightly more words: we have tried enabling this code, and found out
the hard way that, when compiled by the system compiler under OpenBSD,
it would generate slightly wrong code and cause computations to be
subtly wrong. Until someone spends enough time checking the various
compiler versions around to see which are safe to use and which are
not, this code will remain disabled in LibreSSL.

Miod
Re: [Tor-BSD] Recognizing Randomness Exhaustion
I've tuned PF parameters in the past, but it doesn't seem to be the
issue. My current pfctl and netstat -m outputs suggest that there are
more than enough available resources and no reported failures.

I remember someone on tor-...@list.nycbug.org suggesting that it could
be at least partially due to PF being slower than other OS's firewalls.

However, we're now finding that a profusion of gettimeofday() syscalls
may be the issue. It was independently discovered by the operator of
IPredator, the highest-bandwidth Tor relay:

https://ipredator.se/guide/torserver#performance

My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
along with hundreds of clock_gettime() calls. Because IPredator runs
Linux, its operator used vsyscalls to speed things up. We'll probably
need to find something more creative, like using our time caching more.
We're working on it in this ticket:

https://trac.torproject.org/projects/tor/ticket/14056

On 01/01/2015 10:45 PM, Richard Johnson wrote:
> It can also be pf-state-hungry. Further, each upstream peer Tor node,
> and each client on a Tor entry node, will probably be a pf src.
> Packets being dropped and circuits failing when the pf default limits
> topped out would naturally present to the Tor bandwidth authorities
> as network congestion.
>
> In my case, I'm now fairly certain my relay's usage grew to the point
> where it was allocation-bound in pf. The host was still using the pf
> defaults until recently. Since increasing the pf limits, I'm seeing
> better throughput. The current entries from pfctl -si now reach 35k
> instead of hitting the default limit of 10k. Also, state inserts and
> removals are up to 50/s from 29/s, and matches are topping 56/s
> instead of 30/s. As well, the pfctl -si "memory could not be
> allocated" counter remains a reassuring 0 instead of increasing at
> 0.9/s. Additionally, the netstat -m counters for pf* have a
> reassuring 0 in the failure column of the memory resource pool stats.
> Finally, Tor network traffic seems to have started climbing.
>
> I increased the limits thusly, since the host does nothing but Tor
> and unbound for Tor DNS:
>
> | # don't choke on lots of circuits (default is states 10000,
> | # src-nodes 10000, frags 1536)
> | set limit { states 10, src-nodes 10, frags 8000, \
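The quoted `set limit` line was truncated in transit. A complete pf.conf fragment of the same shape would look like the following; the numeric values are illustrative placeholders, not the original poster's (which did not survive), chosen only to be comfortably above the 35k states he reports.

```
# /etc/pf.conf -- illustrative values only; the original mail's exact
# numbers were truncated, so treat these as placeholders
set limit { states 100000, src-nodes 100000, frags 8000 }
```

After editing, the new limits take effect on `pfctl -f /etc/pf.conf` and can be confirmed with `pfctl -sm`.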
Re: [Tor-BSD] Recognizing Randomness Exhaustion
On 2014-12-31 11:21, Libertas wrote:
> For those not familiar, a Tor relay will eventually have an open TCP
> connection for each of the other 6,000 active relays, and (if it
> allows exit traffic) must make outside TCP connections for the user's
> requests, so it's pretty file-hungry and crypto-intensive.

It can also be pf-state-hungry. Further, each upstream peer Tor node,
and each client on a Tor entry node, will probably be a pf src. Packets
being dropped and circuits failing when the pf default limits topped
out would naturally present to the Tor bandwidth authorities as network
congestion.

In my case, I'm now fairly certain my relay's usage grew to the point
where it was allocation-bound in pf. The host was still using the pf
defaults until recently. Since increasing the pf limits, I'm seeing
better throughput. The current entries from pfctl -si now reach 35k
instead of hitting the default limit of 10k. Also, state inserts and
removals are up to 50/s from 29/s, and matches are topping 56/s instead
of 30/s. As well, the pfctl -si "memory could not be allocated" counter
remains a reassuring 0 instead of increasing at 0.9/s. Additionally,
the netstat -m counters for pf* have a reassuring 0 in the failure
column of the memory resource pool stats. Finally, Tor network traffic
seems to have started climbing.

I increased the limits thusly, since the host does nothing but Tor and
unbound for Tor DNS:

| # don't choke on lots of circuits (default is states 10000,
| # src-nodes 10000, frags 1536)
| set limit { states 10, src-nodes 10, frags 8000, \

> One possible explanation is that its randomness store gets exhausted.
> I once saw errors like this in my Tor logs, but I don't know how to
> test if it's a chronic problem. I also couldn't find anything online.
> Is there any easy way to test if this is the bottleneck?

I suspect Tor won't exhaust randomness; random(4) shouldn't block.
(From a cursory look at the source, Tor references /dev/urandom, and
doesn't use arc4random.)

Richard
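The pfctl counters Richard cites can be checked mechanically. This sketch computes state-table utilization; the sample lines and field positions are assumptions modeled on pfctl's output format (in live use, substitute real `pfctl -si` and `pfctl -sm` output for the embedded samples).

```shell
# Compute pf state-table utilization from pfctl-style output.
# Samples embedded for illustration: 35k entries against a 2M hard limit.
si_sample='  current entries                      35000'
sm_sample='states        hard limit      2000000'
cur=$(printf '%s\n' "$si_sample" | awk '/current entries/ { print $3 }')
lim=$(printf '%s\n' "$sm_sample" | awk '/hard limit/ { print $4 }')
pct=$((cur * 100 / lim))
echo "pf states: ${cur} of ${lim} (${pct}%)"
```

A utilization that keeps pinning at 100% of the hard limit is exactly the allocation-bound condition described above.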
Re: [Tor-BSD] Recognizing Randomness Exhaustion
I also completely forgot to mention the below warning, which Tor
0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
amd64:

  We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
  but with a version of OpenSSL that apparently lacks accelerated
  support for the NIST P-224 and P-256 groups. Building openssl with
  such support (using the enable-ec_nistp_64_gcc_128 option when
  configuring it) would make ECDH much faster.

Were the mentioned SSL features removed from LibreSSL, or have they not
yet been introduced? Could this be the culprit?
Re: [Tor-BSD] Recognizing Randomness Exhaustion
On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:
> I also completely forgot to mention the below warning, which Tor
> 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
> amd64:
>
>   We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
>   but with a version of OpenSSL that apparently lacks accelerated
>   support for the NIST P-224 and P-256 groups. Building openssl with
>   such support (using the enable-ec_nistp_64_gcc_128 option when
>   configuring it) would make ECDH much faster.
>
> Were the mentioned SSL features removed from LibreSSL, or have they
> not yet been introduced? Could this be the culprit?

It appears the code is still there, it just isn't enabled by default.
Some searching suggests that OpenSSL doesn't enable it by default
either, as the config script can't automatically work out if the
platform supports it.

As a test I edited /usr/include/openssl/opensslfeatures.h to remove the
OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.

Running `openssl speed ecdhp224 ecdhp256` without acceleration:

                              op     op/s
  224 bit ecdh (nistp224)  0.0003s   3113.0
  256 bit ecdh (nistp256)  0.0004s   2779.1

With acceleration:

  224 bit ecdh (nistp224)  0.0001s  10556.8
  256 bit ecdh (nistp256)  0.0002s   4232.4

--
Carlin
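For scale, Carlin's numbers work out to roughly a 3.4x speedup for P-224 and 1.5x for P-256. A quick check of that arithmetic from the quoted op/s figures:

```shell
# Ratio of accelerated to unaccelerated op/s from the quoted speed runs.
speedup=$(awk 'BEGIN { printf "%.1fx %.1fx", 10556.8/3113.0, 4232.4/2779.1 }')
echo "nistp224: ${speedup% *}  nistp256: ${speedup#* }"
```

So the warning Tor prints is not idle: the accelerated P-224 path is more than three times faster on this hardware.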
Re: [Tor-BSD] Recognizing Randomness Exhaustion
Thanks for this! I should have also specified that I didn't just go
ahead and enable them because I wasn't sure if they're considered safe.
I like abiding by OpenBSD's crypto best practices when possible. Is
there any reason why they're disabled by default?

On another note, I was skeptical about this being the cause because
even OpenBSD Tor relays using only ≤12% of their CPU capacity show the
characteristic underperformance. Unless there's a latency issue caused
by this, I feel like it's probably something else.

I'm also looking into system call statistics and other ways to find the
problem here. I'm very new to this, so suggestions on tools and
techniques are appreciated.

On 12/31/2014 06:47 PM, Carlin Bingham wrote:
> On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:
>> I also completely forgot to mention the below warning, which Tor
>> 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
>> amd64:
>>
>>   We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
>>   but with a version of OpenSSL that apparently lacks accelerated
>>   support for the NIST P-224 and P-256 groups. Building openssl with
>>   such support (using the enable-ec_nistp_64_gcc_128 option when
>>   configuring it) would make ECDH much faster.
>>
>> Were the mentioned SSL features removed from LibreSSL, or have they
>> not yet been introduced? Could this be the culprit?
>
> It appears the code is still there, it just isn't enabled by default.
> Some searching suggests that OpenSSL doesn't enable it by default
> either, as the config script can't automatically work out if the
> platform supports it.
>
> As a test I edited /usr/include/openssl/opensslfeatures.h to remove
> the OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.
>
> Running `openssl speed ecdhp224 ecdhp256` without acceleration:
>
>                               op     op/s
>   224 bit ecdh (nistp224)  0.0003s   3113.0
>   256 bit ecdh (nistp256)  0.0004s   2779.1
>
> With acceleration:
>
>   224 bit ecdh (nistp224)  0.0001s  10556.8
>   256 bit ecdh (nistp256)  0.0002s   4232.4
>
> --
> Carlin
Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)
On 1 Jan 2015, at 07:39, Greg Troxel g...@lexort.com wrote:
> Libertas liber...@mykolab.com writes:
>> Some of the people at tor-...@lists.nycbug.org and I are trying to
>> figure out why Tor relays under-perform when running on OpenBSD.
>> Many such relays aren't even close to being network-bound,
>> file-descriptor-bound, memory-bound, or CPU-bound, but relay at
>> least 33-50% less traffic than would be expected of a Linux machine
>> in the same situation.
>
> I'm more familiar with NetBSD, but hopefully my comments are helpful.
>
>> For those not familiar, a Tor relay will eventually have an open TCP
>> connection for each of the other 6,000 active relays, and (if it
>> allows exit traffic) must make outside TCP connections for the
>> user's requests, so it's pretty file-hungry and crypto-intensive.
>
> It may also have something to do with TCP. A few thoughts:
>
> * Run netstat -f inet and look at the send queues. That's not really
>   cleanly diagnostic, but if they are all huge, it's a clue.
>
> * Run netstat -m and vmstat -m (not sure how those map from NetBSD).
>   Look for running out of mbufs and mbuf clusters. Perhaps bump up
>   NMBCLUSTERS in the kernel if it's not dynamic.

Tor 0.2.6.2-alpha (just in the process of being released) has some
changes to queuing behaviour using the KIST algorithm.

The KIST algorithm keeps the queues inside tor, and makes
prioritisation decisions from there, rather than writing as much as
possible to the OS TCP queues.

I'm not sure how functional it is on *BSDs, but Nick Mathewson should
be able to comment on that. (I've cc'd tor-dev and Nick.)

teor

pgp 0xABFED1AC hkp://pgp.mit.edu/
https://gist.github.com/teor2345/d033b8ce0a99adbc89c5
http://0bin.net/paste/Mu92kPyphK0bqmbA#Zvt3gzMrSCAwDN6GKsUk7Q8G-eG+Y+BLpe7wtmU66Mx