Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-03-04 Thread Henning Brauer
* Libertas liber...@mykolab.com [2015-01-02 06:25]:
 I've tuned PF parameters in the past, but it doesn't seem to be the
 issue. My current pfctl and netstat -m outputs suggest that there are
 more than enough available resources and no reported failures.

just a sidenote: it is safe to bump the default state limit, very far
even, on anything semi-modern. the default limit of 10k states is good
for workstations and the like, or tiny embedded-style deployments. I've
gone up to 2M; things get a bit slow if your state table really is
that big, but everything keeps working.
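
A quick way to check where you stand before and after raising the limit
(a minimal sketch; 2M is just the upper end I've tested):

    # show the configured hard limits (default: states 10000)
    pfctl -sm
    # show how many state entries are in use right now
    pfctl -si | grep -i 'current entries'
    # to raise the limit, add e.g. "set limit states 2000000" to
    # /etc/pf.conf, then reload the ruleset:
    pfctl -f /etc/pf.conf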

 I remember someone on tor-...@list.nycbug.org suggesting that it could
 be at least partially due to PF being slower than other OSes' firewalls.

I feel offended :)
Pretty certainly not.

 However, we're now finding that a profusion of gettimeofday() syscalls
 may be the issue. It was independently discovered by the operator of
 IPredator, the highest-bandwidth Tor relay:
 
   https://ipredator.se/guide/torserver#performance
 
 My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
 along with hundreds of clock_gettime() calls.

those aren't all that cheap...

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS. Virtual & Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-03 Thread Stuart Henderson
On 2015-01-01, Miod Vallat m...@online.fr wrote:
  I should have also specified that I didn't just go ahead and enable them
  because I wasn't sure if they're considered safe. I like abiding by
  OpenBSD's crypto best practices when possible.
  
  Is there any reason why they're disabled by default?
 
 Compiler bugs generate incorrect code for 128 bit integers.

 In slightly more words, we have tried enabling this code, and found out
 the hard way that, when compiled by the system compiler under OpenBSD,
 it would generate slightly wrong code, and cause computations to be
 subtly wrong.

 Until someone spends enough time checking the various compiler versions
 around to check which are safe to use, and which are not, this code will
 remain disabled in LibreSSL.

The specific failure we saw was in openssh: "key_parse_private_pem: bad
ECDSA key" when reading a saved id_ecdsa.



Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)

2015-01-03 Thread Greg Troxel
teor teor2...@gmail.com writes:

 Tor 0.2.6.2-alpha (just in the process of being released) has some
 changes to queuing behaviour using the KIST algorithm.

 The KIST algorithm keeps the queues inside tor, and makes
 prioritisation decisions from there, rather than writing as much as
 possible to the OS TCP queues. I'm not sure how functional it is on
 *BSDs, but Nick Mathewson should be able to comment on that. (I've
 cc'd tor-dev and Nick.)

From skimming the KIST paper (I will read it in detail when I find time),
it seems they are claiming an increase in throughput of around 10%, with
the main benefit being lower latency.  So while that sounds great, it
doesn't seem like lack of KIST is the reason for the apparent 3x
slowdown observed in OpenBSD.

Does anyone have experience to report on any platform other than Linux
or OSX?




Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-01 Thread Greg Troxel
Libertas liber...@mykolab.com writes:

 Some of the people at tor-...@lists.nycbug.org and I are trying to
 figure out why Tor relays under-perform when running on OpenBSD. Many
 such relays aren't even close to being network-bound,
 file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
 33-50% less traffic than would be expected of a Linux machine in the
 same situation.

I'm more familiar with NetBSD, but hopefully my comments are helpful.

 For those not familiar, a Tor relay will eventually have an open TCP
 connection for each of the other 6,000 active relays, and (if it allows
 exit traffic) must make outside TCP connections for the user's requests,
 so it's pretty file-hungry and crypto-intensive.

It may also have something to do with TCP.  A few thoughts:

* run netstat -f inet and look at the send queues.  That's not really
  cleanly diagnostic, but if they are all huge, it's a clue

* run netstat -m and vmstat -m (not sure how those map over from NetBSD).
  Look for running out of mbufs and mbuf clusters, and perhaps bump up
  NMBCLUSTERS in the kernel if it's not dynamic.  (See the polling sketch
  after this list.)

* Take a critical look at your TCP performance.  This is not that easy,
  but it's very informative.  Get and install xplot:
    http://www.xplot.org/
  Take traces of v4 tcp traffic with
    tcpdump -w TCP -i wm0 ip and tcp
  and then
    tcpdump -r TCP -tt -n -S | tcpdump2xplot
  Then you'll need to read all the xplot READMEs (see the source).  This
  will show you tcp transmitted segments, sack blocks, the ack line, dup
  acks, and other TCP behavior.  It's not that easy to follow, but if
  you understand TCP you'll be able to spot odd behavior far faster than
  by reading text traces.  It's possible that tcpdump2xplot may mishandle
  OpenBSD's tcpdump output - it's perl to turn text back into bits, and
  it's broken over the years with tcpdump upgrades.

  You may well not want to send me a trace, but if you send me the
  binary pcap, the text version above, or the tcpdump2xplot files, I can
  take a look.
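
As a rough polling sketch for the mbuf check above (the exact wording of
the netstat -m failure counters differs between the BSDs, so adjust the
grep pattern to taste):

    # poll the mbuf/pool statistics once a second and watch for failures
    while :; do
        date
        netstat -m | grep -i -e fail -e denied -e delayed
        sleep 1
    done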

 One possible explanation is that its randomness store gets exhausted. I
 once saw errors like this in my Tor logs, but I don't know how to test
 if it's a chronic problem. I also couldn't find anything online. Is
 there any easy way to test if this is the bottleneck?

On NetBSD, there is rndctl -s.  I would expect the same or similar in
OpenBSD, and you can look every second to see if there are bits still in
the pool.  I don't think this will turn out to be the issue, though, if
you're seeing 30% of what you think you should - I would expect the
performance hit due to running out of bits to be much bigger.
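
Something like the following would do for that (a sketch against NetBSD's
rndctl; the exact output wording, and whatever the OpenBSD equivalent is,
may differ):

    # sample the entropy pool counters once a second
    while :; do
        rndctl -s | grep -i bits
        sleep 1
    done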

Greg




Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-01 Thread Ted Unangst
On Wed, Dec 31, 2014 at 19:42, Libertas wrote:
 Thanks for this!
 
 I should have also specified that I didn't just go ahead and enable them
 because I wasn't sure if they're considered safe. I like abiding by
 OpenBSD's crypto best practices when possible.
 
 Is there any reason why they're disabled by default?

Compiler bugs generate incorrect code for 128 bit integers.



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-01 Thread Miod Vallat
  I should have also specified that I didn't just go ahead and enable them
  because I wasn't sure if they're considered safe. I like abiding by
  OpenBSD's crypto best practices when possible.
  
  Is there any reason why they're disabled by default?
 
 Compiler bugs generate incorrect code for 128 bit integers.

In slightly more words, we have tried enabling this code, and found out
the hard way that, when compiled by the system compiler under OpenBSD,
it would generate slightly wrong code, and cause computations to be
subtly wrong.

Until someone spends enough time checking the various compiler versions
around to check which are safe to use, and which are not, this code will
remain disabled in LibreSSL.

Miod



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-01 Thread Libertas
I've tuned PF parameters in the past, but it doesn't seem to be the
issue. My current pfctl and netstat -m outputs suggest that there are
more than enough available resources and no reported failures.

I remember someone on tor-...@list.nycbug.org suggesting that it could
be at least partially due to PF being slower than other OSes' firewalls.

However, we're now finding that a profusion of gettimeofday() syscalls
may be the issue. It was independently discovered by the operator of
IPredator, the highest-bandwidth Tor relay:

https://ipredator.se/guide/torserver#performance

My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
along with hundreds of clock_gettime() calls.

Because IPredator runs Linux, he used vsyscalls to speed things up.
We'll probably need to find something more creative, like using our time
caching more.
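
A sketch of one way to count the calls on OpenBSD (assuming ktrace/kdump
are available and a single process named tor; flags may vary by release):

    # trace only the system calls made by the running tor process
    ktrace -t c -p $(pgrep -x tor)
    sleep 10
    # stop all tracing, then count the calls in the dump
    ktrace -C
    kdump | grep -c 'CALL.*gettimeofday'   # divide by 10 for calls/sec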

We're working on it with this ticket:

https://trac.torproject.org/projects/tor/ticket/14056

On 01/01/2015 10:45 PM, Richard Johnson wrote:
 It can also be pf-state-hungry. Further, each upstream peer Tor node, and
 each client on a Tor entry node, will probably be a pf src.
 
 Packets being dropped and circuits failing when the pf default limits
 topped out would naturally present to the tor bandwidth authorities as
 network congestion.
 
 In my case, I'm now fairly certain my relays' usage grew to the point where
 they were allocation-bound in pf. The host was still using the pf defaults
 until recently.
 
 Since increasing the pf limits, I'm seeing better throughput. The "current
 entries" figure from pfctl -si now reaches 35k instead of hitting the
 default limit of 10k. Also, state inserts and removals are up to 50/s from
 29/s, and matches are topping 56/s instead of 30/s. As well, the pfctl -si
 "memory could not be allocated" counter remains a reassuring 0 instead of
 increasing at 0.9/s. Additionally, the netstat -m counters for pf* have a
 reassuring 0 in the failure column of the memory resource pool stats.
 Finally, Tor network traffic seems to have started climbing.
 
 I increased the limits thusly, since the host does nothing but Tor and
 unbound for Tor DNS.
 
 | # don't choke on lots of circuits (default is states 10000,
 | # src-nodes 10000, frags 1536)
 | set limit { states 100000, src-nodes 100000, frags 8000, \



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2015-01-01 Thread Richard Johnson

On 2014-12-31 11:21, Libertas wrote:

For those not familiar, a Tor relay will eventually have an open TCP
connection for each of the other 6,000 active relays, and (if it allows
exit traffic) must make outside TCP connections for the user's requests,
so it's pretty file-hungry and crypto-intensive.


It can also be pf-state-hungry. Further, each upstream peer Tor node, and each 
client on a Tor entry node, will probably be a pf src.


Packets being dropped and circuits failing when the pf default limits topped 
out would naturally present to the tor bandwidth authorities as network 
congestion.


In my case, I'm now fairly certain my relays' usage grew to the point where 
they were allocation-bound in pf. The host was still using the pf defaults 
until recently.


Since increasing the pf limits, I'm seeing better throughput. The "current
entries" figure from pfctl -si now reaches 35k instead of hitting the
default limit of 10k. Also, state inserts and removals are up to 50/s from
29/s, and matches are topping 56/s instead of 30/s. As well, the pfctl -si
"memory could not be allocated" counter remains a reassuring 0 instead of
increasing at 0.9/s. Additionally, the netstat -m counters for pf* have a
reassuring 0 in the failure column of the memory resource pool stats.
Finally, Tor network traffic seems to have started climbing.


I increased the limits thusly, since the host does nothing but Tor and unbound 
for Tor DNS.


| # don't choke on lots of circuits (default is states 10000,
| # src-nodes 10000, frags 1536)
| set limit { states 100000, src-nodes 100000, frags 8000, \


One possible explanation is that its randomness store gets exhausted. I
once saw errors like this in my Tor logs, but I don't know how to test
if it's a chronic problem. I also couldn't find anything online. Is
there any easy way to test if this is the bottleneck?


I suspect Tor won't exhaust randomness; random(4) shouldn't block. (From a 
cursory look at the source, Tor references /dev/urandom, and doesn't use 
arc4random.)
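
A trivial sanity check along those lines (just a sketch; block size and
count are arbitrary):

    # /dev/urandom never blocks, so this should complete at a steady rate
    dd if=/dev/urandom of=/dev/null bs=1m count=100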



Richard



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2014-12-31 Thread Libertas
I also completely forgot to mention the below warning, which Tor
0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable amd64:

 We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
 but with a version of OpenSSL that apparently lacks accelerated
 support for the NIST P-224 and P-256 groups. Building openssl with
 such support (using the enable-ec_nistp_64_gcc_128 option when
 configuring it) would make ECDH much faster.

Were the mentioned SSL features removed from LibreSSL, or have they not
yet been introduced? Could this be the culprit?



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2014-12-31 Thread Carlin Bingham
On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:
 I also completely forgot to mention the below warning, which Tor
 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
 amd64:
 
  We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
  but with a version of OpenSSL that apparently lacks accelerated
  support for the NIST P-224 and P-256 groups. Building openssl with
  such support (using the enable-ec_nistp_64_gcc_128 option when
  configuring it) would make ECDH much faster.
 
 Were the mentioned SSL features removed from LibreSSL, or have they not
 yet been introduced? Could this be the culprit?
 

It appears the code is still there, it just isn't enabled by default. Some
searching suggests that OpenSSL doesn't enable it by default either as
the config script can't automatically work out if the platform supports
it.

As a test I edited /usr/include/openssl/opensslfeatures.h to remove the
OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.
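
Roughly these steps, for anyone who wants to repeat the test (a sketch;
the libcrypto source path is a guess for a 5.6-era base tree, adjust to
suit):

    cd /usr/include/openssl
    cp opensslfeatures.h opensslfeatures.h.orig
    # drop the line that disables the accelerated NIST curve code
    grep -v OPENSSL_NO_EC_NISTP_64_GCC_128 opensslfeatures.h.orig > opensslfeatures.h
    # rebuild and reinstall libcrypto from the base source tree
    cd /usr/src/lib/libssl && make obj && make depend && make && make install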


running `openssl speed ecdhp224 ecdhp256`

without acceleration:

                              op      op/s
 224 bit ecdh (nistp224)   0.0003s   3113.0
 256 bit ecdh (nistp256)   0.0004s   2779.1


with acceleration:

                              op      op/s
 224 bit ecdh (nistp224)   0.0001s  10556.8
 256 bit ecdh (nistp256)   0.0002s   4232.4


--
Carlin



Re: [Tor-BSD] Recognizing Randomness Exhaustion

2014-12-31 Thread Libertas
Thanks for this!

I should have also specified that I didn't just go ahead and enable them
because I wasn't sure if they're considered safe. I like abiding by
OpenBSD's crypto best practices when possible.

Is there any reason why they're disabled by default?

On another note, I was skeptical about this being the cause because even
OpenBSD Tor relays using only <=12% of their CPU capacity have the
characteristic underperformance. Unless there's a latency issue caused
by this, I feel like it's probably something else.

On another note, I'm looking into system call statistics and other ways
to find the problem here. I'm very new to this, so suggestions on tools
and techniques are appreciated.

On 12/31/2014 06:47 PM, Carlin Bingham wrote:
 On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:
 I also completely forgot to mention the below warning, which Tor
 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
 amd64:

 We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
 but with a version of OpenSSL that apparently lacks accelerated
 support for the NIST P-224 and P-256 groups. Building openssl with
 such support (using the enable-ec_nistp_64_gcc_128 option when
 configuring it) would make ECDH much faster.

 Were the mentioned SSL features removed from LibreSSL, or have they not
 yet been introduced? Could this be the culprit?

 
 It appears the code is still there, it just isn't enabled by default. Some
 searching suggests that OpenSSL doesn't enable it by default either as
 the config script can't automatically work out if the platform supports
 it.
 
 As a test I edited /usr/include/openssl/opensslfeatures.h to remove the
 OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.
 
 
 running `openssl speed ecdhp224 ecdhp256`
 
 without acceleration:
 
                               op      op/s
  224 bit ecdh (nistp224)   0.0003s   3113.0
  256 bit ecdh (nistp256)   0.0004s   2779.1
 
 
 with acceleration:
 
                               op      op/s
  224 bit ecdh (nistp224)   0.0001s  10556.8
  256 bit ecdh (nistp256)   0.0002s   4232.4
 
 
 --
 Carlin



Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)

2014-12-31 Thread teor
On 1 Jan 2015, at 07:39, Greg Troxel g...@lexort.com wrote:

 Libertas liber...@mykolab.com writes:

 Some of the people at tor-...@lists.nycbug.org and I are trying to
 figure out why Tor relays under-perform when running on OpenBSD. Many
 such relays aren't even close to being network-bound,
 file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
 33-50% less traffic than would be expected of a Linux machine in the
 same situation.

 I'm more familiar with NetBSD, but hopefully my comments are helpful.

 For those not familiar, a Tor relay will eventually have an open TCP
 connection for each of the other 6,000 active relays, and (if it allows
 exit traffic) must make outside TCP connections for the user's requests,
 so it's pretty file-hungry and crypto-intensive.

 It may also have something to do with TCP.  A few thoughts:

 * run netstat -f inet and look at the send queues.  That's not really
  cleanly diagnostic, but if they are all huge, it's a clue

 * run netstat -m and vmstat -m (not sure how those map over from NetBSD).
  Look for running out of mbufs and mbuf clusters, and perhaps bump up
  NMBCLUSTERS in the kernel if it's not dynamic.

Tor 0.2.6.2-alpha (just in the process of being released) has some changes to
queuing behaviour using the KIST algorithm.

The KIST algorithm keeps the queues inside tor, and makes prioritisation
decisions from there, rather than writing as much as possible to the OS TCP
queues. I'm not sure how functional it is on *BSDs, but Nick Mathewson should
be able to comment on that. (I've cc'd tor-dev and Nick.)


teor
pgp 0xABFED1AC
hkp://pgp.mit.edu/
https://gist.github.com/teor2345/d033b8ce0a99adbc89c5
http://0bin.net/paste/Mu92kPyphK0bqmbA#Zvt3gzMrSCAwDN6GKsUk7Q8G-eG+Y+BLpe7wtmU66Mx
