Re: Patch: cprng_fast performance - please review.

2014-04-23 Thread Mindaugas Rasiukevicius
Thor Lancelot Simon t...@panix.com wrote: On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: Attached is a patch which makes cprng_fast per-CPU and lockless. *IT IS NOT WELL TESTED YET (I haven't even run test vectors) AND IS ONLY FOR REVIEW.* New diff, with some

Re: Patch: cprng_fast performance - please review.

2014-04-23 Thread David Laight
On Fri, Apr 18, 2014 at 02:41:07PM -0400, Thor Lancelot Simon wrote: Of the few systems which do have instructions that accellerate AES, on the most common implementation -- x86 -- we cannot use the instructions in question in the kernel because they use CPU state we do not save/ restore

cprng_fast and interrupts [was Re: Patch: cprng_fast performance - please review.]

2014-04-22 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 21:50:46 -0400 From: Thor Lancelot Simon t...@panix.com Are there actually any callers of cprng_fast at IPL_HIGH? Are there actually any legitimate random decisions to be made at IPL_HIGH? I'm sceptical. What do you get if you cross an elephant and a

Re: Patch: cprng_fast performance - please review.

2014-04-19 Thread darkstar
I would still suggest Salsa20 or ChaCha. My measurements with naive C code suggest that, if you buffer the output for short outputs, these take on average 40-50 Ivy Bridge cycles per request. (If you don't buffer the output, it's 300 cycles.) Long requests get ~4 cpb. In contrast, libc

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
tls@ wrote: @@ -160,6 +160,7 @@ include crypto/cast128/files.cast128 --- /dev/null 1 Jan 1970 00:00:00 - +++ crypto/hc128/hc128.c 17 Apr 2014 03:17:18 - : +static inline uint32_t +pack_littleendian(const uint8_t *v) +{ +#ifdef LITTLE_ENDIAN + return *((const

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
campbell+netbsd-tech-kern@ wrote: +void +hc128_init(hc128_state_t *state, const uint8_t *key, const uint8_t *iv) +{ + unsigned int i; + uint32_t w[1280], *p = state-p, *q = state-q; 5 KB on the stack is a lot! Granted, this is a leaf routine which in our case will be called

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:27:45PM +0900, Izumi Tsutsui wrote: Note the caller of this hc128_init() is: I'm afraid 9KB stack on rekeying is fatal on most ports. Well, the cipher should hardly ever get rekeyed. The rekeying intervals could be considerably larger; I did not want to increase

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Closer inspection of HC-128 reveals that it uses secret-dependent array indices[*], so for that reason alone I don't think we should adopt it. It also has a very large state, which is going to hurt the cache on big systems and hurt the stack on little systems. So, could you please split your

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:20:24PM +0900, Izumi Tsutsui wrote: LITTLE_ENDIAN != x86 This should simply be le32dec(9) otherwise it will cause unaligned trap on arm and mips etc. I believe the input is -- though declared as uint8_t -- required to always be alinged (see the comment on the

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 04:26:19PM +, Taylor R Campbell wrote: Closer inspection of HC-128 reveals that it uses secret-dependent array indices[*], so for that reason alone I don't think we should adopt it. It also has a very large state, which is going to hurt the cache on big systems

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 10:27:45PM +0900, Izumi Tsutsui wrote: Note the caller of this hc128_init() is: +static void +cprng_fast_randrekey(cprng_fast_ctx_t *ctx) +{ + uint8_t key[16], iv[16]; + hc128_state_t tempstate; + int s; + + int have_initial =

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 12:38:38 -0400 From: Thor Lancelot Simon t...@panix.com 3) If the algorithm's use of state-dependent array indices presents a real weakness in practice, why aren't there any published results on this and why was it chosen as, and

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
Hi, If you want to get rid of RC4, use AES in CTR mode. It is standard, compact, clean, and really fast solution. May sound boring, but gives me a feel of solid security engineering. Note that majority of systems now have the AES-NI instructions which speed up AES implementations by an order of

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Taylor R Campbell
Date: Fri, 18 Apr 2014 19:58:06 +0200 From: Markku-Juhani Olavi Saarinen m...@iki.fi If you want to get rid of RC4, use AES in CTR mode. It is standard, compact, clean, and really fast solution. May sound boring, but gives me a feel of solid security engineering. We use that for

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
On Fri, Apr 18, 2014 at 8:11 PM, Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote: Date: Fri, 18 Apr 2014 19:58:06 +0200 From: Markku-Juhani Olavi Saarinen m...@iki.fi If you want to get rid of RC4, use AES in CTR mode. It is standard, compact, clean, and really fast

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thomas Klausner
On Fri, Apr 18, 2014 at 01:39:18PM -0400, Thor Lancelot Simon wrote: How do you count to 9K? I see: 2K for p 2K for q 1280 bytes for w Are you talking about this w? + uint32_t w[1280], *p = state-p, *q = state-q; This looks like 1280x4 bytes to me. Thomas

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 06:11:39PM +, Taylor R Campbell wrote: The majority of systems certainly don't have AES-NI. Only some recent Intel CPUs do, and we can't use it in the kernel anyway. Right: plenty of systems accellerate AES, but in the wide world of systems that are not all x86

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 08:33:20PM +0200, Thomas Klausner wrote: On Fri, Apr 18, 2014 at 01:39:18PM -0400, Thor Lancelot Simon wrote: How do you count to 9K? I see: 2K for p 2K for q 1280 bytes for w Are you talking about this w? + uint32_t w[1280], *p = state-p,

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Roland C. Dowdeswell
On Fri, Apr 18, 2014 at 08:23:11PM +0200, Markku-Juhani Olavi Saarinen wrote: Agreed. AES is worse if you don't have AES-NI. It has been there on all new systems purchased in some last 3 years, so I would *guess* that it would be 50% of systems fielded out there. It hasn't been there on

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 09:54:09PM +0100, Roland C. Dowdeswell wrote: On Fri, Apr 18, 2014 at 08:23:11PM +0200, Markku-Juhani Olavi Saarinen wrote: Agreed. AES is worse if you don't have AES-NI. It has been there on all new systems purchased in some last 3 years, so I would *guess*

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Markku-Juhani Olavi Saarinen
Hi, Just one last thought: it will be on all *future* systems ? Cheers, - markku Dr. Markku-Juhani O. Saarinen m...@iki.fi US +1 (424) 666 2713 On Fri, Apr 18, 2014 at 11:00 PM, Thor Lancelot Simon t...@panix.com wrote: On Fri, Apr 18, 2014 at 09:54:09PM +0100, Roland C. Dowdeswell wrote:

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 05:00:50PM -0400, Thor Lancelot Simon wrote: Unfortunately, the virtual machines on this laptop that I use for most NetBSD development don't expose the AES-NI instructions to guests, even when doing hardware assisted virtualization. Not RDRAND neither, for So, since

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
On Fri, Apr 18, 2014 at 11:04:04PM +0200, Markku-Juhani Olavi Saarinen wrote: Hi, Just one last thought: it will be on all *future* systems ? No, it won't, unless you have some funny definition of system that excludes anything that's not a high-end x86 implementation from one particular

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Matt Thomas
On Apr 18, 2014, at 11:23 AM, Markku-Juhani Olavi Saarinen m...@iki.fi wrote: It has been there on all new systems purchased in some last 3 years, so I would *guess* that it would be 50% of systems fielded out there. Not everything is x86 based.

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Izumi Tsutsui
tls@ wrote: Note the caller of this hc128_init() is: I'm afraid 9KB stack on rekeying is fatal on most ports. Well, the cipher should hardly ever get rekeyed. The rekeying intervals could be considerably larger; I did not want to increase them too much compared to the old

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Joerg Sonnenberger
On Fri, Apr 18, 2014 at 05:05:37PM -0400, Thor Lancelot Simon wrote: On Fri, Apr 18, 2014 at 05:00:50PM -0400, Thor Lancelot Simon wrote: Unfortunately, the virtual machines on this laptop that I use for most NetBSD development don't expose the AES-NI instructions to guests, even when

Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Thor Lancelot Simon
Miscellaneous notes -- I'm doing the benchmarking we discussed and pondering whether it is right to switch from hc-128 to salsa20, separately. On Thu, Apr 17, 2014 at 09:33:28PM +, Taylor R Campbell wrote: +void +hc128_init(hc128_state_t *state, const uint8_t *key, const uint8_t *iv)

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Thor Lancelot Simon
On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: Attached is a patch which makes cprng_fast per-CPU and lockless. *IT IS NOT WELL TESTED YET (I haven't even run test vectors) AND IS ONLY FOR REVIEW.* New diff, with some missing files and incorporating some more comments

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Taylor R Campbell
Repeating what I said privately for the public record: The changes to cprng.h expose a lot of guts of the new cprng_fast implementation. I don't see justification for this. As I understand it, the goals of rewriting cprng_fast are: 1. Replace the unsafe arc4random by a safer algorithm. 2.

Re: Patch: cprng_fast performance - please review.

2014-04-17 Thread Thor Lancelot Simon
Thank you for looking at this! On Thu, Apr 17, 2014 at 09:33:28PM +, Taylor R Campbell wrote: And the only performance constraint is that its single-threaded performance should not be worse than the existing arc4random-based cprng_fast. This, I don't agree with, unless we're going to

Patch: cprng_fast performance - please review.

2014-04-16 Thread Thor Lancelot Simon
On Wed, Apr 09, 2014 at 05:16:55PM +0100, Mindaugas Rasiukevicius wrote: Perhaps I missed it, but were the benchmark results published? Also, how about cprng_fast32/64()? If the current mechanism is indeed faster, then I am glad that you have made an improvement. To quote

Re: Patch: cprng_fast performance - please review.

2014-04-16 Thread Thor Lancelot Simon
On Wed, Apr 16, 2014 at 09:52:22PM -0400, Thor Lancelot Simon wrote: Attached is a patch which makes cprng_fast per-CPU and lockless. *IT IS NOT WELL TESTED YET (I haven't even run test vectors) AND IS ONLY FOR REVIEW.* The diff is against the head of the tls-earlyentropy branch. Taylor