Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-28 Thread George Spelvin
Hannes Frederic Sowa wrote:
> We call extract_crng when we run out of batched entropy and reseed. How
> often we call down to extract_crng depends on how much entropy we
> extracted by calls to get_random_int/long, so the number of calls into
> those functions matter.
> 
> In extract_crng we have a timer which reseeds every 300s the CPRNG and
> either uses completely new entropy from the CRNG or calls down into the
> CPRNG while also doing backtracing protection (which feeds chacha's
> block size / 2 back into chacha, if I read the code correctly, thus
> 1024 bits, which should be enough).

In the current code, _extract_crng checks to see if more than 300 s
have elapsed since last time it was reseeded, and if so, reseeds with
fresh entropy.

In addition, on every read (or get_random_bytes), if the request leaves
enough random bits in the last ChaCha block, it feeds back 256 bits
(the ChaCha block size is 16*32 = 512 bits) for anti-backtracking.

If the last read happened not to leave enough bits for that feedback
(size % 512 > 256), *and* there are no calls for RNG output for a long
time, there is no upper limit to how long the old ChaCha key can hang around.
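
To make that concrete, here is a toy user-space model of the feedback
rule described above (my sketch, not the kernel code; names are
illustrative):

#include <string.h>

#define CHACHA_BLOCK_BYTES 64	/* 16 * 32 bits */
#define CHACHA_KEY_BYTES   32	/* the 256 bits fed back */

static unsigned char chacha_key[CHACHA_KEY_BYTES];

/* "block" is the most recent output block; "used" is how much of it
 * the read consumed. */
static void backtrack_protect(const unsigned char block[CHACHA_BLOCK_BYTES],
			      size_t used)
{
	/* Enough unread output left over?  Use it to replace the key. */
	if (CHACHA_BLOCK_BYTES - used >= CHACHA_KEY_BYTES)
		memcpy(chacha_key, block + used, CHACHA_KEY_BYTES);
	/* Otherwise the old key survives until a later qualifying read
	 * or the 300 s reseed. */
}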

> On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
>> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
>> (Because each mixback can be guessed and verified separately.)

> Exactly, but the full reseed after running out of entropy is strong
> enough to not be defeated by your argumentation. Neither the reseed
> from the CRNG.

Yes, I was just reacting to your original statement:

>>>>> couldn't we simply use 8 bytes of the 64 byte
>>>>> return block to feed it directly back into the state chacha?

It's not the idea that's bad, just the proposed quantity.


>> If you want that, I have a pile of patches to prandom I really
>> should push upstream.  Shall I refresh them and send them to you?

> I would like to have a look at them in the new year, certainly! I can
> also take care about the core prandom patches, but don't know if I have
> time to submit the others to the different subsystems.
>
> Maybe, if David would be okay with that, we can submit all patches
> through his tree, as he is also the dedicated maintainer for prandom.

Amazing, thank you very much!  They're just minor cleanups, nothing
too exciting.  I'll put it in the queue to make sure they're up to
date.


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote:
> On 24.12.2016 00:39, George Spelvin wrote:
>> We just finished discussing why 8 bytes isn't enough.  If you only
>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>> (by guessing and computing forward to verify the guess) and recover the
>> previous state.  You need to feed back at least as much output as your
>> security target.  For /dev/urandom's ChaCha20, that's 256 bits.

> I followed the discussion but it appeared to me that this has the
> additional constraint of time until the next reseeding event happens,
> which is 300s (under the assumption that calls to get_random_int happen
> regularly, which I expect right now). After that the existing reseeding
> mechanism will ensure enough backtracking protection. The number of
> bytes can easily be increased here, given that reseeding was shown to be
> quite fast already and we produce enough output. But I am not sure if
> this is a bit overengineered in the end?

I'm not following your description of how the time-based and call-based
mechanisms interact, but for any mix-back, you should either do enough
or none at all.  (Also called "catastrophic reseeding".)

For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
(Because each mixback can be guessed and verified separately.)

> Also agreed. Given your solution below to prandom_u32, I do think it
> might also work without the seqlock now.

It's not technically a seqlock; in particular the reader doesn't
spin.  But the write side, and general logic is so similar it's
a good mental model.

Basically, assume a 64-byte buffer.  The reader has gone through
32 bytes of it, and has 32 left, and when he reads another 8 bytes,
has to distinguish three cases (sketched in code after the list):

1) No update; we read the old bytes and there are now 32 - 8 = 24 bytes left.
2) Update completed while we weren't looking.  There are now new
   bytes in the buffer, and we took 8 leaving 64 - 8 = 56.
3) Update in progress at the time of the read.  We don't know if we
   are seeing old bytes or new bytes, so we have to assume the worst
   and not proceed unless 32 >= 8, but assume at the end there are
   64 - 8 = 56 new bytes left.
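
Here is a rough user-space sketch of that reader logic (C11 atomics
stand in for the kernel's barriers, names are illustrative, and the
intricate corner cases are only lightly handled):

#include <stdatomic.h>
#include <string.h>

#define BUF_SIZE 64

static unsigned char buf[BUF_SIZE];
static atomic_uint seq;		/* odd while the writer is rewriting buf */
static unsigned int last_seq;	/* seq value when "pos" was last valid */
static unsigned int pos;	/* bytes of buf already consumed */

static int read_bytes(unsigned char *out, unsigned int n)
{
	unsigned int s1, s2;

	if (n > BUF_SIZE - pos)
		return -1;	/* not enough even in the old buffer */

	s1 = atomic_load(&seq);
	memcpy(out, buf + pos, n);
	s2 = atomic_load(&seq);

	if (s1 == last_seq && s2 == s1) {
		pos += n;	/* case 1: no update; n more bytes gone */
	} else if (s1 == s2 && !(s1 & 1)) {
		last_seq = s1;	/* case 2: update completed earlier;  */
		pos = n;	/* the whole buffer is fresh, n taken */
	} else {
		/* Case 3: update in progress; old or new bytes, unknown.
		 * The bytes we copied were unconsumed in the old buffer
		 * (checked above), so they are safe either way; charge
		 * them against the new buffer. */
		last_seq = (s2 + 1) & ~1u;	/* the update's final count */
		pos = n;
	}
	return 0;
}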

> I wouldn't have added a disable irqs, but given that I really like your
> proposal, I would take it in my todo branch and submit it when net-next
> window opens.

If you want that, I have a pile of patches to prandom I really
should push upstream.  Shall I refresh them and send them to you?


commit 4cf1b3d9f4fbccc29ffc2fe4ca4ff52ea77253f1
Author: George Spelvin <li...@horizon.com>
Date:   Mon Aug 31 00:05:00 2015 -0400

net: Use prandom_u32_max()

It's slightly more efficient than "prandom_u32() % range"

The net/802 code was already efficient, but prandom_u32_max() is simpler.

In net/batman-adv/bat_iv_ogm.c, batadv_iv_ogm_fwd_send_time() got changed
from picking a random number of milliseconds and converting to jiffies to
picking a random number of jiffies, since the number of milliseconds (and
thus the conversion to jiffies) is a compile-time constant.  The equivalent
code in batadv_iv_ogm_emit_send_time was not changed, because the number
of milliseconds is variable.

In net/ipv6/ip6_flowlabel.c, ip6_flowlabel had htonl(prandom_u32()),
which is silly.  Just cast to __be32 without permuting the bits.

net/sched/sch_netem.c got adjusted to only need one call to prandom_u32
instead of 2.  (Assuming skb_headlen can't exceed 512 MiB, which is
hopefully safe for some time yet.)

Signed-off-by: George Spelvin <li...@horizon.com>

commit 9c8fb80e1fd2be42c35cab1af27187d600fd85e3
Author: George Spelvin <li...@horizon.com>
Date:   Sat May 24 15:20:47 2014 -0400

mm/swapfile.c: Use prandom_u32_max()

It's slightly more efficient than "prandom_u32() % range"

Signed-off-by: George Spelvin <li...@horizon.com>

commit 2743eb01e5c5958fd88ae78d19c5fea772d4b117
Author: George Spelvin <li...@horizon.com>
Date:   Sat May 24 15:19:53 2014 -0400

lib: Use prandom_u32_max()

It's slightly more efficient than "prandom_u32() % range"

Signed-off-by: George Spelvin <li...@horizon.com>

commit 6a5e91bf395060a3351bfe5efc40ac20ffba2c1b
Author: George Spelvin <li...@horizon.com>
Date:   Sat May 24 15:18:50 2014 -0400

fs/xfs: Use prandom_u32_max()

It's slightly more efficient than "prandom_u32() % range".

Also changed the last argument of xfs_error_test() from "unsigned long"
to "unsigned", since the code never did support values > 2^32, and
the largest value ever passed is 100.

The code could be improved even further by passing in 2^32/rf rather
than rf, but I'll leave that to some XFS developers.

Signed-off-by: George Spelvin <li...@horizon.com>

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote:
> In general this looks good, but bitbuffer needs to be protected from
> concurrent access, thus needing at least some atomic instruction and
> disabling of interrupts for the locking if done outside of
> get_random_long. Thus I liked your previous approach more where you
> simply embed this in the already necessary get_random_long and aliased
> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

It's meant to be part of the same approach, and I didn't include locking
because that's a requirement for *any* solution, and isn't specific
to the part I was trying to illustrate.

(As for re-using the name "get_random_long", that was just so
I didn't have to explain it.  Call it "get_random_long_internal"
if you like.)

Possible locking implementations include:
1) Use the same locking as applies to get_random_long_internal(), or
2) Make bitbuffer a per-CPU variable (note that we currently have 128
   bits of per-CPU state in get_random_int_hash[]), and this is all a
   fast-path to bypass heavier locking in get_random_long_internal().

>> But, I just realized I've been overlooking something glaringly obvious...
>> there's no reason you can't compute multiple blocks in advance.
>
> In the current code on the cost of per cpu allocations thus memory.

Yes, but on 4-core machines it's still not much, and 4096-core
behemoths have RAM to burn.

> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
> return block to feed it directly back into the state chacha? So we pass
> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
> state. This would make the window max shorter than the anti
> backtracking protection right now from 300s to 14 get_random_int calls.
> Not sure if this is worth it.

We just finished discussing why 8 bytes isn't enough.  If you only
feed back 8 bytes, an attacker who can do 2^64 computation can find it
(by guessing and computing forward to verify the guess) and recover the
previous state.  You need to feed back at least as much output as your
security target.  For /dev/urandom's ChaCha20, that's 256 bits.

>> For example, suppose we gave each CPU a small pool to minimize locking.
>> When one runs out and calls the global refill, it could refill *all*
>> of the CPU pools.  (You don't even need locking; there may be a race to
>> determine *which* random numbers the reader sees, but they're equally
>> good either way.)

> Yes, but still disabled interrupts, otherwise the same state could be
> used twice on the same CPU. Uff, I think we have this problem in
> prandom_u32.

There are some locking gotchas, but it is doable lock-free.

Basically, it's a seqlock.  The writer increments it once (to an odd
number) before starting to overwrite the buffer, and a second time (to
an even number) after.  "Before" and "after" mean smp_wmb().

The reader can use this to figure out how much of the data in the buffer
is safely fresh.  The full sequence of checks is a bit intricate,
but straightforward.
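
The writer side of the protocol, sketched (user-space C11 as a
stand-in; in the kernel the fences would be smp_wmb() and the buffer
per-CPU; a single writer is assumed):

#include <stdatomic.h>
#include <string.h>

static unsigned char buf[64];
static atomic_uint seq;		/* even: stable; odd: rewrite in progress */

static void refill(const unsigned char *fresh)
{
	atomic_store(&seq, atomic_load(&seq) + 1);	/* now odd */
	atomic_thread_fence(memory_order_seq_cst);	/* "before" barrier */
	memcpy(buf, fresh, sizeof(buf));
	atomic_thread_fence(memory_order_seq_cst);	/* "after" barrier */
	atomic_store(&seq, atomic_load(&seq) + 1);	/* even again */
}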

I didn't discuss the locking because I'm confident it's solvable,
not because I wasn't aware it has to be solved.

As for prandom_u32(), what's the problem?  Are you worried that
get_cpu_var disables preemption but not interrupts, and so an
ISR might return the same value as process-level code?

First of all, that's not a problem because prandom_u32() doesn't
have security guarantees.  Occasionally returning a duplicate number
is okay.

Second, if you do care, that could be trivially fixed by throwing
a barrier() in the middle of the code.  (Patch appended; S-o-B
if anyone wants it.)


diff --git a/lib/random32.c b/lib/random32.c
index c750216d..6bee4a36 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -55,16 +55,29 @@ static DEFINE_PER_CPU(struct rnd_state, net_rand_state);
  *
  * This is used for pseudo-randomness with no outside seeding.
  * For more random results, use prandom_u32().
+ *
+ * The barrier() is to allow prandom_u32() to be called from interrupt
+ * context without locking.  An interrupt will run to completion,
+ * updating all four state variables.  The barrier() ensures that
+ * the interrupted code will compute a different result.  Either it
+ * will have written s1 and s2 (so the interrupt will start with
+ * the updated values), or it will use the values of s3 and s4
+ * updated by the interrupt.
+ *
+ * (The same logic applies recursively to nested interrupts, trap
+ * handlers, and NMIs.)
  */
 u32 prandom_u32_state(struct rnd_state *state)
 {
+   register u32 x;
 #define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
-   state->s1 = TAUSWORTHE(state->s1,  6, 13, 4294967294U, 18U);
-   state->s2 = TAUSWORTHE(state->s2,  2, 27, 4294967288U,  2U);
-   state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U,  7U);
-   state->s4 = TAUSWORTHE(state->s4,  3, 12, 4294967168U, 13U);
+   x  = state->s1 = TAUSWORTHE(state->s1,  6, 13, 4294967294U, 18U);

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
(Cc: list trimmed slightly as the topic is wandering a bit.)

Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>> Adding might_lock() annotations will improve coverage a lot.
>
> Might be hard to find the correct lock we take later down the code
> path, but if that is possible, certainly.

The point of might_lock() is that you don't have to.  You find the
worst case (most global) lock that the code *might* take if all the
buffer-empty conditions are true, and tell lockdep "assume this lock is
taken all the time".

>> Hannes Frederic Sowa wrote:
>>> Yes, that does look nice indeed. Accounting for bits instead of bytes
>>> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
>>> case you can't satisfy a request with one batched entropy block and have
>>> to consume randomness from two.

For example, here's a simple bit-buffer implementation that wraps around
a get_random_long.  The bitbuffer is of the form "1xx...x", where the
x bits are valid, and the position of the msbit indicates how many bits
are valid.

extern unsigned long get_random_long();
static unsigned long bitbuffer = 1;	/* Holds 0..BITS_PER_LONG-1 bits */

unsigned long get_random_bits(unsigned char bits)
{
	/* We handle bits == BITS_PER_LONG, and not bits == 0 */
	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
	unsigned long val;

	if (bitbuffer > mask) {
		/* Request can be satisfied out of the bit buffer */
		val = bitbuffer;
		bitbuffer >>= bits;
	} else {
		/*
		 * Not enough bits, but enough room in bitbuffer for the
		 * leftovers.  avail < bits, so avail + 64 <= bits + 63.
		 */
		val = get_random_long();
		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
			    | val >> 1 >> (bits - 1);
	}
	return val & mask;
}

> When we hit the chacha20 without doing a reseed we only mutate the
> state of chacha, but being an invertible function in its own right, a
> proposal would be to mix parts of the chacha20 output back into the
> state, which, as a result, would cause slowdown because we couldn't
> propagate the complete output of the cipher back to the caller (looking
> at the function _extract_crng).

Basically, yes.  Half of the output goes to rekeying itself.

But, I just realized I've been overlooking something glaringly obvious...
there's no reason you can't compute multiple blocks in advance.

The standard assumption in antibacktracking is that you'll *notice* the
state capture and stop trusting the random numbers afterward; you just
want the output *before* to be secure.  In other words, cops busting
down the door can't find the session key used in the message you just sent.

So you can compute and store random numbers ahead of need.

This can amortize the antibacktracking as much as you'd like.

For example, suppose we gave each CPU a small pool to minimize locking.
When one runs out and calls the global refill, it could refill *all*
of the CPU pools.  (You don't even need locking; there may be a race to
determine *which* random numbers the reader sees, but they're equally
good either way.)
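
A sketch of that batching arrangement (user-space model; crng_generate()
is a hypothetical stand-in for one anti-backtracked extraction):

#include <stdint.h>

#define NCPUS		4
#define POOL_WORDS	16

static uint64_t pool[NCPUS][POOL_WORDS];
static unsigned int fill[NCPUS];	/* unread words per CPU */

extern void crng_generate(uint64_t *out, unsigned int nwords);

static uint64_t get_random_u64(unsigned int cpu)
{
	unsigned int i;

	if (fill[cpu] == 0) {
		/* Global refill: one anti-backtracking pass amortized
		 * over every CPU's pool.  A race only changes which
		 * batch a reader sees; both are equally good. */
		for (i = 0; i < NCPUS; i++) {
			crng_generate(pool[i], POOL_WORDS);
			fill[i] = POOL_WORDS;
		}
	}
	return pool[cpu][--fill[cpu]];
}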

> Or are you referring that the anti-backtrack protection should happen
> in every call from get_random_int?

If you ask for anti-backtracking without qualification, that's the
goal, since you don't know how long will elapse until the next call.  

It's like fsync().  There are lots of more limited forms of "keep my
data safe in case of a crash", but the most basic one is "if we lost
power the very instant the call returned, the data would be safe."


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)

Adding might_lock() annotations will improve coverage a lot.

> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't satisfy a request with one batched entropy block and have
> to consume randomness from two.

The bit granularity is also for the callers' convenience, so they don't
have to mask again.  Whether get_random_bits rounds up to byte boundaries
internally or not is something else.

When the current batch runs low, I was actually thinking of throwing
away the remaining bits and computing a new batch of 512.  But it's
whatever works best at implementation time.

>>> It could only mix the output back in every two calls, in which case
>>> you can backtrack up to one call but you need to do 2^128 work to
>>> backtrack farther.  But yes, this is getting excessively complicated.
> 
>> No, if you're willing to accept limited backtrack, this is a perfectly
>> acceptable solution, and not too complicated.  You could do it phase-less
>> if you like; store the previous output, then after generating the new
>> one, mix in both.  Then overwrite the previous output.  (But doing two
>> rounds of a crypto primitive to avoid one conditional jump is stupid,
>> so forget that.)

> Can you quickly explain why we lose the backtracking capability?

Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
backtracking is to compute output[i], or better yet state[i-1], given
state[i].

For example, consider an OFB or CTR mode generator.  The state is a key
and an IV, and you encrypt the IV with the key to produce output, then
either replace the IV with the output, or increment it.  Either way,
since you still have the key, you can invert the transformation and
recover the previous IV.

The standard way around this is to use the Davies-Meyer construction:

IV[i] = IV[i-1] + E(IV[i-1], key)

This is the standard way to make a non-invertible random function
out of an invertible random permutation.
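
To illustrate, here's the difference with a toy 64-bit keyed permutation
standing in for E (emphatically not a real cipher):

#include <stdint.h>

static uint64_t E(uint64_t x, uint64_t key)	/* toy invertible permutation */
{
	x ^= key;
	x ^= x >> 33;			/* xorshift: invertible */
	x *= 0xff51afd7ed558ccdULL;	/* odd multiplier: invertible mod 2^64 */
	x ^= x >> 33;
	return x;
}

/* OFB/CTR-style step: anyone holding the key can invert E and walk
 * the IV sequence backward. */
static uint64_t step_invertible(uint64_t iv, uint64_t key)
{
	return E(iv, key);
}

/* Davies-Meyer step: the feed-forward sum reveals neither the input
 * nor the output of E, so even with the key, recovering IV[i-1] from
 * IV[i] means guess-and-verify. */
static uint64_t step_davies_meyer(uint64_t iv, uint64_t key)
{
	return iv + E(iv, key);
}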

From the sum, there's no easy way to find the ciphertext *or* the
plaintext that was encrypted.  Assuming the encryption is secure,
the only way to reverse it is brute force: guess IV[i-1] and run the
operation forward to see if the resultant IV[i] matches.

There are a variety of ways to organize this computation, since the
guess gives you both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
running E forward, backward, or starting from both ends to see if you
meet in the middle.

The way you add the encryption output to the IV is not very important.
It can be addition, xor, or some more complex invertible transformation.
In the case of SipHash, the "encryption" output is smaller than the
input, so we have to get a bit more creative, but it's still basically
the same thing.

The problem is that the output which is combined with the IV is too small.
With only 64 bits, trying all possible values is practical.  (The world's
Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
times per second.)

By basically doing two iterations at once and mixing in 128 bits of
output, the guessing attack is rendered impractical.  The only downside
is that you need to remember and store one result between when it's
computed and last used.  This is part of the state, so an attack can
find output[i-1], but not anything farther back.

> ChaCha as a block cipher gives a "perfect" permutation from the output
> of either the CRNG or the CPRNG, which actually itself has backtracking
> protection.

I'm not quite understanding.  The /dev/random implementation uses some
of the ChaCha output as a new ChaCha key (that's another way to mix output
back into the state) to prevent backtracking.  But this slows it down, and
again if you want to be efficient, you're generating large batches
of output and storing them in the RNG state.


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> I do tend to like Ted's version in which we use batched
> get_random_bytes() output.  If it's fast enough, it's simpler and lets
> us get the full strength of a CSPRNG.

With the ChaCha20 generator, that's fine, although note that this abandons
anti-backtracking entirely.

It also takes locks, something the previous get_random_int code
path avoided.  Do we need to audit the call sites to ensure that's safe?

And there is the issue that the existing callers assume that there's a
fixed cost per word.  A good half of get_random_long calls are followed by
"& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
- 1)" to extract the low 28.  If we have a buffer we're going to have to
pay to refill, it would be nice to use less than 8 bytes to satisfy those.

But that can be a followup patch.  I'm thinking

unsigned long get_random_bits(unsigned bits)
	E.g. get_random_bits(PAGE_SHIFT),
	     get_random_bits(mmap_rnd_bits),
	     u32 imm_rnd = get_random_bits(32)

unsigned get_random_mod(unsigned modulus)	(sketch below)
	E.g. get_random_mod(hole) & ~(alignment - 1);
	     get_random_mod(port_scan_backoff)
	(Although probably drivers/s390/scsi/zfcp_fc.c should be changed
	to prandom.)

with, until the audit is completed:
#define get_random_int() get_random_bits(32)
#define get_random_long() get_random_bits(BITS_PER_LONG)
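
For get_random_mod(), one plausible implementation is the same
multiply-and-shift trick as prandom_u32_max() (a sketch, not a final
API; bias is negligible when the modulus is far below 2^32):

#include <stdint.h>

extern unsigned long get_random_bits(unsigned bits);

static inline unsigned get_random_mod(unsigned modulus)
{
	return ((uint64_t)get_random_bits(32) * modulus) >> 32;
}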

> It could only mix the output back in every two calls, in which case
> you can backtrack up to one call but you need to do 2^128 work to
> backtrack farther.  But yes, this is getting excessively complicated.

No, if you're willing to accept limited backtrack, this is a perfectly
acceptable solution, and not too complicated.  You could do it phase-less
if you like; store the previous output, then after generating the new
one, mix in both.  Then overwrite the previous output.  (But doing two
rounds of a crypto primitive to avoid one conditional jump is stupid,
so forget that.)

>> Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
>> directly and be done with it.  It won't be a bottleneck.

Isn't that what you already suggested?

I don't mind fewer primitives; I got a bit fixated on "Replace MD5 with
SipHash".  It's just the locking that I want to check isn't a problem.


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> Having slept on this, I like it less.  The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value.  This probably breaks any attempt to apply
> security properties of the hash function.  For example, the internal
> state could easily contain a whole bunch of prior outputs in it
> verbatim.

The problem is, anti-backtracking is in severe tension with your desire
to use unmodified SipHash.

First of all, I'd like to repeat that it isn't a design goal of the current
generator and isn't necessary.

The current generator just returns hash[0] from the MD5 state, then
leaves the state stored.  The fact that it conceals earlier outputs is
an accident of the Davies-Meyer structure of md5_transform.

It isn't necessary, because every secret generated is stored unencrypted
for as long as it's of value.  A few values are used for retransmit
backoffs and random MAC addresses.  Those are revealed to the world as
soon as they're used.

Most values are used for ASLR.  These addresses are of interest to an
attacker trying to mount a buffer-overflow attack, but that only lasts
as long as the process is running to receive buffers.  After the process
exits, knowledge of its layout is worthless.

And this is stored as long as it's running in kernel-accessible data,
so storing a *second* copy in less conveniently kernel-accessible data
(the RNG state) doesn't make the situation any worse.


In addition to the above, if you're assuming a state capture, then
since we have (for necessary efficiency reasons) a negligible amount of
fresh entropy, an attacker has the secret key and can predict *future*
outputs very easily.

Given that information, an attacker doesn't need to learn the layout of
vulnerable server X.  If they have a buffer overflow, they can crash
the current instance and wait for a fresh image to be started (with a
known address space) to launch their attack at.


Kernel state capture attacks are a very unlikely attack, mostly because
it's a narrow target a hair's breadth away from the much more interesting
outright kernel compromise (attacker gains write access as well as read)
which renders all this fancy cryptanalysis moot.


Now, the main point:  it's not likely to be solvable.

The problem with unmodified SipHash is that it has only 64 bits of
output.  No mix-back mechanism can get around the fundamental problem
that that's too small to prevent a brute-force guessing attack.  You need
wider mix-back.  And getting more output from unmodified SipHash requires
more finalization rounds, which is expensive.

(Aside: 64 bits does have the advantage that it can't be brute-forced on
the attacked machine.  It must be exfiltrated to the attacker, and the
solution returned to the attack code.  But doing this is getting easier
all the time.)

Adding antibacktracking to SipHash is trivial: just add a Davies-Meyer
structure around its internal state.  Remember the internal state before
hashing in the entropy and secret, generate the output, then add the
previous and final states together for storage.
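
In code, assuming (hypothetically) that we can see SipHash's four state
words, with sip_absorb() and sip_output() standing for the usual
compression and finalization steps:

#include <stdint.h>

struct sip_state { uint64_t v[4]; };

extern void sip_absorb(struct sip_state *s, uint64_t word);
extern uint64_t sip_output(const struct sip_state *s);

static uint64_t generate(struct sip_state *s, const uint64_t *in, unsigned n)
{
	struct sip_state before = *s;	/* remember the pre-input state */
	uint64_t out;
	unsigned i;

	for (i = 0; i < n; i++)		/* hash in the entropy and secret */
		sip_absorb(s, in[i]);
	out = sip_output(s);

	for (i = 0; i < 4; i++)		/* Davies-Meyer feed-forward:    */
		s->v[i] += before.v[i];	/* store previous + final states */
	return out;
}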

This is a standard textbook construction, very cheap, and doesn't touch
the compression function which is the target of analysis and attacks,
but it requires poking around in SipHash's internal state.  (And people
who read the textbooks without understanding them will get upset because
the textbooks all talk about using this construction with block ciphers,
and SipHash's compression function is not advertised as a block cipher.)

Alternative designs exist; you could extract additional output from
earlier rounds of SipHash, using the duplex sponge construction you
mentioned earlier.  That output would be used for mixback purposes *only*,
so wouldn't affect the security proof of the "primary" output.
But this is also getting creative with SipHash's internals.


Now, you could use a completely *different* cryptographic primitive
to enforce one-way-ness, and apply SipHash as a strong output transform,
but that doesn't feel like good design, and is probably more expensive.


Finally, your discomfort about an attacker learning the internal state...
if an attacker knows the key and the input, they can construct the
internal state.  Yes, we could discard the internal state and construct
a fresh one on the next call to get_random_int, but what are you going
to key it with?  What are you going to feed it?  What keeps *that*
internal state any more secret from an attacker than one that's explicitly
stored?

Keeping the internal state around is a cacheing optimization, that's all.

*If* you're assuming a state capture, the only thing secret from the
attacker is any fresh entropy collected between the time of capture
and the time of generation.  Due to mandatory efficiency requirements,
this is very small.  

I really think you're wishing for the impossible here.


A final note: although I'm disagreeing with you, 

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> True, but it's called get_random_int(), and it seems like making it
> stronger, especially if the performance cost is low to zero, is a good
> thing.

If it's cheap enough, I don't mind.  But it's documented as being
marginal-quality, used where speed is more important.

In particular, it's *not* used for key material; only values that matter
only as long as they are in use.  While they're in use, they can't be
concealed from an attacker with kernel access, and when they're done
being used, they're worthless.

>> If you want anti-backtracking, though, it's easy to add.  What we
>> hash is:
>>
>> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
>>
>> You mix the output word right back in to the (unfinalized) state after
>> generating it.  This is still equivalent to unmodified back-box SipHash,
>> you're just using a (conceptually independent) SipHash invocation to
>> produce some of its input.

> Ah, cute.  This could probably be sped up by doing something like:
>
> entropy_0 || secret || output_0 ^ entropy_1 || secret || ...

I'm disinclined to do that because that requires deferring the mixing
until the *next* time you generate something.  Storing the value you
don't want revealed by a state capture defeats the purpose.

I'd rather mix it in immediately, so you have anti-backtracking from
the moment of creation.

Also, I don't think it's particularly "cute" or clever; mixing output back
in is the standard way all antibacktracking is accomplished.  It's how
the Davies-Meyer hash construction makes a reversible function one-way.

(There is a second way to do it by throwing away state, but that's
expensive in seed entropy.)

> It's a little weak because the output is only 64 bits, so you could
> plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the
> old entropy is guessable.  I suspect there are sneaky ways around it
> like using output_n-1 ^ output_n-2 or similar.  I'll sleep on it.

Ah, yes, I see.  Given the final state, you guess the output word, go
back one round, then run the finalization rounds forward.  Is the output
equal to the guessed output?  You'll find the true value, plus
Poisson(1 - 2^-64) additional.  (Since you have 2^64-1 chances at
something with probability 1 in 2^64.)

And this can be iterated as often as you like to get earlier output words,
as long as you can guess the entropy.  *That's* the part that hurts;
you'd like something that peters out.

You could use the double-length-output SipHash variant (which requires
a second set of finalization rounds) and mix more output back, but
that's expensive.

The challenge is coming up with more unpredictable data to mix in than one
invocation of SipHash returns.  And without storing previous output
anywhere, because that is exactly wrong.

A running sum or xor or whatever of the outputs doesn't help, because
once you've guessed the last output, you can backtrack that for no
additional effort.

State capture is incredibly difficult, and our application doesn't require
resistance anyway... unless you can think of something cheap, I think
we can just live with this.

>> I'd *like* to persuade you that skipping the padding byte wouldn't
>> invalidate any security proofs, because it's true and would simplify
>> the code.  But if you want 100% stock, I'm willing to cater to that.

> I lean toward stock in the absence of a particularly good reason.  At
> the very least I'd want to read that paper carefully.

Er... adding the length is standard Merkle-Damgaard strengthening.
Why you do this is described in the original papers by Merkle and Damgaard.

The lay summary is at
https://en.wikipedia.org/wiki/Merkle-Damgard_construction

The original sources are:
http://www.merkle.com/papers/Thesis1979.pdf
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf

Merkle describes the construction; Damgaard proves it secure.  Basically,
appending the length is required to handle variable-length input if the
input is not itself self-delimiting.

The proof of security is theorem 3.1 in the latter.  (The first, more
detailed explanation involves the use of an extra bit, which the second
then explains how to do without.)

In particular, see the top of page 420, which notes that the security
proof only requires encoding *how much padding is added* in the final
block, not the overall length of the message, and the second remark on
p. 421 which notes that no such suffix is required if it's not necessary
to distinguish messages with different numbers of trailing null bytes.

The rules are alluded to in the "Choice of padding rule" part of the
"Rationale" section of the SipHash paper (p. 7), but the description is
very brief because it assumes the reader has the background.

That's why they say "We could have chosen a slightly simpler padding rule,
such as appending a 80 byte followed by zeroes."

The thing is, if the amount of the last block that is used is fixed
(within the domain of a particular key), you don't 

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread George Spelvin
Andy Lutomirski wrote:
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?

It is, but the problem is that SipHash is intended for *small* inputs,
so the standard implementations aren't broken into init/update/final
functions.

There's just one big function that keeps the state variables in
registers and never stores them anywhere.

If we *had* init/update/final functions, then it would be trivial.

> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:

> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Basically, yes.  I was skipping the padding byte and keying the
finalization rounds on the grounds of "can't hurt and might help",
but we could do it a more standard way.

> If not, then I have a fairly strong preference to keep whatever
> construction we come up with consistent with something that could
> actually happen with invocations of unmodified SipHash -- then all the
> security analysis on SipHash goes through.

Okay.  I don't think it makes a difference, but it's not a *big* waste
of time.  If we have finalization rounds, we can reduce the secret
to 128 bits.

If we include the padding byte, we can do one of two things:
1) Make the secret 184 bits, to fill up the final partial word as
   much as possible, or
2) Make the entropy 1 byte smaller and conceptually misalign the
   secret.  What we'd actually do is remove the last byte of
   the secret and include it in the entropy words, but that's
   just a rotation of the secret between storage and hashing.

Also, I assume you'd like SipHash-2-4, since you want to rely
on a security analysis.

(Regarding the padding byte, getting it right might be annoying
to do exactly.  All of the security analysis depends *only* on
its low 3 bits indicating how much of the final block is used.
As it says in the SipHash paper, they included 8 bits just because
it was easy.  But if you want it exact, it's just one more byte of
state.)

> The one thing I don't like is
> that I don't see how to prove that you can't run it backwards if you
> manage to acquire a memory dump.  In fact, I think that there exist, at
> least in theory, hash functions that are secure in the random oracle
> model but that *can* be run backwards given the full state.  From
> memory, SHA-3 has exactly that property, and it would be a bit sad for
> a CSPRNG to be reversible.

Er...  get_random_int() is specifically *not* designed to be resistant
to state capture, and I didn't try.  Remember, what it's used for
is ASLR, what we're worried about is somene learning the layouts
of still-running processes, and and if you get a memory dump, you have
the memory layout!

If you want anti-backtracking, though, it's easy to add.  What we
hash is:

entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...

You mix the output word right back in to the (unfinalized) state after
generating it.  This is still equivalent to unmodified back-box SipHash,
you're just using a (conceptually independent) SipHash invocation to
produce some of its input.

Each output is produced by copying the state, padding & finalizing after the
secret.


In fact, to make our lives easier, let's define the secret to end with
a counter byte that happens to be equal to the padding byte.  The input
stream will be:

Previous output: 8 (or 4 for HalfSipHash) bytes
Entropy: 15 bytes (8 bytes timer, 4 bytes jiffies, 3 bytes pid)
Secret: 16 bytes
Counter: 1 byte
...repeat...
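
One iteration, sketched (sip_absorb() and sip_final() are hypothetical
helpers, since stock SipHash implementations don't expose
init/update/final; HalfSipHash would use 32-bit words):

#include <stdint.h>

struct sip_state { uint64_t v[4]; };

extern void sip_absorb(struct sip_state *s, uint64_t word);
extern uint64_t sip_final(struct sip_state s);	/* pads & finalizes a copy */

static struct sip_state state;	/* per-CPU in reality */

/* w[0] = previous output, w[1..2] = 15 bytes of entropy plus the first
 * secret byte, w[3..4] = the rest of the secret, ending in the counter
 * byte.  (The exact packing is unimportant for the sketch.) */
static uint64_t get_random_u64_sketch(const uint64_t w[5])
{
	int i;

	for (i = 0; i < 5; i++)
		sip_absorb(&state, w[i]);
	return sip_final(state);	/* the running state is kept */
}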

> We could also periodically mix in a big (128-bit?) chunk of fresh
> urandom output to keep the bad guys guessing.

Simpler and faster to just update the global master secret.
The state is per-CPU, so mixing in has to be repeated per CPU.


With these changes, I'm satisfied that it's secure, cheap, has a
sufficiently wide state size, *and* all standard SipHash analysis applies.

The only remaining issues are:
1) How many rounds, and
2) May we use HalfSipHash?

I'd *like* to persuade you that skipping the padding byte wouldn't
invalidate any security proofs, because it's true and would simplify
the code.  But if you want 100% stock, I'm willing to cater to that.

Ted, what do you think?


Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to move on to more interesting things.

Do we have to go through this?  No, the benchmark was *not* bogus.

Here are my results from *your* benchmark.  I can't reboot some of my test
machines, so I took net/core/secure_seq.c, lib/siphash.c, lib/md5.c and
include/linux/siphash.h straight out of your test tree.

Then I replaced the kernel #includes with the necessary typedefs
and #defines to make it compile in user-space.  (Voluminous but
straightforward.)  E.g.

#define __aligned(x) __attribute__((__aligned__(x)))
#define cacheline_aligned __aligned(64)
#define CONFIG_INET 1
#define IS_ENABLED(x) 1
#define ktime_get_real_ns() 0
#define sysctl_tcp_timestamps 0

... etc.

Then I modified your benchmark code into the appended code.  The
differences are:
* I didn't iterate 100K times, I timed the functions *once*.
* I saved the times in a buffer and printed them all at the end
  so printf() wouldn't pollute the caches.
* Before every even-numbered iteration, I flushed the I-cache
  of everything from _init to _fini (i.e. all the non-library code).
  This cold-cache case is what is going to happen in the kernel.

In the results below, note that I did *not* re-flush between phases
of the test.  The effects of cacheing are clearly apparent in the tcpv4
results, where the tcpv6 code loaded the cache.

You can also see that the SipHash code benefits more from cacheing when
entered with a cold cache, as it iterates over the input words, while
the MD5 code is one big unrolled blob.

Order of computation is down the columns first, across second.

The P4 results were:
tcpv6 md5 cold:     4084 3488 3584 3584 3568
tcpv4 md5 cold:     1052  996  996 1060  996
tcpv6 siphash cold: 4080 3296 3312 3296 3312
tcpv4 siphash cold: 2968 2748 2972 2716 2716
tcpv6 md5 hot:       900  712  712  712  712
tcpv4 md5 hot:       632  672  672  672  672
tcpv6 siphash hot:  2484 2292 2340 2340 2340
tcpv4 siphash hot:  1660 1560 1564 2340 1564

SipHash actually wins slightly in the cold-cache case, because
it iterates more.  In the hot-cache case, it loses horribly.

Core 2 duo:
tcpv6 md5 cold:     3396 2868 2964 3012 2832
tcpv4 md5 cold:     1368 1044 1320 1332 1308
tcpv6 siphash cold: 2940 2952 2916 2448 2604
tcpv4 siphash cold: 3192 2988 3576 3504 3624
tcpv6 md5 hot:      1116 1032  996 1008 1008
tcpv4 md5 hot:       936  936  936  936  936
tcpv6 siphash hot:  1200 1236 1236 1188 1188
tcpv4 siphash hot:   936  804  804  804  804

Pretty much a tie, honestly.

Ivy Bridge:
tcpv6 md5 cold:     6086 6136 6962 6358 6060
tcpv4 md5 cold:      816  732 1046 1054 1012
tcpv6 siphash cold: 3756 1886 2152 2390 2566
tcpv4 siphash cold: 3264 2108 3026 3120 3526
tcpv6 md5 hot:      1062  808  824  824  832
tcpv4 md5 hot:       730  730  740  748  748
tcpv6 siphash hot:   960  952  936 1112  926
tcpv4 siphash hot:   638  544  562  552  560

Modern processors *hate* cold caches.  But notice how md5 is *faster*
than SipHash on hot-cache IPv6.

Ivy Bridge, -m64:
tcpv6 md5 cold:     4680 3672 3956 3616 3525
tcpv4 md5 cold:     1066 1416 1179 1179 1134
tcpv6 siphash cold:  940 1258 1995 1609 2255
tcpv4 siphash cold: 1440 1269 1292 1870 1621
tcpv6 md5 hot:      1372 1122 1088 1088
tcpv4 md5 hot:       997  997  997  997  998
tcpv6 siphash hot:   340  340  340  352  340
tcpv4 siphash hot:   227  238  238  238  238

Of course, when you compile -m64, SipHash is unbeatable.


Here's the modified benchmark() code.  The entire package is
a bit voluminous for the mailing list, but anyone is welcome to it.

static void clflush(void)
{
	extern char const _init, _fini;
	char const *p = &_init;

	while (p < &_fini) {
		asm("clflush %0" : : "m" (*p));
		p += 64;
	}
}

typedef uint32_t cycles_t;
static cycles_t get_cycles(void)
{
	uint32_t eax, edx;

	asm volatile("rdtsc" : "=a" (eax), "=d" (edx));
	return eax;
}

static int benchmark(void)
{
	cycles_t start, finish;
	int i;
	u32 seq_number = 0;
	__be32 saddr6[4] = { 1, 4, 182, 393 },
	       daddr6[4] = { 9192, 18288, 222, 0xff10 };
	__be32 saddr4 = 2, daddr4 = 182112;
	__be16 sport = 22, dport = 41992;

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
As a separate message, to disentangle the threads, I'd like to
talk about get_random_long().

After some thinking, I still like the "state-preserving" construct
that's equivalent to the current MD5 code.  Yes, we could just do
siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
preserve a bit more.

It requires library support from the SipHash code to return the full
SipHash state, but I hope that's a fair thing to ask for.

Here's my current straw man design for comment.  It's very similar to
the current MD5-based design, but feeds all the seed material in the
"correct" way, as opposed to Xring directly into the MD5 state.

* Each CPU has a (Half)SipHash state vector,
  "unsigned long get_random_int_hash[4]".  Unlike the current
  MD5 code, we take care to initialize it to an asymmetric state.

* There's a global 256-bit random_int_secret (which we could
  reseed periodically).

To generate a random number (a code sketch follows this list):
* If get_random_int_hash is all-zero, seed it with a fresh half-sized
  SipHash key and the appropriate XOR constants.
* Generate three words of random_get_entropy(), jiffies, and current->pid.
  (This is arbitrary seed material, copied from the current code.)
* Crank through that with (Half)SipHash-1-0.
* Crank through the random_int_secret with (Half)SipHash-1-0.
* Return v1 ^ v3.
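
A user-space sketch of that sequence, in the HalfSipHash (32-bit)
variant.  The round function is the standard HalfSipHash SipRound;
everything else (names, the seeding and packing details) is
illustrative:

#include <stdint.h>

#define ROL32(x, n) ((x) << (n) | (x) >> (32 - (n)))

static void hsiphash_round(uint32_t v[4])
{
	v[0] += v[1]; v[1] = ROL32(v[1], 5);  v[1] ^= v[0]; v[0] = ROL32(v[0], 16);
	v[2] += v[3]; v[3] = ROL32(v[3], 8);  v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROL32(v[3], 7);  v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROL32(v[1], 13); v[1] ^= v[2]; v[2] = ROL32(v[2], 16);
}

static uint32_t v[4];			/* per-CPU get_random_int_hash[] */
static uint32_t random_int_secret[8];	/* the global 256-bit secret */

static uint32_t get_random_int_sketch(const uint32_t entropy[3])
{
	int i;

	/* (Seeding an all-zero state is omitted here.) */

	/* SipHash-1-0: one SipRound per input word, no extra rounds */
	for (i = 0; i < 3; i++) {	/* timer, jiffies, pid */
		v[3] ^= entropy[i];
		hsiphash_round(v);
		v[0] ^= entropy[i];
	}
	for (i = 0; i < 8; i++) {	/* the global secret, last */
		v[3] ^= random_int_secret[i];
		hsiphash_round(v);
		v[0] ^= random_int_secret[i];
	}
	return v[1] ^ v[3];
}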

Here are the reasons:
* The first step is just paranoia, but SipHash's security promise depends
  on starting with an asymmetric state, we want unique per-CPU states,
  and it's a one-time cost.
* When the input words are themselves secret, there's no security
  advantage, and almost no speed advantage, to doing two rounds for one
  input word versus two words with one round each.  Thus, SipHash-1.
* The above is not exactly true, due to the before+after XOR pattern
  that SipHash uses, but I think it's true anyway.
* Likewise, there's no benefit to unkeyed finalization rounds over keyed
  ones.  That's why I just enlarged the global secret.
* The per-call seed material is hashed first on general principles,
  because that's the novel part that might have fresh entropy.
* To the extent the initial state is secret, the rounds processing the
  global secret are 4 finalization rounds for the initial state and
  the per-call entropy.
* The final word(s) of the global secret might be vulnerable to analysis,
  due to incomplete mixing, but since the global secret is always hashed
  in the same order, and larger than the desired security level, the
  initial words should be secure.
* By carrying forward the full internal state, we ensure that repeated
  calls return different results, and to the extent that the per-call
  seed material has entropy, it's preserved.
* The final return is all that's needed, since the last steps in the 
  SipRound are "v1 ^= v2" and "v3 ^= v0".  It's no security loss,
  and a very minor speedup.
* Also, this avoids directly "exposing" the final XOR with the last
  word of the global secret (which is made to v0).

If I'm allowed to use full SipHash, some shortcuts can be taken,
but I believe the above would be secure with HalfSipHash.

If additional performance is required, I'd consider shrinking the
global secret to 192 bits on 32-bit machines but I want more than
128 bits of key material, and enough rounds to be equivalent to 4
finalization rounds.


Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Theodore Ts'o wrote:
> On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote:
>> SipHash annihilates the competition on 64-bit superscalar hardware.
>> SipHash dominates the field on 64-bit in-order hardware.
>> SipHash wins easily on 32-bit hardware *with enough registers*.
>> On register-starved 32-bit machines, it really struggles.

> And "with enough registers" includes ARM and MIPS, right?

Yes.  As a matter of fact, 32-bit ARM does particularly well
on 64-bit SipHash due to its shift+op instructions.

There is a noticeable performance drop, but nothing catastrophic.

The main thing I've been worried about is all the flow tracking
and NAT done by small home routers, and that's addressed by using
HalfSipHash for the hash tables.  They don't *initiate* a lot of
TCP sessions.

> So the only
> real problem is 32-bit x86, and you're right, at that point, only
> people who might care are people who are using a space-radiation
> hardened 386 --- and they're not likely to be doing high throughput
> TCP connections.  :-)

The only requirement on performance is "don't make DaveM angry." :-)

I was just trying to answer the question of why we *worried* about the
performance, not specifically argue that we *should* use HalfSipHash.


Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Eric Dumazet wrote:
> Now I am quite confused.
>
> George said :
>> Cycles per byte on 1024 bytes of data:
>>                  Pentium  Core 2  Ivy
>>                  4        Duo     Bridge
>> SipHash-2-4      38.9     8.3     5.8
>> HalfSipHash-2-4  12.7     4.5     3.2
>> MD5               8.3     5.7     4.7
>
> That really was for 1024 bytes blocks, so pretty much useless for our
> discussion ?

No, they're actually quite relevant, but you have to interpret them
correctly.  I thought I explained in the text following that table,
but let me make it clearer:

To find the time to compute the SipHash of N bytes, round (N+17) up to
the next multiple of 8 bytes and multiply by the numbers above.

To find the time to compute the HalfSipHash of N bytes, round (N+9) up to
the next multiple of 4 bytes and multiply by the numbers above.

To find the time to compute the MD5 of N bytes, round (N+9) up to the
next multiple of 64 bytes and multiply by the numbers above.

It's the different rounding rules that make all the difference.  For small
input blocks, SipHash can be slower per byte yet still faster because
it hashes fewer bytes.
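
As a worked example (my arithmetic, using the P4 column above): a
12-byte input such as an IPv4 address/port pair costs SipHash
12+17 = 29, rounded up to 32 bytes, times 38.9 = ~1245 cycles, but
MD5 12+9 = 21, rounded up to 64 bytes, times 8.3 = ~531 cycles.
SipHash hashes half as many bytes, yet still takes well over twice
as long on that machine.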

> Reading your numbers last week, I thought SipHash was faster, but George
> numbers are giving the opposite impression.

SipHash annihilates the competition on 64-bit superscalar hardware.
SipHash dominates the field on 64-bit in-order hardware.
SipHash wins easily on 32-bit hardware *with enough registers*.
On register-starved 32-bit machines, it really struggles.

As I explained, in that last case, SipHash barely wins at all.
(On a P4, it actually *loses* to MD5, not that anyone cares.  Running
on a P4 and caring about performance are mutually exclusive.)


Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Linus wrote:
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.

I think I've been thoroughly dissuaded, but just to clarify one thing
that resembles a misunderstanding:

> In contrast, in reality, especially with things like "do it once or
> twice per incoming packet", you'll easily hit the absolute worst
> cases, where not only does it take a few hundred cycles to save the FP
> state, you'll then return to user space in between packets, which
> triggers the slow-path return code and reloads the FP state, which is
> another few hundred cycles plus.

Everything being discussed is per-TCP-connection overhead, *not* per
packet.  (Twice for outgoing connections, because one is to generate
the ephemeral port number.)

I know you know this, but I don't want anyone spectating to be confused
about it.


Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Actually, DJB just made a very relevant suggestion.

As I've mentioned, the 32-bit performance problems are an x86-specific
problem.  ARM does very well, and other processors aren't bad at all.

SipHash fits very nicely (and runs very fast) in the MMX registers.

They're 64 bits, and there are 8 of them, so the integer registers can
be reserved for pointers and loop counters and all that.  And there's
reference code available.

How much does kernel_fpu_begin()/kernel_fpu_end() cost?

Although there are a lot of pre-MMX x86es in embedded control applications,
I don't think anyone is worried about their networking performance.
(Specifically, all of this affects only connection setup, not throughput 
on established connections.)


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                  Pentium  Core 2  Ivy
>>                  4        Duo     Bridge
>> SipHash-2-4      38.9     8.3     5.8
>> HalfSipHash-2-4  12.7     4.5     3.2
>> MD5               8.3     5.7     4.7
>
> So definitely not faster.
> 
> 38 cycles per byte is a problem, considering IPV6 is ramping up.

As I said earlier, SipHash performance on 32-bit x86 really sucks,
because it wants an absolute minimum of 9 32-bit registers (8 for the
state plus one temporary for the rotates), and x86 has 7.

> What about SHA performance (syncookies) on P4 ?

I recompiled with -mtune=pentium4 and re-ran.  MD5 time went *up* by
0.3 cycles/byte, HalfSipHash went down by 1 cycle, and SipHash didn't
change:

Cycles per byte on 1024 bytes of data:
                 Pentium  Core 2  Ivy
                 4        Duo     Bridge
SipHash-2-4      38.9     8.3     5.8
HalfSipHash-2-4  11.5     4.5     3.2
MD5               8.6     5.7     4.7
SHA-1            19.0     8.0     6.8

(This is with a verbatim copy of the lib/sha1.c code; I might be
able to optimize it with some asm hackery.)

Anyway, you see why we were looking longingly at HalfSipHash.


In fact, I have an idea.  Allow me to make the following concrete
suggestion for using HalfSipHash with 128 bits of key material:

- 64 bits are used as the key.
- The other 64 bits are used as an IV which is prepended to
  the message to be hashed.

As a matter of practical implementation, we precompute the effect
of hashing the IV and store the 128-bit HalfSipHash state, which
is used just like a 128-bit key.
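
In code, the setup might look like this sketch (hsiphash_init() and
hsiphash_round() stand for the standard HalfSipHash initialization and
SipRound; the IV's padding is glossed over, as discussed below):

#include <stdint.h>

extern void hsiphash_init(uint32_t state[4], const uint32_t key[2]);
extern void hsiphash_round(uint32_t v[4]);

/* Precompute the state after absorbing the 64-bit IV; the stored
 * 128-bit state is then used exactly like a precomputed key. */
void hsiphash_setkey128(uint32_t state[4], const uint32_t key128[4])
{
	int i;

	hsiphash_init(state, &key128[0]);	/* first 64 bits: the key */
	for (i = 2; i < 4; i++) {		/* last 64 bits: the IV */
		state[3] ^= key128[i];
		hsiphash_round(state);		/* two rounds per word,  */
		hsiphash_round(state);		/* as in HalfSipHash-2-x */
		state[0] ^= key128[i];
	}
}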

Because of the way it is constructed, it is obviously no weaker than
standard HalfSipHash's 64-bit security claim.

I don't know the security of this, and it's almost certainly weaker than
128 bits, but I *hope* it's at least a few bits stronger than 64 bits.
80 would be enough to dissuade any attacker without a six-figure budget
(that's per attack, not a one-time capital investment).  96 would be
ample for our purposes.

What I do know is that it makes a brute-force attack without
significant cryptanalytic effort impossible.

To match the spec exactly, we'd need to add the 8-byte IV length to
the length byte which pads the final block, but from a security point
of view, it does not matter.  As long as we are consistent within any
single key, any unique mapping between padding byte and message length
(mod 256) is equally good.

We may choose based on implementation convenience.

(Also note my earlier comments about when it is okay to omit the padding
length byte entirely: any time all the data to be hashed with a given
key is fixed in format or self-delimiting (e.g. null-terminated).
This applies to many of the networking uses.)


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.

Because on 32-bit x86, it's slower.

Cycles per byte on 1024 bytes of data:
                 Pentium  Core 2  Ivy
                 4        Duo     Bridge
SipHash-2-4      38.9     8.3     5.8
HalfSipHash-2-4  12.7     4.5     3.2
MD5               8.3     5.7     4.7

SipHash is more parallelizable and runs faster on superscalar processors,
but MD5 is optimized for 2000-era processors, and is faster on them than
HalfSipHash even.

Now, in the applications we care about, we're hashing short blocks, and
SipHash has the advantage that it can hash less than 64 bytes.  But it
also pays a penalty on short blocks for the finalization, equivalent to
two words (16 bytes) of input.

It turns out that on both Ivy Bridge and Core 2 Duo, the crossover happens
between 23 (SipHash is faster) and 24 (MD5 is faster) bytes of input.

This is assuming you're adding the 1 byte of length padding to SipHash's
input, so 24 bytes pads to 4 64-bit words, which makes 2*4+4 = 12 rounds,
vs. one block for MD5.  (MD5 takes a similar jump between 55 and 56 bytes.)

On a P4, SipHash is *never* faster; it takes 2.5x longer than MD5 on a
12-byte block (an IPv4 address/port pair).

This is why there was discussion of using HalfSipHash on these machines.
(On a P4, the HalfSipHash/MD5 crossover is somewhere between 24 and 31
bytes; I haven't benchmarked every possible size.)


Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?

> Um, why do TCP sequence numbers need long-term security?  So long as
> you rekey every 5 minutes or so, TCP sequence numbers don't need any
> more security than that, since even if you break the key used to
> generate initial sequence numbers seven a minute or two later, any
> pending TCP connections will have timed out long before.
> 
> See the security analysis done in RFC 6528[1], where among other
> things, it points out why MD5 is acceptable with periodic rekeying,
> although there is the concern that this could break certain heuristics
> used when establishing new connections during the TIME-WAIT state.

Because we don't rekey TCP sequence numbers, ever.  See commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec

To rekey them requires dividing the sequence number base into a "random"
part and some "generation" msbits.  While we can do better than the
previous 8+24 split (I'd suggest 4+28 or 3+29), using only 2 generation
bits is tricky, and 1 generation bit isn't enough.

So while it helps in the long term, it reduces the security offered by
the random part in the short term.  (If I know 4 bits of your ISN,
I only need to send 256 MB to hit your TCP window.)

At the time, I objected, and suggested doing two hashes, with a fixed
32-bit base plus a split rekeyed portion, but that was vetoed on the
grounds of performance.

On further consideration, the fixed base doesn't help much.
(Details below for anyone that cares.)



Suppose we let the TCP initial sequence number be:

(Hash(<saddr, daddr, sport, dport>, fixed_key) & 0xffffffff) +
(i << 28) + (Hash(<saddr, daddr, sport, dport>, key[i]) & 0x0fffffff) +
(current_time_in_nanoseconds / 64)

It's not hugely difficult to mount an effective attack against a
64-bit fixed_key.

As an attacker, I can ask the target to send me these numbers for dstPort
values I control and other values I know.  I can (with high probability)
detect the large jumps when the generation changes, so I can make a
significant number of queries with the same generation.  After 23-ish
queries, I have enough information to identify a 64-bit fixed_key.

I don't know the current generation counter "i", but I know it's the
same for all my queries, so for any two queries, the maximum difference
between the 28-bit hash values is 29 bits.  (We can also add a small
margin to allow for timing uncertainty, but that's even less.)

So if I guess a fixed key, hash my known plaintexts with that guess,
subtract the ciphertexts from the observed sequence numbers, and the
difference between the remaining (unknown) 28-bit hash values plus
timestamps exceeds what's possible, my guess is wrong.

I can then repeat with additional known plaintexts, reducing the space
of admissible keys by about 3 bits each time.

Assuming I can rent GPU horsepower from a bitcoin miner to do this in a
reasonable period of time, after 22 known plaintext differences, I have
uniquely identified the key.

Of course, in practice I'd do a first pass with maybe 6 plaintexts
on the GPU, and then deal with the candidates found in a second pass.
But either way, it's about 2.3 SipHash evaluations per key tested.
As I noted earlier, a bitcoin blockchain block, worth 25 bitcoins,
currently costs 2^71 evaluations of SHA-2 (2^70 evaluations of double
SHA-2), and one is produced every 10 minutes, so this is definitely
practical.


RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>>  uint32_t c = key[0];
>>  uint32_t d = key[1];
>>  uint32_t a = 0x6c796765 ^ 0x736f6d65;
>>  uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

> I've not looked closely, but is that (in some sense) duplicating
> the key length?
> So you could set a = key[2] and b = key[3] and still have a
> working hash - albeit not exactly the one specified.

That's tempting, but not necessarily effective.  (A similar unsuccessful
idea can be found in discussions of "DES with independent round keys".
Or see the design discussion of Salsa20 and the constants in its input.)

You can increase the key size, but that might not increase the *security*
any.

The big issue is that there are a *lot* of square-root attacks in
cryptanalysis.  Because SipHash's state is twice the size of the key,
such an attack will have the same complexity as key exhaustion and need
not be considered.  To make a stronger security claim, you need to start
working through them all and show that they don't apply.

For SipHash in particular, an important property is asymmetry of the
internal state.  That's what duplicating the key with XORs guarantees.
If the two halves of the state end up identical, the mixing is much
weaker.

Now the probability of ending up in a "mirror state" is the square
root of the state size (1 in 2^64 for HalfSipHash's 128-bit state),
which is the same probability as guessing a key, so it's not a
problem that has to be considered when making a 64-bit security claim.

But if you want a higher security level, you have to think about
what can happen.

That said, I have been thinking very hard about

a = c ^ 0x48536970; /* 'HSip' */
d = key[2];

By guaranteeing that a and c are different, we get the desired
asymmetry, and the XOR of b and d is determined by the first word of
the message anyway, so this isn't weakening anything.

96 bits is far beyond the reach of any brute-force attack, and if a
more sophisticated 64-bit attack exists, it's at least out of the reach
of the script kiddies, and will almost certainly have a non-negligible
constant factor and more limits on when it can be applied.
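
Concretely, the initialization I have in mind would be (a sketch; my
reading of the idea above, with the rest of hsiphash24() unchanged):

	uint32_t c = key[0];
	uint32_t b = key[1] ^ 0x74656462 ^ 0x646f7261;
	uint32_t a = c ^ 0x48536970;	/* 'HSip': guarantees a != c */
	uint32_t d = key[2];		/* third key word: 96 bits total */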

> Is it worth using the 32bit hash for IP addresses on 64bit systems that
> can't do misaligned accesses?

Not a good idea.  To hash 64 bits of input:

* Full SipHash has to do two loads, a shift, an or, and two rounds of mixing.
* HalfSipHash has to do a load, two rounds, another load, and two more rounds.

In other words, in addition to being less secure, it's half the speed.  

Also, what systems are you thinking about?  x86, ARMv8, PowerPC, and
S390 (and ia64, if anyone cares) all handle unaligned loads.  MIPS has
efficient support.  Alpha and HPPA are for retrocomputing fans, not
people who care about performance.

So you're down to SPARC.  Which conveniently has the same maintainer as
the networking code, so I figure DaveM can take care of that himself. :-)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles= 10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  3400 cpb=170.0000 (sum=146a863e)
 SipHash: max=   8 cycles= 24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=  2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=  5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles= 80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
return *p;
}


#else
#include <linux/bitops.h>	/* For rol64 */
#include <linux/types.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *            i386    i386    i386  x86_64  x86_64  x86_64  x86_64
 * Length    small  unroll halfmd4   small  unroll halfmd4 teahash
 * 1..4       1069    1029    1608     195     160     399     690
 * 1..8       2483    2381    3851     410     360     988    1659
 * 1..12      4303    4152    6207     690     618    1642    2690
 * 1..16      6122    5931    8668     968     876    2363    3786

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago.

My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable.  Reducing I-cache impact speeds
up the calling code.

One thing I'd like to suggest you steal is the way it handles the
fetch of the final partial word.  It's a lot smaller and faster than
an 8-way case statement.


#include <linux/bitops.h>	/* For rol64 */
#include <linux/types.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *            i386    i386    i386  x86_64  x86_64  x86_64  x86_64
 * Length    small  unroll halfmd4   small  unroll halfmd4 teahash
 * 1..4       1069    1029    1608     195     160     399     690
 * 1..8       2483    2381    3851     410     360     988    1659
 * 1..12      4303    4152    6207     690     618    1642    2690
 * 1..16      6122    5931    8668     968     876    2363    3786
 * 1..20      8348    8137   11245    1323    1185    3162    5567
 * 1..24     10580   10327   13935    1657    1504    4066    7635
 * 1..28     13211   12956   16803    2069    1871    5028    9759
 * 1..32     15843   15572   19725    2470    2260    6084   11932
 * 1..36     18864   18609   24259    2934    2678    7566   14794
 * 1..1024 5890194 6130242 10264816  881933  881244 3617392 7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575;/* somepseu */
uint64_t b = 0x646f72616e646f6d;/* dorandom */
uint64_t c = 0x6c7967656e657261;/* lygenera */
uint64_t d = 0x7465646279746573;/* tedbytes */
uint64_t m = 0;
uint8_t padbyte = len;

/*
 * Mix in the 128-bit hash seed.  This is in a format convenient
 * to the ext3/ext4 code.  Please feel free to adapt the
 * seed format.
 */
if (seed) {
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m;
d ^= m;
m = seed[0] | (uint64_t)seed[1] << 32;
/* a ^= m; is done in loop below */
c ^= m;
}

/*
 * By using the same SipRound code for all iterations, we
 * save space, at the expense of some branch prediction.  But
 * branch prediction is hard because of variable length anyway.
 */
len = len/8 + 3;/* Now number of rounds to perform */
do {
a ^= m;

switch (--len) {
unsigned bytes;

default:/* Full words */
d ^= m = get_unaligned_le64(in);
in += 8;
break;
case 2: /* Final partial word */
/*
 * We'd like to do one 64-bit fetch rather than
 * mess around with bytes, but reading past the end
 * might hit a protection boundary.  Fortunately,
 * we know that protection boundaries are aligned,
 * so we can consider only three cases:
 * - The remainder occupies zero words
 * - The remainder fits into one word
 * - The remainder straddles two words
 */
bytes = padbyte & 7;

if (bytes == 0) {
m = 0;
} else {
unsigned offset = (unsigned)(uintptr_t)in & 7;

if (offset + bytes <= 8) {
m = le64_to_cpup((uint64_t const *)
(in - offset));
m >>= 8*offset;
} else {
m = get_unaligned_le64(in);
}
m &= ((uint64_t)1 << (8 * bytes)) - 1;

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch.

Do you think it should return "u32" (as you currently have it) or
"unsigned long"?  I thought the latter, since it doesn't cost any
more and makes more sense.

> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sure all rdrand-like
> instructions return native word size anyway.

Well, Intel's can return 16, 32 or 64 bits, and it makes a
small difference with reseed scheduling.

>> - Ted, Andy Lutomirski and I will try to figure out a construction of
>>   get_random_long() that we all like.

> And me, I hope... No need to make this exclusive.

Gaah, engage brain before fingers.  That was so obvious I didn't say
it, and the result came out sounding extremely rude.

A better (but longer) way to write it would be "I'm sorry that I, Ted,
and Andy are all arguing with you and each other about how to do this
and we can't finalize this part yet".


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.

Just to clarify, I was discussing the idea with Ted (who's in charge of
the whole thing, not me), not trying to make any sort of final decision
on the subject.  I need to look at the various users (46 non-trivial ones
for get_random_int, 15 for get_random_long) and see what their security
requirements actually are.

I'm also trying to see if HalfSipHash can be used in a way that gives
slightly more than 64 bits of effective security.

The problem is that the old MD5-based transform had unclear, but
obviously ample, security.  There were 64 bytes of global secret and
16 chaining bytes per CPU.  Adapting SipHash (even the full version)
takes more thinking.

An actual HalfSipHash-based equivalent to the existing code would be:

#define RANDOM_INT_WORDS (64 / sizeof(long))/* 16 or 8 */

static u32 random_int_secret[RANDOM_INT_WORDS]
cacheline_aligned __read_mostly;
static DEFINE_PER_CPU(unsigned long[4], get_random_int_hash)
__aligned(sizeof(unsigned long));
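
/*
 * HSIPROUND below is assumed to be the standard HalfSipHash round
 * (this sketch takes unsigned long to be 32 bits; a 64-bit build
 * would use the full SipHash round instead):
 */
#define HSIPROUND \
	(v0 += v1, v1 = rol32(v1, 5), v1 ^= v0, v0 = rol32(v0, 16), \
	 v2 += v3, v3 = rol32(v3, 8), v3 ^= v2, \
	 v0 += v3, v3 = rol32(v3, 7), v3 ^= v0, \
	 v2 += v1, v1 = rol32(v1, 13), v1 ^= v2, v2 = rol32(v2, 16))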

unsigned long get_random_long(void)
{
unsigned long *hash = get_cpu_var(get_random_int_hash);
unsigned long v0 = hash[0], v1 = hash[1], v2 = hash[2], v3 = hash[3];
int i;

/* This could be improved, but it's equivalent */
v0 += current->pid + jiffies + random_get_entropy();

for (i = 0; i < RANDOM_INT_WORDS; i++) {
v3 ^= random_int_secret[i];
HSIPROUND;
HSIPROUND;
v0 ^= random_int_secret[i];
}
/* To be equivalent, we *don't* finalize the transform */

hash[0] = v0; hash[1] = v1; hash[2] = v2; hash[3] = v3;
put_cpu_var(get_random_int_hash);

return v0 ^ v1 ^ v2 ^ v3;
}

I don't think there's a 2^64 attack on that.

But 64 bytes of global secret is ridiculous if the hash function
doesn't require that minimum block size.  It'll take some thinking.


The advice I'd give now is:
- Implement
unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
  .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
  HalfSipHash on 32-bit.
- Document when it may or may not be used carefully.
- #define get_random_int (unsigned)get_random_long
- Ted, Andy Lutomirski and I will try to figure out a construction of
  get_random_long() that we all like.


('scuse me for a few hours, I have some unrelated things I really *should*
be working on...)


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which might be useful:

You could perhaps save two rounds in siphash_*u64.

The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.

If every use of a given key is of a fixed-size input, you don't need
a length suffix.  When the input is an even number of words, that can
save you two rounds.

This requires an audit of callers (e.g. you have to use different
keys for IPv4 and IPv6 ISNs), but can save time.

(This is crypto 101; search "MD-strengthening" or see the remark on
p. 101 of Damgaard's 1989 paper "A design principle for hash functions" at
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf
but I'm sure that Ted, Jean-Philippe, and/or DJB will confirm if you'd
like.)

Jason A. Donenfeld wrote:
> Oh, okay, that is exactly what I thought was going on. I just thought
> you were implying that jiffies could be moved inside the hash, which
> then confused my understanding of how things should be. In any case,
> thanks for the explanation.

No, the rekeying procedure is cleverer.

The thing is, all that matters is that the ISN increments fast enough,
but doesn't wrap too soon.

It *is* permitted to change the random base, as long as it only
increases, and slower than the timestamp does.

So what you do is every few minutes, you increment the high 4 bits of the
random base and change the key used to generate the low 28 bits.

The base used for any particular host might change from 0x10000000
to 0x2fffffff, or from 0x1fffffff to 0x20000000, but either way, it's
increasing, and not too fast.

This has the downside that an attacker can see 4 bits of the base,
so only needs to send 2^28 = 256 MB to flood the connection,
but the upside is that the key used to generate the low bits changes
faster than it can be broken.
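
A sketch of that rotation (hypothetical names; the timer setup and
locking are omitted):

	static u32 isn_generation;	/* high 4 bits of the random base */
	static u32 isn_key[4];		/* keys the low 28 bits */

	/* Called every few minutes from a timer. */
	static void isn_rekey(void)
	{
		isn_generation = (isn_generation + 1) & 0xf;
		get_random_bytes(isn_key, sizeof(isn_key));
	}

	/* The random base: increases (mod 2^32), slower than the clock. */
	static u32 isn_base(u32 hash28)
	{
		return (isn_generation << 28) | (hash28 & 0x0fffffff);
	}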


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.

This is tricky.  The entire get_random_int() structure is an abuse of
the hash function and will need to be thoroughly rethought to convert
it to SipHash.  Remember, SipHash's security goals are very different
from MD5, so there's no obvious way to do the conversion.

(It's *documented* as "not cryptographically secure", but we know
where that goes.)

> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

Remembering that on "real" machines it's full SipHash, then I'd say that
64-bit security + rekeying seems reasonable.

The question is, the idea has recently been floated to make hsiphash =
SipHash-1-3 on 64-bit machines.  Is *that* okay?


The annoying thing about the currently proposed patch is that the *only*
chaining is the returned value.  What I'd *like* to do is the same
pattern as we do with md5, and remember v[0..3] between invocations.
But there's no partial SipHash primitive; we only get one word back.

Even
*chaining += ret = siphash_3u64(...)

would be an improvement.

Although we could do something like

c0 = chaining[0];
chaining[0] = c1 = chaining[1];

ret = hsiphash(c0, c1, ...)

chaining[1] = c0 + ret;
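
Fleshed out slightly (a sketch; hsiphash_2u32() and random_int_key are
assumed here, not existing interfaces):

	static DEFINE_PER_CPU(u32[2], hash_chain);

	u32 get_random_int(void)
	{
		u32 *chain = get_cpu_var(hash_chain);
		u32 c0 = chain[0], c1 = chain[1], ret;

		ret = hsiphash_2u32(c0, c1, &random_int_key);
		chain[0] = c1;		/* slide the chaining window */
		chain[1] = c0 + ret;	/* mix the output back in */
		put_cpu_var(hash_chain);
		return ret;
	}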


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I haven't studied
> how it works. If that jiffies should be part of the siphash input and
> not added to the result, please tell me. Otherwise I'll keep things
> how they are to avoid breaking something that seems to be working.

Oh, geez, I didn't realize you didn't understand this code.

Full details at
https://en.wikipedia.org/wiki/TCP_sequence_prediction_attack

But yes, the sequence number is supposed to be (random base) + (timestamp).
In the old days before Canter & Siegel when the internet was a nice place,
people just used a counter that started at boot time.

But then someone observed that I can start a connection to host X,
see the sequence number it gives back to me, and thereby learn the
sequence number it's using on its connections to host Y.

And I can use that to inject forged data into an X-to-Y connection,
without ever seeing a single byte of the traffic!  (If I *can* observe
the traffic, of course, none of this makes the slightest difference.)

So the random base was made a keyed hash of the endpoint identifiers.
(Practically only the hosts matter, but generally the ports are thrown
in for good measure.)  That way, the ISN that host X sends to me
tells me nothing about the ISN it's using to talk to host Y.  Now the
only way to inject forged data into the X-to-Y connection is to
send 2^32 bytes, which is a little less practical.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might be closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

What are you testing on?  And what input size?  And does "33% improvement"
mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

These are very odd results.  On a 64-bit machine, SipHash should be the
same speed per round, and faster because it hashes more data per round.
(Unless you're hitting some unexpected cache/decode effect due to REX
prefixes.)

On a 32-bit machine (other than ARM, where your results might make sense,
or maybe if you're hashing large amounts of data), the difference should
be larger.

And yes, there is a *significant* security loss.  SipHash is 128 bits
("don't worry about it").  hsiphash is 64 bits, which is known breakable
("worry about it"), so we have to do a careful analysis of the cost of
a successful attack.

As mentioned in the e-mails that just flew by, hsiphash is intended
*only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
machines, it will be implemented as an alias for SipHash and the security
concerns will Just Go Away.

The place where hsiphash is expected to make a big difference is 32-bit
x86.  If you only see 33% difference with "gcc -m32", I'm going to be
very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always.  Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.

> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
> like this arrangement.

This is a basic assumption I make in the security analysis below:
on most machines, it's 128-bit-key SipHash everywhere and we can
consider security solved.

Our analysis *only* has to consider 32-bit machines.  My big concern
is home routers, with IoT appliances coming second.  The routers have
severe hardware cost constraints (so limited CPU power), but see a lot
of traffic and need to process (NAT) it.

> That's a nice analysis. Might one conclude from that that hsiphash is
> not useful for our purposes? Or does it still remain useful for
> network facing code?

I think for attacks where the threat is a DoS, it's usable.  The point
is you only have to raise the cost to equal that of a packet flood.
(Just like in electronic warfare, the best you can possibly do is force
the enemy to use broadband jamming.)

Hash collision attacks just aren't that powerful.  The original PoC
was against an application that implemented a hard limit on hash chain
length as a DoS defense, which the attack then exploited to turn it into
a hard DoS.

>> Let me consider your second example above, "secure against local users".
>> I should dig through your patchset and find the details, but what exactly
>> are the consequences of such an attack?  Hasn't a local user already
>> got much better ways to DoS the system?

> For example, an unpriv'd user putting lots of entries in one hash
> bucket for a shared resource that's used by root, like filesystems or
> other lookup tables. If he can cause root to use more of root's cpu
> schedule budget than otherwise in a directed way, then that's a bad
> DoS.

This issue was recently discussed when we redesigned the dcache hash.
Even a successful attack doesn't slow things down all *that* much.

Before you overkill every hash table in the kernel, think about whether
it's a bigger problem than the dcache.  (Hint: it's probably not.)
There's no point armor-plating the side door when the front door was
just upgraded from screen to wood.

>> These days, 32-bit CPUs are for embedded applications: network appliances,
>> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
>> Is this really a threat that needs to be defended against?

> I interpret this to indicate all the more reason to alias hsiphash to
> siphash on 64-bit, and then the problem space collapses in a clear
> way.

Yes, exactly.  

> Right. Hence the need for always using full siphash and not hsiphash
> for sequence numbers, per my earlier email to David.
>
>> I wish we could get away with 64-bit security, but given that the
>> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
>> it's just not enough.
>
> I take this comment to be relevant for the sequence number case.

Yes.

> For hashtables and hashtable flooding, is it still your opinion that
> we will benefit from hsiphash? Or is this final conclusion a rejection
> of hsiphash for that too? We're talking about two different use cases,
> and your email kind of interleaved both into your analysis, so I'm not
> certain as to precisely what your conclusion is for each use case. Can
> you clear up the ambiguity?

My (speaking generally; I should walk through every hash table you've
converted) opinion is that:

- Hash tables, even network-facing ones, can all use hsiphash as long
  as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
  0, and the consequences of a successful attack are only more collisions
  (timing).  While the attack is only 2x the cost (two hashes rather than
  one to test a key), the knowledge of the collision is statistical,
  especially for network attackers, which raises the cost of guessing
  beyond an even more brute-force attack.
- When the hash value is directly visible (e.g. included in a network
  packet), full SipHash should be the default.
- Syncookies *could* use hsiphash, especially as there are
  two keys in there.  Not sure if we need the performance.
- For TCP ISNs, I'd prefer to use full SipHash.  I know this is
  a very hot path, and if that's a performance bottleneck,
  we can work harder on it.

In particular, TCP ISNs *used* to rotate the key periodically,
limiting the time available to an attacker to perform an
attack before the secret goes stale and is useless.  commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
the key rotation.

If 2x hsiphash is faster than siphash, we could use a double-hashing
system like syncookies.  One 32-bit hash with a permanent key, summed
with a k-bit counter and a (32-k)-bit hash, where the key is rotated
(and the counter incremented) periodically.
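
In code, the split might look like this (a sketch; hsiphash4(), the two
keys, and the counter are placeholders, with k = 4):

	/* Permanent 32-bit hash plus a 4-bit counter and 28-bit hash. */
	static u32 tcp_isn_split(u32 saddr, u32 daddr, u16 sport, u16 dport)
	{
		u32 fixed = hsiphash4(saddr, daddr, sport, dport, &permanent_key);
		u32 low   = hsiphash4(saddr, daddr, sport, dport, &rotating_key);

		return fixed + (isn_counter << 28) + (low & 0x0fffffff);
	}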

The requirement is that the increment rate of the counter hash doesn't

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.

A 128-bit output option was added to SipHash after the initial publication;
this is just the equivalent in 32-bit.

> - Is this a reasonable choice?

Yes.

> - Are there reasons why hsiphash with 64-bit output would be
>   reasonable? Or will we be fine sticking with 32-bit output only?

Personally, I'd put in a comment saying that "there's a 64-bit output
variant that's not implemented" and punt until someone finds a need.

> With both hsiphash and siphash, the division of usage will probably become:
> - Use 64-bit output 128-bit key siphash for keyed RNG-like things,
>   such as syncookies and sequence numbers
> - Use 64-bit output 128-bit key siphash for hashtables that must
>   absolutely be secure to an extremely high bandwidth attacker, such as
>   userspace directly DoSing a kernel hashtable
> - Use 32-bit output 64-bit key hsiphash for quick hashtable functions
>   that still must be secure but do not require as large of a security
>   margin.

On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
should be used always.  Don't even compile the 32-bit code, to prevent
anyone accidentally using it, and make hsiphash an alias for siphash.

On a 32-bit machine, it's a much trickier case.  I'd be tempted to
use the 32-bit code always, but it needs examination.

Fortunately, the cost of brute-forcing hash functions can be fairly
exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
2^63 hashes cost $152.  (2^63 is 1/128 of 2^70, and $19,500/128 is
about $152.)

Now, there are two factors that must be considered:
- That's a very very "wholesale" rate.  That's assuming you're doing
  large numbers of these and can put in the up-front effort designing
  silicon ASICs to do the attack.
- That's for a more difficult hash (double sha-256) than SipHash.
  That's a constant factor, but a pretty significant one.  If the wholesale
  assumption holds, that might bring the cost down another 6 or 7 bits,
  to $1-2 per break.

If you're not the NSA and limited to general-purpose silicon, let's
assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem to do better
than nVidia).  The bitcoin mining rate for those is about 700M/second,
29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
to account for SipHash's high speed compared to two rounds of SHA-2.
Call it 1000 GPU-days.

It's very doable, but also very non-trivial.  The question is, wouldn't
it be cheaper and easier just to do a brute-force flooding DDoS?

(This is why I wish the key size could be tweaked up to 80 bits.
That would take all these numbers out of the reasonable range.)


Let me consider your second example above, "secure against local users".
I should dig through your patchset and find the details, but what exactly
are the consequences of such an attack?  Hasn't a local user already
got much better ways to DoS the system?

The thing to remember is that we're worried only about the combination
of a *new* Linux kernel (new build or under active maintenance) and a
32-bit host.  You'd be hard-pressed to find a *single* machine fitting
that description which is hosting multiple users or VMs and is not 64-bit.

These days, 32-bit CPUs are for embedded applications: network appliances,
TVs, etc.  That means basically single-user.  Even phones are 64 bit.
Is this really a threat that needs to be defended against?


For your first case, network applications, the additional security
is definitely attractive.  Syncookies are only a DoS, but sequence
numbers are a real security issue; they can let you inject data into a
TCP connection.

Hash tables are much harder to attack.  The information you get back from
timing probes is statistical, and thus testing a key is more expensive.
With sequence numbers, a large amount (32 bits) of the hash output is
directly observable.

I wish we could get away with 64-bit security, but given that the
modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
it's just not enough.


RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote:
> Isn't that equivalent to:
>   v0 = key[0];
>   v1 = key[1];
>   v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
>   v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);

(Pre-XORing key[] with the first two constants which, if the constants
are random in the first place, can be a no-op.)  Other than the typo
in the v2 line, yes.  If the key is non-public, then you can xor an
arbitrary constant into both halves to slightly speed up the startup.

(Nits: you don't need to parenthesize associative operators like xor,
and the "ull" suffix is redundant here.)

> Those constants also look like ASCII strings.

They are.  The ASCII is "somepseudorandomlygeneratedbytes".

> What cryptographic analysis has been done on the values?

They're "nothing up my sleeve numbers".

They're arbitrary numbers, and almost any other values would do exactly
as well.  The main properties are:

1) They're different (particularly v0 != v2 and v1 != v3), and
2) Neither they, nor their xor, is rotationally symmetric like
   0x5555555555555555.
   (Because SipHash is mostly rotationally symmetric, broken only by the
   interruption of the carry chain at the msbit, it helps slightly
   to break this up at the beginning.)

Those exact values only matter for portability.  If you don't need anyone
else to be able to compute matching outputs, then you could use any other
convenient constants (like the MD5 round constants).


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
Jean-Philippe Aumasson wrote:
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

It would be fairly significant, a 2x speed benefit on a lot of 32-bit
machines.

First is the fact that a 64-bit SipHash round on a generic 32-bit machine
requires not twice as many instructions, but more than three times as many.

Consider the core SipHash quarter-round operation:
a += b;
b = rotate_left(b, k)
b ^= a

The add and xor are equivalent between 32- and 64-bit rounds; twice the
instructions do twice the work.  (There's a dependency via the carry
bit between the two halves of the add, but it ends up not being on the
critical path even in a superscalar implementation.)

The problem is the rotates.  Although some particularly nice code is
possible on 32-bit ARM due to its support for shift-and-xor operations,
on a generic 32-bit CPU the rotate grows to 6 instructions with a 2-cycle
dependency chain (more in practice because barrel shifters are large and
even quad-issue CPUs can't do 4 shifts per cycle):

temp_lo = b_lo >> (32-k)
temp_hi = b_hi >> (32-k)
b_lo <<= k
b_hi <<= k
b_lo ^= temp_hi
b_hi ^= temp_lo
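
Or, as compilable C (the same sequence; k must be a constant with
0 < k < 32):

	/* 64-bit rotate-left by k on a machine with 32-bit words. */
	static inline void rol64_hilo(uint32_t *hi, uint32_t *lo, unsigned k)
	{
		uint32_t temp_lo = *lo >> (32 - k);
		uint32_t temp_hi = *hi >> (32 - k);

		*lo = (*lo << k) ^ temp_hi;
		*hi = (*hi << k) ^ temp_lo;
	}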

The resultant instruction counts and (assuming wide issue)
latencies are:

		64-bit SipHash		"Half" SipHash
		Inst.	Latency		Inst.	Latency
		 10	  3		  3	  2	Quarter round
		 40	  6		 12	  4	Full round
		 80	 12		 24	  8	Two rounds
		 82	 13		 26	  9	Mix in one word
		 82	 13		 52	 18	Mix in 64 bits
		166	 26		 61	 18	Four round finalization + final XOR
		248	 39		113	 36	Hash 64 bits
		330	 52		165	 54	Hash 128 bits
		412	 65		217	 72	Hash 192 bits

While the ideal latencies are actually better for the 64-bit algorithm,
that requires an unrealistic 6+-wide superscalar implementation that's
more than twice as wide as the 64-bit code requires (which is already
optimized for quad-issue).  For a 1- or 2-wide processor, the instruction
counts dominate, and not only does the 64-bit algorithm take 60% more
time to mix in the same number of bytes, but the finalization rounds
bring the ratio to 2:1 for small inputs.

(And I haven't included the possible savings if the input size is an odd
number of 32-bit words, such as networking applications which include
the source/dest port numbers.)


Notes on particular processors:
- x86 can do a 64-bit rotate in 3 instructions and 2 cycles using
  the SHLD/SHRD instructions instead:
	movl	%b_hi, %temp
	shldl	$k, %b_lo, %b_hi
	shldl	$k, %temp, %b_lo
  ... but as I mentioned the problem is registers.  SipHash needs 8 32-bit
  words plus at least one temporary, and 32-bit x86 has only 7 available.
  (And compilers can rarely manage to keep more than 6 of them busy.)
- 64-bit SipHash is particularly efficient on 32-bit ARM due to its
  support for shift-and-op instructions.  The 64-bit shift and following
  xor can be done in 4 instructions.  So the only benefit is from the
  reduced finalization.
- Double-width adds cost a little more on CPUs like MIPS and RISC-V without
  condition codes.
- Certain particularly crappy uClinux processors with slow shifts
  (68000, anyone?) really suffer from extra shifts.

One *weakly* requested feature: It might simplify some programming
interfaces if we could use the same key for multiple hash tables with a
1-word "tweak" (e.g. pointer to the hash table, so it could be assumed
non-zero if that helped) to make distinct functions.  That would let us
more safely use a global key for multiple small hash tables without the
need to add code to generate and store key material for each place that
an unkeyed hash is replaced.
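
Something like the following is what I mean (a sketch; hsiphash() as
proposed elsewhere in this thread, the wrapper itself hypothetical):

	/* One global key, many tables: fold a per-table tweak into the
	 * key.  A real version would mix the tweak into the initial
	 * state instead of deriving a key copy on every call. */
	static inline unsigned long
	hsiphash_tweaked(const void *data, size_t len, unsigned long tweak,
			 const unsigned long key[2])
	{
		unsigned long k[2] = { key[0] ^ tweak, key[1] };

		return hsiphash(data, len, k);
	}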


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.

I was thinking if the key could be pushed to 80 bits, that would be nice,
but honestly 64 bits is fine.  This is DoS protection, and while it's
possible to brute-force a 64 bit secret, there are more effective (DDoS)
attacks possible for the same cost.

(I'd suggest a name of "HalfSipHash" to convey the reduced security
effectively.)

> Regarding output size, are 64 bits sufficient?

As a replacement for jhash, 32 bits are sufficient.  It's for
indexing an in-memory hash table on a 32-bit machine.


(When you're done thinking about this, as a matter of personal interest
I'd love a hash expert's opinion on
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a18da7a9c7886f1c7307f8d3f23f24318583f03
and
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8387ff2577eb9ed245df9a39947f66976c6bcd02
which is a non-cryptographic hash function of novel design that's
inspired by SipHash.)


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash usage
> poses a real security risk.

To quantify that, jhash is 27 instructions per 12 bytes of input, with a
dependency path length of 13 instructions.  (24/12 in __jhash_mix, plus
3/1 for adding the input to the state.) The final add + __jhash_final
is 24 instructions with a path length of 15, which is close enough for
this handwaving.  Call it 18n instructions and 8n cycles for 8n bytes.

SipHash (on a 64-bit machine) is 14 instructions with a dependency path
length of 4 *per round*.  Two rounds per 8 bytes, plus two adds
and one cycle per input word, plus four rounds to finish makes 30n+46
instructions and 9n+16 cycles for 8n bytes.

So *if* you have a 64-bit 4-way superscalar machine, it's not that much
slower once it gets going, but the four-round finalization is quite
noticeable for short inputs.

For typical kernel input lengths "within a factor of 2" is
probably more accurate than "a tiny bit".

You lose a factor of 2 if your machine is 2-way or non-superscalar,
and a second factor of 2 if it's a 32-bit machine.

I mention this because there are a lot of home routers and other network
appliances running Linux on 32-bit ARM and MIPS processors.  For those,
it's a factor of *eight*, which is a lot more than "a tiny bit".

The real killer is if you don't have enough registers; SipHash performs
horribly on i386 because it uses more state than i386 has registers.

(If i386 performance is desired, you might ask Jean-Philippe for some
rotate constants for a 32-bit variant with 64 bits of key.  Note that
SipHash's security proof requires that key length + input length is
strictly less than the state size, so for a 4x32-bit variant, while
you could stretch the key length a little, you'd have a hard limit at
95 bits.)


A second point, the final XOR in SipHash is either a (very minor) design
mistake, or an opportunity for optimization, depending on how you look
at it.  Look at the end of the function:

>+  SIPROUND;
>+  SIPROUND;
>+  return (v0 ^ v1) ^ (v2 ^ v3);

Expanding that out, you get:
+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32);
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v0 += v3; v3 = rol64(v3, 21); v3 ^= v0;
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v0 ^ v1 ^ v2 ^ v3;

Since the final XOR includes both v0 and v3, it's undoing the "v3 ^= v0"
two lines earlier, so the value of v0 doesn't matter after its XOR into
v1 on line one.

The final SIPROUND and return can then be optimized to

+   v0 += v1; v1 = rol64(v1, 13); v1 ^= v0;
+   v2 += v3; v3 = rol64(v3, 16); v3 ^= v2;
+   v3 = rol64(v3, 21);
+   v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32);
+   return v1 ^ v2 ^ v3;

A 32-bit implementation could further tweak the 4 instructions of
v1 ^= v2; v2 = rol64(v2, 32); v1 ^= v2;

gcc 6.2.1 -O3 compiles it to basically:
v1.low ^= v2.low;
v1.high ^= v2.high;
v1.low ^= v2.high;
v1.high ^= v2.low;
but it could be written as:
v2.low ^= v2.high;
v1.low ^= v2.low;
v1.high ^= v2.low;

Alternatively, if it's for private use only (key not shared with other
systems), a slightly stronger variant would "return v1 ^ v3;".
(The final swap of v2 is dead code, but a compiler can spot that easily.)


Re: [PATCH] siphash: add cryptographically secure hashtable function

2016-12-10 Thread George Spelvin
> There's a 32-bit secret random salt (inet_ehash_secret) which means
> that in practice, inet_ehashfn() will select 1 out of 2^32 different
> hash functions at random each time you boot the kernel; without
> knowing which one it selected, how can a local or remote attacker can
> force IPv4 connections/whatever to go into a single hash bucket?

By figuring out the salt.  The thing is, the timing of hash table lookups
*is externally visible*.  If I create connections to the target, then
see which ones make responses on previous connections slightly slower,
I gain information about the salt.

I don't know *where* in the hash table the collisions occur, but I
know *which* inputs collide, and that's enough for me to learn something.

(I need more connections than the size of the hash table, but even
with just one IP source I can use 64K ports on my end times however
many the target has open on its end.)

With enough information (google "unicity distance") I can recover the
entire salt.  It's not like I care about the cryptographic strength of
the hash; simply trying all 4 billion possible seeds is pretty fast on
a 4 GHz processor.
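
The search itself is trivial (a sketch; ehashfn() and the observation
format stand in for the real ones):

	#include <stdint.h>

	struct endpoints;	/* an observed 4-tuple (placeholder) */
	extern uint32_t ehashfn(const struct endpoints *e, uint32_t salt);

	/* Find the 32-bit salt consistent with every observed collision
	 * against obs[0]; mask selects the bits that index the table. */
	uint32_t recover_salt(const struct endpoints *obs, int n, uint32_t mask)
	{
		uint32_t salt = 0;

		do {
			int i;

			for (i = 1; i < n; i++)
				if ((ehashfn(&obs[0], salt) ^
				     ehashfn(&obs[i], salt)) & mask)
					break;	/* wrong salt */
			if (i == n)
				return salt;	/* fits all observations */
		} while (++salt != 0);

		return 0;	/* no consistent salt found */
	}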

Once that happens, I can choose a target connection whose timing I can't
observe directly and pack its hash chain without being obvious about it.

> I am happy to be proven wrong, but you make it sound very easy to
> exploit the current situation, so I would just like to ask whether you
> have a concrete way to do that?

I don't think anyone's implemented an attack on this particular hash
table yet, and the reason it hasn't been urgent is that it's just a mild
DoS attack: it makes the computer noticeably slower without disabling
it completely.

But the general style of attack is well known and has been repeatedly
demonstrated.  Its practicality is not in question.  The only question is
whether it's *more* practical that simpler techniques that don't depend
on any such algorithmic subtlety like brute-force flooding.

But if the history of Internet security has taught us one thing, it's
that naively hoping something won't be a problem is doomed.


The main issue is performance.  IPv6 addresses are big, and although
SipHash is fast by the standard of cryptographic hashes, it's far slower
than jhash or any other non-cryptographic hash.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-29 Thread George Spelvin
>> Also not mentioned in the documentation is that some algorithms *do*
>> have different implementations depending on key size.  SHA-2 is the
>> classic example.

> What do you mean by that? SHA has no keying at all.

In this case, the analogous property is hash size.  Sorry, I thought
that was so obvious I didn't need to say it.

Specifically, SHA2-256 (and -224) and SHA2-512 (and -384) are separate
algorithms with similar structures but separate implementations.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
> 
> More is always welcome of course.

It's improved since I last looked at it, but there are still many structures
that aren't described:

- struct crypto_instance
- struct crypto_spawn
- struct crypto_blkcipher
- struct blkcipher_desc
- More on the context structures returned by crypto_tfm_ctx

Also not mentioned in the documentation is that some algorithms *do*
have different implementations depending on key size.  SHA-2 is the
classic example.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.

I have a local patch (appended, if anyone wants) to reduce the wasteful
amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
6 to 3 blocks on decrypt), but it would take some rework to convert to
crypto_cipher_encrypt_one() or avoid stack buffers entirely.

commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin <li...@sciencehorizons.net>
Date:   Sat Oct 10 17:26:08 2015 -0400

crypto: cts - Reduce internal buffer usage

It only takes a 3-block temporary buffer to handle all the tricky
CTS cases.  Encryption could in theory be done with two, but at a cost
in complexity.

But it's still a saving from the previous six blocks on the stack.

One issue I'm uncertain of and I'd like clarification on: to simplify
the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which
contains the ctx->child transform rather than the parent one.  I'm
assuming the block sizes are guaranteed to be the same (they're set up
in crypto_cts_alloc by copying), but I haven't been able to prove it to
my satisfaction.

Signed-off-by: George Spelvin <li...@sciencehorizons.net>

diff --git a/crypto/cts.c b/crypto/cts.c
index e467ec0ac..e24d2e15 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, const u8 *key,
return err;
 }
 
-static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx,
-  struct blkcipher_desc *desc,
+/*
+ * The final CTS encryption is just like CBC encryption except that:
+ * - the last plaintext block is zero-padded,
+ * - the second-last ciphertext block is trimmed, and
+ * - the last (complete) block of ciphertext is output before the
+ *   (truncated) second-last one.
+ */
+static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc,
   struct scatterlist *dst,
   struct scatterlist *src,
   unsigned int offset,
   unsigned int nbytes)
 {
-   int bsize = crypto_blkcipher_blocksize(desc->tfm);
-   u8 tmp[bsize], tmp2[bsize];
-   struct blkcipher_desc lcldesc;
-   struct scatterlist sgsrc[1], sgdst[1];
+   int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+   u8 tmp[3*bsize] __aligned(8);
+   struct scatterlist sgsrc[1], sgdst[2];
int lastn = nbytes - bsize;
-   u8 iv[bsize];
-   u8 s[bsize * 2], d[bsize * 2];
int err;
 
-   if (lastn < 0)
+   if (lastn <= 0)
return -EINVAL;
 
-   sg_init_table(sgsrc, 1);
-   sg_init_table(sgdst, 1);
-
-   memset(s, 0, sizeof(s));
-   scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
-   memcpy(iv, desc->info, bsize);
-
-   lcldesc.tfm = ctx->child;
-   lcldesc.info = iv;
-   lcldesc.flags = desc->flags;
-
-   sg_set_buf(&sgsrc[0], s, bsize);
-   sg_set_buf(&sgdst[0], tmp, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d + bsize, tmp, lastn);
-
-   lcldesc.info = tmp;
-
-   sg_set_buf(&sgsrc[0], s + bsize, bsize);
-   sg_set_buf(&sgdst[0], tmp2, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d, tmp2, bsize);
-
-   scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
-
-   memcpy(desc->info, tmp2, bsize);
+   /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */
+   memset(tmp+2*bsize, 0, bsize);
+   scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0);
+
+   sg_init_one(sgsrc, tmp+bsize, 2*bsize);
+   /* Initialize dst specially to do the rearrangement for us */
+   sg_init_table(sgdst, 2);
+   sg_set_buf(sgdst+0, tmp+bsize, bsize);
+   sg_set_buf(sgdst+1, tmp,   bsize);
+
+   /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */
+   err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize);
+
+   /* Copy beginning of tmp to the output */
+   scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1);
+   memzero_explicit(tmp, sizeof(tmp));
 
return err;
 }
@@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
  struct scatterlist *dst, struct scatterlist *src,
  unsigned int nbytes)
 {
-   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int bsize = crypto_blkcipher_blocksize(desc->tfm);
+   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int tot_blocks = (nbytes + bsize - 1) / bsize;
int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
struct blkcipher_desc lcldesc;
@@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkc

Re: [PATCH v5 0/7] /dev/random - a new approach

2016-06-20 Thread George Spelvin
> With that being said, wouldn't it make sense to:
> 
> - Get rid of the entropy heuristic entirely and just assume a fixed value of 
> entropy for a given event?

What does that gain you?  You can always impose an upper bound, but *some*
evidence that it's not a metronome is nice to have.

> - remove the high-res time stamp and the jiffies collection in 
> add_disk_randomness and add_input_randomness to not run into the correlation 
> issue?

Again, you can argue for setting the estimate to zero, but why *remove*
the timestamp?  Maybe you lose nothing, maybe you lose something, but
it's definitely a monotonic decrease.

> - In addition, let us credit the remaining information zero bits of entropy 
> and just use it to stir the input_pool.

Unfortunately, that is of limited use.  We mustn't remove more bits
(of data, as well as entropy) from the input pool than there are bits
of entropy coming in.

So the extra uncounted entropy never goes anywhere and does very little good:
any time the input pool is "full" (by counted entropy), the uncounted
entropy has been squeezed out and thrown away.

> - Conversely, as we now would not have the correlation issue any more, let us 
> change the add_interrupt_randomness to credit each received interrupt one bit 
> of entropy or something in this vicinity?  Only if random_get_entropy returns 
> 0, let us drop the credited entropy rate to something like 1/10th or 1/20th 
> bit per event.

Basically, do you have a convincing argument that *every* interrupt has
this?  Even those coming from strongly periodic signals like audio DMA
buffer fills?

> Hence, we cannot estimate the entropy level at runtime. All we can do is 
> having a good conservative estimate. And for such estimate, I feel that 
> throwing lots of code against that problem is not helpful.

I agree that the efficiency constraints preclude having a really
good solution.  But is it worth giving up?

For example, suppose we come up with a decent estimator, but only use it
when we're low on entropy.  When things are comfortable, underestimate.

For example, a low-overhead entropy estimator can be derived from
Maurer's universal test.  There are all sort of conditions required to
get an accurate measurement of entropy, but violating them produces
a conservative *underestimate*, which is just fine for an on-line
entropy estimator.  You can hash non-binary inputs to save table space;
collisions cause an entropy underestimate.  You can use a limited-range
age counter (e.g. 1 byte); wraps cause entropy underestimate.  You need
to initialize the history table before measurements are accurate, but
initializing everything to zero causes an initial entropy underestimate.


Re: [PATCH v4 0/5] /dev/random - a new approach

2016-05-31 Thread George Spelvin
I'll be a while going through this.

I was thinking about our earlier discussion where I was hammering on
the point that compressing entropy too early is a mistake, and just
now realized that I should have given you credit for my recent 4.7-rc1
patch 2a18da7a.  The hash function ("good, fast AND cheap!") introduced
there exploits that point: using a larger hash state (and postponing
compression to the final size) dramatically reduces the requirements on
the hash mixing function.

I wasn't conscious of it at the time, but I just now realized that
explaining it clarified the point in my mind, which led to applying
the principle in other situations.

So thank you!


Re: random(4) changes

2016-04-29 Thread George Spelvin
> I think we can agree that we disagree.

Yes, we agree on that, at least!

The problem is, this is supposed to be a matter of fact, not opinion,
so there should be one right answer.

I suppose it's possible it's still an issue of terminology, but we've
exhausted

> Though, I will get back to the drawing board and think about the problem
> of how to efficiently collect entropy.

For *collecting* it, there are two obvious sources that would be very
nice to use, I've just never had the courage to dig into the relevant
subsystems deep enough to add the hooks.

These could be turned on when entropy is required and turned off afterward:

1) The RTC periodic interrupt.  This is invariably driven by
   a separate 32.768 Hz crystal, so the jitter against the main
   clock oscillator is a useful source.

   A standard PC RTC can run at up to 8 kHz, and probably deliver
   a few hundred bits/s.

2) Any available ADCs, especially audio ADCs.  A 16-bit or better ADC is a
   very rich source of entropy.  We'd need to hook into the audio subsystem,
   which could activate the ADC when needed and, most importantly,
   guarantee that the data did not go anywhere else.

   Because we can't guarantee that the audio input is quiet when collected,
   the entropy would have to be estimated a priori rather than deduced
   from measurements.  Some measurements would still be useful as a sanity
   check to ensure the data isn't digital zero or something.


For storing it after collecting it, I still think the current CRC-based
scheme is pretty good.  Note that cyclical XOR is a special case of this,
just using a polynomial of x^4096-1.  The good properties of CRCs for
detection of hardware-type errors are exactly equivalent to the collision
resistance properties desired for an entropy pool.

A collision results in an undetected error in the former case, and loss
of entropy in the latter.
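As a toy illustration of the difference, here is a CRC/LFSR-style pool
mix in Python.  The 128-word pool and the tap offsets (104, 76, 51, 25, 1)
match the current driver's parameters; the rotation constant and the
omission of the kernel's twist table are simplifications of mine:

POOL_WORDS = 128
TAPS = (104, 76, 51, 25, 1)   # feedback taps; an empty tuple here
                              # degenerates to plain cyclic XOR, i.e.
                              # the polynomial x^4096 - 1

def rol32(w, n):
    return ((w << n) | (w >> (32 - n))) & 0xffffffff

def mix(pool, pos, sample):
    # Mix one 32-bit sample into the pool at position pos.
    w = rol32(sample, 7)               # spread input bits around
    w ^= pool[pos]                     # include the word being replaced
    for t in TAPS:                     # XOR in the feedback taps
        w ^= pool[(pos + t) % POOL_WORDS]
    pool[pos] = w
    return (pos + 1) % POOL_WORDS      # advance to the next word

The taps are what make single-bit input differences diffuse across the
pool instead of sitting in one word until they are overwritten.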


Re: random(4) changes

2016-04-29 Thread George Spelvin
> What I am saying that the bits in one given time stamp are mutually 
> independent. I.e. bit 0 of one time stamp does not depend on bit 1 of that 
> very same time stamp.

And I'm saying that's wrong.

We are interested in the correlation from the point of view of someone
who knows all previous time stamps (and a lot else besides).

Based on this knowledge of what happened before, it is possible to
deduce a correlation.

> I totally agree with you that bit 0 from time stamp 1 may tell you something 
> about bit 0 of time stamp 2. And therefore I am not considering multiple time 
> stamps for this first step.

But also bits 0 and 1 of time stamp 1 will tell you something about the
correlation of bits 0 and 1 of time stamp 2.

To simplify the discussion, let's slow down the time stamp counter
to slightly more than 1 tick per interrupt.

Suppose time stamp 1 (I'll call it x) happens to end in the bits "10".
The attacker knows, based on the rates of the interrupts and TSC ticks,
that the next time stamp is probably x+1 or x+2.  It might be x+3 or more,
but that's pretty unlikely.

This means that the lsbits are probably 11 or 00.  So there's a strong
correlation between bit 0 and bit 1.
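A tiny simulation makes the point; the 1-or-2 tick step and the seed are
just my toy model of the scenario above:

import random
from collections import Counter

random.seed(1)
pairs = Counter()
x = 0b10                                  # last stamp ends in "10"
for _ in range(100000):
    nxt = x + random.choice((1, 2))       # next stamp is x+1 or x+2
    pairs[((nxt >> 1) & 1, nxt & 1)] += 1
print(pairs)   # only (1,1) and (0,0) ever occur: bits 0 and 1 covary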

> IID: A sequence of random variables for which each element of the
> sequence has the same probability distribution as the other values
> and all values are mutually independent.
>
> Independent: Two discrete random variables X and Y are (statistically)
> independent if the probability that an observation of X will have
> a certain value does not change, given knowledge of the value of
> an observation of Y (and vice versa). When this is the case, the
> probability that the observed values of X and Y will be x and y,
> respectively, is equal to the probability that the observed value of
> X will be x (determined without regard for the value of y) multiplied
> by the probability that the observed value of Y will be y (determined
> without regard for the value of x).

These are both exactly correct.

Notice in particular the statement that a probability (of X) can change
based on knowledge (of Y).  The special case where it doesn't change is
called independence.

> All I am claiming and all I am saying is that the bits 0 through 31 in one 
> given time stamp are mutually indpendent. And thus the XOR of those 
> independent bits is appropriate.

And I'm saying that's flat out wrong.  The bits are NOT mutually
independent.  The correlation might be small in some cases, but
it's not exactly zero.

>> When woring word-at-a-time like this, we also have to consider
>> cross-correlations among the bits of a word.  The most general way to

> Tell me where the correlations should be within one word.  Where do you claim 
> they come from?

From all the other knowledge of the machine.  Knowledge of previous
timestamps, kernel internals, interrupt service routine execution times,
interrupt mitigation timers in various hardware, periodic interrupts, etc.

>> For example, if p(9) = 0.5, p(10) = 0.5, and p(x) = 0 for all other x,
>> then we have 1 bit of entropy in the word.
>> 
>> We can analyze it a bit at a time, and proceed in one of two ways:
>> 
>> - If we go lsbit first, then we have 1 bit of entropy in bit 0, but
>>   then zero entropy in bit 1.
>> - If we go msbit first, we have 1 bit of entropy in bit 1, but then
>>   zero entropy in bit 0.

> This description makes no sense.

I tried to make it as simple as I could.

Let me try again.

Assume for the purpose of discussion that we are able to predict, by some
magical means irrelevant to the example, that the next timestamp will
definitely be either 9 or 10, with 50% probability of each.

This is obviously an extremely simplified example, but the basic logic
applies to more complex cases.

We can analyze the entropy of the following timestamp in three ways:

1) Consider the timestamp all at once.  The entropy of the word is the
   expected value of -log2(p[i]).  This is p[9] * -log2(p[9]) +
   p[10] * -log2(p[10]) = 0.5*1 + 0.5*1 = 1.0 bits of entropy.

2) Consider the timestamp a bit at a time, starting from the lsbit.
2a) We have no idea what bit 0 is, so the entropy of this bit is 1.0.
2b) Given that we have already seen bit 0, we know with certainty that
bit 1 is its complement, so bit 1 contributes zero entropy.
(See the definition of "independent" above.  Having gained knowledge
of bit 0, the probability of bit 1 having a particular value has
changed.  Thus bits 0 and 1 are not independent.)
2c) Bits 2..31 are all known ahead of time, so contribute zero entropy.

3) Consider the timestamp a bit at a time, starting from the msbit.
3a) We know bits 31..2 ahead of time, so they contribute zero entropy.
3b) We have no idea what bit 1 is, so it contributes 1.0 bits of entropy.
3c) Having seen bit 1, we know with certainty that bit 0 is its
complement, so bit 0 contributes zero entropy.

The point of the example is that regardless of the method used to
add it up, the total is the same: 1.0 bits of entropy.
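For anyone who wants to check the arithmetic, here is a small script of
mine that computes the chain of conditional bit entropies in both orders
plus the whole-word entropy, for the assumed two-value prediction:

import math
from collections import defaultdict

dist = {9: 0.5, 10: 0.5}

def bit(x, i):
    return (x >> i) & 1

def chain_entropy(dist, order):
    # Sum of H(b_i | bits earlier in 'order'), by the chain rule.
    total = 0.0
    for k, i in enumerate(order):
        joint = defaultdict(float)
        marg = defaultdict(float)
        for x, q in dist.items():
            prev = tuple(bit(x, j) for j in order[:k])
            joint[prev + (bit(x, i),)] += q
            marg[prev] += q
        for key, q in joint.items():
            cond = q / marg[key[:-1]]
            if 0.0 < cond < 1.0:
                total += -q * math.log2(cond)
    return total

print(sum(-q * math.log2(q) for q in dist.values()))  # 1.0, all at once
print(chain_entropy(dist, [0, 1]))                    # 1.0, lsbit first
print(chain_entropy(dist, [1, 0]))                    # 1.0, msbit first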

Re: random(4) changes

2016-04-29 Thread George Spelvin
> I think there is a slight mixup: IID is not related to an attacker
> predicting things. IID is simply a statistical measure, it is either there
> or not. It does not depend on an attacker (assuming that the attacker
> cannot change the data). Note, the IID is only needed to claim that the
> XOR will be entropy preserving.

1. It DOES depend on the attacker.  Any statement about independence
   depends on the available knowledge.
2. XOR being entropy preserving depends on independence ONLY, it does
   NOT depend on identical distribution.  The latter is a red herring.
   (An English metaphor for "irrelevant distraction.")
3. Precisely because the bits are not independent, XOR is not
   guaranteed to be entropy-preserving (your sense) on real data.

To give a specific example, suppose that an attacker can predict that the
counter will be either x or x+1 on the upcoming sample.  For simplicity,
assume the probabilites are exactly 50%, so there is one full bit of
entropy in the lsbit.

But if x ends in ..01, then x+1 ends in ..10, and they have the same
XOR, and the attacker knows (0 bits of entropy) the XOR of the bottom
two bits even though they know nothing about the bottom bit in isolation.

>>> There is absolutely no limit to the 32 bits. We easily can take the high
>>> bits too. But we know (as you mention below), an attacker has more and
>>> more knowledge about the selected bits the higher the bit is as he can
>>> predict an event with a certain degree of probability.

>> Yes, an attacker has more information about higher bits.
>> 
>> This is the definition of NOT identically distributed!

> So, you are saying that by looking at data, you change their statistical 
> distribution?

Yes.

For example, if I have seen the previous sample and it is 0xffffffff,
I know that the distribution of the msbit of the following sample
is heavily biased toward 0.

If I have seen the previous sample and it is 0x7fffffff, I know that the
distribution of the msbit is heavily biased toward 1.

If I had not looked at the preceding samples, I would not be able
to draw those conclusions.

Remember, the following sample doesn't have a distribution; it is a
future fact.  The only thing that has a distribution is my advance
knowledge (prediction) of that fact.

>> *If* they were identically distributed, a suggestion I'm pointing
>> out the ridiculous implications of, then an attacker's knowledge
>> of each of them would be identical.

> Not at all, you mix the attackers knowledge again with a pure statistical 
> property.

I don't understand what a "pure statistical property" means.

The distribution of a single independent bit can be described
completely by giving the probability of it being 1.

In the absence of correlations (dependencies), this single number
completely describes the attacker's knowledge of the bit.

Several bits have identical distributions if and only if the
probability of their being 1 is identical.

This is the same as saying that the attacker's knowledge of the
bits is identical.


Re: random(4) changes

2016-04-29 Thread George Spelvin
(Note that we have two chains of e-mails crossing mid-stream.  I'm in
the middle of working on a much longer reply to your previous e-mail.)

>> They're not independent, nor are they identically distributed.

> That is an interesting statement: you say that the time stamp has holes
> in it, i.e. some values have zero probability of being selected!

That's not at all what I said.  It may be true, depending on Intel's
TSC implementation, but I didn't say or imply it.

> Second, you imply that when bit x of a given time stamp has some
> particular value, bit y can be deduced from bit x.

Yes.  For example, bit 30 can be deduced from bit 31, given our
assumption that the attacker has knowledge of previous timestamps, and
likely inter-interrupt times.  If bit 31 has changed, bit 30 is almost
certainly zero.  The bits are not independent.

The distribution of bit 31 is, with very high probability, equal to that
in the previous timestamp.  Bit 0, not so much.

In other words, bits 31 and 0 have different distributions.  They are
not identically distributed.

I gave this example in my previous e-mail
Message-ID: <20160429004748.9422.qm...@ns.horizon.com>

>> If they were identically distributed, they'd all have identical
>> entropy.  And there's be no reason to stop at 32 bits.  If the high
>> 32 bits have the same entropy as the low
>> entropy too?.

> There is absolutely no limit to the 32 bits. We easily can take the high bits 
> too. But we know (as you mention below), an attacker has more and more 
> knowledge about the selected bits the higher the bit is as he can predict an 
> event with a certain degree of probability.

Yes, an attacker has more information about higher bits.

This is the definition of NOT identically distributed!

*If* they were identically distributed, a suggestion I'm pointing
out the ridiculous implications of, then an attacker's knowledge
of each of them would be identical.

If that were the case (and it's not), then the high 32 bits would be as
good a source of entropy as the low 32 bits.

>>> 2. show that the maximum entropy of each of the individual bits is equal
>>> or  more to my entropy estimate I apply.
>> 
>> I'm not sure what you mean here.  When you say "maximum entropy" is that
>> a non-strict upper bound?
>> 
>> Or does that mean that at least one bit achieves that maximum?

> Exactly that -- I have to show that at least one bit out of the 32
> bit value reaches that maximum, i.e. one bit has more entropy than my
> entropy estimate.

That will be an interesting claim to argue for.  Where do you make it?

>>> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
>>> each of the bits does not depend on the other bits. When considering one
>>> and only one time stamp value and we look at, say, the first 20 bits,
>>> there is no way it is clear what the missing 12 bits will be.

>> If you deliberately exclude all external data, then a 32-bit
>> constant is random.  (I suggest 17, the "most random number".)
>> 
>> But that's meaningless.  When we talk about "entropy", we are talking
>> about an attacker's uncertainty about the value.  Any other measure is
>> garbage in, and produces nothing but garbage out.

> Correct.

You mean that I'm correct that your description of the timestamp bits
as independent is meaningless?

> Maybe I am not smart enough for attacking the system. Maybe others are
> smarter than me and find a way to attack it to get to 5 or 6 bits of
> accuracy. Yet, this is means there is way more entropy than I need --
> this is my first safety margin.

I agree that the amount of entropy per timing sample is almost
certainly much higher than the current /dev/random credits it for.

The hard part is proving it.  All a statistical test can show is
that its model has a hard time predicting the output.  It can't show
the non-existence of a better model.  That's why /dev/random is so
conservative.

Many years ago, when clock rates were below 1 GHz, I wrote a small kernel
module which disabled all other interrupts, and did nothing but take timer
interrupts and capture TSC values to RAM.  (It stopped when the buffer
was full and let the system continue.)  This was on a system with both
CPU and timer clocks generated from a single crystal by PLL.

I got a nice Gaussian distribution of interrupt timings, relative
to a best-fit line, with a standard deviation of about 8 cycles.

If I wanted to repeat that these days, I'd have to either disable in
the BIOS, or take into account, spread-spectrum clocking.  Modern clock
PLLs, to reduce EMI, deliberately modulate the CPU clock at about 30 kHz.
That adds a "ripple" with about 40 ns p-p to the TSC values relative to
a non-modulated external clock.

If I'm not careful, I could think that was 40 ns * 3.2 GHz = 128 cycles
of unpredictability when it's just a periodic pattern.

Re: random(4) changes

2016-04-29 Thread George Spelvin
From smuel...@chronox.de Fri Apr 29 04:56:49 2016
From: Stephan Mueller <smuel...@chronox.de>
To: George Spelvin <li...@horizon.com>
Cc: herb...@gondor.apana.org.au, linux-crypto@vger.kernel.org, 
linux-ker...@vger.kernel.org, sandyinch...@gmail.com, ty...@mit.edu
Subject: Re: random(4) changes
Date: Thu, 28 Apr 2016 22:15:17 +0200
User-Agent: KMail/4.14.10 (Linux/4.4.7-300.fc23.x86_64; KDE/4.14.18; x86_64; ; )
In-Reply-To: <20160427002346.12354.qm...@ns.horizon.com>
References: <20160427002346.12354.qm...@ns.horizon.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"

On Tuesday, 26 April 2016, 20:23:46, George Spelvin wrote:

Hi George,

> > And considering that I only want to have 0.9 bits of entropy, why
> > should I not collapse it? The XOR operation does not destroy the existing
> > entropy, it only caps it to at most one bit of information theoretical
> > entropy.
> 
> No.  Absolutely, demonstrably false.
> 
> The XOR operation certainly *does* destroy entropy.
> If you have 0.9 bits of entropy to start, you will have less
> after the XOR.  It does NOT return min(input, 1) bits.

As I am having difficulties following your explanation, let us start at the 
definition:

XOR is defined as an entropy preserving operation, provided the two arguments 
to the XOR operation are statistically independent (let us remember that 
caveat for later).

> That means, the entropy behavior of H(A XOR B) is max(H(A), H(B)) if they are 
> independent.

Actually, no.  If they're independent, it can be greater.

For example, if A has half a bit of entropy, and B has half a bit
(both in the Shannon sense), then A XOR B will have 0.713537
bits.

> 1. the individual bits of a given 32 bit time stamp are independent
>(or IID in terms of NIST)

They're not independent, nor are they identically distributed.

If they were identically distributed, they'd all have identical
entropy.  And there'd be no reason to stop at 32 bits.  If the high
32 bits have the same entropy as the low
entropy too?.

> 2. show that the maximum entropy of each of the individual bits is equal or 
>more to my entropy estimate I apply.

I'm not sure what you mean here.  When you say "maximum entropy" is that
a non-strict upper bound?

Or does that mean that at least one bit achieves that maximum?

That will be a much more interesting proof.


> Regarding 1: The time stamp (or cycle counter) is a 32 bit value where
> each of the bits does not depend on the other bits. When considering one
> and only one time stamp value and we look at, say, the first 20 bits,
> there is no way it is clear what the missing 12 bits will be.

If you deliberately exclude all external data, then a 32-bit
constant is random.  (I suggest 17, the "most random number".)

But that's meaningless.  When we talk about "entropy", we are talking
about an attacker's uncertainty about the value.  Any other measure is
garbage in, and produces nothing but garbage out.


> Note I
> am not saying that when comparing two or more time stamps that one
> cannot deduce the bits! And here it is clear that the bits within
> one given time stamp are independent, but multiple time stamps are
> not independent. This finding is supported with measurements given in
> 3.4.1 (I understand that the measurements are only supportive and no
> proof). Figure 3.1 shows an (almost) rectangular distribution which is
> the hint to an equidistribution which in turn supports the finding that
> the individual bits within a time stamp are independent. In addition,
> when you look at the Shannon/Min Entropy values (which do not give an
> entropy estimate here, but only help in understanding the distribution!),
> the values show that the distribution has hardly any discontinuities --
> please read the explanation surrounding the figure.

> Regarding 2: I did numerous measurements that show that the low bits do have
> close to one bit of entropy per data bit. If I may ask to consider section
> 3.4.1 again (please consider that I tried to break the logic by applying a
> pathological generation of interrupts here to stimulate the worst case): The
> entropy is not found in the absolute time stamps, but visible in the time
> deltas (and the uncertainty of the variations of those). So I calculated the
> time deltas from the collected set of time stamps of events. Now, when simply
> using the four (you may also use three or perhaps five) lower bits of the time
> delta values, we can calculate an interesting and very important Minimum
> Entropy value: the Markov Min Entropy. Using the table 2, I calculated the
> Markov Min Entropy of the data set of the 4 low bit time delta values. The
> result shows that the 4 bit values still have 3.92 bits of entropy (about 0.98
> bits of entropy per data bit). Ok, one worst case measurement may not be good
> enough. So I contin

Re: random(4) changes

2016-04-28 Thread George Spelvin
I'd like to apologise for the harsh tone of my earlier message.
I was frustrated, and it showed.


I hope I can be more gentle as I continue to elaborate on the shortcomings
of that scheme.

Concentrating entropy is hard.  To paraphrase Princess Leia, "the more
you tighten your grip, the more entropy will slip through your fingers."

While it's true that the entropy of two independent bit strings is at
least as high as the entropy of either one, it's not necessarily much
higher, either.


I'm going to discuss two things here.  First, what happens if you assume
that the input samples are made up of independent and identically
distributed (i.i.d.) random bits, and second why they're not i.i.d.

When talking about single bits, all that matters is the probabilities
of the two values.  I'm going to assume that the bits are 1 with some
probability p <= 0.5, and 0 with probability p >= 0.5, but that's just
for convenience.

You can XOR any attacker-known pattern into those bits and it doesn't
change the entropy.  What matters is the probability that they match an
attacker's prediction, and the probability that they don't match.

For example, if an attacker knows that a bit is 75% likely to match
some other bit, then I'll describe it with p=0.25.


i.i.d. bits of course all have the same entropy.  Suppose that we have 32
bits, with 0.1 bits of entropy each, making 3.2 bits of entropy in total.

This is a pretty good sample; it should be easy to get one bit with high
entropy out of this, right?

Well, no.

0.1 bits of Shannon entropy means that a bit is 0 with probability 98.7%,
and 1 with probability p = 1.3%.

If you XOR two such bits together, the result is 1 with probability
2 * p * (1-p) (which is also less than 0.5).


Let's work out the entropy assuming we start with 0.1 of a bit of Shannon
entropy per bit, and 0.1 of a bit of min-entropy per bit, and then
form the XOR of increasingly large blocks:

Starting with 32/10 bits of Shannon entropy
 1: p=0.012987  Shannon=0.100000 (sum 3.200000)  Min=0.018859 (sum 0.603482)
 2: p=0.025636  Shannon=0.172013 (sum 2.752203)  Min=0.037468 (sum 0.599486)
 4: p=0.049958  Shannon=0.286220 (sum 2.289760)  Min=0.073937 (sum 0.591499)
 8: p=0.094925  Shannon=0.452699 (sum 1.810795)  Min=0.143891 (sum 0.575563)
16: p=0.171829  Shannon=0.661871 (sum 1.323741)  Min=0.271999 (sum 0.543997)
32: p=0.284607  Shannon=0.861653 (sum 0.861653)  Min=0.483192 (sum 0.483192)

The last line is the probability that the XOR of 32 such bits is 1.
28.46% is still some distance from 50/50, isn't it?

The min-entropy per bit comes close to doubling each time, since it
suffers less from saturation effects as it approaches 1 bit per bit,
but still 20% of the entropy is lost.

Starting with 32/10 bits of min-entropy
 1: p=0.066967  Shannon=0.354502 (sum 11.344068)  Min=0.100000 (sum 3.200000)
 2: p=0.124965  Shannon=0.543466 (sum  8.695452)  Min=0.192587 (sum 3.081394)
 4: p=0.218697  Shannon=0.757782 (sum  6.062253)  Min=0.356046 (sum 2.848372)
 8: p=0.341738  Shannon=0.926472 (sum  3.705887)  Min=0.603265 (sum 2.413061)
16: p=0.449906  Shannon=0.992747 (sum  1.985494)  Min=0.862250 (sum 1.724500)
32: p=0.494981  Shannon=0.999927 (sum  0.999927)  Min=0.985591 (sum 0.985591)

Because min-entropy is a more conservative estimate, the starting probability
is much closer to 50/50, and we got very close to unbiased.  But it took
11 bits of Shannon entropy to do it!


Working the formula backward, we can deduce that to get 0.95 bits of
Shannon entropy by XORing 32 i.i.d. bits, each of the input bits needs
to have p = 0.020511, 0.1443 bits of entropy, a total of 4.62 bits.
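The closed form for XORing n i.i.d. bits makes these tables easy to
reproduce; this little script (mine, for checking only) regenerates both
of them from the two starting probabilities:

import math

def shannon(p):
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def minent(p):
    return -math.log2(max(p, 1.0 - p))

def fold(p, n):
    # P(XOR of n independent bits is 1), each bit 1 with probability p:
    # (1 - (1 - 2p)^n) / 2.
    return (1.0 - (1.0 - 2.0*p)**n) / 2.0

for p in (0.012987, 0.066967):   # 0.1 bit Shannon / 0.1 bit min-entropy
    for n in (1, 2, 4, 8, 16, 32):
        q = fold(p, n)
        print("%2d: p=%f  Shannon=%f (sum %f)  Min=%f (sum %f)"
              % (n, q, shannon(q), (32 // n) * shannon(q),
                 minent(q), (32 // n) * minent(q)))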


In fact, if the maximum entropy per bit is low enough then the
probability of getting the all-zero word is greater than 50% and
that imposes an upper limit on the entropy produced by *any* hashing
scheme.

For 0.1 bits of Shannon entropy per bit, (1-p)^32 = 0.658164 so even if
you reduced to one bit using !, you couldn't get closer to 50:50
than that.  (0.926565 bit Shannon, 0.603482 bits min-entropy.)


It turns out that lots of weakly random i.i.d. bits are a worst case for
this sort of reduction; if we divide entropy in a "triangular" way, where
bit i gets (i+1) parts of the entropy (dividing by (32*33/2) to normalize)
then we get a final XOR of

p=0.338856  Shannon=0.923720  Min=0.596964

which is better than the p=0.284607 we got from i.i.d. bits.


Is it, then, perhaps the case that the i.i.d. assumption is not a good one?
The bit patterns we're talking about don't seem to resemble realistic
timestamps.

It's a very bad one, as I'll explain.


*Independence* can exist in a vacuum: not correlated with anything.
That's what /dev/random is trying to produce.  But the seed material
is anything but.

There are two major sources of information to an attacker about the
seed entropy:

1) An attacker is assumed to know all prior seed inputs.
   That's because we're trying to quantify the *additional* entropy
   contributed by this sample. 

Re: random(4) changes

2016-04-26 Thread George Spelvin
> And considering that I only want to have 0.9 bits of entropy, why
> should I not collapse it? The XOR operation does not destroy the existing
> entropy, it only caps it to at most one bit of information theoretical
> entropy.

No.  Absolutely, demonstrably false.

The XOR operation certainly *does* destroy entropy.
If you have 0.9 bits of entropy to start, you will have less
after the XOR.  It does NOT return min(input, 1) bits.

In rare cases, the XOR won't destroy entropy, but that's statistically
unlikely.


Here's a proof:

If there are only two possible inputs (timings), and those inputs have
opposite parities, then the XOR will cause no collisions and no entropy
is destroyed.

If you have at least three possibilities, hashing down to one bit (by
XOR or any other algorithm) must cause a collision, and that collision
will lose entropy.

Just as an example, let me use a 3-option distribution with roughly 0.5
bit of Shannon entropy: probabilities 90%, 9% and 1%.  Then list all
the possible collisions, and the Shannon and min-entropy in each case:

%   Shannon Min
90/9/1  0.5159  0.1520
90/10   0.4690  (91%)   0.1520  (100%)
91/90.4365  (85%)   0.1361  (90%)
99/10.0808  (16%)   0.0145  (10%)
100 0   0
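A few lines of Python (mine, just to reproduce the table) confirm these
numbers:

import math

def shannon(ps):
    return sum(-p * math.log2(p) for p in ps if p > 0.0)

def minent(ps):
    return -math.log2(max(ps))

cases = [
    (0.90, 0.09, 0.01),   # the original 3-option distribution
    (0.90, 0.10),         # 9% and 1% collide
    (0.91, 0.09),         # 90% and 1% collide
    (0.99, 0.01),         # 90% and 9% collide
    (1.00,),              # everything collides
]
for ps in cases:
    print(ps, round(shannon(ps), 4), round(minent(ps), 4))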

If you reduce the number of cases to 2, you lose Shannon entropy, always.

Min-entropy is preserved 1/4 of the time if you get lucky and none of the
less-likely options collide with the most-likely.

If the 4 possible collision cases are equally likely (which is the
case if the hashing to one bit is a random function), then you expect
to retain half of the input entropy.


If there are more than three possible inputs, the situation gets worse,
and the likelihood of no loss of min-entropy falls.


In a case of particular interest to an RNG, consider the min-entropy when
there are a large number of possible input measurements.  The min-entropy is
simply -log2(p(max)), where p(max) is the probability of the most likely
outcome.  If p(max) > 50%, then the input min-entropy is less than 1 bit.

In this case we can assume that, when collapsing to a single bit, the
less likely cases will be distributed uniformly between colliding and
not colliding with the most likely alternative.

Thus, the probability of the most likely increases from p to
p + (1-p)/2 = (1+p)/2, and the min-entropy correspondingly
decreases from -log2(p) to -log2((1+p)/2).

The ratio of output to input min-entropy varies from 50% near 0 bits to
45.7% at 0.5 bits to 41.5% at 1 bit input.

In this case, which I think is a plausible case for /dev/random
measurements, you're throwing away half the entropy.


Beyond 1 bit of input entropy, the ratio gets worse as the output
asymptotically approaches 1 bit of entropy.  Specifically, in order to
get 0.9 bits of min-entropy in the output (p(max) = 0.5358), you need
3.8 bits (p(max) = 0.07177 = 1/14) in the input!
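Checking that last step takes only a couple of lines (again just my
arithmetic check, not anyone's implementation):

import math

p_out = 2 ** -0.9             # p(max) that yields 0.9 bits of min-entropy
p_in = 2 * p_out - 1          # invert p_out = (1 + p_in) / 2
print(p_in, -math.log2(p_in)) # ~0.07177 and ~3.80 bits at the input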


I'm sorry, but collapsing individual samples to 1 bit is a Bad Design,
full stop.  It's not the algorithm used to do the reduction, it's the
reduction itself.


Re: random(4) changes

2016-04-26 Thread George Spelvin
Stephan Mueller wrote:
> On Monday, 25 April 2016, 21:59:43, George Spelvin wrote:
>> Indeed, this is an incredibly popular novice mistake and I don't
>> understand why people keep making it.

> Can you please elaborate on your statement to help me understanding the issue 
> and substantiate your claim here?

Basically, hashing down to 1 bit limits the entropy to 1 bit.

If there happened to be more than 1 bit of entropy in the
original input (and many timestamps *do* have more entropy than
that; the hard part is identifying which), you've thrown it away.

You need to hash eventually, to convert the large amount of weakly-random
input to the desired strongly-random output, but you should do that as
late in the processing as possible, and in as large blocks as possible.

The "novice mistake" is to try to concentrate the entropy (reduce the
number of bits used to store it) too soon.  You want to defer that as much
as possible.

> Please note the mathematical background I outlined in my documentation: What I
> try is to collapse the received data such as a time stamp into one bit by
> XORing each bit with each other. Note, the bits within a time stamp are IID
> (independent and identically distributed -- i.e. when you see one or more bits
> of a given time stamp, you cannot derive the yet unseen bit values). 
> Technically this is identical to a parity calculation.

And I'm still struggling to understand it.  You wrote it up formally, so
I want to stare at it for a few hours (which is a couple of days calendar
time) before passing judgement on it.

For example, my initial reaction is that the IID claim seems ridiculous.

Bit 63 of a rdtsc timestamp is always zero.  It's initialized to zero on
boot and no computer with a TSC has been up for the 50+ years it would
take to flip that bit.

But presumably that's obvious to you, too, so I'm misunderstanding.

I'm trying to catch up on your paper and all the other comments in this
thread at the same time, and my brain is a bit scattered.

I'm trying to resist the urge to respond until I understand everything
that's already been said, but as I mentioned previously, I'm not being
entirely successful.

> - the output of the entropy pool is meant to be fed into a DRBG. Such DRBG 
> (let us take the example of a Hash DRBG) will, well, hash the input data. So, 
> what help does a hash to raw entropy before feeding it to a DRBG which will 
> hash it (again)?

The two-stage hashing is a matter of practical implementation necessity.

Ideally, we'd take all of the raw sampled data and use a strong hash
on it directly.  But that requires an impractical amount of storage.

Just as good would be to losslessly compress the data.  If we could do
*perfect* compression, we'd get pure entropy directly.  But the latter
is impossible and even the former is impractical.

So we hash it to fit it into a fixed-size buffer.  This hash does
not have to be cryptographically strong, just minimize collisions.
(Since a collision is the one and only way to lose entropy.)  This is
explained in the comment at drivers/char/random.c:335.

A second design goal of this first stage hash is speed; we want to
minimize interrupt overhead.  Since it was first designed, cache effects
have gotten stronger and the scattered access to a large pool could be
improved upon, but it's still reasonably fast.


The second stage hash (DRBG or equivalent) then uses a strong cryptographic
algorithm to generate the final output.

> - the entropy pool maintenance does not need to have any backtracking 
> resistance as (1) it is always postprocessed by the cryptographic operation
> of the DRBG, and (2) constantly overwritten by new interrupts coming in

I don't see how (1) is relevant at all; if you can recover the DRBG
seed, you can recover the DRBG output, and (2) might not be fast enough.
For example, suppose someone suspends to disk immediately after generating
a key.

(I'm assuming you instantiate a new DRBG for each open() of /dev/random.
I haven't read your code yet to verify that.)

If the amount of entropy added after the key generation is attackable (say
it's around 32 bits), then the image on disk can reveal the previously
generated key.

You're right that it's not a very critical feature in most use cases,
but it's not very expensive to implement and I think a lot of people
would question its dismissal.  Knowledgeable people, never mind the
howls from the peanut gallery if they hear we're weakening /dev/random.

(I mention that the NIST DRBGs implement anti-backtracking, so presumably
they think it's an important feature.)

> - to hash raw input data is usually performed to whiten it. When you have a 
>   need to whiten it, it contains skews and statistical weaknesses that
>   you try to disguise. My approach is to not disguise anything -- I try
>   to have "nothing up my sleeve".

Only the final hash, which pr

Re: random(4) changes

2016-04-25 Thread George Spelvin
I just discovered this huge conversation and am trying to read it all
and get caught up.

I know I should finish reading before I start writing, but too many ideas
are bubbling up.


> not the rest of Stephan's input handling code: the parity
> calculation and XORing the resulting single bit into the entropy pool.

Indeed, this is an incredibly popular novice mistake and I don't
understand why people keep making it.

Look, *we don't know how* much entropy is in any given sample.  It's
unmeasurable in practice, so /dev/random uses some heuristics and
a huge amount of conservatism.

Because of the crude nature of this estimate, the amount of entropy
that's *probably* there is much larger than the computed guarantee.

To preserve this "bonus entropy", it's very important *not* to
have any bottlenecks like hashing down to a single bit.

Real events consist of a lot of highly predictable low-entropy samples,
with an occasional high-entropy exception.  Even if the average is less
than one bit, you want to ensure there's plenty of room for much *more*
than one bit so you aren't throwing away occasional high-entropy data.

This is why the current per-CPU fast pools are 128 bits and spilled *long*
before they approach full.

Precisely because the entropy estimate is so poor, I'd like *at least*
8 times as many bits as the entropy estimate, and more isn't a bad thing.

Eventually, you have to narrow it down to N strongly random bits with N
bits of entropy, but when you do that:

1. Use the largest batches possible, averaging over as much input as
   possible, and
2. Use a good collision-resistant, and preferably cryptographically
   strong, hash.  /dev/random's CRC-based input mix is pretty much
   the lightest defensible thing.  XOR is bad for the same reason
   that any additive checksum is weak.


> I also quite like Stephan's idea of replacing the two output pools
> with a NIST-approved DBRG, mainly because this would probably make
> getting various certifications easier.

Do you know of an example of such a certification?

I don't really *mind* the NIST DRBGs (well, except for Dual_EC_DRBG
of course), but they don't really *help* in this context, either.

The good and bad thing about them is that they're highly correlated to
the strength of the underlying primitives.  If you're already relying
on AES in your application, then an AES-based DRBG avoids adding another
algorithm to your attack surface.  (Be it cryptanalytic, implementation,
or side channel.)

They also provide a nice reliable cookbook solution to anti-backtracking
and incorporating weakly random seed material (the timestamps) into
their state.

If they're not seeded with full entropy, then an algorithm like CTR_DRBG
isn't significantly *stronger* than the underlying cipher.  Which is not
wonderful if /dev/random uses AES and someone's generating a Keccak key
with it (or vice versa).

And if they are seeded with full entropy, they aren't contributing
anything; it's just waving a dead chicken over your already-secure
input seed.

If waving a dead chicken would help with some real-world bureaucratic
obstacle, I don't mind, but otherwise it's pointless bloat.

(Exception: for large amounts of output, NIST's DRBGs have the problem
that they are bottlenecked at "seedlen" bits of internal state.
This makes sense for their intended use, which is in an application
with a specific security requirement, but an issue for a fully
general-purpose random bit source.)


> In the current driver -- and I think in Stephan's, though I have not
> looked at his code in any detail, only his paper -- heavy use of
> /dev/urandom or the kernel get_random_bytes() call can deplete the
> entropy available to /dev/random. That can be a serious problem in
> some circumstances, but I think I have a fix.

Actually, that hasn't been too big a problem for a while.  Of course
it depletes it *somewhat*, and it should, but there's the contents of
B plus some reserved in I (see rsvd_bytes in xfer_secondary_pool())
that won't be used up.

> You have an input pool (I) plus a blocking pool (B) & a non-blocking
> pool (NB). The problem is what to do when NB must produce a lot of
> output but you do not want to deplete I too much. B & NB might be
> replaced by DBRGs and the problem would not change.

I, B and NB *are* DRBGs, just not the NIST design.

> B must be reseeded before every /dev/random output, NB after some
> number of output blocks. I used #define SAFE_OUT 503 but some other
> number might be better depending how NB is implemented & how
> paranoid/conservative one feels.

Actually, it's better to use Ferguson & Schneier's idea of "catastrophic
reseeding".  If you never want to block, you can never guarantee
reseeding.  This is not a big problem; a decent DRBG can generate
petabytes of output without reseeding.

(The reseed interval should be no more than the square root of the
state size, to force a reseed before a cycle appears.  NIST further
clamps it to 2^48, but that's somewhat 

Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi

2015-04-30 Thread George Spelvin
Sorry for the long silence; the last e-mails arrived as I went on a trip,
and the packet got lost.

I just upgraded my laptop to 4.0.1 and had to remember the magic
incantation to get the wireless working.  (modprobe ctr)

> George, any updates on this?

It turns out that I found the problem.  An odd bit of pilot error, but
NOT a kernel problem.  See bottom.

> Also, could you please provide the output of depmod -n | grep
> crypto-? There should be lines for crypto-ccm and crypto-ctr if you
> build them as modules.

alias crypto-twofish-asm twofish_i586
alias crypto-twofish twofish_i586
alias crypto-salsa20-asm salsa20_i586
alias crypto-salsa20 salsa20_i586
alias crypto-serpent serpent_sse2_i586
alias crypto-cmac cmac
alias crypto-xcbc xcbc
alias crypto-md4 md4
alias crypto-sha256-generic sha256_generic
alias crypto-sha256 sha256_generic
alias crypto-sha224-generic sha256_generic
alias crypto-sha224 sha256_generic
alias crypto-ecb ecb
alias crypto-lrw lrw
alias crypto-xts xts
alias crypto-ctr ctr
alias crypto-rfc3686 ctr
alias crypto-gcm gcm
alias crypto-rfc4543 gcm
alias crypto-rfc4106 gcm
alias crypto-gcm_base gcm
alias crypto-ccm ccm
alias crypto-rfc4309 ccm
alias crypto-ccm_base ccm
alias crypto-cryptd cryptd
alias crypto-twofish-generic twofish_generic
alias crypto-twofish twofish_generic
alias crypto-serpent-generic serpent_generic
alias crypto-serpent serpent_generic
alias crypto-tnepres serpent_generic
alias crypto-salsa20-generic salsa20_generic
alias crypto-salsa20 salsa20_generic
alias crypto-michael_mic michael_mic
alias crypto-crc32 crc32
alias crypto-ansi_cprng ansi_cprng
alias crypto-stdrng ansi_cprng
alias crypto-ghash-generic ghash_generic
alias crypto-ghash ghash_generic


Anyway, the problem was a long time ago, in /etc/modprobe.d/local.conf,
I had blacklisted several unwanted crypto modules in order to suppress
some mysterious urge my system had to load every loadable module at
boot time.

Hey, bozo!  The point of making them modules is that I *don't* want them
taking up unswappable memory all the time!  So I hit it with a hammer.

One line written years ago was "blacklist ctr".  The light dawns.

It turns out that this doesn't stop an explicit "modprobe ctr" from
working, but *does* stop alias processing that resolves to ctr.
Thus, the kernel change broke my strange kmod (mis-)configuration.

This fits the observed symptoms, and I apologize for wasting your time.


Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi

2015-02-17 Thread George Spelvin
> And now my head-scratching: your bisect shows that v3.19-rc7 fails,
> but that prior to 4943ba16bbc2, things work correctly? As in, when
> _only_ 5d26a105b5a7 is in your tree things _work_?

Basically, yes.  v3.19-rc7 fails with basically the same symptoms:
wpa_supplicant keeps looping trying to associate.  I haven't compared
the log messages line-by-line.

It appears that it negotiates a crypto key and then fails to pass
it down to the kernel.

I haven't tested 5d26a105b5a7 specifically (when you say _only_
5d26a105b5a7 is in your tree, do you mean that state, or that commit
cherry-picked?), but 4 commits later (476c7fe20f30) did get tested,
and it worked.  (In fact, that was the final test in my bisect
process; once I typed "git bisect good", things worked.)

Also, one more detail:
- When I changed CTR and CCM from =m to =y, things started working.

Perhaps it's the unexpected case of a module using a static algorithm?
Anyway, let me collect more information before you put more time into
it.

> Could you rename and instrument your /sbin/modprobe to do something like:
>
> #!/bin/sh
> echo $@ >> /root/modprobe.log
> exec /sbin/modprobe.bin $@
>
> Perhaps there's something being explicitly requested rather than
> having it be kernel auto-loaded?

Glad to.  Thanks for the pointer!

Unless you say otherwise, I'll probably test on 3.19 release, since that
seems the most useful baseline for future repair.


[BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi

2015-02-16 Thread George Spelvin
I discovered when (belatedly) testing 3.19-rc7 the other week that
my laptop wifi was broken and would no longer associate.

I wasted a lot of time trying to bisect changes in net/wireless and
drivers/net/wireless before figuring out that it was somewhere else in
the kernel.  An unrestricted bisect quickly homed in on this commit.

Apparently this is causing some necessary crypto algorithms to fail to
load, breaking my wifi.

Perhaps I'm displaying my ignorance of what's supposed to happen,
but shouldn't "make install" have installed some files with names like
/lib/modules/`uname -r`/kernel/crypto/crypto-*.ko?

Or is it something only I'm hitting because I have a lot of common
crypto algorithms statically compiled into my kernel?

CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_586=y
CONFIG_CRYPTO_ARC4=y


In more detail, when things are working,  (such as on commit 4943ba1^
= 476c7fe2), wpa_supplicant logs:

wlan1: SME: Trying to authenticate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx 
MHz)
wlan1: Trying to associate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
wlan1: Associated with aa:bb:cc:dd:ee:ff
wlan1: WPA: Key negotiation completed with aa:bb:cc:dd:ee:ff [PTK=CCMP GTK=CCMP]
wlan1: CTRL-EVENT-CONNECTED - Connection to aa:bb:cc:dd:ee:ff completed (auth) 
[id=0 id_str=]

Followed by group rekeying completed messages at 10 minute intervals.

Trying this on kernel 4943ba16 produces instead an endless loop of:

wlan1: SME: Trying to authenticate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx 
MHz)
wlan1: Trying to associate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
wlan1: Associated with aa:bb:cc:dd:ee:ff
wlan1: WPA: Failed to set PTK to the driver (alg=3 keylen=16 
bssid=aa:bb:cc:dd:ee:ff)
wlan1: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:ff reason=1


The kernel logs are not particularly informative.

They just go through the usual successful series, but end with

wlan1: RxAssocResp from aa:bb:cc:dd:ee:ff (capab=0x431 status=0 aid=1)
wlan1: associated
wlan1: deauthenticating from 11:bb:cc:dd:ee:ff by local choice (Reason: 
1=UNSPECIFIED)

While successful connection ends before that last line.


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-15 Thread George Spelvin
> That output is good for the VST test vectors. For the MCT vectors, I need the
> 10000th value.

That was test 9 in the first group:

> [167586.784923] COUNT = 9
> [167586.784925] Key = 10379b53317a2500879e88ad445ea387
> [167586.784927] DT = 055a913d7587d54ee58c053fd4beb4a2
> [167586.784928] V = a7d058a34e1bf49b40f0b6d26661f889
> [167586.791891] R = c252c3f173558775929fe3fb8345feb2
> [167586.791892] cprng: Test 9 passed
> [167586.797633] cprng: Stutter test 9 passed

Just like the CAVS test vectors, I don't print the loops value
anywhere.

>> With some minor editor massaging (deleting the timestamps and
>> inserting a blank line before every COUNT line), it matches the
>> ANSI931_AES128MCT.fax and ANSI931_AES128VST.fax you sent.  I left it
>> un-massaged as some sort of evidence that it isn't just a direct copy.

> I cannot match these test vectors and the results to the ones I sent to you.
> E.g. I do not find the key value f3b1666d13607242ed061cabb8d46202 anywhere in
> the data set.

Sorry, that's the union of the testmgr.h tests, a couple I added by hand,
and (at the end) the ones you sent me.

So no, you can't find all of my test results in your test vectors, but
all of your test vectors are in my test results.

(In all cases, the software compares the results with the expected answers.)


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-15 Thread George Spelvin
> - the non-determinism you get from get_random_int is very weak. If you start
> thinking about the information theoretical entropy behind that function that
> is used once in a while, you may not get much entropy. Please, please, please,
> I do not want to start a discussion around entropy -- I will not participate
> in such discussion :-)

I could have such a discussion, but there's no need to; for the most part,
I agree with you.  I wasn't trying to design a *good* RNG, I was trying
to comply slavishly with what X9.17, X9.31, and the NIST specification
call for: a timestamp.

To quote http://csrc.nist.gov/groups/STM/cavp/documents/rng/931rngext.pdf,
section 3:

# Let DT be a date/time vector which is updated on each iteration. 

That's what I was trying to produce, nothing more and nothing less.
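For reference, one generator step looks like this.  The sketch below is
mine, using the pyca/cryptography package for AES; the variable names
follow the spec, and nothing here is the kernel's ansi_cprng code:

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def x931_step(key, dt, v):
    # One X9.31 iteration: DT (the date/time vector) enters only via I.
    e = Cipher(algorithms.AES(key), modes.ECB()).encryptor().update
    i = e(dt)            # I = E_K(DT)
    r = e(xor(i, v))     # R = E_K(I xor V), the output block
    v = e(xor(r, i))     # V = E_K(R xor I), carried to the next step
    return r, v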

You will agree, I hope, that the result from get_random_int *does* include
the entropy of a high-resolution timestamp?  Which is cryptographically
equivalent to including the unobfuscated timestamp?

> Thus, I am questioning whether such slightly non-deterministic RNG would be
> used.

As far as I know the only reason to *ever* use ansi_cprng is for
regulatory/certification reasons.  It's not horrible, but it's definitely
been superseded, by the NIST SP800-90A generators at least.

Which is why I'm trying to follow the spec as precisely as possible.


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-15 Thread George Spelvin
> If you look into other X9.31 implementations, you see that the DT vector
> is a time stamp (even sometimes initialized to just 0 or 1) and then
> incremented each time. Thus, you get some form of a counter mode for
> the AES core.

I'm not saying you're wrong, but that still seems to me an extremely
contrived interpretation.  You're right that the tick rate isn't specified
and once per call isn't unambigously forbidden, but the plain reading
of the spec calls for a clock, not a counter.

> But of course, you could update DT with time stamps, provided you can
> prove that they are monotonically increasing.

I don't even see a need for that, although I could easily do it (by
adding get_random_int() rather than replacing).  I thought the goal was
that they were non-repeating, so short cycles are impossible.

>> You will agree, I hope, that the result from get_random_int *does*
>> include the entropy of a high-resolution timestamp?  Which is
>> cryptographically equivalent to including the unobfuscated timestamp?

> get_random_int does provide entropy, but my gut feeling (I have not done
> measurements) is that it is in the range of maybe 2 / 3 bits per
> invocation.

You said you didn't want to start a conversation about entropy, remember?
:-)  So I'm not discussing the issue, but please don't interpret that
as conceding anything.

As I said, it doesn't matter; my goal is just to do what the spec asks
for, as faithfully as possible without introducing any stupidities in
the process.

>> Which is why I'm trying to follow the spec as precisely as possible.

> If you only look at the regulatory side, then you must be aware of
> SP800-131A applicable at least to the US side. X9.31 is sunsetted by the
> end of 2015 and even not FIPS 140-2 certifiable any more for new
> validations.

Well, if nobody wants it, why not simply rip it out?

I'm assuming there are other requirements documents that refer to
those documents, and haven't been updated to reflect the changes.

That's what tends to happen: requirements flow downstream over
a period of years.  NIST may change its mind, but my contract
hasn't noticed yet.

I know this from personal experience: I've had frustrating discussions
about a too-hard-to-change requirement for 1024-bit DSA *and* FIPS
140 certification.


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-15 Thread George Spelvin
> With that then, I'm really fine with the changes given that they pass the NIST
> tests.

So here's the current list of issues.  First, minor ones:
1) Add const to DRBG interface, as per Stephan's request.
2) Revised version of that final patch that, you know, actually works.
3) Re-run tests at the very end just to make sure.

And the major ones:
4) Is non-deterministic DT desired?
5) If yes, how to request it?

On point 4, here are the primary arguments against:

* It makes the generator non-deterministic, which is a significant
  interface change and may break some applications.
* This is a crufty old generator, used primarily for compatibility,
  and it's best not to upset its quiet retirement.

And the primary arguments for:
* It's an honest good-faith implementation of the spec requirements.
  Using a counter is, IMHO, a strained interpretation.
* The implementation isn't particularly difficult.

After considering various options, my current (not very firm) thought
is that the best way to provide a non-deterministic option would be via
a separate algorithm name.

But externally-visible names are a high-level design issue and I could
definitely use some guidance there.  Opinions?


Re: [PATCH v2 05/25] crypto: ansi_cprng - Eliminate ctx-I and ctx-last_rand_data

2014-12-14 Thread George Spelvin
> Due to the huge number of diffs, I may have missed the following point.
> Therefore, please help me:

No problem at all!  If you're doing me the kindness of actually reading
and reviewing this, I have *lots* of time to act as a tour guide.

I've just had my nose in this code, and your memory is presumably a bit
rustier on some details, even if you understand the larger system better
than I do.

(I hope that English figure of speech isn't too obscure for you.)

> Where do I see that priming?

It's in the same place as it always has been: in fips_cprng_reset,
just below the comment "this primes our continuity test".

Patch 12 changes the priming call from get_prng_bytes to
_get_more_prng_bytes in order to get rid of the rdata stack buffer.

Patches 5 and 21 make inconsequential syntactic changes to the area.

> Note, this priming should have an ability to be disabled for performing the
> CAVS tests as they (as stupid as it may sound) want the very first random
> number after the seeding.

In this regard, I didn't touch the existing code, which distinguishes the
functions fips_cprng_reset which does the priming, and cprng_reset
which doesn't, and exports two struct crypto_alg interfaces to make them
both available.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-14 Thread George Spelvin
 Pending issues:
 * Neil would like me to post the results of the NIST and FIPS test
   vectors.  The current code doesn't print anything on a successful
   test; I need to know what result format is wanted.
 * Stephan says he has the FIPS test vectors referred to above and
   will send them to me when he finds them.

 I have sent these vectors about a week ago. Do you have results?

I got them, but I've been twiddling my thumbs waiting for the first
point above: to know what format the results should be in.

 Note, apart from the two comments I sent today, I do not see any big
 problems in the patch set. However, I would like to see the test results.

As far as I can tell, the tcrypt.ko module prints *nothing* if a test
is successful, and there is no module option to make it more verbose,
so I don't know what to show people.

I can say "yes, I ran it, and saw no errors" but that seems unsatisfactory.

What would you like to see, other than the following uuencoded test
results:

begin 664 test_results
`
end

As I mentioned (in passing; you may have forgotten), for my own use,
I simply copied the test code & vectors out of testmgr.[ch] and pasted
it into ansi_cprng.c's prng_mod_init (at the end, so it doesn't mess up
the line numbers of the other patches), with some tweaks to say "Test
%d passed".

There's also a debug option that dumps all kinds of details in
_get_more_prng_bytes, but that produces so much output on a 10,000-round
MCT that the kernel log buffer overflows and loses data.
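
(If anyone wants to reproduce that locally, booting with a larger log
buffer, e.g. log_buf_len=4M, should be the easiest workaround; I haven't
bothered for a throwaway debug hack.)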
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-14 Thread George Spelvin
 I have sent these vectors about a week ago. Do you have results?

BTW, here is my current home-grown debugging output.

With some minor editor massaging (deleting the timestamps and
inserting a blank line before every COUNT line), it matches the
ANSI931_AES128MCT.fax and ANSI931_AES128VST.fax you sent.  I left it
un-massaged as some sort of evidence that it isn't just a direct copy.

[167586.775196] alg_test_cprng: testing ansi_cprng: err 0
[167586.775220] COUNT = 0
[167586.775222] Key = f3b1666d13607242ed061cabb8d46202
[167586.775225] DT = e6b3be782a23fa62d71d4afbb0e922f9
[167586.775226] V = 8000
[167586.775230] R = 59531ed13bb0c05584796685c12f7641
[167586.775231] cprng: Test 0 passed
[167586.775234] cprng: Stutter test 0 passed
[167586.775235] COUNT = 1
[167586.775236] Key = f3b1666d13607242ed061cabb8d46202
[167586.775237] DT = e6b3be782a23fa62d71d4afbb0e922fa
[167586.775237] V = c000
[167586.775239] R = 7c222cf4ca8fa24c1c9cb641a9f3220d
[167586.775240] cprng: Test 1 passed
[167586.775242] cprng: Stutter test 1 passed
[167586.775243] COUNT = 2
[167586.775244] Key = f3b1666d13607242ed061cabb8d46202
[167586.775245] DT = e6b3be782a23fa62d71d4afbb0e922fb
[167586.775245] V = e000
[167586.775247] R = 8aaa003966675be529142881a94d4ec7
[167586.775248] cprng: Test 2 passed
[167586.775250] cprng: Stutter test 2 passed
[167586.775250] COUNT = 3
[167586.775251] Key = f3b1666d13607242ed061cabb8d46202
[167586.775252] DT = e6b3be782a23fa62d71d4afbb0e922fc
[167586.775253] V = f000
[167586.775255] R = 88dda456302423e5f69da57e7b95c73a
[167586.775255] cprng: Test 3 passed
[167586.775257] cprng: Stutter test 3 passed
[167586.775258] COUNT = 4
[167586.775259] Key = f3b1666d13607242ed061cabb8d46202
[167586.775260] DT = e6b3be782a23fa62d71d4afbb0e922fd
[167586.775260] V = f800
[167586.775262] R = 052592466179d2cb78c40b140a5a9ac8
[167586.775263] cprng: Test 4 passed
[167586.775265] cprng: Stutter test 4 passed
[167586.775266] COUNT = 5
[167586.775266] Key = 9f5b51200bf334b5d82be8c37255c848
[167586.775267] DT = 6376bbe52902ba3b67c925fa701f11ac
[167586.775268] V = 572c8e76872647977e74fbddc49501d1
[167586.775270] R = 2b6c7070067fc873079864a30a316927
[167586.775271] cprng: Test 5 passed
[167586.775273] cprng: Stutter test 5 passed
[167586.775274] COUNT = 6
[167586.775275] Key = 9f5b51200bf334b5d82be8c37255c848
[167586.775276] DT = 6376bbe52902ba3b67c925fa701f11ac
[167586.775276] V = 572c8e76872647977e74fbddc49501d1
[167586.775279] R = 3bed4e991f23567cf7e2c9306e12b6d2
[167586.775280] cprng: Test 6 passed
[167586.775283] cprng: Stutter test 6 passed
[167586.775284] COUNT = 7
[167586.775285] Key = 9f5b51200bf334b5d82be8c37255c848
[167586.775286] DT = 6376bbe52902ba3b67c925fa701f11ac
[167586.775286] V = 572c8e76872647977e74fbddc49501d1
[167586.775291] R = a5975adb24344f7e2dbbaf96060063ee
[167586.775291] cprng: Test 7 passed
[167586.775297] cprng: Stutter test 7 passed
[167586.775298] COUNT = 8
[167586.775298] Key = 9f5b51200bf334b5d82be8c37255c848
[167586.775299] DT = 6376bbe52902ba3b67c925fa701f11ac
[167586.775300] V = 572c8e76872647977e74fbddc49501d1
[167586.780146] R = 48e9bd0d06ee18fbe45790d5c3fc9b73
[167586.780147] cprng: Test 8 passed
[167586.784921] cprng: Stutter test 8 passed
[167586.784923] COUNT = 9
[167586.784925] Key = 10379b53317a2500879e88ad445ea387
[167586.784927] DT = 055a913d7587d54ee58c053fd4beb4a2
[167586.784928] V = a7d058a34e1bf49b40f0b6d26661f889
[167586.791891] R = c252c3f173558775929fe3fb8345feb2
[167586.791892] cprng: Test 9 passed
[167586.797633] cprng: Stutter test 9 passed
[167586.797635] COUNT = 10
[167586.797637] Key = a9d40024ad65b4120636f28758662cbf
[167586.797639] DT = 8f210f47c3534774067563d111251f84
[167586.797640] V = 8000
[167586.797645] R = a80bd181ddc04a55a554da4d520dd493
[167586.797646] cprng: Test 10 passed
[167586.797650] cprng: Stutter test 10 passed
[167586.797652] [X9.31]
[AES 128-Key]
[167586.797654] COUNT = 0
[167586.797656] Key = a9d40024ad65b4120636f28758662cbf
[167586.797657] DT = 8f210f47c3534774067563d111251f84
[167586.797659] V = 8000
[167586.797662] R = a80bd181ddc04a55a554da4d520dd493
[167586.797664] COUNT = 1
[167586.797665] Key = a9d40024ad65b4120636f28758662cbf
[167586.797667] DT = 8f210f47c3534774067563d111251f85
[167586.797668] V = c000
[167586.797671] R = 5ed27d0a7883408b42ecfdb7bfffc056
[167586.797672] COUNT = 2
[167586.797674] Key = a9d40024ad65b4120636f28758662cbf
[167586.797675] DT = 8f210f47c3534774067563d111251f86
[167586.797677] V = e000
[167586.797680] R = 40859edac9fee6735262389927aa063c
[167586.797682] COUNT = 3
[167586.797683] Key = a9d40024ad65b4120636f28758662cbf
[167586.797685] DT = 8f210f47c3534774067563d111251f87
[167586.797686] V = f000
[167586.797689] R = 

Re: [PATCH v2 25/25] crypto: ansi_cprng - If non-deterministic, don't buffer old output

2014-12-08 Thread George Spelvin
 Not your motivations, just the posting mechanics.  If you just want to
 discuss a patch, and aren't yet proposing it for inclusion, you should
 put RFC in the prefix of every patch header.

I understand the principle, and I should have on those patches (mea
culpa), but really *all* patch postings are for comment; I think of RFC
as "comment only; please don't apply this."

But it wasn't marked RFC, so that's why I posted a note downgrading
it when I realized I messed it up.  The note was basically "oh, shit,
I introduced a bug at the last minute; thankfully that was the most RFC
of the entire series, so nobody's likely to have merged it."

But it certainly is the case that for any significant patch series,
I really don't expect v1 to get merged as-is.

I'm serious about the changes, and it wouldn't have been a problem if
you had applied v1, but it would have surprised me.  Realistically,
I expect a couple of rounds of discussion and tweaking of the specific
form of the changes before people agree it's ready to go in.

And I think that's the case here; I adjusted a lot of details based on
feedback, but at a high level nothing changed; v2 makes the same changes
that v1 did.

 Not particularly opposed to the idea, I just know that several use cases
 rely on deterministic behavior for those entities that share the secret
 information, so I need to be sure that the deterministic behavior remains
 and is the default.

Right, because it's advertised as a PRNG.  Thinking about it, would
a separate crypto_alg with a different seedsize be a better solution
than obscure rules about seed size?  And something in the cra_flags
to indicate it's nondeterministic?
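
To make that concrete, here's a rough sketch of the kind of second
registration I mean, modelled on the existing rng_algs[] entries in
ansi_cprng.c.  The "ansi_cprng_nd" name and the cprng_nd_reset helper are
invented for illustration; the smaller seedsize (no DT block) and the
separate name are the actual proposal:

static struct crypto_alg nd_rng_alg = {
	.cra_name		= "ansi_cprng_nd",	/* placeholder name */
	.cra_driver_name	= "ansi_cprng_nd",
	.cra_priority		= 100,
	.cra_flags		= CRYPTO_ALG_TYPE_RNG,
	.cra_ctxsize		= sizeof(struct prng_context),
	.cra_type		= &crypto_rng_type,
	.cra_module		= THIS_MODULE,
	.cra_init		= cprng_init,
	.cra_exit		= cprng_exit,
	.cra_u			= { .rng = {
		.rng_make_random	= cprng_get_random,
		/* hypothetical reset that never accepts a DT block */
		.rng_reset		= cprng_nd_reset,
		.seedsize		= DEFAULT_BLK_SZ + DEFAULT_PRNG_KSZ,
	} }
};

Selecting by name needs no core changes; a new cra_flags bit would, so
the name-only approach is probably the less invasive of the two.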
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/25] crypto: ansi_cprng - unroll _get_more_prng_bytes

2014-12-07 Thread George Spelvin
It's more legible, and the code is 16 bytes smaller (i386).
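
For reference while reading the diff, the three stages are just the X9.31
recurrence, with E_K() standing for AES under the seeded key:

	I  = E_K(DT)
	R  = E_K(I ^ V)
	V' = E_K(R ^ I)

The old switch statement executed exactly one of those per loop
iteration; the unrolled version simply writes them out in order.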

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 91 +
 1 file changed, 35 insertions(+), 56 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index b63b5094..ce315bf7 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -100,69 +100,48 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
hexdump("Input V: ", ctx->V, DEFAULT_BLK_SZ);
 
/*
-* This algorithm is a 3 stage state machine
+* Start by encrypting the counter value
+* This gives us an intermediate value I
 */
-   for (i = 0; i < 3; i++) {
+   memcpy(tmp, ctx->DT, DEFAULT_BLK_SZ);
+   output = ctx->I;
+   hexdump("tmp stage 0: ", tmp, DEFAULT_BLK_SZ);
+   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
 
-   switch (i) {
-   case 0:
-   /*
-* Start by encrypting the counter value
-* This gives us an intermediate value I
-*/
-   memcpy(tmp, ctx->DT, DEFAULT_BLK_SZ);
-   output = ctx->I;
-   hexdump("tmp stage 0: ", tmp, DEFAULT_BLK_SZ);
-   break;
-   case 1:
-
-   /*
-* Next xor I with our secret vector V
-* encrypt that result to obtain our
-* pseudo random data which we output
-*/
-   xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
-   hexdump("tmp stage 1: ", tmp, DEFAULT_BLK_SZ);
-   output = ctx->rand_data;
-   break;
-   case 2:
-   /*
-* First check that we didn't produce the same
-* random data that we did last time around through this
-*/
-   if (!memcmp(ctx->rand_data, ctx->last_rand_data,
-   DEFAULT_BLK_SZ)) {
-   if (cont_test) {
-   panic("cprng %p Failed repetition "
"check!\n",
-   ctx);
-   }
-
-   printk(KERN_ERR
-   "ctx %p Failed repetition check!\n",
-   ctx);
-
-   ctx->flags |= PRNG_NEED_RESET;
-   return -EINVAL;
-   }
-   memcpy(ctx->last_rand_data, ctx->rand_data,
-   DEFAULT_BLK_SZ);
+   /*
+* Next xor I with our secret vector V
+* encrypt that result to obtain our
+* pseudo random data which we output
+*/
+   xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
+   hexdump("tmp stage 1: ", tmp, DEFAULT_BLK_SZ);
+   output = ctx->rand_data;
+   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
 
-   /*
-* Lastly xor the random data with I
-* and encrypt that to obtain a new secret vector V
-*/
-   xor_vectors(ctx->rand_data, ctx->I, tmp,
-   DEFAULT_BLK_SZ);
-   output = ctx->V;
-   hexdump("tmp stage 2: ", tmp, DEFAULT_BLK_SZ);
-   break;
+   /*
+* First check that we didn't produce the same
+* random data that we did last time around through this
+*/
+   if (!memcmp(ctx->rand_data, ctx->last_rand_data, DEFAULT_BLK_SZ)) {
+   if (cont_test) {
+   panic("cprng %p Failed repetition check!\n", ctx);
}
 
+   printk(KERN_ERR "ctx %p Failed repetition check!\n", ctx);
 
-   /* do the encryption */
-   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
-
+   ctx->flags |= PRNG_NEED_RESET;
+   return -EINVAL;
}
+   memcpy(ctx->last_rand_data, ctx->rand_data, DEFAULT_BLK_SZ);
+
+   /*
+* Lastly xor the random data with I
+* and encrypt that to obtain a new secret vector V
+*/
+   xor_vectors(ctx->rand_data, ctx->I, tmp, DEFAULT_BLK_SZ);
+   output = ctx->V;
+   hexdump("tmp stage 2: ", tmp, DEFAULT_BLK_SZ);
+   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
 
/*
 * Now update our DT value
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 00/25] Multiple changes to crypto/ansi_cprng.c

2014-12-07 Thread George Spelvin
This is a reworked version of my earlier patch series, based on feedback
from Neil Horman and Stephan Mueller.  Thank you both very much!

It's mostly the same content as before, but I've tried to improve comments
and commit messages to address questions, to reorder the patches to put
the questionable stuff at the end, and I've also (at Neil's prodding)
made some larger scale changes.

I've added appropriate const qualifiers to the RNG API, and also const
declarations to all of the self-tests in testmgr.h.  (That's a very
large but simple patch.)

The significant code improvement is the addition of what I call the
stutter test to testmgr.  This reads from the RNG in irregular chunks
and verifies that the output matches that produced by a more regular
pattern.  This should prevent any recurrence of CVE-2013-4345.
(It itself passed an important test by detecting a bug in my code!)

Dropped change:
* Neil said he wanted deterministic to remain the default, so I dropped
  the patch that changed the default seedsize.

Pending issues:
* Neil would like me to post the results of the NIST and FIPS test
  vectors.  The current code doesn't print anything on a successful
  test; I need to know what result format is wanted.
* Stephan says he has the FIPS test vectors referred to above and
  will send them to me when he finds them.
* Is non-deterministic mode (last three patches) wanted?

George Spelvin (25):
  crypto: ansi_cprng - unroll _get_more_prng_bytes
  crypto: ansi_cprng - Additional _get_more_prng_bytes cleanup
  crypto: ansi_cprng - Use %phN rather than print_hex_dump for debug
  crypto: ansi_cprng - Make debug output more like NIST test vectors
  crypto: ansi_cprng - Eliminate ctx-I and ctx-last_rand_data
  crypto: ansi_cprng - Make cont_test a bool
  crypto: ansi_cprng - Shrink context some more
  crypto: ansi_cprng - Don't call reset_prng_context from cprng_init
  crypto: ansi_cprng - Make length types consistent
  crypto: ansi_cprng - Use u8 data types consistently internally
  crypto: ansi_cprng - Eliminate unused PRNG_FIXED_SIZE flag
  crypto: ansi_cprng - Get rid of rdata buffer in fips_cprng_reset
  crypto: Add appropriate consts to RNG API
  crypto: tcrypt - Add const qualifiers all over the test code.
  crypto: testmgr - Merge seed arrays in struct cprng_testvec
  crypto: testmgr - Report failure on zero-length crypto_rng_get_bytes
  crypto: testmgr - Don't crash if CPRNG test result is large
  crypto: testmgr - Add CPRNG stutter test.
  crypto: ansi_cprng - simplify get_prng_bytes
  crypto: ansi_cprng - simplify xor_vectors() to xor_block()
  crypto: ansi_cprng - Rename rand_data_valid more sensibly
  crypto: ansi_cprng - Tweak comments
  crypto: ansi_cprng - Introduce a union cipherblock
  crypto: ansi_cprng - Introduce non-deterministic mode
  crypto: ansi_cprng - If non-deterministic, don't buffer old output

 crypto/ansi_cprng.c| 369 
 crypto/krng.c  |   2 +-
 crypto/rng.c   |   3 +-
 crypto/tcrypt.c|  46 ++---
 crypto/tcrypt.h|  30 +--
 crypto/testmgr.c   | 190 +--
 crypto/testmgr.h   | 502 -
 include/crypto/rng.h   |   2 +-
 include/linux/crypto.h |   6 +-
 9 files changed, 587 insertions(+), 563 deletions(-)

-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 02/25] crypto: ansi_cprng - Additional _get_more_prng_bytes cleanup

2014-12-07 Thread George Spelvin
The local variable output is no longer required.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index ce315bf7..143e0cfa 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -89,8 +89,6 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int 
cont_test)
 {
int i;
unsigned char tmp[DEFAULT_BLK_SZ];
-   unsigned char *output = NULL;
-
 
dbgprint(KERN_CRIT "Calling _get_more_prng_bytes for context %p\n",
ctx);
@@ -104,9 +102,8 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 * This gives us an intermediate value I
 */
memcpy(tmp, ctx->DT, DEFAULT_BLK_SZ);
-   output = ctx->I;
hexdump("tmp stage 0: ", tmp, DEFAULT_BLK_SZ);
-   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->I, tmp);
 
/*
 * Next xor I with our secret vector V
@@ -115,8 +112,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 */
xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
hexdump("tmp stage 1: ", tmp, DEFAULT_BLK_SZ);
-   output = ctx->rand_data;
-   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->rand_data, tmp);
 
/*
 * First check that we didn't produce the same
@@ -139,9 +135,8 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 * and encrypt that to obtain a new secret vector V
 */
xor_vectors(ctx->rand_data, ctx->I, tmp, DEFAULT_BLK_SZ);
-   output = ctx->V;
hexdump("tmp stage 2: ", tmp, DEFAULT_BLK_SZ);
-   crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, tmp);
 
/*
 * Now update our DT value
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 04/25] crypto: ansi_cprng - Make debug output more like NIST test vectors

2014-12-07 Thread George Spelvin
This uses more meaningful labels (if you have the spec as a
reference), and avoids printing some stuff (like the original DT)
twice.

It also strips out the len parameter and uses a fixed length of
DEFAULT_BLK_SZ.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index b54e4e75..325aa727d 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/string.h>
+#include <linux/stringify.h>
 
 #include internal.h
 
@@ -57,10 +58,11 @@ struct prng_context {
 
 static int dbg;
 
-static void hexdump(char *note, unsigned char *buf, unsigned int len)
+static void hexdump(char const *note, const unsigned char buf[DEFAULT_BLK_SZ])
 {
if (dbg) {
-   printk(KERN_CRIT "%s%*phN", note, (int)len, buf);
+   printk(KERN_CRIT "%s = %" __stringify(DEFAULT_BLK_SZ) "phN",
+   note, buf);
}
 }
 
@@ -90,17 +92,16 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
dbgprint(KERN_CRIT "Calling _get_more_prng_bytes for context %p\n",
ctx);
 
-   hexdump("Input DT: ", ctx->DT, DEFAULT_BLK_SZ);
-   hexdump("Input I: ", ctx->I, DEFAULT_BLK_SZ);
-   hexdump("Input V: ", ctx->V, DEFAULT_BLK_SZ);
+   hexdump("DT", ctx->DT);
+   hexdump("V", ctx->V);
 
/*
 * Start by encrypting the counter value
 * This gives us an intermediate value I
 */
memcpy(tmp, ctx->DT, DEFAULT_BLK_SZ);
-   hexdump("tmp stage 0: ", tmp, DEFAULT_BLK_SZ);
crypto_cipher_encrypt_one(ctx->tfm, ctx->I, tmp);
+   hexdump("I", ctx->I);
 
/*
 * Next xor I with our secret vector V
@@ -108,8 +109,9 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 * pseudo random data which we output
 */
xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
-   hexdump("tmp stage 1: ", tmp, DEFAULT_BLK_SZ);
+   hexdump("V^I", tmp);
crypto_cipher_encrypt_one(ctx->tfm, ctx->rand_data, tmp);
+   hexdump("R", ctx->rand_data);
 
/*
 * First check that we didn't produce the same
@@ -132,8 +134,9 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 * and encrypt that to obtain a new secret vector V
 */
xor_vectors(ctx->rand_data, ctx->I, tmp, DEFAULT_BLK_SZ);
-   hexdump("tmp stage 2: ", tmp, DEFAULT_BLK_SZ);
+   hexdump("R^I", tmp);
crypto_cipher_encrypt_one(ctx->tfm, ctx->V, tmp);
+   hexdump("V'", ctx->V);
 
/*
 * Now update our DT value
@@ -143,15 +146,11 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
if (ctx->DT[i] != 0)
break;
}
+   hexdump("DT'", ctx->DT);
 
dbgprint("Returning new block for context %p\n", ctx);
ctx->rand_data_valid = 0;
 
-   hexdump("Output DT: ", ctx->DT, DEFAULT_BLK_SZ);
-   hexdump("Output I: ", ctx->I, DEFAULT_BLK_SZ);
-   hexdump("Output V: ", ctx->V, DEFAULT_BLK_SZ);
-   hexdump("New Random Data: ", ctx->rand_data, DEFAULT_BLK_SZ);
-
return 0;
 }
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 12/25] crypto: ansi_cprng - Get rid of rdata buffer in fips_cprng_reset

2014-12-07 Thread George Spelvin
Calling the lower-level function does what's needed with less overhead.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index f6a1e987..249b944f 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -329,7 +329,6 @@ static int fips_cprng_get_random(struct crypto_rng *tfm, u8 
*rdata,
 
 static int fips_cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int 
slen)
 {
-   u8 rdata[DEFAULT_BLK_SZ];
u8 *key = seed + DEFAULT_BLK_SZ;
int rc;
 
@@ -348,7 +347,7 @@ static int fips_cprng_reset(struct crypto_rng *tfm, u8 
*seed, unsigned int slen)
goto out;
 
/* this primes our continuity test */
-   rc = get_prng_bytes(rdata, DEFAULT_BLK_SZ, prng, false);
+   rc = _get_more_prng_bytes(prng, false);
prng->rand_data_valid = DEFAULT_BLK_SZ;
 
 out:
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 23/25] crypto: ansi_cprng - Introduce a union cipherblock

2014-12-07 Thread George Spelvin
This ensures alignment and makes xor_block more efficient,
but it's mostly in preparation for later changes.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 73 ++---
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 9c8475a2..4d256d74 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -25,6 +25,15 @@
 
 #define DEFAULT_PRNG_KSZ 16
 #define DEFAULT_BLK_SZ 16
+#define BLK_INTS ((DEFAULT_BLK_SZ + 3) / 4)
+#define BLK_LONGS ((DEFAULT_BLK_SZ + 7) / 8)
+
/* An aligned buffer that can be accessed various ways */
+union cipherblock {
+   u8 b[DEFAULT_BLK_SZ];
+   u32 i[BLK_INTS];
+   u64 l[BLK_LONGS];
+};
 
 /*
  * Flags for the prng_context flags field
@@ -47,15 +56,15 @@ struct prng_context {
spinlock_t prng_lock;
u8 flags;
u8 rand_read_pos;
-   u8 rand_data[DEFAULT_BLK_SZ];
-   u8 DT[DEFAULT_BLK_SZ];
-   u8 V[DEFAULT_BLK_SZ];
+   union cipherblock rand_data;
+   union cipherblock DT;
+   union cipherblock V;
struct crypto_cipher *tfm;
 };
 
 static int dbg;
 
-static void hexdump(char const *note, const u8 buf[DEFAULT_BLK_SZ])
+static void hexdump(char const *note, const union cipherblock *buf)
 {
if (dbg) {
printk(KERN_CRIT "%s = %" __stringify(DEFAULT_BLK_SZ) "phN",
@@ -68,11 +77,11 @@ if (dbg)\
printk(format, ##args);\
 } while (0)
 
-static void xor_block(const u8 in[DEFAULT_BLK_SZ], u8 out[DEFAULT_BLK_SZ])
+static void xor_block(const u64 in[BLK_LONGS], u64 out[BLK_LONGS])
 {
int i;
 
-   for (i = 0; i < DEFAULT_BLK_SZ; i++)
+   for (i = 0; i < BLK_LONGS; i++)
out[i] ^= in[i];
 }
 
@@ -83,20 +92,20 @@ static void xor_block(const u8 in[DEFAULT_BLK_SZ], u8 
out[DEFAULT_BLK_SZ])
 static int _get_more_prng_bytes(struct prng_context *ctx, bool cont_test)
 {
int i;
-   u8 tmp[DEFAULT_BLK_SZ];
+   union cipherblock tmp;
 
dbgprint(KERN_CRIT "Calling _get_more_prng_bytes for context %p\n",
ctx);
 
-   hexdump("DT", ctx->DT);
-   hexdump("V", ctx->V);
+   hexdump("DT", &ctx->DT);
+   hexdump("V", &ctx->V);
 
/*
 * Start by encrypting the counter value
 * This gives us an intermediate value I (stored in tmp)
 */
-   crypto_cipher_encrypt_one(ctx->tfm, tmp, ctx->DT);
-   hexdump("I", tmp);
+   crypto_cipher_encrypt_one(ctx->tfm, tmp.b, ctx->DT.b);
+   hexdump("I", &tmp);
 
/*
 * Next xor I with our secret vector V.  Encrypt that result
@@ -104,16 +113,16 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
 * keep that output in ctx->V for the moment; we need the
 * previous rand_data for one more thing.
 */
-   xor_block(tmp, ctx->V);
-   hexdump("V^I", ctx->V);
-   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
-   hexdump("R", ctx->V);
+   xor_block(tmp.l, ctx->V.l);
+   hexdump("V^I", &ctx->V);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->V.b, ctx->V.b);
+   hexdump("R", &ctx->V);
 
/*
 * Check that we didn't produce the same random data that we
 * did last time around.
 */
-   if (!memcmp(ctx->V, ctx->rand_data, DEFAULT_BLK_SZ)) {
+   if (!memcmp(ctx->V.b, ctx->rand_data.b, DEFAULT_BLK_SZ)) {
if (cont_test) {
panic("cprng %p Failed repetition check!\n", ctx);
}
@@ -126,27 +135,27 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
/*
 * Okay, the new data is okay, copy it to the buffer.
 */
-   memcpy(ctx->rand_data, ctx->V, DEFAULT_BLK_SZ);
+   ctx->rand_data = ctx->V;
 
/*
 * Lastly xor the random data with I and encrypt that to obtain
 * a new secret vector V.
 */
-   xor_block(tmp, ctx->V);
-   hexdump("R^I", ctx->V);
-   memzero_explicit(tmp, DEFAULT_BLK_SZ);
-   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
-   hexdump("V'", ctx->V);
+   xor_block(tmp.l, ctx->V.l);
+   hexdump("R^I", &ctx->V);
+   memzero_explicit(tmp.b, DEFAULT_BLK_SZ);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->V.b, ctx->V.b);
+   hexdump("V'", &ctx->V);
 
/*
 * Now update our DT value
 */
for (i = DEFAULT_BLK_SZ - 1; i >= 0; i--) {
-   ctx->DT[i] += 1;
-   if (ctx->DT[i] != 0)
+   ctx->DT.b[i] += 1;
+   if (ctx->DT.b[i] != 0)
break;
}
-   hexdump("DT'", ctx->DT);
+   hexdump("DT'", &ctx->DT);
 
dbgprint("Returning new block for context %p\n", ctx);
 
@@ -175,7 +184,7 @@ static int get_prng_bytes(u8 *buf, unsigned int nbytes,
/* Leading partial block */
unsigned int avail = DEFAULT_BLK_SZ - read_pos;
 
-   memcpy(ptr, ctx->rand_data + read_pos, avail

[PATCH v2 16/25] crypto: testmgr - Report failure on zero-length crypto_rng_get_bytes

2014-12-07 Thread George Spelvin
If crypto_rng_get_bytes returns an error code, returning it is a good
idea, but if it simply returns zero, the test shouldn't abort successfully.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/testmgr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 0e179c72..9faf265f 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1476,6 +1476,8 @@ static int test_cprng(struct crypto_rng *tfm,
   "the correct amount of random data for "
   "%s (requested %d, got %d)\n", algo,
   template[i].rlen, err);
+   if (err >= 0)
+   err = -EINVAL;
break;
}
}
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 13/25] crypto: Add appropriate consts to RNG API

2014-12-07 Thread George Spelvin
Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c| 11 ++-
 crypto/krng.c  |  2 +-
 crypto/rng.c   |  3 ++-
 include/crypto/rng.h   |  2 +-
 include/linux/crypto.h |  6 --
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 249b944f..c1c81266 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -299,11 +299,11 @@ static int cprng_get_random(struct crypto_rng *tfm, u8 
*rdata,
  *  V and KEY are required during reset, and DT is optional, detected
  *  as being present by testing the length of the seed
  */
-static int cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
+static int cprng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int 
slen)
 {
struct prng_context *prng = crypto_rng_ctx(tfm);
-   u8 *key = seed + DEFAULT_BLK_SZ;
-   u8 *dt = NULL;
+   const u8 *key = seed + DEFAULT_BLK_SZ;
+   const u8 *dt = NULL;
 
if (slen < DEFAULT_PRNG_KSZ + DEFAULT_BLK_SZ)
return -EINVAL;
@@ -327,9 +327,10 @@ static int fips_cprng_get_random(struct crypto_rng *tfm, 
u8 *rdata,
return get_prng_bytes(rdata, dlen, prng, true);
 }
 
-static int fips_cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int 
slen)
+static int fips_cprng_reset(struct crypto_rng *tfm, const u8 *seed,
+   unsigned int slen)
 {
-   u8 *key = seed + DEFAULT_BLK_SZ;
+   const u8 *key = seed + DEFAULT_BLK_SZ;
int rc;
 
struct prng_context *prng = crypto_rng_ctx(tfm);
diff --git a/crypto/krng.c b/crypto/krng.c
index a2d2b72f..007ea7e3 100644
--- a/crypto/krng.c
+++ b/crypto/krng.c
@@ -22,7 +22,7 @@ static int krng_get_random(struct crypto_rng *tfm, u8 *rdata, 
unsigned int dlen)
return 0;
 }
 
-static int krng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
+static int krng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int 
slen)
 {
return 0;
 }
diff --git a/crypto/rng.c b/crypto/rng.c
index e0a25c24..9e3a6efd 100644
--- a/crypto/rng.c
+++ b/crypto/rng.c
@@ -29,7 +29,8 @@ struct crypto_rng *crypto_default_rng;
 EXPORT_SYMBOL_GPL(crypto_default_rng);
 static int crypto_default_rng_refcnt;
 
-static int rngapi_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
+static int rngapi_reset(struct crypto_rng *tfm, const u8 *seed,
+   unsigned int slen)
 {
u8 *buf = NULL;
int err;
diff --git a/include/crypto/rng.h b/include/crypto/rng.h
index c93f9b91..9659300a 100644
--- a/include/crypto/rng.h
+++ b/include/crypto/rng.h
@@ -62,7 +62,7 @@ static inline int crypto_rng_get_bytes(struct crypto_rng *tfm,
 }
 
 static inline int crypto_rng_reset(struct crypto_rng *tfm,
-  u8 *seed, unsigned int slen)
+  const u8 *seed, unsigned int slen)
 {
return crypto_rng_crt(tfm)->rng_reset(tfm, seed, slen);
 }
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index d45e9496..8aa6350b 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -264,7 +264,8 @@ struct compress_alg {
 struct rng_alg {
int (*rng_make_random)(struct crypto_rng *tfm, u8 *rdata,
   unsigned int dlen);
-   int (*rng_reset)(struct crypto_rng *tfm, u8 *seed, unsigned int slen);
+   int (*rng_reset)(struct crypto_rng *tfm, const u8 *seed,
+  unsigned int slen);
 
unsigned int seedsize;
 };
@@ -399,7 +400,8 @@ struct compress_tfm {
 struct rng_tfm {
int (*rng_gen_random)(struct crypto_rng *tfm, u8 *rdata,
  unsigned int dlen);
-   int (*rng_reset)(struct crypto_rng *tfm, u8 *seed, unsigned int slen);
+   int (*rng_reset)(struct crypto_rng *tfm, const u8 *seed,
+unsigned int slen);
 };
 
 #define crt_ablkcipher crt_u.ablkcipher
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 05/25] crypto: ansi_cprng - Eliminate ctx-I and ctx-last_rand_data

2014-12-07 Thread George Spelvin
Careful use of the other available buffers avoids the need for
these, shrinking the context by 32 bytes.

Neither the debug output nor the FIPS-required anti-repetition check
are changed in the slightest.
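
For review purposes, here's the whole generation step with the new buffer
usage spelled out (pseudo-C; E_K() stands for crypto_cipher_encrypt_one
under the seeded key):

	tmp = E_K(DT);		/* the intermediate value I lives only in tmp */
	V   = E_K(V ^ tmp);	/* this is really R, computed in place of V   */
	/* repetition check: new R (in V) vs. old R (still in rand_data)    */
	rand_data = V;		/* publish R                                  */
	V   = E_K(V ^ tmp);	/* R ^ I, encrypted, becomes the new V        */

I is dead by the end of the call, and last_rand_data only ever held the
previous R, which rand_data still holds at check time, so neither needs
to live in the context.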

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 50 --
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 325aa727d..2edac42e 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -37,19 +37,14 @@
 
 /*
  * Note: DT is our counter value
- *  I is our intermediate value
  *  V is our seed vector
  * See http://csrc.nist.gov/groups/STM/cavp/documents/rng/931rngext.pdf
  * for implementation details
  */
-
-
 struct prng_context {
spinlock_t prng_lock;
unsigned char rand_data[DEFAULT_BLK_SZ];
-   unsigned char last_rand_data[DEFAULT_BLK_SZ];
unsigned char DT[DEFAULT_BLK_SZ];
-   unsigned char I[DEFAULT_BLK_SZ];
unsigned char V[DEFAULT_BLK_SZ];
u32 rand_data_valid;
struct crypto_cipher *tfm;
@@ -97,27 +92,27 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
 
/*
 * Start by encrypting the counter value
-* This gives us an intermediate value I
+* This gives us an intermediate value I (stored in tmp)
 */
-   memcpy(tmp, ctx->DT, DEFAULT_BLK_SZ);
-   crypto_cipher_encrypt_one(ctx->tfm, ctx->I, tmp);
-   hexdump("I", ctx->I);
+   crypto_cipher_encrypt_one(ctx->tfm, tmp, ctx->DT);
+   hexdump("I", tmp);
 
/*
-* Next xor I with our secret vector V
-* encrypt that result to obtain our
-* pseudo random data which we output
+* Next xor I with our secret vector V.  Encrypt that result
+* to obtain our pseudo random data which we output.  But
+* keep that output in ctx->V for the moment; we need the
+* previous rand_data for one more thing.
 */
-   xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
-   hexdump("V^I", tmp);
-   crypto_cipher_encrypt_one(ctx->tfm, ctx->rand_data, tmp);
-   hexdump("R", ctx->rand_data);
+   xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+   hexdump("V^I", ctx->V);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
+   hexdump("R", ctx->V);
 
/*
-* First check that we didn't produce the same
-* random data that we did last time around through this
+* Check that we didn't produce the same random data that we
+* did last time around.
 */
-   if (!memcmp(ctx->rand_data, ctx->last_rand_data, DEFAULT_BLK_SZ)) {
+   if (!memcmp(ctx->V, ctx->rand_data, DEFAULT_BLK_SZ)) {
if (cont_test) {
panic("cprng %p Failed repetition check!\n", ctx);
}
@@ -127,15 +122,19 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
int cont_test)
ctx->flags |= PRNG_NEED_RESET;
return -EINVAL;
}
-   memcpy(ctx->last_rand_data, ctx->rand_data, DEFAULT_BLK_SZ);
+   /*
+* Okay, the new data is okay, copy it to the buffer.
+*/
+   memcpy(ctx->rand_data, ctx->V, DEFAULT_BLK_SZ);
 
/*
-* Lastly xor the random data with I
-* and encrypt that to obtain a new secret vector V
+* Lastly xor the random data with I and encrypt that to obtain
+* a new secret vector V.
 */
-   xor_vectors(ctx->rand_data, ctx->I, tmp, DEFAULT_BLK_SZ);
-   hexdump("R^I", tmp);
-   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, tmp);
+   xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+   hexdump("R^I", ctx->V);
+   memzero_explicit(tmp, DEFAULT_BLK_SZ);
+   crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
hexdump("V'", ctx->V);
 
/*
@@ -272,7 +271,6 @@ static int reset_prng_context(struct prng_context *ctx,
memset(ctx->DT, 0, DEFAULT_BLK_SZ);
 
memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
-   memset(ctx->last_rand_data, 0, DEFAULT_BLK_SZ);
 
ctx->rand_data_valid = DEFAULT_BLK_SZ;
 
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 24/25] crypto: ansi_cprng - Introduce non-deterministic mode

2014-12-07 Thread George Spelvin
If DT is not provided, use timestamp data for the DT vector.

This is permitted by the NIST spec (although the deterministic
mode is still required for testing purposes), and encouraged by
the X9.31 and X9.17 standards the RNG is adopted from.

The question, however, is whether it's okay to have a CPRNG
that's not deterministic.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 52 
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 4d256d74..d39ce301 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -40,6 +40,7 @@ union cipherblock {
  */
 
 #define PRNG_NEED_RESET 0x1
+#define PRNG_DETERMINISTIC 0x02
 
 /*
  * Note: DT is our counter value
@@ -47,10 +48,12 @@ union cipherblock {
  * See http://csrc.nist.gov/groups/STM/cavp/documents/rng/931rngext.pdf
  * for implementation details.
  *
- * Note that even though DT stands for date/time, since this is a
- * deterministic pseudo-random generator, it is a determinsitic counter,
- * not a timestamp.  Its function is not to inject seed entropy, but to
- * ensure a long period in the output.
+ * Note that even though DT stands for date/time, this generator may be
+ * operated in a fully deterministic mode by specifying it in the initial
+ * seed.  In this case, it does not inject any timestamp-based entropy,
+ * but operates as a simple counter to ensure a long period in the output.
+ *
+ * Both options are permitted by the NIST recommendation.
  */
 struct prng_context {
spinlock_t prng_lock;
@@ -97,6 +100,16 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
dbgprint(KERN_CRIT "Calling _get_more_prng_bytes for context %p\n",
ctx);
 
+   /*
+* get_random_int produces a result based on the system jiffies
+* and random_get_entropy(), the highest-resolution timestamp
+* available.  This meets the spirit of the X9.17/X9.31 generation
+* specifications, but it's masked by hashing, so it can't be used
+* to leak information about /dev/random's seed material.
+*/
+   if (!(ctx->flags & PRNG_DETERMINISTIC))
+   ctx->DT.i[0] = get_random_int();
+
hexdump("DT", &ctx->DT);
hexdump("V", &ctx->V);
 
@@ -150,12 +163,16 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
/*
 * Now update our DT value
 */
-   for (i = DEFAULT_BLK_SZ - 1; i >= 0; i--) {
-   ctx->DT.b[i] += 1;
-   if (ctx->DT.b[i] != 0)
-   break;
+   if (ctx->flags & PRNG_DETERMINISTIC) {
+   for (i = DEFAULT_BLK_SZ - 1; i >= 0; i--) {
+   ctx->DT.b[i] += 1;
+   if (ctx->DT.b[i] != 0)
+   break;
+   }
+   hexdump("DT'", &ctx->DT);
+   } else {
+   ctx->DT.i[0] = 0;   /* Prevent backtracking */
}
-   hexdump("DT'", &ctx->DT);
 
dbgprint("Returning new block for context %p\n", ctx);
 
@@ -226,13 +243,24 @@ static int reset_prng_context(struct prng_context *ctx, 
const u8 *key,
int ret;
 
spin_lock_bh(&ctx->prng_lock);
-   ctx->flags |= PRNG_NEED_RESET;
+   ctx->flags = PRNG_NEED_RESET;
ctx->rand_read_pos = DEFAULT_BLK_SZ;
 
memset(ctx->rand_data.b, 0, DEFAULT_BLK_SZ);
 
-   if (!DT)
-   DT = ctx->rand_data.b;  /* Use all-zeros if NULL */
+   if (DT) {
+   ctx->flags |= PRNG_DETERMINISTIC;
+   memcpy(ctx->DT.b, DT, DEFAULT_BLK_SZ);
+   } else {
+   int i;
+   /*
+* We will generate a fresh DT based on timestamp each time.
+* Also pad rest of buffer with seed, on general principles.
+* We reserve the first int for fresh entropy.
+*/
+   for (i = 1; i < BLK_INTS; i++)
+   ctx->DT.i[i] = get_random_int();
+   }
 
memcpy(ctx->DT.b, DT, DEFAULT_BLK_SZ);
memcpy(ctx->V.b, V, DEFAULT_BLK_SZ);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 19/25] crypto: ansi_cprng - simplify get_prng_bytes

2014-12-07 Thread George Spelvin
The crypto is so slow that there's no point unrolling this function.

A simpler and clearer implementation will do just fine.

Two other changes:
1. Move the debug print outside the spinlock;
2. Move all modification of rand_data_valid out of _get_more_prng_bytes;
   that variable belongs to the byte-at-a-time layer outside the
   block-oriented primitive.

This is the code that gave us CVE-2013-4345; it's full of corner cases
which the standard test vectors don't exercise.  The stutter test was
created to exercise it.  (And yes, it did catch problems during
development.)
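
To illustrate the new control flow with made-up numbers: with
rand_data_valid = 10 and a 40-byte request, the leading copy takes the 6
buffered bytes; the loop then generates a block and copies it whole,
twice (34 -> 18 -> 2 bytes left); a third generation exits the loop
because 2 < 16, and the trailing copy takes 2 bytes from that fresh
block, leaving rand_data_valid = 2.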

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 74 ++---
 1 file changed, 25 insertions(+), 49 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index c1c81266..a8cf98a5 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -145,7 +145,6 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
hexdump(DT', ctx-DT);
 
dbgprint(Returning new block for context %p\n, ctx);
-   ctx->rand_data_valid = 0;
 
return 0;
 }
@@ -156,68 +155,45 @@ static int get_prng_bytes(u8 *buf, unsigned int nbytes,
 {
u8 *ptr = buf;
unsigned int byte_count = nbytes;
-   int err;
+   unsigned int read_pos;
+   int err = -EINVAL;
 
+   dbgprint(KERN_CRIT "getting %u random bytes for context %p\n",
+   nbytes, ctx);
 
spin_lock_bh(&ctx->prng_lock);
 
-   err = -EINVAL;
if (ctx->flags & PRNG_NEED_RESET)
goto done;
 
-   err = byte_count;
-
-   dbgprint(KERN_CRIT "getting %d random bytes for context %p\n",
-   byte_count, ctx);
-
-
-remainder:
-   if (ctx->rand_data_valid == DEFAULT_BLK_SZ) {
-   if (_get_more_prng_bytes(ctx, do_cont_test) < 0) {
-   memset(buf, 0, nbytes);
-   err = -EINVAL;
-   goto done;
-   }
-   }
-
-   /*
-* Copy any data less than an entire block
-*/
-   if (byte_count < DEFAULT_BLK_SZ) {
-empty_rbuf:
-   while (ctx->rand_data_valid < DEFAULT_BLK_SZ) {
-   *ptr = ctx->rand_data[ctx->rand_data_valid];
-   ptr++;
-   byte_count--;
-   ctx->rand_data_valid++;
-   if (byte_count == 0)
-   goto done;
-   }
-   }
-
-   /*
-* Now copy whole blocks
-*/
-   for (; byte_count >= DEFAULT_BLK_SZ; byte_count -= DEFAULT_BLK_SZ) {
-   if (ctx->rand_data_valid == DEFAULT_BLK_SZ) {
+   read_pos = ctx->rand_data_valid;
+   if (byte_count > DEFAULT_BLK_SZ - read_pos) {
+   /* Leading partial block */
+   unsigned int avail = DEFAULT_BLK_SZ - read_pos;
+
+   memcpy(ptr, ctx->rand_data + read_pos, avail);
+   ptr += avail;
+   byte_count -= avail;
+   read_pos = 0;
+
+   /* Intermediate full blocks */
+   for (;;) {
if (_get_more_prng_bytes(ctx, do_cont_test) < 0) {
memset(buf, 0, nbytes);
-   err = -EINVAL;
goto done;
}
+   if (byte_count < DEFAULT_BLK_SZ)
+   break;
+   memcpy(ptr, ctx->rand_data, DEFAULT_BLK_SZ);
+   ptr += DEFAULT_BLK_SZ;
+   byte_count -= DEFAULT_BLK_SZ;
}
-   if (ctx->rand_data_valid > 0)
-   goto empty_rbuf;
-   memcpy(ptr, ctx->rand_data, DEFAULT_BLK_SZ);
-   ctx->rand_data_valid += DEFAULT_BLK_SZ;
-   ptr += DEFAULT_BLK_SZ;
}
 
-   /*
-* Now go back and get any remaining partial block
-*/
-   if (byte_count)
-   goto remainder;
+   /* The final partial block; read_pos + byte_count <= DEFAULT_BLK_SZ */
+   memcpy(ptr, ctx->rand_data + read_pos, byte_count);
+   ctx->rand_data_valid = read_pos + byte_count;
+   err = nbytes;
 
 done:
spin_unlock_bh(&ctx->prng_lock);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 18/25] crypto: testmgr - Add CPRNG stutter test.

2014-12-07 Thread George Spelvin
This is a test for the kind of bug that caused CVE-2013-4345,
which was in the code that adapted from fixed-size cipher blocks
to arbitrary-sized reads.

This does every known answer test twice: the first time using
block-aligned reads (assuming the known answer is a full block),
and a second time using reads of size 0, 1, 2, ... 61, 0, 1, ...

A simple 32-bit hash of all of the output bytes (not just the known
ones) is compared between the two.  Any error in the bookkeeping
will result in a mismatch.

A non-cryptographic hash suffices to detect coding errors.
We just need something better than an additive checksum, because
transposition is possible.

The maximum read size (61) is chosen so the pattern repeats
after 1891 = 31 * 61 bytes.  If this is relatively prime to the
CPRNG's internal buffer size, a long enough test will eventually
explore every possible read size at every possible alignment in
the CPRNG's internal buffer.
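
For anyone who wants to play with the idea outside the kernel, here is a
minimal user-space model of the read pattern and hash; rng_read() is a
stand-in for crypto_rng_get_bytes, and the buffer size matches the 61
above:

#include <stdint.h>
#include <stddef.h>

extern int rng_read(uint8_t *buf, size_t len);	/* stand-in for the RNG */

uint32_t stutter_hash(size_t total)
{
	uint8_t buf[61];
	uint32_t hash = 0;
	size_t n = 0;	/* read size cycles 0, 1, ..., 61, 0, 1, ... */

	while (total > 0) {
		size_t chunk = (n > total) ? total : n;
		size_t i;

		rng_read(buf, chunk);
		for (i = 0; i < chunk; i++) {	/* same mixing as the patch */
			hash += buf[i];
			hash += hash << 10;
			hash ^= hash >> 6;
		}
		total -= chunk;
		n = (n == sizeof(buf)) ? 0 : n + 1;
	}
	return hash;
}

If this hash disagrees with the one computed over block-aligned reads of
the same total length, some byte was dropped, duplicated, or transposed
by the deblocking layer.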

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/testmgr.c | 98 ++--
 1 file changed, 89 insertions(+), 9 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 6bf43682..a15860ad 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1455,11 +1455,15 @@ static int test_cprng(struct crypto_rng *tfm,
  const struct cprng_testvec *template, unsigned int tcount)
 {
const char *algo = crypto_tfm_alg_driver_name(crypto_rng_tfm(tfm));
-   int err = 0, i, j;
-   u8 result[32];
+   int err = 0, i;
 
for (i = 0; i < tcount; i++) {
-   if (template[i].rlen > sizeof(result)) {
+   int j, k, bytes;
+   u32 hash1 = 0, hash2 = 0;
+   u8 result[61];  /* See below for size advice */
+   int rlen = template[i].rlen;
+
+   if (rlen > sizeof(result)) {
printk(KERN_CRIT "alg: cprng: Cannot test %s\n", algo);
err = -EOVERFLOW;
break;
@@ -1468,33 +1472,109 @@ static int test_cprng(struct crypto_rng *tfm,
 
err = crypto_rng_reset(tfm, template[i].seed, template[i].slen);
if (err) {
+fail_reset:
printk(KERN_ERR "alg: cprng: Failed to reset rng "
   "for %s\n", algo);
break;
}
 
for (j = 0; j < template[i].loops; j++) {
-   err = crypto_rng_get_bytes(tfm, result,
-  template[i].rlen);
-   if (err != template[i].rlen) {
+   err = crypto_rng_get_bytes(tfm, result, rlen);
+   if (err != rlen) {
+fail_get:
printk(KERN_ERR "alg: cprng: Failed to obtain "
   "the correct amount of random data for "
   "%s (requested %d, got %d)\n", algo,
-  template[i].rlen, err);
+  rlen, err);
if (err >= 0)
err = -EINVAL;
break;
}
+   /* Compute simple hash for use by stutter test */
+   for (k = 0; k < rlen; k++) {
+   hash1 += result[k];
+   hash1 += hash1 << 10;
+   hash1 ^= hash1 >> 6;
+   }
}
 
-   err = memcmp(result, template[i].result, template[i].rlen);
+   err = memcmp(result, template[i].result, rlen);
if (err) {
printk(KERN_ERR "alg: cprng: Test %d failed for %s\n",
   i, algo);
-   hexdump(result, template[i].rlen);
+   hexdump(result, rlen);
err = -EINVAL;
break;
}
+
+   /*
+* Stutter test: repeat the computation, using odd-sized
+* reads.  This is to verify the output deblocking code,
+* which was the source of CVE-2013-4345.  So we read
+* from the RNG in 0, 1, 2, 3, 4, 5, ... byte chunks.
+*
+* When the read size reaches the size of our buffer,
+* the read size starts back at 0.
+*
+* For the current buffer size of 61, that happens after
+* 1891 = 31 * 61 bytes.  If this is relatively prime to
+* the CPRNG's internal state size, this will, if the test
+* is long enough, do every possible read size starting
+* at every possible offset in the CPRNG's internal state.
+*
+* The basic requirement to produce

[PATCH v2 21/25] crypto: ansi_cprng - Rename rand_data_valid more sensibly

2014-12-07 Thread George Spelvin
Since it counts the number of bytes in rand_data which have been output,
and are no longer available for output, it's hardly a count of
"valid" bytes.  rand_read_pos is more appropriate.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index f345b575..f3e280c4 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -41,7 +41,7 @@
 struct prng_context {
spinlock_t prng_lock;
u8 flags;
-   u8 rand_data_valid;
+   u8 rand_read_pos;
u8 rand_data[DEFAULT_BLK_SZ];
u8 DT[DEFAULT_BLK_SZ];
u8 V[DEFAULT_BLK_SZ];
@@ -165,7 +165,7 @@ static int get_prng_bytes(u8 *buf, unsigned int nbytes,
if (ctx-flags  PRNG_NEED_RESET)
goto done;
 
-   read_pos = ctx->rand_data_valid;
+   read_pos = ctx->rand_read_pos;
if (byte_count > DEFAULT_BLK_SZ - read_pos) {
/* Leading partial block */
unsigned int avail = DEFAULT_BLK_SZ - read_pos;
@@ -191,7 +191,7 @@ static int get_prng_bytes(u8 *buf, unsigned int nbytes,
 
/* The final partial block; read_pos + byte_count <= DEFAULT_BLK_SZ */
memcpy(ptr, ctx->rand_data + read_pos, byte_count);
-   ctx->rand_data_valid = read_pos + byte_count;
+   ctx->rand_read_pos = read_pos + byte_count;
err = nbytes;
 
 done:
@@ -213,7 +213,7 @@ static int reset_prng_context(struct prng_context *ctx, 
const u8 *key,
 
spin_lock_bh(&ctx->prng_lock);
ctx->flags |= PRNG_NEED_RESET;
-   ctx->rand_data_valid = DEFAULT_BLK_SZ;
+   ctx->rand_read_pos = DEFAULT_BLK_SZ;
 
memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
 
@@ -324,7 +324,7 @@ static int fips_cprng_reset(struct crypto_rng *tfm, const 
u8 *seed,
 
/* this primes our continuity test */
rc = _get_more_prng_bytes(prng, false);
-   prng->rand_data_valid = DEFAULT_BLK_SZ;
+   prng->rand_read_pos = DEFAULT_BLK_SZ;
 
 out:
return rc;
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 20/25] crypto: ansi_cprng - simplify xor_vectors() to xor_block()

2014-12-07 Thread George Spelvin
It doesn't need a second input or a length parameter.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index a8cf98a5..f345b575 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -63,15 +63,14 @@ if (dbg)\
printk(format, ##args);\
 } while (0)
 
-static void xor_vectors(const u8 *in1, const u8 *in2,
-   u8 *out, unsigned int size)
+static void xor_block(const u8 in[DEFAULT_BLK_SZ], u8 out[DEFAULT_BLK_SZ])
 {
int i;
 
-   for (i = 0; i < size; i++)
-   out[i] = in1[i] ^ in2[i];
-
+   for (i = 0; i < DEFAULT_BLK_SZ; i++)
+   out[i] ^= in[i];
 }
+
 /*
  * Returns DEFAULT_BLK_SZ bytes of random data per call
 * returns 0 if generation succeeded, <0 if something went wrong
@@ -100,7 +99,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
 * keep that output in ctx->V for the moment; we need the
 * previous rand_data for one more thing.
 */
-   xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+   xor_block(tmp, ctx->V);
hexdump("V^I", ctx->V);
crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
hexdump("R", ctx->V);
@@ -128,7 +127,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, 
bool cont_test)
 * Lastly xor the random data with I and encrypt that to obtain
 * a new secret vector V.
 */
-   xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+   xor_block(tmp, ctx->V);
hexdump("R^I", ctx->V);
memzero_explicit(tmp, DEFAULT_BLK_SZ);
crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V);
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 25/25] crypto: ansi_cprng - If non-deterministic, don't buffer old output

2014-12-07 Thread George Spelvin
The standards documents are silent on how to handle multi-part
output, so this is an RFC for something in the spirit of the
source specifications, but not actually required by them.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index d39ce301..b88d3446 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -198,12 +198,14 @@ static int get_prng_bytes(u8 *buf, unsigned int nbytes,
 
read_pos = ctx->rand_read_pos;
if (byte_count > DEFAULT_BLK_SZ - read_pos) {
-   /* Leading partial block */
-   unsigned int avail = DEFAULT_BLK_SZ - read_pos;
+   if (ctx->flags & PRNG_DETERMINISTIC) {
+   /* Return leftover data from previous call */
+   unsigned int avail = DEFAULT_BLK_SZ - read_pos;
 
-   memcpy(ptr, ctx->rand_data.b + read_pos, avail);
-   ptr += avail;
-   byte_count -= avail;
+   memcpy(ptr, ctx->rand_data.b + read_pos, avail);
+   ptr += avail;
+   byte_count -= avail;
+   }
read_pos = 0;
 
/* Intermediate full blocks */
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 14/25] crypto: tcrypt - Add const qualifiers all over the test code.

2014-12-07 Thread George Spelvin
Huge diff, but simple.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/tcrypt.c  |  46 +++
 crypto/tcrypt.h  |  30 ++--
 crypto/testmgr.c |  58 
 crypto/testmgr.h | 410 +++
 4 files changed, 276 insertions(+), 268 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 890449e6..38a57ad0 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -63,7 +63,7 @@ static u32 mask;
 static int mode;
 static char *tvmem[TVMEMSIZE];
 
-static char *check[] = {
+static const char * const check[] = {
"des", "md5", "des3_ede", "rot13", "sha1", "sha224", "sha256",
"blowfish", "twofish", "serpent", "sha384", "sha512", "md4", "aes",
"cast6", "arc4", "michael_mic", "deflate", "crc32c", "tea", "xtea",
@@ -211,8 +211,8 @@ out:
return ret;
 }
 
-static u32 block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
-static u32 aead_sizes[] = { 16, 64, 256, 512, 1024, 2048, 4096, 8192, 0 };
+static const u32 block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
+static const u32 aead_sizes[] = { 16, 64, 256, 512, 1024, 2048, 4096, 8192, 0 
};
 
 #define XBUFSIZE 8
 #define MAX_IVLEN 32
@@ -266,9 +266,9 @@ static void sg_init_aead(struct scatterlist *sg, char 
*xbuf[XBUFSIZE],
 }
 
 static void test_aead_speed(const char *algo, int enc, unsigned int secs,
-   struct aead_speed_template *template,
+   const struct aead_speed_template *template,
unsigned int tcount, u8 authsize,
-   unsigned int aad_size, u8 *keysize)
+   unsigned int aad_size, const u8 *keysize)
 {
unsigned int i, j;
struct crypto_aead *tfm;
@@ -284,7 +284,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
char *xbuf[XBUFSIZE];
char *xoutbuf[XBUFSIZE];
char *axbuf[XBUFSIZE];
-   unsigned int *b_size;
+   const unsigned int *b_size;
unsigned int iv_len;
 
if (aad_size >= PAGE_SIZE) {
@@ -412,8 +412,8 @@ out_noxbuf:
 }
 
 static void test_cipher_speed(const char *algo, int enc, unsigned int secs,
- struct cipher_speed_template *template,
- unsigned int tcount, u8 *keysize)
+ const struct cipher_speed_template *template,
+ unsigned int tcount, const u8 *keysize)
 {
unsigned int ret, i, j, iv_len;
const char *key;
@@ -421,7 +421,7 @@ static void test_cipher_speed(const char *algo, int enc, 
unsigned int secs,
struct crypto_blkcipher *tfm;
struct blkcipher_desc desc;
const char *e;
-   u32 *b_size;
+   const u32 *b_size;
 
if (enc == ENCRYPT)
e = "encryption";
@@ -513,7 +513,7 @@ out:
 
 static int test_hash_jiffies_digest(struct hash_desc *desc,
struct scatterlist *sg, int blen,
-   char *out, int secs)
+   u8 *out, int secs)
 {
unsigned long start, end;
int bcount;
@@ -533,7 +533,7 @@ static int test_hash_jiffies_digest(struct hash_desc *desc,
 }
 
 static int test_hash_jiffies(struct hash_desc *desc, struct scatterlist *sg,
-int blen, int plen, char *out, int secs)
+int blen, int plen, u8 *out, int secs)
 {
unsigned long start, end;
int bcount, pcount;
@@ -565,7 +565,7 @@ static int test_hash_jiffies(struct hash_desc *desc, struct 
scatterlist *sg,
 }
 
 static int test_hash_cycles_digest(struct hash_desc *desc,
-  struct scatterlist *sg, int blen, char *out)
+  struct scatterlist *sg, int blen, u8 *out)
 {
unsigned long cycles = 0;
int i;
@@ -608,7 +608,7 @@ out:
 }
 
 static int test_hash_cycles(struct hash_desc *desc, struct scatterlist *sg,
-   int blen, int plen, char *out)
+   int blen, int plen, u8 *out)
 {
unsigned long cycles = 0;
int i, pcount;
@@ -681,7 +681,7 @@ static void test_hash_sg_init(struct scatterlist *sg)
 }
 
 static void test_hash_speed(const char *algo, unsigned int secs,
-   struct hash_speed *speed)
+   const struct hash_speed *speed)
 {
struct scatterlist sg[TVMEMSIZE];
struct crypto_hash *tfm;
@@ -773,7 +773,7 @@ static inline int do_one_ahash_op(struct ahash_request 
*req, int ret)
 }
 
 static int test_ahash_jiffies_digest(struct ahash_request *req, int blen,
-char *out, int secs)
+u8 *out, int secs)
 {
unsigned long start, end;
int bcount;
@@ -793,7 +793,7 @@ static int test_ahash_jiffies_digest(struct ahash_request 
*req, int blen,
 }
 
 static int test_ahash_jiffies(struct ahash_request *req, int blen

[PATCH v2 15/25] crypto: testmgr - Merge seed arrays in struct cprng_testvec

2014-12-07 Thread George Spelvin
The current code stores three pointers to three arrays, and three lengths,
and then has to kmalloc an array just to concatenate them.

This seems ridiculous.  Just store one combined array and combined
length, and don't do any reformatting at run-time.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/testmgr.c | 37 ++---
 crypto/testmgr.h | 98 +---
 2 files changed, 53 insertions(+), 82 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 9f4746eb..0e179c72 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1455,33 +1455,17 @@ static int test_cprng(struct crypto_rng *tfm,
  const struct cprng_testvec *template, unsigned int tcount)
 {
const char *algo = crypto_tfm_alg_driver_name(crypto_rng_tfm(tfm));
-   int err = 0, i, j, seedsize;
-   u8 *seed;
-   char result[32];
-
-   seedsize = crypto_rng_seedsize(tfm);
-
-   seed = kmalloc(seedsize, GFP_KERNEL);
-   if (!seed) {
-		printk(KERN_ERR "alg: cprng: Failed to allocate seed space "
-		       "for %s\n", algo);
-   return -ENOMEM;
-   }
+   int err = 0, i, j;
+   u8 result[32];
 
	for (i = 0; i < tcount; i++) {
-   memset(result, 0, 32);
+   memset(result, 0, sizeof result);
 
-   memcpy(seed, template[i].v, template[i].vlen);
-   memcpy(seed + template[i].vlen, template[i].key,
-  template[i].klen);
-   memcpy(seed + template[i].vlen + template[i].klen,
-  template[i].dt, template[i].dtlen);
-
-   err = crypto_rng_reset(tfm, seed, seedsize);
+   err = crypto_rng_reset(tfm, template[i].seed, template[i].slen);
if (err) {
			printk(KERN_ERR "alg: cprng: Failed to reset rng "
			       "for %s\n", algo);
-   goto out;
+   break;
}
 
		for (j = 0; j < template[i].loops; j++) {
@@ -1492,23 +1476,20 @@ static int test_cprng(struct crypto_rng *tfm,
				       "the correct amount of random data for "
				       "%s (requested %d, got %d)\n", algo,
				       template[i].rlen, err);
-   goto out;
+   break;
}
}
 
-   err = memcmp(result, template[i].result,
-template[i].rlen);
+   err = memcmp(result, template[i].result, template[i].rlen);
if (err) {
			printk(KERN_ERR "alg: cprng: Test %d failed for %s\n",
			       i, algo);
hexdump(result, template[i].rlen);
err = -EINVAL;
-   goto out;
+   break;
}
}
 
-out:
-   kfree(seed);
return err;
 }
 
@@ -1730,6 +1711,8 @@ static int alg_test_cprng(const struct alg_test_desc *desc, const char *driver,
 
crypto_free_rng(rng);
 
+printk("alg_test_cprng: testing %s: err %d\n", driver, err);
+
return err;
 }
 
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 306a33b2..af346520 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -81,14 +81,10 @@ struct aead_testvec {
 };
 
 struct cprng_testvec {
-   const char *key;
-   const char *dt;
-   const char *v;
+   const char *seed;
const char *result;
-   unsigned char klen;
-   unsigned short dtlen;
-   unsigned short vlen;
-   unsigned short rlen;
+   unsigned char slen;
+   unsigned char rlen;
unsigned short loops;
 };
 
@@ -20708,90 +20704,82 @@ static const struct aead_testvec aes_ccm_rfc4309_dec_tv_template[] = {
  * test vectors, taken from Appendix B.2.9 and B.2.10:
  * http://csrc.nist.gov/groups/STM/cavp/documents/rng/RNGVS.pdf
  * Only AES-128 is supported at this time.
+ *
+ * CPRNGs take a single seed argument.  For this algorithm, it consists
+ * of the concatenation of the initial vector V, the key, and (optional)
+ * the initial DT.
  */
 #define ANSI_CPRNG_AES_TEST_VECTORS6
 
 static const struct cprng_testvec ansi_cprng_aes_tv_template[] = {
{
-	.key	= "\xf3\xb1\x66\x6d\x13\x60\x72\x42"
-		  "\xed\x06\x1c\xab\xb8\xd4\x62\x02",
-	.klen	= 16,
-	.dt	= "\xe6\xb3\xbe\x78\x2a\x23\xfa\x62"
+	.seed	= "\x80\x00\x00\x00\x00\x00\x00\x00"	/* V[16] */
+		  "\x00\x00\x00\x00\x00\x00\x00\x00"
+		  "\xf3\xb1\x66\x6d\x13\x60\x72\x42"	/* Key[16] */
+		  "\xed\x06\x1c\xab\xb8\xd4\x62\x02"
+		  "\xe6\xb3\xbe\x78\x2a\x23\xfa\x62"	/* DT[16] */
		  "\xd7\x1d

[PATCH v2 06/25] crypto: ansi_cprng - Make cont_test a bool

2014-12-07 Thread George Spelvin
This makes no difference to the generated code, but I like to use bool
where appropriate for documentation.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 2edac42e..730e0857 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -79,7 +79,7 @@ static void xor_vectors(unsigned char *in1, unsigned char *in2,
  * Returns DEFAULT_BLK_SZ bytes of random data per call
 * returns 0 if generation succeeded, <0 if something went wrong
  */
-static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
+static int _get_more_prng_bytes(struct prng_context *ctx, bool cont_test)
 {
int i;
unsigned char tmp[DEFAULT_BLK_SZ];
@@ -155,7 +155,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 
 /* Our exported functions */
 static int get_prng_bytes(char *buf, size_t nbytes, struct prng_context *ctx,
-   int do_cont_test)
+   bool do_cont_test)
 {
unsigned char *ptr = buf;
unsigned int byte_count = (unsigned int)nbytes;
@@ -322,7 +322,7 @@ static int cprng_get_random(struct crypto_rng *tfm, u8 *rdata,
 {
struct prng_context *prng = crypto_rng_ctx(tfm);
 
-   return get_prng_bytes(rdata, dlen, prng, 0);
+   return get_prng_bytes(rdata, dlen, prng, false);
 }
 
 /*
@@ -356,7 +356,7 @@ static int fips_cprng_get_random(struct crypto_rng *tfm, u8 *rdata,
 {
struct prng_context *prng = crypto_rng_ctx(tfm);
 
-   return get_prng_bytes(rdata, dlen, prng, 1);
+   return get_prng_bytes(rdata, dlen, prng, true);
 }
 
 static int fips_cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
@@ -380,7 +380,7 @@ static int fips_cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
goto out;
 
/* this primes our continuity test */
-   rc = get_prng_bytes(rdata, DEFAULT_BLK_SZ, prng, 0);
+   rc = get_prng_bytes(rdata, DEFAULT_BLK_SZ, prng, false);
	prng->rand_data_valid = DEFAULT_BLK_SZ;
 
 out:
-- 
2.1.3



[PATCH v2 08/25] crypto: ansi_cprng - Don't call reset_prng_context from cprng_init

2014-12-07 Thread George Spelvin
The PRNG_NEED_RESET flag ensures that it will be called, so
reset_prng_context() no longer needs to support NULL key and V pointers.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 47 ++-
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 022662d7..62b8f958 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -23,10 +23,8 @@
 
 #include internal.h
 
-#define DEFAULT_PRNG_KEY 0123456789abcdef
 #define DEFAULT_PRNG_KSZ 16
 #define DEFAULT_BLK_SZ 16
-#define DEFAULT_V_SEED zaybxcwdveuftgsh
 
 /*
  * Flags for the prng_context flags field
@@ -250,41 +248,28 @@ static int reset_prng_context(struct prng_context *ctx,
  unsigned char *V, unsigned char *DT)
 {
int ret;
-   unsigned char *prng_key;
 
	spin_lock_bh(&ctx->prng_lock);
	ctx->flags |= PRNG_NEED_RESET;
-
-	prng_key = (key != NULL) ? key : (unsigned char *)DEFAULT_PRNG_KEY;
-
-	if (!key)
-		klen = DEFAULT_PRNG_KSZ;
-
-	if (V)
-		memcpy(ctx->V, V, DEFAULT_BLK_SZ);
-	else
-		memcpy(ctx->V, DEFAULT_V_SEED, DEFAULT_BLK_SZ);
-
-	if (DT)
-		memcpy(ctx->DT, DT, DEFAULT_BLK_SZ);
-	else
-		memset(ctx->DT, 0, DEFAULT_BLK_SZ);
-
-	memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
-
	ctx->rand_data_valid = DEFAULT_BLK_SZ;
 
-	ret = crypto_cipher_setkey(ctx->tfm, prng_key, klen);
+	memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
+
+	if (!DT)
+		DT = ctx->rand_data;	/* Use all-zeros if NULL */
+
+	memcpy(ctx->DT, DT, DEFAULT_BLK_SZ);
+	memcpy(ctx->V, V, DEFAULT_BLK_SZ);
+
+	ret = crypto_cipher_setkey(ctx->tfm, key, klen);
	if (ret) {
		dbgprint(KERN_CRIT "PRNG: setkey() failed flags=%x\n",
			crypto_cipher_get_flags(ctx->tfm));
-		goto out;
+	} else {
+		ctx->flags &= ~PRNG_NEED_RESET;
	}
-
-	ret = 0;
-	ctx->flags &= ~PRNG_NEED_RESET;
-out:
	spin_unlock_bh(&ctx->prng_lock);
+
return ret;
 }
 
@@ -300,13 +285,9 @@ static int cprng_init(struct crypto_tfm *tfm)
		return PTR_ERR(ctx->tfm);
}
 
-	if (reset_prng_context(ctx, NULL, DEFAULT_PRNG_KSZ, NULL, NULL) < 0)
-   return -EINVAL;
-
/*
-* after allocation, we should always force the user to reset
-* so they don't inadvertently use the insecure default values
-* without specifying them intentially
+* After allocation, we always force the user to reset, which
+* completes initialization of the context.
 */
	ctx->flags |= PRNG_NEED_RESET;
return 0;
-- 
2.1.3



[PATCH v2 07/25] crypto: ansi_cprng - Shrink context some more

2014-12-07 Thread George Spelvin
ctx->flags only has 2 bits assigned, so there's no need for a
32-bit field.  Likewise, rand_data_valid is at most 16.

A typical x86 spinlock_t is 16 bits, so they fit very nicely
right next to it, shrinking the 64-bit context structure to
64 bytes.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 730e0857..022662d7 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -43,12 +43,12 @@
  */
 struct prng_context {
spinlock_t prng_lock;
+   u8 flags;
+   u8 rand_data_valid;
unsigned char rand_data[DEFAULT_BLK_SZ];
unsigned char DT[DEFAULT_BLK_SZ];
unsigned char V[DEFAULT_BLK_SZ];
-   u32 rand_data_valid;
struct crypto_cipher *tfm;
-   u32 flags;
 };
 
 static int dbg;
-- 
2.1.3



[PATCH v2 03/25] crypto: ansi_cprng - Use %phN rather than print_hex_dump for debug

2014-12-07 Thread George Spelvin
Since the goal is to compare to NIST test vectors, which are printed
on a single line without spaces, this comes much closer.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 143e0cfa..b54e4e75 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -60,10 +60,7 @@ static int dbg;
 static void hexdump(char *note, unsigned char *buf, unsigned int len)
 {
if (dbg) {
-		printk(KERN_CRIT "%s", note);
-		print_hex_dump(KERN_CONT, "", DUMP_PREFIX_OFFSET,
-				16, 1,
-				buf, len, false);
+		printk(KERN_CRIT "%s%*phN", note, (int)len, buf);
}
 }
 
-- 
2.1.3



[PATCH v2 17/25] crypto: testmgr - Don't crash if CPRNG test result is large

2014-12-07 Thread George Spelvin
The idea is to catch as many programmer mistakes as possible.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/testmgr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 9faf265f..6bf43682 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1459,6 +1459,11 @@ static int test_cprng(struct crypto_rng *tfm,
u8 result[32];
 
for (i = 0; i  tcount; i++) {
+	if (template[i].rlen > sizeof(result)) {
+		printk(KERN_CRIT "alg: cprng: Cannot test %s\n", algo);
+   err = -EOVERFLOW;
+   break;
+   }
memset(result, 0, sizeof result);
 
err = crypto_rng_reset(tfm, template[i].seed, template[i].slen);
-- 
2.1.3



[PATCH v2 22/25] crypto: ansi_cprng - Tweak comments

2014-12-07 Thread George Spelvin
It's not based on the NIST-recommended algorithm, it *is* the
NIST-recommended algorithm, and has even passed their validation
tests.

Also make clear that it's intended to be a deterministic generator,
despite the confusing name of the DT vector.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index f3e280c4..9c8475a2 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -1,7 +1,9 @@
 /*
- * PRNG: Pseudo Random Number Generator
- *   Based on NIST Recommended PRNG From ANSI X9.31 Appendix A.2.4 using
- *   AES 128 cipher
+ * PRNG: This is the NIST-Recommended Random Number Generator Based
+ *  on ANSI X9.31 Appendix A.2.4 using the AES 128 cipher.
+ *  Many specific kernel snapshots have collected validations from
+ *  the NIST RNG Validation System; results are available at
+ *  http://csrc.nist.gov/groups/STM/cavp/documents/rng/rngval.html
  *
  *  (C) Neil Horman nhor...@tuxdriver.com
  *
@@ -9,8 +11,6 @@
  *  under the terms of the GNU General Public License as published by the
  *  Free Software Foundation; either version 2 of the License, or (at your
  *  any later version.
- *
- *
  */
 
 #include crypto/internal/rng.h
@@ -36,7 +36,12 @@
  * Note: DT is our counter value
  *  V is our seed vector
  * See http://csrc.nist.gov/groups/STM/cavp/documents/rng/931rngext.pdf
- * for implementation details
+ * for implementation details.
+ *
+ * Note that even though DT stands for date/time, since this is a
+ * deterministic pseudo-random generator, it is a deterministic counter,
+ * not a timestamp.  Its function is not to inject seed entropy, but to
+ * ensure a long period in the output.
  */
 struct prng_context {
spinlock_t prng_lock;
-- 
2.1.3



Re: [PATCH v2 25/25] crypto: ansi_cprng - If non-deterministic, don't buffer old output

2014-12-07 Thread George Spelvin
By the way, this patch includes a bug due to a last-minute "oh, I can
make that more efficient!" which I realized after a night's sleep.
(The v1 patch worked, FWIW.)

Anyway, it's an RFC; I'm not even sure if I want this personally,
but it's a bit of extra paranoia to always generate fresh seed per
read.


Re: [PATCH 00/17] Multiple changes to crypto/ansi_cprng.c

2014-12-03 Thread George Spelvin
> So, generally speaking I'm ok with most of this I think,

One open issue... I thought you had said that the non-deterministic
version was not in the current X9.31 and I had the impression that it
wasn't wanted.  I've got a reorganized patch series that gets rid of that
and just tweaks the comments.

I'm happy to put it back, of course.  Or just hold off until Herbert
chimes in with an opinion.

As an example of the reorganization, now the comment for patch 4 in the
series is a bit clearer:

crypto: ansi_cprng - Eliminate ctx->I and ctx->last_rand_data

Careful use of the other available buffers avoids the need for
these, shrinking the context by 32 bytes.

Neither the debug output nor the FIPS-required anti-repetition check
are changed in the slightest.

(That's because I change the debug output in patches 2 and 3, to make
this more-sensitive patch easier to review.)

> but before it goes anywhere, it really needs to pass the NIST and FIPS
> test vectors.  Can you do that please and post the results?

It of course passes the ones in testmgr.h, and I can add the others from
http://csrc.nist.gov/groups/STM/cavp/documents/rng/RNGVS.pdf
and
http://csrc.nist.gov/groups/STM/cavp/documents/rng/rngtestvectors.zip

I didn't know there were separate FIPS test vectors for this PRNG;
can you give me a clue where to find them?


I'm also not sure what the results look like.  As I mentioned previously,
I have been unable to figure out how to make the tcrypt code print anything
in the event of a successful test, so its output is the empty string.

I am using my own version which prints "cprng: Test %d passed", and
I can turn debugging on, but the 1-round MCT produces an annoying
amount of output in that case.

(Note to self: teach test_cprng to test odd-sized reads, since that was
a previous bug source and I'm touching that code some more.)
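
Something like the following would exercise those paths (a hypothetical
sketch, not part of this series; test_odd_reads and the chunk sizes are
my own):

	/* Read odd-sized chunks so requests straddle the 16-byte block
	 * boundary in every phase; the old off-by-one duplicated data
	 * exactly on such unaligned refills.
	 */
	static int test_odd_reads(struct crypto_rng *rng)
	{
		u8 buf[64];
		unsigned int n;
		int err;

		for (n = 1; n <= 37; n += 2) {
			err = crypto_rng_get_bytes(rng, buf, n);
			if (err != n)
				return err < 0 ? err : -EINVAL;
		}
		return 0;
	}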

> I'm fine with changing the output, as I don't think anything
> particularly relies on the format, but I can't speak for others.

The current debug output for the first 5 testmgr.h tests (the 6th is
omitted) is as follows, but obviously things get reconfirmed right at
the end.

getting 16 random bytes for context 88042ce41b18
Calling _get_more_prng_bytes for context 88042ce41b18
DT = e6b3be782a23fa62d71d4afbb0e922f9
V = 8000
I = 92640cf08d302f2550ea3d73d1985385
V^I = 12640cf08d302f2550ea3d73d1985385
R = 59531ed13bb0c05584796685c12f7641
R^I = cb371221b680ef70d4935bf610b725c4
V' = 2ac489ad47640b6d86380229e769adc3
DT' = e6b3be782a23fa62d71d4afbb0e922fa
Returning new block for context 88042ce41b18
returning 16 from get_prng_bytes in context 88042ce41b18
cprng: Test 0 passed
getting 16 random bytes for context 88042ce41b18
Calling _get_more_prng_bytes for context 88042ce41b18
DT = e6b3be782a23fa62d71d4afbb0e922fa
V = c000
I = b30a8cba1c4cd3a14af7d9db9488f5f0
V^I = 730a8cba1c4cd3a14af7d9db9488f5f0
R = 7c222cf4ca8fa24c1c9cb641a9f3220d
R^I = cf28a04ed6c371ed566b6f9a3d7bd7fd
V' = fcbd61e7c51dd167624c56cb97b4c66d
DT' = e6b3be782a23fa62d71d4afbb0e922fb
Returning new block for context 88042ce41b18
returning 16 from get_prng_bytes in context 88042ce41b18
cprng: Test 1 passed
getting 16 random bytes for context 88042ce41b18
Calling _get_more_prng_bytes for context 88042ce41b18
DT = e6b3be782a23fa62d71d4afbb0e922fb
V = e000
I = d761699cc3de4a90234c62eec2479cd9
V^I = 3761699cc3de4a90234c62eec2479cd9
R = 8aaa003966675be529142881a94d4ec7
R^I = 5dcb69a5a5b911750a584a6f6b0ad21e
V' = ef2c1fbf609a68f8fe5110636bf4bf6a
DT' = e6b3be782a23fa62d71d4afbb0e922fc
Returning new block for context 88042ce41b18
returning 16 from get_prng_bytes in context 88042ce41b18
cprng: Test 2 passed
getting 16 random bytes for context 88042ce41b18
Calling _get_more_prng_bytes for context 88042ce41b18
DT = e6b3be782a23fa62d71d4afbb0e922fc
V = f000
I = 048ecb25943e8c31d614cd8a23f13f84
V^I = f48ecb25943e8c31d614cd8a23f13f84
R = 88dda456302423e5f69da57e7b95c73a
R^I = 8c536f73a41aafd4208968f45864f8be
V' = 48893b71bce400b65e21ba378a0ad570
DT' = e6b3be782a23fa62d71d4afbb0e922fd
Returning new block for context 88042ce41b18
returning 16 from get_prng_bytes in context 88042ce41b18
cprng: Test 3 passed
getting 16 random bytes for context 88042ce41b18
Calling _get_more_prng_bytes for context 88042ce41b18
DT = e6b3be782a23fa62d71d4afbb0e922fd
V = f800
I = 3a6a1754bdf6f844e53662990cadc492
V^I = c26a1754bdf6f844e53662990cadc492
R = 052592466179d2cb78c40b140a5a9ac8
R^I = 3f4f8512dc8f2a8f9df2698d06f75e5a
V' = ac4b23c4fb1e85098d9927afa617ad88
DT' = e6b3be782a23fa62d71d4afbb0e922fe
Returning new block for context 88042ce41b18
returning 16 from get_prng_bytes in context 88042ce41b18
cprng: Test 4 passed
self_test: err 0
prng_mod_init: err 0


> I noticed the unsigned char vs u8 issue, but didn't make the change
 

[PATCH 00/17] Multiple changes to crypto/ansi_cprng.c

2014-12-02 Thread George Spelvin
Thank you all for your hospitality to my amateurish efforts.

Given that this all started with me reading the code in an attempt to
understand it, it is likely that some of the problems I address are
actually misunderstandings on my part.  If all I get from this patch series
is some enlightenment, that's okay.

It's even more likely that some parts contain the germ of a good idea,
but will need considerable rework to be acceptable.  I look forward
to that too.

Expecting that much more work will be needed, I've run the testmgr.h
test vectors on this code, but haven't desk-checked it as thoroughly as
security-sensitive code should be before final acceptance.  If the ideas
pass muster, I'll double-check the implementation details.

Really, I'm just trying to understand the code.  Patches are just
a very specific way of talking about it.

Here's an overview of the series.  It's a lot of code cleanup, and
a bit of semantic change.

[01/17] crypto: ansi_cprng - Rename rand_data_valid more sensibly
I find it confusing, so I rename it to rand_data_pos
[02/17] crypto: ansi_cprng - Eliminate ctx->last_rand_data
Shrink the context structure.
[03/17] crypto: ansi_cprng - Eliminate ctx->I
Shrink it some more.
[04/17] crypto: ansi_cprng - simplify xor_vectors() to xor_block()
We don't need a size parameter.
[05/17] crypto: ansi_cprng - Add const annotations to hexdump()
I like const.
[06/17] crypto: ansi_cprng - Eliminate unused PRNG_FIXED_SIZE flag
It's dead code AFAICT.
[07/17] crypto: ansi_cprng - Shrink rand_read_pos  flags
A little more context shrinking.
[08/17] crypto: ansi_cprng - Require non-null key  V in reset_prng_context
Due to the PRNG_NEED_RESET flag, cprng_init() doesn't need to call
reset_prng_context() directly.
[09/17] crypto: ansi_cprng - Clean up some variable types
Try to be more consistent about signed vs. unsigned.
[10/17] crypto: ansi_cprng - simplify get_prng_bytes
Code shrink & simplification.
[11/17] crypto: ansi_cprng - unroll _get_more_prng_bytes
Slight object code size savings, and definite readability improvement.
[12/17] crypto: ansi_cprng - Create a block buffer data type
union { u8 bytes[16]; u32 ints[4]; u64 longs[2]; }, sort of.
[13/17] crypto: ansi_cprng - If DT is not provided, use a fresh timestamp
This is the big semantic change.  I'm rather proud of the use
of get_random_int() as a timestamp, but feedback is definitely
solicited!
[14/17] crypto: ansi_cprng - If DT is omitted, don't buffer old output
Not sure if this is desirable or not.
[15/17] crypto: testmgr - Teach test_cprng to handle non-default seed sizes
I consider this a latent bug that patch 17 would expose.
[16/17] crypto: testmgr - Merge seed arrays in struct cprng_testvec
I think this is a better way of solving the preceding problem,
but it's more intrusive.  All the consts I add are not a critical
part of the patch, but an example of what I'd like to do to the
rest of the code.
[17/17] crypto: ansi_cprng - Shrink default seed size
This makes (some amount of) true entropy the default.


[PATCH 01/17] crypto: ansi_cprng - Rename rand_data_valid more sensibly

2014-12-02 Thread George Spelvin
It's more like rand_data_invalid (data which has already been output),
so it's a pretty bad misnomer.  But rand_data_pos is even better.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 97fe3110..c9e1684b 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -50,7 +50,7 @@ struct prng_context {
unsigned char DT[DEFAULT_BLK_SZ];
unsigned char I[DEFAULT_BLK_SZ];
unsigned char V[DEFAULT_BLK_SZ];
-   u32 rand_data_valid;
+   u32 rand_read_pos;  /* Offset into rand_data[] */
struct crypto_cipher *tfm;
u32 flags;
 };
@@ -174,7 +174,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
}
 
	dbgprint("Returning new block for context %p\n", ctx);
-	ctx->rand_data_valid = 0;
+	ctx->rand_read_pos = 0;
 
	hexdump("Output DT: ", ctx->DT, DEFAULT_BLK_SZ);
	hexdump("Output I: ", ctx->I, DEFAULT_BLK_SZ);
@@ -217,7 +217,7 @@ static int get_prng_bytes(char *buf, size_t nbytes, struct prng_context *ctx,
 
 
 remainder:
-	if (ctx->rand_data_valid == DEFAULT_BLK_SZ) {
+	if (ctx->rand_read_pos == DEFAULT_BLK_SZ) {
		if (_get_more_prng_bytes(ctx, do_cont_test) < 0) {
			memset(buf, 0, nbytes);
			err = -EINVAL;
@@ -230,12 +230,9 @@ remainder:
 */
	if (byte_count < DEFAULT_BLK_SZ) {
 empty_rbuf:
-		while (ctx->rand_data_valid < DEFAULT_BLK_SZ) {
-			*ptr = ctx->rand_data[ctx->rand_data_valid];
-			ptr++;
-			byte_count--;
-			ctx->rand_data_valid++;
-			if (byte_count == 0)
+		while (ctx->rand_read_pos < DEFAULT_BLK_SZ) {
+			*ptr++ = ctx->rand_data[ctx->rand_read_pos++];
+			if (--byte_count == 0)
goto done;
}
}
@@ -244,17 +241,17 @@ empty_rbuf:
 * Now copy whole blocks
 */
	for (; byte_count >= DEFAULT_BLK_SZ; byte_count -= DEFAULT_BLK_SZ) {
-		if (ctx->rand_data_valid == DEFAULT_BLK_SZ) {
+		if (ctx->rand_read_pos == DEFAULT_BLK_SZ) {
			if (_get_more_prng_bytes(ctx, do_cont_test) < 0) {
				memset(buf, 0, nbytes);
				err = -EINVAL;
				goto done;
			}
		}
-		if (ctx->rand_data_valid > 0)
+		if (ctx->rand_read_pos > 0)
			goto empty_rbuf;
		memcpy(ptr, ctx->rand_data, DEFAULT_BLK_SZ);
-		ctx->rand_data_valid += DEFAULT_BLK_SZ;
+		ctx->rand_read_pos += DEFAULT_BLK_SZ;
ptr += DEFAULT_BLK_SZ;
}
 
@@ -304,7 +301,7 @@ static int reset_prng_context(struct prng_context *ctx,
	memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
	memset(ctx->last_rand_data, 0, DEFAULT_BLK_SZ);
 
-	ctx->rand_data_valid = DEFAULT_BLK_SZ;
+	ctx->rand_read_pos = DEFAULT_BLK_SZ;	/* Force immediate refill */
 
	ret = crypto_cipher_setkey(ctx->tfm, prng_key, klen);
if (ret) {
@@ -413,7 +413,7 @@ static int fips_cprng_reset(struct crypto_rng *tfm, u8 *seed, unsigned int slen)
 
/* this primes our continuity test */
rc = get_prng_bytes(rdata, DEFAULT_BLK_SZ, prng, 0);
-	prng->rand_data_valid = DEFAULT_BLK_SZ;
+	prng->rand_read_pos = DEFAULT_BLK_SZ;
 
 out:
return rc;
-- 
2.1.3



[PATCH 02/17] crypto: ansi_cprng - Eliminate ctx-last_rand_data

2014-12-02 Thread George Spelvin
It's simply not necessary.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index c9e1684b..c0a27288 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -46,7 +46,6 @@
 struct prng_context {
spinlock_t prng_lock;
unsigned char rand_data[DEFAULT_BLK_SZ];
-   unsigned char last_rand_data[DEFAULT_BLK_SZ];
unsigned char DT[DEFAULT_BLK_SZ];
unsigned char I[DEFAULT_BLK_SZ];
unsigned char V[DEFAULT_BLK_SZ];
@@ -89,8 +88,6 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 {
int i;
unsigned char tmp[DEFAULT_BLK_SZ];
-   unsigned char *output = NULL;
-
 
	dbgprint(KERN_CRIT "Calling _get_more_prng_bytes for context %p\n",
		ctx);
@@ -103,6 +100,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 * This algorithm is a 3 stage state machine
 */
	for (i = 0; i < 3; i++) {
+   unsigned char *output;
 
switch (i) {
case 0:
@@ -115,23 +113,23 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
			hexdump("tmp stage 0: ", tmp, DEFAULT_BLK_SZ);
break;
case 1:
-
/*
-* Next xor I with our secret vector V
-* encrypt that result to obtain our
-* pseudo random data which we output
+* Next xor I with our secret vector V.
+* Encrypt that result to obtain our pseudo random
+* data which we output.  It is kept temporarily
+* in (no longer used) V until we have done the
+* anti-repetition compare.
 */
			xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
			hexdump("tmp stage 1: ", tmp, DEFAULT_BLK_SZ);
-			output = ctx->rand_data;
+			output = ctx->V;
break;
case 2:
/*
 * First check that we didn't produce the same
-* random data that we did last time around through this
+* random data that we did last time around.
 */
-			if (!memcmp(ctx->rand_data, ctx->last_rand_data,
-					DEFAULT_BLK_SZ)) {
+			if (!memcmp(ctx->V, ctx->rand_data, DEFAULT_BLK_SZ)) {
				if (cont_test) {
					panic("cprng %p Failed repetition check!\n",
						ctx);
@@ -144,15 +142,13 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
				ctx->flags |= PRNG_NEED_RESET;
return -EINVAL;
}
-			memcpy(ctx->last_rand_data, ctx->rand_data,
-				DEFAULT_BLK_SZ);
+			memcpy(ctx->rand_data, ctx->V, DEFAULT_BLK_SZ);
 
/*
 * Lastly xor the random data with I
 * and encrypt that to obtain a new secret vector V
 */
-			xor_vectors(ctx->rand_data, ctx->I, tmp,
-				DEFAULT_BLK_SZ);
+			xor_vectors(ctx->I, ctx->V, tmp, DEFAULT_BLK_SZ);
			output = ctx->V;
			hexdump("tmp stage 2: ", tmp, DEFAULT_BLK_SZ);
break;
@@ -161,7 +157,6 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 
/* do the encryption */
		crypto_cipher_encrypt_one(ctx->tfm, output, tmp);
-
}
 
/*
@@ -299,7 +294,6 @@ static int reset_prng_context(struct prng_context *ctx,
	memset(ctx->DT, 0, DEFAULT_BLK_SZ);
 
	memset(ctx->rand_data, 0, DEFAULT_BLK_SZ);
-	memset(ctx->last_rand_data, 0, DEFAULT_BLK_SZ);
 
	ctx->rand_read_pos = DEFAULT_BLK_SZ;	/* Force immediate refill */
 
-- 
2.1.3



[PATCH 04/17] crypto: ansi_cprng - simplify xor_vectors() to xor_block()

2014-12-02 Thread George Spelvin
It doesn't need a second input or a length parameter.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 6b844f13..4856c84c7 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -74,14 +74,12 @@ if (dbg)\
printk(format, ##args);\
 } while (0)
 
-static void xor_vectors(unsigned char *in1, unsigned char *in2,
-			unsigned char *out, unsigned int size)
+static void xor_block(unsigned char const *in, unsigned char *out)
 {
	int i;
 
-	for (i = 0; i < size; i++)
-		out[i] = in1[i] ^ in2[i];
-
+	for (i = 0; i < DEFAULT_BLK_SZ; i++)
+		out[i] ^= in[i];
 }
 /*
  * Returns DEFAULT_BLK_SZ bytes of random data per call
@@ -123,7 +121,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 * in (no longer used) V until we have done the
 * anti-repetition compare.
 */
-		xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+		xor_block(tmp, ctx->V);
		hexdump("input stage 1: ", ctx->V, DEFAULT_BLK_SZ);
		input = output = ctx->V;
break;
@@ -151,7 +149,7 @@ static int _get_more_prng_bytes(struct prng_context *ctx, int cont_test)
 * Lastly xor the random data with I
 * and encrypt that to obtain a new secret vector V
 */
-		xor_vectors(tmp, ctx->V, ctx->V, DEFAULT_BLK_SZ);
+		xor_block(tmp, ctx->V);
		hexdump("input stage 2: ", ctx->V, DEFAULT_BLK_SZ);
		input = output = ctx->V;
break;
-- 
2.1.3



[PATCH 17/17] crypto: ansi_cprng - Shrink default seed size

2014-12-02 Thread George Spelvin
The larger seed with deterministic DT is still supported, but make the
default non-deterministically random, with fresh DT.

This must come after the patch that makes testmgr.c not depend on
the default seedsize to allocate its buffers, since it tests the
non-default (larger seed) case.

Signed-off-by: George Spelvin li...@horizon.com
---
 crypto/ansi_cprng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Patch 15 is a prerequisite for this one.

diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index 4ed7c0cf..875c4bf9 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -390,7 +390,7 @@ static struct crypto_alg rng_algs[] = { {
.rng = {
.rng_make_random= cprng_get_random,
.rng_reset  = cprng_reset,
-   .seedsize = DEFAULT_PRNG_KSZ + 2*DEFAULT_BLK_SZ,
+   .seedsize = DEFAULT_BLK_SZ + DEFAULT_PRNG_KSZ
}
}
 #ifdef CONFIG_CRYPTO_FIPS
@@ -408,7 +408,7 @@ static struct crypto_alg rng_algs[] = { {
.rng = {
.rng_make_random= fips_cprng_get_random,
.rng_reset  = fips_cprng_reset,
-   .seedsize = DEFAULT_PRNG_KSZ + 2*DEFAULT_BLK_SZ,
+   .seedsize = DEFAULT_BLK_SZ + DEFAULT_PRNG_KSZ
}
}
 #endif
-- 
2.1.3



Re: [PATCH 02/17] crypto: ansi_cprng - Eliminate ctx-last_rand_data

2014-12-02 Thread George Spelvin
From smuel...@chronox.de Tue Dec 02 08:57:23 2014
X-AuthUser: s...@eperm.de
From: Stephan Mueller smuel...@chronox.de
To: George Spelvin li...@horizon.com
Cc: herb...@gondor.apana.org.au, nhor...@tuxdriver.com, 
linux-crypto@vger.kernel.org
Subject: Re: [PATCH 02/17] crypto: ansi_cprng - Eliminate ctx-last_rand_data
Date: Tue, 02 Dec 2014 09:57:17 +0100
User-Agent: KMail/4.14.2 (Linux/3.17.2-200.fc20.x86_64; KDE/4.14.2; x86_64; ; )
In-Reply-To: 20141202083550.17918.qm...@ns.horizon.com
References: 20141202083550.17918.qm...@ns.horizon.com
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset=us-ascii

On Tuesday, 2 December 2014, 03:35:50, George Spelvin wrote:

Hi George,

>> It's simply not necessary.

> Can you please be a bit more verbose on why you think this is not
> necessary?

Sorry, I thought the code made that obvious.  The two buffers have to
exist simultaneously very briefly in order to be compared, but the
old data can be overwritten immediately thereafter.

So what the revised code does is:

I := E(DT)  (The buffer is called tmp)
V ^= I
V := E(V)   (This can be stored in V without problems)
compare V with read_data
read_data := V
V ^= I
V := E(V)
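
In C, that sequence might look like the sketch below (illustrative
only: it borrows xor_block() from patch 04, omits the DT update and
locking, and is not the literal patch):

	static int generate_block(struct prng_context *ctx)
	{
		unsigned char tmp[DEFAULT_BLK_SZ];		/* this is I */

		crypto_cipher_encrypt_one(ctx->tfm, tmp, ctx->DT);   /* I := E(DT) */
		xor_block(tmp, ctx->V);				     /* V ^= I */
		crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V); /* V := E(V) */

		/* The two buffers coexist just long enough to compare... */
		if (!memcmp(ctx->V, ctx->rand_data, DEFAULT_BLK_SZ))
			return -EINVAL;
		/* ...then the old output can be overwritten immediately. */
		memcpy(ctx->rand_data, ctx->V, DEFAULT_BLK_SZ);

		xor_block(tmp, ctx->V);				     /* V ^= I */
		crypto_cipher_encrypt_one(ctx->tfm, ctx->V, ctx->V); /* V := E(V) */
		return 0;
	}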

> Have you tested that change with reference test vectors -- what do
> testmgr test vectors say?

As I explained in part 00, yes.  The behaviour is identical.

I should mention, however, that I did not exactly use testmgr; I cut &
pasted the relevant test vectors & code into ansi_cprng.c, then verified
that the tests passed with both old and modified code.  I have so far been
unable to figure out how to make the tcrypt module do anything useful.


Re: [PATCH 01/17] crypto: ansi_cprng - Rename rand_data_valid more sensibly

2014-12-02 Thread George Spelvin
>	if (byte_count < DEFAULT_BLK_SZ) {
>  empty_rbuf:
> -		while (ctx->rand_data_valid < DEFAULT_BLK_SZ) {
> -			*ptr = ctx->rand_data[ctx->rand_data_valid];
> -			ptr++;
> -			byte_count--;
> -			ctx->rand_data_valid++;
> -			if (byte_count == 0)
> +		while (ctx->rand_read_pos < DEFAULT_BLK_SZ) {
> +			*ptr++ = ctx->rand_data[ctx->rand_read_pos++];
> +			if (--byte_count == 0)
>				goto done;

> I am against such collapsing of constructs into one-liners. It is not
> obvious at first sight, which value gets incremented in what order. Such
> collapsing was the cause for CVE-2013-4345 as it caused an off-by one.

Given that this turns out to the the exact code where that happened, I can
see the sensitivity of the matter!  But I disagree in this case, and
it took me a while to figure out how to phrase it.

It's a code style issue, which means that it doesn't have a clear
technical answer.  It's more of a "voting on what people think is
clearer" thing.


In the case of the if (--byte_count) issue, that's not something I feel very
strongly about.

But in the case of *ptr++ = ctx->rand_data[ctx->rand_read_pos++], it's
the opposite.  While I'm not going to pick a fight over it, my opinion
is that this form is clearly better.

There are two advantages to the code in this form:

1. *dst++ = *src++ is a C idiom, like for (i = 0; i < N; i++).
   It's very easy to recognize and understand.  A more broken-up form
   less obviously does all the necessary accounting.

2. The original bug was NOT caused by using side effects.  Consider the
   bug fix:

--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -230,11 +230,11 @@ remainder:
 */
	if (byte_count < DEFAULT_BLK_SZ) {
 empty_rbuf:
-		for (; ctx->rand_data_valid < DEFAULT_BLK_SZ;
-			ctx->rand_data_valid++) {
+		while (ctx->rand_data_valid < DEFAULT_BLK_SZ) {
			*ptr = ctx->rand_data[ctx->rand_data_valid];
			ptr++;
			byte_count--;
+			ctx->rand_data_valid++;
if (byte_count == 0)
goto done;
}

The problem was the *separation* of the copy and the associated increment.

In some situations, one was done and not the other, leading to data
duplication.  The fix was to move the increment of ctx->rand_data_valid
to before the if (byte_count == 0) loop exit test.

However, putting the increment in the same line as the copy reduces the
separation that caused the original bug, and makes it *more* clear that
the parts always happen together.

This, fundamentally, is the reason that I actually find it preferable:
it's conceptually one operation that should always have all of its parts
done together, and putting it on one line makes that easier for
the reader to recognize.

The problem with the original was putting it in the for(), not
using side effect expressions.
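
To make that concrete, here is the reduced shape of the loop (pos,
count and the buffer names are stand-ins for the context fields):

	/* The copy, the destination advance, and the position advance
	 * form one expression, so the early exit can never observe a
	 * half-updated state.  The 2013 bug came from the variant where
	 * the increment lived in a for(...) header, which "goto done"
	 * skipped.
	 */
	static unsigned int copy_tail(unsigned char *ptr,
				      const unsigned char *buf,
				      unsigned int pos, unsigned int count)
	{
		while (pos < DEFAULT_BLK_SZ) {
			*ptr++ = buf[pos++];
			if (--count == 0)
				break;		/* pos already correct here */
		}
		return pos;
	}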


The above logic doesn't apply to if (--byte_count), which is why I
don't care about it nearly as much.


All that said, there are two significant remaining points:

3. this patch claims it's just a variable rename; perhaps it should stick
   to that, and
4. patch 10/17 totally deletes this code and replaces it with something
   even simpler, so if that is acceptable this entire discussion may be moot.


Re: Is ansi_cprng.c supposed to be an implmentation of X9.31?

2014-12-02 Thread George Spelvin
> That is an old version.  The updated version (published in 2005), and
> specified in the ansi_cprng.c file removes that language.

Oh!  Thank you!  I'm pretty sure I read the 1998 version.

In fact, apparently there's a 2010 version:

http://www.codesdownload.org/3761-ANSI-X9-TR-31-2010.html

I need to figure out how to get my hands on that.

Presumably this is the 2005 version with the 2009 supplement incorporated.
If I could read Chinese, I might be able to find it here:

http://www.docin.com/p-524511188.html

> The long and the short of it is that, if you want a cprng who's output can be
> predicted by any entity with the IV and KEY values, then DT has to be known
> initially and updated in a predictable fashion that is independent of the data
> being transmitted.  Using a real date/time vector can't do that.

Er, yes, this is all extremely obvious; I'm not quite sure why we're
belabouring it.  Fully deterministic generators have their uses, which
is why I had to ask in the beginning what the design intent was.

If this is *intended* to be purely deterministic, there's nothing to fix.
I'd like to propose a small comment clarification because a quick reading
confused me.

But when I talked about making it random, you said send a patch, so
I did.  If you don't want the semantic change, I'm not upset.  The other
code cleanups are hopefully (after I've finished polishing them) useful;
just stop before the whole union block business.


Re: Is ansi_cprng.c supposed to be an implmentation of X9.31?

2014-12-02 Thread George Spelvin
> Not sure what you're concerned about, or what you're even referencing.

Sorry for the confusion, but it's genuine.  See below for what I think is
the solution.

> The shash_desc structure represents a discrete block of data that is
> being hashed.  It can be updated with new data, reset, finalized, etc.
> It only points to the crypto_hash structure that it is associated with
> (as it must, given that the crypto layer needs to know what algorithm
> to use to hash the data and what key to use).  But it doesn't store a
> second copy of the key on its own.

Er, I saw the following code:

static int michael_setkey(struct crypto_shash *tfm, const u8 *key,
  unsigned int keylen)
{
struct michael_mic_ctx *mctx = crypto_shash_ctx(tfm);

const __le32 *data = (const __le32 *)key;

if (keylen != 8) {
crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}

	mctx->l = le32_to_cpu(data[0]);
	mctx->r = le32_to_cpu(data[1]);
return 0;
}

and thought "okay, the key is in mctx->l and mctx->r."

Then I saw

static int michael_init(struct shash_desc *desc)
{
	struct michael_mic_desc_ctx *mctx = shash_desc_ctx(desc);
	struct michael_mic_ctx *ctx = crypto_shash_ctx(desc->tfm);
	mctx->pending_len = 0;
	mctx->l = ctx->l;
	mctx->r = ctx->r;

return 0;
}

and thought "the key is being copied from the ctx to the desc.  Why
the heck are they doing it in two steps?"

I spent a long time trying to figure out why there were two separate
layers, as it looked like

struct michael_mic_ctx {
u32 l, r;
};

was a strict subset of

struct michael_mic_desc_ctx {
u8 pending[4];
size_t pending_len;

u32 l, r;
};

and thus the former was unnecessary.

(Another code optimization that jumps out at me: given that pending_len
is strictly between 0 and 3, using a 64-bit size_t unnecessarily bloats
the context structure to 24 bytes.  Shrinking it to 32 bits would reduce
the context to 16 bytes, and given that at most three bytes of pending[]
are ever used, except transiently in the update() function, pending_len
could be stored in pending[3] and the context shrunk to 12 bytes.)
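
A sketch of that hypothetical repacking (just illustrating the
arithmetic, not proposing it for merging):

	struct michael_mic_desc_ctx_small {	/* 12 bytes, vs. 24 today */
		u8 pending[4];	/* pending[3] doubles as pending_len (0..3) */
		u32 l, r;
	};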


I'm thinking that the confusion comes from my unfortunate choice of
which file to start reading and the peculiar way that michael_mic's key
is more like an IV, so there is neither key scheduling nor persistent
access to it.


== My guess as to how this works ==

(If I say anything wrong, please correct me!)

A ctx is an abstraction of a (scheduled) key.  With symmetric ciphers,
it's very common to have a complex key set up (I'm looking at you, twofish)
which can be re-used for multiple messages/packets/whatever.

It has to be an opaque object because it might correspond to some
state in a hardware accelerator, with the software swapping keys in
and out like virtual memory.

A desc is the abstraction of an IV.  It handles chaining and block
boundaries, to encrypt/hash/transform a stream/bvec/sk_buff of data.

The reason for the separation is that multiple descs may simultaneously
access the same ctx.

This is okay because the ctx reference is conceptually read-only.
(If there's swapping of hardware contexts going on behind the curtain,
the ctx may not be entirely immutable, but it's like a cache: you're
not supposed to notice the difference.)

There are some cases, like ECB, where a desc is a ridiculously thin
wrapper and seems like overkill, but it's there to provide a consistent
interface.

(Design choice: a single ctx can handle both encryption and decryption.
For algorithms that have different key schedules for the two directions,
it has to waste the space even if the cipher is only being used forward.)
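
If that mental model is right, typical usage looks roughly like this
(a hedged sketch; digest_two is my name, and error handling is
abbreviated):

	static int digest_two(struct crypto_shash *tfm,	/* one shared ctx */
			      const u8 *m1, unsigned int n1, u8 *out1,
			      const u8 *m2, unsigned int n2, u8 *out2)
	{
		struct shash_desc *d1, *d2;
		int err = -ENOMEM;

		d1 = kmalloc(sizeof(*d1) + crypto_shash_descsize(tfm), GFP_KERNEL);
		d2 = kmalloc(sizeof(*d2) + crypto_shash_descsize(tfm), GFP_KERNEL);
		if (!d1 || !d2)
			goto out;

		/* Two independent hash states, one scheduled key. */
		d1->tfm = tfm;
		d2->tfm = tfm;
		err = crypto_shash_digest(d1, m1, n1, out1);
		if (!err)
			err = crypto_shash_digest(d2, m2, n2, out2);
	out:
		kfree(d1);
		kfree(d2);
		return err;
	}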

>> (On a related point, the general lack of const declarations throughout the
>> crypto layer has been a source of frustration.)

> So fix it.  Making large claims about what frustrates you about the
> code without producing any fixes doesn't make people listen to you.

I'm happy to!  It's just a large, invasive, change and I wonder how welcome
it is.  Maybe there's a reason for the omission that I'm not seeing?

You know the aphorism "never attribute to malice that which can be
adequately explained by stupidity"?  Well, there's a follow-on, which
says "be careful attributing to someone else's stupidity that which
can be adequately explained by your own."

It's a pattern I go through frequently, and I think others do too:
my initial reaction is "what the @#$@# is this $#!+?", but I try to
consider whether the fault is actually mine before speaking aloud (or
hitting send, as the case may be).

If it's something that annoys you too and you just haven't gotten around
to fixing it, I'd be delighted!

I just wanted to start with something smaller in scope, confined to
a single source file, where I felt like I understood the implications
better.

> The other big issue I'm struggling with is how to get the

Re: [PATCH 02/17] crypto: ansi_cprng - Eliminate ctx-last_rand_data

2014-12-02 Thread George Spelvin
> NACK
>
> The assumption that its not needed is incorrect.  In fips mode its explicitly
> needed to validate that the rng isn't reproducing identical random data.

Please take a second look.  The validation is still there; I fully
understand that and preserved that.  (Well, I broke it later getting
over-eager looking for places to put memzero_explicit, but already
sent a follow-on message about that.)

Only the *buffer* is unnecessary and was deleted.


Re: [PATCH 03/17] crypto: ansi_cprng - Eliminate ctx-I

2014-12-02 Thread George Spelvin
> I'm only ok with removing I if you can continue to be able to output it.
> given that I is listed as part of the test sequences that NIST provides,
> I'd like to be able to compare the values.

I can do that easily, but I can't print the *input* I, which
is the result of encrypting the previous DT, as it's thrown
away earlier.

You'd have to look further back in the debug messages to find it.

Is changing the format of the debug messages okay?  I'd like the debug
messages to describe the code, but I don't know if you have something
that parses the current output.


The test output I see on p. 33 of
http://csrc.nist.gov/groups/STM/cavp/documents/rng/RNGVS.pdf
doesn't include I.  Can you point me to a sample that includes I?

It might be best to more significantly rework the debug messages to
resemble the NIST test vectors.


Re: [PATCH 07/17] crypto: ansi_cprng - Shrink rand_read_pos flags

2014-12-02 Thread George Spelvin
> --- a/crypto/ansi_cprng.c
> +++ b/crypto/ansi_cprng.c
> @@ -51,9 +51,9 @@ struct prng_context {
>  	unsigned char rand_data[DEFAULT_BLK_SZ];
>  	unsigned char DT[DEFAULT_BLK_SZ];
>  	unsigned char V[DEFAULT_BLK_SZ];
> -	u32 rand_read_pos;	/* Offset into rand_data[] */
> +	unsigned char rand_read_pos;	/* Offset into rand_data[] */

> u8 please.  Also, not sure if this helps much, as I think the padding
> will just get you back to word alignment on each of these.

I noticed the unsigned char vs u8 issue, but didn't make the change
as I didn't think the readability improvement was worth the code churn.

But I'd be happy to clean that up, too!

Should I convert all the buffers and function prototypes, or is there
some criterion for distinguishing which gets which?   (E.g. buffers are
unsigned char but control variables are u8.)

And actually, you do win.  spinlock_t is 16 bits on x86,
and the buffers are all 16 bytes.   (80 bytes before my earlier
patches, 48 bytes after.)

So the structure goes from:

        32-bit          64-bit          Variable
        Offset  Size    Offset  Size
         0       2       0       2      spinlock_t prng_lock
         2      16       2      16      unsigned char rand_data[16]
        18      16      18      16      unsigned char DT[16]
        34      16      34      16      unsigned char V[16]
        50       2      50       2      (alignment)
        52       4      52       4      u32 rand_read_pos
        56       4      56       8      struct crypto_cipher *tfm
        60       4      64       4      u32 flags
                        68       4      (alignment)
        64              72              (structure size)

to

        32-bit          64-bit          Variable
        Offset  Size    Offset  Size
        34      16      34      16      unsigned char V[16]
        50       1      50       1      u8 rand_read_pos
        51       1      51       1      u8 flags
                        52       4      (alignment)
        52       4      56       8      struct crypto_cipher *tfm
        56              64              (structure size)

You still get 4 bytes of alignment padding on x86-64, but given that
the structure has 60 bytes of payload, that's the minimum possible.

You save 6 bytes of variables and 2 bytes of padding on both
32- and 64-bit systems, for a total of 8 bytes, and that's enough
to knock you into a smaller slab object bin on 64-bit.


I forget where I read the terminology, but the most efficient
way to pack a structure is in an "organ pipe" configuration, where
the biggest (in terms of *alignment*) members are on the outside
of the structure and the smaller elements are on the inside.
Putting a 32-bit flags after a 64-bit pointer violates that.
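
For instance (field names and sizes illustrative, LP64 assumed):

	struct bad_layout {			/* 40 bytes on LP64 */
		u32 flags;			/* 4 bytes + 4 padding */
		struct crypto_cipher *tfm;	/* 8; forces the hole above */
		unsigned char V[16];
		u32 read_pos;			/* 4 + 4 tail padding */
	};

	struct organ_pipe {			/* 32 bytes on LP64 */
		struct crypto_cipher *tfm;	/* biggest alignment outermost */
		unsigned char V[16];
		u8 read_pos;			/* smallest members innermost */
		u8 flags;			/* 26 payload, 6 tail padding */
	};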


Re: [PATCH 11/17] crypto: ansi_cprng - unroll _get_more_prng_bytes

2014-12-02 Thread George Spelvin
The order of the v1 patches is somewhat in order of increasing scale,
reflecting my learning about the code.  But if this unroll is okay,
it would probably make the most sense to reorder things so it's
first and then other changes can be made on top of the simpler code.

Given the unusual implementation choice of a loop and a switch, it's
obviously a deliberate one, so I assume there's a reason for it that
I'm just not seeing.


Re: Is ansi_cprng.c supposed to be an implmentation of X9.31?

2014-12-01 Thread George Spelvin
>> The other one, which I have to think *very* hard about, is whether
>> using the same seed material as /dev/random in this much weaker
>> cryptographic construction will introduce a flaw in *it*.

> I have no idea what you are referring to here. The caller is required to
> seed the DRNG. Typically that is done with a call to get_random_bytes.
> As Neil mentioned, the X9.31 DRNG implementation does not seed itself.

Er, yes it does.  Not the key, but the IV vector V[].

Here's the closest thing I can find on-line to a direct quote
from the ANSI Spec:

http://www.untruth.org/~josh/security/ansi931rng.PDF

Let DT be a date/time vector which is updated on each iteration.

This timestamp is a source of continuous reseed material.  The spec
doesn't impose specific resolution requirements, but it's hopefully
obvious that the finer, the better.  The goal is a vector which changes
per iteration.

There are limitations due to its finite entropy and lack of catastrophic
reseeding, as Kelsey & Schneier pointed out:

https://www.schneier.com/paper-prngs.html

but it is intended as an entropy source.

Hopefully very soon I'll have a patch series to post.  (Unfortunately,
I just managed to oops my kernel and need to reboot to continue testing.)

If I don't do much more, it'll be patch 13/17 in the series.  I think
the use of get_random_int() is a good compromise between the raw
random_get_entropy() timestamp and the (too heavy) get_random_bytes().
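
Concretely, the idea amounts to something like this (a sketch;
prng_fill_dt is my name, not the patch's):

	/* Synthesize a fresh DT when the caller supplies none.
	 * get_random_int() already hashes in jiffies and the cycle
	 * counter, so it is cheap but still changes per call.
	 */
	static void prng_fill_dt(unsigned char DT[DEFAULT_BLK_SZ])
	{
		int i;

		for (i = 0; i < DEFAULT_BLK_SZ; i += sizeof(u32)) {
			u32 t = get_random_int();

			memcpy(DT + i, &t, sizeof(t));
		}
	}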


Re: Is ansi_cprng.c supposed to be an implmentation of X9.31?

2014-12-01 Thread George Spelvin
> See Documentation/DocBook/crypto-API.tmpl in the cryptodev-2.6 tree.
> There you will find tons of documentation (which will be merged during
> 3.19-rc1)

Yes, I've been reading that.  It certainly helps a great deal, but
still leaves me with some significant questions.

I started researching the crypto layer when I proposed using Dan
Bernstein's SipHash elsewhere in the kernel and someone asked for a
crypto API wrapper for it.  That seemed a simple enough request to me,
but it's been a deeper rabbit hole than I expected.

I started reading the code to another keyed hash, michael_mic, as a model,
but I'm stil trying to understand the intended difference between struct
crypto_shash and struct shash_desc, and in particular why both have
a copy of the key.  The SHASH API documentation at

https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/tree/include/crypto/hash.h

isn't particularly enlightening.  If the crypto_shash were entirely
read-only and the shash_desc were the entire volatile state, that would
make sense, but as it is I'm confused about the design intent.

(On a related point, the general lack of const declarations throughout the
crypto layer has been a source of frustration.)


The other big issue I'm struggling with is how to get the tcrypt.ko module
to print "ansi_cprng has passed its tests".  All it has produced for me
so far is a few kilobytes of dmesg spam about ciphers which aren't even
configured in my kernel.

After a few hours of trying to figure out what the alg and type parameters
do, I gave up and cut and pasted the tests into prng_mod_init().


Re: Is ansi_cprng.c supposed to implement X9.17/X9.31's RNG?

2014-11-29 Thread George Spelvin
Sorry for the duplicate; I had a crash and I thought the mail was lost.
First message was not quite finished, second is a rewrite from scratch.

