Re: [crypto 4/8] chtls: CPL handler definition

2017-12-05 Thread Hannes Frederic Sowa
Hello,

On Tue, Dec 5, 2017, at 12:40, Atul Gupta wrote:
> CPL handlers for TLS session, record transmit and receive

This looks very much like full TCP offload with TLS on top? It
would be nice if you could give a few more details in the patch
descriptions.

Bye,
Hannes


Re: [PATCH v3 net-next 0/4] kernel TLS

2017-06-14 Thread Hannes Frederic Sowa
Hello Dave,

On Wed, Jun 14, 2017, at 21:47, David Miller wrote:
> From: Dave Watson 
> Date: Wed, 14 Jun 2017 11:36:54 -0700
> 
> > This series adds support for kernel TLS encryption over TCP sockets.
> > A standard TCP socket is converted to a TLS socket using a setsockopt.
> > Only symmetric crypto is done in the kernel, as well as TLS record
> > framing.  The handshake remains in userspace, and the negotiated
> > cipher keys/iv are provided to the TCP socket.
> > 
> > We implemented support for this API in OpenSSL 1.1.0, the code is
> > available at https://github.com/Mellanox/tls-openssl/tree/master
> > 
> > It should work with any TLS library with similar modifications,
> > a test tool using gnutls is here: 
> > https://github.com/Mellanox/tls-af_ktls_tool
> > 
> > RFC patch to openssl:
> > https://mta.openssl.org/pipermail/openssl-dev/2017-June/009384.html
>  ...
> 
> I really want to apply this, so everyone give it a good review :-)

one question for this patch set:

What is the reason for not allowing key updates for the TX path? I have
always been vocal about the problems with TLSv1.2 renegotiation and
TLSv1.3 key update alerts. This patch set uses encryption synchronously,
directly in the socket layer, and thus wouldn't suffer from problems
regarding updates of the key. My hunch is that you leave this option
open so you can later introduce asynchronous crypto, which might be used
with hardware? It also looks doable in the case of MSG_MORE. Otherwise,
by allowing key updates on the data path, I would not see any problems
with key updates in TLS.

The reason why I am asking is that it is hard to predict how many bytes
will be sent through a connection. TLSv1.3 recommends (SHOULD) that
implementations update the key after 362GB; afaik NIST even has a lower
margin. After that the symmetric crypto might become too weak.
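
Purely as an illustration (user-space sketch, not part of this patch set;
the struct, the limit macro and the rekey flag are made up): a thin wrapper
could count the bytes written over a kTLS socket and flag when the
recommended limit is reached, so the TLS library can run a key update once
the TX path allows installing a new key.

/* Illustrative user-space sketch only. */
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define REKEY_LIMIT_BYTES (362ULL << 30)   /* the ~362GB figure mentioned above */

struct ktls_tx_state {
        int fd;                 /* TCP socket already converted via setsockopt(SOL_TLS) */
        uint64_t bytes_sent;
        int rekey_needed;       /* caller should negotiate/install a new key */
};

static ssize_t ktls_write(struct ktls_tx_state *st, const void *buf, size_t len)
{
        ssize_t n = write(st->fd, buf, len);

        if (n > 0) {
                st->bytes_sent += (uint64_t)n;
                if (st->bytes_sent >= REKEY_LIMIT_BYTES)
                        st->rekey_needed = 1;
        }
        return n;
}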

Anyway, this patch set seems straightforward, and with key updates added
later on it shouldn't have any of the problems I have pointed out so far.

Thanks,
Hannes


Re: [RFC TLS Offload Support 00/15] cover letter

2017-03-29 Thread Hannes Frederic Sowa
Hello,

On 29.03.2017 19:41, David Miller wrote:
> From: Aviad Yehezkel 
> Date: Tue, 28 Mar 2017 16:26:17 +0300
> 
>> TLS Tx crypto offload is a new feature of network devices. It
>> enables the kernel TLS socket to skip encryption and authentication
>> operations on the transmit side of the data path, delegating those
>> to the NIC. In turn, the NIC encrypts packets that belong to an
>> offloaded TLS socket on the fly. The NIC does not modify any packet
>> headers. It expects to receive fully framed TCP packets with TLS
>> records as payload. The NIC replaces plaintext with ciphertext and
>> fills the authentication tag. The NIC does not hold any state beyond
>> the context needed to encrypt the next expected packet,
>> i.e. expected TCP sequence number and crypto state.
> 
> It seems like, since you do the TLS framing in TCP and the card is
> expecting to fill in certain aspects, there is a requirement that the
> packet contents aren't mangled between the TLS framing code and when
> the SKB hits the card.
> 
> Is this right?
> 
> For example, what happens if netfilter splits a TLS Tx offloaded frame
> into two TCP segments?

Furthermore, it doesn't seem to work with bonding or any other virtual
interface, which could move the skbs to be processed on another NIC, as
the context is put onto the NIC. Even a redirect cannot be processed
anymore (it seems like those patches try to stick the connection to an
interface anyway).

Wouldn't it be possible to keep the state in software and push down a
security context per skb, which gets applied during sending? If that is
not possible in hardware, a slow path could encrypt the packet in
software.
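
To sketch what I mean (illustrative only, not an existing kernel API; the
struct and field names are made up):

/* Hypothetical per-skb metadata: the TLS state stays in software, and a
 * capable NIC applies this context on transmit; anything else falls back
 * to encrypting the record in software before handing the skb down. */
#include <linux/types.h>

struct tls_skb_tx_ctx {
        u32 expected_tcp_seq;   /* TCP sequence number the record starts at */
        u64 tls_record_seq;     /* TLS record sequence number for the nonce */
        u32 key_id;             /* handle to key material kept in software */
};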

Also sticking connections to outgoing interfaces might work for TX, but
you can't force the interface where packets come in.

Bye,
Hannes



Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-27 Thread Hannes Frederic Sowa
Hello,

On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
> > On 24.12.2016 00:39, George Spelvin wrote:
> > > We just finished discussing why 8 bytes isn't enough.  If you only
> > > feed back 8 bytes, an attacker who can do 2^64 computation can find it
> > > (by guessing and computing forward to verify the guess) and recover the
> > > previous state.  You need to feed back at least as much output as your
> > > security target.  For /dev/urandom's ChaCha20, that's 256 bits.
> > I followed the discussion but it appeared to me that this has the
> > additional constraint of time until the next reseeding event happens,
> > which is 300s (under the assumption that calls to get_random_int happen
> > regularly, which I expect right now). After that the existing reseeding
> > mechanism will ensure enough backtracking protection. The number of
> > bytes can easily be increased here, given that reseeding was shown to be
> > quite fast already and we produce enough output. But I am not sure if
> > this is a bit overengineered in the end?
> 
> I'm not following your description of how the time-based and call-based
> mechanisms interact, but for any mix-back, you should either do enough
> or none at all.  (Also called "catastrophic reseeding".)

We call extract_crng when we run out of batched entropy and reseed. How
often we call down to extract_crng depends on how much entropy we
extracted by calls to get_random_int/long, so the number of calls into
those functions matter.

In extract_crng we have a timer which reseeds the CPRNG every 300s and
either uses completely new entropy from the CRNG or calls down into the
CPRNG while also doing backtracking protection (which feeds half of
chacha's block size back into chacha, if I read the code correctly, thus
256 bits, which should be enough).

> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
> (Because each mixback can be guessed and verified separately.)

Exactly, but the full reseed after running out of entropy is strong
enough not to be defeated by your argument. Neither is the reseed from
the CRNG.

> > Also agreed. Given your solution below to prandom_u32, I do think it
> > might also work without the seqlock now.
> 
> It's not technically a seqlock; in particular the reader doesn't
> spin.  But the write side, and general logic is so similar it's
> a good mental model.
> 
> Basically, assume a 64-byte buffer.  The reader has gone through
> 32 bytes of it, and has 32 left, and when he reads another 8 bytes,
> has to distinguish three cases:
> 
> 1) No update; we read the old bytes and there are now 32 - 8 = 24 bytes left.
> 2) Update completed while we weren't looking.  There are now new
>    bytes in the buffer, and we took 8 leaving 64 - 8 = 56.
> 3) Update in progress at the time of the read.  We don't know if we
>    are seeing old bytes or new bytes, so we have to assume the worst
>    and not proceed unless 32 >= 8, but assume at the end there are
>    64 - 8 = 56 new bytes left.
> 
> > I wouldn't have added the disabling of irqs, but given that I really like
> > your proposal, I would take it into my todo branch and submit it when the
> > net-next window opens.
> 
> If you want that, I have a pile of patches to prandom I really
> should push upstream.  Shall I refresh them and send them to you?

I would like to have a look at them in the new year, certainly! I can
also take care of the core prandom patches, but I don't know if I have
time to submit the others to the different subsystems.

Maybe, if David would be okay with that, we can submit all patches
through his tree, as he is also the dedicated maintainer for prandom.

> [... patch descriptions ...]

Thanks,
Hannes



Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread Hannes Frederic Sowa
Hi,

On 24.12.2016 00:39, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
>> In general this looks good, but bitbuffer needs to be protected from
>> concurrent access, thus needing at least some atomic instruction and
>> disabling of interrupts for the locking if done outside of
>> get_random_long. Thus I liked your previous approach more where you
>> simply embed this in the already necessary get_random_long and aliased
>> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.
> 
> It's meant to be part of the same approach, and I didn't include locking
> because that's a requirement for *any* solution, and isn't specific
> to the part I was trying to illustrate.
> 
> (As for re-using the name "get_random_long", that was just so
> I didn't have to explain it.  Call it "get_random_long_internal"
> if you like.)
> 
> Possible locking implementations include:
> 1) Use the same locking as applies to get_random_long_internal(), or
> 2) Make bitbuffer a per-CPU variable (note that we currently have 128
>    bits of per-CPU state in get_random_int_hash[]), and this is all a
>    fast-path to bypass heavier locking in get_random_long_internal().

I understood that this is definitely a solvable problem.

>>> But, I just realized I've been overlooking something glaringly obvious...
>>> there's no reason you can't compute multiple blocks in advance.
>>
>> In the current code, at the cost of per-CPU allocations and thus memory.
> 
> Yes, but on 4-core machines it's still not much, and 4096-core
> behemoths have RAM to burn.
> 
>> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
>> return block to feed directly back into the chacha state? So we pass
>> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
>> state. This would shorten the maximum window of the anti-backtracking
>> protection from the current 300s to 14 get_random_int calls.
>> Not sure if this is worth it.
> 
> We just finished discussing why 8 bytes isn't enough.  If you only
> feed back 8 bytes, an attacker who can do 2^64 computation can find it
> (by guessing and computing forward to verify the guess) and recover the
> previous state.  You need to feed back at least as much output as your
> security target.  For /dev/urandom's ChaCha20, that's 256 bits.

I followed the discussion but it appeared to me that this has the
additional constraint of time until the next reseeding event happens,
which is 300s (under the assumption that calls to get_random_int happen
regularly, which I expect right now). After that the existing reseeding
mechanism will ensure enough backtracking protection. The number of
bytes can easily be increased here, given that reseeding was shown to be
quite fast already and we produce enough output. But I am not sure if
this is a bit overengineered in the end?

>>> For example, suppose we gave each CPU a small pool to minimize locking.
>>> When one runs out and calls the global refill, it could refill *all*
>>> of the CPU pools.  (You don't even need locking; there may be a race to
>>> determine *which* random numbers the reader sees, but they're equally
>>> good either way.)
> 
>> Yes, but we still need disabled interrupts, otherwise the same state
>> could be used twice on the same CPU. Uff, I think we have this problem
>> in prandom_u32.
> 
> There are some locking gotchas, but it is doable lock-free.
> 
> Basically, it's a seqlock.  The writer increments it once (to an odd
> number) before starting to overwrite the buffer, and a second time (to
> an even number) after.  "Before" and "after" mean smp_wmb().
> 
> The reader can use this to figure out how much of the data in the buffer
> is safely fresh.  The full sequence of checks is a bit intricate,
> but straightforward.
> 
> I didn't discuss the locking because I'm confident it's solvable,
> not because I wasn't aware it has to be solved.
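
A rough sketch of the writer side of that scheme, for illustration only
(not the actual random.c code; struct layout is made up and
fill_with_random() is a stand-in for extract_crng()):

/* Single writer assumed; readers snapshot ->seq around their copy and, if
 * they saw an odd value or the value changed, trust only the bytes they had
 * not yet consumed. */
struct batched_entropy {
        unsigned int   seq;          /* even: stable, odd: refill in progress */
        unsigned long  buf[8];
        unsigned int   pos;          /* next unread word */
};

extern void fill_with_random(void *buf, size_t len);    /* stand-in */

static void refill_batch(struct batched_entropy *b)
{
        WRITE_ONCE(b->seq, b->seq + 1);             /* now odd */
        smp_wmb();                                  /* order seq update before new data */
        fill_with_random(b->buf, sizeof(b->buf));
        b->pos = 0;
        smp_wmb();                                  /* order new data before seq update */
        WRITE_ONCE(b->seq, b->seq + 1);             /* even again */
}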

Also agreed. Given your solution below to prandom_u32, I do think it
might also work without the seqlock now.

> As for prandom_u32(), what's the problem?  Are you worried that
> get_cpu_var disables preemption but not interrupts, and so an
> ISR might return the same value as process-level code?

Yes, exactly those were my thoughts.

> First of all, that's not a problem because prandom_u32() doesn't
> have security guarantees.  Occasionally returning a duplicate number
> is okay.
>
> Second, if you do care, that could be trivially fixed by throwing
> a barrier() in the middle of the code.  (Patch appended; S-o-B
> if anyone wants it.)

I wouldn't have added the disabling of irqs, but given that I really like
your proposal, I would take it in my todo

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread Hannes Frederic Sowa
Hi,

On Fri, 2016-12-23 at 13:26 -0500, George Spelvin wrote:
> (Cc: list trimmed slightly as the topic is wandering a bit.)
> 
> Hannes Frederic Sowa wrote:
> > On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
> > > Adding might_lock() annotations will improve coverage a lot.
> > 
> > Might be hard to find the correct lock we take later down the code
> > path, but if that is possible, certainly.
> 
> The point of might_lock() is that you don't have to.  You find the
> worst case (most global) lock that the code *might* take if all the
> buffer-empty conditions are true, and tell lockdep "assume this lock is
> taken all the time".

Yes, of course. Probably input_pool's and primary_crng's locks are the
ones to annotate here, but on NUMA other locks might be taken.
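
For illustration, an annotation of that kind could look like this (sketch
only; the lock picked here is just an example and the fast path call is a
stand-in):

static unsigned long example_get_random_long(void)
{
        /* Tell lockdep this path *might* take the input pool lock, even on
         * calls where the batched fast path is hit and it never does. */
        might_lock(&input_pool.lock);

        return batched_fast_path();     /* stand-in for the real logic */
}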

> > > Hannes Frederic Sowa wrote:
> > > > Yes, that does look nice indeed. Accounting for bits instead of bytes
> > > > shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> > > > case you can't satisfy a request with one batched entropy block and have
> > > > to consume randomness from two.
> 
> For example, here's a simple bit-buffer implementation that wraps around
> a get_random_long.  The bitbuffer is of the form "1xxx...x", where the
> x bits are valid, and the position of the msbit indicates how many bits
> are valid.
> 
> extern unsigned long get_random_long();
> static unsigned long bitbuffer = 1;   /* Holds 0..BITS_PER_LONG-1 bits */
> unsigned long get_random_bits(unsigned char bits)
> {
>   /* We handle bits == BITS_PER_LONG, and not bits == 0 */
>   unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
>   unsigned long val;
> 
>   if (bitbuffer > mask) {
>   /* Request can be satisfied out of the bit buffer */
>   val = bitbuffer;
>   bitbuffer >>= bits;
>   } else {
>   /*
>* Not enough bits, but enough room in bitbuffer for the
>* leftovers.  avail < bits, so avail + 64 <= bits + 63.
>*/
>   val = get_random_long();
>   bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
>   | val >> 1 >> (bits-1);
>   }
>   return val & mask;
> }

In general this looks good, but bitbuffer needs to be protected from
concurrent access, thus needing at least some atomic instruction and
disabling of interrupts for the locking if done outside of
get_random_long. Thus I liked your previous approach more where you
simply embed this in the already necessary get_random_long and aliased
get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

> > When we hit the chacha20 without doing a reseed we only mutate the
> > state of chacha, but being an invertible function in its own right, a
> > proposal would be to mix parts of the chacha20 output back into the
> > state, which, as a result, would cause slowdown because we couldn't
> > propagate the complete output of the cipher back to the caller (looking
> > at the function _extract_crng).
> 
> Basically, yes.  Half of the output goes to rekeying itself.

Okay, thanks and understood. :)

> But, I just realized I've been overlooking something glaringly obvious...
> there's no reason you can't compute multiple blocks in advance.

In the current code, at the cost of per-CPU allocations and thus memory.

> The standard assumption in antibacktracking is that you'll *notice* the
> state capture and stop trusting the random numbers afterward; you just
> want the output *before* to be secure.  In other words, cops busting
> down the door can't find the session key used in the message you just sent.

Yep, analogous to forward secrecy; and future secrecy (self-healing) is
provided by catastrophic reseeding from /dev/random.

In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
return block to feed directly back into the chacha state? So we pass
on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
state. This would shorten the maximum window of the anti-backtracking
protection from the current 300s to 14 get_random_int calls.
Not sure if this is worth it.
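
A sketch of what I mean, purely illustrative (chacha_block() is a stand-in
for one ChaCha20 block operation, not the actual extract_crng() code):

#include <stdint.h>
#include <string.h>

extern void chacha_block(uint32_t state[16], uint8_t out[64]);  /* stand-in */

/* Produce 56 bytes for the per-CPU buffer; 8 bytes of the block immediately
 * replace part of the key words of the state, so at most 14 get_random_int()
 * results (14 * 4 = 56 bytes) are handed out before the state has moved on. */
static void refill_pcpu_buf(uint32_t state[16], uint8_t out[56])
{
        uint8_t block[64];

        chacha_block(state, block);
        memcpy(&state[4], block, 8);            /* words 4..5: part of the key area */
        memcpy(out, block + 8, 56);
        memset(block, 0, sizeof(block));        /* don't keep the fed-back bytes around */
}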

> So you can compute and store random numbers ahead of need.
> 
> This can amortize the antibacktracking as much as you'd like.
> 
> For example, suppose we gave each CPU a small pool to minimize locking.
> When one runs out and calls the global refill, it could refill *all*
> of the CPU pools.  (You don't even need locking; there may be a race to
> determine *which* random numbers the reader sees, but they're equally
> good either way.)

Yes, but still disabled interrupts, otherwise the sa

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Hannes Frederic Sowa
On 23.12.2016 17:42, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <dan...@iogearbox.net> 
>> wrote:
>>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>>>
>>>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>>>
>>>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>>>
>>>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> [...]
>>>
>>>>>> The hashing is not proper sha1 either, unfortunately. I think that
>>>>>> is why it will have a custom implementation in iproute2?
>>>>>
>>>>>
>>>>> Still trying to catch up on this admittedly a bit confusing thread. I
>>>>> did run automated tests over a couple of days comparing the data I got
>>>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>>>> cases, varying from min to max possible program sizes. In the process
>>>>> of testing, as you might have seen on netdev, I found a couple of other
>>>>> bugs in bpf code along the way and fixed them up as well. So my question:
>>>>> do you or Andy or anyone participating in claiming this have any
>>>>> concrete data or test cases that suggest something different? If yes,
>>>>> I'm very curious to hear about it and willing to fix it up, of course.
>>>>> When I'm back from pto I'll prep and cook up my test suite to be
>>>>> included into the selftests/bpf/, should have done this initially,
>>>>> sorry about that. I'll also post something to expose the alg, that
>>>>> sounds fine to me.
>>>>
>>>>
>>>> Looking into your code more closely, I noticed that you indeed seem to do the
>>>> finalization of sha-1 by hand by aligning and padding the buffer
>>>> accordingly and also patching in the necessary payload length.
>>>>
>>>> Apologies from my side for claiming that this is not correct sha1
>>>> output; I was only looking at sha_transform and its implementation and
>>>> couldn't see the padding and finalization round with embedding the data
>>>> length in there, and hadn't thought of it being done manually.
>>>>
>>>> Anyway, is it difficult to get the sha finalization into some common
>>>> code library? It is not very bpf specific and crypto code reviewers
>>>> won't find it there at all.
>>>
>>>
>>> Yes, sure, I'll rework it that way (early next year when I'm back if
>>> that's fine with you).
>>
>> Can we make it SHA-256 before 4.10 comes out, though?  This really
>> looks like it will be used in situations where collisions matter and
>> it will be exposed to malicious programs, and SHA-1 should not be used
>> for new designs for this purpose because it simply isn't long enough.
>>
>> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
>> misleading.  That should be u8 or, at the very least, __be32.
>>
>> I realize that there isn't a sha-256 implementation in lib, but would
>> it really be so bad to make the bpf digest only work (for now) when
>> crypto is enabled?  I would *love* to see the crypto core learn how to
>> export simple primitives for direct use without needing the whole
>> crypto core, and this doesn't seem particularly hard to do, but I
>> don't think that's 4.10 material.
> 
> I'm going to try to send out RFC patches for all of this today or
> tomorrow.  It doesn't look bad at all.

Factoring out sha3 to lib/ and using it both standalone and in the crypto
api doesn't seem hard, yep. I also proposed this to Daniel offlist.
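
For reference, the finalization in question is just the standard SHA-1
padding; a common helper could look roughly like this (sketch only,
sha_finalize() is a made-up name, built on the existing sha_transform()
from lib/sha1.c):

#include <linux/types.h>
#include <linux/cryptohash.h>
#include <linux/string.h>
#include <asm/byteorder.h>

/* Finalize a SHA-1 computation: `rest` holds the trailing partial block
 * (rest_len < 64), `total_len` is the total number of message bytes.
 * Standard padding: 0x80, zeros, then the 64-bit big-endian bit count. */
void sha_finalize(u32 digest[SHA_DIGEST_WORDS],
                  const u8 *rest, size_t rest_len, u64 total_len)
{
        u32 ws[SHA_WORKSPACE_WORDS];
        u8 block[SHA_MESSAGE_BYTES] = { 0 };
        __be64 bits = cpu_to_be64(total_len << 3);

        memcpy(block, rest, rest_len);
        block[rest_len] = 0x80;

        if (rest_len > SHA_MESSAGE_BYTES - 9) {
                /* no room left for the length field, emit an extra block */
                sha_transform(digest, (const char *)block, ws);
                memset(block, 0, sizeof(block));
        }
        memcpy(block + SHA_MESSAGE_BYTES - 8, &bits, sizeof(bits));
        sha_transform(digest, (const char *)block, ws);
}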

Bye,
Hannes



Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
> > A lockdep test should still be done. ;)
> 
> Adding might_lock() annotations will improve coverage a lot.

Might be hard to find the correct lock we take later down the code
path, but if that is possible, certainly.

> > Yes, that does look nice indeed. Accounting for bits instead of bytes
> > shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> > case you can't satisfy a request with one batched entropy block and have
> > to consume randomness from two.
> 
> The bit granularity is also for the callers' convenience, so they don't
> have to mask again.  Whether get_random_bits rounds up to byte boundaries
> internally or not is something else.
> 
> When the current batch runs low, I was actually thinking of throwing
> away the remaining bits and computing a new batch of 512.  But it's
> whatever works best at implementation time.
> 
> > > > It could only mix the output back in every two calls, in which case
> > > > you can backtrack up to one call but you need to do 2^128 work to
> > > > backtrack farther.  But yes, this is getting excessively complicated.
> > > No, if you're willing to accept limited backtrack, this is a perfectly
> > > acceptable solution, and not too complicated.  You could do it phase-less
> > > if you like; store the previous output, then after generating the new
> > > one, mix in both.  Then overwrite the previous output.  (But doing two
> > > rounds of a crypto primitive to avoid one conditional jump is stupid,
> > > so forget that.)
> > Can you quickly explain why we lose the backtracking capability?
> 
> Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
> backtracking is to compute output[i], or better yet state[i-1], given
> state[i].
> 
> For example, consider an OFB or CTR mode generator.  The state is a key
> and an IV, and you encrypt the IV with the key to produce output, then
> either replace the IV with the output, or increment it.  Either way,
> since you still have the key, you can invert the transformation and
> recover the previous IV.
> 
> The standard way around this is to use the Davies-Meyer construction:
> 
> IV[i] = IV[i-1] + E(IV[i-1], key)
> 
> This is the standard way to make a non-invertible random function
> out of an invertible random permutation.
> 
> From the sum, there's no easy way to find the ciphertext *or* the
> plaintext that was encrypted.  Assuming the encryption is secure,
> the only way to reverse it is brute force: guess IV[i-1] and run the
> operation forward to see if the resultant IV[i] matches.
> 
> There are a variety of ways to organize this computation, since the
> guess gives you both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
> running E forward, backward, or starting from both ends to see if you
> meet in the middle.
> 
> The way you add the encryption output to the IV is not very important.
> It can be addition, xor, or some more complex invertible transformation.
> In the case of SipHash, the "encryption" output is smaller than the
> input, so we have to get a bit more creative, but it's still basically
> the same thing.
> 
> The problem is that the output which is combined with the IV is too small.
> With only 64 bits, trying all possible values is practical.  (The world's
> Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
> times per second.)
> 
> By basically doing two iterations at once and mixing in 128 bits of
> output, the guessing attack is rendered impractical.  The only downside
> is that you need to remember and store one result between when it's
> computed and last used.  This is part of the state, so an attack can
> find output[i-1], but not anything farther back.
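
A two-line illustration of that feed-forward, with E() standing in for any
keyed permutation (sketch only):

#include <stdint.h>

extern uint64_t E(uint64_t iv, uint64_t key);   /* stand-in keyed permutation */

static uint64_t step(uint64_t iv, uint64_t key)
{
        return iv + E(iv, key);     /* IV[i] = IV[i-1] + E(IV[i-1], key) */
}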

Thanks a lot for the explanation!

> > ChaCha as a block cipher gives a "perfect" permutation from the output
> > of either the CRNG or the CPRNG, which actually itself has backtracking
> > protection.
> 
> I'm not quite understanding.  The /dev/random implementation uses some
> of the ChaCha output as a new ChaCha key (that's another way to mix output
> back into the state) to prevent backtracking.  But this slows it down, and
> again if you want to be efficient, you're generating and storing large batches
> of entropy and storing it in the RNG state.

I was actually referring to the anti-backtrack protection in
/dev/random and also /dev/urandom, from which we reseed every 300
seconds and also whenever our batched entropy runs low with Ted's/Jason's
current patch for get_random_int.

As far as I can understand it, backtracking is not a problem in case 

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Hannes Frederic Sowa
On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
> > > <han...@stressinduktion.org> wrote:
> > > > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > > > > <han...@stressinduktion.org> wrote:
> > > > > > IPv6 you cannot touch anymore. The hashing algorithm is part of 
> > > > > > uAPI.
> > > > > > You don't want to give people new IPv6 addresses with the same 
> > > > > > stable
> > > > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > > > connectivity then and it is extra work?
> > > > > 
> > > > > Ahh, too bad. So it goes.
> > > > 
> > > > If no other users survive we can put it into the ipv6 module.
> > > > 
> > > > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > > > something like sha256 here, which can be easily replicated by user
> > > > > > space tools (minus the problem of patching out references to not
> > > > > > hashable data, which must be zeroed).
> > > > > 
> > > > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > > > as part of a new patchset that can fast track itself to David? And
> > > > > then I can preserve my large series for the next merge window.
> > > > 
> > > > This algorithm should be a non-seeded algorithm, because the hashes
> > > > should be stable and verifiable by user space tooling. Thus this would
> > > > need a hashing algorithm that is hardened against pre-image
> > > > attacks/collision resistance, which siphash is not. I would prefer some
> > > > higher order SHA algorithm for that actually.
> > > > 
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann <dan...@iogearbox.net>
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > >  bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs from being loaded based on the digest;
> > it is just there to uniquely identify loaded programs from user space
> > and match them up with their source.
> > 
> > There have been talks about signing bpf programs; that would probably
> > need another digest algorithm in addition to this one, probably wasting
> > instructions. Going in the direction of PKCS#7 might be the thing to do
> > (which leads to the problem of making PKCS#7 attackable by ordinary
> > unpriv users, hmpf).
> > 
> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.
> > > 
> > > Also:
> > > 
> > > +   result = (__force __be32 *)fp->digest;
> > > +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +   result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fi

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 22:11, George Spelvin wrote:
>> I do tend to like Ted's version in which we use batched
>> get_random_bytes() output.  If it's fast enough, it's simpler and lets
>> us get the full strength of a CSPRNG.
> 
> With the ChaCha20 generator, that's fine, although note that this abandons
> anti-backtracking entirely.
> 
> It also takes locks, something the previous get_random_int code
> path avoided.  Do we need to audit the call sites to ensure that's safe?

We have spin_lock_irq* locks on the way. Of course they can hurt when
contended. The situation should be the same as with get_random_bytes,
callable from every possible situation in the kernel without fear, so
this should also work for get_random_int. A lockdep test should still be
done. ;)

> And there is the issue that the existing callers assume that there's a
> fixed cost per word.  A good half of get_random_long calls are followed by
> "& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
> - 1)" to extract the low 28.  If we have a buffer we're going to have to
> pay to refill, it would be nice to use less than 8 bytes to satisfy those.
> 
> But that can be a followup patch.  I'm thinking
> 
> unsigned long get_random_bits(unsigned bits)
>   E.g. get_random_bits(PAGE_SHIFT),
>get_random_bits(mmap_rnd_bits),
>   u32 imm_rnd = get_random_bits(32)
> 
> unsigned get_random_mod(unsigned modulus)
>   E.g. get_random_mod(hole) & ~(alignment - 1);
>get_random_mod(port_scan_backoff)
>   (Although probably drivers/s390/scsi/zfcp_fc.c should be changed
>   to prandom.)
> 
> with, until the audit is completed:
> #define get_random_int() get_random_bits(32)
> #define get_random_long() get_random_bits(BITS_PER_LONG)

Yes, that does look nice indeed. Accounting for bits instead of bytes
shouldn't be a huge problem either. Maybe it gets a bit more verbose in
case you can't satisfy a request with one batched entropy block and have
to consume randomness from two.

>> It could only mix the output back in every two calls, in which case
>> you can backtrack up to one call but you need to do 2^128 work to
>> backtrack farther.  But yes, this is getting excessively complicated.
> 
> No, if you're willing to accept limited backtrack, this is a perfectly
> acceptable solution, and not too complicated.  You could do it phase-less
> if you like; store the previous output, then after generating the new
> one, mix in both.  Then overwrite the previous output.  (But doing two
> rounds of a crypto primitive to avoid one conditional jump is stupid,
> so forget that.)

Can you quickly explain why we lose the backtracking capability?

ChaCha as a block cipher gives a "perfect" permutation from the output
of either the CRNG or the CPRNG, which actually itself has backtracking
protection.

Thanks for explaining,
Hannes




Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 20:56, Andy Lutomirski wrote:
> It's also not quite clear to me why userspace needs to be able to
> calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
> command that takes a BPF program as input and hashes it would seem to
> serve the same purpose, and that would allow the kernel to key the
> digest and change the algorithm down the road without breaking things.

I think that people expect digests of BPF programs to be stable over
time and reboots.



Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 16:54, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>> <han...@stressinduktion.org> wrote:
>>> following up on what appears to be a random subject: ;)
>>>
>>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>>> in the htree. siphash seems to fit this use case pretty good.
>>
>> I saw this too. I'll try to address it in v8 of this series.
> 
> This is a separate issue, and this series is getting a bit too
> complex.  So I'd suggest pushing this off to a separate change.
> 
> Changing the htree hash algorithm is an on-disk format change, and so
> we couldn't roll it out until e2fsprogs gets updated and rolled out
> pretty broadley.  In fact George sent me patches to add siphash as a
> hash algorithm for htree a while back (for both the kernel and
> e2fsprogs), but I never got around to testing and applying them,
> mainly because while it's technically faster, I had other higher
> priority issues to work on --- and see previous comments regarding
> pixel peeping.  Improving the hash algorithm by tens or even hundreds
> of nanoseconds isn't really going to matter since we only do a htree
> lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
> times will dominate.  And from the power perspective, saving
> microwatts of CPU power isn't going to matter if you're going to be
> spinning up the storage device

I wasn't concerned about performance but more about DoS resilience. I
wonder how safe half md4 actually is with respect to allowing users to
generate long hash chains in the filesystem (e.g. via length extension
attacks against half_md4).

In ext4, is it actually possible that a "disrupter" learns about the
hashing secret from the way the inodes are returned during getdents?

Thanks,
Hannes



Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann <dan...@iogearbox.net>
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > > bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs from being loaded based on the digest;
> > it is just there to uniquely identify loaded programs from user space
> > and match them up with their source.
> 
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Yeah, it still shouldn't crash the kernel, but it could cause
malfunctions because assumptions made by user space are not met, and
thus things could act in strange ways:

My personal biggest concern is that users of this API will at some
point in time assume this digest is unique (e.g. as a key for hash
tables), while it is actually not (and the kernel does not enforce
that). If you can get an unprivileged ebpf program with the same weak
hash inserted into the kernel, a controller daemon could pick it up and
bind it to another ebpf hook, probably outside of the unprivileged realm
the user was in before. Only the ordering matters, which might be
unstable and is not guaranteed by anything in most hash table based
data structures.

The API seems flawed to me.

> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.

To add to this, I am very much in favor of that. Right now it doesn't
have a name because it is a custom algorithm. ;)

> > > 
> > > Also:
> > > 
> > > +   result = (__force __be32 *)fp->digest;
> > > +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +   result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not proper sha1 either, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Putting on crypto hat:
> 
> NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
> 2016 when people know better and is going to handle it by porting that
> new primitive to userspace" is not a particularly good argument.
> 
> Okay, crypto hack back off.
> 
> > 
> > I wondered if bpf program loading should have used the module loading
> > infrastructure from the beginning...
> 
> That would be way too complicated and would be nasty for the unprivileged 
> cases.

I was more or less just thinking about using the syscalls and user
space representation, not the generic infrastructure, as that is anyway
too much concerned with real kernel modules (it would probably also
solve your cgroup-ebpf thread, as a program could be passed by unique
name, or by name and hash ;) ). Anyway...

> > > At the very least, there should be a separate function that calculates
> > > the hash of a buffer and that function should explicitly run itself
> > > against test vectors of various lengths to make sure that it's
> > > calculating what it claims to be calculating.  And it doesn't look
> > > like the userspace code has landed, so, if this thing isn't
> > > calculating SHA1 correctly, it's plausible that no one has noticed.
> > 
> > I hope this was known from the beginning, this is not sha1 unfortunately.
> > 
> > But ebpf elf programs also need preprocessing to get rid of some
> > embedded load-dependent data, so maybe it was considered to be just
> > enough?
> 
> I suspect it was actually an accident.

Maybe, I don't know.

Bye,
Hannes



Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > Hi Hannes,
> > > 
> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > > <han...@stressinduktion.org> wrote:
> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > > > You don't want to give people new IPv6 addresses with the same stable
> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > connectivity then and it is extra work?
> > > 
> > > Ahh, too bad. So it goes.
> > 
> > If no other users survive we can put it into the ipv6 module.
> > 
> > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > something like sha256 here, which can be easily replicated by user
> > > > space tools (minus the problem of patching out references to not
> > > > hashable data, which must be zeroed).
> > > 
> > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > as part of a new patchset that can fast track itself to David? And
> > > then I can preserve my large series for the next merge window.
> > 
> > This algorithm should be a non-seeded algorithm, because the hashes
> > should be stable and verifiable by user space tooling. Thus this would
> > need a hashing algorithm that is hardened against pre-image
> > attacks/collision resistance, which siphash is not. I would prefer some
> > higher order SHA algorithm for that actually.
> > 
> 
> You mean:
> 
> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> Author: Daniel Borkmann <dan...@iogearbox.net>
> Date:   Sun Dec 4 23:19:41 2016 +0100
> 
> bpf: add prog_digest and expose it via fdinfo/netlink
> 
> 
> Yes, please!  This actually matters for security -- imagine a
> malicious program brute-forcing a collision so that it gets loaded
> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> (preferably the latter).  Speed basically doesn't matter here and
> Blake2 is both less stable (didn't they slightly change it recently?)
> and much less well studied.

We don't prevent ebpf programs from being loaded based on the digest;
it is just there to uniquely identify loaded programs from user space
and match them up with their source.

There have been talks about signing bpf programs; that would probably
need another digest algorithm in addition to this one, probably wasting
instructions. Going in the direction of PKCS#7 might be the thing to do
(which leads to the problem of making PKCS#7 attackable by ordinary
unpriv users, hmpf).

> My inclination would have been to seed them with something that isn't
> exposed to userspace for the precise reason that it would prevent user
> code from making assumptions about what's in the hash.  But if there's
> a use case for why user code needs to be able to calculate the hash on
> its own, then that's fine.  But perhaps the actual fdinfo string
> should be "sha256:abcd1234..." to give some flexibility down the road.
> 
> Also:
> 
> +   result = (__force __be32 *)fp->digest;
> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> +   result[i] = cpu_to_be32(fp->digest[i]);
> 
> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very
> brief glance it looks wrong to me.  If you're doing this to avoid
> depending on crypto, then fix crypto so you can pull in the algorithm
> without pulling in the whole crypto core.

The hashing is not proper sha1 either, unfortunately. I think that
is why it will have a custom implementation in iproute2?

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

I hope this was known from the beginning, this is not sha1 unfortunately.

But ebpf elf programs also need preprocessing to get rid of some
embedded load-dependent data, so maybe it was considered to be just
enough?

Bye,
Hannes




Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > You don't want to give people new IPv6 addresses with the same stable
> > secret (across reboots) after a kernel upgrade. Maybe they lose
> > connectivity then and it is extra work?
> 
> Ahh, too bad. So it goes.

If no other users survive we can put it into the ipv6 module.

> > The bpf hash stuff can be changed during this merge window, as it is
> > not yet in a released kernel. Albeit I would probably have preferred
> > something like sha256 here, which can be easily replicated by user
> > space tools (minus the problem of patching out references to not
> > hashable data, which must be zeroed).
> 
> Oh, interesting, so time is of the essence then. Do you want to handle
> changing the new eBPF code to something not-SHA1 before it's too late,
> as part of a new patchset that can fast track itself to David? And
> then I can preserve my large series for the next merge window.

This algorithm should be a non-seeded algorithm, because the hashes
should be stable and verifiable by user space tooling. Thus this would
need a hashing algorithm that is hardened against pre-image
attacks/collision resistance, which siphash is not. I would prefer some
higher order SHA algorithm for that actually.

Bye,
Hannes
 


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:29 +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld  wrote:
> > As a first step, I'm considering adding a patch to move halfmd4.c
> > inside the ext4 domain, or at the very least, simply remove it from
> > linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> > usages to consider.
> 
> Specifically something like this:
> 
> https://git.zx2c4.com/linux-dev/commit/?h=siphash=978213351f9633bd1e3d1fdc3f19d28e36eeac90
> 
> That only leaves two more uses of "cryptohash" to consider, but they
> require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
> That might be a straight-forward conversion to SipHash, but perhaps
> not; I need to look closely and think about it. The next is
> sha_transform in kernel/bpf/core.c. I really have no idea what's going
> on with the eBPF stuff, so that will take a bit longer to study. Maybe
> sha1 is fine in the end there? I'm not sure yet.

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
You don't want to give people new IPv6 addresses with the same stable
secret (across reboots) after a kernel upgrade. Maybe they lose
connectivity then and it is extra work?

The bpf hash stuff can be changed during this merge window, as it is
not yet in a released kernel. Albeit I would probably have preferred
something like sha256 here, which can be easily replicated by user
space tools (minus the problem of patching out references to not
hashable data, which must be zeroed).
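
To make the "references to not hashable data" point concrete, a user-space
sketch of the normalization I have in mind (my assumption being that only
the map fd patched into the imm fields of a BPF_LD_IMM64 pseudo load needs
to be cleared; the actual hash call is left to whatever library is used):

#include <linux/bpf.h>
#include <stdbool.h>
#include <stddef.h>

/* Zero out the load-dependent data (map fds) before hashing the insns. */
static void normalize_insns(struct bpf_insn *insn, size_t cnt)
{
        bool was_ld_map = false;
        size_t i;

        for (i = 0; i < cnt; i++) {
                if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
                    insn[i].src_reg == BPF_PSEUDO_MAP_FD) {
                        insn[i].imm = 0;        /* fd is not stable across loads */
                        was_ld_map = true;
                } else if (was_ld_map && insn[i].code == 0) {
                        insn[i].imm = 0;        /* second half of the 16-byte insn */
                        was_ld_map = false;
                } else {
                        was_ld_map = false;
                }
        }
        /* afterwards: hash insn[0..cnt) with the chosen algorithm and compare
         * with whatever the kernel exposes */
}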

Bye,
Hannes



Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 14:10, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> following up on what appears to be a random subject: ;)
>>
>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>> in the htree. siphash seems to fit this use case pretty good.
> 
> I saw this too. I'll try to address it in v8 of this series.

This change would need a new version of the ext4 super block, because
you should not change existing file systems.




Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
Hi Ted,

On Thu, 2016-12-22 at 00:41 -0500, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> > 
> > Funny -- while you guys were sending this back & forth, I was writing
> > my reply to Andy which essentially arrives at the same conclusion.
> > Given that we're all arriving to the same thing, and that Ted shot in
> > this direction long before we all did, I'm leaning toward abandoning
> > SipHash for the de-MD5-ification of get_random_int/long, and working
> > on polishing Ted's idea into something shiny for this patchset.
> 
> here are my numbers comparing siphash (using the first three patches
> of the v7 siphash patches) with my batched chacha20 implementation.
> The results are taken by running get_random_* 1 times, and then
> dividing the numbers by 1 to get the average number of cycles for
> the call.  I compiled 32-bit and 64-bit kernels, and ran the results
> using kvm:
> 
>                  siphash                          batched chacha20
>          get_random_int  get_random_long   get_random_int  get_random_long
> 
> 32-bit        270             278                114            146
> 64-bit         75              75                106            186
> 
> > I did have two objections to this. The first was that my SipHash
> > construction is faster.
> 
> Well, it's faster on everything except 32-bit x86.  :-P
> 
> > The second, and the more
> > important one, was that batching entropy up like this means that 32
> > calls will be really fast, and then the 33rd will be slow, since it
> > has to do a whole ChaCha round, because get_random_bytes must be
> > called to refill the batch.
> 
> ... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
> 32-bit x86.  Which on a 2.3 GHz processor, is just under a
> microsecond.  As far as being inconsistent on process startup, I very
> much doubt a microsecond is really going to be visible to the user.  :-)
> 
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogany vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference
> 
> I think the main argument for using the batched getrandom approach is
> that it is, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus complexity of having to adding siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

following up on what appears to be a random subject: ;)

IIRC, ext4 code by default still uses half_md4 for hashing of filenames
in the htree. siphash seems to fit this use case pretty good.

xfs could also need an update, as they don't seed the directory hash
tables at all (but not sure if they are vulnerable). I should improve
[1] a bit.

[1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blob;f=src/dirhash_collide.c;h=55cec872d5061ac2ca0f56d1f11e6bf349d5bb97;hb=HEAD

Bye,
Hannes



Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Hannes Frederic Sowa
On 22.12.2016 00:42, Andy Lutomirski wrote:
> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>>  unsigned int get_random_int(void)
>>  {
>> -   __u32 *hash;
>> -   unsigned int ret;
>> -
>> -   if (arch_get_random_int(&ret))
>> -   return ret;
>> -
>> -   hash = get_cpu_var(get_random_int_hash);
>> -
>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>> -   md5_transform(hash, random_int_secret);
>> -   ret = hash[0];
>> -   put_cpu_var(get_random_int_hash);
>> -
>> -   return ret;
>> +   unsigned int arch_result;
>> +   u64 result;
>> +   struct random_int_secret *secret;
>> +
>> +   if (arch_get_random_int(&arch_result))
>> +   return arch_result;
>> +
>> +   secret = get_random_int_secret();
>> +   result = siphash_3u64(secret->chaining, jiffies,
>> + (u64)random_get_entropy() + current->pid,
>> + secret->secret);
>> +   secret->chaining += result;
>> +   put_cpu_var(secret);
>> +   return result;
>>  }
>>  EXPORT_SYMBOL(get_random_int);
> 
> Hmm.  I haven't tried to prove anything for real.  But here goes (in
> the random oracle model):
> 
> Suppose I'm an attacker and I don't know the secret or the chaining
> value.  Then, regardless of what the entropy is, I can't predict the
> numbers.
> 
> Now suppose I do know the secret and the chaining value due to some
> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
> to find a value "result" such that prev_chaining + result = chaining
> and result = H(prev_chaining, ..., secret);.  I don't think this can
> be done efficiently in the random oracle model regardless of what the
> "..." is.
> 
> But, if I know the secret and chaining value, I can predict the next
> output assuming I can guess the entropy.  What's worse is that, even
> if I can't guess the entropy, if I *observe* the next output then I
> can calculate the next chaining value.
> 
> So this is probably good enough, and making it better is hard.  Changing it 
> to:
> 
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
> 
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.
> 
> To fully fix it, something like "catastrophic reseeding" would be
> needed, but that's hard to get right.

I wonder whether Ted's proposal was analyzed further in terms of
performance, in case get_random_int should provide CPRNG-like properties?

For reference: https://lkml.org/lkml/2016/12/14/351

The proposal made sense to me and would completely solve the above
mentioned problem, at the cost of repeatedly reseeding from the crng.
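
For illustration, a minimal sketch of how such a batched approach could
look (the names, the batch size and the extract_crng() helper that
refills the buffer are assumptions on my side, not the actual proposal):

struct batched_entropy {
        u64 entropy[16];
        unsigned int position;
};
static DEFINE_PER_CPU(struct batched_entropy, batched_u64);

u64 get_random_u64_batched(void)
{
        u64 ret;
        struct batched_entropy *batch;

        batch = &get_cpu_var(batched_u64);
        if (batch->position == 0 ||
            batch->position >= ARRAY_SIZE(batch->entropy)) {
                /* assumed helper that refills the whole per-cpu buffer
                 * from the crng */
                extract_crng((u8 *)batch->entropy);
                batch->position = 0;
        }
        ret = batch->entropy[batch->position++];
        put_cpu_var(batched_u64);
        return ret;
}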

Bye,
Hannes




Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Hannes Frederic Sowa
On Fri, Dec 16, 2016, at 22:01, Jason A. Donenfeld wrote:
> Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
> slowdown as siphash, so I expect the comparison to be more or less
> equal. There's another thing I really didn't like about your chacha20
> approach which is that it uses the /dev/urandom pool, which means
> various things need to kick in in the background to refill this.
> Additionally, having to refill the buffered chacha output every 32 or
> so longs isn't nice. These things together make for inconsistent and
> hard to understand general operating system performance, because
> get_random_long is called at every process startup for ASLR. So, in
> the end, I believe there's another reason for going with the siphash
> approach: deterministic performance.

*Cough*, so where do you generate your key for siphash from, if it is
called that early for ASLR?

Bye,
Hannes


Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 16.12.2016 00:43, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> Good news.
> 
> On Thu, Dec 15, 2016 at 10:45 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>>> How's that sound?
>>
>> I am still very much concerned about the API.
> 
> Thanks for pushing me and putting up with my daftness... the constant
> folding works absolutely perfectly. I've run several tests. When gcc
> knows that a struct is aligned (say, via __aligned(8)), then it erases
> the branch and makes a direct jump to the aligned code. When it's
> uncertain, it evaluates at runtime. So, now there is a single
> siphash() function that chooses the best one automatically. Behind the
> scene there's siphash_aligned and siphash_unaligned, but nobody needs
> to call these directly. (Should I rename these to have a double
> underscore prefix?) On platforms that have
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, of course all of this
> disappears and everything goes directly to the aligned version.
> 
> So, I think this assuages your concerns entirely. A single API entry
> point that does the right thing.
> 
> Whew! Good thinking, and thanks again for the suggestion.

Awesome, thanks for trying this out. This basically resolves my concern
API-wise so far.

Hannes out. ;)



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 22:25, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 10:17 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> And I was exactly questioning this.
>>
>> static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
>> const struct in6_addr *daddr)
>> {
>> net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
>> return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
>> (__force u32)id, ip6_frags.rnd);
>> }
> 
> For this example, the replacement is the function entitled siphash_4u32:
> 
>  static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
>  const struct in6_addr *daddr)
>  {
>  net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
>  return siphash_4u32(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
>  (__force u32)id, 0, ip6_frags.rnd);
>  }
> 
> And then you make ip6_frags.rnd be of type siphash_key_t. Then
> everything is taken care of and works beautifully. Please see v5 of
> this patchset.

Sorry to not be specific enough, the Hash-DoS is in ipv6_addr_hash.
Maybe it was a silly example to start with, sorry. But anyway, your
proposal wouldn't have prevented the hash DoS. I wanted to show how it
can be difficult to make sure that all pointers come from an appropriately
aligned memory region.

The idea would be to actually factor out the key into the data structure
and align it with __aligned(SIPHASH_ALIGNMENT), and to make sure the
padding bits are all zero so they don't cause bugs and irregularities
with the corresponding equality function. This might need some serious
review when switching to siphash to actually make use of it and prevent
HashDoS. Or simply always use the unaligned version...
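
As a sketch of what I mean (pure illustration: siphash(), siphash_key_t
and SIPHASH_ALIGNMENT are taken from the proposed patchset, the rest is
made up):

struct frag_hash_in {
        struct in6_addr saddr;
        struct in6_addr daddr;
        __be32 id;
        __be32 pad;             /* explicit, zero-initialized padding */
} __aligned(SIPHASH_ALIGNMENT);

static siphash_key_t frag_secret __aligned(SIPHASH_ALIGNMENT);

static u32 inet6_hash_frag_sketch(__be32 id, const struct in6_addr *saddr,
                                  const struct in6_addr *daddr)
{
        struct frag_hash_in in = {
                .saddr = *saddr,
                .daddr = *daddr,
                .id    = id,
        };

        net_get_random_once(&frag_secret, sizeof(frag_secret));
        return (u32)siphash(&in, sizeof(in), &frag_secret);
}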

>> I would be interested if the compiler can actually constant-fold the
>> address of the stack allocation with an simple if () or some
>> __builtin_constant_p fiddeling, so we don't have this constant review
>> overhead to which function we pass which data. This would also make
>> this whole discussion moot.
> 
> I'll play with it to see if the compiler is capable of doing that.
> Does anybody know off hand if it is or if there are other examples of
> the compiler doing that?

Not off the top of my head, but it should be easy to test.
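
Something along these lines is what I have in mind (a sketch only;
__siphash_aligned/__siphash_unaligned are assumed names for the two
implementations):

static inline u64 siphash(const void *data, size_t len,
                          const siphash_key_t *key)
{
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        /* gcc folds this test away when it can prove the alignment,
         * e.g. for an on-stack struct declared __aligned(8) */
        if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
                return __siphash_unaligned(data, len, key);
#endif
        return __siphash_aligned(data, len, key);
}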

> In any case, for all current replacement of jhash_1word, jhash_2words,
> jhash_3words, there's the siphash_2u32 or siphash_4u32 functions. This
> covers the majority of cases.

Agreed and this is also totally fine by me.

> For replacements of md5_transform, either the data is small and can
> fit in siphash_Nu{32,64}, or it can be put into a struct explicitly
> aligned on the stack.

> For the remaining use of jhash_nwords, either siphash() can be used or
> siphash_unaligned() can be used if the source is of unknown alignment.
> Both functions have their alignment requirements (or lack thereof)
> documented in a docbook comment.

I think the warning needs to be bigger, seriously. Most people develop
on a 64 bit arch, where it will just work during testing, and it breaks
later on 32 bit. ;)

> I'll look into the constant folding to see if it actually works. If it
> does, I'll use it. If not, I believe the current solution works.
> 
> How's that sound?

I am still very much concerned about the API.

By the way, if you target net-next, it is currently closed. So no need
to hurry.

Bye,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 21:43, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> ARM64 and x86-64 have memory operations that are not vector operations
>> that operate on 128 bit memory.
> 
> Fair enough. imull I guess.
> 
>> How do you know that the compiler for some architecture will not choose a
>> more optimized instruction to load a 64 bit memory value into two 32 bit
>> registers if you tell the compiler it is 8 byte aligned but it actually
>> isn't? I don't know the answer, but telling the compiler some data is 8
>> byte aligned while it really isn't pretty much seems like a call for
>> trouble.
> 
> If a compiler is in the business of using special 64-bit instructions
> on 64-bit aligned data, then it is also the job of the compiler to
> align structs to 64-bits when passed __aligned(8), which is what we've
> done in this code. If the compiler were to hypothetically choose to
> ignore that and internally convert it to a __aligned(4), then it would
> only be able to do so with the knowledge that it will never use 64-bit
> aligned data instructions. But so far as I can tell, gcc always
> respects __aligned(8), which is why I use it in this patchset.
> 
> I think there might have been confusion here, because perhaps someone
> was hoping that since in6_addr is 128-bits, that the __aligned
> attribute would not be required and that the struct would just
> automatically be aligned to at least 8 bytes. But in fact, as I
> mentioned, in6_addr is actually composed of u32[4] and not u64[2], so
> it will only be aligned to 4 bytes, making the __aligned(8) necessary.
> 
> I think for the purposes of this patchset, this is a solved problem.
> There's the unaligned version of the function if you don't know about
> the data, and there's the aligned version if you're using
> __aligned(SIPHASH_ALIGNMENT) on your data. Plain and simple.

And I was exactly questioning this.

static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
const struct in6_addr *daddr)
{
net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
(__force u32)id, ip6_frags.rnd);
}

This function had a hash DoS (and kind of still has), but it has been
mitigated by explicit checks, I hope.

So you start looking for all the pointers where ipv6 addresses could
come from and find some globally defined struct where I would need to
put __aligned(SIPHASH_ALIGNMENT) to make this work in 32 bit code?
Otherwise only the unaligned version is safe in 32 bit code.

Who knows this? It isn't even obvious by looking at the header!

I would be interested if the compiler can actually constant-fold the
address of the stack allocation with a simple if () or some
__builtin_constant_p fiddling, so we don't have this constant review
overhead of which function we pass which data to. This would also make
this whole discussion moot.

Bye,
Hannes


Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 22:04, Peter Zijlstra wrote:
> On Thu, Dec 15, 2016 at 09:43:04PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
>> <han...@stressinduktion.org> wrote:
>>> ARM64 and x86-64 have memory operations that are not vector operations
>>> that operate on 128 bit memory.
>>
>> Fair enough. imull I guess.
> 
> imull is into rdx:rax, not memory. I suspect he's talking about
> cmpxchg16b.

Exactly, and I think I saw a 128 bit ll/sc on armv8 to atomically
manipulate linked lists.

Bye,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
Hello,

On 15.12.2016 19:50, Jason A. Donenfeld wrote:
> Hi David & Hannes,
> 
> This conversation is veering off course.

Why?

> I think this doesn't really
> matter at all. Gcc converts u64 into essentially a pair of u32 on
> 32-bit platforms, so the alignment requirements for 32-bit is at a
> maximum 32 bits. On 64-bit platforms the alignment requirements are
> related at a maximum to the biggest register size, so 64-bit
> alignment. For this reason, no matter the behavior of __aligned(8),
> we're okay. Likewise, even without __aligned(8), if gcc aligns structs
> by their biggest member, then we get 4 byte alignment on 32-bit and 8
> byte alignment on 64-bit, which is fine. There's no 32-bit platform
> that will trap on a 64-bit unaligned access because there's no such
> thing as a 64-bit access there. In short, we're fine.

ARM64 and x86-64 have memory operations that are not vector operations
that operate on 128 bit memory.

How do you know that the compiler for some architecture will not choose a
more optimized instruction to load a 64 bit memory value into two 32 bit
registers if you tell the compiler it is 8 byte aligned but it actually
isn't? I don't know the answer, but telling the compiler some data is 8
byte aligned while it really isn't pretty much seems like a call for
trouble.

Why couldn't a compiler vectorize this code if it can prove that it
doesn't conflict with other register users?

Bye,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 16:41, David Laight wrote:
> Try (retyped):
> 
> echo 'struct { long a; long long b; } s; int bar(void) { return sizeof s; }' >foo.c
> gcc [-m32] -O2 -S foo.c; cat foo.s
> 
> And look at what is generated.

I used __alignof__(unsigned long long) with -m32.
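
Roughly like this (user-space snippet, compile once normally and once
with -m32 and compare the output):

#include <stdio.h>

struct s { long a; long long b; };

int main(void)
{
        printf("alignof(unsigned long long) = %zu\n",
               __alignof__(unsigned long long));
        printf("sizeof(struct s) = %zu, alignof(struct s) = %zu\n",
               sizeof(struct s), __alignof__(struct s));
        return 0;
}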

>> Right now ipv6 addresses have an alignment of 4. So we couldn't even
>> naturally pass them to siphash but would need to copy them around, which
>> I feel like a source of bugs.
> 
> That is more of a problem on systems that don't support misaligned accesses.
> Reading the 64bit values with two explicit 32bit reads would work.
> I think you can get gcc to do that by adding an aligned(4) attribute to the
> structure member.

Yes, and that is actually my fear, because we support those
architectures. I can't comment on that as I don't understand enough of this.

If someone finds a way to cause misaligned reads on a small box, this
seems (depending on sysctls, they either get fixed up or cause a panic)
to be a much bigger issue than a hash DoS.

Thanks,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 14:56, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 15 December 2016 12:50
>> On 15.12.2016 13:28, David Laight wrote:
>>> From: Hannes Frederic Sowa
>>>> Sent: 15 December 2016 12:23
>>> ...
>>>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>>>> bytes on 32 bit. Do you question that?
>>>
>>> Yes.
>>>
>>> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).
>>
>> Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
>> am actually not sure if the ABI would say anything about that (sorry
>> also for my wrong statement above).
>>
>> Alignment requirement of unsigned long long on gcc with -m32 actually
>> seems to be 8.
> 
> It depends on the architecture.
> For x86 it is definitely 4.

May I ask for a reference? I couldn't see unsigned long long being
mentioned in the ia32 abi spec that I found. I agree that those accesses
might be synthetically assembled by gcc and for me the alignment of 4
would have seemed natural. But my gcc at least in 32 bit mode disagrees
with that.

> It might be 8 for sparc, ppc and/or alpha.

This is something to find out...

Right now ipv6 addresses have an alignment of 4. So we couldn't even
naturally pass them to siphash but would need to copy them around, which
feels to me like a source of bugs.

Bye,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 13:28, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 15 December 2016 12:23
> ...
>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>> bytes on 32 bit. Do you question that?
> 
> Yes.
> 
> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).

Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
am actually not sure if the ABI would say anything about that (sorry
also for my wrong statement above).

Alignment requirement of unsigned long long on gcc with -m32 actually
seems to be 8.




Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 12:04, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 14 December 2016 22:03
>> On 14.12.2016 13:46, Jason A. Donenfeld wrote:
>>> Hi David,
>>>
>>> On Wed, Dec 14, 2016 at 10:56 AM, David Laight <david.lai...@aculab.com> 
>>> wrote:
>>>> ...
>>>>> +u64 siphash24(const u8 *data, size_t len, const u8 
>>>>> key[SIPHASH24_KEY_LEN])
>>>> ...
>>>>> + u64 k0 = get_unaligned_le64(key);
>>>>> + u64 k1 = get_unaligned_le64(key + sizeof(u64));
>>>> ...
>>>>> + m = get_unaligned_le64(data);
>>>>
>>>> All these unaligned accesses are going to get expensive on architectures
>>>> like sparc64.
>>>
>>> Yes, the unaligned accesses aren't pretty. Since in pretty much all
>>> use cases thus far, the data can easily be made aligned, perhaps it
>>> makes sense to create siphash24() and siphash24_unaligned(). Any
>>> thoughts on doing something like that?
>>
>> I fear that the alignment requirement will be a source of bugs on 32 bit
>> machines, where you cannot even simply take a well aligned struct on a
>> stack and put it into the normal siphash(aligned) function without
>> adding alignment annotations everywhere. Even blocks returned from
>> kmalloc on 32 bit are not aligned to 64 bit.
> 
> Are you doing anything that will require 64bit alignment on 32bit systems?
> It is unlikely that the kernel can use any simd registers that have wider
> alignment requirements.
> 
> You also really don't want to request on-stack items have large alignments.
> While gcc can generate code to do it, it isn't pretty.

Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
bytes on 32 bit. Do you question that?





Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-15 Thread Hannes Frederic Sowa
On 15.12.2016 00:29, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> On Wed, Dec 14, 2016 at 11:03 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> I fear that the alignment requirement will be a source of bugs on 32 bit
>> machines, where you cannot even simply take a well aligned struct on a
>> stack and put it into the normal siphash(aligned) function without
>> adding alignment annotations everywhere. Even blocks returned from
>> kmalloc on 32 bit are not aligned to 64 bit.
> 
> That's what the "__aligned(SIPHASH24_ALIGNMENT)" attribute is for. The
> aligned siphash function will be for structs explicitly made for
> siphash consumption. For everything else there's siphash_unaligned.

So in case you have a pointer from somewhere on 32 bit, you can
essentially only guarantee it has natural alignment or at most the
native alignment (based on the arch). gcc only fulfills your request for
alignment when you allocate on the stack (minus gcc bugs).

Let's say you get a pointer from somewhere, maybe embedded in a struct,
which came from kmalloc. kmalloc doesn't care about the aligned
attribute; it will align according to the architecture description. That
said, if you want to hash that, you would need to manually align the
memory returned from kmalloc or make sure the data is more than
naturally aligned on that architecture.

>> Can we do this a runtime check and just have one function (siphash)
>> dealing with that?
> 
> Seems like the runtime branching on the aligned function would be bad
> for performance, when we likely know at compile time if it's going to
> be aligned or not. I suppose we could add that check just to the
> unaligned version, and rename it to "maybe_unaligned"? Is this what
> you have in mind?

I argue that you mostly don't know at compile time if it is correctly
aligned if the alignment requirements are larger than the natural ones.

Also, we don't even have that for memcpy, even though we use it probably
much more than hashing, so I think this is overkill.

Bye,
Hannes



Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-14 Thread Hannes Frederic Sowa
On 14.12.2016 13:46, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 10:56 AM, David Laight  
> wrote:
>> ...
>>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
>> ...
>>> + u64 k0 = get_unaligned_le64(key);
>>> + u64 k1 = get_unaligned_le64(key + sizeof(u64));
>> ...
>>> + m = get_unaligned_le64(data);
>>
>> All these unaligned accesses are going to get expensive on architectures
>> like sparc64.
> 
> Yes, the unaligned accesses aren't pretty. Since in pretty much all
> use cases thus far, the data can easily be made aligned, perhaps it
> makes sense to create siphash24() and siphash24_unaligned(). Any
> thoughts on doing something like that?

I fear that the alignment requirement will be a source of bugs on 32 bit
machines, where you cannot even simply take a well aligned struct on a
stack and put it into the normal siphash(aligned) function without
adding alignment annotations everywhere. Even blocks returned from
kmalloc on 32 bit are not aligned to 64 bit.

Can we do this a runtime check and just have one function (siphash)
dealing with that?

Bye,
Hannes



Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
Hey Jason,

On 14.12.2016 20:38, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> I don't think this helps. Did you test it? I don't see reason why
>> padding could be left out between `d' and `end' because of the flexible
>> array member?
> 
> Because the type u8 doesn't require any alignment requirements, it can
> nestle right up there cozy with the u16:
> 
> zx2c4@thinkpad ~ $ cat a.c
> #include <stdio.h>
> #include <stdint.h>
> #include <stddef.h>
> int main()
> {
>struct {
>uint64_t a;
>uint32_t b;
>uint32_t c;
>uint16_t d;
>char x[];
>} a;
>printf("%zu\n", sizeof(a));
>printf("%zu\n", offsetof(typeof(a), x));
>return 0;
> }
> zx2c4@thinkpad ~ $ gcc a.c
> zx2c4@thinkpad ~ $ ./a.out
> 24
> 18

Sorry, I misread the patch. You are using offsetof. In this case remove
the char x[] and just use offsetofend because it is misleading
otherwise. Should work like that though.
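
I.e. something like this (sketch only, field layout taken from your
example above; siphash()/siphash_key_t are from the patchset, the rest
is illustrative):

struct hash_in {
        u64 a;
        u32 b;
        u32 c;
        u16 d;
} __aligned(SIPHASH_ALIGNMENT);

/* hash everything up to and including 'd' (18 bytes), without the
 * trailing struct padding and without a flexible array member */
u64 hash_it(const struct hash_in *in, const siphash_key_t *key)
{
        return siphash(in, offsetofend(struct hash_in, d), key);
}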

What I don't really understand is that the addition of this complexity
actually reduces the performance, as you have to take the "if (left)"
branch during hashing and it causes you to do a load_unaligned_zeropad.

Bye,
Hannes



Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
On 14.12.2016 19:06, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 6:56 PM, David Miller  wrote:
>> Just marking the structure __packed, whether necessary or not, makes
>> the compiler assume that the members are not aligned and causes
>> byte-by-byte accesses to be performed for words.
>> Never, _ever_, use __packed unless absolutely necessary, it pessimizes
>> the code on cpus that require proper alignment of types.
> 
> Oh, jimminy cricket, I did not realize that it made assignments
> byte-by-byte *always*. So what options am I left with? What
> immediately comes to mind are:
> 
> 1)
> 
> struct {
> u64 a;
> u32 b;
> u32 c;
> u16 d;
> u8 end[];

I don't think this helps. Did you test it? I don't see a reason why
padding could be left out between `d' and `end' just because of the
flexible array member.

Bye,
Hannes




Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function

2016-12-14 Thread Hannes Frederic Sowa
Hello,

On 14.12.2016 14:10, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 12:21 PM, Hannes Frederic Sowa
> <han...@stressinduktion.org> wrote:
>> Can you show or cite benchmarks in comparison with jhash? Last time I
>> looked, especially for short inputs, siphash didn't beat jhash (also on
>> all the 32 bit devices etc.).
> 
> I assume that jhash is likely faster than siphash, but I wouldn't be
> surprised if with optimization we can make siphash at least pretty
> close on 64-bit platforms. (I'll do some tests though; maybe I'm wrong
> and jhash is already slower.)

Yes, numbers would be very useful here. I am mostly concerned about the
small plastic router cases. E.g. assume you double packet processing
time by changing the hashing function: at what point is the actual
packet processing more of an attack vector than the hashtable?

> With that said, siphash is here to replace uses of jhash where
> hashtable poisoning vulnerabilities make it necessary. Where there's
> no significant security improvement, if there's no speed improvement
> either, then of course nothing's going to change.

It still changes currently well working source. ;-)

> I should have mentioned md5_transform in this first message too, as
> two other patches in this series actually replace md5_transform usage
> with siphash. I think in this case, siphash is a clear performance
> winner (and security winner) over md5_transform. So if the push back
> against replacing jhash usages is just too high, at the very least it
> remains useful already for the md5_transform usage.

MD5 is considered broken because its collision resistance is broken?
SipHash doesn't even claim to have collision resistance (which we don't
need here)?

But I agree, certainly it could be a nice speed-up!

>> This pretty much depends on the linearity of the hash function? I don't
>> think a crypto secure hash function is needed for a hash table. Albeit I
>> agree that siphash certainly looks good to be used here.
> 
> In order to prevent the aforementioned poisoning attacks, a PRF with
> perfect linearity is required, which is what's achieved when it's a
> cryptographically secure one. Check out section 7 of
> https://131002.net/siphash/siphash.pdf .

I think you mean non-linearity. Otherwise I agree that siphash is
certainly a better suited hashing algorithm as far as I know. But it
would be really interesting to compare some performance numbers. Hard to
say anything without them.

>> I am pretty sure that SipHash still needs a random key per hash table
>> also. So far it was only the choice of hash function you are questioning.
> 
> Siphash needs a random secret key, yes. The point is that the hash
> function remains secure so long as the secret key is kept secret.
> Other functions can't make the same guarantee, and so nervous periodic
> key rotation is necessary, but in most cases nothing is done, and so
> things just leak over time.
> 
> 
>> Hmm, I tried to follow up with all the HashDoS work and so far didn't
>> see any HashDoS attacks against the Jenkins/SpookyHash family.
>>
>> If this is an issue we might need to also put those changes into stable.
> 
> jhash just isn't secure; it's not a cryptographically secure PRF. If
> there hasn't already been an academic paper put out there about it
> this year, let's make this thread 1000 messages long to garner
> attention, and next year perhaps we'll see one. No doubt that
> motivated government organizations, defense contractors, criminals,
> and other netizens have already done research in private. Replacing
> insecure functions with secure functions is usually a good thing.

I think this is a weak argument.

In general I am in favor of switching to siphash, but it would be nice to
see some benchmarks with the specific kernel implementation, also on some
smaller 32 bit CPUs and especially without using any SIMD instructions
(which might have been used in the paper's comparison).

Bye,
Hannes



Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform

2016-12-14 Thread Hannes Frederic Sowa
On 14.12.2016 13:53, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight  
> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> + const struct {
>>> + struct in6_addr saddr;
>>> + struct in6_addr daddr;
>>> + __be16 sport;
>>> + __be16 dport;
>>> + } __packed combined = {
>>> + .saddr = *(struct in6_addr *)saddr,
>>> + .daddr = *(struct in6_addr *)daddr,
>>> + .sport = sport,
>>> + .dport = dport
>>> + };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
> 
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.

__packed not only removes all padding of the struct but also changes the
alignment assumptions for the whole struct itself. The rule that the
struct is aligned to the maximum alignment of its members no longer
holds. That said, the code accessing this struct will change (not on
archs that can deal efficiently with unaligned accesses, but on others).

A proper test for not introducing padding is to use something like
BUILD_BUG_ON. Also, gcc clears the padding of the struct, so padding
shouldn't be that bad in there (it is better than byte access on mips).
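
For example (sketch; field layout taken from the patch, the check itself
is my suggestion, not existing code):

struct tcp_hash_in {
        struct in6_addr saddr;
        struct in6_addr daddr;
        __be16 sport;
        __be16 dport;
};              /* intentionally not __packed */

static inline void tcp_hash_in_check(void)
{
        /* fails to build if the compiler ever inserts implicit padding */
        BUILD_BUG_ON(sizeof(struct tcp_hash_in) !=
                     2 * sizeof(struct in6_addr) + 2 * sizeof(__be16));
}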

Btw. I think gcc7 gets support for store merging optimization.

Bye,
Hannes



Re: [RFC PATCH 2/2] Crypto kernel tls socket

2015-11-24 Thread Hannes Frederic Sowa
Hello,

Stephan Mueller  writes:

> Am Dienstag, 24. November 2015, 18:34:55 schrieb Herbert Xu:
>
> Hi Herbert,
>
>>On Mon, Nov 23, 2015 at 09:43:02AM -0800, Dave Watson wrote:
>>> Userspace crypto interface for TLS.  Currently supports gcm(aes) 128bit
>>> only, however the interface is the same as the rest of the SOCK_ALG
>>> interface, so it should be possible to add more without any user interface
>>> changes.
>>
>>SOCK_ALG exists to export crypto algorithms to user-space.  So if
>>we decided to support TLS as an algorithm then I guess this makes
>>sense.
>>
>>However, I must say that it wouldn't have been my first pick.  I'd
>>imagine a TLS socket to look more like a TCP socket, or perhaps a
>>KCM socket as proposed by Tom.
>
> If I may ask: what is the benefit of having TLS in kernel space? I do not see 
> any reason why higher-level protocols should be in the kernel as they do not 
> relate to accessing hardware.

There are some crypto accelerators out there, so putting tls into the
kernel would give a net benefit, because otherwise user space has to
copy data into the kernel for device access and back to user space until
it can finally be sent out on the wire.

Since processors provide aesni and other crypto extensions as part of
their instruction set architecture, this, of course, does not make sense
any more.

> The only reason I could fathom is to keep the negotiated keys in a secure 
> realm. But that could be done without having parts or the whole TLS protocol 
> stack in the kernel. If the key management should stay in the kernel as a 
> protection domain, I would rather think that the kernel should offer a plug-
> and-play of raw ciphers where user space is responsible to form a protocol. 
> This way, we do not limit such key management to TLS, but allow any kind of 
> protocol to use it.
>
> E.g. the kernel performs the key transport with RSA or key agreement with DH 
> using keyring-based key material. The resulting shared secret is again 
> maintained in the key ring where user space can use the symmetric ciphers of 
> the kernel with those keys. User space would only see references to keys but 
> no real keys. However, only user space knows when to invoke what cipher to 
> implement a specific protocol.

You could also keep the secret in a master process and talk to that via
ipc.

Bye,
Hannes


Re: [BUG/PATCH] kernel RNG and its secrets

2015-03-18 Thread Hannes Frederic Sowa


On Wed, Mar 18, 2015, at 10:53, mancha wrote:
 Hi.
 
 The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect
 memory cleansing against things like dead store optimization:
 
void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
OPTIMIZER_HIDE_VAR(s);
}
 
 OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq
 against timing analysis, is defined when using gcc as:
 
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
 
 My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from
 optimizing out memset (i.e. secrets remain in memory).
 
 Two things that do work:
 
__asm__ __volatile__ ("" : "=r" (var) : "0" (var))

You are correct, a volatile qualifier should be added to
OPTIMIZER_HIDE_VAR. Because we use an output operand ("=r"), gcc is
allowed to check if it is needed and may remove the asm statement.
Another option would be to just use var as an input operand - asm
blocks without output operands are always considered volatile by
gcc.

Can you send a patch?

I don't think it is security critical: as Daniel pointed out, the call
will happen because it is an external call into the crypto functions,
thus the compiler has to flush memory on return.

Bye,
Hannes


Re: [BUG/PATCH] kernel RNG and its secrets

2015-03-18 Thread Hannes Frederic Sowa
On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
 Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
 On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
  On Wed, Mar 18, 2015, at 10:53, mancha wrote:
  Hi.
  
  The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
  protect
  
  memory cleansing against things like dead store optimization:
  void memzero_explicit(void *s, size_t count)
  {
  
  memset(s, 0, count);
  OPTIMIZER_HIDE_VAR(s);
  
  }
  
  OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
  crypto_memneq 
  against timing analysis, is defined when using gcc as:
  #define OPTIMIZER_HIDE_VAR(var) __asm__ ( : =r (var) : 0
  (var))
  
  My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc
  from optimizing out memset (i.e. secrets remain in memory).
  
  Two things that do work:
  __asm__ __volatile__ ( : =r (var) : 0 (var))
  
  You are correct, volatile signature should be added to
  OPTIMIZER_HIDE_VAR. Because we use an output variable =r, gcc is
  allowed to check if it is needed and may remove the asm statement.
  Another option would be to just use var as an input variable - asm
  blocks without output variables are always considered being volatile
  by gcc.
  
  Can you send a patch?
  
  I don't think it is security critical, as Daniel pointed out, the
  call
  will happen because the function is an external call to the crypto
  functions, thus the compiler has to flush memory on return.
 
 Just had a look.
 
 $ gdb vmlinux
 (gdb) disassemble memzero_explicit
 Dump of assembler code for function memzero_explicit:
 0x813a18b0 +0: push   %rbp
 0x813a18b1 +1: mov%rsi,%rdx
 0x813a18b4 +4: xor%esi,%esi
 0x813a18b6 +6: mov%rsp,%rbp
 0x813a18b9 +9: callq  0x813a7120 memset
 0x813a18be +14:pop%rbp
 0x813a18bf +15:retq
 End of assembler dump.
 
 (gdb) disassemble extract_entropy
 [...]
 0x814a5000 +304:   sub%r15,%rbx
 0x814a5003 +307:   jne0x814a4f80
 extract_entropy+176 0x814a5009 +313: mov%r12,%rdi
 0x814a500c +316:   mov$0xa,%esi
 0x814a5011 +321:   callq  0x813a18b0
 memzero_explicit 0x814a5016 +326:mov-0x48(%rbp),%rax
 [...]
 
 I would be fine with __volatile__.
 
 Are we sure that simply adding a __volatile__ works in any case? I just 
 did a test with a simple user space app:
 
 static inline void memset_secure(void *s, int c, size_t n)
 {
 memset(s, c, n);
 //__asm__ __volatile__("" : : : "memory");
 __asm__ __volatile__("" : "=r" (s) : "0" (s));
 }
 

Good point, thanks!

Of course an input or output of s does not force the memory pointed to
by s being flushed.


My proposal would be to add a

#define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" : : "m"(
({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) ))

and use this in the code function.

This is documented in gcc manual 6.43.2.5.

Bye,
Hannes




Re: [BUG/PATCH] kernel RNG and its secrets

2015-03-18 Thread Hannes Frederic Sowa


On Wed, Mar 18, 2015, at 13:14, Stephan Mueller wrote:
 Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa:
 
 Hi Hannes,
 
 On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote:
  Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann:
  On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:
   On Wed, Mar 18, 2015, at 10:53, mancha wrote:
   Hi.
   
   The kernel RNG introduced memzero_explicit in d4c5efdb9777 to
   protect
   
   memory cleansing against things like dead store optimization:
   void memzero_explicit(void *s, size_t count)
   {
   
   memset(s, 0, count);
   OPTIMIZER_HIDE_VAR(s);
   
   }
   
   OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect
   crypto_memneq
   
   against timing analysis, is defined when using gcc as:
   #define OPTIMIZER_HIDE_VAR(var) __asm__ ( : =r (var) :
   0
   (var))
   
   My tests with gcc 4.8.2 on x86 find it insufficient to prevent
   gcc
   from optimizing out memset (i.e. secrets remain in memory).
   
   Two things that do work:
   __asm__ __volatile__ ( : =r (var) : 0 (var))
   
   You are correct, volatile signature should be added to
   OPTIMIZER_HIDE_VAR. Because we use an output variable =r, gcc is
   allowed to check if it is needed and may remove the asm statement.
   Another option would be to just use var as an input variable - asm
   blocks without output variables are always considered being
   volatile
   by gcc.
   
   Can you send a patch?
   
   I don't think it is security critical, as Daniel pointed out, the
   call
   will happen because the function is an external call to the crypto
   functions, thus the compiler has to flush memory on return.
  
  Just had a look.
  
  $ gdb vmlinux
  (gdb) disassemble memzero_explicit
  
  Dump of assembler code for function memzero_explicit:
  0x813a18b0 +0:  push   %rbp
  0x813a18b1 +1:  mov%rsi,%rdx
  0x813a18b4 +4:  xor%esi,%esi
  0x813a18b6 +6:  mov%rsp,%rbp
  0x813a18b9 +9:  callq  0x813a7120 
 memset
  0x813a18be +14: pop%rbp
  0x813a18bf +15: retq
  
  End of assembler dump.
  
  (gdb) disassemble extract_entropy
  [...]
  
  0x814a5000 +304:sub%r15,%rbx
  0x814a5003 +307:jne0x814a4f80
  
  extract_entropy+176 0x814a5009 +313:  mov%r12,%rdi
  
  0x814a500c +316:mov$0xa,%esi
  0x814a5011 +321:callq  0x813a18b0
  
  memzero_explicit 0x814a5016 +326: mov   
  -0x48(%rbp),%rax
  [...]
  
  I would be fine with __volatile__.
  
  Are we sure that simply adding a __volatile__ works in any case? I
  just did a test with a simple user space app:
  
  static inline void memset_secure(void *s, int c, size_t n)
  {
  
  memset(s, c, n);
  //__asm__ __volatile__(: : :memory);
  __asm__ __volatile__( : =r (s) : 0 (s));
  
  }
 
 Good point, thanks!
 
 Of course an input or output of s does not force the memory pointed to
 by s being flushed.
 
 
 My proposal would be to add a
 
 #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ( : : m(
 ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
 
 and use this in the code function.
 
 This is documented in gcc manual 6.43.2.5.
 
 That one adds the zeroization instructions. But now there are many
 more than with the barrier.
 
   400469:   48 c7 04 24 00 00 00movq   $0x0,(%rsp)
   400470:   00 
   400471:   48 c7 44 24 08 00 00movq   $0x0,0x8(%rsp)
   400478:   00 00 
   40047a:   c7 44 24 10 00 00 00movl   $0x0,0x10(%rsp)
   400481:   00 
   400482:   48 c7 44 24 20 00 00movq   $0x0,0x20(%rsp)
   400489:   00 00 
   40048b:   48 c7 44 24 28 00 00movq   $0x0,0x28(%rsp)
   400492:   00 00 
   400494:   c7 44 24 30 00 00 00movl   $0x0,0x30(%rsp)
   40049b:   00
 
 Any ideas?

Hmm, correct definition of u8?

Which version of gcc do you use? I can't see any difference if I compile
your example at -O2.

Bye,
Hannes



Re: [BUG/PATCH] kernel RNG and its secrets

2015-03-18 Thread Hannes Frederic Sowa


On Wed, Mar 18, 2015, at 13:42, Daniel Borkmann wrote:
 On 03/18/2015 01:20 PM, Stephan Mueller wrote:
  Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
  My proposal would be to add a
 
  #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ( : :
  m(
  ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
 
  and use this in the code function.
 
  This is documented in gcc manual 6.43.2.5.
 
  That one adds the zeroization instructuctions. But now there are much
  more than with the barrier.
 
 400469:   48 c7 04 24 00 00 00movq   $0x0,(%rsp)
 400470:   00
 400471:   48 c7 44 24 08 00 00movq   $0x0,0x8(%rsp)
 400478:   00 00
 40047a:   c7 44 24 10 00 00 00movl   $0x0,0x10(%rsp)
 400481:   00
 400482:   48 c7 44 24 20 00 00movq   $0x0,0x20(%rsp)
 400489:   00 00
 40048b:   48 c7 44 24 28 00 00movq   $0x0,0x28(%rsp)
 400492:   00 00
 400494:   c7 44 24 30 00 00 00movl   $0x0,0x30(%rsp)
 40049b:   00
 
  Any ideas?
 
  Hmm, correct definition of u8?
 
  I use unsigned char
 
  Which version of gcc do you use? I can't see any difference if I
  compile your example at -O2.
 
  gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)

Well, that was an error on my side, I see the same behavior.

 
 I can see the same with the gcc version I previously posted. So
 it clears the 20 bytes from your example (movq, movq, movl) at
 two locations, presumably buf[] and b[].

Yes, it looks like that. The reservation on the stack changes, too.

Seems like just using barrier() is the best and easiest option.
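
I.e. a minimal sketch of what that would look like; barrier() is the
usual compiler barrier (asm volatile("" : : : "memory")):

void memzero_explicit(void *s, size_t count)
{
        memset(s, 0, count);
        barrier();      /* keeps the memset from being elided as a dead store */
}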

Thanks,
Hannes


Re: [PATCH -crypto] lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR

2015-03-18 Thread Hannes Frederic Sowa
On Wed, Mar 18, 2015, at 18:47, Daniel Borkmann wrote:
 From: mancha security manc...@zoho.com
 
 OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient to
 ensure protection from dead store optimization.
 
 For the random driver and crypto drivers, calls are emitted ...
 
   $ gdb vmlinux
   (gdb) disassemble memzero_explicit
   Dump of assembler code for function memzero_explicit:
 0x813a18b0 +0:push   %rbp
 0x813a18b1 +1:mov%rsi,%rdx
 0x813a18b4 +4:xor%esi,%esi
 0x813a18b6 +6:mov%rsp,%rbp
 0x813a18b9 +9:callq  0x813a7120 memset
 0x813a18be +14:   pop%rbp
 0x813a18bf +15:   retq
   End of assembler dump.
 
   (gdb) disassemble extract_entropy
   [...]
 0x814a5009 +313:  mov%r12,%rdi
 0x814a500c +316:  mov$0xa,%esi
 0x814a5011 +321:  callq  0x813a18b0
 memzero_explicit
 0x814a5016 +326:  mov-0x48(%rbp),%rax
   [...]
 
 ... but in case in future we might use facilities such as LTO, then
 OPTIMIZER_HIDE_VAR() is not sufficient to protect gcc from a possible
 eviction of the memset(). We have to use a compiler barrier instead.
 
 Minimal test example when we assume memzero_explicit() would *not* be
 a call, but would have been *inlined* instead:
 
   static inline void memzero_explicit(void *s, size_t count)
   {
 memset(s, 0, count);
 foo
   }
 
   int main(void)
   {
 char buff[20];
 
  snprintf(buff, sizeof(buff) - 1, "test");
  printf("%s", buff);
 
 memzero_explicit(buff, sizeof(buff));
 return 0;
   }
 
 With foo := OPTIMIZER_HIDE_VAR():
 
   (gdb) disassemble main
   Dump of assembler code for function main:
   [...]
0x00400464 +36:   callq  0x400410 printf@plt
0x00400469 +41:   xor%eax,%eax
0x0040046b +43:   add$0x28,%rsp
0x0040046f +47:   retq
   End of assembler dump.
 
 With foo := barrier():
 
   (gdb) disassemble main
   Dump of assembler code for function main:
   [...]
0x00400464 +36:   callq  0x400410 printf@plt
0x00400469 +41:   movq   $0x0,(%rsp)
0x00400471 +49:   movq   $0x0,0x8(%rsp)
0x0040047a +58:   movl   $0x0,0x10(%rsp)
0x00400482 +66:   xor%eax,%eax
0x00400484 +68:   add$0x28,%rsp
0x00400488 +72:   retq
   End of assembler dump.
 
 As can be seen, movq, movq, movl are being emitted inlined
 via memset().
 
 Reference: http://thread.gmane.org/gmane.linux.kernel.cryptoapi/13764/
 Fixes: d4c5efdb9777 (random: add and use memzero_explicit() for clearing
 data)
 Cc: Hannes Frederic Sowa han...@stressinduktion.org
 Cc: Stephan Mueller smuel...@chronox.de
 Cc: Theodore Ts'o ty...@mit.edu
 Signed-off-by: mancha security manc...@zoho.com
 Signed-off-by: Daniel Borkmann dan...@iogearbox.net
 ---
  Sending to Herbert as crypto/random are the main users.
  Based against -crypto tree. Thanks!

Acked-by: Hannes Frederic Sowa han...@stressinduktion.org

Still checking on how to realize the test. Thanks!


Re: [BUG/PATCH] kernel RNG and its secrets

2015-03-18 Thread Hannes Frederic Sowa
On Wed, Mar 18, 2015, at 18:41, Theodore Ts'o wrote:
 Maybe we should add a kernel self-test that automatically checks
 whether or not memset_explicit() gets optimized away?  Otherwise we
 might not notice when gcc or how we implement barrier() or whatever
 else we end up using ends up changing.
 
 It shold be something that is really fast, so it might be a good idea
 to simply automatically run it as part of an __initcall()
 unconditionally.  We can debate where the __initcall() lives, but I'd
 prefer that it be run even if the crypto layer isn't configured for
 some reason.  Hopefully such an self-test is small enough that the
 kernel bloat people won't complain.  :-)
 
-Ted

Maybe a BUILD_BUG_ON: ;)

__label__ l1, l2;
char buffer[1024];
l1:
memset(buffer, 0, 1024);
l2:
  BUILD_BUG_ON(&&l1 == &&l2);



Re: [PATCH v2] crypto: memzero_explicit - make sure to clear out sensitive data

2014-09-08 Thread Hannes Frederic Sowa
On So, 2014-09-07 at 23:23 +0200, Daniel Borkmann wrote:
 Recently, in commit 13aa93c70e71 (random: add and use memzero_explicit()
 for clearing data), we have found that GCC may optimize some memset()
 cases away when it detects a stack variable is not being used anymore
 and going out of scope. This can happen, for example, in cases when we
 are clearing out sensitive information such as keying material or any
 e.g. intermediate results from crypto computations, etc.
 
 With the help of Coccinelle, we can figure out and fix such occurences
 in the crypto subsytem as well. Julia Lawall provided the following
 Coccinelle program:
 
   @@
   type T;
   identifier x;
   @@
 
   T x;
   ... when exists
   when any
   -memset
   +memzero_explicit
  (x,
   -0,
  ...)
   ... when != x
   when strict
 
   @@
   type T;
   identifier x;
   @@
 
   T x[...];
   ... when exists
   when any
   -memset
   +memzero_explicit
  (x,
   -0,
  ...)
   ... when != x
   when strict

I think this Coccinelle patch won't make a manual audit completely
unnecessary, as it does not take optimizations (dead code elimination)
into account?
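
For example (a made-up snippet with hypothetical helpers, just to
illustrate the concern): if the clearing happens through a pointer in a
small helper, the semantic patch above does not match it, and after
inlining gcc may still drop the memset() as a dead store.

extern void derive_key(u8 *key, size_t len);    /* hypothetical */
extern void do_crypto(u8 *key, size_t len);     /* hypothetical */

static inline void wipe(void *buf, size_t len)
{
        memset(buf, 0, len);            /* not matched: buf is a parameter */
}

static void use_key(void)
{
        u8 key[32];

        derive_key(key, sizeof(key));
        do_crypto(key, sizeof(key));
        wipe(key, sizeof(key));         /* may be elided once wipe() is inlined */
}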

 
 Therefore, make use of the drop-in replacement memzero_explicit() for
 exactly such cases instead of using memset().
 
 Signed-off-by: Daniel Borkmann dbork...@redhat.com
 Cc: Julia Lawall julia.law...@lip6.fr
 Cc: Herbert Xu herb...@gondor.apana.org.au
 Cc: Theodore Ts'o ty...@mit.edu
 Cc: Hannes Frederic Sowa han...@stressinduktion.org

Acked-by: Hannes Frederic Sowa han...@stressinduktion.org

Thanks,
Hannes




Re: [PATCH 0/2] Add TLS record layer encryption module

2014-08-25 Thread Hannes Frederic Sowa
Hi,

On Di, 2014-07-29 at 12:32 +0300, Cristian Stoica wrote:
 This set of patches introduces support for TLS 1.0 record layer
 encryption/decryption with a corresponding algorithm called
 tls10(hmac(hash),cbc(cipher)).
 
 Similarly to authenc.c on which it is based, this module mixes the base
 algorithms in software to produce an algorithm that does record layer
 encryption and decryption for TLS1.0.
 Any combination of hw and sw base algorithms is possible, but the purpose
 is to take advantage of hardware acceleration for TLS record layer offloading
 when hardware acceleration is present.
 
 This is a software alternative to forthcoming Freescale caam patches that
 will add support for one-pass hardware-only TLS record layer offloading.
 
 Performance figures depend largely on several factors including hardware
 support and record layer size. For user-space applications the
 kernel/user-space interface is also important. That said, we have done several
 performance tests using openssl and cryptodev on Freescale QorIQ platforms.
 On P4080, for a single stream of records larger than 512 bytes, the 
 performance
 improved from about 22Mbytes/s to 64Mbytes/s while also reducing CPU load.
 
 The purpose of this module is to enable TLS kernel offloading on hw platforms
 that have acceleration for AES/SHA1 but do not have direct support for TLS
 record layer.
 
 (minor dependency on pending patch
   crypto: testmgr.c: white space fix-ups on test_aead)
 
 Cristian Stoica (2):
   crypto: add support for TLS 1.0 record encryption
   crypto: add TLS 1.0 test vectors for AES-CBC-HMAC-SHA1
 
  crypto/Kconfig   |  20 ++
  crypto/Makefile  |   1 +
  crypto/authenc.c |   5 +-
  crypto/tcrypt.c  |   5 +
  crypto/testmgr.c |  41 +++-
  crypto/testmgr.h | 217 +++
  crypto/tls.c | 528 
 +++
  include/crypto/authenc.h |   3 +
  8 files changed, 808 insertions(+), 12 deletions(-)
  create mode 100644 crypto/tls.c
 

Could you maybe add net...@vger.kernel.org to Cc on your next
submission?

It would be great if this feature were made available in such a way
that user space does the TLS handshake over a socket, the negotiated
symmetric keys can later be installed via e.g. setsockopt, and the
kernel then offloads the TLS record processing on this socket.

Alert message handling seems problematic, though, and might require some
out-of-band interface.
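
Roughly along these lines (purely hypothetical user-space sketch:
SOL_TLS, TLS_TX and struct tls_crypto_info do not exist, the names are
only meant to illustrate the idea):

#include <sys/socket.h>

/* illustrative only -- none of these constants/structs exist today */
#define SOL_TLS 282
#define TLS_TX  1

struct tls_crypto_info {
        unsigned short version;
        unsigned short cipher_type;
        unsigned char  key[16], iv[8], salt[4], seq[8];
};

static int install_tx_key(int sock, const struct tls_crypto_info *info)
{
        /* after the user-space handshake: hand the symmetric key to the
         * kernel; subsequent write()s get framed and encrypted there */
        return setsockopt(sock, SOL_TLS, TLS_TX, info, sizeof(*info));
}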

Thanks,
Hannes




Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-23 Thread Hannes Frederic Sowa
Hi,

On Wed, Jul 23, 2014, at 00:59, Theodore Ts'o wrote:
 But why would you need to use GRND_RANDOM in your scenario, and accept
 your application potentially getting stalled and stuck in amber for
 perhaps hours?  If you are going to accept your application stalling
 like that, you can do the pointer arithmetic.  It's really not hard,
 and someone who can't do that, again, shouldn't be allowed anywhere
 near crypto code in the first place (and if they are, they'll probably
 be making lots of other, equally fatal if not more so, newbie
 mistakes).

I favored the idea of having a non-failing, non-partial-read getrandom
syscall. But I am with you: if it often causes long stalls, we should
stick to the old semantics.

Thanks,
Hannes



Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-23 Thread Hannes Frederic Sowa


On Wed, Jul 23, 2014, at 13:52, George Spelvin wrote:
 I keep wishing for a more general solution.  For example, some way to
 have a spare extra fd that could be accessed with a special O_NOFAIL
 flag.
 
 That would allow any number of library functions to not fail, such as
 logging from nasty corner cases.
 
 But you'd have to provide one per thread, and block non-fatal signals
 while it was open, so you don't get reentrancy problems.  Ick.
 
 
 This overly-specialized system call (and worse yet, a blocking
 system call that you can't put into a poll() loop) just feels ugly
 to me.  Is it *absolutely* necessary?

One point that often came up besides fd exhaustion is missing
/dev/u?random device nodes in chroot environments.

I also thought about a more general interface, like e.g. an
opennod(dev_t device, int flags) call, but all those ideas ended up being
very complex changes, besides having design issues. getrandom is simple
and solves a real problem.

The only problem I see is that we allow access to /dev/random without
checking any permission bits, as was done when opening /dev/random
before, and we cannot prevent applications from depleting the whole
entropy pool.

 For example, how about simply making getentropy() a library function that
 aborts if it can't open /dev/urandom?  If you're suffering fd exhaustion,
 you're being DoSed already.

Maybe applications want to mitigate fd exhaustion.

Bye,
Hannes


Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-22 Thread Hannes Frederic Sowa
Hello,

On Di, 2014-07-22 at 00:44 -0400, Theodore Ts'o wrote:
 On Tue, Jul 22, 2014 at 03:02:20AM +0200, Hannes Frederic Sowa wrote:
  
  Ted, would it make sense to specifiy a 512 byte upper bound limit for
  random entropy extraction (I am not yet convinced to do that for
  urandom) and in case the syscall should block we make sure that we
  extract the amount of bytes the user requested?
 
 On some systems, it's still possible that with a large /dev/random
 extraction, you could end up blocking for hours.  So either the
 getrandom(2) syscall needs to be uninterruptible, which has one set of
 problems (and not just the user typing ^C, but also things like being
 able to process alarms, which is highly problematic indeed), or you
 need to allow it to be interruptible by a signal, in which case
 userspace needs to check the error return for things like EINTR
 anyway.  And if you have to check the error return, you might as well
 check the number of bytes returned.

I think a lot of checks are of the type if (getrandom() < 0), so this
actually was the kind of programming error I wanted to guard against.
Also, on some systems it is very likely that we return a short write to
user space, so it prevents this very common situation. Even if one only
wants to extract 64 bytes of randomness, one would often need two calls to
getrandom() on my virtualized systems. It would be great to know that in
blocking mode I either get a -1 or the buffer is always filled, so I won't
need to do pointer arithmetic to fill the rest of the buffer.
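
For comparison, this is the kind of loop callers otherwise need under
partial-read semantics (a sketch using the glibc getrandom() wrapper; the
helper name is made up):

#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/* Keep calling getrandom() until buf is completely filled,
 * restarting on EINTR and advancing the buffer past short reads. */
static int getrandom_full(unsigned char *buf, size_t len, unsigned int flags)
{
        while (len > 0) {
                ssize_t n = getrandom(buf, len, flags);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }
                buf += n;       /* the pointer arithmetic mentioned above */
                len -= n;
        }
        return 0;
}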

I still think it is a good idea even without caring about the signal
interruptions.

 Yes, one could in theory set up a new variant of uninterruptible
 signals that only exited if the signal caused the process to exit, and
 otherwise, forced a system call restart even if SA_INTERRUPTIBLE was
 not set in sigalarim, but that's add *way* more complexity than this
 deserves.

I don't want to do that. It is totally ok if we return -1 and set errno
to EINTR in case of pending signals.

 Basically, I view /dev/random as an advanced call, and someone who
 uses it should know what they are doing.  It's not the default for a
 reason.

I agree, but I still think this would make the interface a bit more
friendly to use.

Thanks,
Hannes




Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-21 Thread Hannes Frederic Sowa
On Mon, 2014-07-21 at 07:21 -0400, George Spelvin wrote:
  I don't like partial reads/writes and think that a lot of people get
  them wrong, because they often only check for negative return values.
 
 The v1 patch, which did it right IMHO, didn't do partial reads in the
 case we're talking about:
 
 + if (count > 256)
 + return -EINVAL;

I checked and unfortunately it does not; 256 bytes is way too large to
guarantee non-existence of partial reads. On a typical older system
without rdrand/rdseed you would have to limit the amount of bytes to
extract to about 32. That's way too low. That said, the 512-byte check,
applied only in the case of extracting bytes from the blocking pool, would
serve no purpose.

  In case of urandom extraction, I wouldn't actually limit the number of
  bytes. A lot of applications I have seen already extract more than 128
  out of urandom (not for seeding a prng but just to mess around with some
  memory). I don't see a reason why getrandom shouldn't be used for that.
  It just adds one more thing to look out for if using getrandom() in
  urandom mode, especially during porting an application over to this new
  interface.
 
 Again, I disagree.  If it's just messing around code, use /dev/urandom.
 It's more portable and you don't care about the fd exhaustion attacks.
 
 If it's actual code to be used in anger, fix it to not abuse /dev/urandom.
 
 You're right that a quick hack might be broken on purpose, but without
 exception, *all* code that I have seen which reads 64 or more bytes from
 /dev/*random is broken, and highlighting the brokenness is a highly
 desirable thing.
 
 The sole and exclusive reason for this syscall to exist at all is to
 solve a security problem.  Supporting broken security code does no favors
 to anyone.

Then let's agree to disagree. :)

I think it is dangerous if applications get ported to this new
interface without checking the size limitations and only notice it
after they have been rolled out (if at all).

Bye,
Hannes




Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-21 Thread Hannes Frederic Sowa
On Mon, 2014-07-21 at 17:27 +0200, Hannes Frederic Sowa wrote:
 On Mon, 2014-07-21 at 07:21 -0400, George Spelvin wrote:
   I don't like partial reads/writes and think that a lot of people get
   them wrong, because they often only check for negative return values.
  
  The v1 patch, which did it right IMHO, didn't do partial reads in the
  case we're talking about:
  
  +   if (count > 256)
  +   return -EINVAL;
 
 I checked and unfortunately it does not; 256 bytes is way too large to
 guarantee non-existence of partial reads. On a typical older system
 without rdrand/rdseed you would have to limit the amount of bytes to
 extract to about 32. That's way too low. That said, the 512 bytes check
 only in case of extracting bytes from blocking pool would serve no
 purpose.

Ted, would it make sense to specify a 512 byte upper bound limit for
random entropy extraction (I am not yet convinced to do that for
urandom) and, in case the syscall should block, make sure that we
extract the amount of bytes the user requested?

Having had a quick look, maybe something like this? It is only
compile-tested and can maybe still be simplified. It guarantees we don't do
a partial write to user space for sub-512-byte requests.

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2721543..c0db6f5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1345,8 +1345,14 @@ static int arch_random_refill(void)
        return n;
 }
 
+enum blocking_mode {
+       NONBLOCKING,
+       SYSCALL_BLOCK,
+       DEV_RANDOM_BLOCK
+};
+
 static ssize_t
-_random_read(int nonblock, char __user *buf, size_t nbytes)
+_random_read(enum blocking_mode mode, char __user *buf, size_t nbytes)
 {
        ssize_t n;
 
@@ -1361,7 +1367,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
        trace_random_read(n*8, (nbytes-n)*8,
                          ENTROPY_BITS(&blocking_pool),
                          ENTROPY_BITS(&input_pool));
-       if (n > 0)
+       if (mode != SYSCALL_BLOCK && n > 0)
                return n;
 
        /* Pool is (near) empty.  Maybe wait and retry. */
@@ -1370,7 +1376,7 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
                if (arch_random_refill())
                        continue;
 
-               if (nonblock)
+               if (mode == NONBLOCKING)
                        return -EAGAIN;
 
                wait_event_interruptible(random_read_wait,
@@ -1384,7 +1390,8 @@ _random_read(int nonblock, char __user *buf, size_t nbytes)
 static ssize_t
 random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 {
-       return _random_read(file->f_flags & O_NONBLOCK, buf, nbytes);
+       return _random_read(file->f_flags & O_NONBLOCK ? NONBLOCKING :
+                           DEV_RANDOM_BLOCK, buf, nbytes);
 }
 
 static ssize_t
@@ -1534,8 +1541,6 @@ const struct file_operations urandom_fops = {
 SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
                unsigned int, flags)
 {
-       int r;
-
        if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
                return -EINVAL;
 
@@ -1543,7 +1548,8 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
                count = INT_MAX;
 
        if (flags & GRND_RANDOM)
-               return _random_read(flags & GRND_NONBLOCK, buf, count);
+               return _random_read(flags & GRND_NONBLOCK ? NONBLOCKING :
+                                   SYSCALL_BLOCK, buf, count);
 
        if (unlikely(nonblocking_pool.initialized == 0)) {
                if (flags & GRND_NONBLOCK)

Bye,
Hannes




Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-20 Thread Hannes Frederic Sowa
On Sun, 2014-07-20 at 13:03 -0400, George Spelvin wrote:
  In the end people would just recall getentropy in a loop and fetch 256
  bytes each time. I don't think the artificial limit does make any sense.
  I agree that this allows a potential misuse of the interface, but
  doesn't a warning in dmesg suffice?
 
 It makes their code not work, so they can are forced to think about
 fixing it before adding the obvious workaround.
 
  It also makes it easier to port applications from open(/dev/*random),
  read(...) to getentropy() by reusing the same limits.
 
 But such an application *is broken*.  Making it easier to port is
 an anti-goal.  The goal is to make it enough of a hassle that
 people will *fix* their code.
 
 There's a *reason* that the /dev/random man page explicitly tells
 people not to trust software that reads more than 32 bytes at a time
 from /dev/random:
 
  While some safety margin above that minimum is reasonable, as a guard
  against flaws in the CPRNG algorithm, no cryptographic primitive avail-
  able today can hope to promise more than 256 bits of security, so if
  any program reads more than 256 bits (32 bytes) from the kernel random
  pool per invocation, or per reasonable reseed interval (not less than
  one minute), that should be taken as a sign that its cryptography is
  *not* skillfully implemented.
 
 (not skilfuly implemented was the phrase chosen after some discussion to
 convey either a quick hack or something you dhouldn't trust.)
 
 To expand on what I said in my mail to Ted, 256 is too high.
 I'd go with OpenBSD's 128 bytes or even drop it to 64.

I don't like partial reads/writes and think that a lot of people get
them wrong, because they often only check for negative return values.

I thought about the following check (as replacement for the old check):

/* we will always generate a partial buffer fill */
if (flags & GRND_RANDOM && count > 512)
return -EINVAL;

We could also be more conservative and return -EINVAL in case a stray
write happened when one tried to extract less than 512 bytes, by checking
the return value of random_read(), but somehow this sounds dangerous to me.

In case of urandom extraction, I wouldn't actually limit the number of
bytes. A lot of applications I have seen already extract more than 128
out of urandom (not for seeding a prng but just to mess around with some
memory). I don't see a reason why getrandom shouldn't be used for that.
It just adds one more thing to look out for if using getrandom() in
urandom mode, especially during porting an application over to this new
interface.

Bye,
Hannes




Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Hannes Frederic Sowa
On Thu, 2014-07-17 at 05:18 -0400, Theodore Ts'o wrote:
 SYNOPSIS
  #include <linux/random.h>
 
   int getrandom(void *buf, size_t buflen, unsigned int flags);

Cool, I think the interface is sane.

Btw. couldn't libressl etc. fall back to binary_sysctl
kernel.random.uuid and seed with that as a last resort? We have it
available for a few more years.
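
For reference, such a fallback boils down to roughly the following sketch
via the legacy binary sysctl(2) interface (mib names from linux/sysctl.h,
helper name made up, error handling trimmed):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sysctl.h>

/* Read one kernel-generated UUID string via the binary sysctl
 * kernel.random.uuid and use it as last-resort seed material. */
static int uuid_from_sysctl(char *out, size_t outlen)
{
        int name[] = { CTL_KERN, KERN_RANDOM, RANDOM_UUID };
        struct __sysctl_args args;
        size_t len = outlen;

        memset(&args, 0, sizeof(args));
        args.name = name;
        args.nlen = 3;
        args.oldval = out;
        args.oldlenp = &len;

        if (syscall(SYS__sysctl, &args) < 0)
                return -1;
        return (int)len;        /* typically 37: 36-char UUID plus NUL */
}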

 +SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 + unsigned int, flags)
 +{
 + int r;
 +
 + if (count > 256)
 + return -EINVAL;
 +

Why this arbitrary limitation? Couldn't we just check for > SSIZE_MAX
or, to be more conservative, to INT_MAX?

 + if (flags & GRND_RANDOM) {
 + return _random_read(!(flags & GRND_BLOCK), buf, count);
 + }
 + if (flags & GRND_BLOCK) {
 + r = wait_for_completion_interruptible(&urandom_initialized);
 + if (r)
 + return r;
 + } else if (!completion_done(&urandom_initialized))
 + return -EAGAIN;
 + return urandom_read(NULL, buf, count, NULL);
 +}
 +

Great, thanks Ted,
Hannes





Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Hannes Frederic Sowa
On Thu, 2014-07-17 at 08:52 -0400, Theodore Ts'o wrote:
 On Thu, Jul 17, 2014 at 12:57:07PM +0200, Hannes Frederic Sowa wrote:
  
  Btw. couldn't libressl etc. fall back to binary_sysctl
  kernel.random.uuid and seed with that as a last resort? We have it
  available for few more years.
 
 Yes, they could.  But trying to avoid more uses of binary_sysctl seems
 to be a good thing, I think.  The other thing is that is that this
 interface provides is the ability to block until the entropy pool is
 initialized, which isn't a big deal for x86 systems, but might be
 useful as a gentle forcing function to force ARM systems to figure out
 good ways of making sure the entropy pools are initialized (i.e., by
 actually providing !@#!@ cycle counter) without breaking userspace
 compatibility --- since this is a new interface.

I am not questioning this new interface - I like it - I just wanted to
mention that there is already a safe fallback for LibreSSL, in the way
they already seem to do it on OpenBSD (via sysctl).

 
   + if (count  256)
   + return -EINVAL;
   +
  
  Why this arbitrary limitation? Couldn't we just check for > SSIZE_MAX
  or to be more conservative to INT_MAX?
 
 I'm not wedded to this limitation.  OpenBSD's getentropy(2) has an
 architected arbitrary limit of 128 bytes.  I haven't made a final
 decision if the right answer is to hard code some value, or make this
 limit be configurable, or remote the limit entirely (which in practice
 would be SSIZE_MAX or INT_MAX).
 
 The main argument I can see for putting in a limit is to encourage the
 proper use of the interface.  In practice, anything larger than 128
 probably means the interface is getting misused, either due to a bug
 or some other kind of oversight.
 
 For example, when I started instrumenting /dev/urandom, I caught
 Google Chrome pulling 4k out of /dev/urandom --- twice --- at startup
 time.  It turns out it was the fault of the NSS library, which was
 using fopen() to access /dev/urandom.  (Sigh.)

In the end people would just repeatedly call getentropy in a loop and
fetch 256 bytes each time. I don't think the artificial limit makes any
sense. I agree that this allows a potential misuse of the interface, but
doesn't a warning in dmesg suffice?

It also makes it easier to port applications from open(/dev/*random),
read(...) to getentropy() by reusing the same limits.

I would vote for warning (at about 256 bytes) + no limit.
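
Purely as an illustration of that option (not a patch anyone posted), the
hard limit from the RFC could be turned into a rate-limited warning along
these lines:

-       if (count > 256)
-               return -EINVAL;
+       if (count > 256)
+               pr_warn_ratelimited("getrandom: %s requested %zu bytes\n",
+                                   current->comm, count);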

Thanks,
Hannes




Re: [PATCH] update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST

2013-11-28 Thread Hannes Frederic Sowa
On Sun, Nov 24, 2013 at 10:36:28PM -0800, Shawn Landden wrote:
 Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
 added an internal flag MSG_SENDPAGE_NOTLAST, similar to
 MSG_MORE.
 
 algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
 and need to see the new flag as identical to MSG_MORE.
 
 This fixes sendfile() on AF_ALG.
 
 v3: also fix udp

The UDP bits look fine to me.
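
For context, the fix in each affected sendpage handler amounts to the same
small idiom; a sketch of what it would look like near the top of
udp_sendpage() as well:

        /* The splice/sendfile path sets MSG_SENDPAGE_NOTLAST on every
         * page but the last one; treat it like MSG_MORE so the socket
         * keeps corking until the final page arrives. */
        if (flags & MSG_SENDPAGE_NOTLAST)
                flags |= MSG_MORE;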

Greetings,

  Hannes



Re: [PATCH] update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST

2013-11-24 Thread Hannes Frederic Sowa
On Sun, Nov 24, 2013 at 06:08:59PM -0800, Shawn Landden wrote:
 Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
 added an internal flag MSG_SENDPAGE_NOTLAST, similar to
 MSG_MORE.
 
 algif_hash and algif_skcipher used MSG_MORE from tcp_sendpages()
 and need to see the new flag as identical to MSG_MORE.
 
 This fixes sendfile() on AF_ALG.

Don't we need a similar fix for udp_sendpage?

Greetings,

  Hannes
