Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel

2017-10-13 Thread Andy Lutomirski
On Fri, Oct 13, 2017 at 12:09 PM, Linus Torvalds wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin wrote:
>>
>> This could be fixed by a s/vmovdqa/vmovdqu change like below, but maybe
>> the right fix would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.
>
> So your patch looks fine, but maybe somebody could add the required
> alignment to the sha256 context allocation (wherever that is).

IIRC if we try the latter, then we'll risk hitting the #*!&@% gcc bug
that mostly prevents 16-byte alignment from working on GCC before 4.8
or so.  That way lies debugging disasters.
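
A minimal userspace sketch of the aligned-vs-unaligned load distinction
at issue, using the SSE2 intrinsics that correspond to vmovdqa/vmovdqu
(the buffer and its deliberate 8-byte misalignment are made up for
illustration):

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t buf[12] __attribute__((aligned(16))) = { 0 };
        uint32_t *state = buf + 2;      /* only 8-byte aligned */
        __m128i v;

        /* _mm_loadu_si128 (vmovdqu) tolerates any alignment */
        v = _mm_loadu_si128((const __m128i *)state);

        /* _mm_load_si128 (vmovdqa) would fault here, since
         * (uintptr_t)state % 16 == 8 -- the crash mode above */
        /* v = _mm_load_si128((const __m128i *)state); */

        printf("%d\n", _mm_cvtsi128_si32(v));
        return 0;
}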


Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-03 Thread Andy Lutomirski
On Mon, Oct 2, 2017 at 10:26 PM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Oct 02, 2017 at 09:18:24PM -0700, Andy Lutomirski wrote:
>> > On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai <baijiaju1...@163.com> wrote:
>> >
>> > The SCTP program may sleep under a spinlock, and the function call path is:
>> > sctp_generate_t3_rtx_event (acquire the spinlock)
>> >   sctp_do_sm
>> >     sctp_side_effects
>> >       sctp_cmd_interpreter
>> >         sctp_make_init_ack
>> >           sctp_pack_cookie
>> >             crypto_shash_setkey
>> >               shash_setkey_unaligned
>> >                 kmalloc(GFP_KERNEL)
>>
>> I'm going to go out on a limb here: why on Earth is our crypto API so
>> full of indirection that we allocate memory at all here?
>
> The crypto API operates on a one key per-tfm basis.  So normally
> tfm allocation and key setting is done once only and not done on
> the data path.
>
> I have looked at the SCTP code and it appears to fit this paradigm.
> That is, we should be able to allocate the tfm and set the key when
> the key is actually generated via get_random_bytes, rather than every
> time the key is used, which is not only a waste but, as you see, runs
> into API issues.

It's a waste because it loses a pre-computation advantage.

The fact that it has memory allocation issues is the crypto API's fault,
full stop.  There is no legit reason to need to allocate anything.
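
A minimal sketch of the pattern Herbert describes -- allocate the tfm
and set the key once, in process context, at key-generation time, and
keep both off the data path.  The function and variable names here are
hypothetical; crypto_alloc_shash() and crypto_shash_setkey() are the
real API:

#include <linux/err.h>
#include <crypto/hash.h>

static struct crypto_shash *cookie_tfm;  /* hypothetical cached tfm */

/* Hypothetical setup hook: runs once, when the secret is generated
 * via get_random_bytes(), never under the data-path spinlock. */
static int cookie_hmac_setup(const u8 *secret, unsigned int len)
{
        cookie_tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);
        if (IS_ERR(cookie_tfm))
                return PTR_ERR(cookie_tfm);
        /* may allocate and sleep -- fine here, fatal under a spinlock */
        return crypto_shash_setkey(cookie_tfm, secret, len);
}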


Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-02 Thread Andy Lutomirski
> On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai  wrote:
>
> The SCTP program may sleep under a spinlock, and the function call path is:
> sctp_generate_t3_rtx_event (acquire the spinlock)
>   sctp_do_sm
>     sctp_side_effects
>       sctp_cmd_interpreter
>         sctp_make_init_ack
>           sctp_pack_cookie
>             crypto_shash_setkey
>               shash_setkey_unaligned
>                 kmalloc(GFP_KERNEL)
>

I'm going to go out on a limb here: why on Earth is our crypto API so
full of indirection that we allocate memory at all here?

We're synchronously computing a hash of a small amount of data using
either HMAC-SHA1 or HMAC-SHA256 (determined at runtime) if I read it
right.  There's a sane way to do this that doesn't need kmalloc,
alloca, or fancy indirection.  And then there's crypto_shash_xyz().

--Andy, who is sick of seeing stupid bugs caused by the fact that it's
basically impossible to use the crypto API in a sane way.

P.S. gnulib has:

int hmac_sha256 (const void *key, size_t keylen, const void *in,
size_t inlen, void *resbuf);

An init/update/final-style API would be nice, but something like this
would be a phenomenal improvement over what we have.
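
A hedged usage sketch of that one-shot call, assuming the gnulib
convention that it writes the 32-byte digest to resbuf and returns 0
on success:

#include <stddef.h>

int hmac_sha256(const void *key, size_t keylen, const void *in,
                size_t inlen, void *resbuf);

static void example(const void *msg, size_t msglen)
{
        static const unsigned char key[32] = { 0 };  /* key material */
        unsigned char digest[32];

        if (hmac_sha256(key, sizeof(key), msg, msglen, digest) == 0) {
                /* digest now holds HMAC-SHA256(key, msg): no context
                 * object, no allocation, no indirection */
        }
}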


Re: x86-64: Maintain 16-byte stack alignment

2017-01-12 Thread Andy Lutomirski
On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> >> <torva...@linux-foundation.org> wrote:
>> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoim...@redhat.com> 
>> >> > wrote:
>> >> >>
>> >> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> >> functions to ensure their stacks are 16-byte aligned.
>> >> >>
>> >> >> It's certainly possible, but I don't see how that solves the problem.
>> >> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> >> something?
>> >> >
>> >> > I think the argument is that we *could* try to align things, if we
>> >> > just had some tool that actually then verified that we aren't missing
>> >> > anything.
>> >> >
>> >> > I'm not entirely happy with checking the generated code, though,
>> >> > because as Ingo says, you have a 50:50 chance of just getting it right
>> >> > by mistake. So I'd much rather have some static tool that checks
>> >> > things at a code level (ie coccinelle or sparse).
>> >>
>> >> What I meant was checking the entry code to see if it aligns stack
>> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> >> alignment for real may actually be entirely a lost cause.  After all,
>> >> I think we have some inline functions that do asm volatile ("call
>> >> ..."), and I don't see any credible way of forcing alignment short of
>> >> generating an entirely new stack frame and aligning that.
>> >
>> > Actually we already found all such cases and fixed them by forcing a new
>> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
>>
>> What I mean is: what guarantees that the stack is properly aligned for
>> the subroutine call?  gcc promises to set up a stack frame, but does
>> it promise that rsp will be properly aligned to call a C function?
>
> Yes, I did an experiment and you're right.  I had naively assumed that
> all stack frames would be aligned.

Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

--Andy
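
A small userspace version of that experiment (x86-64 gcc assumed):
build once with -mpreferred-stack-boundary=4 and once with =3, then
compare what the callee sees:

#include <stdint.h>
#include <stdio.h>

static void __attribute__((noinline)) callee(void)
{
        uintptr_t sp;

        /* read %rsp inside the callee's body */
        asm volatile ("mov %%rsp, %0" : "=r" (sp));
        printf("rsp %% 16 in callee = %lu\n", (unsigned long)(sp % 16));
}

int main(void)
{
        callee();
        return 0;
}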


Re: x86-64: Maintain 16-byte stack alignment

2017-01-12 Thread Andy Lutomirski
On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> <torva...@linux-foundation.org> wrote:
>> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoim...@redhat.com> 
>> > wrote:
>> >>
>> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> functions to ensure their stacks are 16-byte aligned.
>> >>
>> >> It's certainly possible, but I don't see how that solves the problem.
>> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> something?
>> >
>> > I think the argument is that we *could* try to align things, if we
>> > just had some tool that actually then verified that we aren't missing
>> > anything.
>> >
>> > I'm not entirely happy with checking the generated code, though,
>> > because as Ingo says, you have a 50:50 chance of just getting it right
>> > by mistake. So I'd much rather have some static tool that checks
>> > things at a code level (ie coccinelle or sparse).
>>
>> What I meant was checking the entry code to see if it aligns stack
>> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> alignment for real may actually be entirely a lost cause.  After all,
>> I think we have some inline functions that do asm volatile ("call
>> ..."), and I don't see any credible way of forcing alignment short of
>> generating an entirely new stack frame and aligning that.
>
> Actually we already found all such cases and fixed them by forcing a new
> stack frame, thanks to objtool.  For example, see 55a76b59b5fe.

What I mean is: what guarantees that the stack is properly aligned for
the subroutine call?  gcc promises to set up a stack frame, but does
it promise that rsp will be properly aligned to call a C function?


Re: x86-64: Maintain 16-byte stack alignment

2017-01-12 Thread Andy Lutomirski
On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds wrote:
> On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf  wrote:
>>
>> Just to clarify, I think you're asking if, for versions of gcc which
>> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> functions to ensure their stacks are 16-byte aligned.
>>
>> It's certainly possible, but I don't see how that solves the problem.
>> The stack will still be misaligned by entry code.  Or am I missing
>> something?
>
> I think the argument is that we *could* try to align things, if we
> just had some tool that actually then verified that we aren't missing
> anything.
>
> I'm not entirely happy with checking the generated code, though,
> because as Ingo says, you have a 50:50 chance of just getting it right
> by mistake. So I'd much rather have some static tool that checks
> things at a code level (ie coccinelle or sparse).

What I meant was checking the entry code to see if it aligns stack
frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
alignment for real may actually be entirely a lost cause.  After all,
I think we have some inline functions that do asm volatile ("call
..."), and I don't see any credible way of forcing alignment short of
generating an entirely new stack frame and aligning that.  Ick.  This
whole situation stinks, and I wish that the gcc developers had been
less daft here in the first place or that we'd noticed and gotten it
fixed much longer ago.

Can we come up with a macro like STACK_ALIGN_16 that turns into
__aligned__(32) on bad gcc versions and combine that with your sparse
patch?
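
A sketch of what that macro could look like, assuming the kernel's
GCC_VERSION macro; the exact version cutoff is a placeholder for
"whenever the gcc bug was fixed":

/* Over-align stack objects on compilers whose 16-byte stack alignment
 * is untrustworthy; either branch gives at least 16-byte alignment. */
#if GCC_VERSION < 40800
# define STACK_ALIGN_16 __attribute__((__aligned__(32)))
#else
# define STACK_ALIGN_16 __attribute__((__aligned__(16)))
#endif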


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 11:05 PM, Herbert Xu wrote:
> On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote:
>>
>> I'm pretty sure we have random asm code that may not maintain a
>> 16-byte stack alignment when it calls other code (including, in some
>> cases, calling C code).
>>
>> So I'm not at all convinced that this is a good idea. We shouldn't
>> expect 16-byte alignment to be something trustworthy.
>
> So what if we audited all the x86 assembly code to fix this? Would
> it then be acceptable to do a 16-byte aligned stack?
>
> On the face of it it doesn't seem to be a huge amount of code
> assuming they mostly live under arch/x86.

The problem is that we have nasties like TRACE_IRQS_OFF.  Performance
doesn't really matter for these macros, so we could probably rig up a
helper to forcibly align the stack there.  Maybe
FRAME_BEGIN_FORCE_ALIGN?  I also think I'd rather not modify
pt_regs.  We should just fix the small number of code paths that
create a pt_regs and then call into C code so that they align the stack.

But if we can't do this with automatic verification, then I'm not sure
I want to do it at all.  The asm is already more precarious than I'd
like, and having a code path that is misaligned is asking for obscure
bugs down the road.

--Andy


Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 11:47 PM, Herbert Xu
<herb...@gondor.apana.org.au> wrote:
> Andy Lutomirski <l...@kernel.org> wrote:
>> There are some hashes (e.g. sha224) that have some internal trickery
>> to make sure that only the correct number of output bytes are
>> generated.  If something goes wrong, they could potentially overrun
>> the output buffer.
>>
>> Make the test more robust by allocating only enough space for the
>> correct output size so that memory debugging will catch the error if
>> the output is overrun.
>>
>> Tested by intentionally breaking sha224 to output all 256
>> internally-generated bits while running on KASAN.
>>
>> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Cc: Herbert Xu <herb...@gondor.apana.org.au>
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
>
> This patch doesn't seem to depend on anything else in the series.
> Do you want me to take it separately?

Yes, please.  Its only relation to the rest of the series is that I
wanted to make sure that I didn't mess up sha224's finalization code.

--Andy


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 10:01 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Jan 10, 2017 at 8:35 PM, Herbert Xu <herb...@gondor.apana.org.au> 
> wrote:
>> On Tue, Jan 10, 2017 at 08:17:17PM -0800, Linus Torvalds wrote:
>>>
>>> That said, I do think that the "don't assume stack alignment, do it by
>>> hand" may be the safer thing. Because who knows what the random rules
>>> will be on other architectures.
>>
>> Sure we can ban the use of attribute aligned on stacks.  But
>> what about indirect uses through structures?  For example, if
>> someone does
>>
>> struct foo {
>> } __attribute__ ((__aligned__(16)));
>>
>> int bar(...)
>> {
>> struct foo f;
>>
>> return baz();
>> }
>>
>> then baz will end up with an unaligned argument.  The worst part
>> is that it is not at all obvious to the person writing the function
>> bar.
>
> Linus, I'm starting to lean toward agreeing with Herbert here, except
> that we should consider making it conditional on having a silly GCC
> version.  After all, the silly GCC versions are wasting space and time
> with alignment instructions no matter what we do, so this would just
> mean tweaking the asm and adding some kind of check_stack_alignment()
> helper to throw out a WARN_ONCE() if we miss one.  The problem with
> making it conditional is that making pt_regs effectively live at a
> variable offset from %rsp is just nasty.

So actually doing this is gross because we have calls from asm to C
all over the place.  But... maybe we can automate all the testing.
Josh, how hard would it be to teach objtool to (if requested by an
option) check that stack frames with statically known size preserve
16-byte stack alignment?

I find it rather annoying that gcc before 4.8 malfunctions when it
sees __aligned__(16) on x86_64 kernels.  Sigh.

--Andy


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 12:09 AM, Herbert Xu wrote:
> On Wed, Jan 11, 2017 at 08:06:54AM +, Ard Biesheuvel wrote:
>>
>> Couldn't we update the __aligned(x) macro to emit 32 if arch == x86
>> and x == 16? All other cases should work just fine afaict
>
> Not everyone uses that macro.  You'd also need to add some checks
> to stop people from using the gcc __attribute__ directly.
>

You'd also have to audit things to make sure that __aligned__(16)
isn't being used for non-stack purposes.  After all, __aligned__(16)
in static data is fine, and it's also fine as a promise to GCC that
some object is 16-byte aligned.

--Andy
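
Concretely, the cases such an audit would need to separate (a hedged
illustration; only the stack case trips the pre-4.8 gcc bug):

/* static data: __aligned__(16) is honored and perfectly fine */
static unsigned char key_schedule[64] __attribute__((__aligned__(16)));

/* a promise to gcc about an object aligned elsewhere: also fine */
struct ctx {
        unsigned char state[64];
} __attribute__((__aligned__(16)));

void f(void)
{
        /* stack object: the one case pre-4.8 gcc on x86-64 cannot
         * reliably honor, so a blanket s/16/32/ would needlessly
         * change the two uses above */
        unsigned char buf[64] __attribute__((__aligned__(16)));
        (void)buf;
}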


Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 1:09 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> Hi Andy,
>
> On 01/11/2017 04:11 AM, Andy Lutomirski wrote:
>>
>> On Tue, Jan 10, 2017 at 4:50 PM, Daniel Borkmann <dan...@iogearbox.net>
>> wrote:
>>>
>>> On 01/11/2017 12:24 AM, Andy Lutomirski wrote:
>>>>
>>>>
>>>> This makes it easier to add another digest algorithm down the road if
>>>> needed.  It also serves to force any programs that might have been
>>>> written against a kernel that had the old field name to notice the
>>>> change and make any necessary changes.
>>>>
>>>> This shouldn't violate any stable API policies, as no released kernel
>>>> has ever had TCA*BPF_DIGEST.
>>>
>>>
>>> Imho, this and patch 6/8 are not really needed. Should another
>>> digest alg ever be used (I doubt it), you'd need a new nl attribute
>>> and fdinfo line anyway to keep existing stuff intact. Nobody claimed
>>> that you can just change this underneath without respecting the ABI
>>> for existing applications, which is what I read from the statement
>>> above that such apps will now be "forced" to notice the change.
>>
>>
>> Fair enough.  I was more concerned about prerelease iproute2 versions,
>> but maybe that's a nonissue.  I'll drop these two patches.
>
>
> Ok. Having slept on this a bit, how about a general rename to
> "prog_tag" for fdinfo, and TCA_BPF_TAG resp. TCA_ACT_BPF_TAG for
> the netlink attributes? Fwiw, it might reduce any assumptions being
> made about this. If that would be preferable, I could cook up that
> patch against -net for the rename.

That would be fine with me.

I think there are two reasonable approaches to computing the actual tag.

1. Use a standard, modern cryptographic hash.  SHA-256, SHA-512,
Blake2b, whatever.  SHA-1 is a bad choice in part because it's partly
broken and in part because the implementation in lib/ is a real mess
to use (as you noticed while writing the code).

2. Use whatever algorithm you like but make the tag so short that it's
obviously not collision-free.  48 or 64 bits is probably reasonable.

The intermediate versions are just asking for trouble.  Alexei wants
to make the tag shorter, but I admit I still don't understand why he
prefers that over using a better crypto hash and letting user code
truncate the tag if it wants.
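
What option 2 looks like in practice (equivalently, user code
truncating a full digest under option 1) -- a minimal sketch, with the
64-bit width taken from the sizes suggested above:

#include <stdint.h>
#include <string.h>

/* derive a short, obviously-not-collision-free tag from a full
 * 32-byte SHA-256 program digest */
static uint64_t short_tag(const uint8_t digest[32])
{
        uint64_t tag;

        memcpy(&tag, digest, sizeof(tag));  /* keep the first 64 bits */
        return tag;
}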


Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 7:13 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Andy Lutomirski
>> Sent: 10 January 2017 23:25
>> There are some hashes (e.g. sha224) that have some internal trickery
>> to make sure that only the correct number of output bytes are
>> generated.  If something goes wrong, they could potentially overrun
>> the output buffer.
>>
>> Make the test more robust by allocating only enough space for the
>> correct output size so that memory debugging will catch the error if
>> the output is overrun.
>
> Might be better to test this by allocating an overlong buffer
> and then explicitly checking that the output hasn't overrun
> the allowed space.
>
> If nothing else the error message will be clearer.

I thought about that, but then I'd have to figure out what poison
value to use.  Both KASAN and the usual slab debugging are quite good
at this kind of checking, and KASAN will even point you to the
problematic code directly.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 8:35 PM, Herbert Xu  wrote:
> On Tue, Jan 10, 2017 at 08:17:17PM -0800, Linus Torvalds wrote:
>>
>> That said, I do think that the "don't assume stack alignment, do it by
>> hand" may be the safer thing. Because who knows what the random rules
>> will be on other architectures.
>
> Sure we can ban the use of attribute aligned on stacks.  But
> what about indirect uses through structures?  For example, if
> someone does
>
> struct foo {
> } __attribute__ ((__aligned__(16)));
>
> int bar(...)
> {
> struct foo f;
>
> return baz();
> }
>
> then baz will end up with an unaligned argument.  The worst part
> is that it is not at all obvious to the person writing the function
> bar.

Linus, I'm starting to lean toward agreeing with Herbert here, except
that we should consider making it conditional on having a silly GCC
version.  After all, the silly GCC versions are wasting space and time
with alignment instructions no matter what we do, so this would just
mean tweaking the asm and adding some kind of check_stack_alignment()
helper to throw out a WARN_ONCE() if we miss one.  The problem with
making it conditional is that making pt_regs effectively live at a
variable offset from %rsp is just nasty.

--Andy


Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256

2017-01-10 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 4:50 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 01/11/2017 12:24 AM, Andy Lutomirski wrote:
>>
>> This makes it easier to add another digest algorithm down the road if
>> needed.  It also serves to force any programs that might have been
>> written against a kernel that had the old field name to notice the
>> change and make any necessary changes.
>>
>> This shouldn't violate any stable API policies, as no released kernel
>> has ever had TCA*BPF_DIGEST.
>
>
> Imho, this and patch 6/8 are not really needed. Should another
> digest alg ever be used (I doubt it), you'd need a new nl attribute
> and fdinfo line anyway to keep existing stuff intact. Nobody claimed
> that you can just change this underneath without respecting the ABI
> for existing applications, which is what I read from the statement
> above that such apps will now be "forced" to notice the change.

Fair enough.  I was more concerned about prerelease iproute2 versions,
but maybe that's a nonissue.  I'll drop these two patches.

--Andy


[PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256

2017-01-10 Thread Andy Lutomirski
This makes it easier to add another digest algorithm down the road if
needed.  It also serves to force any programs that might have been
written against a kernel that had the old field name to notice the
change and make any necessary changes.

This shouldn't violate any stable API policies, as no released kernel
has ever had TCA*BPF_DIGEST.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 include/uapi/linux/pkt_cls.h       | 2 +-
 include/uapi/linux/tc_act/tc_bpf.h | 2 +-
 net/sched/act_bpf.c                | 2 +-
 net/sched/cls_bpf.c                | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc58543..ac6b300c1550 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -397,7 +397,7 @@ enum {
TCA_BPF_NAME,
TCA_BPF_FLAGS,
TCA_BPF_FLAGS_GEN,
-   TCA_BPF_DIGEST,
+   TCA_BPF_SHA256,
__TCA_BPF_MAX,
 };
 
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
index a6b88a6f7f71..eae18a7430eb 100644
--- a/include/uapi/linux/tc_act/tc_bpf.h
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -27,7 +27,7 @@ enum {
TCA_ACT_BPF_FD,
TCA_ACT_BPF_NAME,
TCA_ACT_BPF_PAD,
-   TCA_ACT_BPF_DIGEST,
+   TCA_ACT_BPF_SHA256,
__TCA_ACT_BPF_MAX,
 };
 #define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1c60317f0121..3868a66d0b24 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -123,7 +123,7 @@ static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
return -EMSGSIZE;
 
-   nla = nla_reserve(skb, TCA_ACT_BPF_DIGEST,
+   nla = nla_reserve(skb, TCA_ACT_BPF_SHA256,
  sizeof(prog->filter->digest));
if (nla == NULL)
return -EMSGSIZE;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index adc776048d1a..6fc110321621 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -555,7 +555,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
return -EMSGSIZE;
 
-   nla = nla_reserve(skb, TCA_BPF_DIGEST, sizeof(prog->filter->digest));
+   nla = nla_reserve(skb, TCA_BPF_SHA256, sizeof(prog->filter->digest));
if (nla == NULL)
return -EMSGSIZE;
 
-- 
2.9.3



[PATCH v2 4/8] bpf: Use SHA256 instead of SHA1 for bpf digests

2017-01-10 Thread Andy Lutomirski
SHA1 is considered obsolete.  It is no longer considered to be
collision resistant and, in general, it should not be used for new
applications.  Change the new-in-4.10 BPF digest to SHA-256.  This
means that potential future applications of the digest that need
collision resistance will be able to use the BPF digest.  Applications
that just want a short identifier for a BPF program are welcome to
truncate the digest.

This is also a cleanup IMO -- the new sha256_*_direct() API is much
nicer than the old SHA1 library helpers.  It will also enable
incremental hashing so the BPF code can avoid calling vmalloc().

I moved the digest field to keep all of the bpf program metadata in
the same cache line.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 include/linux/filter.h | 11 +++
 init/Kconfig           |  1 +
 kernel/bpf/core.c      | 42 --
 3 files changed, 12 insertions(+), 42 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 702314253797..23df2574e30c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -14,7 +14,8 @@
 #include 
 #include 
 #include 
-#include <linux/cryptohash.h>
+
+#include <crypto/sha.h>
 
 #include 
 
@@ -408,11 +409,11 @@ struct bpf_prog {
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
-   u32 digest[SHA_DIGEST_WORDS]; /* Program digest */
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
unsigned int(*bpf_func)(const void *ctx,
const struct bpf_insn *insn);
+   u8  digest[SHA256_DIGEST_SIZE]; /* Program digest */
/* Instructions for interpreter */
union {
struct sock_filter  insns[0];
@@ -519,12 +520,6 @@ static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
return prog->len * sizeof(struct bpf_insn);
 }
 
-static inline u32 bpf_prog_digest_scratch_size(const struct bpf_prog *prog)
-{
-   return round_up(bpf_prog_insn_size(prog) +
-   sizeof(__be64) + 1, SHA_MESSAGE_BYTES);
-}
-
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
return max(sizeof(struct bpf_prog),
diff --git a/init/Kconfig b/init/Kconfig
index 223b734abccd..f1ea6d023f8c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1389,6 +1389,7 @@ config HAVE_PCSPKR_PLATFORM
 # interpreter that classic socket filters depend on
 config BPF
bool
+   select CRYPTO_SHA256_DIRECT
 
 menuconfig EXPERT
bool "Configure standard kernel features (expert users)"
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1eb4f1303756..668b92f6ab58 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -148,22 +148,18 @@ void __bpf_prog_free(struct bpf_prog *fp)
 
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
-   const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
-   u32 raw_size = bpf_prog_digest_scratch_size(fp);
-   u32 ws[SHA_WORKSPACE_WORDS];
-   u32 i, bsize, psize, blocks;
+   struct sha256_state sha;
+   u32 i, psize;
struct bpf_insn *dst;
bool was_ld_map;
-   u8 *raw, *todo;
-   __be32 *result;
-   __be64 *bits;
+   u8 *raw;
 
-   raw = vmalloc(raw_size);
+   psize = bpf_prog_insn_size(fp);
+   raw = vmalloc(psize);
if (!raw)
return -ENOMEM;
 
-   sha_init(fp->digest);
-   memset(ws, 0, sizeof(ws));
+   sha256_init_direct(&sha);
 
/* We need to take out the map fd for the digest calculation
 * since they are unstable from user space side.
@@ -188,30 +184,8 @@ int bpf_prog_calc_digest(struct bpf_prog *fp)
}
}
 
-   psize = bpf_prog_insn_size(fp);
-   memset(&raw[psize], 0, raw_size - psize);
-   raw[psize++] = 0x80;
-
-   bsize  = round_up(psize, SHA_MESSAGE_BYTES);
-   blocks = bsize / SHA_MESSAGE_BYTES;
-   todo   = raw;
-   if (bsize - psize >= sizeof(__be64)) {
-   bits = (__be64 *)(todo + bsize - sizeof(__be64));
-   } else {
-   bits = (__be64 *)(todo + bsize + bits_offset);
-   blocks++;
-   }
-   *bits = cpu_to_be64((psize - 1) << 3);
-
-   while (blocks--) {
-   sha_transform(fp->digest, todo, ws);
-   todo += SHA_MESSAGE_BYTES;
-   }
-
-   result = (__force __be32 *)fp->digest;
-   for (i = 0; i < SHA_DIGEST_WORDS; i++)
-   result[i] = cpu_to_be32(fp->digest[i]);
-
+   sha256_update_direct(&sha, raw, psize);
+   sha256_final_direct(&sha, fp->digest);
vfree(raw);
  

[PATCH v2 6/8] bpf: Rename fdinfo's prog_digest to prog_sha256

2017-01-10 Thread Andy Lutomirski
This makes it easier to add another digest algorithm down the road
if needed.  It also serves to force any programs that might have
been written against a kernel that had 'prog_digest' to be updated.

This shouldn't violate any stable API policies, as no released
kernel has ever had 'prog_digest'.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e89acea22ecf..956370b80296 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -694,7 +694,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
seq_printf(m,
   "prog_type:\t%u\n"
   "prog_jited:\t%u\n"
-  "prog_digest:\t%s\n"
+  "prog_sha256:\t%s\n"
   "memlock:\t%llu\n",
   prog->type,
   prog->jited,
-- 
2.9.3



[PATCH v2 3/8] crypto/sha256: Build the SHA256 core separately from the crypto module

2017-01-10 Thread Andy Lutomirski
This just moves code around -- no code changes in this patch.  This
will let BPF-based tracing link against the SHA256 core code without
depending on the crypto core.

Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 crypto/Kconfig  |   8 ++
 crypto/Makefile |   1 +
 crypto/sha256_direct.c  | 238 
 crypto/sha256_generic.c | 214 ---
 4 files changed, 247 insertions(+), 214 deletions(-)
 create mode 100644 crypto/sha256_direct.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 160f08e721cc..b83ae6789e78 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -10,6 +10,13 @@ config XOR_BLOCKS
 source "crypto/async_tx/Kconfig"
 
 #
+# Cryptographic algorithms that are usable without the Crypto API.
+# None of these should have visible config options.
+#
+config CRYPTO_SHA256_DIRECT
+   bool
+
+#
 # Cryptographic API Configuration
 #
 menuconfig CRYPTO
@@ -763,6 +770,7 @@ config CRYPTO_SHA512_MB
 
 config CRYPTO_SHA256
tristate "SHA224 and SHA256 digest algorithm"
+   select CRYPTO_SHA256_DIRECT
select CRYPTO_HASH
help
  SHA256 secure hash standard (DFIPS 180-2).
diff --git a/crypto/Makefile b/crypto/Makefile
index b8f0e3eb0791..e0118a3b0d99 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_CRYPTO_RMD160) += rmd160.o
 obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
+obj-$(CONFIG_CRYPTO_SHA256_DIRECT) += sha256_direct.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
diff --git a/crypto/sha256_direct.c b/crypto/sha256_direct.c
new file mode 100644
index ..2029e4c08339
--- /dev/null
+++ b/crypto/sha256_direct.c
@@ -0,0 +1,238 @@
+/*
+ * Cryptographic API.
+ *
+ * SHA-256, as specified in
+ * http://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf
+ *
+ * SHA-256 code by Jean-Luc Cooke <jlco...@certainkey.com>.
+ *
+ * Copyright (c) Jean-Luc Cooke <jlco...@certainkey.com>
+ * Copyright (c) Andrew McDonald <and...@mcdonald.org.uk>
+ * Copyright (c) 2002 James Morris <jmor...@intercode.com.au>
+ * SHA224 Support Copyright 2007 Intel Corporation <jonathan.ly...@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option) 
+ * any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static inline u32 Ch(u32 x, u32 y, u32 z)
+{
+   return z ^ (x & (y ^ z));
+}
+
+static inline u32 Maj(u32 x, u32 y, u32 z)
+{
+   return (x & y) | (z & (x | y));
+}
+
+#define e0(x)   (ror32(x, 2) ^ ror32(x,13) ^ ror32(x,22))
+#define e1(x)   (ror32(x, 6) ^ ror32(x,11) ^ ror32(x,25))
+#define s0(x)   (ror32(x, 7) ^ ror32(x,18) ^ (x >> 3))
+#define s1(x)   (ror32(x,17) ^ ror32(x,19) ^ (x >> 10))
+
+static inline void LOAD_OP(int I, u32 *W, const u8 *input)
+{
+   W[I] = get_unaligned_be32((__u32 *)input + I);
+}
+
+static inline void BLEND_OP(int I, u32 *W)
+{
+   W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+}
+
+static void sha256_transform(u32 *state, const u8 *input)
+{
+   u32 a, b, c, d, e, f, g, h, t1, t2;
+   u32 W[64];
+   int i;
+
+   /* load the input */
+   for (i = 0; i < 16; i++)
+   LOAD_OP(i, W, input);
+
+   /* now blend */
+   for (i = 16; i < 64; i++)
+   BLEND_OP(i, W);
+
+   /* load the state into our registers */
+   a=state[0];  b=state[1];  c=state[2];  d=state[3];
+   e=state[4];  f=state[5];  g=state[6];  h=state[7];
+
+   /* now iterate */
+   t1 = h + e1(e) + Ch(e,f,g) + 0x428a2f98 + W[ 0];
+   t2 = e0(a) + Maj(a,b,c);d+=t1;h=t1+t2;
+   t1 = g + e1(d) + Ch(d,e,f) + 0x71374491 + W[ 1];
+   t2 = e0(h) + Maj(h,a,b);c+=t1;g=t1+t2;
+   t1 = f + e1(c) + Ch(c,d,e) + 0xb5c0fbcf + W[ 2];
+   t2 = e0(g) + Maj(g,h,a);b+=t1;f=t1+t2;
+   t1 = e + e1(b) + Ch(b,c,d) + 0xe9b5dba5 + W[ 3];
+   t2 = e0(f) + Maj(f,g,h);a+=t1;e=t1+t2;
+   t1 = d + e1(a) + Ch(a,b,c) + 0x3956c25b + W[ 4];
+   t2 = e0(e) + Maj(e,f,g);h+=t1;d=t1+t2;
+   t1 = c + e1(h) + Ch(h,a,b) + 0x59f111f1 + W[ 5];
+   t2 = e0(d) + Maj(d,e,f);g+=t1;c=t1+t2;
+   t1 = b + e1(g) + Ch(g,h,a) + 0x923f82a4 + W[ 6];
+   t2 = e0(c) + Maj(c,d,e);f+=t1;b=t1+t2;
+   t1 = a + e1(f) + Ch(f,g,h) + 0xab1c5ed5 + W[ 7];
+   t2 = e0(b) + Maj(b,c,d);e+=t1;a=t1+t2;
+
+   t1 =

Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 12:00 PM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 10 January 2017 at 19:22, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Tue, Jan 10, 2017 at 11:16 AM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 10 January 2017 at 19:00, Andy Lutomirski <l...@amacapital.net> wrote:
>>>> On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel
>>>> <ard.biesheu...@linaro.org> wrote:
>>>>> On 10 January 2017 at 14:33, Herbert Xu <herb...@gondor.apana.org.au> 
>>>>> wrote:
>>>>>> I recently applied the patch
>>>>>>
>>>>>> https://patchwork.kernel.org/patch/9468391/
>>>>>>
>>>>>> and ended up with a boot crash when it tried to run the x86 chacha20
>>>>>> code.  It turned out that the patch changed a manually aligned
>>>>>> stack buffer to one that is aligned by gcc.  What was happening was
>>>>>> that gcc can stack align to any value on x86-64 except 16.  The
>>>>>> reason is that gcc assumes that the stack is always 16-byte aligned,
>>>>>> which is not actually the case in the kernel.
>>>>>>
>>>>>
>>>>> Apologies for introducing this breakage. It seemed like an obvious and
>>>>> simple cleanup, so I didn't even bother to mention it in the commit
>>>>> log, but if the kernel does not guarantee 16-byte alignment, I guess
>>>>> we should revert to the old method. If SSE instructions are the only
>>>>> ones that require this alignment, then I suppose not having an
>>>>> ABI-conforming stack pointer should not be an issue in general.
>>>>
>>>> Here's what I think is really going on.  This is partially from
>>>> memory, so I could be off base.  The kernel is up against
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383, which means that,
>>>> on some GCC versions (like the bad one and maybe even current ones),
>>>> things compiled without -mno-sse can't have the stack alignment set
>>>> properly.  IMO we should fix this in the affected code, not the entry
>>>> code.  In fact, I think that fixing it in the entry code won't even
>>>> fully fix it because modern GCC will compile the rest of the kernel
>>>> with 8-byte alignment and the stack will get randomly unaligned (GCC
>>>> 4.8 and newer).
>>>>
>>>> Can we just add __attribute__((force_align_arg_pointer)) to the
>>>> affected functions?  Maybe have:
>>>>
>>>> #define __USES_SSE __attribute__((force_align_arg_pointer))
>>>>
>>>> on affected gcc versions?
>>>>
>>>> ***HOWEVER***
>>>>
>>>> I think this is missing the tree for the supposed forest.  The actual
>>>> affected code appears to be:
>>>>
>>>> static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist 
>>>> *dst,
>>>>  struct scatterlist *src, unsigned int nbytes)
>>>> {
>>>> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
>>>>
>>>> ...
>>>>
>>>> state = (u32 *)roundup((uintptr_t)state_buf, CHACHA20_STATE_ALIGN);
>>>>
>>>> gcc presumably infers (incorrectly) that state_buf is 16-byte aligned
>>>> and optimizes out the roundup.  How about just declaring an actual
>>>> __aligned(16) buffer, marking the function
>>>> __attribute__((force_align_arg_pointer)), and being done with it?
>>>> After all, we need that forcible alignment on *all* gcc versions.
>>>>
>>>
>>> Actually, the breakage is introduced by the patch Herbert refers to
>>>
>>> https://patchwork.kernel.org/patch/9468391/
>>>
>>> where the state is replaced by a simple
>>>
>>> u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
>>>
>>> which seemed harmless enough to me. So the code above works fine.
>>
>> So how about just the one-line patch of adding the
>> force_align_arg_pointer?  Would that solve the problem?
>
> If it does what it says on the tin, it should fix the issue, but after
> adding the attribute, I get the exact same object output, so there's
> something dodgy going on here.

Ugh, that's annoying.  Maybe it needs noinline too?

--Andy
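
For reference, a userspace-compilable sketch of the combination being
floated here, with CHACHA20_STATE_ALIGN defined locally for
illustration:

#include <stdint.h>
#include <stdio.h>

#define CHACHA20_STATE_ALIGN 16

static void __attribute__((force_align_arg_pointer, noinline))
chacha20_state_demo(void)
{
        uint32_t state[16] __attribute__((aligned(CHACHA20_STATE_ALIGN)));

        /* force_align_arg_pointer makes gcc realign the frame on entry,
         * so the aligned() request holds even from a misaligned caller */
        printf("state %% 16 = %lu\n",
               (unsigned long)((uintptr_t)state % 16));
}

int main(void)
{
        chacha20_state_demo();
        return 0;
}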


[PATCH v2 5/8] bpf: Avoid copying the entire BPF program when hashing it

2017-01-10 Thread Andy Lutomirski
The sha256 helpers can consume a message incrementally, so there's no need
to allocate a buffer to store the whole blob to be hashed.

This may be a slight slowdown for very long messages because gcc can't
inline the sha256_update() calls.  For reasonably-sized programs,
however, this should be a considerable speedup as vmalloc() is quite
slow.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 kernel/bpf/core.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 668b92f6ab58..106162a1bc54 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -149,44 +149,37 @@ void __bpf_prog_free(struct bpf_prog *fp)
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
struct sha256_state sha;
-   u32 i, psize;
-   struct bpf_insn *dst;
+   u32 i;
bool was_ld_map;
-   u8 *raw;
-
-   psize = bpf_prog_insn_size(fp);
-   raw = vmalloc(psize);
-   if (!raw)
-   return -ENOMEM;
 
sha256_init_direct(&sha);
 
/* We need to take out the map fd for the digest calculation
 * since they are unstable from user space side.
 */
-   dst = (void *)raw;
for (i = 0, was_ld_map = false; i < fp->len; i++) {
-   dst[i] = fp->insnsi[i];
+   struct bpf_insn insn = fp->insnsi[i];
+
if (!was_ld_map &&
-   dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-   dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+   insn.code == (BPF_LD | BPF_IMM | BPF_DW) &&
+   insn.src_reg == BPF_PSEUDO_MAP_FD) {
was_ld_map = true;
-   dst[i].imm = 0;
+   insn.imm = 0;
} else if (was_ld_map &&
-  dst[i].code == 0 &&
-  dst[i].dst_reg == 0 &&
-  dst[i].src_reg == 0 &&
-  dst[i].off == 0) {
+  insn.code == 0 &&
+  insn.dst_reg == 0 &&
+  insn.src_reg == 0 &&
+  insn.off == 0) {
was_ld_map = false;
-   dst[i].imm = 0;
+   insn.imm = 0;
} else {
was_ld_map = false;
}
+
+   sha256_update_direct(&sha, (const u8 *)&insn, sizeof(insn));
}
 
-   sha256_update_direct(&sha, raw, psize);
sha256_final_direct(&sha, fp->digest);
-   vfree(raw);
return 0;
 }
 
-- 
2.9.3



[PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-10 Thread Andy Lutomirski
There are some hashes (e.g. sha224) that have some internal trickery
to make sure that only the correct number of output bytes are
generated.  If something goes wrong, they could potentially overrun
the output buffer.

Make the test more robust by allocating only enough space for the
correct output size so that memory debugging will catch the error if
the output is overrun.

Tested by intentionally breaking sha224 to output all 256
internally-generated bits while running on KASAN.

Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 crypto/testmgr.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index f616ad74cce7..575fc28a9ab2 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -265,6 +265,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
   const int align_offset)
 {
const char *algo = crypto_tfm_alg_driver_name(crypto_ahash_tfm(tfm));
+   size_t digest_size = crypto_ahash_digestsize(tfm);
unsigned int i, j, k, temp;
struct scatterlist sg[8];
char *result;
@@ -275,7 +276,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
char *xbuf[XBUFSIZE];
int ret = -ENOMEM;
 
-   result = kmalloc(MAX_DIGEST_SIZE, GFP_KERNEL);
+   result = kmalloc(digest_size, GFP_KERNEL);
if (!result)
return ret;
key = kmalloc(MAX_KEYLEN, GFP_KERNEL);
@@ -305,7 +306,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
goto out;
 
j++;
-   memset(result, 0, MAX_DIGEST_SIZE);
+   memset(result, 0, digest_size);
 
hash_buff = xbuf[0];
hash_buff += align_offset;
@@ -380,7 +381,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
continue;
 
j++;
-   memset(result, 0, MAX_DIGEST_SIZE);
+   memset(result, 0, digest_size);
 
temp = 0;
sg_init_table(sg, template[i].np);
@@ -458,7 +459,7 @@ static int __test_hash(struct crypto_ahash *tfm, struct hash_testvec *template,
continue;
 
j++;
-   memset(result, 0, MAX_DIGEST_SIZE);
+   memset(result, 0, digest_size);
 
ret = -EINVAL;
hash_buff = xbuf[0];
-- 
2.9.3



[PATCH v2 2/8] crypto/sha256: Export a sha256_{init,update,final}_direct() API

2017-01-10 Thread Andy Lutomirski
This provides a very simple interface for kernel code to use to do
synchronous, unaccelerated, virtual-address-based SHA256 hashing
without needing to create a crypto context.

Subsequent patches will make this work without building the crypto
core and will use it to avoid making BPF-based tracing depend on
crypto.
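
Usage, as implied by the patch below -- a minimal sketch:

#include <crypto/sha.h>

static void sha256_of_buffer(const u8 *data, unsigned int len,
                             u8 digest[SHA256_DIGEST_SIZE])
{
        struct sha256_state sctx;

        sha256_init_direct(&sctx);
        sha256_update_direct(&sctx, data, len);
        sha256_final_direct(&sctx, digest);
        /* no shash_desc, no tfm, no allocation */
}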

Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 crypto/sha256_generic.c      | 31 ++-
 include/crypto/sha.h         | 24 
 include/crypto/sha256_base.h | 13 -
 3 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index 8f9c47e1a96e..573e114382f9 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -240,24 +240,45 @@ static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
}
 }
 
+void sha256_update_direct(struct sha256_state *sctx, const u8 *data,
+ unsigned int len)
+{
+   __sha256_base_do_update(sctx, data, len, sha256_generic_block_fn);
+}
+EXPORT_SYMBOL(sha256_update_direct);
+
 int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
  unsigned int len)
 {
-   return sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
+   sha256_update_direct(shash_desc_ctx(desc), data, len);
+   return 0;
 }
 EXPORT_SYMBOL(crypto_sha256_update);
 
 static int sha256_final(struct shash_desc *desc, u8 *out)
 {
-   sha256_base_do_finalize(desc, sha256_generic_block_fn);
-   return sha256_base_finish(desc, out);
+   __sha256_final_direct(shash_desc_ctx(desc),
+ crypto_shash_digestsize(desc->tfm), out);
+   return 0;
 }
 
+void __sha256_final_direct(struct sha256_state *sctx, unsigned int digest_size,
+  u8 *out)
+{
+   sha256_do_finalize_direct(sctx, sha256_generic_block_fn);
+   __sha256_base_finish(sctx, digest_size, out);
+}
+EXPORT_SYMBOL(__sha256_final_direct);
+
 int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *hash)
 {
-   sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
-   return sha256_final(desc, hash);
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+   unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+
+   sha256_update_direct(sctx, data, len);
+   __sha256_final_direct(sctx, digest_size, hash);
+   return 0;
 }
 EXPORT_SYMBOL(crypto_sha256_finup);
 
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index c94d3eb1cefd..0df1a0e42c95 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -88,6 +88,30 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+static inline void sha256_init_direct(struct sha256_state *sctx)
+{
+   sctx->state[0] = SHA256_H0;
+   sctx->state[1] = SHA256_H1;
+   sctx->state[2] = SHA256_H2;
+   sctx->state[3] = SHA256_H3;
+   sctx->state[4] = SHA256_H4;
+   sctx->state[5] = SHA256_H5;
+   sctx->state[6] = SHA256_H6;
+   sctx->state[7] = SHA256_H7;
+   sctx->count = 0;
+}
+
+extern void sha256_update_direct(struct sha256_state *sctx, const u8 *data,
+unsigned int len);
+
+extern void __sha256_final_direct(struct sha256_state *sctx,
+ unsigned int digest_size, u8 *out);
+
+static inline void sha256_final_direct(struct sha256_state *sctx, u8 *out)
+{
+   __sha256_final_direct(sctx, SHA256_DIGEST_SIZE, out);
+}
+
 struct shash_desc;
 
 extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index fc77b8e099a7..9bbe73ce458f 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -37,19 +37,6 @@ static inline int sha224_base_init(struct shash_desc *desc)
return 0;
 }
 
-static inline void sha256_init_direct(struct sha256_state *sctx)
-{
-   sctx->state[0] = SHA256_H0;
-   sctx->state[1] = SHA256_H1;
-   sctx->state[2] = SHA256_H2;
-   sctx->state[3] = SHA256_H3;
-   sctx->state[4] = SHA256_H4;
-   sctx->state[5] = SHA256_H5;
-   sctx->state[6] = SHA256_H6;
-   sctx->state[7] = SHA256_H7;
-   sctx->count = 0;
-}
-
 static inline int sha256_base_init(struct shash_desc *desc)
 {
sha256_init_direct(shash_desc_ctx(desc));
-- 
2.9.3



[PATCH v2 0/8] Switch BPF's digest to SHA256

2017-01-10 Thread Andy Lutomirski
I can imagine future uses for the new-in-4.10 BPF digest feature that
would be problematic if malicious users could produce collisions, and
SHA-1 is no longer considered to be collision-free.  Even without
needing collision resistance, SHA-1 is no longer recommended for new
applications.  Switch the BPF digest to SHA-256 instead.

The actual switchover is trivial.  Most of this series consists of
cleanups to the SHA256 code to make it usable as a standalone library
(since BPF should not depend on crypto).

The cleaned up library is much more user-friendly than the SHA-1 code,
so this also significantly tidies up the BPF digest code.

This is intended for 4.10.  If this series misses 4.10 and nothing
takes its place, then we'll have an unpleasant ABI stability
situation.

NB: I would be happy to make parallel changes to the SHA-512 code if
the crypto folks would like for me to do so.  I haven't yet because I
wanted to minimize churn.  Also, the changes will be essentially
identical to the SHA-256 changes and I want to get the latter
reviewed first.

Andy Lutomirski (8):
  crypto/sha256: Factor out the parts of base API that don't use
shash_desc
  crypto/sha256: Export a sha256_{init,update,final}_direct() API
  crypto/sha256: Build the SHA256 core separately from the crypto module
  bpf: Use SHA256 instead of SHA1 for bpf digests
  bpf: Avoid copying the entire BPF program when hashing it
  bpf: Rename fdinfo's prog_digest to prog_sha256
  net: Rename TCA*BPF_DIGEST to ..._SHA256
  crypto/testmgr: Allocate only the required output size for hash tests

 crypto/Kconfig                     |   8 ++
 crypto/Makefile                    |   1 +
 crypto/sha256_direct.c             | 238 +
 crypto/sha256_generic.c            | 215 ++---
 crypto/testmgr.c                   |   9 +-
 include/crypto/sha.h               |  24 
 include/crypto/sha256_base.h       |  58 +
 include/linux/filter.h             |  11 +-
 include/uapi/linux/pkt_cls.h       |   2 +-
 include/uapi/linux/tc_act/tc_bpf.h |   2 +-
 init/Kconfig                       |   1 +
 kernel/bpf/core.c                  |  63 +++---
 kernel/bpf/syscall.c               |   2 +-
 net/sched/act_bpf.c                |   2 +-
 net/sched/cls_bpf.c                |   2 +-
 15 files changed, 343 insertions(+), 295 deletions(-)
 create mode 100644 crypto/sha256_direct.c

-- 
2.9.3



[PATCH v2 1/8] crypto/sha256: Factor out the parts of base API that don't use shash_desc

2017-01-10 Thread Andy Lutomirski
I want to expose a minimal SHA256 API that can be used without the
depending on the crypto core.  To prepare for this, factor out the
meat of the sha256_base_*() helpers.

Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 include/crypto/sha256_base.h | 53 ++--
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index d1f2195bb7de..fc77b8e099a7 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -18,10 +18,8 @@
 typedef void (sha256_block_fn)(struct sha256_state *sst, u8 const *src,
   int blocks);
 
-static inline int sha224_base_init(struct shash_desc *desc)
+static inline void sha224_init_direct(struct sha256_state *sctx)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-
sctx->state[0] = SHA224_H0;
sctx->state[1] = SHA224_H1;
sctx->state[2] = SHA224_H2;
@@ -31,14 +29,16 @@ static inline int sha224_base_init(struct shash_desc *desc)
sctx->state[6] = SHA224_H6;
sctx->state[7] = SHA224_H7;
sctx->count = 0;
+}
 
+static inline int sha224_base_init(struct shash_desc *desc)
+{
+   sha224_init_direct(shash_desc_ctx(desc));
return 0;
 }
 
-static inline int sha256_base_init(struct shash_desc *desc)
+static inline void sha256_init_direct(struct sha256_state *sctx)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
-
sctx->state[0] = SHA256_H0;
sctx->state[1] = SHA256_H1;
sctx->state[2] = SHA256_H2;
@@ -48,16 +48,19 @@ static inline int sha256_base_init(struct shash_desc *desc)
sctx->state[6] = SHA256_H6;
sctx->state[7] = SHA256_H7;
sctx->count = 0;
+}
 
+static inline int sha256_base_init(struct shash_desc *desc)
+{
+   sha256_init_direct(shash_desc_ctx(desc));
return 0;
 }
 
-static inline int sha256_base_do_update(struct shash_desc *desc,
-   const u8 *data,
-   unsigned int len,
-   sha256_block_fn *block_fn)
+static inline void __sha256_base_do_update(struct sha256_state *sctx,
+  const u8 *data,
+  unsigned int len,
+  sha256_block_fn *block_fn)
 {
-   struct sha256_state *sctx = shash_desc_ctx(desc);
unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
 
sctx->count += len;
@@ -86,15 +89,21 @@ static inline int sha256_base_do_update(struct shash_desc *desc,
}
if (len)
memcpy(sctx->buf + partial, data, len);
+}
 
+static inline int sha256_base_do_update(struct shash_desc *desc,
+   const u8 *data,
+   unsigned int len,
+   sha256_block_fn *block_fn)
+{
+   __sha256_base_do_update(shash_desc_ctx(desc), data, len, block_fn);
return 0;
 }
 
-static inline int sha256_base_do_finalize(struct shash_desc *desc,
- sha256_block_fn *block_fn)
+static inline void sha256_do_finalize_direct(struct sha256_state *sctx,
+sha256_block_fn *block_fn)
 {
const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
-   struct sha256_state *sctx = shash_desc_ctx(desc);
__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
 
@@ -109,14 +118,18 @@ static inline int sha256_base_do_finalize(struct shash_desc *desc,
memset(sctx->buf + partial, 0x0, bit_offset - partial);
*bits = cpu_to_be64(sctx->count << 3);
block_fn(sctx, sctx->buf, 1);
+}
 
+static inline int sha256_base_do_finalize(struct shash_desc *desc,
+ sha256_block_fn *block_fn)
+{
+   sha256_do_finalize_direct(shash_desc_ctx(desc), block_fn);
return 0;
 }
 
-static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+static inline void __sha256_base_finish(struct sha256_state *sctx,
+   unsigned int digest_size, u8 *out)
 {
-   unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
-   struct sha256_state *sctx = shash_desc_ctx(desc);
__be32 *digest = (__be32 *)out;
int i;
 
@@ -124,5 +137,11 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
put_unaligned_be32(sctx->state[i], digest++);
 
*sctx = (struct sha256_state){};
+}
+
+static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+   __sha256_base_f

Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 11:16 AM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 10 January 2017 at 19:00, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel
>> <ard.biesheu...@linaro.org> wrote:
>>> On 10 January 2017 at 14:33, Herbert Xu <herb...@gondor.apana.org.au> wrote:
>>>> I recently applied the patch
>>>>
>>>> https://patchwork.kernel.org/patch/9468391/
>>>>
>>>> and ended up with a boot crash when it tried to run the x86 chacha20
>>>> code.  It turned out that the patch changed a manually aligned
>>>> stack buffer to one that is aligned by gcc.  What was happening was
>>>> that gcc can stack align to any value on x86-64 except 16.  The
>>>> reason is that gcc assumes that the stack is always 16-byte aligned,
>>>> which is not actually the case in the kernel.
>>>>
>>>
>>> Apologies for introducing this breakage. It seemed like an obvious and
>>> simple cleanup, so I didn't even bother to mention it in the commit
>>> log, but if the kernel does not guarantee 16 byte alignment, I guess
>>> we should revert to the old method. If SSE instructions are the only
>>> ones that require this alignment, then I suppose not having an ABI
>>> conforming stack pointer should not be an issue in general.
>>
>> Here's what I think is really going on.  This is partially from
>> memory, so I could be off base.  The kernel is up against
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383, which means that,
>> on some GCC versions (like the bad one and maybe even current ones),
>> things compiled without -mno-sse can't have the stack alignment set
>> properly.  IMO we should fix this in the affected code, not the entry
>> code.  In fact, I think that fixing it in the entry code won't even
>> fully fix it because modern GCC will compile the rest of the kernel
>> with 8-byte alignment and the stack will get randomly unaligned (GCC
>> 4.8 and newer).
>>
>> Can we just add __attribute__((force_align_arg_pointer)) to the
>> affected functions?  Maybe have:
>>
>> #define __USES_SSE __attribute__((force_align_arg_pointer))
>>
>> on affected gcc versions?
>>
>> ***HOWEVER***
>>
>> I think this is missing the tree for the supposed forest.  The actual
>> affected code appears to be:
>>
>> static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
>>  struct scatterlist *src, unsigned int nbytes)
>> {
>> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
>>
>> ...
>>
>> state = (u32 *)roundup((uintptr_t)state_buf, CHACHA20_STATE_ALIGN);
>>
>> gcc presumably infers (incorrectly) that state_buf is 16-byte aligned
>> and optimizes out the roundup.  How about just declaring an actual
>> __aligned(16) buffer, marking the function
>> __attribute__((force_align_arg_pointer)), and being done with it?
>> After all, we need that forcible alignment on *all* gcc versions.
>>
>
> Actually, the breakage is introduced by the patch Herbert refers to
>
> https://patchwork.kernel.org/patch/9468391/
>
> where the state is replaced by a simple
>
> u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
>
> which seemed harmless enough to me. So the code above works fine.

So how about just the one-line patch of adding the
force_align_arg_pointer?  Would that solve the problem?
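
Concretely, I mean something like this (untested sketch of the
chacha20_simd() quoted above; the body stays as-is):

static int __attribute__((force_align_arg_pointer))
chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
	      struct scatterlist *src, unsigned int nbytes)
{
	u32 state[16] __aligned(CHACHA20_STATE_ALIGN);

	/* ... unchanged body: gcc now realigns the stack on entry, so
	 * the __aligned(16) buffer really is 16-byte aligned ... */
}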


Re: x86-64: Maintain 16-byte stack alignment

2017-01-10 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 9:05 AM, Linus Torvalds
<torva...@linux-foundation.org> wrote:
> On Tue, Jan 10, 2017 at 6:39 AM, Herbert Xu <herb...@gondor.apana.org.au> 
> wrote:
>>
>> BTW this is with Debian gcc 4.7.2 which does not allow an 8-byte
>> stack alignment as attempted by the Makefile:
>
> I'm pretty sure we have random asm code that may not maintain a
> 16-byte stack alignment when it calls other code (including, in some
> cases, calling C code).

I suspect so.

If we change this, changing pt_regs might make sense but is kind of
weird.  It also needs to be tested with and without frame pointers.

>
> So I'm not at all convinced that this is a good idea. We shouldn't
> expect 16-byte alignment to be something trustworthy.
>
>  Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-27 Thread Andy Lutomirski
On Tue, Dec 27, 2016 at 6:16 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 12/27/2016 10:58 AM, Herbert Xu wrote:
>>
>> On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote:
>>>
>>>
>>> According to Daniel, the networking folks want to let embedded systems
>>> include BPF without requiring the crypto core.
>>
>>
>> Last I checked the IPv4 stack depended on the crypto API so this
>> sounds bogus.
>
>
> I think there's a bit of a mixup here with what I said. To clarify,
> requirement back then from tracing folks was that bpf engine and
> therefore bpf syscall can be build w/o networking enabled for small
> devices, so dependencies preferably need to be kept on a absolute
> minimum, same counts for either making it suddenly a depend on
> CRYPTO or a select CRYPTO for just those few lines that can be
> pulled in from lib/ code instead.

Somehow I had that in my head as "networking" not "tracing", probably
because of the TCA stuff.  Whoops.

Anyway, I'm rewriting the crypto part of the patch completely based on
Ard's feedback.


Re: [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests

2016-12-26 Thread Andy Lutomirski
On Mon, Dec 26, 2016 at 5:36 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Sat, Dec 24, 2016 at 08:59:53PM +0100, Daniel Borkmann wrote:
>> On 12/24/2016 03:22 AM, Andy Lutomirski wrote:
>> >BPF digests are intended to be used to avoid reloading programs that
>> >are already loaded.  For use cases (CRIU?) where untrusted programs
>> >are involved, intentional hash collisions could cause the wrong BPF
>> >program to execute.  Additionally, if BPF digests are ever used
>> >in-kernel to skip verification, a hash collision could give privilege
>> >escalation directly.
>>
>> Just for the record, digests will never ever be used to skip the
>> verification step, so I don't know why this idea even comes up
>> here (?) or is part of the changelog? As this will never be done
>> anyway, rather drop that part so we can avoid confusion on this?
>
> +1 to what Daniel said above.
>
> For the others let me explain what this patch set is actually
> trying to accomplish.

The patch:

a) cleans up the code and

b) uses a cryptographic hash that is actually believed to satisfy the
definition of a cryptographic hash.

There's no excuse for not doing b.

> and I have an obvious NACK for bpf related patches 3,4,5,6.

Did you *read* the ones that were pure cleanups?

>
> sha1 is 20 bytes which is already a bit long to print and copy paste by 
> humans.
> whereas 4 byte jhash is a bit too short, since collisions are not that rare
> and may lead to incorrect assumptions from the users that develop the 
> programs.
> I would prefer something in 6-10 byte range that prevents collisions most of
> the time and short to print as hex, but I don't know of anything like this
> in the existing kernel and inventing bpf specific hash is not great.
> Another requirement for debugging (and prog_digest) that user space
> should be able to produce the same hash without asking kernel, so
> sha1 fits that as well, since it's well known and easy to put into library.

Then truncate them in user space.
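
(For example, with a hypothetical user-space helper -- plain C, nothing
kernel-side required:)

#include <stdio.h>

/* Print the first n bytes of a digest as hex; n = 8 gives a
 * 16-character short ID with a ~2^32 birthday bound. */
static void print_short_digest(const unsigned char *digest, int n)
{
	int i;

	for (i = 0; i < n; i++)
		printf("%02x", digest[i]);
	putchar('\n');
}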


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-26 Thread Andy Lutomirski
On Mon, Dec 26, 2016 at 9:51 AM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> On 26 December 2016 at 07:57, Herbert Xu <herb...@gondor.apana.org.au> wrote:
>> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>>
>>> I actually do use incremental hashing later on.   BPF currently
>>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>>> I change it to hash as it goes.
>>
>> How much data is this supposed to hash on average? If it's a large
>> amount then perhaps using the existing crypto API would be a better
>> option than adding this.
>>
>
> This is a good point actually: you didn't explain *why* BPF shouldn't
> depend on the crypto API.

According to Daniel, the networking folks want to let embedded systems
include BPF without requiring the crypto core.

At some point, I'd also like to use modern hash functions for module
verification.  If doing so would require the crypto core to be
available when modules are loaded, then the crypto core couldn't be
modular.  (Although it occurs to me that my patches get that wrong --
if this change happens, I need to split the code so that the library
functions can be built in even if CRYPTO=m.)

Daniel, would you be okay with BPF selecting CRYPTO and CRYPTO_HASH?

Also, as a bikeshed thought: I could call the functions
sha256_init_direct(), etc.  Then there wouldn't be namespace
collisions and the fact that they bypass accelerated drivers would be
more obvious.

--Andy


Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-24 Thread Andy Lutomirski
On Sat, Dec 24, 2016 at 2:33 AM, Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
> Hi Andy,
>
> On 24 December 2016 at 02:22, Andy Lutomirski <l...@kernel.org> wrote:
>> There are some pieces of kernel code that want to compute SHA256
>> directly without going through the crypto core.  Adjust the exported
>> API to decouple it from the crypto core.
>>
>
> There are a bunch of things happening at the same time in this patch,
> i.e., unnecessary renames of functions with static linkage, return
> type changes to the base prototypes (int (*)(...) to void (*)(...))
> and the change for the base functions to take a struct sha256_state
> ctx rather than a shash_desc. I suppose you are mainly after the
> latter, so could we please drop the other changes?
>
> For the name clashes, could we simply use the crypto_ prefix for the
> globally visible functions rather than using names that are already in
> use? (and having to go around clean up the conflicts)
> As for the return type changes, the base functions intentionally
> return int to allow tail calls from the functions exposed by the
> crypto API (whose prototypes cannot be changed). Unlikely to matter in
> the grand scheme of things (especially now that the base layer
> consists of static inline functions primarily), but it is equally
> pointless to go around and change them to return void IMO.
>
> So what remains is the struct shash_desc to struct sha256_state
> change, which makes sense given that you are after a sha256_digest()
> function that does not require the crypto API. But it seems your use
> case does not rely on incremental hashing, and so there is no reason
> for the state to be exposed outside of the implementation, and we
> could simply expose a crypto_sha256_digest() routine from the
> sha256_generic.c implementation instead.

I actually do use incremental hashing later on.   BPF currently
vmallocs() a big temporary buffer just so it can fill it and hash it.
I change it to hash as it goes.
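
(Concretely, the hashing loop ends up along these lines -- a sketch
using the helper names introduced by this series:)

	struct sha256_state sctx;

	sha256_init(&sctx);
	for (i = 0; i < fp->len; i++) {
		struct bpf_insn insn = fp->insnsi[i];

		/* map fds get zeroed out first; see the bpf patch */
		sha256_update(&sctx, (const u8 *)&insn, sizeof(insn));
	}
	sha256_final(&sctx, fp->digest);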

I painted the bike shed the other way because I thought that crypto_
names should indicate that they're the versions compatible with the
crypto API, but I take your point about churn.  Part of the reason I
didn't want to use crypto_sha256_update is because that function is
currently like this:

int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
  unsigned int len)

and I wanted to avoid churn.  The sha256_update() functions scattered
all over were static, so I didn't worry about them.

I'm going to give this another try as a split-up series that tries to
avoid making any changes beyond simple function renames to the
drivers.

>
> Also, I strongly feel that crypto and other security related patches
> should be tested before being posted, even if they are only RFC,
> especially when they are posted by high profile kernel devs like
> yourself. (Your code incorrectly calls crypto_sha2_final() in a couple
> of places, resulting in the finalization being performed twice, once
> with the accelerated block transform and again with the generic
> transform)
>

I tested it, albeit poorly.  I wanted feedback on the API (thanks!)
and I figured I could more carefully check the implementation once the
API survives a bit of review.  Since it looks like I have to rework
this, I'd need to re-test anyway.

>> I suspect this will very slightly speed up the SHA256 shash operations
>> as well by reducing the amount of indirection involved.
>>
>
> I think you have a valid point when it comes to the complexity of the
> crypto API in general. But the struct sha256_state is embedded in the
> shash_desc rather than referred to via a pointer, so the level of
> indirection does not appear to change. And given how 99.9% of the
> SHA256 execution time is spent in the block transform routine anyway,
> I expect the performance delta to be in the noise tbh.

s/very slightly/negligibly?  There's an extra speedup from avoiding a
variable-length stack allocation, but that probably doesn't matter
much either.

>
> Finally, another thing to keep in mind is that the base layers of
> SHA-1, SHA-256 and SHA-512 are intentionally structured in the same
> way. If there is a need for a digest() entry point, I'd prefer to add
> them for all flavours.

I want to get sha256 right first.  Once it's in good shape, making the
same changes to the other variants should be easy.

>
> Whether this still belongs under crypto or under lib/sha256.c as a
> library function (allowing archs to override it) is open for debate.
> If hashing BPF programs becomes a hot spot, we probably have bigger
> problems.
>
> Regards,
> Ard.
>
> P.S. I do take your point regarding the arch_sha256_block_transform()
> proposed in your follow up email, but ther

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 6:22 PM, Andy Lutomirski <l...@kernel.org> wrote:
> There are some pieces of kernel code that want to compute SHA256
> directly without going through the crypto core.  Adjust the exported
> API to decouple it from the crypto core.
>
> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I should also mention: there's a nice potential cleanup that's
possible on top of this.  Currently, most of the accelerated SHA256
implementations just swap out the block function.  Another approach to
enabling this would be to restructure sha256_update along the lines
of:

sha256_block_fn_t fn = arch_sha256_block_fn(len);
sha256_base_do_update(sctx, data, len, fn);

The idea being that arch code can decide whether to use an accelerated
block function based on context (x86, for example, can't always use
xmm regs) and length (on x86, using the accelerated versions for short
digests is very slow due to the state save/restore that happens) and
then the core code can just use it.

This would allow a lot of the boilerplate that this patch was forced
to modify to be deleted outright.
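
For the record, here's roughly the full shape I have in mind (a sketch;
arch_sha256_block_fn() and sha256_generic_block_fn() are hypothetical
names, nothing like them exists yet):

/* arch code can pick an accelerated transform based on context and
 * length; the fallback is the generic C block function */
#ifndef arch_sha256_block_fn
#define arch_sha256_block_fn(len) sha256_generic_block_fn
#endif

void sha256_update(struct sha256_state *sctx, const u8 *data,
		   unsigned int len)
{
	sha256_base_do_update(sctx, data, len, arch_sha256_block_fn(len));
}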

--Andy


[RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests

2016-12-23 Thread Andy Lutomirski
BPF digests are intended to be used to avoid reloading programs that
are already loaded.  For use cases (CRIU?) where untrusted programs
are involved, intentional hash collisions could cause the wrong BPF
program to execute.  Additionally, if BPF digests are ever used
in-kernel to skip verification, a hash collision could give privilege
escalation directly.

SHA1 is no longer considered adequately collision-resistant (see, for
example, all the major browsers dropping support for SHA1
certificates).  Use SHA256 instead.

I moved the digest field to keep all of the bpf program metadata in
the same cache line.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 include/linux/filter.h | 11 +++
 init/Kconfig   |  1 +
 kernel/bpf/core.c  | 41 +++--
 3 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 702314253797..23df2574e30c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -14,7 +14,8 @@
 #include 
 #include 
 #include 
-#include <linux/cryptohash.h>
+
+#include <crypto/sha.h>
 
 #include 
 
@@ -408,11 +409,11 @@ struct bpf_prog {
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
-   u32 digest[SHA_DIGEST_WORDS]; /* Program digest */
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
unsigned int(*bpf_func)(const void *ctx,
const struct bpf_insn *insn);
+   u8  digest[SHA256_DIGEST_SIZE]; /* Program digest */
/* Instructions for interpreter */
union {
struct sock_filter  insns[0];
@@ -519,12 +520,6 @@ static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
return prog->len * sizeof(struct bpf_insn);
 }
 
-static inline u32 bpf_prog_digest_scratch_size(const struct bpf_prog *prog)
-{
-   return round_up(bpf_prog_insn_size(prog) +
-   sizeof(__be64) + 1, SHA_MESSAGE_BYTES);
-}
-
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
return max(sizeof(struct bpf_prog),
diff --git a/init/Kconfig b/init/Kconfig
index 223b734abccd..5a4e2d99cc38 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1634,6 +1634,7 @@ config BPF_SYSCALL
bool "Enable bpf() system call"
select ANON_INODES
select BPF
+   select CRYPTO_SHA256_LIB
default n
help
  Enable the bpf() system call that allows to manipulate eBPF
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1eb4f1303756..911993863799 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -148,22 +148,18 @@ void __bpf_prog_free(struct bpf_prog *fp)
 
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
-   const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
-   u32 raw_size = bpf_prog_digest_scratch_size(fp);
-   u32 ws[SHA_WORKSPACE_WORDS];
-   u32 i, bsize, psize, blocks;
+   struct sha256_state sha;
+   u32 i, psize;
struct bpf_insn *dst;
bool was_ld_map;
-   u8 *raw, *todo;
-   __be32 *result;
-   __be64 *bits;
+   u8 *raw;
 
-   raw = vmalloc(raw_size);
+   psize = bpf_prog_insn_size(fp);
+   raw = vmalloc(psize);
if (!raw)
return -ENOMEM;
 
-   sha_init(fp->digest);
-   memset(ws, 0, sizeof(ws));
+   sha256_init(&sha);
 
/* We need to take out the map fd for the digest calculation
 * since they are unstable from user space side.
@@ -188,30 +184,7 @@ int bpf_prog_calc_digest(struct bpf_prog *fp)
}
}
 
-   psize = bpf_prog_insn_size(fp);
-   memset(&raw[psize], 0, raw_size - psize);
-   raw[psize++] = 0x80;
-
-   bsize  = round_up(psize, SHA_MESSAGE_BYTES);
-   blocks = bsize / SHA_MESSAGE_BYTES;
-   todo   = raw;
-   if (bsize - psize >= sizeof(__be64)) {
-   bits = (__be64 *)(todo + bsize - sizeof(__be64));
-   } else {
-   bits = (__be64 *)(todo + bsize + bits_offset);
-   blocks++;
-   }
-   *bits = cpu_to_be64((psize - 1) << 3);
-
-   while (blocks--) {
-   sha_transform(fp->digest, todo, ws);
-   todo += SHA_MESSAGE_BYTES;
-   }
-
-   result = (__force __be32 *)fp->digest;
-   for (i = 0; i < SHA_DIGEST_WORDS; i++)
-   result[i] = cpu_to_be32(fp->digest[i]);
-
+   sha256_finup(&sha, raw, psize, fp->digest);
vfree(raw);
return 0;
 }
-- 
2.9.3


[RFC PATCH 4.10 2/6] crypto/sha256: Make the sha256 library functions selectable

2016-12-23 Thread Andy Lutomirski
This will let other kernel code call into sha256_init(), etc. without
pulling in the core crypto code.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 crypto/Kconfig  | 8 
 crypto/Makefile | 2 +-
 crypto/sha256_generic.c | 4 
 include/crypto/sha.h| 4 
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 160f08e721cc..85a2b3440c2b 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -10,6 +10,13 @@ config XOR_BLOCKS
 source "crypto/async_tx/Kconfig"
 
 #
+# Cryptographic algorithms that are usable without the Crypto API.
+# None of these should have visible config options.
+#
+config CRYPTO_SHA256_LIB
+   bool
+
+#
 # Cryptographic API Configuration
 #
 menuconfig CRYPTO
@@ -763,6 +770,7 @@ config CRYPTO_SHA512_MB
 
 config CRYPTO_SHA256
tristate "SHA224 and SHA256 digest algorithm"
+   select CRYPTO_SHA256_LIB
select CRYPTO_HASH
help
  SHA256 secure hash standard (DFIPS 180-2).
diff --git a/crypto/Makefile b/crypto/Makefile
index b8f0e3eb0791..d147d4c911f5 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_CRYPTO_RMD160) += rmd160.o
 obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
-obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
+obj-$(CONFIG_CRYPTO_SHA256_LIB) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index f2747893402c..9df71ac66dc4 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -261,6 +261,8 @@ void sha256_final(struct sha256_state *sctx, u8 *out)
 }
 EXPORT_SYMBOL(sha256_final);
 
+#ifdef CONFIG_CRYPTO_HASH
+
 static int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
@@ -328,6 +330,8 @@ static void __exit sha256_generic_mod_fini(void)
 module_init(sha256_generic_mod_init);
 module_exit(sha256_generic_mod_fini);
 
+#endif /* CONFIG_CRYPTO_HASH */
+
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithm");
 
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 2b6978471605..381ba7fa5e3f 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -96,6 +96,8 @@ extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
 unsigned int len, u8 *hash);
 
+#ifdef CONFIG_CRYPTO_SHA256_LIB
+
 static inline void sha256_init(struct sha256_state *sctx)
 {
sctx->state[0] = SHA256_H0;
@@ -121,6 +123,8 @@ static inline void sha256_finup(struct sha256_state *sctx, const u8 *data,
sha256_final(sctx, hash);
 }
 
+#endif /* CONFIG_CRYPTO_SHA256_LIB */
+
 extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
  unsigned int len);
 
-- 
2.9.3



[RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-23 Thread Andy Lutomirski
There are some pieces of kernel code that want to compute SHA256
directly without going through the crypto core.  Adjust the exported
API to decouple it from the crypto core.

I suspect this will very slightly speed up the SHA256 shash operations
as well by reducing the amount of indirection involved.

Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 arch/arm/crypto/sha2-ce-glue.c  | 10 ---
 arch/arm/crypto/sha256_glue.c   | 23 ++-
 arch/arm/crypto/sha256_neon_glue.c  | 34 +++--
 arch/arm64/crypto/sha2-ce-glue.c| 13 
 arch/arm64/crypto/sha256-glue.c | 59 +
 arch/x86/crypto/sha256_ssse3_glue.c | 46 +
 arch/x86/purgatory/purgatory.c  |  2 +-
 arch/x86/purgatory/sha256.c | 25 ++--
 arch/x86/purgatory/sha256.h | 22 --
 crypto/sha256_generic.c | 50 +++
 include/crypto/sha.h| 29 ++
 include/crypto/sha256_base.h| 40 -
 12 files changed, 184 insertions(+), 169 deletions(-)
 delete mode 100644 arch/x86/purgatory/sha256.h

diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
index 0755b2d657f3..8832c2f85591 100644
--- a/arch/arm/crypto/sha2-ce-glue.c
+++ b/arch/arm/crypto/sha2-ce-glue.c
@@ -38,7 +38,7 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
return crypto_sha256_arm_update(desc, data, len);
 
kernel_neon_begin();
-   sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
 
@@ -48,17 +48,19 @@ static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
 static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
 unsigned int len, u8 *out)
 {
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
if (!may_use_simd())
return crypto_sha256_arm_finup(desc, data, len, out);
 
kernel_neon_begin();
if (len)
-   sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha2_ce_transform);
-   sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
+   sha256_base_do_finalize(sctx, (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
 
-   return sha256_base_finish(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 
 static int sha2_ce_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
index a84e869ef900..405a29a9a9d3 100644
--- a/arch/arm/crypto/sha256_glue.c
+++ b/arch/arm/crypto/sha256_glue.c
@@ -36,27 +36,34 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
 int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
 unsigned int len)
 {
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
/* make sure casting to sha256_block_fn() is safe */
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
 
-   return sha256_base_do_update(desc, data, len,
+   sha256_base_do_update(sctx, data, len,
(sha256_block_fn *)sha256_block_data_order);
+   return 0;
 }
 EXPORT_SYMBOL(crypto_sha256_arm_update);
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int sha256_arm_final(struct shash_desc *desc, u8 *out)
 {
-   sha256_base_do_finalize(desc,
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   sha256_base_do_finalize(sctx,
(sha256_block_fn *)sha256_block_data_order);
-   return sha256_base_finish(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 
 int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
 {
-   sha256_base_do_update(desc, data, len,
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
+   sha256_base_do_update(sctx, data, len,
  (sha256_block_fn *)sha256_block_data_order);
-   return sha256_final(desc, out);
+   return crypto_sha2_final(desc, out);
 }
 EXPORT_SYMBOL(crypto_sha256_arm_finup);
 
@@ -64,7 +71,7 @@ static struct shash_alg algs[] = { {
.digestsize =   SHA256_DIGEST_SIZE,
.init   =   sha256_base_init,
.update =   crypto_sha256_arm_update,
-   .final  =   sha256_final,
+   .final  =   sha256_arm_final,
.finup  =   crypto_sha256_arm_finup,
.descsize   =   sizeof(struct sha256_state),

[RFC PATCH 4.10 4/6] bpf: Avoid copying the entire BPF program when hashing it

2016-12-23 Thread Andy Lutomirski
The sha256 helpers can consume a message incrementally, so there's no need
to allocate a buffer to store the whole blob to be hashed.

This may be a slight slowdown for very long messages because gcc can't
inline the sha256_update() calls.  For reasonably-sized programs,
however, this should be a considerable speedup as vmalloc() is quite
slow.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 kernel/bpf/core.c | 34 ++
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 911993863799..1c2931f505af 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -149,43 +149,37 @@ void __bpf_prog_free(struct bpf_prog *fp)
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
struct sha256_state sha;
-   u32 i, psize;
-   struct bpf_insn *dst;
+   u32 i;
bool was_ld_map;
-   u8 *raw;
-
-   psize = bpf_prog_insn_size(fp);
-   raw = vmalloc(psize);
-   if (!raw)
-   return -ENOMEM;
 
sha256_init(&sha);
 
/* We need to take out the map fd for the digest calculation
 * since they are unstable from user space side.
 */
-   dst = (void *)raw;
for (i = 0, was_ld_map = false; i < fp->len; i++) {
-   dst[i] = fp->insnsi[i];
+   struct bpf_insn insn = fp->insnsi[i];
+
if (!was_ld_map &&
-   dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-   dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+   insn.code == (BPF_LD | BPF_IMM | BPF_DW) &&
+   insn.src_reg == BPF_PSEUDO_MAP_FD) {
was_ld_map = true;
-   dst[i].imm = 0;
+   insn.imm = 0;
} else if (was_ld_map &&
-  dst[i].code == 0 &&
-  dst[i].dst_reg == 0 &&
-  dst[i].src_reg == 0 &&
-  dst[i].off == 0) {
+  insn.code == 0 &&
+  insn.dst_reg == 0 &&
+  insn.src_reg == 0 &&
+  insn.off == 0) {
was_ld_map = false;
-   dst[i].imm = 0;
+   insn.imm = 0;
} else {
was_ld_map = false;
}
+
+   sha256_update(&sha, (const u8 *)&insn, sizeof(insn));
}
 
-   sha256_finup(&sha, raw, psize, fp->digest);
-   vfree(raw);
+   sha256_final(&sha, fp->digest);
return 0;
 }
 
-- 
2.9.3



[RFC PATCH 4.10 5/6] bpf: Rename fdinfo's prog_digest to prog_sha256

2016-12-23 Thread Andy Lutomirski
This makes it easier to add another digest algorithm down the road
if needed.  It also serves to force any programs that might have
been written against a kernel that had 'prog_digest' to be updated.

This shouldn't violate any stable API policies, as no released
kernel has ever had 'prog_digest'.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e89acea22ecf..956370b80296 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -694,7 +694,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
seq_printf(m,
   "prog_type:\t%u\n"
   "prog_jited:\t%u\n"
-  "prog_digest:\t%s\n"
+  "prog_sha256:\t%s\n"
   "memlock:\t%llu\n",
   prog->type,
   prog->jited,
-- 
2.9.3



[RFC PATCH 4.10 6/6] net: Rename TCA*BPF_DIGEST to ..._SHA256

2016-12-23 Thread Andy Lutomirski
This makes it easier to add another digest algorithm down the road if
needed.  It also serves to force any programs that might have been
written against a kernel that had the old field name to notice the
change and make any necessary changes.

This shouldn't violate any stable API policies, as no released kernel
has ever had TCA*BPF_DIGEST.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 include/uapi/linux/pkt_cls.h   | 2 +-
 include/uapi/linux/tc_act/tc_bpf.h | 2 +-
 net/sched/act_bpf.c| 2 +-
 net/sched/cls_bpf.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc58543..ac6b300c1550 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -397,7 +397,7 @@ enum {
TCA_BPF_NAME,
TCA_BPF_FLAGS,
TCA_BPF_FLAGS_GEN,
-   TCA_BPF_DIGEST,
+   TCA_BPF_SHA256,
__TCA_BPF_MAX,
 };
 
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
index a6b88a6f7f71..eae18a7430eb 100644
--- a/include/uapi/linux/tc_act/tc_bpf.h
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -27,7 +27,7 @@ enum {
TCA_ACT_BPF_FD,
TCA_ACT_BPF_NAME,
TCA_ACT_BPF_PAD,
-   TCA_ACT_BPF_DIGEST,
+   TCA_ACT_BPF_SHA256,
__TCA_ACT_BPF_MAX,
 };
 #define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1c60317f0121..3868a66d0b24 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -123,7 +123,7 @@ static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
return -EMSGSIZE;
 
-   nla = nla_reserve(skb, TCA_ACT_BPF_DIGEST,
+   nla = nla_reserve(skb, TCA_ACT_BPF_SHA256,
  sizeof(prog->filter->digest));
if (nla == NULL)
return -EMSGSIZE;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index adc776048d1a..6fc110321621 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -555,7 +555,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
return -EMSGSIZE;
 
-   nla = nla_reserve(skb, TCA_BPF_DIGEST, sizeof(prog->filter->digest));
+   nla = nla_reserve(skb, TCA_BPF_SHA256, sizeof(prog->filter->digest));
if (nla == NULL)
return -EMSGSIZE;
 
-- 
2.9.3



[RFC PATCH 4.10 0/6] Switch BPF's digest to SHA256

2016-12-23 Thread Andy Lutomirski
Since there are plenty of uses for the new-in-4.10 BPF digest feature
that would be problematic if malicious users could produce collisions,
the BPF digest should be collision-resistant.  SHA-1 is no longer
considered collision-resistant, so switch it to SHA-256.

The actual switchover is trivial.  Most of this series consists of
cleanups to the SHA256 code to make it usable as a standalone library
(since BPF should not depend on crypto).

The cleaned up library is much more user-friendly than the SHA-1 code,
so this also significantly tidies up the BPF digest code.

This is intended for 4.10.  If this series misses 4.10 and nothing
takes its place, then we'll have an unpleasant ABI stability
situation.

Andy Lutomirski (6):
  crypto/sha256: Refactor the API so it can be used without shash
  crypto/sha256: Make the sha256 library functions selectable
  bpf: Use SHA256 instead of SHA1 for bpf digests
  bpf: Avoid copying the entire BPF program when hashing it
  bpf: Rename fdinfo's prog_digest to prog_sha256
  net: Rename TCA*BPF_DIGEST to ..._SHA256

 arch/arm/crypto/sha2-ce-glue.c  | 10 +++---
 arch/arm/crypto/sha256_glue.c   | 23 +-
 arch/arm/crypto/sha256_neon_glue.c  | 34 ++--
 arch/arm64/crypto/sha2-ce-glue.c| 13 
 arch/arm64/crypto/sha256-glue.c | 59 +++---
 arch/x86/crypto/sha256_ssse3_glue.c | 46 ---
 arch/x86/purgatory/purgatory.c  |  2 +-
 arch/x86/purgatory/sha256.c | 25 ++-
 arch/x86/purgatory/sha256.h | 22 -
 crypto/Kconfig  |  8 +
 crypto/Makefile |  2 +-
 crypto/sha256_generic.c | 54 +++
 include/crypto/sha.h| 33 ---
 include/crypto/sha256_base.h| 40 +++
 include/linux/filter.h  | 11 ++-
 include/uapi/linux/pkt_cls.h|  2 +-
 include/uapi/linux/tc_act/tc_bpf.h  |  2 +-
 init/Kconfig|  1 +
 kernel/bpf/core.c   | 63 +
 kernel/bpf/syscall.c|  2 +-
 net/sched/act_bpf.c |  2 +-
 net/sched/cls_bpf.c |  2 +-
 22 files changed, 225 insertions(+), 231 deletions(-)
 delete mode 100644 arch/x86/purgatory/sha256.h

-- 
2.9.3



Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>>
>>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>>
>>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>>
>>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> [...]
>>
>>>>> The hashing is not a proper sha1 neither, unfortunately. I think that
>>>>> is why it will have a custom implementation in iproute2?
>>>>
>>>>
>>>> Still trying to catch up on this admittedly bit confusing thread. I
>>>> did run automated tests over couple of days comparing the data I got
>>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>>> cases varying from min to max possible program sizes. In the process
>>>> of testing, as you might have seen on netdev, I found couple of other
>>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>>> do you or Andy or anyone participating in claiming this have any
>>>> concrete data or test cases that suggests something different? If yes,
>>>> I'm very curious to hear about it and willing fix it up, of course.
>>>> When I'm back from pto I'll prep and cook up my test suite to be
>>>> included into the selftests/bpf/, should have done this initially,
>>>> sorry about that. I'll also post something to expose the alg, that
>>>> sounds fine to me.
>>>
>>>
>>> Looking into your code closer, I noticed that you indeed seem to do the
>>> finalization of sha-1 by hand by aligning and padding the buffer
>>> accordingly and also patching in the necessary payload length.
>>>
>>> Apologies for my side for claiming that this is not correct sha1
>>> output, I was only looking at sha_transform and its implementation and
>>> couldn't see the padding and finalization round with embedding the data
>>> length in there and hadn't thought of it being done manually.
>>>
>>> Anyway, is it difficult to get the sha finalization into some common
>>> code library? It is not very bpf specific and crypto code reviewers
>>> won't find it there at all.
>>
>>
>> Yes, sure, I'll rework it that way (early next year when I'm back if
>> that's fine with you).
>
> Can we make it SHA-256 before 4.10 comes out, though?  This really
> looks like it will be used in situations where collisions matter and
> it will be exposed to malicious programs, and SHA-1 should not be used
> for new designs for this purpose because it simply isn't long enough.
>
> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
> misleading.  That should be u8 or, at the very least, __be32.
>
> I realize that there isn't a sha-256 implementation in lib, but would
> it really be so bad to make the bpf digest only work (for now) when
> crypto is enabled?  I would *love* to see the crypto core learn how to
> export simple primitives for direct use without needing the whole
> crypto core, and this doesn't seem particularly hard to do, but I
> don't think that's 4.10 material.

I'm going to try to send out RFC patches for all of this today or
tomorrow.  It doesn't look bad at all.

--Andy


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>
>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>
>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>
>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>
> [...]
>
>>>> The hashing is not a proper sha1 neither, unfortunately. I think that
>>>> is why it will have a custom implementation in iproute2?
>>>
>>>
>>> Still trying to catch up on this admittedly bit confusing thread. I
>>> did run automated tests over couple of days comparing the data I got
>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>> cases varying from min to max possible program sizes. In the process
>>> of testing, as you might have seen on netdev, I found couple of other
>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>> do you or Andy or anyone participating in claiming this have any
>>> concrete data or test cases that suggests something different? If yes,
>>> I'm very curious to hear about it and willing fix it up, of course.
>>> When I'm back from pto I'll prep and cook up my test suite to be
>>> included into the selftests/bpf/, should have done this initially,
>>> sorry about that. I'll also post something to expose the alg, that
>>> sounds fine to me.
>>
>>
>> Looking into your code closer, I noticed that you indeed seem to do the
>> finalization of sha-1 by hand by aligning and padding the buffer
>> accordingly and also patching in the necessary payload length.
>>
>> Apologies for my side for claiming that this is not correct sha1
>> output, I was only looking at sha_transform and its implementation and
>> couldn't see the padding and finalization round with embedding the data
>> length in there and hadn't thought of it being done manually.
>>
>> Anyway, is it difficult to get the sha finalization into some common
>> code library? It is not very bpf specific and crypto code reviewers
>> won't find it there at all.
>
>
> Yes, sure, I'll rework it that way (early next year when I'm back if
> that's fine with you).

Can we make it SHA-256 before 4.10 comes out, though?  This really
looks like it will be used in situations where collisions matter and
it will be exposed to malicious programs, and SHA-1 should not be used
for new designs for this purpose because it simply isn't long enough.

Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
misleading.  That should be u8 or, at the very least, __be32.

I realize that there isn't a sha-256 implementation in lib, but would
it really be so bad to make the bpf digest only work (for now) when
crypto is enabled?  I would *love* to see the crypto core learn how to
export simple primitives for direct use without needing the whole
crypto core, and this doesn't seem particularly hard to do, but I
don't think that's 4.10 material.

--Andy


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>> <han...@stressinduktion.org> wrote:
>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> We don't prevent ebpf programs being loaded based on the digest but
>>> just to uniquely identify loaded programs from user space and match up
>>> with their source.
>>
>> The commit log talks about using the hash to see if the program has
>> already been compiled and JITted.  If that's done, then a collision
>> will directly cause the kernel to malfunction.
>
> Andy, please read the code.
> we could have used jhash there just as well.
> Collisions are fine.

There's nothing relevant in the code to read yet AFAICS.  The code exports it
via fdinfo, and userspace is expected to do something with it.  The
commit message says:

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded.

I assume this means that a userspace component is expected to compare
the digest of a loaded program to a digest of a program it wants to
load and to use the result of the comparison to decide whether the
programs are the same.  If that's indeed the case (and it sure sounds
like it, and I fully expect CRIU to do very similar things when
support is added), then malicious collisions do matter.

It's also not quite clear to me why userspace needs to be able to
calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
command that takes a BPF program as input and hashes it would seem to
serve the same purpose, and that would allow the kernel to key the
digest and change the algorithm down the road without breaking things.
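
(As a sketch, such a hypothetical command could look like this from
user space -- to be clear, no such command exists, and the field names
are just borrowed from the existing prog-load attributes:)

	union bpf_attr attr = {};

	attr.insns = (__u64)(unsigned long)insns;	/* program to hash */
	attr.insn_cnt = insn_cnt;

	err = syscall(__NR_bpf, BPF_CALC_PROGRAM_DIGEST,
		      &attr, sizeof(attr));
	/* kernel writes its (possibly keyed, versioned) digest back */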

Regardless, adding a new hash algorithm that is almost-but-not-quite
SHA-1 and making it a stable interface to userspace is not a good
thing.

--Andy


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin
<li...@sciencehorizons.net> wrote:
>> Having slept on this, I like it less.  The problem is that a
>> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
>> secret || ...) -- they learn the internal state of the hash function
>> that generates that value.  This probably breaks any attempt to apply
>> security properties of the hash function.  For example, the internal
>> state could easily contain a whole bunch of prior outputs in it
>> verbatim.
>
> The problem is, anti-backtracing is in severe tension with your desire
> to use unmodified SipHash.
>
> First of all, I'd like to repeat that it isn't a design goal of the current
> generator and isn't necessary.

Agreed.

> Now, the main point:  it's not likely to be solvable.
>
> The problem with unmodified SipHash is that is has only 64 bits of
> output.  No mix-back mechanism can get around the fundamental problem
> that that's too small to prevent a brute-force guessing attack.  You need
> wider mix-back.  And getting more output from unmodified SipHash requires
> more finalization rounds, which is expensive.

It could only mix the output back in every two calls, in which case
you can backtrack up to one call but you need to do 2^128 work to
backtrack farther.  But yes, this is getting excessively complicated.

> Finally, your discomfort about an attacker learning the internal state...
> if an attacker knows the key and the input, they can construct the
> internal state.  Yes, we could discard the internal state and construct
> a fresh one on the next call to get_random_int, but what are you going
> to key it with?  What are you going to feed it?  What keeps *that*
> internal state any more secret from an attacker than one that's explicitly
> stored?

I do tend to like Ted's version in which we use batched
get_random_bytes() output.  If it's fast enough, it's simpler and lets
us get the full strength of a CSPRNG.
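
(Something like this minimal sketch of the batching idea -- not what
the kernel actually does:)

struct rand_batch {
	u32 buf[16];
	unsigned int left;	/* unused words remaining; 0 -> refill */
};
static DEFINE_PER_CPU(struct rand_batch, rand_batch);

unsigned int get_random_int(void)
{
	struct rand_batch *b = &get_cpu_var(rand_batch);
	u32 r;

	if (b->left == 0) {
		get_random_bytes(b->buf, sizeof(b->buf));
		b->left = ARRAY_SIZE(b->buf);
	}
	r = b->buf[--b->left];
	b->buf[b->left] = 0;	/* don't keep consumed outputs around */
	put_cpu_var(rand_batch);
	return r;
}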

(Aside: some day I want to move all that code from drivers/ to lib/
and teach it to be buildable in userspace, too, so it's easy to play
with it, feed it test vectors, confirm that it's equivalent to a
reference implementation, write up the actual design and try to get
real cryptographers to analyze it, etc.)

> For example, clearly stating the concern over starting new processes
> with predictable layout, and the limits on the fresh entropy supply,
> has made me realize that there *is* a possible source: each exec()
> is passed 128 bits from get_random_bytes in the AT_RANDOM element of
> its auxv.  Since get_random_int() accesses "current" anyway, we could
> store some seed material there rather than using "pid".  While this is
> not fresh for each call to get_random_int, it *is* fresh for each new
> address space whose layout is being randomized.

Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
directly and be done with it.  It won't be a bottleneck.
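
(I.e., a sketch of what an arch_mmap_rnd() fed straight from the
CSPRNG might look like; mmap_rnd_bits is the existing arch knob:)

	unsigned long arch_mmap_rnd(void)
	{
		unsigned long rnd;

		get_random_bytes(&rnd, sizeof(rnd));
		return rnd & ((1UL << mmap_rnd_bits) - 1);
	}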


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
<han...@stressinduktion.org> wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>> <han...@stressinduktion.org> wrote:
>> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> > > Hi Hannes,
>> > >
>> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> > > <han...@stressinduktion.org> wrote:
>> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > > > You don't want to give people new IPv6 addresses with the same stable
>> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > > > connectivity then and it is extra work?
>> > >
>> > > Ahh, too bad. So it goes.
>> >
>> > If no other users survive we can put it into the ipv6 module.
>> >
>> > > > The bpf hash stuff can be changed during this merge window, as it is
>> > > > not yet in a released kernel. Albeit I would probably have preferred
>> > > > something like sha256 here, which can be easily replicated by user
>> > > > space tools (minus the problem of patching out references to not
>> > > > hashable data, which must be zeroed).
>> > >
>> > > Oh, interesting, so time is of the essence then. Do you want to handle
>> > > changing the new eBPF code to something not-SHA1 before it's too late,
>> > > as part of a new patchset that can fast track itself to David? And
>> > > then I can preserve my large series for the next merge window.
>> >
>> > This algorithm should be a non-seeded algorithm, because the hashes
>> > should be stable and verifiable by user space tooling. Thus this would
>> > need a hashing algorithm that is hardened against pre-image
>> > attacks/collision resistance, which siphash is not. I would prefer some
>> > higher order SHA algorithm for that actually.
>> >
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann <dan...@iogearbox.net>
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>> bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

The commit log talks about using the hash to see if the program has
already been compiled and JITted.  If that's done, then a collision
will directly cause the kernel to malfunction.

>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +   result = (__force __be32 *)fp->digest;
>> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +   result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Putting on crypto hat:

NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
2016 when people know better and is going to handle it by porting that
new primitive to userspace" is not a particularly good argument.

Okay, crypto hat back off.

>
> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...

That would be way too complicated and would be n

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld  wrote:
> Hi all,
>
> I don't know what your design requirements are for this. It looks like
> you're generating some kind of crypto digest of a program, and you
> need to avoid collisions. If you'd like to go with a PRF (keyed hash
> function) that uses some kernel secret key, then I'd strongly suggest
> using Keyed-Blake2. Alternatively, if you need for userspace to be
> able to calculate the same hash, and don't want to use some kernel
> secret, then I'd still suggest using Blake2, which will be fast and
> secure.
>
> If you can wait until January, I'll work on a commit adding the
> primitive to the tree. I've already written it and I just need to get
> things cleaned up.
>
>> Blake2 is both less stable (didn't they slightly change it recently?)
>
> No, Blake2 is very stable. It's also extremely secure and has been
> extensively studied. Not to mention it's faster than SHA2. And if you
> want to use it as a PRF, it's obvious better suited and faster to use
> Blake2's keyed PRF mode than HMAC-SHA2.
>
> If you don't care about performance, and you don't want to use a PRF,
> then just use SHA2-256. If you're particularly concerned about certain
> types of attacks, you could go with SHA2-512 truncated to 256 bits,
> but somehow I doubt you need this.

I don't think this cares about performance.  (Well, it cares about
performance, but the verifier will likely dominiate the cost by such a
large margin that the hash algo doesn't matter.)  And staying
FIPS-compliant-ish is worth a little bit, so I'd advocate for
something in the SHA2 family.

> If userspace hasn't landed, can we get away with changing this code
> after 4.10? Or should we just fix it before 4.10? Or should we revert
> it before 4.10? Development-policy-things like this I have zero clue
> about, so I heed to your guidance.

I think it should be fixed or reverted before 4.10.

--Andy


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 6:07 PM, Andy Lutomirski <l...@kernel.org> wrote:
> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
> <li...@sciencehorizons.net> wrote:
>> As a separate message, to disentangle the threads, I'd like to
>> talk about get_random_long().
>>
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.
>
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?
>
>>
>> Here's my current straw man design for comment.  It's very similar to
>> the current MD5-based design, but feeds all the seed material in the
>> "correct" way, as opposed to Xring directly into the MD5 state.
>>
>> * Each CPU has a (Half)SipHash state vector,
>>   "unsigned long get_random_int_hash[4]".  Unlike the current
>>   MD5 code, we take care to initialize it to an asymmetric state.
>>
>> * There's a global 256-bit random_int_secret (which we could
>>   reseed periodically).
>>
>> To generate a random number:
>> * If get_random_int_hash is all-zero, seed it with fresh a half-sized
>>   SipHash key and the appropriate XOR constants.
>> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>>   (This is arbitary seed material, copied from the current code.)
>> * Crank through that with (Half)SipHash-1-0.
>> * Crank through the random_int_secret with (Half)SipHash-1-0.
>> * Return v1 ^ v3.
>
> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:
>
> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Having slept on this, I like it less.  The problem is that a
backtracking attacker doesn't just learn H(random seed || entropy_0 ||
secret || ...) -- they learn the internal state of the hash function
that generates that value.  This probably breaks any attempt to apply
security properties of the hash function.  For example, the internal
state could easily contain a whole bunch of prior outputs in it
verbatim.

--Andy


BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
 wrote:
> On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> Hi Hannes,
>>
>> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>>  wrote:
>> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > You don't want to give people new IPv6 addresses with the same stable
>> > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > connectivity then and it is extra work?
>>
>> Ahh, too bad. So it goes.
>
> If no other users survive we can put it into the ipv6 module.
>
>> > The bpf hash stuff can be changed during this merge window, as it is
>> > not yet in a released kernel. Albeit I would probably have preferred
>> > something like sha256 here, which can be easily replicated by user
>> > space tools (minus the problem of patching out references to not
>> > hashable data, which must be zeroed).
>>
>> Oh, interesting, so time is of the essence then. Do you want to handle
>> changing the new eBPF code to something not-SHA1 before it's too late,
>> as part of a new patchset that can fast track itself to David? And
>> then I can preserve my large series for the next merge window.
>
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.
>

You mean:

commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
Author: Daniel Borkmann 
Date:   Sun Dec 4 23:19:41 2016 +0100

bpf: add prog_digest and expose it via fdinfo/netlink


Yes, please!  This actually matters for security -- imagine a
malicious program brute-forcing a collision so that it gets loaded
wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
(preferably the latter).  Speed basically doesn't matter here and
Blake2 is both less stable (didn't they slightly change it recently?)
and much less well studied.

My inclination would have been to seed them with something that isn't
exposed to userspace for the precise reason that it would prevent user
code from making assumptions about what's in the hash.  But if there's
a use case for why user code needs to be able to calculate the hash on
its own, then that's fine.  But perhaps the actual fdinfo string
should be "sha256:abcd1234..." to give some flexibility down the road.

Also:

+   result = (__force __be32 *)fp->digest;
+   for (i = 0; i < SHA_DIGEST_WORDS; i++)
+   result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives.
Is this and the code above it even correct?  It might be but on a very
brief glance it looks wrong to me.  If you're doing this to avoid
depending on crypto, then fix crypto so you can pull in the algorithm
without pulling in the whole crypto core.

At the very least, there should be a separate function that calculates
the hash of a buffer and that function should explicitly run itself
against test vectors of various lengths to make sure that it's
calculating what it claims to be calculating.  And it doesn't look
like the userspace code has landed, so, if this thing isn't
calculating SHA1 correctly, it's plausible that no one has noticed.
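
To illustrate, a minimal sketch of what I have in mind (sha1_buffer()
is a hypothetical stand-in for whatever one-shot helper gets used; the
expected output is the standard SHA-1 test vector for "abc"):

    static const u8 sha1_abc[20] = {
        0xa9, 0x99, 0x3e, 0x36, 0x47, 0x06, 0x81, 0x6a, 0xba, 0x3e,
        0x25, 0x71, 0x78, 0x50, 0xc2, 0x6c, 0x9c, 0xd0, 0xd8, 0x9d,
    };

    static int __init digest_selftest(void)
    {
        u8 out[20];

        sha1_buffer("abc", 3, out);   /* hypothetical one-shot helper */
        return memcmp(out, sha1_abc, 20) ? -EINVAL : 0;
    }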

--Andy


Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 9:01 PM, George Spelvin
<li...@sciencehorizons.net> wrote:
> Andy Lutomirski wrote:
>> I don't even think it needs that.  This is just adding a
>> non-destructive final operation, right?
>
> It is, but the problem is that SipHash is intended for *small* inputs,
> so the standard implementations aren't broken into init/update/final
> functions.
>
> There's just one big function that keeps the state variables in
> registers and never stores them anywhere.
>
> If we *had* init/update/final functions, then it would be trivial.
>
>> Just to clarify, if we replace SipHash with a black box, I think this
>> effectively means, where "entropy" is random_get_entropy() || jiffies
>> || current->pid:
>
>> The first call returns H(random seed || entropy_0 || secret).  The
>> second call returns H(random seed || entropy_0 || secret || entropy_1
>> || secret).  Etc.
>
> Basically, yes.  I was skipping the padding byte and keying the
> finalization rounds on the grounds of "can't hurt and might help",
> but we could do it a more standard way.
>
>> If not, then I have a fairly strong preference to keep whatever
>> construction we come up with consistent with something that could
>> actually happen with invocations of unmodified SipHash -- then all the
>> security analysis on SipHash goes through.
>
> Okay.  I don't think it makes a difference, but it's not a *big* waste
> of time.  If we have finalization rounds, we can reduce the secret
> to 128 bits.
>
> If we include the padding byte, we can do one of two things:
> 1) Make the secret 184 bits, to fill up the final partial word as
>much as possible, or
> 2) Make the entropy 1 byte smaller and conceptually misalign the
>secret.  What we'd actually do is remove the last byte of
>the secret and include it in the entropy words, but that's
>just a rotation of the secret between storage and hashing.
>
> Also, I assume you'd like SipHash-2-4, since you want to rely
> on a security analysis.

I haven't looked, but I assume that the analysis at least thought
about reduced rounds, so maybe other variants are okay.

>> The one thing I don't like is
>> that I don't see how to prove that you can't run it backwards if you
>> manage to acquire a memory dump.  In fact, I think that there exist, at
>> least in theory, hash functions that are secure in the random oracle
>> model but that *can* be run backwards given the full state.  From
>> memory, SHA-3 has exactly that property, and it would be a bit sad for
>> a CSPRNG to be reversible.
>
> Er...  get_random_int() is specifically *not* designed to be resistant
> to state capture, and I didn't try.  Remember, what it's used for
> is ASLR, what we're worried about is someone learning the layouts
> of still-running processes, and if you get a memory dump, you have
> the memory layout!

True, but it's called get_random_int(), and it seems like making it
stronger, especially if the performance cost is low to zero, is a good
thing.

>
> If you want anti-backtracking, though, it's easy to add.  What we
> hash is:
>
> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
>
> You mix the output word right back in to the (unfinalized) state after
> generating it.  This is still equivalent to unmodified black-box SipHash,
> you're just using a (conceptually independent) SipHash invocation to
> produce some of its input.

Ah, cute.  This could probably be sped up by doing something like:

entropy_0 || secret || output_0 ^ entropy_1 || secret || ...

It's a little weak because the output is only 64 bits, so you could
plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the
old entropy is guessable.  I suspect there are sneaky ways around it
like using output_n-1 ^ output_n-2 or similar.  I'll sleep on it.

>
> The only remaining issues are:
> 1) How many rounds, and
> 2) May we use HalfSipHash?

I haven't looked closely enough to have a real opinion here.  I don't
know what the security margin is believed to be.

>
> I'd *like* to persuade you that skipping the padding byte wouldn't
> invalidate any security proofs, because it's true and would simplify
> the code.  But if you want 100% stock, I'm willing to cater to that.

I lean toward stock in the absence of a particularly good reason.  At
the very least I'd want to read that paper carefully.

>
> Ted, what do you think?



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 6:07 PM, Hannes Frederic Sowa
<han...@stressinduktion.org> wrote:
> On 22.12.2016 00:42, Andy Lutomirski wrote:
>> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
>>>  unsigned int get_random_int(void)
>>>  {
>>> -   __u32 *hash;
>>> -   unsigned int ret;
>>> -
>>> -   if (arch_get_random_int(&ret))
>>> -   return ret;
>>> -
>>> -   hash = get_cpu_var(get_random_int_hash);
>>> -
>>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>>> -   md5_transform(hash, random_int_secret);
>>> -   ret = hash[0];
>>> -   put_cpu_var(get_random_int_hash);
>>> -
>>> -   return ret;
>>> +   unsigned int arch_result;
>>> +   u64 result;
>>> +   struct random_int_secret *secret;
>>> +
>>> +   if (arch_get_random_int(&arch_result))
>>> +   return arch_result;
>>> +
>>> +   secret = get_random_int_secret();
>>> +   result = siphash_3u64(secret->chaining, jiffies,
>>> + (u64)random_get_entropy() + current->pid,
>>> + secret->secret);
>>> +   secret->chaining += result;
>>> +   put_cpu_var(secret);
>>> +   return result;
>>>  }
>>>  EXPORT_SYMBOL(get_random_int);
>>
>> Hmm.  I haven't tried to prove anything for real.  But here goes (in
>> the random oracle model):
>>
>> Suppose I'm an attacker and I don't know the secret or the chaining
>> value.  Then, regardless of what the entropy is, I can't predict the
>> numbers.
>>
>> Now suppose I do know the secret and the chaining value due to some
>> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
>> to find a value "result" such that prev_chaining + result = chaining
>> and result = H(prev_chaining, ..., secret);.  I don't think this can
>> be done efficiently in the random oracle model regardless of what the
>> "..." is.
>>
>> But, if I know the secret and chaining value, I can predict the next
>> output assuming I can guess the entropy.  What's worse is that, even
>> if I can't guess the entropy, if I *observe* the next output then I
>> can calculate the next chaining value.
>>
>> So this is probably good enough, and making it better is hard.  Changing it to:
>>
>> u64 entropy = (u64)random_get_entropy() + current->pid;
>> result = siphash(..., entropy, ...);
>> secret->chaining += result + entropy;
>>
>> would reduce this problem by forcing an attacker to brute-force the
>> entropy on each iteration, which is probably an improvement.
>>
>> To fully fix it, something like "catastrophic reseeding" would be
>> needed, but that's hard to get right.
>
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide CPRNG-like properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.
>

Unless I've misunderstood it, Ted's proposal causes get_random_int()
to return bytes straight from urandom (effectively), which should make
it very strong.  And if urandom is competitively fast now, I don't see
the problem.  ChaCha20 is designed for speed, after all.
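
As I understand the proposal, it amounts to handing out crng output
from a per-CPU batch, roughly like this (a sketch with hypothetical
names; extract_crng() is the ChaCha20 output function in
drivers/char/random.c):

    struct random_batch {
        u64 buf[CHACHA20_BLOCK_SIZE / sizeof(u64)];
        unsigned int pos;               /* entries still unused */
    };
    static DEFINE_PER_CPU(struct random_batch, random_batch);

    u64 sketch_get_random_u64(void)
    {
        struct random_batch *b = &get_cpu_var(random_batch);
        u64 v;

        if (b->pos == 0) {
            extract_crng((u8 *)b->buf); /* refill from ChaCha20 */
            b->pos = ARRAY_SIZE(b->buf);
        }
        v = b->buf[--b->pos];
        put_cpu_var(random_batch);
        return v;
    }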


George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
 wrote:
> As a separate message, to disentangle the threads, I'd like to
> talk about get_random_long().
>
> After some thinking, I still like the "state-preserving" construct
> that's equivalent to the current MD5 code.  Yes, we could just do
> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
> preserve a bit more.
>
> It requires library support from the SipHash code to return the full
> SipHash state, but I hope that's a fair thing to ask for.

I don't even think it needs that.  This is just adding a
non-destructive final operation, right?

>
> Here's my current straw man design for comment.  It's very similar to
> the current MD5-based design, but feeds all the seed material in the
> "correct" way, as opposed to Xring directly into the MD5 state.
>
> * Each CPU has a (Half)SipHash state vector,
>   "unsigned long get_random_int_hash[4]".  Unlike the current
>   MD5 code, we take care to initialize it to an asymmetric state.
>
> * There's a global 256-bit random_int_secret (which we could
>   reseed periodically).
>
> To generate a random number:
> * If get_random_int_hash is all-zero, seed it with a fresh half-sized
>   SipHash key and the appropriate XOR constants.
> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>   (This is arbitrary seed material, copied from the current code.)
> * Crank through that with (Half)SipHash-1-0.
> * Crank through the random_int_secret with (Half)SipHash-1-0.
> * Return v1 ^ v3.

Just to clarify, if we replace SipHash with a black box, I think this
effectively means, where "entropy" is random_get_entropy() || jiffies
|| current->pid:

The first call returns H(random seed || entropy_0 || secret).  The
second call returns H(random seed || entropy_0 || secret || entropy_1
|| secret).  Etc.
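
In code form, the per-call flow I'm describing would be roughly this
(a toy model only; struct h_state, h_update() and h_final() are
hypothetical stand-ins for an init/update/final split of black-box
SipHash):

    static DEFINE_PER_CPU(struct h_state, prng_state); /* seeded once */

    u64 toy_get_random_int(void)
    {
        struct h_state *st = &get_cpu_var(prng_state);
        u64 out;

        h_update(st, entropy());  /* entropy_i */
        h_update(st, secret);     /* secret re-absorbed every call */
        out = h_final(st);        /* non-destructive finalization */
        put_cpu_var(prng_state);
        return out;
    }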

If not, then I have a fairly strong preference to keep whatever
construction we come up with consistent with something that could
actually happen with invocations of unmodified SipHash -- then all the
security analysis on SipHash goes through.

Anyway, I have mixed thoughts about the construction.  It manages to
have a wide state at essentially no cost, which buys us quite a bit of
work factor to break it.  Even with full knowledge of the state, an
output doesn't reveal the entropy except to the extent that it can be
brute-forced (this is just whatever the appropriate extended version of
first preimage resistance gives us).  The one thing I don't like is
that I don't see how to prove that you can't run it backwards if you
manage to acquire a memory dump.  In fact, I think that there exist, at
least in theory, hash functions that are secure in the random oracle
model but that *can* be run backwards given the full state.  From
memory, SHA-3 has exactly that property, and it would be a bit sad for
a CSPRNG to be reversible.

We could also periodically mix in a big (128-bit?) chunk of fresh
urandom output to keep the bad guys guessing.

(P.S.  This kind of resembles the duplex sponge construction.  If
hardware SHA-3 ever shows up, a duplex sponge RNG might be nice indeed.)


Re: HalfSipHash Acceptable Usage

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 9:25 AM, Linus Torvalds
 wrote:
> On Wed, Dec 21, 2016 at 7:55 AM, George Spelvin
>  wrote:
>>
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.
>
> And I warn you already: it will _benchmark_ a hell of a lot better
> than it will work in reality. In benchmarks, you'll hit all the
> optimizations ("oh, I've already saved away all the FP registers, no
> need to do it again").
>
> In contrast, in reality, especially with things like "do it once or
> twice per incoming packet", you'll easily hit the absolute worst
> cases, where not only does it take a few hundred cycles to save the FP
> state, you'll then return to user space in between packets, which
> triggers the slow-path return code and reloads the FP state, which is
> another few hundred cycles plus.

Hah, you're thinking that the x86 code works the way that Rik and I
want it to work, and you just made my day. :)  What actually happens
is that the state is saved in kernel_fpu_begin() and restored in
kernel_fpu_end(), and it'll take a few hundred cycles best case.  If
you do it a bunch of times in a loop, you *might* trigger a CPU
optimization that notices that the state being saved is the same state
that was just restored, but you're still going to pay the full restore
code each round trip no matter what.

The code is much clearer in 4.10 kernels now that I deleted the unused
"lazy" branches.

>
> Similarly, in benchmarks you'll hit the "modern CPU's power on the AVX
> unit and keep it powered up for a while afterwards", while in real
> life you would quite easily hit the "oh, AVX is powered down because
> we were idle, now it powers up at half speed which is another latency
> hit _and_ the AVX unit won't run full out anyway".

I *think* that was mostly fixed in Broadwell or thereabouts (in terms
of latency -- throughput and power consumption still suffers).


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>  unsigned int get_random_int(void)
>  {
> -   __u32 *hash;
> -   unsigned int ret;
> -
> -   if (arch_get_random_int(&ret))
> -   return ret;
> -
> -   hash = get_cpu_var(get_random_int_hash);
> -
> -   hash[0] += current->pid + jiffies + random_get_entropy();
> -   md5_transform(hash, random_int_secret);
> -   ret = hash[0];
> -   put_cpu_var(get_random_int_hash);
> -
> -   return ret;
> +   unsigned int arch_result;
> +   u64 result;
> +   struct random_int_secret *secret;
> +
> +   if (arch_get_random_int(&arch_result))
> +   return arch_result;
> +
> +   secret = get_random_int_secret();
> +   result = siphash_3u64(secret->chaining, jiffies,
> + (u64)random_get_entropy() + current->pid,
> + secret->secret);
> +   secret->chaining += result;
> +   put_cpu_var(secret);
> +   return result;
>  }
>  EXPORT_SYMBOL(get_random_int);

Hmm.  I haven't tried to prove anything for real.  But here goes (in
the random oracle model):

Suppose I'm an attacker and I don't know the secret or the chaining
value.  Then, regardless of what the entropy is, I can't predict the
numbers.

Now suppose I do know the secret and the chaining value due to some
leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
to find a value "result" such that prev_chaining + result = chaining
and result = H(prev_chaining, ..., secret);.  I don't think this can
be done efficiently in the random oracle model regardless of what the
"..." is.

But, if I know the secret and chaining value, I can predict the next
output assuming I can guess the entropy.  What's worse is that, even
if I can't guess the entropy, if I *observe* the next output then I
can calculate the next chaining value.

So this is probably good enough, and making it better is hard.  Changing it to:

u64 entropy = (u64)random_get_entropy() + current->pid;
result = siphash(..., entropy, ...);
secret->chaining += result + entropy;

would reduce this problem by forcing an attacker to brute-force the
entropy on each iteration, which is probably an improvement.

To fully fix it, something like "catastrophic reseeding" would be
needed, but that's hard to get right.

(An aside: on x86 at least, using two percpu variables is faster
because direct percpu access is essentially free, whereas getting
the address of a percpu variable is not free.)


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 1:45 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> Hi Andy,
>
> On Fri, Dec 16, 2016 at 10:31 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> I think it would be nice to try to strengthen the PRNG construction.
>> FWIW, I'm not an expert in PRNGs, and there's fairly extensive
>> literature, but I can at least try.
>
> In an effort to keep this patchset initially as uncontroversial as
> possible, I kept the same construction as before and just swapped
> out slow MD5 for fast Siphash. Additionally, the function
> documentation says that it isn't cryptographically secure. But in the
> end I certainly agree with you; we should most definitely improve
> things, and seeing the eyeballs now on this series, I think we now
> might have a mandate to do so.
>
>> 1. A one-time leak of memory contents doesn't ruin security until
>> reboot.  This is especially valuable across suspend and/or hibernation.
>
> Ted and I were discussing this in another thread, and it sounds like
> he wants the same thing. I'll add re-generation of the secret every
> once in a while. Perhaps time-based makes more sense than
> counter-based for rekeying frequency?

Counter may be faster -- you don't need to read a timer.  Lots of CPUs
are surprisingly slow at timing.  OTOH jiffies are fast.

>
>> 2. An attack with a low work factor (2^64?) shouldn't break the scheme
>> until reboot.
>
> It won't. The key is 128-bits.

Whoops, I thought the key was 64-bit...

>
>> This is effectively doing:
>>
>> output = H(prev_output, weak "entropy", per-boot secret);
>>
>> One unfortunate downside is that, if used in a context where an
>> attacker can see a single output, the attacker learns the chaining
>> value.  If the attacker can guess the entropy, then, with 2^64 work,
>> they learn the secret, and they can predict future outputs.
>
> No, the secret is 128-bits, which isn't feasibly guessable. The secret
> also isn't part of the hash, but rather is the generator of the hash
> function. A keyed hash (a PRF) is a bit of a different construction
> than just hashing a secret value into a hash function.

I was thinking in the random oracle model, in which case the whole
function is just a PRF that takes a few input parameters, one of which
is a key.

>
>> Second, change the mode so that an attacker doesn't learn so much
>> internal state.  For example:
>>
>> output = H(old_chain, entropy, secret);
>> new_chain = old_chain + entropy + output;
>
> Happy to make this change, with making the chaining value additive
> rather than a replacement.
>
> However, I'm not sure adding entropy to the new_chain makes a
> difference. That entropy is basically just the cycle count plus the
> jiffies count. If an attacker can already guess them, then adding them
> again to the chaining value doesn't really add much.

Agreed.  A simpler construction would be:

chaining++;
output = H(chaining, secret);

And this looks a whole lot like Ted's ChaCha20 construction.

The benefit of my construction is that (in the random oracle model,
assuming my intuition is right), if an attacker misses ~128 samples
and entropy has at least one bit of independent min-entropy per
sample, then the attacker needs ~2^128 work to brute-force the
chaining value even if the attacker knew both the original chaining
value and the secret.

--Andy


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 2:13 PM, George Spelvin
 wrote:
>> What should we do with get_random_int() and get_random_long()?  In
>> some cases it's being used in performance sensitive areas, and where
>> anti-DoS protection might be enough.  In others, maybe not so much.
>
> This is tricky.  The entire get_random_int() structure is an abuse of
> the hash function and will need to be thoroughly rethought to convert
> it to SipHash.  Remember, SipHash's security goals are very different
> from MD5, so there's no obvious way to do the conversion.
>
> (It's *documented* as "not cryptographically secure", but we know
> where that goes.)
>
>> If we rekeyed the secret used by get_random_int() and
>> get_random_long() frequently (say, every minute or every 5 minutes),
>> would that be sufficient for current and future users of these
>> interfaces?
>
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.
>
> The question is, the idea has recently been floated to make hsiphash =
> SipHash-1-3 on 64-bit machines.  Is *that* okay?
>
>
> The annoying thing about the currently proposed patch is that the *only*
> chaining is the returned value.  What I'd *like* to do is the same
> pattern as we do with md5, and remember v[0..3] between invocations.
> But there's no partial SipHash primitive; we only get one word back.
>
> Even
> *chaining += ret = siphash_3u64(...)
>
> would be an improvement.

This is almost exactly what I suggested in my email on the other
thread from a few seconds ago :)

--Andy


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Andy Lutomirski
On Thu, Dec 15, 2016 at 7:03 PM, Jason A. Donenfeld  wrote:
> -static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
> -   __aligned(sizeof(unsigned long));
> +static DEFINE_PER_CPU(u64, get_random_int_chaining);
>

[...]

>  unsigned long get_random_long(void)
>  {
> -   __u32 *hash;
> unsigned long ret;
> +   u64 *chaining;
>
> if (arch_get_random_long(&ret))
> return ret;
>
> -   hash = get_cpu_var(get_random_int_hash);
> -
> -   hash[0] += current->pid + jiffies + random_get_entropy();
> -   md5_transform(hash, random_int_secret);
> -   ret = *(unsigned long *)hash;
> -   put_cpu_var(get_random_int_hash);
> -
> +   chaining = &get_cpu_var(get_random_int_chaining);
> +   ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
> +  current->pid, random_int_secret);
> +   put_cpu_var(get_random_int_chaining);
> return ret;
>  }

I think it would be nice to try to strengthen the PRNG construction.
FWIW, I'm not an expert in PRNGs, and there's fairly extensive
literature, but I can at least try.  Here are some properties I'd
like:

1. A one-time leak of memory contents doesn't ruin security until
reboot.  This is especially valuable across suspend and/or hibernation.

2. An attack with a low work factor (2^64?) shouldn't break the scheme
until reboot.


This is effectively doing:

output = H(prev_output, weak "entropy", per-boot secret);

One unfortunate downside is that, if used in a context where an
attacker can see a single output, the attacker learns the chaining
value.  If the attacker can guess the entropy, then, with 2^64 work,
they learn the secret, and they can predict future outputs.

I would advocate adding two types of improvements.  First, re-seed it
every now and then (every 128 calls?) by just replacing both the
chaining value and the percpu secret with fresh CSPRNG output.
Second, change the mode so that an attacker doesn't learn so much
internal state.  For example:

output = H(old_chain, entropy, secret);
new_chain = old_chain + entropy + output;

This increases the effort needed to brute-force the internal state
from 2^64 to 2^128 (barring any weaknesses in the scheme).
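
Concretely, something like this in get_random_int() (a sketch only,
with hypothetical field names):

    if (unlikely(++secret->uses >= 128)) {
        /* periodic catastrophic reseed from the real CSPRNG */
        get_random_bytes(&secret->secret, sizeof(secret->secret));
        get_random_bytes(&secret->chaining, sizeof(secret->chaining));
        secret->uses = 0;
    }
    result = siphash_3u64(secret->chaining, jiffies, entropy,
                          secret->secret);
    secret->chaining += entropy + result;  /* the mode change above */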

Also, can we not call this get_random_int()?  get_random_int() sounds
too much like get_random_bytes(), and the latter is intended to be a
real CSPRNG.  Can we call it get_weak_random_int() or similar?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] keys/encrypted: Fix two crypto-on-the-stack bugs

2016-12-13 Thread Andy Lutomirski
On Tue, Dec 13, 2016 at 6:48 PM, Andy Lutomirski <l...@kernel.org> wrote:
> The driver put a constant buffer of all zeros on the stack and
> pointed a scatterlist entry at it in two places.  This doesn't work
> with virtual stacks.  Use ZERO_PAGE instead.

Wait a second...

> -   sg_set_buf(&sg_out[1], pad, sizeof pad);
> +   sg_set_buf(&sg_out[1], empty_zero_page, 16);

My fix here is obviously bogus (I meant to use ZERO_PAGE(0)), but what
exactly is the code trying to do?  The old code makes no sense.  It's
setting the *output* buffer to zeroed padding.

--Andy


[PATCH v2] wusbcore: Fix one more crypto-on-the-stack bug

2016-12-13 Thread Andy Lutomirski
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it.  This doesn't work with virtual
stacks.  Use ZERO_PAGE instead.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 drivers/usb/wusbcore/crypto.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
index 79451f7ef1b7..062c205f0046 100644
--- a/drivers/usb/wusbcore/crypto.c
+++ b/drivers/usb/wusbcore/crypto.c
@@ -216,7 +216,6 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
struct scatterlist sg[4], sg_dst;
void *dst_buf;
size_t dst_size;
-   const u8 bzero[16] = { 0 };
u8 iv[crypto_skcipher_ivsize(tfm_cbc)];
size_t zero_padding;
 
@@ -261,7 +260,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
sg_set_buf(&sg[1], &scratch->b1, sizeof(scratch->b1));
sg_set_buf(&sg[2], b, blen);
/* 0 if well behaved :) */
-   sg_set_buf(&sg[3], bzero, zero_padding);
+   sg_set_page(&sg[3], ZERO_PAGE(0), zero_padding, 0);
sg_init_one(&sg_dst, dst_buf, dst_size);
 
skcipher_request_set_tfm(req, tfm_cbc);
-- 
2.9.3



[PATCH v2] keys/encrypted: Fix two crypto-on-the-stack bugs

2016-12-13 Thread Andy Lutomirski
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it in two places.  This doesn't work
with virtual stacks.  Use ZERO_PAGE instead.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 security/keys/encrypted-keys/encrypted.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/security/keys/encrypted-keys/encrypted.c 
b/security/keys/encrypted-keys/encrypted.c
index 17a06105ccb6..e963160ed26a 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -481,7 +481,6 @@ static int derived_key_encrypt(struct encrypted_key_payload 
*epayload,
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
unsigned int padlen;
-   char pad[16];
int ret;
 
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -493,11 +492,10 @@ static int derived_key_encrypt(struct 
encrypted_key_payload *epayload,
goto out;
dump_decrypted_data(epayload);
 
-   memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 2);
sg_set_buf(&sg_in[0], epayload->decrypted_data,
   epayload->decrypted_datalen);
-   sg_set_buf(&sg_in[1], pad, padlen);
+   sg_set_page(&sg_in[1], ZERO_PAGE(0), padlen, 0);
 
sg_init_table(sg_out, 1);
sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
@@ -584,7 +582,6 @@ static int derived_key_decrypt(struct encrypted_key_payload 
*epayload,
struct skcipher_request *req;
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
-   char pad[16];
int ret;
 
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -594,13 +591,12 @@ static int derived_key_decrypt(struct 
encrypted_key_payload *epayload,
goto out;
dump_encrypted_data(epayload, encrypted_datalen);
 
-   memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 1);
sg_init_table(sg_out, 2);
sg_set_buf(sg_in, epayload->encrypted_data, encrypted_datalen);
sg_set_buf(&sg_out[0], epayload->decrypted_data,
   epayload->decrypted_datalen);
-   sg_set_buf(&sg_out[1], pad, sizeof pad);
+   sg_set_buf(&sg_out[1], empty_zero_page, 16);
 
memcpy(iv, epayload->iv, sizeof(iv));
skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
-- 
2.9.3



Re: Remaining crypto API regressions with CONFIG_VMAP_STACK

2016-12-13 Thread Andy Lutomirski
On Mon, Dec 12, 2016 at 7:39 PM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Dec 12, 2016 at 10:34:10AM -0800, Andy Lutomirski wrote:
>>
>> Here's my status.
>>
>> > drivers/crypto/bfin_crc.c:351
>> > drivers/crypto/qce/sha.c:299
>> > drivers/crypto/sahara.c:973,988
>> > drivers/crypto/talitos.c:1910
>> > drivers/crypto/qce/sha.c:325
>>
>> I have a patch to make these depend on !VMAP_STACK.
>
> Why? They're all marked as ASYNC AFAIK.
>
>> I have a patch to convert this to, drumroll please:
>>
>> priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
>>   CRYPTO_ALG_ASYNC);
>>
>> Herbert, I'm at a loss as what a "shash" that's "ASYNC" even means.
>
> Having 0 as type and CRYPTO_ALG_ASYNC as mask in general means
> that we're requesting a sync algorithm (i.e., ASYNC bit off).
>
> However, it is completely unnecessary for shash as they can never
> be async.  So this could be changed to just ("michael_mic", 0, 0).

I'm confused by a bunch of this.

1. Is it really the case that crypto_alloc_xyz(..., CRYPTO_ALG_ASYNC)
means to allocate a *synchronous* transform?  That's not what I
expected.

2. What guarantees that an async request is never allocated on the
stack?  If it's just convention, could an assertion be added
somewhere?
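
If it's just convention, then even something as simple as this in the
async submission path would catch violations (a sketch;
object_is_on_stack() already exists):

    /* hypothetical debug check when queueing an async request */
    WARN_ON_ONCE(object_is_on_stack(req));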


Re: [PATCH] orinoco: Use shash instead of ahash for MIC calculations

2016-12-13 Thread Andy Lutomirski
On Tue, Dec 13, 2016 at 3:35 AM, Kalle Valo <kv...@codeaurora.org> wrote:
> Andy Lutomirski <l...@kernel.org> writes:
>
>> Eric Biggers pointed out that the orinoco driver pointed scatterlists
>> at the stack.
>>
>> Fix it by switching from ahash to shash.  The result should be
>> simpler, faster, and more correct.
>>
>> Cc: sta...@vger.kernel.org # 4.9 only
>> Reported-by: Eric Biggers <ebigge...@gmail.com>
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
>
> "more correct"? Does this fix a real user visible bug or what? And why
> just stable 4.9, does this maybe have something to do with
> CONFIG_VMAP_STACK?

Whoops, I had that text in some other patches but forgot to put it in
this one.  It'll blow up with CONFIG_VMAP_STACK=y if a debug option
like CONFIG_DEBUG_VIRTUAL=y is set.  It may work by accident if
debugging is off.

--Andy


Re: [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs

2016-12-13 Thread Andy Lutomirski
[add some people who might know]

On Tue, Dec 13, 2016 at 4:20 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Andy Lutomirski
>> Sent: 12 December 2016 20:53
>> The driver put a constant buffer of all zeros on the stack and
>> pointed a scatterlist entry at it in two places.  This doesn't work
>> with virtual stacks.  Use a static 16-byte buffer of zeros instead.
> ...
>
> I didn't think you could dma from static data either.

According to lib/dma-debug.c, you can't dma to or from kernel text or
rodata, but you can dma to or from kernel bss or data.  So
empty_zero_page should be okay, because it's not rodata right now.

But I think this is rather silly.  Joerg, Linus, etc: would it be okay
to change lib/dma-debug.c to allow DMA *from* rodata?  After all,
rodata is ordinary memory, is backed by struct page, etc.  And DMA
from the zero page had better be okay because I think it happens if
you mmap some zeros, don't write to them, and then direct I/O them to
a device.  Then I could also move empty_zero_page to rodata.

--Andy


Re: [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs

2016-12-12 Thread Andy Lutomirski
On Mon, Dec 12, 2016 at 2:28 PM, David Howells <dhowe...@redhat.com> wrote:
> Andy Lutomirski <l...@kernel.org> wrote:
>
>> +static const char zero_pad[16] = {0};
>
> Isn't there a global page of zeros or something that we can share?  Also, you
> shouldn't explicitly initialise it so that it stays in .bss.

This is a double-edged sword.  It seems that omitting the
initialization causes it to go in .bss, which isn't read only.  I have
no idea why initializing makes a difference at all -- the IMO sensible
behavior would be to put it in .rodata as NOBITS either way.

But I'll use empty_zero_page.

>
>> - sg_set_buf(&sg_out[1], pad, sizeof pad);
>> + sg_set_buf(&sg_out[1], zero_pad, sizeof zero_pad);
>
> Can you put brackets on the sizeof?

Will do for v2.


Re: [PATCH] wusbcore: Fix one more crypto-on-the-stack bug

2016-12-12 Thread Andy Lutomirski
On Mon, Dec 12, 2016 at 1:44 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Dec 12, 2016 at 12:52:45PM -0800, Andy Lutomirski wrote:
>> The driver put a constant buffer of all zeros on the stack and
>> pointed a scatterlist entry at it.  This doesn't work with virtual
>> stacks.  Make the buffer static to fix it.
>>
>> Cc: sta...@vger.kernel.org # 4.9 only
>> Reported-by: Eric Biggers <ebigge...@gmail.com>
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
>> ---
>>  drivers/usb/wusbcore/crypto.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
>> index 79451f7ef1b7..a7e007a0cd49 100644
>> --- a/drivers/usb/wusbcore/crypto.c
>> +++ b/drivers/usb/wusbcore/crypto.c
>> @@ -216,7 +216,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
>>   struct scatterlist sg[4], sg_dst;
>>   void *dst_buf;
>>   size_t dst_size;
>> - const u8 bzero[16] = { 0 };
>> + static const u8 bzero[16] = { 0 };
>
> Hm, can static memory handle DMA?  That's a requirement of the USB
> stack, does this data later end up being sent down to a USB host
> controller?

I think it doesn't, but I'll switch it to use empty_zero_page instead.

--Andy


[PATCH] orinoco: Use shash instead of ahash for MIC calculations

2016-12-12 Thread Andy Lutomirski
Eric Biggers pointed out that the orinoco driver pointed scatterlists
at the stack.

Fix it by switching from ahash to shash.  The result should be
simpler, faster, and more correct.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---

Compile-tested only.

drivers/net/wireless/intersil/orinoco/mic.c | 44 +++--
 drivers/net/wireless/intersil/orinoco/mic.h |  3 +-
 drivers/net/wireless/intersil/orinoco/orinoco.h |  4 +--
 3 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/mic.c 
b/drivers/net/wireless/intersil/orinoco/mic.c
index bc7397d709d3..08bc7822f820 100644
--- a/drivers/net/wireless/intersil/orinoco/mic.c
+++ b/drivers/net/wireless/intersil/orinoco/mic.c
@@ -16,7 +16,7 @@
 //
 int orinoco_mic_init(struct orinoco_private *priv)
 {
-   priv->tx_tfm_mic = crypto_alloc_ahash("michael_mic", 0,
+   priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
  CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->tx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
@@ -25,7 +25,7 @@ int orinoco_mic_init(struct orinoco_private *priv)
return -ENOMEM;
}
 
-   priv->rx_tfm_mic = crypto_alloc_ahash("michael_mic", 0,
+   priv->rx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
  CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->rx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
@@ -40,17 +40,16 @@ int orinoco_mic_init(struct orinoco_private *priv)
 void orinoco_mic_free(struct orinoco_private *priv)
 {
if (priv->tx_tfm_mic)
-   crypto_free_ahash(priv->tx_tfm_mic);
+   crypto_free_shash(priv->tx_tfm_mic);
if (priv->rx_tfm_mic)
-   crypto_free_ahash(priv->rx_tfm_mic);
+   crypto_free_shash(priv->rx_tfm_mic);
 }
 
-int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
+int orinoco_mic(struct crypto_shash *tfm_michael, u8 *key,
u8 *da, u8 *sa, u8 priority,
u8 *data, size_t data_len, u8 *mic)
 {
-   AHASH_REQUEST_ON_STACK(req, tfm_michael);
-   struct scatterlist sg[2];
+   SHASH_DESC_ON_STACK(desc, tfm_michael);
u8 hdr[ETH_HLEN + 2]; /* size of header + padding */
int err;
 
@@ -67,18 +66,27 @@ int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
hdr[ETH_ALEN * 2 + 2] = 0;
hdr[ETH_ALEN * 2 + 3] = 0;
 
-   /* Use scatter gather to MIC header and data in one go */
-   sg_init_table(sg, 2);
-   sg_set_buf(&sg[0], hdr, sizeof(hdr));
-   sg_set_buf(&sg[1], data, data_len);
+   desc->tfm = tfm_michael;
+   desc->flags = 0;
 
-   if (crypto_ahash_setkey(tfm_michael, key, MIC_KEYLEN))
-   return -1;
+   err = crypto_shash_setkey(tfm_michael, key, MIC_KEYLEN);
+   if (err)
+   return err;
+
+   err = crypto_shash_init(desc);
+   if (err)
+   return err;
+
+   err = crypto_shash_update(desc, hdr, sizeof(hdr));
+   if (err)
+   return err;
+
+   err = crypto_shash_update(desc, data, data_len);
+   if (err)
+   return err;
+
+   err = crypto_shash_final(desc, mic);
+   shash_desc_zero(desc);
 
-   ahash_request_set_tfm(req, tfm_michael);
-   ahash_request_set_callback(req, 0, NULL, NULL);
-   ahash_request_set_crypt(req, sg, mic, data_len + sizeof(hdr));
-   err = crypto_ahash_digest(req);
-   ahash_request_zero(req);
return err;
 }
diff --git a/drivers/net/wireless/intersil/orinoco/mic.h 
b/drivers/net/wireless/intersil/orinoco/mic.h
index ce731d05cc98..e8724e889219 100644
--- a/drivers/net/wireless/intersil/orinoco/mic.h
+++ b/drivers/net/wireless/intersil/orinoco/mic.h
@@ -6,6 +6,7 @@
 #define _ORINOCO_MIC_H_
 
 #include 
+#include 
 
 #define MICHAEL_MIC_LEN 8
 
@@ -15,7 +16,7 @@ struct crypto_ahash;
 
 int orinoco_mic_init(struct orinoco_private *priv);
 void orinoco_mic_free(struct orinoco_private *priv);
-int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
+int orinoco_mic(struct crypto_shash *tfm_michael, u8 *key,
u8 *da, u8 *sa, u8 priority,
u8 *data, size_t data_len, u8 *mic);
 
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco.h 
b/drivers/net/wireless/intersil/orinoco/orinoco.h
index 2f0c84b1c440..5fa1c3e3713f 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco.h
+++ b/drivers/net/wireless/intersil/orinoco/orinoco.h
@@ -152,8 +152,8 @@ struct orinoco_private {
u8 *wpa_ie;
int wpa_ie_len;
 
-   struct crypto_ahash *rx_tfm_

[PATCH] crypto: Make a few drivers depend on !VMAP_STACK

2016-12-12 Thread Andy Lutomirski
Eric Biggers found several crypto drivers that point scatterlists at
the stack.  These drivers should never load on x86, but, for future
safety, make them depend on !VMAP_STACK.

No -stable backport should be needed as no released kernel
configuration should be affected.

Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 drivers/crypto/Kconfig | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f2b223..481e67e54ffd 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -245,7 +245,7 @@ config CRYPTO_DEV_TALITOS
select CRYPTO_BLKCIPHER
select CRYPTO_HASH
select HW_RANDOM
-   depends on FSL_SOC
+   depends on FSL_SOC && !VMAP_STACK
help
  Say 'Y' here to use the Freescale Security Engine (SEC)
  to offload cryptographic algorithm computation.
@@ -357,7 +357,7 @@ config CRYPTO_DEV_PICOXCELL
 
 config CRYPTO_DEV_SAHARA
tristate "Support for SAHARA crypto accelerator"
-   depends on ARCH_MXC && OF
+   depends on ARCH_MXC && OF && !VMAP_STACK
select CRYPTO_BLKCIPHER
select CRYPTO_AES
select CRYPTO_ECB
@@ -410,7 +410,7 @@ endif # if CRYPTO_DEV_UX500
 
 config CRYPTO_DEV_BFIN_CRC
tristate "Support for Blackfin CRC hardware"
-   depends on BF60x
+   depends on BF60x && !VMAP_STACK
help
  Newer Blackfin processors have CRC hardware. Select this if you
  want to use the Blackfin CRC module.
@@ -487,7 +487,7 @@ source "drivers/crypto/qat/Kconfig"
 
 config CRYPTO_DEV_QCE
tristate "Qualcomm crypto engine accelerator"
-   depends on (ARCH_QCOM || COMPILE_TEST) && HAS_DMA && HAS_IOMEM
+   depends on (ARCH_QCOM || COMPILE_TEST) && HAS_DMA && HAS_IOMEM && !VMAP_STACK
select CRYPTO_AES
select CRYPTO_DES
select CRYPTO_ECB
-- 
2.9.3



[PATCH] cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack

2016-12-12 Thread Andy Lutomirski
smbencrypt() points a scatterlist to the stack, which breaks if
CONFIG_VMAP_STACK=y.

Fix it by switching to crypto_cipher_encrypt_one().  The new code
should be considerably faster as an added benefit.

This code is nearly identical to some code that Eric Biggers
suggested.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---

Compile-tested only.

fs/cifs/smbencrypt.c | 40 
 1 file changed, 8 insertions(+), 32 deletions(-)

diff --git a/fs/cifs/smbencrypt.c b/fs/cifs/smbencrypt.c
index 699b7868108f..c12bffefa3c9 100644
--- a/fs/cifs/smbencrypt.c
+++ b/fs/cifs/smbencrypt.c
@@ -23,7 +23,7 @@
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -69,46 +69,22 @@ str_to_key(unsigned char *str, unsigned char *key)
 static int
 smbhash(unsigned char *out, const unsigned char *in, unsigned char *key)
 {
-   int rc;
unsigned char key2[8];
-   struct crypto_skcipher *tfm_des;
-   struct scatterlist sgin, sgout;
-   struct skcipher_request *req;
+   struct crypto_cipher *tfm_des;
 
str_to_key(key, key2);
 
-   tfm_des = crypto_alloc_skcipher("ecb(des)", 0, CRYPTO_ALG_ASYNC);
+   tfm_des = crypto_alloc_cipher("des", 0, 0);
if (IS_ERR(tfm_des)) {
-   rc = PTR_ERR(tfm_des);
-   cifs_dbg(VFS, "could not allocate des crypto API\n");
-   goto smbhash_err;
-   }
-
-   req = skcipher_request_alloc(tfm_des, GFP_KERNEL);
-   if (!req) {
-   rc = -ENOMEM;
cifs_dbg(VFS, "could not allocate des crypto API\n");
-   goto smbhash_free_skcipher;
+   return PTR_ERR(tfm_des);
}
 
-   crypto_skcipher_setkey(tfm_des, key2, 8);
-
-   sg_init_one(&sgin, in, 8);
-   sg_init_one(&sgout, out, 8);
+   crypto_cipher_setkey(tfm_des, key2, 8);
+   crypto_cipher_encrypt_one(tfm_des, out, in);
+   crypto_free_cipher(tfm_des);
 
-   skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sgin, &sgout, 8, NULL);
-
-   rc = crypto_skcipher_encrypt(req);
-   if (rc)
-   cifs_dbg(VFS, "could not encrypt crypt key rc: %d\n", rc);
-
-   skcipher_request_free(req);
-
-smbhash_free_skcipher:
-   crypto_free_skcipher(tfm_des);
-smbhash_err:
-   return rc;
+   return 0;
 }
 
 static int
-- 
2.9.3



[PATCH] wusbcore: Fix one more crypto-on-the-stack bug

2016-12-12 Thread Andy Lutomirski
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it.  This doesn't work with virtual
stacks.  Make the buffer static to fix it.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 drivers/usb/wusbcore/crypto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
index 79451f7ef1b7..a7e007a0cd49 100644
--- a/drivers/usb/wusbcore/crypto.c
+++ b/drivers/usb/wusbcore/crypto.c
@@ -216,7 +216,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
struct scatterlist sg[4], sg_dst;
void *dst_buf;
size_t dst_size;
-   const u8 bzero[16] = { 0 };
+   static const u8 bzero[16] = { 0 };
u8 iv[crypto_skcipher_ivsize(tfm_cbc)];
size_t zero_padding;
 
-- 
2.9.3



[PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs

2016-12-12 Thread Andy Lutomirski
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it in two places.  This doesn't work
with virtual stacks.  Use a static 16-byte buffer of zeros instead.

Cc: sta...@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebigge...@gmail.com>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 security/keys/encrypted-keys/encrypted.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/security/keys/encrypted-keys/encrypted.c 
b/security/keys/encrypted-keys/encrypted.c
index 17a06105ccb6..fab2fb864002 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -46,6 +46,7 @@ static const char key_format_default[] = "default";
 static const char key_format_ecryptfs[] = "ecryptfs";
 static unsigned int ivsize;
 static int blksize;
+static const char zero_pad[16] = {0};
 
 #define KEY_TRUSTED_PREFIX_LEN (sizeof (KEY_TRUSTED_PREFIX) - 1)
 #define KEY_USER_PREFIX_LEN (sizeof (KEY_USER_PREFIX) - 1)
@@ -481,7 +482,6 @@ static int derived_key_encrypt(struct encrypted_key_payload 
*epayload,
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
unsigned int padlen;
-   char pad[16];
int ret;
 
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -493,11 +493,10 @@ static int derived_key_encrypt(struct 
encrypted_key_payload *epayload,
goto out;
dump_decrypted_data(epayload);
 
-   memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 2);
sg_set_buf(&sg_in[0], epayload->decrypted_data,
   epayload->decrypted_datalen);
-   sg_set_buf(&sg_in[1], pad, padlen);
+   sg_set_buf(&sg_in[1], zero_pad, padlen);
 
sg_init_table(sg_out, 1);
sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
@@ -584,7 +583,6 @@ static int derived_key_decrypt(struct encrypted_key_payload 
*epayload,
struct skcipher_request *req;
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
-   char pad[16];
int ret;
 
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -594,13 +592,12 @@ static int derived_key_decrypt(struct 
encrypted_key_payload *epayload,
goto out;
dump_encrypted_data(epayload, encrypted_datalen);
 
-   memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 1);
sg_init_table(sg_out, 2);
sg_set_buf(sg_in, epayload->encrypted_data, encrypted_datalen);
sg_set_buf(&sg_out[0], epayload->decrypted_data,
   epayload->decrypted_datalen);
-   sg_set_buf(&sg_out[1], pad, sizeof pad);
+   sg_set_buf(&sg_out[1], zero_pad, sizeof zero_pad);
 
memcpy(iv, epayload->iv, sizeof(iv));
skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
-- 
2.9.3



Re: Remaining crypto API regressions with CONFIG_VMAP_STACK

2016-12-12 Thread Andy Lutomirski
On Fri, Dec 9, 2016 at 3:08 PM, Eric Biggers  wrote:
> In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
> default on x86_64.  This has been exposing a number of problems in which
> on-stack buffers are being passed into the crypto API, which to support crypto
> accelerators operates on 'struct page' rather than on virtual memory.

Here's my status.

> drivers/crypto/bfin_crc.c:351
> drivers/crypto/qce/sha.c:299
> drivers/crypto/sahara.c:973,988
> drivers/crypto/talitos.c:1910
> drivers/crypto/qce/sha.c:325

I have a patch to make these depend on !VMAP_STACK.

> drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
> drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124
> drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
> drivers/crypto/ccp/ccp-crypto-aes.c:94

According to Herbert, these are fine.  I'm personally less convinced
since I'm very confused as to what "async" means in the crypto code,
but I'm going to leave these alone.

>
> And these other places do crypto operations on buffers clearly on the stack:
>
> drivers/usb/wusbcore/crypto.c:264
> security/keys/encrypted-keys/encrypted.c:500

I have a patch.

> drivers/net/wireless/intersil/orinoco/mic.c:72

I have a patch to convert this to, drumroll please:

priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
  CRYPTO_ALG_ASYNC);

Herbert, I'm at a loss as what a "shash" that's "ASYNC" even means.

> net/ceph/crypto.c:182

This:

size_t zero_padding = (0x10 - (src_len & 0x0f));

is an amazing line of code...
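
(To be fair, it does do something defensible: since the result isn't
masked, a src_len that is already a multiple of 16 yields zero_padding
== 16, so a full extra block of zeros is always appended -- the
always-pad-at-least-one-byte behavior of PKCS#7, but with zero bytes.
It's just not obvious at a glance.)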

But this driver uses cbc and wants to do synchronous crypto, and I
don't think that the crypto API supports real synchronous crypto using
CBC, so I'm going to let someone else fix this.

> net/rxrpc/rxkad.c:737,1000

Herbert, can you fix this?

> fs/cifs/smbencrypt.c:96

I have a patch.


My pile is here:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=crypto

I'll send out the patches soon.


Re: Remaining crypto API regressions with CONFIG_VMAP_STACK

2016-12-11 Thread Andy Lutomirski
On Fri, Dec 9, 2016 at 3:08 PM, Eric Biggers  wrote:
> In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
> default on x86_64.  This has been exposing a number of problems in which
> on-stack buffers are being passed into the crypto API, which to support crypto
> accelerators operates on 'struct page' rather than on virtual memory.
>

> fs/cifs/smbencrypt.c:96

This should use crypto_cipher_encrypt_one(), I think.

--Andy


Re: [kernel-hardening] Re: Remaining crypto API regressions with CONFIG_VMAP_STACK

2016-12-10 Thread Andy Lutomirski
cc: Viro because I'm talking about iov_iter.

On Sat, Dec 10, 2016 at 6:45 AM, Jason A. Donenfeld  wrote:
> Hi Herbert,
>
> On Sat, Dec 10, 2016 at 6:37 AM, Herbert Xu wrote:
>> As for AEAD we never had a sync interface to begin with and I
>> don't think I'm going to add one.
>
> That's too bad to hear. I hope you'll reconsider. Modern cryptographic
> design is heading more and more in the direction of using AEADs for
> interesting things, and having a sync interface would be a lot easier
> for implementing these protocols. In the same way many protocols need
> a hash of some data, now protocols often want some particular data
> encrypted with an AEAD using a particular key and nonce and AD. One
> protocol that comes to mind is Noise [1].
>

I think that sync vs async has gotten conflated with
vectored-vs-nonvectored and the results are unfortunate.

There are a lot of users in the tree that are trying to do crypto on
very small pieces of data and want to have that data consist of the
concatenation of two small buffers and/or want to use primitives that
don't have "sync" interfaces.  These users are stuck using async
interfaces even though using async implementations makes no sense for
them.

I'd love to see the API restructured a bit to decouple all of these
considerations.  One approach might be to teach iov_iter about
scatterlists.  Then, for each primitive, there could be two entry
points:

1. A simplified and lower-overhead entry.  You pass it an iov_iter
(and, depending on what the operation is, an output iov_iter), it does
the crypto synchronously, and returns.  Operating in-place might be
permitted for some primitives.

2. A full-featured async entry.  You pass it iov_iters and it requires
that the iov_iters be backed by scatterlists in order to operate
asynchronously.

I see no reason that the decisions to use virtual vs physical
addressing or to do vectored vs non-vectored IO should be tied up with
asynchronicity.
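
To make that concrete, the signatures might look something like this
(a sketch only -- every name below is invented, nothing like it exists
in the tree today):

/* 1: synchronous, low-overhead, operates on virtual addresses. */
int crypto_skcipher_encrypt_iter(struct crypto_skcipher *tfm,
				 struct iov_iter *src,
				 struct iov_iter *dst,
				 u8 *iv);

/* 2: full async entry; only usable when the iov_iters are backed by
 * scatterlist-capable (physically addressable) memory. */
int crypto_skcipher_encrypt_iter_async(struct skcipher_request *req,
				       struct iov_iter *src,
				       struct iov_iter *dst);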

--Andy


Re: Remaining crypto API regressions with CONFIG_VMAP_STACK

2016-12-09 Thread Andy Lutomirski
On Fri, Dec 9, 2016 at 3:08 PM, Eric Biggers  wrote:
> In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
> default on x86_64.  This has been exposing a number of problems in which
> on-stack buffers are being passed into the crypto API, which to support crypto
> accelerators operates on 'struct page' rather than on virtual memory.
>
> Some of these problems have already been fixed, but I was wondering how many
> problems remain, so I briefly looked through all the callers of sg_set_buf() 
> and
> sg_init_one().  Overall I found quite a few remaining problems, detailed 
> below.
>
> The following crypto drivers initialize a scatterlist to point into an
> ahash_request, which may have been allocated on the stack with
> AHASH_REQUEST_ON_STACK():
>
> drivers/crypto/bfin_crc.c:351
> drivers/crypto/qce/sha.c:299
> drivers/crypto/sahara.c:973,988
> drivers/crypto/talitos.c:1910

These are impossible or highly unlikely on x86.

> drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
> drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124

These

> drivers/crypto/qce/sha.c:325

This is impossible on x86.

>
> The following crypto drivers initialize a scatterlist to point into an
> ablkcipher_request, which may have been allocated on the stack with
> SKCIPHER_REQUEST_ON_STACK():
>
> drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
> drivers/crypto/ccp/ccp-crypto-aes.c:94

These are real, and I wish I'd known about them sooner.

>
> And these other places do crypto operations on buffers clearly on the stack:
>
> drivers/net/wireless/intersil/orinoco/mic.c:72

Ick.

> drivers/usb/wusbcore/crypto.c:264

Well, crud.  I thought I had fixed this driver but I missed one case.
Will send a fix tomorrow.  But I'm still unconvinced that this
hardware ever shipped.

> net/ceph/crypto.c:182

Ick.

> net/rxrpc/rxkad.c:737,1000

Well, crud.  This was supposed to have been fixed in:

commit a263629da519b2064588377416e067727e2cbdf9
Author: Herbert Xu 
Date:   Sun Jun 26 14:55:24 2016 -0700

rxrpc: Avoid using stack memory in SG lists in rxkad


> security/keys/encrypted-keys/encrypted.c:500

That's a trivial one-liner.  Patch coming tomorrow.

> fs/cifs/smbencrypt.c:96

Ick.

>
> Note: I almost certainly missed some, since I excluded places where the use 
> of a
> stack buffer was not obvious to me.  I also excluded AEAD algorithms since 
> there
> isn't an AEAD_REQUEST_ON_STACK() macro (yet).
>
> The "good" news with these bugs is that on x86_64 without CONFIG_DEBUG_SG=y or
> CONFIG_DEBUG_VIRTUAL=y, you can still do virt_to_page() and then 
> page_address()
> on a vmalloc address and get back the same address, even though you aren't
> *supposed* to be able to do this.  This will make things still work for most
> people.  The bad news is that if you happen to have consumed just about 1 page
> (or N pages) of your stack at the time you call the crypto API, your stack
> buffer may actually span physically non-contiguous pages, so the crypto
> algorithm will scribble over some unrelated page.

Are you sure?  If it round-trips to the same virtual address, it
doesn't matter if the buffer is contiguous.

>  Also, hardware crypto drivers
> which actually do operate on physical memory will break too.

Those were already broken.  DMA has been illegal on the stack for
years and DMA debugging would have caught it.

>
> So I am wondering: is the best solution really to make all these crypto API
> algorithms and users use heap buffers, as opposed to something like 
> maintaining
> a lowmem alias for the stack, or introducing a more general function to 
> convert
> buffers (possibly in the vmalloc space) into scatterlists?  And if the current
> solution is desired, who is going to fix all of these bugs and when?

The *right* solution IMO is to fix crypto to stop using scatterlists.
Scatterlists are for DMA using physical addresses, and they're
inappropriate for almost every user of them that's using them for crypto.
kiov would be much better -- it would make sense and it would be
faster.

I have a hack to make scatterlists pointing to the stack work (as long
as they're only one element), but that's seriously gross.

Herbert, how hard would it be to teach the crypto code to use a more
sensible data structure than scatterlist and to use coccinelle fix
this stuff for real?

In the mean time, we should patch the handful of drivers that matter.


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-20 Thread Andy Lutomirski
[Adding Thorsten to help keep this from getting lost]

On Thu, Nov 3, 2016 at 1:30 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Thu, Nov 3, 2016 at 11:16 AM, Eric Biggers <ebigg...@google.com> wrote:
>> Hello,
>>
>> I hit the BUG_ON() in arch/x86/mm/physaddr.c:26 while testing some crypto 
>> code
>> in an x86_64 kernel with CONFIG_DEBUG_VIRTUAL=y and CONFIG_VMAP_STACK=y:
>>
>> /* carry flag will be set if starting x was >= PAGE_OFFSET */
>> VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
>>
>> The problem is the following code in scatterwalk_map_and_copy() in
>> crypto/scatterwalk.c, which tries to determine if the buffer passed in 
>> aliases
>> the physical memory of the first segment of the scatterlist:
>>
>> if (sg_page(sg) == virt_to_page(buf) &&
>> sg->offset == offset_in_page(buf))
>> return;
>
> ...
>
>>
>> Currently I think the best solution would be to require that callers to
>> scatterwalk_map_and_copy() do not alias their source and destination.  Then 
>> the
>> alias check could be removed.  This check has only been there since v4.2 
>> (commit
>> 74412fd5d71b6), so I'd hope not many callers rely on the behavior.  I'm not 
>> sure
>> exactly which ones do, though.
>>
>> Thoughts on this?
>
> The relevant commit is:
>
> commit 74412fd5d71b6eda0beb302aa467da000f0d530c
> Author: Herbert Xu <herb...@gondor.apana.org.au>
> Date:   Thu May 21 15:11:12 2015 +0800
>
> crypto: scatterwalk - Check for same address in map_and_copy
>
> This patch adds a check for in scatterwalk_map_and_copy to avoid
> copying from the same address to the same address.  This is going
> to be used for IV copying in AEAD IV generators.
>
> There is no provision for partial overlaps.
>
> This patch also uses the new scatterwalk_ffwd instead of doing
> it by hand in scatterwalk_map_and_copy.
>
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
>
> Herbert, can you clarify this?  The check seems rather bizarre --
> you're doing an incomplete check for aliasing and skipping the whole
> copy if the beginning aliases.  In any event the stack *can't*
> reasonably alias the scatterlist because a scatterlist can't safely
> point to the stack.  Is there any code that actually relies on the
> aliasing-detecting behavior?
>
> Also, Herbert, it seems like the considerable majority of the crypto
> code is acting on kernel virtual memory addresses and does software
> processing.  Would it perhaps make sense to add a kvec-based or
> iov_iter-based interface to the crypto code?  I bet it would be quite
> a bit faster and it would make crypto on stack buffers work directly.


Ping, everyone!

It's getting quite close to 4.9 release time.  Is there an actual bug
here?  Because, if so, we need to fix it.  My preference is to just
delete the weird aliasing check, but it would be really nice to know
if that check is needed for some reason.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 1/2] fscrypto: don't use on-stack buffer for filename encryption

2016-11-06 Thread Andy Lutomirski
On Nov 5, 2016 8:13 AM, "Kent Overstreet"  wrote:
>
> On Thu, Nov 03, 2016 at 03:03:01PM -0700, Eric Biggers wrote:
> > With the new (in 4.9) option to use a virtually-mapped stack
> > (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
> > the scatterlist crypto API because they may not be directly mappable to
> > struct page.  For short filenames, fname_encrypt() was encrypting a
> > stack buffer holding the padded filename.  Fix it by encrypting the
> > filename in-place in the output buffer, thereby making the temporary
> > buffer unnecessary.
> >
> > This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
> > because this allowed the BUG in sg_set_buf() to be triggered.
> >
> > Signed-off-by: Eric Biggers 
>
> > - alloc_buf = kmalloc(ciphertext_len, GFP_NOFS);
> > - if (!alloc_buf)
> > - return -ENOMEM;
> > - workbuf = alloc_buf;
>
> Vmalloc memory does have struct pages - you just need to use vmalloc_to_page()
> instead of virt_to_page. Look at drivers/md/bcache/util.c bch_bio_map() if you
> want an example.
>
> It would be better to just fix the sg code to handle vmalloc memory, instead 
> of
> adding a kmalloc() that can fail (and an error path that inevitably won't be
> tested).

Probably not, because (a) vmalloc_to_page is slow and (b) stack
buffers can span physically noncontiguous pages.

I think it's best to either avoid stack buffers or to teach crypto about kiov.
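
(For reference, the page-at-a-time approach Kent describes -- cf.
bch_bio_map() -- would look roughly like this.  A sketch, not tested:
it assumes sg[] has enough entries, omits sg_mark_end(), and pays a
vmalloc_to_page() walk on every page:)

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>

static void sg_set_vmalloc_buf(struct scatterlist *sg, void *buf,
			       unsigned int len)
{
	u8 *p = buf;

	while (len) {
		unsigned int off = offset_in_page(p);
		unsigned int seg = min_t(unsigned int, len, PAGE_SIZE - off);

		sg_set_page(sg, vmalloc_to_page(p), seg, off);
		sg = sg_next(sg);
		p += seg;
		len -= seg;
	}
}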


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Andy Lutomirski
On Thu, Nov 3, 2016 at 4:10 PM, Eric Biggers <ebigg...@google.com> wrote:
> On Thu, Nov 03, 2016 at 02:12:07PM -0700, Eric Biggers wrote:
>> On Thu, Nov 03, 2016 at 01:30:49PM -0700, Andy Lutomirski wrote:
>> >
>> > Also, Herbert, it seems like the considerable majority of the crypto
>> > code is acting on kernel virtual memory addresses and does software
>> > processing.  Would it perhaps make sense to add a kvec-based or
>> > iov_iter-based interface to the crypto code?  I bet it would be quite
>> > a bit faster and it would make crypto on stack buffers work directly.
>>
>> I'd like to hear Herbert's opinion on this too, but as I understand it, if a
>> symmetric cipher API operating on virtual addresses was added, similar to the
>> existing "shash" API it would only allow software processing.  Whereas with 
>> the
>> current API you can request a transform and use it the same way regardless of
>> whether the crypto framework has chosen a software or hardware 
>> implementation,
>> or a combination thereof.  If this wasn't a concern then I expect using 
>> virtual
>> addresses would indeed simplify things a lot, at least for users not already
>> working with physical memory (struct page).
>>
>> Either way, in the near term it looks like 4.9 will be released with the new
>> behavior that encryption/decryption is not supported on stack buffers.
>> Separately from the scatterwalk_map_and_copy() issue, today I've found two
>> places in the filesystem-level encryption code that do encryption on stack
>> buffers and therefore hit the 'BUG_ON(!virt_addr_valid(buf));' in 
>> sg_set_buf().
>> I will be sending patches to fix these, but I suspect there may be more 
>> crypto
>> API users elsewhere that have this same problem.
>>
>> Eric
>
> [Added linux-mm to Cc]
>
> For what it's worth, grsecurity has a special case to allow a scatterlist 
> entry
> to be created from a stack buffer:
>
> static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
>                               unsigned int buflen)
> {
>         const void *realbuf = buf;
>
> #ifdef CONFIG_GRKERNSEC_KSTACKOVERFLOW
>         if (object_starts_on_stack(buf))
>                 realbuf = buf - current->stack + current->lowmem_stack;
> #endif
>
> #ifdef CONFIG_DEBUG_SG
>         BUG_ON(!virt_addr_valid(realbuf));
> #endif
>         sg_set_page(sg, virt_to_page(realbuf), buflen, offset_in_page(realbuf));
> }

Yes, that's how grsecurity works.  The upstream version is going to do
it right instead of hacking around it.

> I don't know about all the relative merits of the two approaches.  But one of
> the things that will need to be done with the currently upstream approach is
> that all callers of sg_set_buf() will need to be checked to make sure they
> aren't using stack addresses, and any that are will need to be updated to do
> otherwise, e.g. by using heap-allocated memory.

I tried to do this, but I may have missed a couple of examples.

>  I suppose this is already
> happening, but in the case of the crypto API it will probably take a while for
> all the users to be identified and updated.  (And it's not always clear from 
> the
> local context whether something can be stack memory or not, e.g. the memory 
> for
> crypto request objects may be either.)

The crypto request objects can live on the stack just fine.  It's the
request buffers that need to live elsewhere (or the alternative
interfaces can be used, or the crypto core code can start using
something other than scatterlists).
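
Concretely (a sketch -- SKCIPHER_REQUEST_ON_STACK is the real macro;
tfm, len and iv are assumed from context):

SKCIPHER_REQUEST_ON_STACK(req, tfm);	/* request object: stack is fine */
struct scatterlist sg;
u8 *buf = kmalloc(len, GFP_KERNEL);	/* data buffer: must be sg-able */

if (!buf)
	return -ENOMEM;
sg_init_one(&sg, buf, len);
skcipher_request_set_crypt(req, &sg, &sg, len, iv);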

--Andy


Re: vmalloced stacks and scatterwalk_map_and_copy()

2016-11-03 Thread Andy Lutomirski
On Thu, Nov 3, 2016 at 11:16 AM, Eric Biggers  wrote:
> Hello,
>
> I hit the BUG_ON() in arch/x86/mm/physaddr.c:26 while testing some crypto code
> in an x86_64 kernel with CONFIG_DEBUG_VIRTUAL=y and CONFIG_VMAP_STACK=y:
>
> /* carry flag will be set if starting x was >= PAGE_OFFSET */
> VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
>
> The problem is the following code in scatterwalk_map_and_copy() in
> crypto/scatterwalk.c, which tries to determine if the buffer passed in aliases
> the physical memory of the first segment of the scatterlist:
>
> if (sg_page(sg) == virt_to_page(buf) &&
> sg->offset == offset_in_page(buf))
> return;

...

>
> Currently I think the best solution would be to require that callers to
> scatterwalk_map_and_copy() do not alias their source and destination.  Then 
> the
> alias check could be removed.  This check has only been there since v4.2 
> (commit
> 74412fd5d71b6), so I'd hope not many callers rely on the behavior.  I'm not 
> sure
> exactly which ones do, though.
>
> Thoughts on this?

The relevant commit is:

commit 74412fd5d71b6eda0beb302aa467da000f0d530c
Author: Herbert Xu 
Date:   Thu May 21 15:11:12 2015 +0800

crypto: scatterwalk - Check for same address in map_and_copy

This patch adds a check for in scatterwalk_map_and_copy to avoid
copying from the same address to the same address.  This is going
to be used for IV copying in AEAD IV generators.

There is no provision for partial overlaps.

This patch also uses the new scatterwalk_ffwd instead of doing
it by hand in scatterwalk_map_and_copy.

Signed-off-by: Herbert Xu 

Herbert, can you clarify this?  The check seems rather bizarre --
you're doing an incomplete check for aliasing and skipping the whole
copy if the beginning aliases.  In any event the stack *can't*
reasonably alias the scatterlist because a scatterlist can't safely
point to the stack.  Is there any code that actually relies on the
aliasing-detecting behavior?

Also, Herbert, it seems like the considerable majority of the crypto
code is acting on kernel virtual memory addresses and does software
processing.  Would it perhaps make sense to add a kvec-based or
iov_iter-based interface to the crypto code?  I bet it would be quite
a bit faster and it would make crypto on stack buffers work directly.


Re: [PATCH RESEND] hwrng: core - don't pass stack allocated buffer to rng->read()

2016-10-21 Thread Andy Lutomirski
On Fri, Oct 21, 2016 at 1:48 PM, Laszlo Ersek  wrote:
> The virtio-rng backend for hwrng passes the buffer that it receives for
> filling to sg_set_buf() directly, in:
>
> virtio_read()   [drivers/char/hw_random/virtio-rng.c]
>   register_buffer() [drivers/char/hw_random/virtio-rng.c]
> sg_init_one()   [lib/scatterlist.c]
>   sg_set_buf()  [include/linux/scatterlist.h]
>
> In turn, the sg_set_buf() function, when built with CONFIG_DEBUG_SG,
> actively enforces (justifiedly) that the buffer used within the
> scatter-gather list live in physically contiguous memory:
>
>   BUG_ON(!virt_addr_valid(buf));
>
> The combination of the above two facts means that whatever calls
> virtio_read() -- via the hwrng.read() method -- has to allocate the
> recipient buffer in physically contiguous memory.

Indeed.  This bug should be fixed by:

https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=6d4952d9d9d4dc2bb9c0255d95a09405a1e958f7


Re: [PATCH resend 4.9] hw_random: Don't use a stack buffer in add_early_randomness()

2016-10-17 Thread Andy Lutomirski
On Mon, Oct 17, 2016 at 11:36 AM, Stephan Mueller <smuel...@chronox.de> wrote:
> Am Montag, 17. Oktober 2016, 10:30:13 CEST schrieb Andy Lutomirski:
>
> Hi Andy,
>>
>> Sure, but shouldn't that be a separate patch covering the whole hw_crypto
>> core?
>
> I think that you are right -- there are many more cases where a memset(0) is
> warranted.
>
> Do you want to make this change or should I send a patch?

Can you do it?  I have my work cut out for me making sure that all the
known regressions get stomped quickly...


Re: [PATCH resend 4.9] hw_random: Don't use a stack buffer in add_early_randomness()

2016-10-17 Thread Andy Lutomirski
On Mon, Oct 17, 2016 at 10:17 AM, Stephan Mueller <smuel...@chronox.de> wrote:
> Am Montag, 17. Oktober 2016, 10:06:27 CEST schrieb Andy Lutomirski:
>
> Hi Andy,
>
>> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
>> index 9203f2d130c0..340f96e44642 100644
>> --- a/drivers/char/hw_random/core.c
>> +++ b/drivers/char/hw_random/core.c
>> @@ -84,14 +84,14 @@ static size_t rng_buffer_size(void)
>>
>>  static void add_early_randomness(struct hwrng *rng)
>>  {
>> - unsigned char bytes[16];
>>   int bytes_read;
>> + size_t size = min_t(size_t, 16, rng_buffer_size());
>>
>>   mutex_lock(&reading_mutex);
>> - bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
>> + bytes_read = rng_get_data(rng, rng_buffer, size, 1);
>>   mutex_unlock(&reading_mutex);
>>   if (bytes_read > 0)
>> - add_device_randomness(bytes, bytes_read);
>> + add_device_randomness(rng_buffer, bytes_read);
>
> Shouldn't there be a memset(0) of the rng_buffer at this point to avoid having
> such data lingering in memory?

Sure, but shouldn't that be a separate patch covering the whole hw_crypto core?

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH resend 4.9] hw_random: Don't use a stack buffer in add_early_randomness()

2016-10-17 Thread Andy Lutomirski
hw_random carefully avoids using a stack buffer except in
add_early_randomness().  This causes a crash in virtio_rng if
CONFIG_VMAP_STACK=y.

Reported-by: Matt Mullins <mmull...@mmlx.us>
Tested-by: Matt Mullins <mmull...@mmlx.us>
Fixes: d3cc7996473a ("hwrng: fetch randomness only after device init")
Signed-off-by: Andy Lutomirski <l...@kernel.org>
---

This fixes a crash in 4.9-rc1.

resending because I typoed the git send-email command.  I stealthily added
Matt's Tested-by, too.

 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 9203f2d130c0..340f96e44642 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -84,14 +84,14 @@ static size_t rng_buffer_size(void)
 
 static void add_early_randomness(struct hwrng *rng)
 {
-   unsigned char bytes[16];
int bytes_read;
+   size_t size = min_t(size_t, 16, rng_buffer_size());
 
	mutex_lock(&reading_mutex);
-   bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+   bytes_read = rng_get_data(rng, rng_buffer, size, 1);
	mutex_unlock(&reading_mutex);
if (bytes_read > 0)
-   add_device_randomness(bytes, bytes_read);
+   add_device_randomness(rng_buffer, bytes_read);
 }
 
 static inline void cleanup_rng(struct kref *kref)
-- 
2.7.4



Re: virtio-rng scatterlist null pointer dereference with VMAP_STACK=y

2016-10-16 Thread Andy Lutomirski
On Sun, Oct 16, 2016 at 9:59 AM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Sat, Oct 15, 2016 at 5:21 PM, Matt Mullins <mmull...@mmlx.us> wrote:
>> With VMAP_STACK=y and HW_RANDOM_VIRTIO=y, I get the following crash:
>>
>> [1.470437] BUG: unable to handle kernel NULL pointer dereference at  
>> (null)
>> [1.473350] IP: [] sg_init_one+0x65/0x90
>> [1.474658] PGD 0
>> [1.475169] Oops:  [#1] SMP
>>
>> Entering kdb (current=0x880069b0c980, pid 1) on processor 1 Oops: (null)
>> due to oops @ 0x812f7c35
>> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.0-08682-g41844e3 #246
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 880069b0c980 task.stack: c9c54000
>> RIP: 0010:[]  [] sg_init_one+0x65/0x90
>> RSP: :c9c57d00  EFLAGS: 00010246
>> RAX:  RBX: c9c57d20 RCX: 00082000
>> RDX: 0d68 RSI: 00041c57 RDI: c9c57d40
>> RBP: 0820 R08:  R09: c9c57d20
>> R10: 00019228 R11: 88017fff8500 R12: 0010
>> R13: 0010 R14:  R15: 
>> FS:  () GS:88017fc0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2:  CR3: 01806000 CR4: 001406a0
>> Stack:
>>  8800694bf500 0001 c9c57d68 8138286f
>>  0002   
>>  8800694bf500 c9c57d68 8800694bf500 
>> Call Trace:
>>  [] ? virtio_read+0x9f/0xe0
>>  [] ? add_early_randomness+0x3e/0xa0
>>  [] ? set_current_rng+0x3f/0x160
>>  [] ? hwrng_register+0x186/0x1d0
>>  [] ? virtrng_scan+0x10/0x20
>>  [] ? virtio_dev_probe+0x21e/0x2b0
>>  [] ? really_probe+0x14e/0x250
>>  [] ? __driver_attach+0x82/0xa0
>>  [] ? really_probe+0x250/0x250
>>  [] ? bus_for_each_dev+0x52/0x80
>>  [] ? bus_add_driver+0x1a3/0x220
>>  [] ? hwrng_modinit+0xc/0xc
>>  [] ? driver_register+0x56/0xd0
>>  [] ? hwrng_modinit+0xc/0xc
>>  [] ? do_one_initcall+0x32/0x150
>>  [] ? parse_args+0x170/0x390
>>  [] ? kernel_init_freeable+0x11e/0x1a3
>>  [] ? set_debug_rodata+0xc/0xc
>>  [] ? rest_init+0x70/0x70
>>  [] ? kernel_init+0x5/0x100
>>  [] ? ret_from_fork+0x25/0x30
>> Code: 01 c5 48 89 ee 48 89 e9 48 c1 ed 23 48 8b 04 ed c0 49 9c 81 48 c1 ee 0c
>> 48 c1 e9 1b 48 85 c0 74 0a 0f b6 c9 48 c1 e1 04 48 01 c8 <48> 8b 08 48 89 f0 
>> 89
>> 53 08 48 c1 e0 06 44 89 63 0c 48 83 e1 fc
>>
>>
>> It looks like add_early_randomness() uses a buffer on the stack, which (with
>> VMAP_STACK) fails to be added to a scatterlist due to use of virt_to_page in
>> sg_set_buf().
>>
>> None of this looks obviously wrong to me, but in combination it is not
>> functional.  Hence, I don't have a particular fix in mind, and I've CC'ed 
>> folks
>> from both the hw_random and scatterlist history.
>
> Using a buffer on the stack is bogus here.  Presumably virtio_read
> should allocate a heap buffer.

On second thought, this clearly belongs in the core code.  Patch coming.


Re: virtio-rng scatterlist null pointer dereference with VMAP_STACK=y

2016-10-16 Thread Andy Lutomirski
On Sat, Oct 15, 2016 at 5:21 PM, Matt Mullins <mmull...@mmlx.us> wrote:
> With VMAP_STACK=y and HW_RANDOM_VIRTIO=y, I get the following crash:
>
> [1.470437] BUG: unable to handle kernel NULL pointer dereference at  
> (null)
> [1.473350] IP: [] sg_init_one+0x65/0x90
> [1.474658] PGD 0
> [1.475169] Oops:  [#1] SMP
>
> Entering kdb (current=0x880069b0c980, pid 1) on processor 1 Oops: (null)
> due to oops @ 0x812f7c35
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.0-08682-g41844e3 #246
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880069b0c980 task.stack: c9c54000
> RIP: 0010:[]  [] sg_init_one+0x65/0x90
> RSP: :c9c57d00  EFLAGS: 00010246
> RAX:  RBX: c9c57d20 RCX: 00082000
> RDX: 0d68 RSI: 00041c57 RDI: c9c57d40
> RBP: 0820 R08:  R09: c9c57d20
> R10: 00019228 R11: 88017fff8500 R12: 0010
> R13: 0010 R14:  R15: 
> FS:  () GS:88017fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 01806000 CR4: 001406a0
> Stack:
>  8800694bf500 0001 c9c57d68 8138286f
>  0002   
>  8800694bf500 c9c57d68 8800694bf500 
> Call Trace:
>  [] ? virtio_read+0x9f/0xe0
>  [] ? add_early_randomness+0x3e/0xa0
>  [] ? set_current_rng+0x3f/0x160
>  [] ? hwrng_register+0x186/0x1d0
>  [] ? virtrng_scan+0x10/0x20
>  [] ? virtio_dev_probe+0x21e/0x2b0
>  [] ? really_probe+0x14e/0x250
>  [] ? __driver_attach+0x82/0xa0
>  [] ? really_probe+0x250/0x250
>  [] ? bus_for_each_dev+0x52/0x80
>  [] ? bus_add_driver+0x1a3/0x220
>  [] ? hwrng_modinit+0xc/0xc
>  [] ? driver_register+0x56/0xd0
>  [] ? hwrng_modinit+0xc/0xc
>  [] ? do_one_initcall+0x32/0x150
>  [] ? parse_args+0x170/0x390
>  [] ? kernel_init_freeable+0x11e/0x1a3
>  [] ? set_debug_rodata+0xc/0xc
>  [] ? rest_init+0x70/0x70
>  [] ? kernel_init+0x5/0x100
>  [] ? ret_from_fork+0x25/0x30
> Code: 01 c5 48 89 ee 48 89 e9 48 c1 ed 23 48 8b 04 ed c0 49 9c 81 48 c1 ee 0c
> 48 c1 e9 1b 48 85 c0 74 0a 0f b6 c9 48 c1 e1 04 48 01 c8 <48> 8b 08 48 89 f0 
> 89
> 53 08 48 c1 e0 06 44 89 63 0c 48 83 e1 fc
>
>
> It looks like add_early_randomness() uses a buffer on the stack, which (with
> VMAP_STACK) fails to be added to a scatterlist due to use of virt_to_page in
> sg_set_buf().
>
> None of this looks obviously wrong to me, but in combination it is not
> functional.  Hence, I don't have a particular fix in mind, and I've CC'ed 
> folks
> from both the hw_random and scatterlist history.

Using a buffer on the stack is bogus here.  Presumably virtio_read
should allocate a heap buffer.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-23 Thread Andy Lutomirski
On Wed, Jun 22, 2016 at 11:41 PM, Herbert Xu
 wrote:
> On Thu, Jun 23, 2016 at 11:48:25AM +0800, Herbert Xu wrote:
>>
>> No we never had such an API in the kernel.  However, I see that
>> rxkad does some pretty silly things and we should be able to avoid
>> using the stack in pretty much all cases.  Let me try to come up with
>> something.
>
> Here it is:
>
> ---8<---
> Subject: rxrpc: Avoid using stack memory in SG lists in rxkad

Looks reasonable to me.  Unless anyone tells me otherwise, my plan is
to queue it in my virtually-mapped stack series and to ask Ingo to
apply it via -tip.

If it went in via the networking tree, that would work as well, but it
would introduce a bisectability problem.

Thanks!

--Andy


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-22 Thread Andy Lutomirski
On Wed, Jun 22, 2016 at 2:48 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Jun 21, 2016 at 5:52 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <herb...@gondor.apana.org.au> 
>> wrote:
>>> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>>>
>>>> Is there a straightforward way that bluetooth and, potentially, other
>>>> drivers can just do synchronous crypto in a small buffer specified by
>>>> its virtual address?  The actual cryptography part of the crypto code
>>>> already works this way, but I can't find an API for it.
>>>
>>> Yes, single block users should use crypto_cipher_encrypt_one, an
>>> example would be drivers/md/dm-crypt.c.
>>>
>>
>> Aha!  I expected something like that to exist, but I couldn't find it.
>> I'll change the two offenders I've found so far to use it.
>>
>
> Before I do this, can you explain what the difference is between
> crypto_cipher and crypto_skcipher?  net/bluetooth/smp.c currently uses
> crypto_alloc_skcipher, which you added in:
>
> commit 71af2f6bb22a4bf42663e10f1d8913d4967ed07f
> Author: Herbert Xu <herb...@gondor.apana.org.au>
> Date:   Sun Jan 24 21:18:30 2016 +0800
>
> Bluetooth: Use skcipher and hash
>
> Am I just supposed to replace "skcipher" with "cipher" everywhere?

It looks like I'm supposed to do that and to use "aes" instead of "ecb(aes)".

*However*, the other offender I've found (net/rxrpc/rxkad.c) uses
"pcbc(fcrypt)", which doesn't appear to be usable with this API.  Is
there no way to say "I want synchronous crypto on this VA range" using
the skcipher API?
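
(My best guess at the intended spelling -- a sketch, and exactly the
semantics I'm asking about -- is to mask out CRYPTO_ALG_ASYNC to demand
a synchronous implementation:)

tfm = crypto_alloc_skcipher("pcbc(fcrypt)", 0, CRYPTO_ALG_ASYNC);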

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-22 Thread Andy Lutomirski
On Tue, Jun 21, 2016 at 5:52 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <herb...@gondor.apana.org.au> 
> wrote:
>> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>>
>>> Is there a straightforward way that bluetooth and, potentially, other
>>> drivers can just do synchronous crypto in a small buffer specified by
>>> its virtual address?  The actual cryptography part of the crypto code
>>> already works this way, but I can't find an API for it.
>>
>> Yes, single block users should use crypto_cipher_encrypt_one, an
>> example would be drivers/md/dm-crypt.c.
>>
>
> Aha!  I expected something like that to exist, but I couldn't find it.
> I'll change the two offenders I've found so far to use it.
>

Before I do this, can you explain what the difference is between
crypto_cipher and crypto_skcipher?  net/bluetooth/smp.c currently uses
crypto_alloc_skcipher, which you added in:

commit 71af2f6bb22a4bf42663e10f1d8913d4967ed07f
Author: Herbert Xu <herb...@gondor.apana.org.au>
Date:   Sun Jan 24 21:18:30 2016 +0800

Bluetooth: Use skcipher and hash

Am I just supposed to replace "skcipher" with "cipher" everywhere?

--Andy


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-21 Thread Andy Lutomirski
On Tue, Jun 21, 2016 at 5:42 PM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Tue, Jun 21, 2016 at 10:43:40AM -0700, Andy Lutomirski wrote:
>>
>> Is there a straightforward way that bluetooth and, potentially, other
>> drivers can just do synchronous crypto in a small buffer specified by
>> its virtual address?  The actual cryptography part of the crypto code
>> already works this way, but I can't find an API for it.
>
> Yes, single block users should use crypto_cipher_encrypt_one, an
> example would be drivers/md/dm-crypt.c.
>

Aha!  I expected something like that to exist, but I couldn't find it.
I'll change the two offenders I've found so far to use it.
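
For reference, the conversion looks roughly like this (a sketch modeled
on dm-crypt; assumes AES-128 and trims most error handling):

struct crypto_cipher *tfm;
u8 out[16];
int err;

tfm = crypto_alloc_cipher("aes", 0, 0);
if (IS_ERR(tfm))
	return PTR_ERR(tfm);

err = crypto_cipher_setkey(tfm, key, 16);
if (!err)
	crypto_cipher_encrypt_one(tfm, out, in);	/* plain u8 *, no sg */

crypto_free_cipher(tfm);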

--Andy


Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-21 Thread Andy Lutomirski
net/bluetooth/smp.c (in smp_e) wants to do AES-ECB on a 16-byte stack
buffer, which seems eminently reasonable to me.  It does it like this:

sg_init_one(&sg, data, 16);

skcipher_request_set_tfm(req, tfm);
skcipher_request_set_callback(req, 0, NULL, NULL);
skcipher_request_set_crypt(req, &sg, &sg, 16, NULL);

err = crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
if (err)
BT_ERR("Encrypt data error %d", err);

I tried to figure out what exactly that does, and I got lost in so
many layers of indirection that I mostly gave up.  But it appears to
map the stack address to a physical address, stick it in a
scatterlist, follow several function pointers, go through a bunch of
"scatterwalk" indirection, and call blkcipher_next_fast, which calls
blkcipher_map_src, which calls scatterwalk_map, which calls
kmap_atomic (!).  This is anything but fast.

I think this code has several serious problems:

 - It uses kmap_atomic to access a 16-byte stack buffer.  This is absurd.

 - It blows up if the stack is in vmalloc space, because you can't
virt_to_phys on the stack buffer in the first place.  (This is why I
care.)  And I really, really don't want to write sg_init_stack to
create a scatterlist that points to the stack, although such a thing
could be done if absolutely necessary.

 - It's very, very complicated and it does something very, very
simple (call aes_encrypt on a plain old u8 *).

Oh yeah, it sets CRYPTO_ALG_ASYNC, too.  I can't even figure out what
that does because the actual handling of that flag is so many layers
deep.

Is there a straightforward way that bluetooth and, potentially, other
drivers can just do synchronous crypto in a small buffer specified by
its virtual address?  The actual cryptography part of the crypto code
already works this way, but I can't find an API for it.

--Andy


Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-27 Thread Andy Lutomirski
On Jul 27, 2014 5:06 PM, Theodore Ts'o ty...@mit.edu wrote:

 On Fri, Jul 25, 2014 at 11:30:48AM -0700, Andy Lutomirski wrote:
 
  There is recent interest in having a way to turn generally-available
  kernel features off.  Maybe we should add a good one so we can stop
  bikeshedding and avoid proliferating dumb interfaces.

 I believe the seccomp infrastructure (which is already upstream)
 should be able to do most of what you want, at least with respect to
 features which are exposed via system calls (which was most of your
 list).

Seccomp can't really restrict lookups of non-self pids.  In fact, this
feature idea started out as a response to a patch adding a kind of
nasty seccomp feature to make it sort of possible.

I agree that seccomp can turn off GRND_RANDOM, but how is it
supposed to do it in such a way that the filtered software will fall
back to something sensible?  -ENOSYS?  -EPERM?  Something else?
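
(Mechanically it's easy enough -- a sketch, with the usual architecture
check omitted; the open question is which errno to standardize on:)

#include <stddef.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/random.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

struct sock_filter filter[] = {
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getrandom, 0, 3),
	/* args[] are __u64; this reads the low 32 bits of flags. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[2])),
	BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, GRND_RANDOM, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
struct sock_fprog prog = {
	.len = sizeof(filter) / sizeof(filter[0]),
	.filter = filter,
};
/* installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) */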

I think that -ENOSYS is clearly wrong, but standardizing this would be
nice.  Admittedly, adding something fancy like this for GRND_RANDOM
may not be appropriate.

--Andy


 It won't cover x86 specific things like restricting RDTSC or CPUID
 (and as far as I know you can't intercept the CPUID instruction), but
 I'm not sure it matters.  I don't really see the point, myself.

  - Ted


General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread Andy Lutomirski
[new thread because this sort of combines two threads]

There is recent interest in having a way to turn generally-available
kernel features off.  Maybe we should add a good one so we can stop
bikeshedding and avoid proliferating dumb interfaces.

Things that might want to be turn-off-able include:
 - getrandom with GRND_RANDOM [from the getrandom threads]
 - Any lookup of a non-self pid [from the capsicum thread]
 - Any lookup of a pid outside the caller thread group [capsicum]
 - Various architectural things (personal wishlist), e.g.:
- RDTSC and userspace HPET access
- CPUID?
- 32-bit GDT code segments [huge attack surface]
- 64-bit GDT code segments [probably pointless]

I would propose a new syscall for this:

long restrict_userspace(int mode, int type, int value, int flags);

mode is RESTRICT_SET, RESTRICT_GET, or RESTRICT_LOCK.

type is RESTRICT_GRND_RANDOM, RESTRICT_PID_SCOPE, RESTRICT_X86_TIMING, etc.

Value is zero if RESTRICT_GET.  Otherwise value is the desired value,
generally 0 or 1.  For RESTRICT_PID_SCOPE, value would be
RESTRICT_PID_SCOPE_ANY, RESTRICT_PID_SCOPE_THREADGROUP, or
RESTRICT_PID_SCOPE_SELF.

flags must be zero.  Someday, someone will propose a thread-sync flag.

restrict_userspace requires either no_new_privs or CAP_SYS_ADMIN in
the current user namespace.
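
Usage would look something like this (a sketch -- the syscall and its
constants are hypothetical; only prctl(PR_SET_NO_NEW_PRIVS) is real
today):

#include <err.h>
#include <sys/prctl.h>

if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
	err(1, "no_new_privs");

/* Turn off GRND_RANDOM for this process and its children... */
if (restrict_userspace(RESTRICT_SET, RESTRICT_GRND_RANDOM, 1, 0))
	err(1, "restrict_userspace");

/* ...and keep it off. */
if (restrict_userspace(RESTRICT_LOCK, RESTRICT_GRND_RANDOM, 1, 0))
	err(1, "restrict_userspace");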

Thoughts?

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread Andy Lutomirski
On Fri, Jul 25, 2014 at 1:15 PM, Dave Jones da...@redhat.com wrote:
 On Fri, Jul 25, 2014 at 11:30:48AM -0700, Andy Lutomirski wrote:

   There is recent interest in having a way to turn generally-available
   kernel features off.  Maybe we should add a good one so we can stop
   bikeshedding and avoid proliferating dumb interfaces.
  
   Things that might want to be turn-off-able include:
- getrandom with GRND_RANDOM [from the getrandom threads]
- Any lookup of a non-self pid [from the capsicum thread]
- Any lookup of a pid outside the caller thread group [capsicum]
- Various architectural things (personal wishlist), e.g.:
   - RDTSC and userspace HPET access
   - CPUID?
   - 32-bit GDT code segments [huge attack surface]
   - 64-bit GDT code segments [probably pointless]

 I'm not sure there's value in disabling cpuid dev interface,
 when the instruction is unprivileged.

I meant the CPUID instruction.  Some CPUs have a setting that turns
off the CPUID instruction for user code.  In principle, all VMs can do
this, too, if the hypervisor would be kind enough to help out.

I only mentioned the x86 stuff here to make the point that there are
quite a few possibilities along these lines.  There's actually already
a way to turn off RDTSC, but it's not currently very useful because it
doesn't do the right thing for the vDSO.  That could be fixed, but
there's certainly no reason to make any of the other stuff here wait
for that.


   I would propose a new syscall for this:
  
   long restrict_userspace(int mode, int type, int value, int flags);

 do the restrictions happen system-wide like in say SELinux,
 or only within the calling process, like seccomp ?


The calling process and children, like seccomp.

--Andy


Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread Andy Lutomirski
On Fri, Jul 25, 2014 at 2:35 PM, One Thousand Gnomes
gno...@lxorguk.ukuu.org.uk wrote:
 On Fri, 25 Jul 2014 11:30:48 -0700
 Andy Lutomirski l...@amacapital.net wrote:

 [new thread because this sort of combines two threads]

 There is recent interest in having a way to turn generally-available
 kernel features off.  Maybe we should add a good one so we can stop
 bikeshedding and avoid proliferating dumb interfaces.

 We sort of have one. It's called capable(). Just needs extending to cover
 anything else you care about, and probably all the numeric constants
 replacing with textual names.


Except that it's all backwards: these are things that default to *on*,
and people might want them to turn off.  capable() is totally fscked
if you want otherwise unprivileged users to carry capabilities around,
and fixing it seems to run into endless claims that the capability
system is carefully designed, flawless, perfect, ideal, amazing, and
shouldn't be changed, despite the fact that it's empirically damn near
useless.

Also, capabilities do the wrong thing wrt namespaces.  The things I'm
talking about aren't namespaced.  They're either on or off.

--Andy


Re: General flags to turn things off (getrandom, pid lookup, etc)

2014-07-25 Thread Andy Lutomirski
On Fri, Jul 25, 2014 at 4:43 PM, H. Peter Anvin h...@zytor.com wrote:
 On 07/25/2014 11:30 AM, Andy Lutomirski wrote:
 - 32-bit GDT code segments [huge attack surface]
 - 64-bit GDT code segments [probably pointless]

 I presume you mean s/GDT/LDT/.

 We already don't allow 64-bit LDT code segments.  Also, it is unclear to
 me how 32-bit LDT segments have a huge attack surface, given that there
 will realistically always be a 32-bit *GDT* segment present.

I really did mean GDT :)  Setting the 32-bit code segment to not
present (and using seccomp to block modify_ldt) prevents any attempt
to exploit bugs in the sysenter and cstar code.  It also might prevent
exploiting CPU bugs, although I've never heard of a relevant CPU bug
in this area.

If I actually tried to implement this (which wouldn't be part of the
initial implementation), I'd split out the unusual things in
__switch_to and friends to a slow path that's only used if weird
settings are present (e.g. this, TSC restrictions, etc).  But
twiddling the present bit on a GDT entry is very fast, I assume --
it's just memory, and I don't think that any flush is needed.

Also, if I implement this, I will curse Xen.  I might even go so far
as to disable the feature entirely if there's a paravirt GDT.

Hmm.  A separate flag to turn int $0x80 into GPF could have some value, too.

--Andy


Re: [PATCH -v5] random: introduce getrandom(2) system call

2014-07-24 Thread Andy Lutomirski
On Thu, Jul 24, 2014 at 8:18 AM, Henrique de Moraes Holschuh
h...@hmh.eng.br wrote:
 On Thu, 24 Jul 2014, Theodore Ts'o wrote:
 ERRORS
   EINVAL  An invalid flag was passed to getrandom(2)

   EFAULT  buf is outside the accessible address space.

   EAGAIN  The requested entropy was not available, and
   getentropy(2) would have blocked if the GRND_BLOCK flag
   had been set.

   EINTR   While blocked waiting for entropy, the call was
   interrupted by a signal handler; see the description
   of how interrupted read(2) calls on slow devices
   are handled with and without the SA_RESTART flag
   in the signal(7) man page.

 Should we add ESOMETHING to be able to deny access to GRND_RANDOM or some
 future extension ?

This might actually be needed sooner rather than later.  There are
programs that use containers and intentionally don't pass /dev/random
through into the container.  I know that Sandstorm does this, and I
wouldn't be surprised if other things (Docker?) do the same thing.

--Andy


Re: [PATCH -v5] random: introduce getrandom(2) system call

2014-07-24 Thread Andy Lutomirski
On Thu, Jul 24, 2014 at 1:30 PM, Henrique de Moraes Holschuh
h...@hmh.eng.br wrote:
 On Thu, 24 Jul 2014, Theodore Ts'o wrote:
 On Thu, Jul 24, 2014 at 08:21:38AM -0700, Andy Lutomirski wrote:
   Should we add ESOMETHING to be able to deny access to GRND_RANDOM or 
   some
   future extension ?
 
  This might actually be needed sooner rather than later.  There are
  programs that use containers and intentionally don't pass /dev/random
  through into the container.  I know that Sandstorm does this, and I
  wouldn't be surprised if other things (Docker?) do the same thing.

 I wouldn't add the error to the man page until we actually modify the
 kernel to add such a restriction.

 By then, it might be too late.  It would be really sad to find ourselves
 forced to return ENOSYS to getrandom(GRND_RANDOM) when we actually wanted to
 return EPERM/EACCES.

 Actually, we might not be able to do even that much: all it takes is for
 someone to have the bright idea of deploying userspace code that checks for
 ENOSYS only on a first probe getrandom() syscall without GRND_RANDOM, and
 does something idiotic when it gets ENOSYS later on a
 getrandom(GRND_RANDOM).  meh.  We can't even abuse the system ourselves :-)


Or that someone writes userspace code that gets -EPERM/-EACCES on
getrandom with GRND_RANDOM and falls back to something worse than
getrandom w/o GRND_RANDOM.

--Andy


Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Andy Lutomirski
On 07/17/2014 02:18 AM, Theodore Ts'o wrote:
 The getrandom(2) system call was requested by the LibreSSL Portable
 developers.  It is analogous to the getentropy(2) system call in
 OpenBSD.
 
 The rationale of this system call is to provide resilience against
 file descriptor exhaustion attacks, where the attacker consumes all
 available file descriptors, forcing the use of the fallback code where
 /dev/[u]random is not available.  Since the fallback code is often not
 well-tested, it is better to eliminate this potential failure mode
 entirely.
 
 The other feature provided by this new system call is the ability to
 request randomness from the /dev/urandom entropy pool, but to block
 until at least 128 bits of entropy has been accumulated in the
 /dev/urandom entropy pool.  Historically, the emphasis in the
 /dev/urandom development has been to ensure that urandom pool is
 initialized as quickly as possible after system boot, and preferably
 before the init scripts start execution.  This is because changing
 /dev/urandom reads to block represents an interface change that could
 potentially break userspace which is not acceptable.  In practice, on
 most x86 desktop and server systems, in general the entropy pool can
 be initialized before it is needed (and in modern kernels, we will
 printk a warning message if not).  However, on an embedded system,
 this may not be the case.  And so with a new interface, we can provide
 this requested functionality of blocking until the urandom pool has
 been initialized.  Any userspace program which uses this new
 functionality must make sure that if it is used in early boot, that it
 will not cause the boot up scripts or other portions of the system
 startup to hang indefinitely.
 
 SYNOPSIS
   #include linux/random.h
 
   int getrandom(void *buf, size_t buflen, unsigned int flags);
 
 DESCRIPTION
 
   The system call getrandom() fills the buffer pointed to by buf
   with up to buflen random bytes which can be used to seed user
   space random number generators (i.e., DRBG's) or for other
   cryptographic processes.  It should not be used Monte Carlo
   simulations or for other probabilistic sampling applications.
 
   If the GRND_RANDOM flags bit is set, then draw from the
   /dev/random pool instead of /dev/urandom pool.  The
   /dev/random pool is limited based on the entropy that can be
   obtained from environmental noise, so if there is insufficient
   entropy, the requested number of bytes may not be returned.
   If there is no entropy available at all, getrandom(2) will
   either return an error with errno set to EAGAIN, or block if
   the GRND_BLOCK flags bit is set.
 
   If the GRND_RANDOM flags bit is not set, then the /dev/urandom
   pool will be used.  Unlike reading from /dev/urandom, if
   the urandom pool has not been sufficiently initialized,
   getrandom(2) will either return an error with errno set to
   EAGAIN, or block if the GRND_BLOCK flags bit is set.
 
 RETURN VALUE
On success, the number of bytes copied to buf is returned.
 
On error, -1 is returned, and errno is set appropriately
 
 ERRORS
   EINVAL  The buflen value was invalid.
 
   EFAULT  buf is outside your accessible address space.
 
   EAGAIN  The requested entropy was not available, and
   getentropy(2) would have blocked if the GRND_BLOCK flag
   had been set.
 
 Signed-off-by: Theodore Ts'o ty...@mit.edu
 ---
  arch/x86/syscalls/syscall_32.tbl  |  1 +
  arch/x86/syscalls/syscall_64.tbl  |  1 +
  drivers/char/random.c | 35 +--
  include/linux/syscalls.h  |  3 +++
  include/uapi/asm-generic/unistd.h |  4 +++-
  include/uapi/linux/random.h   |  9 +
  6 files changed, 50 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
 index d6b8679..f484e39 100644
 --- a/arch/x86/syscalls/syscall_32.tbl
 +++ b/arch/x86/syscalls/syscall_32.tbl
 @@ -360,3 +360,4 @@
  351  i386sched_setattr   sys_sched_setattr
  352  i386sched_getattr   sys_sched_getattr
  353  i386renameat2   sys_renameat2
 +354  i386getrandom   sys_getrandom
 diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
 index ec255a1..6705032 100644
 --- a/arch/x86/syscalls/syscall_64.tbl
 +++ b/arch/x86/syscalls/syscall_64.tbl
 @@ -323,6 +323,7 @@
  314  common  sched_setattr   sys_sched_setattr
  315  common  sched_getattr   sys_sched_getattr
  316  common  renameat2   sys_renameat2
 +317  common  getrandom   sys_getrandom
  
  #
  # x32-specific system call numbers start at 512 to avoid cache impact
 diff --git a/drivers/char/random.c b/drivers/char/random.c
 index aa22fe5..76a56f6 100644
 --- a/drivers/char/random.c

Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Andy Lutomirski
On 07/17/2014 11:48 AM, Mark Kettenis wrote:
 On Thu, Jul 17, 2014, Theodore Ts'o wrote:

 The getrandom(2) system call is a superset of getentropy(2).  When we
 add the support for this into glibc, it won't be terribly difficult
 nor annoying to drop the following in alongside the standard support
 needed for any new system call:

 int getentropy(void *buf, size_t buflen)
 {
  int ret;

  ret = getentropy(buf, buflen, 0);
 return (ret > 0) ? 0 : ret;
 }
 
 I'm sure you meant to use getrandom() there ;)
 
 Since for LibreSSL we'd want a getentropy() that cannot fail, the
 getrandom() call should use the GRND_BLOCK flag.  Actually it makes sense
 (to me) to make blocking the default behaviour and have a
 GRND_NONBLOCK flag.  Much in the same way as you need to specify
 O_NONBLOCK if you want non-blocking behaviour for files.
 

Can we please have a mode in which getrandom(2) can neither block nor
fail?  If that gets added, then this can replace things like AT_RANDOM.

There are non-crypto things out there that will want this.  There are
also probably VM systems (especially ones that have something like my
KVM_GET_RNG_SEED patches applied, or many VMs on Haswell, for that
matter) that will have perfectly fine cryptographically secure urandom
output immediately after bootup but that won't consider themselves
initialized for a while.  At least these will be perfectly fine from
the POV of those who trust their hypervisor and Intel :)

--Andy


Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Andy Lutomirski
On Thu, Jul 17, 2014 at 2:28 PM, Theodore Ts'o ty...@mit.edu wrote:
 On Thu, Jul 17, 2014 at 01:35:15PM -0700, Andy Lutomirski wrote:

 Can we please have a mode in which getrandom(2) can neither block nor
 fail?  If that gets added, then this can replace things like AT_RANDOM.

 AT_RANDOM has been around for a long time; it's not something we can
 remove.

I'm not suggesting removing AT_RANDOM.  To the contrary, I'm
suggesting that libraries that need to seed some kind of
non-cryptographic, non-security-related RNG could use getrandom(2)
with appropriate flags rather than screwing around with getpid,
clock_gettime, etc.

For example:

unsigned int seed;
getrandom(&seed, sizeof(seed), GRND_BEST_EFFORT); /* Never fails on new enough kernels */
srand(seed);

No error checking, no weird cases, and if the RNG isn't well seeded,
then I tried my best.

--Andy


Re: [PATCH] crypto_user: Fix out-of-bounds read

2014-04-23 Thread Andy Lutomirski
On Apr 23, 2014 4:40 AM, Dan Carpenter dan.carpen...@oracle.com wrote:

 On Tue, Apr 22, 2014 at 12:30:28PM -0700, Andy Lutomirski wrote:
  This is unlikely to be exploitable for anything except an OOPS.
 
  Cc: sta...@vger.kernel.org
  Signed-off-by: Andy Lutomirski l...@amacapital.net
  ---
 
  Notes:
  This is entirely untested, but it looks obviously correct to me.
 
   crypto/crypto_user.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
  index 1512e41..bc7c4b5 100644
  --- a/crypto/crypto_user.c
  +++ b/crypto/crypto_user.c
  @@ -460,7 +460,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, 
  struct nlmsghdr *nlh)
int type, err;
 
type = nlh->nlmsg_type;
- if (type > CRYPTO_MSG_MAX)
+ if (type < CRYPTO_MSG_BASE || type > CRYPTO_MSG_MAX)
return -EINVAL;

 Adding a check here is obviously harmless but I think this is only
 called from netlink_rcv_skb() which already checks:

 if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
 goto ack;

 NLMSG_MIN_TYPE is 0x10 as well, so I don't think we can hit your
 condition.

 Your patch freaked me out a little because this is one of the bugs that
 I should have caught throught static analysis.

BUILD_BUG_ON(NLMSG_MIN_TYPE != CRYPTO_MSG_BASE) might be a better
thing to add, then.
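
Something like this, next to the existing check (placement inside
crypto_user_rcv_msg() is assumed for illustration):

	/* Compile-time tie between the netlink floor and the start of
	 * the crypto message range; breaks the build if they diverge. */
	BUILD_BUG_ON(NLMSG_MIN_TYPE != CRYPTO_MSG_BASE);

	type = nlh->nlmsg_type;
	if (type > CRYPTO_MSG_MAX)
		return -EINVAL;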

I don't feel so bad for missing this.


> regards,
> dan carpenter



[PATCH] crypto_user: Fix out-of-bounds read

2014-04-22 Thread Andy Lutomirski
This is unlikely to be exploitable for anything except an OOPS.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski l...@amacapital.net
---

Notes:
This is entirely untested, but it looks obviously correct to me.

 crypto/crypto_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 1512e41..bc7c4b5 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -460,7 +460,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
int type, err;
 
 	type = nlh->nlmsg_type;
-	if (type > CRYPTO_MSG_MAX)
+	if (type < CRYPTO_MSG_BASE || type > CRYPTO_MSG_MAX)
return -EINVAL;
 
type -= CRYPTO_MSG_BASE;
-- 
1.9.0
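
For readers following along, the failure mode the new lower bound
prevents, sketched with the dispatch-table name from crypto/crypto_user.c
(surrounding code paraphrased, not quoted):

	int type = nlh->nlmsg_type;	/* attacker-chosen, e.g. 0x0f */
	if (type > CRYPTO_MSG_MAX)	/* old check: upper bound only */
		return -EINVAL;
	type -= CRYPTO_MSG_BASE;	/* 0x0f - 0x10 == -1 */
	link = &crypto_dispatch[type];	/* negative index: reads before the array */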



Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Andy Lutomirski

On 08/04/2011 02:44 AM, Herbert Xu wrote:

> On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote:
>>
>> With this algorithm I was able to increase the throughput of a single
>> IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
>> the SSSE3 variant -- a speedup of +34.8%.
>
> Were you testing this on the transmit side or the receive side?
>
> As the IPsec receive code path usually runs in a softirq context,
> does this code have any effect there at all?
>
> This is pretty similar to the situation with the Intel AES code.
> Over there they solved it by using the asynchronous interface and
> deferring the processing to a work queue.


I have vague plans to clean up extended state handling and make 
kernel_fpu_begin work efficiently from any context.  (i.e. the first 
kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy 
Bridge, but further calls to kernel_fpu_begin would be a single branch.)
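
For context, the pattern whose entry cost is at issue -- a sketch of
typical x86 crypto glue (helper names assumed; the header is
asm/i387.h in kernels of this era):

#include <asm/i387.h>	/* kernel_fpu_begin/_end, irq_fpu_usable */

/* Take the FPU only where the context allows it; otherwise fall back
 * (or defer via cryptd).  kernel_fpu_begin() is the call whose
 * first-use-after-context-switch cost is discussed above. */
static void sha1_blocks_fast(struct sha1_state *st, const u8 *data,
			     unsigned int blocks)
{
	if (irq_fpu_usable()) {
		kernel_fpu_begin();
		sha1_transform_ssse3(st, data, blocks);	  /* assumed helper */
		kernel_fpu_end();
	} else {
		sha1_transform_generic(st, data, blocks); /* assumed fallback */
	}
}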


The current code that handles context switches when user code is using 
extended state is terrible and will almost certainly become faster in 
the near future.


Hopefully I'll have patches for 3.2 or 3.3.

IOW, please don't introduce another thing like the fpu crypto module 
quite yet unless there's a good reason.  I'm looking forward to deleting 
the fpu module entirely.


--Andy


[PATCH] aesni-intel: Merge with fpu.ko

2011-05-07 Thread Andy Lutomirski
Loading fpu without aesni-intel does nothing.  Loading aesni-intel
without fpu causes modes like xts to fail.  (Unloading
aesni-intel will restore those modes.)

One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.

This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390

Signed-off-by: Andy Lutomirski l...@mit.edu
---

I'm not sure this is the best fix, but it seems to fix a longstanding bug.

FWIW, I don't understand why the fpu module exists at all.
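
For reference, what fpu.c provides: a template that brackets the
wrapped blkcipher with one FPU save/restore per request instead of one
per block.  Simplified from arch/x86/crypto/fpu.c (error paths elided):

static int crypto_fpu_encrypt(struct blkcipher_desc *desc_in,
			      struct scatterlist *dst,
			      struct scatterlist *src, unsigned int nbytes)
{
	int err;
	struct crypto_fpu_ctx *ctx = crypto_blkcipher_ctx(desc_in->tfm);
	struct blkcipher_desc desc = {
		.tfm	= ctx->child,
		.info	= desc_in->info,
		.flags	= desc_in->flags & ~CRYPTO_TFM_REQ_MAY_SLEEP,
	};

	kernel_fpu_begin();
	err = crypto_blkcipher_crt(desc.tfm)->encrypt(&desc, dst, src, nbytes);
	kernel_fpu_end();
	return err;
}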

 arch/x86/crypto/Makefile   |4 +---
 arch/x86/crypto/aesni-intel_glue.c |8 
 arch/x86/crypto/fpu.c  |   10 ++
 crypto/Kconfig |6 --
 4 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 1a58ad8..c04f1b7 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -2,8 +2,6 @@
 # Arch-specific CryptoAPI modules.
 #
 
-obj-$(CONFIG_CRYPTO_FPU) += fpu.o
-
 obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o
 obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
 obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
@@ -24,6 +22,6 @@ aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
 twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
 salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 
-aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
+aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index b375b2a..89632d1 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -140,6 +140,9 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
 
+int crypto_fpu_init(void);
+void crypto_fpu_exit(void);
+
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
@@ -1259,6 +1262,8 @@ static int __init aesni_init(void)
return -ENODEV;
}
 
+   if ((err = crypto_fpu_init()))
+   goto fpu_err;
 	if ((err = crypto_register_alg(&aesni_alg)))
 		goto aes_err;
 	if ((err = crypto_register_alg(&__aesni_alg)))
@@ -1336,6 +1341,7 @@ blk_ecb_err:
 __aes_err:
 	crypto_unregister_alg(&aesni_alg);
 aes_err:
+fpu_err:
return err;
 }
 
@@ -1365,6 +1371,8 @@ static void __exit aesni_exit(void)
 	crypto_unregister_alg(&blk_ecb_alg);
 	crypto_unregister_alg(&__aesni_alg);
 	crypto_unregister_alg(&aesni_alg);
+
+   crypto_fpu_exit();
 }
 
 module_init(aesni_init);
diff --git a/arch/x86/crypto/fpu.c b/arch/x86/crypto/fpu.c
index 1a8f864..98d7a18 100644
--- a/arch/x86/crypto/fpu.c
+++ b/arch/x86/crypto/fpu.c
@@ -150,18 +150,12 @@ static struct crypto_template crypto_fpu_tmpl = {
.module = THIS_MODULE,
 };
 
-static int __init crypto_fpu_module_init(void)
+int __init crypto_fpu_init(void)
 {
return crypto_register_template(crypto_fpu_tmpl);
 }
 
-static void __exit crypto_fpu_module_exit(void)
+void __exit crypto_fpu_exit(void)
 {
crypto_unregister_template(crypto_fpu_tmpl);
 }
-
-module_init(crypto_fpu_module_init);
-module_exit(crypto_fpu_module_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("FPU block cipher wrapper");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 4b7cb0e..87b22ca 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -264,11 +264,6 @@ config CRYPTO_XTS
  key size 256, 384 or 512 bits. This implementation currently
  can't handle a sectorsize which is not a multiple of 16 bytes.
 
-config CRYPTO_FPU
-   tristate
-   select CRYPTO_BLKCIPHER
-   select CRYPTO_MANAGER
-
 comment "Hash modes"
 
 config CRYPTO_HMAC
@@ -543,7 +538,6 @@ config CRYPTO_AES_NI_INTEL
select CRYPTO_AES_586 if !64BIT
select CRYPTO_CRYPTD
select CRYPTO_ALGAPI
-   select CRYPTO_FPU
help
  Use Intel AES-NI instructions for AES algorithm.
 
-- 
1.7.4.4


