Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 11:50 PM, Theodore Ts'o  wrote:
> ...
> But instead of arguing over what works and doesn't, let's just create
> the the test set and just try it on a wide range of compilers and
> architectures, hmmm?

What are the requirements? Here's a short list:

  * No undefined behavior
    - important because compiler writers optimize on the assumption that UB never occurs
  * Compiles to native "rotate IMMEDIATE" if the rotate amount is a
"constant expression" and the machine provides it
- translates to a native rotate instruction if available
- "rotate IMM" can be 3 times faster than "rotate REG"
- do any architectures *not* provide a rotate?
  * Compiles to native "rotate REGISTER" if the rotate is variable and
the machine provides it
- do any architectures *not* provide a rotate?
  * Constant time
- important to high-integrity code
- Non-security code paths probably don't care

Maybe the first thing to do is to provide a different rotate for the
constant-time requirement when it's in effect?
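
As one concrete candidate to measure against that list (a sketch only,
to be validated by the test set Ted proposes), the masked idiom that
d7e35dfa already applied to rol32 covers the shift-by-zero case without
a branch, and GCC/ICC fold it to a native rotate per the citations
elsewhere in these threads:

static inline u32 rotl32(u32 v, unsigned int n)
{
	/* Defined for n == 0 and branch-free, hence constant time on
	 * machines whose shifts are constant time.  Shifting by n >= 32
	 * remains undefined, as noted elsewhere in these threads. */
	return (v << n) | (v >> ((-n) & 31));
}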

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread Theodore Ts'o
Instead of arguing over who's "sane" or "insane", can we come up with
an agreed-upon set of tests, and a set of compilers and compiler
versions for which these tests must achieve at least *working* code?
Bonus points if they achieve optimal code, but what's important is
that for a wide range of GCC versions (from ancient RHEL distributions
to bleeding edge gcc 5.x compilers) this *must* work.

From my perspective, clang and ICC producing correct code is a very
nice-to-have, but most shops I know of don't yet assume that clang
produces code that is trustworthy for production systems, although it
*is* great for generating compiler warnings to find
potential bugs.

But instead of arguing over what works and doesn't, let's just create
the test set and just try it on a wide range of compilers and
architectures, hmmm?
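
A seed for such a test set could be as simple as checking every shift
amount against a bit-at-a-time reference (a sketch; the codegen half of
the question still needs an objdump or benchmark pass on each
compiler/architecture pair):

#include <stdint.h>
#include <stdio.h>

/* Candidate under test: the masked rotate idiom. */
static inline uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> ((-n) & 31));
}

/* Reference model: rotate one bit at a time. */
static uint32_t rol32_ref(uint32_t v, unsigned int n)
{
	while (n--)
		v = (v << 1) | (v >> 31);
	return v;
}

int main(void)
{
	uint32_t v = 0x80000001u;
	unsigned int n;

	for (n = 0; n < 32; n++) {
		if (rol32(v, n) != rol32_ref(v, n)) {
			printf("FAIL: n=%u\n", n);
			return 1;
		}
	}
	printf("ok\n");
	return 0;
}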

- Ted


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
>>> So you are actually saying outright that we should sacrifice *actual*
>>portability in favor of *theoretical* portability?  What kind of
>>twilight zone did we just step into?!
>>
>>I'm not sure what you mean. It will be well defined on all platforms.
>>Clang may not recognize the pattern, which means they could run
>>slower. GCC and ICC will be fine.
>>
>>Slower but correct code is what you have to live with until the Clang
>>dev's fix their compiler.
>>
>>Its kind of like what Dr. Jon Bentley said: "If it doesn't have to be
>>correct, I can make it as fast as you'd like it to be".
>
> The current code works on all compilers we care about.  The code you propose 
> does not; it doesn't work on anything but very recent versions of our 
> flagship target compiler, and pretty your own admission might even cause 
> security hazards in the kernel if compiled on clang.

I'm not sure how you're arriving at the conclusion that the code does not work.

> That qualifies as insane in my book.

OK, thanks.

I see the kernel is providing IPSec, SSL/TLS, etc. You can make
SSL/TLS run faster by using aNULL and eNULL.

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread H. Peter Anvin
On May 4, 2016 7:54:12 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin  wrote:
>> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton 
>wrote:
>>>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
 On 05/04/2016 02:42 PM, I wrote:

> I find it very odd that the other seven functions were not
> upgraded. I suggest the attached fix-others.diff would make
> things more consistent.

 Here's a replacement patch.
 ...
>>>
>>>+1, commit it.
>>>
>>>Its good for three additional reasons. First, this change means the
>>>kernel is teaching the next generation the correct way to do things.
>>>Many developers aspire to be kernel hackers, and they sometimes use
>>>the code from bitops.h. I've actually run across the code in
>>>production during an audit where the developers cited bitops.h.
>>>
>>>Second, it preserves a "silent and dark" cockpit during analysis.
>That
>>>is, when analysis is run, no findings are generated. Auditors and
>>>security folks like quiet tools, much like the way pilots like their
>>>cockpits (flashing lights and sounding buzzers usually means
>something
>>>is wrong).
>>>
>>>Third, the pattern is recognized by the major compilers, so the
>kernel
>>>should not have any trouble when porting any of the compilers. I
>often
>>>use multiple compiler to tease out implementation defined behavior in
>>>a effort to achieve greater portability. Here are the citations to
>>>ensure the kernel is safe with the pattern:
>>>
>>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>>
>>>However, Clang may cause trouble because they don't want the
>>>responsibility of recognizing the pattern:
>>>
>>> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>>
>>>Instead, they provide a defective rotate. The "defect" here is its
>>>non-constant time due to the branch, so it may not be suitable for
>>>high-integrity or high-assurance code like linux-crypto:
>>>
>>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>>
>>>Jeff
>>
>> So you are actually saying outright that we should sacrifice *actual*
>portability in favor of *theoretical* portability?  What kind of
>twilight zone did we just step into?!
>
>I'm not sure what you mean. It will be well defined on all platforms.
>Clang may not recognize the pattern, which means they could run
>slower. GCC and ICC will be fine.
>
>Slower but correct code is what you have to live with until the Clang
>dev's fix their compiler.
>
>Its kind of like what Dr. Jon Bentley said: "If it doesn't have to be
>correct, I can make it as fast as you'd like it to be".
>
>Jeff

The current code works on all compilers we care about.  The code you propose 
does not; it doesn't work on anything but very recent versions of our 
flagship target compiler, and by your own admission might even cause 
security hazards in the kernel if compiled on clang.

That qualifies as insane in my book.  The current code is de facto portable with 
the intended outcome; the one you propose is not.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 10:41 PM, H. Peter Anvin  wrote:
> On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton  wrote:
>>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
>>> On 05/04/2016 02:42 PM, I wrote:
>>>
 I find it very odd that the other seven functions were not
 upgraded. I suggest the attached fix-others.diff would make
 things more consistent.
>>>
>>> Here's a replacement patch.
>>> ...
>>
>>+1, commit it.
>>
>>Its good for three additional reasons. First, this change means the
>>kernel is teaching the next generation the correct way to do things.
>>Many developers aspire to be kernel hackers, and they sometimes use
>>the code from bitops.h. I've actually run across the code in
>>production during an audit where the developers cited bitops.h.
>>
>>Second, it preserves a "silent and dark" cockpit during analysis. That
>>is, when analysis is run, no findings are generated. Auditors and
>>security folks like quiet tools, much like the way pilots like their
>>cockpits (flashing lights and sounding buzzers usually means something
>>is wrong).
>>
>>Third, the pattern is recognized by the major compilers, so the kernel
>>should not have any trouble when porting any of the compilers. I often
>>use multiple compiler to tease out implementation defined behavior in
>>a effort to achieve greater portability. Here are the citations to
>>ensure the kernel is safe with the pattern:
>>
>>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>>
>>However, Clang may cause trouble because they don't want the
>>responsibility of recognizing the pattern:
>>
>> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>>
>>Instead, they provide a defective rotate. The "defect" here is its
>>non-constant time due to the branch, so it may not be suitable for
>>high-integrity or high-assurance code like linux-crypto:
>>
>>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>>
>>Jeff
>
> So you are actually saying outright that we should sacrifice *actual* 
> portability in favor of *theoretical* portability?  What kind of twilight 
> zone did we just step into?!

I'm not sure what you mean. It will be well defined on all platforms.
Clang may not recognize the pattern, which means the code could run
slower. GCC and ICC will be fine.

Slower but correct code is what you have to live with until the Clang
devs fix their compiler.

It's kind of like what Dr. Jon Bentley said: "If it doesn't have to be
correct, I can make it as fast as you'd like it to be".

Jeff


Re: better patch for linux/bitops.h

2016-05-04 Thread H. Peter Anvin
On May 4, 2016 6:35:44 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
>> On 05/04/2016 02:42 PM, I wrote:
>>
>>> I find it very odd that the other seven functions were not
>>> upgraded. I suggest the attached fix-others.diff would make
>>> things more consistent.
>>
>> Here's a replacement patch.
>> ...
>
>+1, commit it.
>
>Its good for three additional reasons. First, this change means the
>kernel is teaching the next generation the correct way to do things.
>Many developers aspire to be kernel hackers, and they sometimes use
>the code from bitops.h. I've actually run across the code in
>production during an audit where the developers cited bitops.h.
>
>Second, it preserves a "silent and dark" cockpit during analysis. That
>is, when analysis is run, no findings are generated. Auditors and
>security folks like quiet tools, much like the way pilots like their
>cockpits (flashing lights and sounding buzzers usually means something
>is wrong).
>
>Third, the pattern is recognized by the major compilers, so the kernel
>should not have any trouble when porting any of the compilers. I often
>use multiple compiler to tease out implementation defined behavior in
>a effort to achieve greater portability. Here are the citations to
>ensure the kernel is safe with the pattern:
>
>  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
>  * ICC: http://software.intel.com/en-us/forums/topic/580884
>
>However, Clang may cause trouble because they don't want the
>responsibility of recognizing the pattern:
>
> * https://llvm.org/bugs/show_bug.cgi?id=24226#c8
>
>Instead, they provide a defective rotate. The "defect" here is its
>non-constant time due to the branch, so it may not be suitable for
>high-integrity or high-assurance code like linux-crypto:
>
>  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
>
>Jeff

So you are actually saying outright that we should sacrifice *actual* 
portability in favor of *theoretical* portability?  What kind of twilight zone 
did we just step into?!
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: better patch for linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 5:52 PM, John Denker  wrote:
> On 05/04/2016 02:42 PM, I wrote:
>
>> I find it very odd that the other seven functions were not
>> upgraded. I suggest the attached fix-others.diff would make
>> things more consistent.
>
> Here's a replacement patch.
> ...

+1, commit it.

It's good for three additional reasons. First, this change means the
kernel is teaching the next generation the correct way to do things.
Many developers aspire to be kernel hackers, and they sometimes use
the code from bitops.h. I've actually run across the code in
production during an audit where the developers cited bitops.h.

Second, it preserves a "silent and dark" cockpit during analysis. That
is, when analysis is run, no findings are generated. Auditors and
security folks like quiet tools, much like the way pilots like their
cockpits (flashing lights and sounding buzzers usually means something
is wrong).

Third, the pattern is recognized by the major compilers, so the kernel
should not have any trouble when porting to any of the compilers. I often
use multiple compilers to tease out implementation-defined behavior in
an effort to achieve greater portability. Here are the citations to
ensure the kernel is safe with the pattern:

  * GCC: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157
  * ICC: http://software.intel.com/en-us/forums/topic/580884

However, Clang may cause trouble because they don't want the
responsibility of recognizing the pattern:

 * https://llvm.org/bugs/show_bug.cgi?id=24226#c8

Instead, they provide a defective rotate. The "defect" here is that it is
not constant time, due to the branch, so it may not be suitable for
high-integrity or high-assurance code like linux-crypto:

  * https://llvm.org/bugs/show_bug.cgi?id=24226#c5
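
For illustration (this is not a quote of the Clang code, just the shape
of the contrast being drawn), a guarded rotate versus the branch-free
masked form:

/* Guarded form: avoids the n == 0 UB with a test, but the branch makes
 * the timing depend on the rotate amount. */
static inline u32 rotl32_guarded(u32 v, unsigned int n)
{
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Masked form (as in d7e35dfa and John Denker's replacement patch):
 * no branch, and well defined for n == 0. */
static inline u32 rotl32_masked(u32 v, unsigned int n)
{
	return (v << n) | (v >> ((-n) & 31));
}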

Jeff


Re: linux/bitops.h

2016-05-04 Thread H. Peter Anvin
On May 4, 2016 6:20:32 PM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 7:06 PM, Andi Kleen  wrote:
>> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>> >> Beware that shifting by an amount >= the number of bits in the
>>> >> word remains Undefined Behavior.
>>>
>>> > This construct has been supported as a rotate since at least gcc2.
>>>
>>> How then should we understand the story told in commit d7e35dfa?
>>> Is the story wrong?
>>
>> I don't think Linux runs on a system where it would make a difference
>> (like a VAX), and also gcc always converts it before it could.
>> Even UBSan should not complain because it runs after the conversion
>> to ROTATE.
>>
>From what I understand, its a limitation in the barrel shifter and the
>way the shift bits are handled.
>
>Linux runs on a great number of devices, so its conceivable (likely?)
>a low-cost board would have hardware limitations that not found in
>modern desktops and servers or VAX.
>
>Jeff

This is a compiler feature, though.  And if icc or clang don't support the 
idiom they would never have compiled a working kernel.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: linux/bitops.h

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 7:06 PM, Andi Kleen  wrote:
> On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
>> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>> >> Beware that shifting by an amount >= the number of bits in the
>> >> word remains Undefined Behavior.
>>
>> > This construct has been supported as a rotate since at least gcc2.
>>
>> How then should we understand the story told in commit d7e35dfa?
>> Is the story wrong?
>
> I don't think Linux runs on a system where it would make a difference
> (like a VAX), and also gcc always converts it before it could.
> Even UBSan should not complain because it runs after the conversion
> to ROTATE.
>
From what I understand, it's a limitation in the barrel shifter and the
way the shift bits are handled.

Linux runs on a great number of devices, so it's conceivable (likely?)
that a low-cost board would have hardware limitations not found in
modern desktops and servers, or a VAX.

Jeff


Re: [PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Theodore Ts'o
On Wed, May 04, 2016 at 10:28:24PM +0200, Stephan Mueller wrote:
> > +out:
> > +   spin_unlock_irqrestore(&primary_crng.lock, flags);
> > +   return ret;
> 
> Where did you add the memzero_explict of tmp?

Oops, sorry, somehow that change got lost in the patch updates.  Fixed now.
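
For readers following along, the fix presumably amounts to wiping the
on-stack buffer on the way out; a guess at its shape (not the actual
updated diff):

out:
	spin_unlock_irqrestore(&primary_crng.lock, flags);
	memzero_explicit(tmp, sizeof(tmp));	/* wipe the stack copy */
	return ret;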

 - Ted


Re: linux/bitops.h

2016-05-04 Thread Linus Torvalds
On Wed, May 4, 2016 at 5:30 PM, H. Peter Anvin  wrote:
>
> Yes. d7e35dfa is baloney IMNSHO.  All it does is produce worse code, and the
> description even says so.
>
> As I said, gcc has treated the former code as idiomatic since gcc 2, so that
> support is beyond ancient.

Well, we're *trying* to get clang supported, so the argument that gcc
has always supported it and compiles correct code for it is not
necessarily the whole story yet.

The problem with "32 - shift" is that it really could generate garbage
in situations where shift ends up being a constant zero.

That said, the fact that the other cases weren't changed
(rol64/ror64/ror32) does make that argument less interesting. Unless
there was some particular code that actively ended up using
"rol32(..0)" but not the other cases.

(And yes, rol32(x,0) is obviously nonsense, but it could easily happen
with inline functions or macros that end up generating that).
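
A hypothetical example of how that happens in practice: no caller
writes rol32(x, 0) literally, but a helper with a computed amount can
still instantiate it.

/* Hypothetical helper: the rotate amount is computed, and byte == 3
 * yields rol32(x, 0) even though nobody typed a zero shift. */
static inline u32 get_be_byte_rotated(u32 x, unsigned int byte)
{
	return rol32(x, 24 - 8 * byte);
}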

Note that there may be 8 "rol/ror" functions, but the 16-bit and 8-bit
ones don't have the undefined semantics. So there are only four that
had _that_ problem, although I agree that changing just one out of
four is wrong.

Linus


Re: linux/bitops.h

2016-05-04 Thread H. Peter Anvin

On 05/04/16 15:06, John Denker wrote:
> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>>> Beware that shifting by an amount >= the number of bits in the
>>> word remains Undefined Behavior.
>
>> This construct has been supported as a rotate since at least gcc2.
>
> How then should we understand the story told in commit d7e35dfa?
> Is the story wrong?
>
> At the very least, something inconsistent is going on.  There
> are 8 functions.  Why did d7e35dfa change one of them but
> not the other 7?


Yes. d7e35dfa is baloney IMNSHO.  All it does is produce worse code, and 
the description even says so.


As I said, gcc has treated the former code as idiomatic since gcc 2, so 
that support is beyond ancient.


-hpa






Re: linux/bitops.h

2016-05-04 Thread John Denker
On 05/04/2016 04:06 PM, Andi Kleen wrote:

> gcc always converts it before it could
[make a difference].

At the moment, current versions of gcc treat the idiomatic
ror/rol code as something they support ... but older versions
do not, and future versions may not.

The gcc guys have made it very clear that they reserve the
right to do absolutely anything they want in a UB situation.
 -- What is true as of today might not be "always" true.
 -- What is true at one level of optimization might not be
  true at another.
 -- The consequences can be highly nonlocal and counterintuitive.
  For example, in the case of:
 rslt = word << (32 - N);
 ...
 ...
 if (!N) { ... }
  the compiler could assume that N is necessarily nonzero,
  and many lines later it could optimize out the whole
  if-block.  So, even if the "<<" operator gives the right
  result, there could be ghastly failures elsewhere.  It
  might work for some people but not others.
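
To restate that in compilable form (hypothetical names; the dead-code
elimination is something the standard permits, not something every gcc
will do):

#include <stdint.h>

extern void handle_zero_shift(void);	/* hypothetical */

uint32_t rol32_demo(uint32_t word, unsigned int n)
{
	uint32_t rslt = (word << n) | (word >> (32 - n));  /* UB if n == 0 */

	if (!n)			/* a compiler may assume n != 0 above
				   and silently drop this block */
		handle_zero_shift();
	return rslt;
}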

> So it's unlikely to be a pressing issue.

Sometimes issues that are not urgently "pressing" ought
to be dealt with in a systematic way.

There are serious people who think that avoiding UB is
a necessity, if you want the code to be reliable and
maintainable.

I renew the question:  Why did commit d7e35dfa upgrade
one of the 8 functions but not the other 7?
  -- I could understand 0 of 8, or 8 of 8.
  -- In contrast, I'm having a hard time understanding
   why 7 of the 8 use the idiomatic expression while the
   8th does not.



Re: linux/bitops.h

2016-05-04 Thread Andi Kleen
On Wed, May 04, 2016 at 03:06:04PM -0700, John Denker wrote:
> On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
> >> Beware that shifting by an amount >= the number of bits in the
> >> word remains Undefined Behavior.
> 
> > This construct has been supported as a rotate since at least gcc2.
> 
> How then should we understand the story told in commit d7e35dfa?
> Is the story wrong?

I don't think Linux runs on a system where it would make a difference
(like a VAX), and also gcc always converts it before it could.
Even UBSan should not complain because it runs after the conversion
to ROTATE.

So it's unlikely to be a pressing issue.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH 1/7] asm-generic/io.h: add io{read,write}64 accessors

2016-05-04 Thread Arnd Bergmann
On Wednesday 04 May 2016 20:16:19 Horia Geantă wrote:
> @@ -625,6 +645,16 @@ static inline u32 ioread32be(const volatile void __iomem 
> *addr)
>  }
>  #endif
>  
> +#ifdef CONFIG_64BIT
> +#ifndef ioread64be
> +#define ioread64be ioread64be
> +static inline u64 ioread64be(const volatile void __iomem *addr)
> +{
> +   return __be64_to_cpu(__raw_readq(addr));
> +}
> +#endif
> +#endif /* CONFIG_64BIT */
> +
>  #ifndef iowrite16be
>  #define iowrite16be iowrite16be
>  static inline void iowrite16be(u16 value, void volatile __iomem *addr)
> @@ -641,6 +671,16 @@ static inline void iowrite32be(u32 value, volatile void 
> __iomem *addr)
>  }
>  #endif
>  
> +#ifdef CONFIG_64BIT
> +#ifndef iowrite64be
> +#define iowrite64be iowrite64be
> +static inline void iowrite64be(u64 value, volatile void __iomem *addr)
> +{
> +   __raw_writeq(__cpu_to_be64(value), addr);
> +}
> +#endif
> +#endif /* CONFIG_64BIT */
> +
> 

I just noticed that these two are both a bit wrong, but they copy the
mistake that already exists in the 16 and 32 bit versions: If an
architecture overrides readq/writeq to have barriers but does not override
ioread64be/iowrite64be, this will lack the barriers and behave differently
from the little-endian version. I think the only affected architecture
is ARC, since ARM and ARM64 both override the big-endian accessors to
have the correct barriers, and all others don't use barriers at all.

Maybe you can add a patch before this one to replace the 16/32-bit accessors
with ones that do a

static inline void iowrite32be(u32 value, volatile void __iomem *addr)
{
writel(swab32(value), addr);
}

This will lead to a double-swap on architectures that don't override it,
but it will work correctly on all architectures without them having
to override the big-endian accessors.
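
Applied to the new 64-bit accessors in this patch, that suggestion
would presumably look something like this (sketch only; the
CONFIG_64BIT guard and #ifndef wrappers are omitted):

static inline u64 ioread64be(const volatile void __iomem *addr)
{
	/* readq supplies whatever barriers the architecture defines;
	 * the swab keeps big-endian semantics. */
	return swab64(readq(addr));
}

static inline void iowrite64be(u64 value, volatile void __iomem *addr)
{
	writeq(swab64(value), addr);
}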

Aside from that, the patch looks fine.

Arnd


Re: linux/bitops.h

2016-05-04 Thread John Denker
On 05/04/2016 02:56 PM, H. Peter Anvin wrote:
>> Beware that shifting by an amount >= the number of bits in the
>> word remains Undefined Behavior.

> This construct has been supported as a rotate since at least gcc2.

How then should we understand the story told in commit d7e35dfa?
Is the story wrong?

At the very least, something inconsistent is going on.  There
are 8 functions.  Why did d7e35dfa change one of them but
not the other 7?



Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin
On May 4, 2016 2:42:53 PM PDT, John Denker  wrote:
>On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
>
>> it doesn't hit the
>> UB case which Jeffrey was concerned about. 
>
>That should be good enough for present purposes
>
>However, in the interests of long-term maintainability, I
>would suggest sticking in a comment or assertion saying 
>that ror32(,shift) is never called with shift=0.  This
>can be removed if/when bitops.h is upgraded.
>
>There is a track record of compilers doing Bad Things in
>response to UB code, including some very counterintuitive
>Bad Things.
>
>On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
>>>
>>> If bitops.h doesn't do the right thing, we need to
>>> fix bitops.h.
>
>Most of the ror and rol functions in linux/bitops.h
>should be considered unsafe, as currently implemented.
>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/bitops.h?id=04974df8049fc4240d22759a91e035082ccd18b4#n103
>
>I don't see anything in the file that suggests any limits
>on the range of the second argument.  So at best it is an
>undocumented trap for the unwary.  This has demonstrably
>been a problem in the past.  The explanation in the attached
>fix-rol32.diff makes amusing reading.
>
>Of the eight functions
>  ror64, rol64, ror32, rol32, ror16, rol16, ror8, and rol8,
>only one of them can handle shifting by zero, namely rol32.
>It was upgraded on Thu Dec 3 22:04:01 2015; see the attached
>fix-rol32.diff.
>
>I find it very odd that the other seven functions were not
>upgraded.  I suggest the attached fix-others.diff would make
>things more consistent.
>
>Beware that shifting by an amount >= the number of bits in the
>word remains Undefined Behavior.  This should be either documented
>or fixed.  It could be fixed easily enough.

This construct has been supported as a rotate since at least gcc2.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: better patch for linux/bitops.h

2016-05-04 Thread John Denker
On 05/04/2016 02:42 PM, I wrote:

> I find it very odd that the other seven functions were not
> upgraded. I suggest the attached fix-others.diff would make
> things more consistent.

Here's a replacement patch.
Same idea, less brain damage.
Sorry for the confusion.

commit ba83b16d8430ee6104aa1feeed4ff7a82b02747a
Author: John Denker 
Date:   Wed May 4 13:55:51 2016 -0700

Make ror64, rol64, ror32, ror16, rol16, ror8, and rol8
consistent with rol32 in their handling of shifting by a zero amount.

Same overall rationale as in d7e35dfa, just more consistently applied.

Beware that shifting by an amount >= the number of bits in the
word remains Undefined Behavior.  This should be either documented
or fixed.  It could be fixed easily enough.

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index defeaac..90f389b 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -87,7 +87,7 @@ static __always_inline unsigned long hweight_long(unsigned long w)
  */
 static inline __u64 rol64(__u64 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (64 - shift));
+	return (word << shift) | (word >> ((-shift) & 63));
 }
 
 /**
@@ -97,7 +97,7 @@ static inline __u64 rol64(__u64 word, unsigned int shift)
  */
 static inline __u64 ror64(__u64 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (64 - shift));
+	return (word >> shift) | (word << ((-shift) & 63));
 }
 
 /**
@@ -117,7 +117,7 @@ static inline __u32 rol32(__u32 word, unsigned int shift)
  */
 static inline __u32 ror32(__u32 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (32 - shift));
+	return (word >> shift) | (word << ((-shift) & 31));
 }
 
 /**
@@ -127,7 +127,7 @@ static inline __u32 ror32(__u32 word, unsigned int shift)
  */
 static inline __u16 rol16(__u16 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (16 - shift));
+	return (word << shift) | (word >> ((-shift) & 15));
 }
 
 /**
@@ -137,7 +137,7 @@ static inline __u16 rol16(__u16 word, unsigned int shift)
  */
 static inline __u16 ror16(__u16 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (16 - shift));
+	return (word >> shift) | (word << ((-shift) & 15));
 }
 
 /**
@@ -147,7 +147,7 @@ static inline __u16 ror16(__u16 word, unsigned int shift)
  */
 static inline __u8 rol8(__u8 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (8 - shift));
+	return (word << shift) | (word >> ((-shift) & 7));
 }
 
 /**
@@ -157,7 +157,7 @@ static inline __u8 rol8(__u8 word, unsigned int shift)
  */
 static inline __u8 ror8(__u8 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (8 - shift));
+	return (word >> shift) | (word << ((-shift) & 7));
 }
 
 /**


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread John Denker
On 05/04/2016 12:07 PM, ty...@thunk.org wrote:

> it doesn't hit the
> UB case which Jeffrey was concerned about. 

That should be good enough for present purposes.

However, in the interests of long-term maintainability, I
would suggest sticking in a comment or assertion saying 
that ror32(,shift) is never called with shift=0.  This
can be removed if/when bitops.h is upgraded.

There is a track record of compilers doing Bad Things in
response to UB code, including some very counterintuitive
Bad Things.

On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
>>
>> If bitops.h doesn't do the right thing, we need to
>> fix bitops.h.

Most of the ror and rol functions in linux/bitops.h
should be considered unsafe, as currently implemented.
   
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/bitops.h?id=04974df8049fc4240d22759a91e035082ccd18b4#n103

I don't see anything in the file that suggests any limits
on the range of the second argument.  So at best it is an
undocumented trap for the unwary.  This has demonstrably
been a problem in the past.  The explanation in the attached
fix-rol32.diff makes amusing reading.

Of the eight functions
  ror64, rol64, ror32, rol32, ror16, rol16, ror8, and rol8,
only one of them can handle shifting by zero, namely rol32.
It was upgraded on Thu Dec 3 22:04:01 2015; see the attached
fix-rol32.diff.

I find it very odd that the other seven functions were not
upgraded.  I suggest the attached fix-others.diff would make
things more consistent.

Beware that shifting by an amount >= the number of bits in the
word remains Undefined Behavior.  This should be either documented
or fixed.  It could be fixed easily enough.
commit d7e35dfa2531b53618b9e6edcd8752ce988ac555
Author: Sasha Levin 
Date:   Thu Dec 3 22:04:01 2015 -0500

bitops.h: correctly handle rol32 with 0 byte shift

ROL on a 32 bit integer with a shift of 32 or more is undefined and the
result is arch-dependent. Avoid this by handling the trivial case of
roling by 0 correctly.

The trivial solution of checking if shift is 0 breaks gcc's detection
of this code as a ROL instruction, which is unacceptable.

This bug was reported and fixed in GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

	The standard rotate idiom,

	  (x << n) | (x >> (32 - n))

	is recognized by gcc (for concreteness, I discuss only the case that x
	is an uint32_t here).

	However, this is portable C only for n in the range 0 < n < 32. For n
	== 0, we get x >> 32 which gives undefined behaviour according to the
	C standard (6.5.7, Bitwise shift operators). To portably support n ==
	0, one has to write the rotate as something like

	  (x << n) | (x >> ((-n) & 31))

	And this is apparently not recognized by gcc.

Note that this is broken on older GCCs and will result in slower ROL.

Acked-by: Linus Torvalds 
Signed-off-by: Sasha Levin 
Signed-off-by: Linus Torvalds 

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2b8ed12..defeaac 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -107,7 +107,7 @@ static inline __u64 ror64(__u64 word, unsigned int shift)
  */
 static inline __u32 rol32(__u32 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (32 - shift));
+	return (word << shift) | (word >> ((-shift) & 31));
 }
 
 /**
commit 03b97eeb5401ede1e1d7b6fbf6a9575db8d0efa6
Author: John Denker 
Date:   Wed May 4 13:55:51 2016 -0700

Make ror64, rol64, ror32, ror16, rol16, ror8, and rol8
consistent with rol32 in their handling of shifting by a zero amount.

Same overall rationale as in d7e35dfa, just more consistently applied.

Beware that shifting by an amount >= the number of bits in the
word remains Undefined Behavior.  This should be either documented
or fixed.  It could be fixed easily enough.

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index defeaac..97096b4 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -87,7 +87,7 @@ static __always_inline unsigned long hweight_long(unsigned long w)
  */
 static inline __u64 rol64(__u64 word, unsigned int shift)
 {
-	return (word << shift) | (word >> (64 - shift));
+	return (word << shift) | (word >> ((-shift) & 64));
 }
 
 /**
@@ -97,7 +97,7 @@ static inline __u64 rol64(__u64 word, unsigned int shift)
  */
 static inline __u64 ror64(__u64 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (64 - shift));
+	return (word >> shift) | (word << ((-shift) & 64));
 }
 
 /**
@@ -117,7 +117,7 @@ static inline __u32 rol32(__u32 word, unsigned int shift)
  */
 static inline __u32 ror32(__u32 word, unsigned int shift)
 {
-	return (word >> shift) | (word << (32 - shift));
+	return (word >> shift) 

Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin
On 05/04/2016 12:07 PM, ty...@thunk.org wrote:
> 
> If we are all agreed that what is in bitops.h is considered valid,
> then we can start converting people over to using the version defined
> in bitops.h, and if there is some compiler issue we need to work
> around, at least we only need to put the workaround in a single header
> file.
> 

Yes, please.

-hpa




Re: [crypto / sparc64] cryptomgr_test OOPS

2016-05-04 Thread David Miller
From: Anatoly Pugachev 
Date: Wed, 4 May 2016 22:10:47 +0300

> Here's a quick diff related to crypto for debian kernel configs:
> 
> $ diff -u /boot/config-4.5.0-2-sparc64-smp /boot/config-4.6.0-rc5-sparc64-smp
> @@ -5299,10 +5380,9 @@
>  CONFIG_CRYPTO_RNG=m
>  CONFIG_CRYPTO_RNG2=y
>  CONFIG_CRYPTO_RNG_DEFAULT=m
> -CONFIG_CRYPTO_PCOMP=m
> -CONFIG_CRYPTO_PCOMP2=y
>  CONFIG_CRYPTO_AKCIPHER2=y
> -# CONFIG_CRYPTO_RSA is not set
> +CONFIG_CRYPTO_AKCIPHER=y
> +CONFIG_CRYPTO_RSA=y
>  CONFIG_CRYPTO_MANAGER=y
>  CONFIG_CRYPTO_MANAGER2=y
>  # CONFIG_CRYPTO_USER is not set
> 
> so, I need to compile 4.5.x kernel (or even older kernels) with
> CRYPTO_RSA=y and test how it would act.

Well yes, that would be an absolutely critical datapoint.



Re: [PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Stephan Mueller
Am Mittwoch, 4. Mai 2016, 15:25:48 schrieb Theodore Ts'o:

Hi Theodore,

> The CRNG is faster, and we don't pretend to track entropy usage in the
> CRNG any more.
> 
> Signed-off-by: Theodore Ts'o 
> ---
>  crypto/chacha20_generic.c |  61 --
>  drivers/char/random.c | 283 +++---
>  include/crypto/chacha20.h |   1 +
>  lib/Makefile  |   2 +-
>  lib/chacha20.c|  79 +
>  5 files changed, 295 insertions(+), 131 deletions(-)
>  create mode 100644 lib/chacha20.c
> 
> diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
> index da9c899..1cab831 100644
> --- a/crypto/chacha20_generic.c
> +++ b/crypto/chacha20_generic.c
> @@ -15,72 +15,11 @@
>  #include 
>  #include 
> 
> -static inline u32 rotl32(u32 v, u8 n)
> -{
> - return (v << n) | (v >> (sizeof(v) * 8 - n));
> -}
> -
>  static inline u32 le32_to_cpuvp(const void *p)
>  {
>   return le32_to_cpup(p);
>  }
> 
> -static void chacha20_block(u32 *state, void *stream)
> -{
> - u32 x[16], *out = stream;
> - int i;
> -
> - for (i = 0; i < ARRAY_SIZE(x); i++)
> - x[i] = state[i];
> -
> - for (i = 0; i < 20; i += 2) {
> - x[0]  += x[4];x[12] = rotl32(x[12] ^ x[0],  16);
> - x[1]  += x[5];x[13] = rotl32(x[13] ^ x[1],  16);
> - x[2]  += x[6];x[14] = rotl32(x[14] ^ x[2],  16);
> - x[3]  += x[7];x[15] = rotl32(x[15] ^ x[3],  16);
> -
> - x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],  12);
> - x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],  12);
> - x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10], 12);
> - x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11], 12);
> -
> - x[0]  += x[4];x[12] = rotl32(x[12] ^ x[0],   8);
> - x[1]  += x[5];x[13] = rotl32(x[13] ^ x[1],   8);
> - x[2]  += x[6];x[14] = rotl32(x[14] ^ x[2],   8);
> - x[3]  += x[7];x[15] = rotl32(x[15] ^ x[3],   8);
> -
> - x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],   7);
> - x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],   7);
> - x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10],  7);
> - x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11],  7);
> -
> - x[0]  += x[5];x[15] = rotl32(x[15] ^ x[0],  16);
> - x[1]  += x[6];x[12] = rotl32(x[12] ^ x[1],  16);
> - x[2]  += x[7];x[13] = rotl32(x[13] ^ x[2],  16);
> - x[3]  += x[4];x[14] = rotl32(x[14] ^ x[3],  16);
> -
> - x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10], 12);
> - x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11], 12);
> - x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],  12);
> - x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],  12);
> -
> - x[0]  += x[5];x[15] = rotl32(x[15] ^ x[0],   8);
> - x[1]  += x[6];x[12] = rotl32(x[12] ^ x[1],   8);
> - x[2]  += x[7];x[13] = rotl32(x[13] ^ x[2],   8);
> - x[3]  += x[4];x[14] = rotl32(x[14] ^ x[3],   8);
> -
> - x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10],  7);
> - x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11],  7);
> - x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],   7);
> - x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],   7);
> - }
> -
> - for (i = 0; i < ARRAY_SIZE(x); i++)
> - out[i] = cpu_to_le32(x[i] + state[i]);
> -
> - state[12]++;
> -}
> -
>  static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
>unsigned int bytes)
>  {
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index b583e53..91d5c2a 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -260,6 +260,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -412,6 +413,18 @@ static struct fasync_struct *fasync;
>  static DEFINE_SPINLOCK(random_ready_list_lock);
>  static LIST_HEAD(random_ready_list);
> 
> +/*
> + * crng_init =  0 --> Uninitialized
> + *   2 --> Initialized
> + *   3 --> Initialized from input_pool
> + *
> + * crng_init is protected by primary_crng->lock, and only increases
> + * its value (from 0->1->2->3).
> + */
> +static int crng_init = 0;
> +#define crng_ready() (likely(crng_init >= 2))
> +static void process_random_ready_list(void);
> +
>  /**
>   *
>   * OS independent entropy store.   Here are the functions which handle
> @@ -441,10 +454,13 @@ struct entropy_store {
>   __u8 last_data[EXTRACT_SIZE];
>  };
> 
> +static ssize_t extract_entropy(struct entropy_store *r, void *buf,
> +size_t nbytes, int min, int rsvd);
> +
> +static int crng_reseed(struct entropy_store *r);
>  static void push_to_pool(struct 

Re: [crypto / sparc64] cryptomgr_test OOPS

2016-05-04 Thread Tadeusz Struk
Hi Anatoly,
On 05/04/2016 12:10 PM, Anatoly Pugachev wrote:
> we're using 4.5.2 debian kernel here without this problem. I'm not
> sure I would be able to bisect, never did it before, but I could
> try...

On 4.5.2 could you try "modprobe rsa"

-- 
TS


[PATCH 3/4] random: add interrupt callback to VMBus IRQ handler

2016-05-04 Thread Theodore Ts'o
From: Stephan Mueller 

The Hyper-V Linux Integration Services use the VMBus implementation for
communication with the Hypervisor. VMBus registers its own interrupt
handler that completely bypasses the common Linux interrupt handling.
This implies that the interrupt entropy collector is not triggered.

This patch adds the interrupt entropy collection callback into the VMBus
interrupt handler function.

Signed-off-by: Stephan Mueller 
Signed-off-by: Stephan Mueller 
Signed-off-by: Theodore Ts'o 
---
 drivers/char/random.c  | 1 +
 drivers/hv/vmbus_drv.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b970db6..897c75e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1134,6 +1134,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
/* award one bit for the contents of the fast pool */
credit_entropy_bits(r, credit + 1);
 }
+EXPORT_SYMBOL_GPL(add_interrupt_randomness);
 
 #ifdef CONFIG_BLOCK
 void add_disk_randomness(struct gendisk *disk)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 64713ff..9af61bb 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "hyperv_vmbus.h"
 
 static struct acpi_device  *hv_acpi_dev;
@@ -801,6 +802,8 @@ static void vmbus_isr(void)
else
tasklet_schedule(hv_context.msg_dpc[cpu]);
}
+
+   add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0);
 }
 
 
-- 
2.5.0



[PATCH 2/4] random: make /dev/urandom scalable for silly userspace programs

2016-05-04 Thread Theodore Ts'o
On a 4-socket (NUMA) system where a large number of application
threads were all trying to read from /dev/urandom, this can result in
the system spending 80% of its time contending on the global urandom
spinlock.  The application should have used its own PRNG, but let's
try to keep it from running, lemming-like, straight over the locking
cliff.

Reported-by: Andi Kleen 
Signed-off-by: Theodore Ts'o 
---
 drivers/char/random.c | 67 +++
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 91d5c2a..b970db6 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -749,6 +749,17 @@ struct crng_state primary_crng = {
 };
 static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);
 
+#ifdef CONFIG_NUMA
+/*
+ * Hack to deal with crazy userspace programs when they are all trying
+ * to access /dev/urandom in parallel.  The programs are almost
+ * certainly doing something terribly wrong, but we'll work around
+ * their brain damage.
+ */
+static struct crng_state **crng_node_pool __read_mostly;
+#endif
+
+
 static void _initialize_crng(struct crng_state *crng)
 {
int i;
@@ -764,11 +775,13 @@ static void _initialize_crng(struct crng_state *crng)
crng->init_time = jiffies - CRNG_RESEED_INTERVAL;
 }
 
+#ifdef CONFIG_NUMA
 static void initialize_crng(struct crng_state *crng)
 {
_initialize_crng(crng);
spin_lock_init(&crng->lock);
 }
+#endif
 
 static int crng_fast_load(__u32 pool[4])
 {
@@ -823,19 +836,23 @@ out:
return ret;
 }
 
+static inline void maybe_reseed_primary_crng(void)
+{
+   if (crng_init > 2 &&
+   time_after(jiffies, primary_crng.init_time + CRNG_RESEED_INTERVAL))
+   crng_reseed(&input_pool);
+}
+
 static inline void crng_wait_ready(void)
 {
wait_event_interruptible(crng_init_wait, crng_ready());
 }
 
-static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
+static void _extract_crng(struct crng_state *crng,
+ __u8 out[CHACHA20_BLOCK_SIZE])
 {
unsigned long v, flags;
-   struct crng_state *crng = &primary_crng;
 
-   if (crng_init > 2 &&
-   time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
-   crng_reseed(&input_pool);
spin_lock_irqsave(&crng->lock, flags);
if (arch_get_random_long(&v))
crng->state[14] ^= v;
@@ -845,6 +862,30 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
spin_unlock_irqrestore(&crng->lock, flags);
 }
 
+static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
+{
+#ifndef CONFIG_NUMA
+   maybe_reseed_primary_crng();
+   _extract_crng(&primary_crng, out);
+#else
+   int node_id = numa_node_id();
+   struct crng_state *crng = crng_node_pool[node_id];
+
+   if (time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)) {
+   unsigned long flags;
+
+   maybe_reseed_primary_crng();
+   _extract_crng(&primary_crng, out);
+   spin_lock_irqsave(&crng->lock, flags);
+   memcpy(&crng->state[4], out, CHACHA20_KEY_SIZE);
+   crng->state[15] = node_id;
+   crng->init_time = jiffies;
+   spin_unlock_irqrestore(&crng->lock, flags);
+   }
+   _extract_crng(crng, out);
+#endif
+}
+
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
 {
ssize_t ret = 0, i;
@@ -1549,6 +1590,22 @@ static void init_std_data(struct entropy_store *r)
  */
 static int rand_initialize(void)
 {
+#ifdef CONFIG_NUMA
+   int i;
+   int num_nodes = num_possible_nodes();
+   struct crng_state *crng;
+
+   crng_node_pool = kmalloc(num_nodes * sizeof(void *),
+GFP_KERNEL|__GFP_NOFAIL);
+
+   for (i=0; i < num_nodes; i++) {
+   crng = kmalloc(sizeof(struct crng_state),
+  GFP_KERNEL | __GFP_NOFAIL);
+   initialize_crng(crng);
+   crng_node_pool[i] = crng;
+
+   }
+#endif
init_std_data(&input_pool);
init_std_data(&blocking_pool);
_initialize_crng(&primary_crng);
-- 
2.5.0



[PATCH 4/4] random: add backtracking protection to the CRNG

2016-05-04 Thread Theodore Ts'o
Signed-off-by: Theodore Ts'o 
---
 drivers/char/random.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 897c75e..028d085 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -886,6 +886,34 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
 #endif
 }
 
+/*
+ * Use the leftover bytes from the CRNG block output (if there is
+ * enough) to mutate the CRNG key to provide backtracking protection.
+ */
+static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int unused)
+{
+#ifdef CONFIG_NUMA
+   struct crng_state *crng = crng_node_pool[numa_node_id()];
+#else
+   struct crng_state *crng = &primary_crng;
+#endif
+   unsigned long   flags;
+   __u32   *s, *d;
+   int i;
+
+   unused = round_up(unused, sizeof(__u32));
+   if (unused + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
+   extract_crng(tmp);
+   unused = 0;
+   }
+   spin_lock_irqsave(&crng->lock, flags);
+   s = (__u32 *) &tmp[unused];
+   d = &crng->state[4];
+   for (i=0; i < 8; i++)
+   *d++ ^= *s++;
+   spin_unlock_irqrestore(&crng->lock, flags);
+}
+
 static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
 {
ssize_t ret = 0, i;
@@ -913,6 +941,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t 
nbytes)
buf += i;
ret += i;
}
+   crng_backtrack_protect(tmp, i);
 
/* Wipe data just written to memory */
memzero_explicit(tmp, sizeof(tmp));
@@ -1457,8 +1486,10 @@ void get_random_bytes(void *buf, int nbytes)
if (nbytes > 0) {
extract_crng(tmp);
memcpy(buf, tmp, nbytes);
-   memzero_explicit(tmp, nbytes);
-   }
+   crng_backtrack_protect(tmp, nbytes);
+   } else
+   crng_backtrack_protect(tmp, 0);
+   memzero_explicit(tmp, sizeof(tmp));
 }
 EXPORT_SYMBOL(get_random_bytes);
 
-- 
2.5.0



[PATCH 1/4] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Theodore Ts'o
The CRNG is faster, and we don't pretend to track entropy usage in the
CRNG any more.

Signed-off-by: Theodore Ts'o 
---
 crypto/chacha20_generic.c |  61 --
 drivers/char/random.c | 283 +++---
 include/crypto/chacha20.h |   1 +
 lib/Makefile  |   2 +-
 lib/chacha20.c|  79 +
 5 files changed, 295 insertions(+), 131 deletions(-)
 create mode 100644 lib/chacha20.c

diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
index da9c899..1cab831 100644
--- a/crypto/chacha20_generic.c
+++ b/crypto/chacha20_generic.c
@@ -15,72 +15,11 @@
 #include 
 #include 
 
-static inline u32 rotl32(u32 v, u8 n)
-{
-   return (v << n) | (v >> (sizeof(v) * 8 - n));
-}
-
 static inline u32 le32_to_cpuvp(const void *p)
 {
return le32_to_cpup(p);
 }
 
-static void chacha20_block(u32 *state, void *stream)
-{
-   u32 x[16], *out = stream;
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(x); i++)
-   x[i] = state[i];
-
-   for (i = 0; i < 20; i += 2) {
-   x[0]  += x[4];x[12] = rotl32(x[12] ^ x[0],  16);
-   x[1]  += x[5];x[13] = rotl32(x[13] ^ x[1],  16);
-   x[2]  += x[6];x[14] = rotl32(x[14] ^ x[2],  16);
-   x[3]  += x[7];x[15] = rotl32(x[15] ^ x[3],  16);
-
-   x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],  12);
-   x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],  12);
-   x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10], 12);
-   x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11], 12);
-
-   x[0]  += x[4];x[12] = rotl32(x[12] ^ x[0],   8);
-   x[1]  += x[5];x[13] = rotl32(x[13] ^ x[1],   8);
-   x[2]  += x[6];x[14] = rotl32(x[14] ^ x[2],   8);
-   x[3]  += x[7];x[15] = rotl32(x[15] ^ x[3],   8);
-
-   x[8]  += x[12];   x[4]  = rotl32(x[4]  ^ x[8],   7);
-   x[9]  += x[13];   x[5]  = rotl32(x[5]  ^ x[9],   7);
-   x[10] += x[14];   x[6]  = rotl32(x[6]  ^ x[10],  7);
-   x[11] += x[15];   x[7]  = rotl32(x[7]  ^ x[11],  7);
-
-   x[0]  += x[5];x[15] = rotl32(x[15] ^ x[0],  16);
-   x[1]  += x[6];x[12] = rotl32(x[12] ^ x[1],  16);
-   x[2]  += x[7];x[13] = rotl32(x[13] ^ x[2],  16);
-   x[3]  += x[4];x[14] = rotl32(x[14] ^ x[3],  16);
-
-   x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10], 12);
-   x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11], 12);
-   x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],  12);
-   x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],  12);
-
-   x[0]  += x[5];x[15] = rotl32(x[15] ^ x[0],   8);
-   x[1]  += x[6];x[12] = rotl32(x[12] ^ x[1],   8);
-   x[2]  += x[7];x[13] = rotl32(x[13] ^ x[2],   8);
-   x[3]  += x[4];x[14] = rotl32(x[14] ^ x[3],   8);
-
-   x[10] += x[15];   x[5]  = rotl32(x[5]  ^ x[10],  7);
-   x[11] += x[12];   x[6]  = rotl32(x[6]  ^ x[11],  7);
-   x[8]  += x[13];   x[7]  = rotl32(x[7]  ^ x[8],   7);
-   x[9]  += x[14];   x[4]  = rotl32(x[4]  ^ x[9],   7);
-   }
-
-   for (i = 0; i < ARRAY_SIZE(x); i++)
-   out[i] = cpu_to_le32(x[i] + state[i]);
-
-   state[12]++;
-}
-
 static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
 unsigned int bytes)
 {
diff --git a/drivers/char/random.c b/drivers/char/random.c
index b583e53..91d5c2a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -260,6 +260,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -412,6 +413,18 @@ static struct fasync_struct *fasync;
 static DEFINE_SPINLOCK(random_ready_list_lock);
 static LIST_HEAD(random_ready_list);
 
+/*
+ * crng_init =  0 --> Uninitialized
+ * 2 --> Initialized
+ * 3 --> Initialized from input_pool
+ *
+ * crng_init is protected by primary_crng->lock, and only increases
+ * its value (from 0->1->2->3).
+ */
+static int crng_init = 0;
+#define crng_ready() (likely(crng_init >= 2))
+static void process_random_ready_list(void);
+
 /**
  *
  * OS independent entropy store.   Here are the functions which handle
@@ -441,10 +454,13 @@ struct entropy_store {
__u8 last_data[EXTRACT_SIZE];
 };
 
+static ssize_t extract_entropy(struct entropy_store *r, void *buf,
+  size_t nbytes, int min, int rsvd);
+
+static int crng_reseed(struct entropy_store *r);
 static void push_to_pool(struct work_struct *work);
 static __u32 input_pool_data[INPUT_POOL_WORDS];
 static __u32 blocking_pool_data[OUTPUT_POOL_WORDS];
-static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS];
 
 static struct entropy_store input_pool = {

[PATCH -v2 0/4] random: replace urandom pool with a CRNG

2016-05-04 Thread Theodore Ts'o
By using a CRNG to replace the urandom pool, we address a number of
complaints which Stephan Mueller has been concerned about.  We now use
a much more aggressive interrupt sampling system to quickly initialize
a CRNG which gets used in place of the original non-blocking pool.
This tends to get initialized *very* quickly (before the devices are
finished being probed.)  Like Stephan's proposal, this assumes that we
can get a bit of entropy per interrupt, which may be problematic on
some architectures.  So after we do this quick-and-dirty
initialization, we then fall back to the slower, more conservative
interrupt sampling system to fill the input pool, and we will do a
catastrophic reseeding once we get 128 bits using the slower but more
conservative system, and every five minutes afterwards, if possible.

In addition, on NUMA systems we make the CRNG state per-NUMA socket,
to address the NUMA locking contention problem which Andi Kleen has
been complaining about.  I'm not entirely sure this will work on the
crazy big SGI systems, but they are rare.  Whether they are rarer than
abusive userspace programs that are continuously pounding /dev/urandom
is unclear.  If necessary we can make a config option to turn off the
per-NUMA socket hack if it proves to be problematic.

Stephan Mueller (1):
  random: add interrupt callback to VMBus IRQ handler

Theodore Ts'o (3):
  random: replace non-blocking pool with a Chacha20-based CRNG
  random: make /dev/urandom scalable for silly userspace programs
  random: add backtracking protection to the CRNG

 crypto/chacha20_generic.c |  61 
 drivers/char/random.c | 372 +-
 drivers/hv/vmbus_drv.c|   3 +
 include/crypto/chacha20.h |   1 +
 lib/Makefile  |   2 +-
 lib/chacha20.c|  79 ++
 6 files changed, 387 insertions(+), 131 deletions(-)
 create mode 100644 lib/chacha20.c

-- 
2.5.0



Re: [crypto / sparc64] cryptomgr_test OOPS

2016-05-04 Thread Anatoly Pugachev
On Tue, May 3, 2016 at 7:33 PM, David Miller  wrote:
> From: Anatoly Pugachev 
> Date: Tue, 3 May 2016 16:54:18 +0300
>
>> I have git kernel OOPS (4.6.0-rc6) on sparc64. This OOPS does not
>> happen, if I set the following kernel option:
>>
>> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
>>
>> Can someone please look at 
>> https://bugzilla.kernel.org/show_bug.cgi?id=117551 ?
>
> The lib/mpi/ code hasn't been touched in a long time, so I can only see that 
> it might
> be the crypto/rsa.c changes that happened this release.
>
> Can you possibly bisect this or at least tell us that 4.5 doesn't have this 
> problem?

David,

we're using 4.5.2 debian kernel here without this problem. I'm not
sure I would be able to bisect, never did it before, but I could
try...

Here's a quick diff related to crypto for debian kernel configs:

$ diff -u /boot/config-4.5.0-2-sparc64-smp /boot/config-4.6.0-rc5-sparc64-smp
@@ -5299,10 +5380,9 @@
 CONFIG_CRYPTO_RNG=m
 CONFIG_CRYPTO_RNG2=y
 CONFIG_CRYPTO_RNG_DEFAULT=m
-CONFIG_CRYPTO_PCOMP=m
-CONFIG_CRYPTO_PCOMP2=y
 CONFIG_CRYPTO_AKCIPHER2=y
-# CONFIG_CRYPTO_RSA is not set
+CONFIG_CRYPTO_AKCIPHER=y
+CONFIG_CRYPTO_RSA=y
 CONFIG_CRYPTO_MANAGER=y
 CONFIG_CRYPTO_MANAGER2=y
 # CONFIG_CRYPTO_USER is not set

so, I need to compile 4.5.x kernel (or even older kernels) with
CRYPTO_RSA=y and test how it would act. And if it would not crash, I
could try to bisect.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread tytso
On Wed, May 04, 2016 at 11:29:57AM -0700, H. Peter Anvin wrote:
> 
> We don't care about UB, we care about gcc, and to a lesser extent
> LLVM and ICC.  If bitops.h doesn't do the right thing, we need to
> fix bitops.h.

I'm going to suggest that we treat the ro[rl]{32,64}() question as
separable from the /dev/random case.  I've sanity checked gcc 5.3.1
and it does the right thing given the small number of constant
arguments given to rotl32() in lib/chacha20.c, and it doesn't hit the
UB case which Jeffrey was concerned about.  This is also code that was
previously in crypto/chacha20_generic.c, and so if there is a bug with
some obscure compiler not properly compiling it down to a rotate
instruction, (a) no one is paying me to make sure the kernel code
compiles well on ICC, and (b) it's not scalable to have each kernel
developer try to deal with the vagaries of compilers.

So from my perspective, the only interesting results for me is (a)
using what had been used before with crypto/chacha20_generic.c, or (b)
reusing what is in bitops.h and letting it be someone else's problem
if some obscure compiler isn't happy with what is in bitops.h

If we are all agreed that what is in bitops.h is considered valid,
then we can start converting people over to using the version defined
in bitops.h, and if there is some compiler issue we need to work
around, at least we only need to put the workaround in a single header
file.
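
For reference, the rol32() in include/linux/bitops.h that this thread keeps
coming back to looked roughly like the sketch below at the time (paraphrased,
not a verbatim copy of the header); the shift by 32 when the count is zero is
exactly the UB being discussed:

#include <stdint.h>

/* Approximation of the bitops.h-style helper under discussion: for
 * shift == 0 the right shift becomes a shift by 32, which is undefined
 * behavior in C, even though common compilers emit the expected rotate. */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}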

Cheers,
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread H. Peter Anvin
On May 4, 2016 11:22:25 AM PDT, Jeffrey Walton  wrote:
>On Wed, May 4, 2016 at 1:49 PM,   wrote:
>> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
>>> > +static inline u32 rotl32(u32 v, u8 n)
>>> > +{
>>> > +   return (v << n) | (v >> (sizeof(v) * 8 - n));
>>> > +}
>>>
>>> That's undefined behavior when n=0.
>>
>> Sure, but it's never called with n = 0; I've double checked and the
>> compiler seems to do the right thing with the above pattern as well.
>
>> Hmm, it looks like there is a "standard" version rotate left and
>right
>> defined in include/linux/bitops.h.  So I suspect it would make sense
>> to use rol32 as defined in bitops.h --- and this is probably
>something
>
>bitops.h could work in this case, but its not an ideal solution. GCC
>does not optimize the code below as expected under all use cases
>because GCC does not recognize it as a rotate (see
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
>
>return (v << n) | (v >> (sizeof(v) * 8 - n));
>
>And outside of special cases like Salsa, ChaCha and BLAKE2, the code
>provided in bitops.h suffers UB on arbitrary data. So I think care
>needs to be taken when selecting functions from bitops.h.
>
>> that we should do for the rest of crypto/*.c, where people seem to be
>> defining their own version of something like rotl32 (I copied the
>> contents of crypto/chacha20_generic.c to lib/chacha20, so this
>pattern
>> of defining one's own version of rol32 isn't new).
>
>Yeah, I kind of thought there was some duplication going on.
>
>But I think bitops.h should be fixed. Many folks don't realize the
>lurking UB, and many folks don't realize its not always optimized
>well.
>
>>> I think the portable way to do a rotate that avoids UB is the
>>> following. GCC, Clang and ICC recognize the pattern, and emit a
>rotate
>>> instruction.
>>>
>>> static const unsigned int MASK=31;
>>> return (v<<n)|(v>>(-n&MASK));
>>>
>>> You should also avoid the following because its not constant time
>due
>>> to the branch:
>>>
>>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>>
>>
>> Where is this coming from?  I don't see this construct in the patch.
>
>My bad... It was a general observation. I've seen folks try to correct
>the UB by turning to something like that.
>
>Jeff

We don't care about UB, we care about gcc, and to a lesser extent LLVM and ICC. 
 If bitops.h doesn't do the right thing, we need to fix bitops.h.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
On Wed, May 4, 2016 at 1:49 PM,   wrote:
> On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
>> > +static inline u32 rotl32(u32 v, u8 n)
>> > +{
>> > +   return (v << n) | (v >> (sizeof(v) * 8 - n));
>> > +}
>>
>> That's undefined behavior when n=0.
>
> Sure, but it's never called with n = 0; I've double checked and the
> compiler seems to do the right thing with the above pattern as well.

> Hmm, it looks like there is a "standard" version rotate left and right
> defined in include/linux/bitops.h.  So I suspect it would make sense
> to use rol32 as defined in bitops.h --- and this is probably something

bitops.h could work in this case, but it's not an ideal solution. GCC
does not optimize the code below as expected under all use cases
because GCC does not recognize it as a rotate (see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

return (v << n) | (v >> (sizeof(v) * 8 - n));

And outside of special cases like Salsa, ChaCha and BLAKE2, the code
provided in bitops.h suffers UB on arbitrary data. So I think care
needs to be taken when selecting functions from bitops.h.

> that we should do for the rest of crypto/*.c, where people seem to be
> defining their own version of something like rotl32 (I copied the
> contents of crypto/chacha20_generic.c to lib/chacha20, so this pattern
> of defining one's own version of rol32 isn't new).

Yeah, I kind of thought there was some duplication going on.

But I think bitops.h should be fixed. Many folks don't realize the
lurking UB, and many folks don't realize it's not always optimized
well.

>> I think the portable way to do a rotate that avoids UB is the
>> following. GCC, Clang and ICC recognize the pattern, and emit a rotate
>> instruction.
>>
>> static const unsigned int MASK=31;
>> return (v<<n)|(v>>(-n&MASK));
>>
>> You should also avoid the following because its not constant time due
>> to the branch:
>>
>> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
>>
>
> Where is this coming from?  I don't see this construct in the patch.

My bad... It was a general observation. I've seen folks try to correct
the UB by turning to something like that.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread tytso
On Wed, May 04, 2016 at 10:40:20AM -0400, Jeffrey Walton wrote:
> > +static inline u32 rotl32(u32 v, u8 n)
> > +{
> > +   return (v << n) | (v >> (sizeof(v) * 8 - n));
> > +}
> 
> That's undefined behavior when n=0.

Sure, but it's never called with n = 0; I've double checked and the
compiler seems to do the right thing with the above pattern as well.

Hmm, it looks like there is a "standard" version of rotate left and right
defined in include/linux/bitops.h.  So I suspect it would make sense
to use rol32 as defined in bitops.h --- and this is probably something
that we should do for the rest of crypto/*.c, where people seem to be
defining their own version of something like rotl32 (I copied the
contents of crypto/chacha20_generic.c to lib/chacha20, so this pattern
of defining one's own version of rol32 isn't new).

> I think the portable way to do a rotate that avoids UB is the
> following. GCC, Clang and ICC recognize the pattern, and emit a rotate
> instruction.
> 
> static const unsigned int MASK=31;
> return (v<<n)|(v>>(-n&MASK));
> 
> You should also avoid the following because its not constant time due
> to the branch:
> 
> return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
> 

Where is this coming from?  I don't see this construct in the patch.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] asm-generic/io.h: add io{read,write}64 accessors

2016-05-04 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XX
also for 64-bit accesses.

Signed-off-by: Horia Geantă 
---
 include/asm-generic/io.h| 63 +
 include/asm-generic/iomap.h |  8 ++
 2 files changed, 71 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index eed3bbe88c8a..0e2c73bb6d56 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -585,6 +585,16 @@ static inline u32 ioread32(const volatile void __iomem 
*addr)
 }
 #endif
 
+#ifdef CONFIG_64BIT
+#ifndef ioread64
+#define ioread64 ioread64
+static inline u64 ioread64(const volatile void __iomem *addr)
+{
+   return readq(addr);
+}
+#endif
+#endif /* CONFIG_64BIT */
+
 #ifndef iowrite8
 #define iowrite8 iowrite8
 static inline void iowrite8(u8 value, volatile void __iomem *addr)
@@ -609,6 +619,16 @@ static inline void iowrite32(u32 value, volatile void 
__iomem *addr)
 }
 #endif
 
+#ifdef CONFIG_64BIT
+#ifndef iowrite64
+#define iowrite64 iowrite64
+static inline void iowrite64(u64 value, volatile void __iomem *addr)
+{
+   writeq(value, addr);
+}
+#endif
+#endif /* CONFIG_64BIT */
+
 #ifndef ioread16be
 #define ioread16be ioread16be
 static inline u16 ioread16be(const volatile void __iomem *addr)
@@ -625,6 +645,16 @@ static inline u32 ioread32be(const volatile void __iomem 
*addr)
 }
 #endif
 
+#ifdef CONFIG_64BIT
+#ifndef ioread64be
+#define ioread64be ioread64be
+static inline u64 ioread64be(const volatile void __iomem *addr)
+{
+   return __be64_to_cpu(__raw_readq(addr));
+}
+#endif
+#endif /* CONFIG_64BIT */
+
 #ifndef iowrite16be
 #define iowrite16be iowrite16be
 static inline void iowrite16be(u16 value, void volatile __iomem *addr)
@@ -641,6 +671,16 @@ static inline void iowrite32be(u32 value, volatile void 
__iomem *addr)
 }
 #endif
 
+#ifdef CONFIG_64BIT
+#ifndef iowrite64be
+#define iowrite64be iowrite64be
+static inline void iowrite64be(u64 value, volatile void __iomem *addr)
+{
+   __raw_writeq(__cpu_to_be64(value), addr);
+}
+#endif
+#endif /* CONFIG_64BIT */
+
 #ifndef ioread8_rep
 #define ioread8_rep ioread8_rep
 static inline void ioread8_rep(const volatile void __iomem *addr, void *buffer,
@@ -668,6 +708,17 @@ static inline void ioread32_rep(const volatile void 
__iomem *addr,
 }
 #endif
 
+#ifdef CONFIG_64BIT
+#ifndef ioread64_rep
+#define ioread64_rep ioread64_rep
+static inline void ioread64_rep(const volatile void __iomem *addr,
+   void *buffer, unsigned int count)
+{
+   readsq(addr, buffer, count);
+}
+#endif
+#endif /* CONFIG_64BIT */
+
 #ifndef iowrite8_rep
 #define iowrite8_rep iowrite8_rep
 static inline void iowrite8_rep(volatile void __iomem *addr,
@@ -697,6 +748,18 @@ static inline void iowrite32_rep(volatile void __iomem 
*addr,
writesl(addr, buffer, count);
 }
 #endif
+
+#ifdef CONFIG_64BIT
+#ifndef iowrite64_rep
+#define iowrite64_rep iowrite64_rep
+static inline void iowrite64_rep(volatile void __iomem *addr,
+const void *buffer,
+unsigned int count)
+{
+   writesq(addr, buffer, count);
+}
+#endif
+#endif /* CONFIG_64BIT */
 #endif /* CONFIG_GENERIC_IOMAP */
 
 #ifdef __KERNEL__
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index d8f8622fa044..650fede33c25 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -30,12 +30,20 @@ extern unsigned int ioread16(void __iomem *);
 extern unsigned int ioread16be(void __iomem *);
 extern unsigned int ioread32(void __iomem *);
 extern unsigned int ioread32be(void __iomem *);
+#ifdef CONFIG_64BIT
+extern u64 ioread64(void __iomem *);
+extern u64 ioread64be(void __iomem *);
+#endif
 
 extern void iowrite8(u8, void __iomem *);
 extern void iowrite16(u16, void __iomem *);
 extern void iowrite16be(u16, void __iomem *);
 extern void iowrite32(u32, void __iomem *);
 extern void iowrite32be(u32, void __iomem *);
+#ifdef CONFIG_64BIT
+extern void iowrite64(u64, void __iomem *);
+extern void iowrite64be(u64, void __iomem *);
+#endif
 
 /*
  * "string" versions of the above. Note that they
-- 
2.4.4
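
A hedged usage sketch of the new accessors; the helper names and the 0x10
register offset below are invented for illustration and are not from any
real driver:

#include <linux/io.h>

/* Hypothetical driver helpers: with the accessors added above, 64-bit MMIO
 * reads and writes look the same as the existing 16/32-bit ones. */
static u64 widget_read_counter(void __iomem *base)
{
	return ioread64be(base + 0x10);		/* invented 64-bit BE register */
}

static void widget_write_counter(void __iomem *base, u64 val)
{
	iowrite64be(val, base + 0x10);
}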

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] arm64: dts: ls1043a: add crypto node

2016-05-04 Thread Horia Geantă
LS1043A has a SEC v5.4 security engine.
For now don't add rtic or sec_mon subnodes, since these features
haven't been tested yet.

Signed-off-by: Horia Geantă 
---
 arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts |  4 +++
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi| 43 +++
 2 files changed, 47 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts 
b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
index ce235577e90f..9b5b75a4f02a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts
@@ -49,6 +49,10 @@
 
 / {
model = "LS1043A RDB Board";
+
+   aliases {
+   crypto = 
+   };
 };
 
  {
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index be72bf5b58b5..529c198494d5 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -159,6 +159,49 @@
big-endian;
};
 
+   crypto: crypto@170 {
+   compatible = "fsl,sec-v5.4", "fsl,sec-v5.0",
+"fsl,sec-v4.0";
+   fsl,sec-era = <3>;
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0x0 0x00 0x170 0x10>;
+   reg = <0x00 0x170 0x0 0x10>;
+   interrupts = <0 75 0x4>;
+
+   sec_jr0: jr@1 {
+   compatible = "fsl,sec-v5.4-job-ring",
+"fsl,sec-v5.0-job-ring",
+"fsl,sec-v4.0-job-ring";
+   reg= <0x1 0x1>;
+   interrupts = <0 71 0x4>;
+   };
+
+   sec_jr1: jr@2 {
+   compatible = "fsl,sec-v5.4-job-ring",
+"fsl,sec-v5.0-job-ring",
+"fsl,sec-v4.0-job-ring";
+   reg= <0x2 0x1>;
+   interrupts = <0 72 0x4>;
+   };
+
+   sec_jr2: jr@3 {
+   compatible = "fsl,sec-v5.4-job-ring",
+"fsl,sec-v5.0-job-ring",
+"fsl,sec-v4.0-job-ring";
+   reg= <0x3 0x1>;
+   interrupts = <0 73 0x4>;
+   };
+
+   sec_jr3: jr@4 {
+   compatible = "fsl,sec-v5.4-job-ring",
+"fsl,sec-v5.0-job-ring",
+"fsl,sec-v4.0-job-ring";
+   reg= <0x4 0x1>;
+   interrupts = <0 74 0x4>;
+   };
+   };
+
dcfg: dcfg@1ee {
compatible = "fsl,ls1043a-dcfg", "syscon";
reg = <0x0 0x1ee 0x0 0x1>;
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread tytso
On Tue, May 03, 2016 at 10:50:25AM +0200, Stephan Mueller wrote:
> > +/*
> > + * crng_init =  0 --> Uninitialized
> > + * 2 --> Initialized
> > + * 3 --> Initialized from input_pool
> > + */
> > +static int crng_init = 0;
> 
> shouldn't that be an atomic_t ?

The crng_init variable only gets incremented (it progresses from
0->1->2->3) and it's protected by the primary_crng->lock spinlock.
There are a few places where we read it without first taking the lock,
but that's where we depend on it getting incremented and where if we
race with another CPU which has just bumped the crng_init status, it's
not critical.  (Or we can take the lock and then recheck the crng_init
if we really need to be sure.)

> > +static void _initialize_crng(struct crng_state *crng)
> > +{
> > +   int i;
> > +   unsigned long   rv;
> 
> Why do you use unsigned long here? I thought the state[i] is unsigned int.

Because it gets filled in via arch_get_random_seed_long() and
arch_get_random_long().  If that means we get 64 bits and we only
use 32 bits, that's OK.

> Would it make sense to add the ChaCha20 self test vectors from RFC7539 here 
> to 
> test that the ChaCha20 works?

We're not doing that for any of the other ciphers and hashes.  We
didn't do that for SHA1, and the chacha20 code where I took this from
didn't check for this as well.  What threat are you most worried
about.  We don't care about chacha20 being exactly chacha20, so long
as it's cryptographically strong.  In fact I think I removed a
potential host order byteswap in the set key operation specifically
because we don't care about interop.

If this is required for FIPS mode, we can add that later.  I was
primarily trying to keep the patch small to be easier for people to
look at it, so I've deliberately left off bells and whistles that
aren't strictly needed to show that the new approach is sound.

> > +   if (crng_init++ >= 2)
> > +   wake_up_interruptible(&crng_init_wait);
> 
> Don't we have a race here with the crng_init < 3 check in crng_reseed 
> considering multi-core systems?

No, because we are holding the primary_crng->lock spinlock.  I'll add
a comment explaining the locking protections which is intended for
crng_init where we declare it.


> > +   if (num < 16 || num > 32) {
> > +   WARN_ON(1);
> > +   pr_err("crng_reseed: num is %d?!?\n", num);
> > +   }
> > +   num_words = (num + 3) / 4;
> > +   for (i = 0; i < num_words; i++)
> > +   primary_crng.state[i+4] ^= tmp[i];
> > +   primary_crng.init_time = jiffies;
> > +   if (crng_init < 3) {
> 
> Shouldn't that one be if (crng_init < 3 && num >= 16) ?

I'll just change the above WARN_ON test to be:

 BUG_ON(num < 16 || num > 32);

This really can't happen, and I had it as a WARN_ON with a printk for
debugging purpose in case I was wrong about how the code works.

> > +   crng_init = 3;
> > +   process_random_ready_list();
> > +   wake_up_interruptible(&crng_init_wait);
> > +   pr_notice("random: crng_init 3\n");
> 
> Would it make sense to be more descriptive here to allow readers of dmesg to 
> understand the output?

Yes, what we're currently printing during the initialization:

random: crng_init 1
random: crng_init 2
random: crng_init 3

was again mostly for debugging purposes.  What I'm thinking about
doing is changing crng_init 2 and 3 messages to be:

random: crng fast init done
random: crng conservative init done

> > +   }
> > +   ret = 1;
> > +out:
> > +   spin_unlock_irqrestore(&primary_crng.lock, flags);
> 
> memzero_explicit of tmp?

Good point, I've added a call to memzero_explicit().

> > +static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
> > +{
> > +   unsigned long v, flags;
> > +   struct crng_state *crng = &primary_crng;
> > +
> > +   if (crng_init > 2 &&
> > +   time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL))
> > +   crng_reseed(&input_pool);
> > +   spin_lock_irqsave(&crng->lock, flags);
> > +   if (arch_get_random_long(&v))
> > +   crng->state[14] ^= v;
> > +   chacha20_block(&crng->state[0], out);
> > +   if (crng->state[12] == 0)
> > +   crng->state[13]++;

> What is the purpose to only cover the 2nd 32 bit value of the nonce with 
> arch_get_random?
> 
> state[12]++? Or why do you increment the nonce?

In Chacha20, its output is a function of the state array, where
state[0..3] is a constant specified by the Chacha20 definition,
state[4..11] is the Key, and state[12..15] is the IV.  The
chacha20_block() function increments state[12] each time it is called,
so state[12] is being used as the block counter.  The increment of
state[13] is there to make state[13] the upper 32 bits of the block
counter.  We *should* be reseeding more often than 2**32 calls
to chacha20_block(), and the increment is just to be safe in case
something goes wrong and we're not reseeding.

We're using crng[14] to contain the output of RDRAND, so this is
where we mix in the contribution from a CPU-level hwrng. 
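
As an illustration of the layout just described (not the kernel code itself,
just the bookkeeping; the struct and function names below are made up):

#include <stdint.h>

/* ChaCha20 state as described above: 16 32-bit words. */
struct chacha_state {
	uint32_t s[16];	/* s[0..3] constants, s[4..11] key, s[12..15] IV */
};

/* chacha20_block() bumps s[12] (the low word of the block counter) on every
 * call; the caller-side carry into s[13] makes the counter effectively
 * 64-bit even if reseeding ever fails to happen in time. */
static void carry_block_counter(struct chacha_state *st)
{
	if (st->s[12] == 0)	/* low word just wrapped after an increment */
		st->s[13]++;
}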

[PATCH 0/7] crypto: caam - add support for LS1043A SoC

2016-05-04 Thread Horia Geantă
Hi,

[Patches 1-3 add io{read,write}64[be] accessors (generic, arm64, ppc64),
such that CAAM's accessors in regs.h are simplified a bit.
Patch 7 adds crypto node for LS1043A platform.
Let me know if it's ok to go with these through the cryptodev-2.6 tree.]

This is a follow-up on the following RFC patch set:
crypto: caam - Revamp I/O accessors
https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg15878.html

There are platforms such as LS1043A (or LS1012A) where core endianness
does not match CAAM/SEC endianness (LE vs. BE).
Add support in caam driver for these cases.

Current patch set detects device endianness at runtime (as opposed to
compile-time endianness), in order to support multiplatform kernels.
Detection of device endianness is not device-tree based.
Instead, the SSTA ("SEC STAtus") register has a property such that,
by reading it in either endianness and masking it properly, it's possible
to deduce device endianness.

The performance drop due to the runtime detection is < 1.0%.
(An alternative implementation using function pointers has been tried,
but led to a bigger performance drop.)
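
A rough sketch of that idea, with a placeholder register value and mask
rather than the real SSTA layout:

#include <stdint.h>
#include <stdbool.h>

/* Illustrative only: if a register is architected to contain a known bit
 * pattern, a raw (non-swapping) read that matches the pattern after masking
 * means the device byte order matches the CPU byte order; otherwise the
 * accessors must byte-swap. */
static bool device_matches_cpu_endianness(uint32_t raw_read,
					  uint32_t expected_pattern,
					  uint32_t mask)
{
	return (raw_read & mask) == expected_pattern;
}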

Thanks,
Horia

Cristian Stoica (1):
  crypto: caam - fix offset field in hw sg entries

Horia Geantă (6):
  asm-generic/io.h: add io{read,write}64 accessors
  arm64: add io{read,write}64be accessors
  powerpc: add io{read,write}64 accessors
  crypto: caam - handle core endianness != caam endianness
  crypto: caam - add ARCH_LAYERSCAPE to supported architectures
  arm64: dts: ls1043a: add crypto node

 arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dts |   4 +
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi|  43 ++
 arch/arm64/include/asm/io.h   |   4 +-
 arch/powerpc/kernel/iomap.c   |  24 
 drivers/crypto/caam/Kconfig   |   6 +-
 drivers/crypto/caam/caamhash.c|   5 +-
 drivers/crypto/caam/ctrl.c| 125 +++---
 drivers/crypto/caam/desc.h|   9 +-
 drivers/crypto/caam/desc_constr.h |  44 ---
 drivers/crypto/caam/jr.c  |  22 ++--
 drivers/crypto/caam/pdb.h | 137 +++-
 drivers/crypto/caam/regs.h| 151 +++---
 drivers/crypto/caam/sg_sw_sec4.h  |  17 +--
 include/asm-generic/io.h  |  63 +
 include/asm-generic/iomap.h   |   8 ++
 15 files changed, 490 insertions(+), 172 deletions(-)

-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] crypto: caam - add ARCH_LAYERSCAPE to supported architectures

2016-05-04 Thread Horia Geantă
This basically adds support for ls1043a platform.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig
index d2c2909a4020..ff54c42e6e51 100644
--- a/drivers/crypto/caam/Kconfig
+++ b/drivers/crypto/caam/Kconfig
@@ -1,6 +1,6 @@
 config CRYPTO_DEV_FSL_CAAM
tristate "Freescale CAAM-Multicore driver backend"
-   depends on FSL_SOC || ARCH_MXC
+   depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE
help
  Enables the driver module for Freescale's Cryptographic Accelerator
  and Assurance Module (CAAM), also known as the SEC version 4 (SEC4).
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] crypto: caam - handle core endianness != caam endianness

2016-05-04 Thread Horia Geantă
There are SoCs like LS1043A where CAAM endianness (BE) does not match
the default endianness of the core (LE).
Moreover, there are requirements for the driver to handle cases like
CPU_BIG_ENDIAN=y on ARM-based SoCs.
This requires for a complete rewrite of the I/O accessors.

PPC-specific accessors - {in,out}_{le,be}XX - are replaced with
generic ones - io{read,write}[be]XX.

Endianness is detected dynamically (at runtime) to allow for
multiplatform kernels, for e.g. running the same kernel image
on LS1043A (BE CAAM) and LS2080A (LE CAAM) armv8-based SoCs.

While here: debugfs entries need to take into consideration the
endianness of the core when displaying data. Add the necessary
glue code so the entries remain the same, but they are properly
read, regardless of the core and/or SEC endianness.

Note: pdb.h fixes only what is currently being used (IPsec).

Signed-off-by: Horia Geantă 
Signed-off-by: Alex Porosanu 
---
 drivers/crypto/caam/Kconfig   |   4 -
 drivers/crypto/caam/caamhash.c|   5 +-
 drivers/crypto/caam/ctrl.c| 125 +++
 drivers/crypto/caam/desc.h|   7 +-
 drivers/crypto/caam/desc_constr.h |  44 +++
 drivers/crypto/caam/jr.c  |  22 +++---
 drivers/crypto/caam/pdb.h | 137 ++
 drivers/crypto/caam/regs.h| 151 +-
 drivers/crypto/caam/sg_sw_sec4.h  |  11 +--
 9 files changed, 340 insertions(+), 166 deletions(-)

diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig
index 5652a53415dc..d2c2909a4020 100644
--- a/drivers/crypto/caam/Kconfig
+++ b/drivers/crypto/caam/Kconfig
@@ -116,10 +116,6 @@ config CRYPTO_DEV_FSL_CAAM_IMX
def_bool SOC_IMX6 || SOC_IMX7D
depends on CRYPTO_DEV_FSL_CAAM
 
-config CRYPTO_DEV_FSL_CAAM_LE
-   def_bool CRYPTO_DEV_FSL_CAAM_IMX || SOC_LS1021A
-   depends on CRYPTO_DEV_FSL_CAAM
-
 config CRYPTO_DEV_FSL_CAAM_DEBUG
bool "Enable debug output in CAAM driver"
depends on CRYPTO_DEV_FSL_CAAM
diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 5845d4a08797..f1ecc8df8d41 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -847,7 +847,7 @@ static int ahash_update_ctx(struct ahash_request *req)
 *next_buflen, 0);
} else {
(edesc->sec4_sg + sec4_sg_src_index - 1)->len |=
-   SEC4_SG_LEN_FIN;
+   cpu_to_caam32(SEC4_SG_LEN_FIN);
}
 
state->current_buf = !state->current_buf;
@@ -949,7 +949,8 @@ static int ahash_final_ctx(struct ahash_request *req)
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1,
buf, state->buf_dma, buflen,
last_buflen);
-   (edesc->sec4_sg + sec4_sg_src_index - 1)->len |= SEC4_SG_LEN_FIN;
+   (edesc->sec4_sg + sec4_sg_src_index - 1)->len |=
+   cpu_to_caam32(SEC4_SG_LEN_FIN);
 
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
sec4_sg_bytes, DMA_TO_DEVICE);
diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index 44d30b45f3cc..1c8e764872ae 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -15,6 +15,9 @@
 #include "desc_constr.h"
 #include "error.h"
 
+bool caam_little_end;
+EXPORT_SYMBOL(caam_little_end);
+
 /*
  * i.MX targets tend to have clock control subsystems that can
  * enable/disable clocking to our device.
@@ -106,7 +109,7 @@ static inline int run_descriptor_deco0(struct device 
*ctrldev, u32 *desc,
 
 
if (ctrlpriv->virt_en == 1) {
-   setbits32(>deco_rsr, DECORSR_JR0);
+   clrsetbits_32(>deco_rsr, 0, DECORSR_JR0);
 
while (!(rd_reg32(>deco_rsr) & DECORSR_VALID) &&
   --timeout)
@@ -115,7 +118,7 @@ static inline int run_descriptor_deco0(struct device 
*ctrldev, u32 *desc,
timeout = 10;
}
 
-   setbits32(>deco_rq, DECORR_RQD0ENABLE);
+   clrsetbits_32(>deco_rq, 0, DECORR_RQD0ENABLE);
 
while (!(rd_reg32(>deco_rq) & DECORR_DEN0) &&
 --timeout)
@@ -123,12 +126,12 @@ static inline int run_descriptor_deco0(struct device 
*ctrldev, u32 *desc,
 
if (!timeout) {
dev_err(ctrldev, "failed to acquire DECO 0\n");
-   clrbits32(>deco_rq, DECORR_RQD0ENABLE);
+   clrsetbits_32(>deco_rq, DECORR_RQD0ENABLE, 0);
return -ENODEV;
}
 
for (i = 0; i < desc_len(desc); i++)
-   wr_reg32(>descbuf[i], *(desc + i));
+   wr_reg32(>descbuf[i], 

[PATCH 4/7] crypto: caam - fix offset field in hw sg entries

2016-05-04 Thread Horia Geantă
From: Cristian Stoica 

The offset field is 13 bits wide; make sure we don't overwrite more than
that in the caam hardware scatter gather structure.

Signed-off-by: Cristian Stoica 
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/desc.h   | 2 +-
 drivers/crypto/caam/sg_sw_sec4.h | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/caam/desc.h b/drivers/crypto/caam/desc.h
index 1e93c6af2275..fe30ff69088c 100644
--- a/drivers/crypto/caam/desc.h
+++ b/drivers/crypto/caam/desc.h
@@ -20,7 +20,7 @@
 #define SEC4_SG_BPID_MASK  0x00ff
 #define SEC4_SG_BPID_SHIFT 16
 #define SEC4_SG_LEN_MASK   0x3fff  /* Excludes EXT and FINAL */
-#define SEC4_SG_OFFS_MASK  0x1fff
+#define SEC4_SG_OFFSET_MASK0x1fff
 
 struct sec4_sg_entry {
 #ifdef CONFIG_CRYPTO_DEV_FSL_CAAM_IMX
diff --git a/drivers/crypto/caam/sg_sw_sec4.h b/drivers/crypto/caam/sg_sw_sec4.h
index 12ec6616e89d..2311341b7356 100644
--- a/drivers/crypto/caam/sg_sw_sec4.h
+++ b/drivers/crypto/caam/sg_sw_sec4.h
@@ -11,12 +11,12 @@ struct sec4_sg_entry;
  * convert single dma address to h/w link table format
  */
 static inline void dma_to_sec4_sg_one(struct sec4_sg_entry *sec4_sg_ptr,
- dma_addr_t dma, u32 len, u32 offset)
+ dma_addr_t dma, u32 len, u16 offset)
 {
sec4_sg_ptr->ptr = dma;
sec4_sg_ptr->len = len;
sec4_sg_ptr->buf_pool_id = 0;
-   sec4_sg_ptr->offset = offset;
+   sec4_sg_ptr->offset = offset & SEC4_SG_OFFSET_MASK;
 #ifdef DEBUG
print_hex_dump(KERN_ERR, "sec4_sg_ptr@: ",
   DUMP_PREFIX_ADDRESS, 16, 4, sec4_sg_ptr,
@@ -30,7 +30,7 @@ static inline void dma_to_sec4_sg_one(struct sec4_sg_entry 
*sec4_sg_ptr,
  */
 static inline struct sec4_sg_entry *
 sg_to_sec4_sg(struct scatterlist *sg, int sg_count,
- struct sec4_sg_entry *sec4_sg_ptr, u32 offset)
+ struct sec4_sg_entry *sec4_sg_ptr, u16 offset)
 {
while (sg_count) {
dma_to_sec4_sg_one(sec4_sg_ptr, sg_dma_address(sg),
@@ -48,7 +48,7 @@ sg_to_sec4_sg(struct scatterlist *sg, int sg_count,
  */
 static inline void sg_to_sec4_sg_last(struct scatterlist *sg, int sg_count,
  struct sec4_sg_entry *sec4_sg_ptr,
- u32 offset)
+ u16 offset)
 {
sec4_sg_ptr = sg_to_sec4_sg(sg, sg_count, sec4_sg_ptr, offset);
sec4_sg_ptr->len |= SEC4_SG_LEN_FIN;
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] arm64: add io{read,write}64be accessors

2016-05-04 Thread Horia Geantă
This will allow device drivers to consistently use io{read,write}XXbe
also for 64-bit accesses.

Signed-off-by: Alex Porosanu 
Signed-off-by: Horia Geantă 
---
 arch/arm64/include/asm/io.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 44be1e03ed65..9b6e408cfa51 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -174,13 +174,15 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, 
size_t size);
 #define iounmap	__iounmap
 
 /*
- * io{read,write}{16,32}be() macros
+ * io{read,write}{16,32,64}be() macros
  */
 #define ioread16be(p)  ({ __u16 __v = be16_to_cpu((__force 
__be16)__raw_readw(p)); __iormb(); __v; })
 #define ioread32be(p)  ({ __u32 __v = be32_to_cpu((__force 
__be32)__raw_readl(p)); __iormb(); __v; })
+#define ioread64be(p)  ({ __u64 __v = be64_to_cpu((__force 
__be64)__raw_readq(p)); __iormb(); __v; })
 
 #define iowrite16be(v,p)   ({ __iowmb(); __raw_writew((__force 
__u16)cpu_to_be16(v), p); })
 #define iowrite32be(v,p)   ({ __iowmb(); __raw_writel((__force 
__u32)cpu_to_be32(v), p); })
+#define iowrite64be(v,p)   ({ __iowmb(); __raw_writeq((__force 
__u64)cpu_to_be64(v), p); })
 
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
>> + chacha20_block(&crng->state[0], out);
>> + if (crng->state[12] == 0)
>> + crng->state[13]++;
>
> state[12]++? Or why do you increment the nonce?

In Bernstein's Salsa and ChaCha, the counter is 64-bit. It appears
ChaCha-TLS uses a 32-bit counter, and the other 32 bits are given to
the nonce.

Maybe the first question to ask is, what ChaCha is the kernel
providing? If its ChaCha-TLS, then the carry does not make a lot of
sense.

If the generator is limiting the amount of material under a given set
of security parameters (key and nonce), then the generator will likely
re-key itself long before the 256-GB induced wrap. In this case, it
does not matter which ChaCha the kernel is providing and the carry is
superfluous.

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [crypto / sparc64] cryptomgr_test OOPS

2016-05-04 Thread Anatoly Pugachev
On Wed, May 4, 2016 at 7:07 AM, Herbert Xu  wrote:
> David Miller  wrote:
>> From: Anatoly Pugachev 
>> Date: Tue, 3 May 2016 16:54:18 +0300
>>
>>> I have git kernel OOPS (4.6.0-rc6) on sparc64. This OOPS does not
>>> happen, if I set the following kernel option:
>>>
>>> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
>>>
>>> Can someone please look at 
>>> https://bugzilla.kernel.org/show_bug.cgi?id=117551 ?
>>
>> The lib/mpi/ code hasn't been touched in a long time, so I can only see that 
>> it might
>> be the crypto/rsa.c changes that happened this release.
>>
>> Can you possibly bisect this or at least tell us that 4.5 doesn't have this 
>> problem?
>
> It would also be helpful to test the latest cryptodev tree which
> has a large number of MPI fixes from Nicolai Stange.

Herbert,

just tested cryptodev (
http://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
) kernel, same OOPS, but kernel version is 4.6.0-rc2+ .
kernel OOPS message - https://paste.fedoraproject.org/362554/23732641/
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Jeffrey Walton
> +static inline u32 rotl32(u32 v, u8 n)
> +{
> +   return (v << n) | (v >> (sizeof(v) * 8 - n));
> +}

That's undefined behavior when n=0.

I think the portable way to do a rotate that avoids UB is the
following. GCC, Clang and ICC recognize the pattern, and emit a rotate
instruction.

static const unsigned int MASK=31;
return (v<<n)|(v>>(-n&MASK));

You should also avoid the following because it's not constant time due
to the branch:

return n == 0 ? v : (v << n) | (v >> (sizeof(v) * 8 - n));
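
For comparison, a self-contained sketch of the masked form recommended above
(the function name is illustrative; both shift counts are masked here so
every shift stays in the 0..31 range):

#include <stdint.h>

/* Masked rotate: no shift ever reaches 32, so there is no UB for any n,
 * and gcc/clang/icc still recognize the pattern and emit a single rotate. */
static inline uint32_t rotl32_safe(uint32_t v, unsigned int n)
{
	static const unsigned int MASK = 31;
	n &= MASK;
	return (v << n) | (v >> (-n & MASK));
}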

Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)

2016-05-04 Thread Tadeusz Struk
Hi David
On 05/04/2016 02:01 AM, David Howells wrote:
> Do you want to push this via Herbert's tree?
> 

Yes, I think Herbert has some more patches queued for rc-7.
Let me also send a proper one with the signed-off tag.

---8<---
Subject: crypto: rsa - select crypto mgr dependency

The pkcs1pad template needs CRYPTO_MANAGER so it needs
to be explicitly selected by CRYPTO_RSA.

Reported-by: Jamie Heilman 
Signed-off-by: Tadeusz Struk 
---
 crypto/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 93a1fdc..1d33beb 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -96,6 +96,7 @@ config CRYPTO_AKCIPHER
 config CRYPTO_RSA
tristate "RSA algorithm"
select CRYPTO_AKCIPHER
+   select CRYPTO_MANAGER
select MPILIB
select ASN1
help

-- 
TS
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] crypto: hash - Fix page length clamping in hash walk

2016-05-04 Thread Sowmini Varadhan
On (05/04/16 12:08), Steffen Klassert wrote:
> > > Sowmini, could you please doublecheck with your test setup?
> > 
> > Don't bother, my patch was crap.  Here's one that's more likely
> > to work:
> 
> This one is much better, works here :)

The patch seems to work, but the caveat is that the original
bug (iperf segv) was not always reproducible on demand - it depended
on memory and other config. 

But looks like Steffen has a reliable way to reproduce the original
problem, so let's go with this patch for now.

--Sowmini

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl

2016-05-04 Thread Sowmini Varadhan
On (05/04/16 10:32), Herbert Xu wrote:
> 
> Please base it on cryptodev.
> 

Looks like this got fixed in cryptodev by commit cece762f6f3c 
("lib/mpi: mpi_write_sgl(): fix out-of-bounds stack access")

Thanks,
--Sowmini
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] crypto: hash - Fix page length clamping in hash walk

2016-05-04 Thread Steffen Klassert
On Wed, May 04, 2016 at 05:52:56PM +0800, Herbert Xu wrote:
> On Wed, May 04, 2016 at 11:34:20AM +0200, Steffen Klassert wrote:
> > 
> > Hmm, the 'sleeping while atomic' because of not unmapping
> > the page goes away, but now I see a lot of IPsec ICV fails
> > on the receive side. I'll try to find out what's going on.
> > 
> > Sowmini, could you please doublecheck with your test setup?
> 
> Don't bother, my patch was crap.  Here's one that's more likely
> to work:

This one is much better, works here :)

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] crypto: hash - Fix page length clamping in hash walk

2016-05-04 Thread Herbert Xu
On Wed, May 04, 2016 at 11:34:20AM +0200, Steffen Klassert wrote:
> 
> Hmm, the 'sleeping while atomic' because of not unmapping
> the page goes away, but now I see a lot of IPsec ICV fails
> on the receive side. I'll try to find out what's going on.
> 
> Sowmini, could you please doublecheck with your test setup?

Don't bother, my patch was crap.  Here's one that's more likely
to work:

---8<---
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE.  This patch fixes it by adjusting
walk->pg and walk->offset when this happens.

Cc: 
Reported-by: Steffen Klassert 
Signed-off-by: Herbert Xu 

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 5fc1f17..3887a98 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -69,8 +69,9 @@ static int hash_walk_new_entry(struct crypto_hash_walk *walk)
struct scatterlist *sg;
 
sg = walk->sg;
-   walk->pg = sg_page(sg);
walk->offset = sg->offset;
+   walk->pg = sg_page(walk->sg) + (walk->offset >> PAGE_SHIFT);
+   walk->offset = offset_in_page(walk->offset);
walk->entrylen = sg->length;
 
if (walk->entrylen > walk->total)
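
A small, self-contained illustration of the arithmetic the fix performs
(a PAGE_SHIFT of 12 is assumed for the example):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Fold an sg offset that may exceed PAGE_SIZE into a whole-page advance
 * plus an in-page offset, mirroring the walk->pg / walk->offset adjustment
 * above. */
int main(void)
{
	unsigned long offset     = 5000;			/* > PAGE_SIZE (4096) */
	unsigned long page_index = offset >> PAGE_SHIFT;	/* 1 page forward     */
	unsigned long in_page    = offset & (PAGE_SIZE - 1);	/* 904, < PAGE_SIZE   */

	printf("advance %lu page(s), in-page offset %lu\n", page_index, in_page);
	return 0;
}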
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crypto: hash - Fix page length clamping in hash walk

2016-05-04 Thread Steffen Klassert
On Tue, May 03, 2016 at 05:55:31PM +0800, Herbert Xu wrote:
> On Thu, Apr 28, 2016 at 10:27:43AM +0200, Steffen Klassert wrote:
> >
> > The problem was that if offset (in a superpage) equals
> > PAGE_SIZE in hash_walk_next(), nbytes becomes zero. So
> > we map the page, but we don't hash and unmap because we
> > exit the loop in shash_ahash_update() in this case.
> 
> I see.  Does this patch help?

Hmm, the 'sleeping while atomic' because of not unmapping
the page goes away, but now I see a lot of IPsec ICV fails
on the receive side. I'll try to find out what's going on.

Sowmini, could you please doublecheck with your test setup?

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: v4.6-rc1 regression bisected, Problem loading in-kernel X.509 certificate (-2)

2016-05-04 Thread David Howells
Tadeusz Struk  wrote:

> I think the problem is that pkcs1pad template needs CRYPTO_MANAGER, but
> your configuration doesn't enable CRYPTO_MANAGER. Could you try this
> please:
> 
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 93a1fdc..1d33beb 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -96,6 +96,7 @@ config CRYPTO_AKCIPHER
>  config CRYPTO_RSA
>   tristate "RSA algorithm"
>   select CRYPTO_AKCIPHER
> + select CRYPTO_MANAGER
>   select MPILIB
>   select ASN1
>   help

Do you want to push this via Herbert's tree?

David
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] random: replace non-blocking pool with a Chacha20-based CRNG

2016-05-04 Thread Stephan Mueller
On Tuesday, 3 May 2016 at 11:36:12, Stephan Mueller wrote:

Hi Ted,

> > +
> > +static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
> > +{
> > +   ssize_t ret = 0, i;
> > +   __u8 tmp[CHACHA20_BLOCK_SIZE];
> > +   int large_request = (nbytes > 256);
> > +
> > +   while (nbytes) {
> > +   if (large_request && need_resched()) {
> > +   if (signal_pending(current)) {
> > +   if (ret == 0)
> > +   ret = -ERESTARTSYS;
> > +   break;
> > +   }
> > +   schedule();
> > +   }
> > +
> > +   extract_crng(tmp);
> > +   i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE);
> > +   if (copy_to_user(buf, tmp, i)) {
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> > +
> > +   nbytes -= i;
> > +   buf += i;
> > +   ret += i;
> > +   }
> > +
> > +   /* Wipe data just written to memory */
> > +   memzero_explicit(tmp, sizeof(tmp));
> 
> Would it make sense to add another chacha20_block() call here at the end?
> Note, the one thing about the SP800-90A DRBG I really like is the enhanced
> backward secrecy support which is implemented by "updating" the internal
> state (the key / state) used for one or more random number generation
> rounds after one request for random numbers is satisfied.
> 
> This means that even if the state becomes known or the subsequent caller
> manages to deduce the state of the RNG to some degree of confidence, he
> cannot backtrack the already generated random numbers.
> 
> I see that the ChaCha20 RNG implicitly updates its state while it operates.
> But for the last round of the RNG, there is no more shuffling of the
> internal state. As one round is 64 bytes in size and many callers just want
> 16 or 32 bytes (as seen during testing), a lot of callers trigger only one
> round of the RNG.

After doing some performance tests, I see that we reach performance north
of 200 MB/s on my system (compare that to 12 MB/s for the SHA-1 version).

Thus, I would assume adding another call to chacha20_block should not hurt.
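
A hypothetical sketch of what that extra state update could look like (the
chacha20_block() prototype is assumed from how it is called earlier in the
thread; the helper name and the choice to fold the extra block into the key
words are illustrative, not taken from the patch series):

#include <stdint.h>
#include <string.h>

#define CHACHA20_BLOCK_SIZE 64

/* Assumed to match the helper used above, which is called as
 * chacha20_block(&crng->state[0], out). */
void chacha20_block(uint32_t state[16], uint8_t out[CHACHA20_BLOCK_SIZE]);

/* Sketch: after a request is satisfied, generate one extra block that is
 * never output and fold it into the key words (state[4..11]), so the state
 * that produced the user-visible output can no longer be reconstructed. */
static void backtrack_protect_sketch(uint32_t state[16])
{
	uint8_t buf[CHACHA20_BLOCK_SIZE];
	uint32_t fold[8];
	int i;

	chacha20_block(state, buf);		/* extra, never-output block */
	memcpy(fold, buf, sizeof(fold));
	for (i = 0; i < 8; i++)
		state[4 + i] ^= fold[i];	/* destroy the old key words */
	memset(buf, 0, sizeof(buf));		/* wipe the temporaries      */
	memset(fold, 0, sizeof(fold));
}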

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html