Re: early x86 unseeded randomness

2017-08-16 Thread Will Deacon
On Wed, Aug 16, 2017 at 11:13:03AM +0200, Thomas Gleixner wrote:
> On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> > If we really want to do this, I'd much rather *not* have code calling
> > tsc_early_random().  We're better off having the code call
> > get_random_bytes() and/or get_random_u32(), and having these systems
> > use RDRAND if available, and if not, falling back to
> > tsc_early_random() and then mixing it with whatever unpredictability
> > we may have been able to gather so far if the CRNG hasn't been
> > initialized yet.
> 
> I agree. This is not about systems which have RDRAND. We want to support
> systems which do not have it and there the TSC magic comes handy.
> 
> > That way something like tsc_early_random() can help, but it can't make
> > things worse than what we have today (excepting the performance delay
> > caused by adding whatever random shite that we hope is enough to
> > introduce unpredictability to the TSC --- for which I still remain
> > very skeptical).
> 
> I just rerun tests in the early boot code (interrupts disabled, no NMIs
> ...)  with the TSC/wbinvd voodoo on several generations of machines and
> stored 4M random values in a big static array. Reading it out after boot
> and running it through dieharder makes me pretty confident that we observe
> real random noise coming from the internals of the microarch/pipelines/bus
> interactions.
> 
> > P.S.  As I recall hpa@ has talked to some Intel architects internally
> > about how much unpredictability we could really get, and how much of
> > it is just because there's complex state that we can't see (which if
> > we could see, might make it much more predictable), and as I recall
> 
> Right, there is complex state which is not completely synchronous even if
> all frequencies are derived from a single 24MHZ oscillator. The PWMs, the
> memory access characteristics and quite some other sources of
> asynchronousity allow us to utilize that and I'm pretty sure, that you
> can't find two systems which expose exactly the same behaviour.
> 
> > they didn't say anyhing definitively; but they were nervous.  I'm
> 
> Sure, they are always nervous when you ask them questions about the
> internals of their chips especially when you expect authorative answers.

Right, especially as this is randomness as a side-effect of the design,
rather than something that was an actual design goal. You won't find CPU
designers committing to semantics of accidental behaviours :)

Another paper on this (which I think Paul pointed me to) is:

https://www.kernel.org/doc/ols/2014/ols2014-mueller.pdf

which seems to be what crypto/jitterentropy.c is based on.

On arm64, we currently rely on the bootloader for entropy (either an
explicit kaslr seed, or the EFI_RNG_PROTOCOL). Unfortunately, the former is
often zero and the latter unimplemented, but this seems to be improving
slowly.

Will


Re: early x86 unseeded randomness

2017-08-16 Thread Will Deacon
On Wed, Aug 16, 2017 at 11:13:03AM +0200, Thomas Gleixner wrote:
> On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> > If we really want to do this, I'd much rather *not* have code calling
> > tsc_early_random().  We're better off having the code call
> > get_random_bytes() and/or get_random_u32(), and having these systems
> > use RDRAND if available, and if not, falling back to
> > tsc_early_random() and then mixing it with whatever unpredictability
> > we may have been able to gather so far if the CRNG hasn't been
> > initialized yet.
> 
> I agree. This is not about systems which have RDRAND. We want to support
> systems which do not have it and there the TSC magic comes handy.
> 
> > That way something like tsc_early_random() can help, but it can't make
> > things worse than what we have today (excepting the performance delay
> > caused by adding whatever random shite that we hope is enough to
> > introduce unpredictability to the TSC --- for which I still remain
> > very skeptical).
> 
> I just rerun tests in the early boot code (interrupts disabled, no NMIs
> ...)  with the TSC/wbinvd voodoo on several generations of machines and
> stored 4M random values in a big static array. Reading it out after boot
> and running it through dieharder makes me pretty confident that we observe
> real random noise coming from the internals of the microarch/pipelines/bus
> interactions.
> 
> > P.S.  As I recall hpa@ has talked to some Intel architects internally
> > about how much unpredictability we could really get, and how much of
> > it is just because there's complex state that we can't see (which if
> > we could see, might make it much more predictable), and as I recall
> 
> Right, there is complex state which is not completely synchronous even if
> all frequencies are derived from a single 24MHZ oscillator. The PWMs, the
> memory access characteristics and quite some other sources of
> asynchronousity allow us to utilize that and I'm pretty sure, that you
> can't find two systems which expose exactly the same behaviour.
> 
> > they didn't say anyhing definitively; but they were nervous.  I'm
> 
> Sure, they are always nervous when you ask them questions about the
> internals of their chips especially when you expect authorative answers.

Right, especially as this is randomness as a side-effect of the design,
rather than something that was an actual design goal. You won't find CPU
designers committing to semantics of accidental behaviours :)

Another paper on this (which I think Paul pointed me to) is:

https://www.kernel.org/doc/ols/2014/ols2014-mueller.pdf

which seems to be what crypto/jitterentropy.c is based on.

On arm64, we currently rely on the bootloader for entropy (either an
explicit kaslr seed, or the EFI_RNG_PROTOCOL). Unfortunately, the former is
often zero and the latter unimplemented, but this seems to be improving
slowly.

Will


Re: early x86 unseeded randomness

2017-08-16 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> If we really want to do this, I'd much rather *not* have code calling
> tsc_early_random().  We're better off having the code call
> get_random_bytes() and/or get_random_u32(), and having these systems
> use RDRAND if available, and if not, falling back to
> tsc_early_random() and then mixing it with whatever unpredictability
> we may have been able to gather so far if the CRNG hasn't been
> initialized yet.

I agree. This is not about systems which have RDRAND. We want to support
systems which do not have it and there the TSC magic comes handy.

> That way something like tsc_early_random() can help, but it can't make
> things worse than what we have today (excepting the performance delay
> caused by adding whatever random shite that we hope is enough to
> introduce unpredictability to the TSC --- for which I still remain
> very skeptical).

I just rerun tests in the early boot code (interrupts disabled, no NMIs
...)  with the TSC/wbinvd voodoo on several generations of machines and
stored 4M random values in a big static array. Reading it out after boot
and running it through dieharder makes me pretty confident that we observe
real random noise coming from the internals of the microarch/pipelines/bus
interactions.

> P.S.  As I recall hpa@ has talked to some Intel architects internally
> about how much unpredictability we could really get, and how much of
> it is just because there's complex state that we can't see (which if
> we could see, might make it much more predictable), and as I recall

Right, there is complex state which is not completely synchronous even if
all frequencies are derived from a single 24MHZ oscillator. The PWMs, the
memory access characteristics and quite some other sources of
asynchronousity allow us to utilize that and I'm pretty sure, that you
can't find two systems which expose exactly the same behaviour.

> they didn't say anyhing definitively; but they were nervous.  I'm

Sure, they are always nervous when you ask them questions about the
internals of their chips especially when you expect authorative answers.

> pretty sure that for Intel architects, the right answer from their
> perspective is to use RDRAND, and not to play games with the TSC.
> 
> The other thing to note here is that because Intel has RDRAND, I'm
> actually not that worried about Intel; all of the
> drivers/char/random.c will mix in inputs from RDRAND or RDSEED if
> available.  I'm actually much more worried about architectures that
> don't have a hardware random number generator (e.g., some ARM
> subarchitectures and MIPS).  So while you might be able to come up
> with something that could work on x86, the real question is it safely
> generalizable to other, non-x86 architectures.  And that's where it
> gets much more scary.

You have similar asynchronous behaviour there especially when you take RTCs
into account. But also the stupid mechanism of utilizing a timer/counter
vs. a constant number of execution cycles mixed with a cacheflush will
expose random behaviour. You need to make some decisions which bit of the
counter to use, but that merily depends on the frequency ratio of the CPU
core vs. the counter.

It'd be probably worthwhile to code up a generic version of the
memcmp/readcounter/memset/cacheflush loop and expose it to a variety of
platforms to figure out how that behaves. We won't get authorative answers
from the silicon dudes, but my (not so limited) understanding of the
hardware internals makes me pretty confident that there is enough noise
caused by clock interference and domain interaction available on all modern
CPUs.

Thanks,

tglx
 



Re: early x86 unseeded randomness

2017-08-16 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> If we really want to do this, I'd much rather *not* have code calling
> tsc_early_random().  We're better off having the code call
> get_random_bytes() and/or get_random_u32(), and having these systems
> use RDRAND if available, and if not, falling back to
> tsc_early_random() and then mixing it with whatever unpredictability
> we may have been able to gather so far if the CRNG hasn't been
> initialized yet.

I agree. This is not about systems which have RDRAND. We want to support
systems which do not have it and there the TSC magic comes handy.

> That way something like tsc_early_random() can help, but it can't make
> things worse than what we have today (excepting the performance delay
> caused by adding whatever random shite that we hope is enough to
> introduce unpredictability to the TSC --- for which I still remain
> very skeptical).

I just rerun tests in the early boot code (interrupts disabled, no NMIs
...)  with the TSC/wbinvd voodoo on several generations of machines and
stored 4M random values in a big static array. Reading it out after boot
and running it through dieharder makes me pretty confident that we observe
real random noise coming from the internals of the microarch/pipelines/bus
interactions.

> P.S.  As I recall hpa@ has talked to some Intel architects internally
> about how much unpredictability we could really get, and how much of
> it is just because there's complex state that we can't see (which if
> we could see, might make it much more predictable), and as I recall

Right, there is complex state which is not completely synchronous even if
all frequencies are derived from a single 24MHZ oscillator. The PWMs, the
memory access characteristics and quite some other sources of
asynchronousity allow us to utilize that and I'm pretty sure, that you
can't find two systems which expose exactly the same behaviour.

> they didn't say anyhing definitively; but they were nervous.  I'm

Sure, they are always nervous when you ask them questions about the
internals of their chips especially when you expect authorative answers.

> pretty sure that for Intel architects, the right answer from their
> perspective is to use RDRAND, and not to play games with the TSC.
> 
> The other thing to note here is that because Intel has RDRAND, I'm
> actually not that worried about Intel; all of the
> drivers/char/random.c will mix in inputs from RDRAND or RDSEED if
> available.  I'm actually much more worried about architectures that
> don't have a hardware random number generator (e.g., some ARM
> subarchitectures and MIPS).  So while you might be able to come up
> with something that could work on x86, the real question is it safely
> generalizable to other, non-x86 architectures.  And that's where it
> gets much more scary.

You have similar asynchronous behaviour there especially when you take RTCs
into account. But also the stupid mechanism of utilizing a timer/counter
vs. a constant number of execution cycles mixed with a cacheflush will
expose random behaviour. You need to make some decisions which bit of the
counter to use, but that merily depends on the frequency ratio of the CPU
core vs. the counter.

It'd be probably worthwhile to code up a generic version of the
memcmp/readcounter/memset/cacheflush loop and expose it to a variety of
platforms to figure out how that behaves. We won't get authorative answers
from the silicon dudes, but my (not so limited) understanding of the
hardware internals makes me pretty confident that there is enough noise
caused by clock interference and domain interaction available on all modern
CPUs.

Thanks,

tglx
 



Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 07:37:05PM +0200, Thomas Gleixner wrote:
> That exploits the fact that the CPU and caches run at a different non
> synchronized clock than the memory controller and therefore the execution
> time for both the wbinvd() and the memchr_inv() measured in TSC cycles is
> non constant and random enough for the early boot randomization.

Um, can we guarantee that is always true for all systems?  Even, say,
for Silermont, Goldmont and Goldmont Plus (Intel's SOC designs) where
the memory may be located in the same chip package as the CPU/caches?

And even if this is true today, can we be sure that it will be true
for the forseeable future?  Using multiple clocks takes more power, so
I would think that on a SOC there would be a strong pressure to use a
single oscillator for the whole package.

If we really want to do this, I'd much rather *not* have code calling
tsc_early_random().  We're better off having the code call
get_random_bytes() and/or get_random_u32(), and having these systems
use RDRAND if available, and if not, falling back to
tsc_early_random() and then mixing it with whatever unpredictability
we may have been able to gather so far if the CRNG hasn't been
initialized yet.

That way something like tsc_early_random() can help, but it can't make
things worse than what we have today (excepting the performance delay
caused by adding whatever random shite that we hope is enough to
introduce unpredictability to the TSC --- for which I still remain
very skeptical).

- Ted

P.S.  As I recall hpa@ has talked to some Intel architects internally
about how much unpredictability we could really get, and how much of
it is just because there's complex state that we can't see (which if
we could see, might make it much more predictable), and as I recall
they didn't say anyhing definitively; but they were nervous.  I'm
pretty sure that for Intel architects, the right answer from their
perspective is to use RDRAND, and not to play games with the TSC.

The other thing to note here is that because Intel has RDRAND, I'm
actually not that worried about Intel; all of the
drivers/char/random.c will mix in inputs from RDRAND or RDSEED if
available.  I'm actually much more worried about architectures that
don't have a hardware random number generator (e.g., some ARM
subarchitectures and MIPS).  So while you might be able to come up
with something that could work on x86, the real question is it safely
generalizable to other, non-x86 architectures.  And that's where it
gets much more scary.


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 07:37:05PM +0200, Thomas Gleixner wrote:
> That exploits the fact that the CPU and caches run at a different non
> synchronized clock than the memory controller and therefore the execution
> time for both the wbinvd() and the memchr_inv() measured in TSC cycles is
> non constant and random enough for the early boot randomization.

Um, can we guarantee that is always true for all systems?  Even, say,
for Silermont, Goldmont and Goldmont Plus (Intel's SOC designs) where
the memory may be located in the same chip package as the CPU/caches?

And even if this is true today, can we be sure that it will be true
for the forseeable future?  Using multiple clocks takes more power, so
I would think that on a SOC there would be a strong pressure to use a
single oscillator for the whole package.

If we really want to do this, I'd much rather *not* have code calling
tsc_early_random().  We're better off having the code call
get_random_bytes() and/or get_random_u32(), and having these systems
use RDRAND if available, and if not, falling back to
tsc_early_random() and then mixing it with whatever unpredictability
we may have been able to gather so far if the CRNG hasn't been
initialized yet.

That way something like tsc_early_random() can help, but it can't make
things worse than what we have today (excepting the performance delay
caused by adding whatever random shite that we hope is enough to
introduce unpredictability to the TSC --- for which I still remain
very skeptical).

- Ted

P.S.  As I recall hpa@ has talked to some Intel architects internally
about how much unpredictability we could really get, and how much of
it is just because there's complex state that we can't see (which if
we could see, might make it much more predictable), and as I recall
they didn't say anyhing definitively; but they were nervous.  I'm
pretty sure that for Intel architects, the right answer from their
perspective is to use RDRAND, and not to play games with the TSC.

The other thing to note here is that because Intel has RDRAND, I'm
actually not that worried about Intel; all of the
drivers/char/random.c will mix in inputs from RDRAND or RDSEED if
available.  I'm actually much more worried about architectures that
don't have a hardware random number generator (e.g., some ARM
subarchitectures and MIPS).  So while you might be able to come up
with something that could work on x86, the real question is it safely
generalizable to other, non-x86 architectures.  And that's where it
gets much more scary.


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote:
> Care to read the paper?
> 
> We tried that 6 years ago on a wide range of machines from server to stupid
> first generation in order ATOM chips. All of them exposed more or less the
> same behaviour and passed RND validation tests.

Yeah, I read the paper, It points out that in a tight loop, they *did*
see real patterns in the TSC values that they read out.  But then they
did better by adding usleep(delay).  This is critical because this was
being done in userspace, when presumably there were other processes
running (kernel threads, if nothing else).

Whether this would still realistic in early boot, when interrupts may
not have been fully enabled, and many devices not even probed yet, and
when there are no other userspace processes to schedule against ---
and where you can't use usleep because the scheduler may not have been
initialized yet --- and where udelay is *way* more boring that usleep
--- I think is something where you can't rely on that paper.

So I think a lot of care is required here.

 - Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote:
> Care to read the paper?
> 
> We tried that 6 years ago on a wide range of machines from server to stupid
> first generation in order ATOM chips. All of them exposed more or less the
> same behaviour and passed RND validation tests.

Yeah, I read the paper, It points out that in a tight loop, they *did*
see real patterns in the TSC values that they read out.  But then they
did better by adding usleep(delay).  This is critical because this was
being done in userspace, when presumably there were other processes
running (kernel threads, if nothing else).

Whether this would still realistic in early boot, when interrupts may
not have been fully enabled, and many devices not even probed yet, and
when there are no other userspace processes to schedule against ---
and where you can't use usleep because the scheduler may not have been
initialized yet --- and where udelay is *way* more boring that usleep
--- I think is something where you can't rely on that paper.

So I think a lot of care is required here.

 - Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Thomas Gleixner wrote:
> On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> > On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > > > +u64 __init tsc_early_random(void)
> > > > > +{
> > > > > + u64 uninitialized_var(res);
> > > > > + int i;
> > > > > +
> > > > > + if (!boot_cpu_has(X86_FEATURE_TSC))
> > > > > + return res;
> > > > > +
> > > > > + res ^= rdtsc();
> > > > > + for (i = 0; i < BITS_PER_LONG; i++) {
> > > > > + res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > > > + udelay(2);
> > > > > + }
> > > > > + return res;
> > > > > +}
> > 
> > Reasons why this is probably not the best idea:
> > 
> > 1)  Exactly how udelay is implemented varies from architecture to
> > architecture and in some cases is different on a subarchitectural
> > level.  Some of them rely on reading the TSC; others rely on
> > operations that will have a constant number of CPU cycles (e.g., they
> > aren't doing much if any operations that might even have a tiny
> > glimmer of hope of adding unpredictability).
> 
> That's not really true. You can add random shite instead of udelay(2). The
> point of this exercise is to somewhat utilize the instruction pipeline,
> which causes the TSC readouts to be not even spread over a the loop and
> therefor yield random results.

Talking about random shite:

memset(foo, 0, sizeof(foo));
res ^= rdtsc();
for (i = 0; i < BITS_PER_LONG; i++) {
/* Will never happen ... */
if (memchr_inv(foo, i, sizeof(foo)))
continue;
res ^= ((rdtsc() & 0x04) >> 2) << i;
memset(foo, i, sizeof(foo));
wbinvd();
}
return res;

That exploits the fact that the CPU and caches run at a different non
synchronized clock than the memory controller and therefore the execution
time for both the wbinvd() and the memchr_inv() measured in TSC cycles is
non constant and random enough for the early boot randomization.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Thomas Gleixner wrote:
> On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> > On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > > > +u64 __init tsc_early_random(void)
> > > > > +{
> > > > > + u64 uninitialized_var(res);
> > > > > + int i;
> > > > > +
> > > > > + if (!boot_cpu_has(X86_FEATURE_TSC))
> > > > > + return res;
> > > > > +
> > > > > + res ^= rdtsc();
> > > > > + for (i = 0; i < BITS_PER_LONG; i++) {
> > > > > + res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > > > + udelay(2);
> > > > > + }
> > > > > + return res;
> > > > > +}
> > 
> > Reasons why this is probably not the best idea:
> > 
> > 1)  Exactly how udelay is implemented varies from architecture to
> > architecture and in some cases is different on a subarchitectural
> > level.  Some of them rely on reading the TSC; others rely on
> > operations that will have a constant number of CPU cycles (e.g., they
> > aren't doing much if any operations that might even have a tiny
> > glimmer of hope of adding unpredictability).
> 
> That's not really true. You can add random shite instead of udelay(2). The
> point of this exercise is to somewhat utilize the instruction pipeline,
> which causes the TSC readouts to be not even spread over a the loop and
> therefor yield random results.

Talking about random shite:

memset(foo, 0, sizeof(foo));
res ^= rdtsc();
for (i = 0; i < BITS_PER_LONG; i++) {
/* Will never happen ... */
if (memchr_inv(foo, i, sizeof(foo)))
continue;
res ^= ((rdtsc() & 0x04) >> 2) << i;
memset(foo, i, sizeof(foo));
wbinvd();
}
return res;

That exploits the fact that the CPU and caches run at a different non
synchronized clock than the memory controller and therefore the execution
time for both the wbinvd() and the memchr_inv() measured in TSC cycles is
non constant and random enough for the early boot randomization.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote:
> I'm not saying it's a replacement for the run time random generator, but
> it's exceptionally good and reasonably fast for the early boot
> randomization.

... which suffices our purposes. It's not like we're going to generate
long-term key pairs with it.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 04:42:47PM +0200, Thomas Gleixner wrote:
> I'm not saying it's a replacement for the run time random generator, but
> it's exceptionally good and reasonably fast for the early boot
> randomization.

... which suffices our purposes. It's not like we're going to generate
long-term key pairs with it.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> Go ahead. What you actually want to do is to replace boot_cpu_has() with a
> real cpuid() check because boot_cpu_has() is not initialized on real early
> boot.

Right.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> Go ahead. What you actually want to do is to replace boot_cpu_has() with a
> real cpuid() check because boot_cpu_has() is not initialized on real early
> boot.

Right.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > > +u64 __init tsc_early_random(void)
> > > > +{
> > > > +   u64 uninitialized_var(res);
> > > > +   int i;
> > > > +
> > > > +   if (!boot_cpu_has(X86_FEATURE_TSC))
> > > > +   return res;
> > > > +
> > > > +   res ^= rdtsc();
> > > > +   for (i = 0; i < BITS_PER_LONG; i++) {
> > > > +   res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > > +   udelay(2);
> > > > +   }
> > > > +   return res;
> > > > +}
> 
> Reasons why this is probably not the best idea:
> 
> 1)  Exactly how udelay is implemented varies from architecture to
> architecture and in some cases is different on a subarchitectural
> level.  Some of them rely on reading the TSC; others rely on
> operations that will have a constant number of CPU cycles (e.g., they
> aren't doing much if any operations that might even have a tiny
> glimmer of hope of adding unpredictability).

That's not really true. You can add random shite instead of udelay(2). The
point of this exercise is to somewhat utilize the instruction pipeline,
which causes the TSC readouts to be not even spread over a the loop and
therefor yield random results.

> 2) Given a dozen numbers and saying, "hmm, my human brain doesn't see
> a problem, it *must* be good" is hardly the basis on which to make
> this kind of security design.  If you don't understand why this is a
> bad idea, and can't come up with a counter example in under 5 seconds,
> you probably aren't qualified to be designing a RNG which is supposed
> to be cryptographically secure.

Care to read the paper?

We tried that 6 years ago on a wide range of machines from server to stupid
first generation in order ATOM chips. All of them exposed more or less the
same behaviour and passed RND validation tests.

I'm not saying it's a replacement for the run time random generator, but
it's exceptionally good and reasonably fast for the early boot
randomization.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Theodore Ts'o wrote:
> On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > > +u64 __init tsc_early_random(void)
> > > > +{
> > > > +   u64 uninitialized_var(res);
> > > > +   int i;
> > > > +
> > > > +   if (!boot_cpu_has(X86_FEATURE_TSC))
> > > > +   return res;
> > > > +
> > > > +   res ^= rdtsc();
> > > > +   for (i = 0; i < BITS_PER_LONG; i++) {
> > > > +   res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > > +   udelay(2);
> > > > +   }
> > > > +   return res;
> > > > +}
> 
> Reasons why this is probably not the best idea:
> 
> 1)  Exactly how udelay is implemented varies from architecture to
> architecture and in some cases is different on a subarchitectural
> level.  Some of them rely on reading the TSC; others rely on
> operations that will have a constant number of CPU cycles (e.g., they
> aren't doing much if any operations that might even have a tiny
> glimmer of hope of adding unpredictability).

That's not really true. You can add random shite instead of udelay(2). The
point of this exercise is to somewhat utilize the instruction pipeline,
which causes the TSC readouts to be not even spread over a the loop and
therefor yield random results.

> 2) Given a dozen numbers and saying, "hmm, my human brain doesn't see
> a problem, it *must* be good" is hardly the basis on which to make
> this kind of security design.  If you don't understand why this is a
> bad idea, and can't come up with a counter example in under 5 seconds,
> you probably aren't qualified to be designing a RNG which is supposed
> to be cryptographically secure.

Care to read the paper?

We tried that 6 years ago on a wide range of machines from server to stupid
first generation in order ATOM chips. All of them exposed more or less the
same behaviour and passed RND validation tests.

I'm not saying it's a replacement for the run time random generator, but
it's exceptionally good and reasonably fast for the early boot
randomization.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > +u64 __init tsc_early_random(void)
> > > +{
> > > + u64 uninitialized_var(res);
> > > + int i;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_TSC))
> > > + return res;
> > > +
> > > + res ^= rdtsc();
> > > + for (i = 0; i < BITS_PER_LONG; i++) {
> > > + res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > + udelay(2);
> > > + }
> > > + return res;
> > > +}

Reasons why this is probably not the best idea:

1)  Exactly how udelay is implemented varies from architecture to
architecture and in some cases is different on a subarchitectural
level.  Some of them rely on reading the TSC; others rely on
operations that will have a constant number of CPU cycles (e.g., they
aren't doing much if any operations that might even have a tiny
glimmer of hope of adding unpredictability).

2) Given a dozen numbers and saying, "hmm, my human brain doesn't see
a problem, it *must* be good" is hardly the basis on which to make
this kind of security design.  If you don't understand why this is a
bad idea, and can't come up with a counter example in under 5 seconds,
you probably aren't qualified to be designing a RNG which is supposed
to be cryptographically secure.

3)  Depending on when you use this function in early boot, udelay()
might not even have been calibrated yet.

4) As as standalone function, it doesn't take advantage of whatever
randomness might have been available; and if it turns out that the
tsc_early_random() is trivially predictable by someone spends more
time analyizing it for particular target architectures, it could be
disastrous

5) If you use this in addition to the existing get_random_u32(), it
won't hurt from a cryptographic perspective, but it will end up
burning 64-128 microseconds (or potentially more, depending on how
udelay is implemented on the architecture/subarch and whether udelay
has been calibrated yet).  And it's not clear it's really better(tm).

6) It would be the ultimate in irony if Jason, who tried so hard to
get this warning in because he hoped it would make Linux more secure,
actually ends up making Linux *less* secure because something like
this starts getting used

- Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 03:48:18PM +0200, Thomas Gleixner wrote:
> > > +u64 __init tsc_early_random(void)
> > > +{
> > > + u64 uninitialized_var(res);
> > > + int i;
> > > +
> > > + if (!boot_cpu_has(X86_FEATURE_TSC))
> > > + return res;
> > > +
> > > + res ^= rdtsc();
> > > + for (i = 0; i < BITS_PER_LONG; i++) {
> > > + res ^= ((rdtsc() & 0x04) >> 2) << i;
> > > + udelay(2);
> > > + }
> > > + return res;
> > > +}

Reasons why this is probably not the best idea:

1)  Exactly how udelay is implemented varies from architecture to
architecture and in some cases is different on a subarchitectural
level.  Some of them rely on reading the TSC; others rely on
operations that will have a constant number of CPU cycles (e.g., they
aren't doing much if any operations that might even have a tiny
glimmer of hope of adding unpredictability).

2) Given a dozen numbers and saying, "hmm, my human brain doesn't see
a problem, it *must* be good" is hardly the basis on which to make
this kind of security design.  If you don't understand why this is a
bad idea, and can't come up with a counter example in under 5 seconds,
you probably aren't qualified to be designing a RNG which is supposed
to be cryptographically secure.

3)  Depending on when you use this function in early boot, udelay()
might not even have been calibrated yet.

4) As as standalone function, it doesn't take advantage of whatever
randomness might have been available; and if it turns out that the
tsc_early_random() is trivially predictable by someone spends more
time analyizing it for particular target architectures, it could be
disastrous

5) If you use this in addition to the existing get_random_u32(), it
won't hurt from a cryptographic perspective, but it will end up
burning 64-128 microseconds (or potentially more, depending on how
udelay is implemented on the architecture/subarch and whether udelay
has been calibrated yet).  And it's not clear it's really better(tm).

6) It would be the ultimate in irony if Jason, who tried so hard to
get this warning in because he hoped it would make Linux more secure,
actually ends up making Linux *less* secure because something like
this starts getting used

- Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote:
> > 8<---
> > 
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
> > return 0;
> >  }
> >  #endif
> > +
> > +u64 __init tsc_early_random(void)
> > +{
> > +   u64 uninitialized_var(res);
> > +   int i;
> > +
> > +   if (!boot_cpu_has(X86_FEATURE_TSC))
> > +   return res;
> > +
> > +   res ^= rdtsc();
> > +   for (i = 0; i < BITS_PER_LONG; i++) {
> > +   res ^= ((rdtsc() & 0x04) >> 2) << i;
> > +   udelay(2);
> > +   }
> > +   return res;
> > +}
> 
> Something like this is exactly what I was aiming at with my dumb patch.
> We could use this for early boot randomness on x86.
> 
> Should I turn it into proper patches or you want to?

Go ahead. What you actually want to do is to replace boot_cpu_has() with a
real cpuid() check because boot_cpu_has() is not initialized on real early
boot.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Borislav Petkov wrote:
> On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote:
> > 8<---
> > 
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
> > return 0;
> >  }
> >  #endif
> > +
> > +u64 __init tsc_early_random(void)
> > +{
> > +   u64 uninitialized_var(res);
> > +   int i;
> > +
> > +   if (!boot_cpu_has(X86_FEATURE_TSC))
> > +   return res;
> > +
> > +   res ^= rdtsc();
> > +   for (i = 0; i < BITS_PER_LONG; i++) {
> > +   res ^= ((rdtsc() & 0x04) >> 2) << i;
> > +   udelay(2);
> > +   }
> > +   return res;
> > +}
> 
> Something like this is exactly what I was aiming at with my dumb patch.
> We could use this for early boot randomness on x86.
> 
> Should I turn it into proper patches or you want to?

Go ahead. What you actually want to do is to replace boot_cpu_has() with a
real cpuid() check because boot_cpu_has() is not initialized on real early
boot.

Thanks,

tglx


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote:
> 8<---
> 
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
>   return 0;
>  }
>  #endif
> +
> +u64 __init tsc_early_random(void)
> +{
> + u64 uninitialized_var(res);
> + int i;
> +
> + if (!boot_cpu_has(X86_FEATURE_TSC))
> + return res;
> +
> + res ^= rdtsc();
> + for (i = 0; i < BITS_PER_LONG; i++) {
> + res ^= ((rdtsc() & 0x04) >> 2) << i;
> + udelay(2);
> + }
> + return res;
> +}

Something like this is exactly what I was aiming at with my dumb patch.
We could use this for early boot randomness on x86.

Should I turn it into proper patches or you want to?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Borislav Petkov
On Tue, Aug 15, 2017 at 12:47:36PM +0200, Thomas Gleixner wrote:
> 8<---
> 
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
>   return 0;
>  }
>  #endif
> +
> +u64 __init tsc_early_random(void)
> +{
> + u64 uninitialized_var(res);
> + int i;
> +
> + if (!boot_cpu_has(X86_FEATURE_TSC))
> + return res;
> +
> + res ^= rdtsc();
> + for (i = 0; i < BITS_PER_LONG; i++) {
> + res ^= ((rdtsc() & 0x04) >> 2) << i;
> + udelay(2);
> + }
> + return res;
> +}

Something like this is exactly what I was aiming at with my dumb patch.
We could use this for early boot randomness on x86.

Should I turn it into proper patches or you want to?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
On Tue, Aug 15, 2017 at 08:09:50AM -0400, Theodore Ts'o wrote:
> On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote:
> > > > * Willy Tarreau  wrote:
> > > > > 
> > > > > Nowadays we could use similar methods using RDTSC providing more 
> > > > > accurate
> > > > > counting. This doesn't provide a lot of entropy of course, given that 
> > > > > a
> > > > > 2 GHz machine will at most count 31 bits there. But I tend to think 
> > > > > that
> > > > > what matters during early boot is to transform something highly 
> > > > > predictable
> > > > > into something unlikely to be predicted (ie: an exploit having to 
> > > > > scan 2^31
> > > > > possible addresses will not be really usable). It's also possible to 
> > > > > do the
> > > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation 
> > > > > with
> > > > > the RTC by the way, and roughly provide 25 more bits
> 
> All of this assumes that you have different clock crystals generating
> all of these different clock sources.  Otherwise there can be a lot
> less entropy than you expected.

That's what's cool with the RTC, it definitely runs on its own 32kHz
crystal since it continues to beat when everything else is powered off.

Willy


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
On Tue, Aug 15, 2017 at 08:09:50AM -0400, Theodore Ts'o wrote:
> On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote:
> > > > * Willy Tarreau  wrote:
> > > > > 
> > > > > Nowadays we could use similar methods using RDTSC providing more 
> > > > > accurate
> > > > > counting. This doesn't provide a lot of entropy of course, given that 
> > > > > a
> > > > > 2 GHz machine will at most count 31 bits there. But I tend to think 
> > > > > that
> > > > > what matters during early boot is to transform something highly 
> > > > > predictable
> > > > > into something unlikely to be predicted (ie: an exploit having to 
> > > > > scan 2^31
> > > > > possible addresses will not be really usable). It's also possible to 
> > > > > do the
> > > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation 
> > > > > with
> > > > > the RTC by the way, and roughly provide 25 more bits
> 
> All of this assumes that you have different clock crystals generating
> all of these different clock sources.  Otherwise there can be a lot
> less entropy than you expected.

That's what's cool with the RTC, it definitely runs on its own 32kHz
crystal since it continues to beat when everything else is powered off.

Willy


Re: early x86 unseeded randomness

2017-08-15 Thread Michael Ellerman
Ingo Molnar  writes:

> * Willy Tarreau  wrote:
>
>> Nowadays we could use similar methods using RDTSC providing more accurate
>> counting. This doesn't provide a lot of entropy of course, given that a
>> 2 GHz machine will at most count 31 bits there. But I tend to think that
>> what matters during early boot is to transform something highly predictable
>> into something unlikely to be predicted (ie: an exploit having to scan 2^31
>> possible addresses will not be really usable). It's also possible to do the
>> same with the PIT0 counter ticking at 18.2 Hz without any correlation with
>> the RTC by the way, and roughly provide 25 more bits. And if you expect
>> that the BIOS has emitted a 800 Hz beep at boot, you could still have a
>> divider of 1491 in PIT2 providing 10 more bits, though with a bit of
>> correlation with PIT0 since they use the same 1.19 MHz source. These
>> methods increase the boot time by up to one second though, but my point
>> here is that when you have nothing it's always a bit better.
>
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
>
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
>
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.

In fact that could be a per-boot seed, if you just re-ran update-grub in
the shutdown scripts with a new value.

cheers


Re: early x86 unseeded randomness

2017-08-15 Thread Michael Ellerman
Ingo Molnar  writes:

> * Willy Tarreau  wrote:
>
>> Nowadays we could use similar methods using RDTSC providing more accurate
>> counting. This doesn't provide a lot of entropy of course, given that a
>> 2 GHz machine will at most count 31 bits there. But I tend to think that
>> what matters during early boot is to transform something highly predictable
>> into something unlikely to be predicted (ie: an exploit having to scan 2^31
>> possible addresses will not be really usable). It's also possible to do the
>> same with the PIT0 counter ticking at 18.2 Hz without any correlation with
>> the RTC by the way, and roughly provide 25 more bits. And if you expect
>> that the BIOS has emitted a 800 Hz beep at boot, you could still have a
>> divider of 1491 in PIT2 providing 10 more bits, though with a bit of
>> correlation with PIT0 since they use the same 1.19 MHz source. These
>> methods increase the boot time by up to one second though, but my point
>> here is that when you have nothing it's always a bit better.
>
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
>
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
>
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.

In fact that could be a per-boot seed, if you just re-ran update-grub in
the shutdown scripts with a new value.

cheers


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote:
> > > * Willy Tarreau  wrote:
> > > > 
> > > > Nowadays we could use similar methods using RDTSC providing more 
> > > > accurate
> > > > counting. This doesn't provide a lot of entropy of course, given that a
> > > > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > > > what matters during early boot is to transform something highly 
> > > > predictable
> > > > into something unlikely to be predicted (ie: an exploit having to scan 
> > > > 2^31
> > > > possible addresses will not be really usable). It's also possible to do 
> > > > the
> > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation 
> > > > with
> > > > the RTC by the way, and roughly provide 25 more bits

All of this assumes that you have different clock crystals generating
all of these different clock sources.  Otherwise there can be a lot
less entropy than you expected.

> In practice it's much faster to process the e820 table and the boot
> string though - and should already give a way to gain very good
> inter-machine randomization. It would also cover odd things like
> weird machines such as virtualization environments that often have
> nothing in the first 1MB.

There is nothing wrong with doing this --- and architecture
maintainers are encouraged to call add_device_randomness() on any
piece of data which they think might help.  That being said, these
values don't change much (boot string) or at all (e820 table) between
boots, and if you have a large collection of machines that all have
the same hardware (say, Dell desktops), and the same distro installed
on them (say, Ubuntu), it doesn't give you as much inter-machine
randomization as you might expect.

And as far as virtualization environments are concerned, the attacker
can determine quite easily from the IP address range if a VM is being
run out of an Amazon, Google, or Microsoft data center.

Finally, in some cases, we need the entropy *very* early --- for
example for kASLR, right after we get control from the boot loader.
Which is why I believe that we really do need to have the bootloader
fetching random seed material from the root file system or some other
trusted store to solve the inter-boot randomness problem.

Again, if platform code and device drivers want to call
add_device_randomness() early in the boot process --- they should
absolutely be encouraged to do this.  It is a good thing.  *AND* it is
not sufficient by itself.

Cheers,

- Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Theodore Ts'o
On Tue, Aug 15, 2017 at 10:05:46AM +0200, Ingo Molnar wrote:
> > > * Willy Tarreau  wrote:
> > > > 
> > > > Nowadays we could use similar methods using RDTSC providing more 
> > > > accurate
> > > > counting. This doesn't provide a lot of entropy of course, given that a
> > > > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > > > what matters during early boot is to transform something highly 
> > > > predictable
> > > > into something unlikely to be predicted (ie: an exploit having to scan 
> > > > 2^31
> > > > possible addresses will not be really usable). It's also possible to do 
> > > > the
> > > > same with the PIT0 counter ticking at 18.2 Hz without any correlation 
> > > > with
> > > > the RTC by the way, and roughly provide 25 more bits

All of this assumes that you have different clock crystals generating
all of these different clock sources.  Otherwise there can be a lot
less entropy than you expected.

> In practice it's much faster to process the e820 table and the boot
> string though - and should already give a way to gain very good
> inter-machine randomization. It would also cover odd things like
> weird machines such as virtualization environments that often have
> nothing in the first 1MB.

There is nothing wrong with doing this --- and architecture
maintainers are encouraged to call add_device_randomness() on any
piece of data which they think might help.  That being said, these
values don't change much (boot string) or at all (e820 table) between
boots, and if you have a large collection of machines that all have
the same hardware (say, Dell desktops), and the same distro installed
on them (say, Ubuntu), it doesn't give you as much inter-machine
randomization as you might expect.

And as far as virtualization environments are concerned, the attacker
can determine quite easily from the IP address range if a VM is being
run out of an Amazon, Google, or Microsoft data center.

Finally, in some cases, we need the entropy *very* early --- for
example for kASLR, right after we get control from the boot loader.
Which is why I believe that we really do need to have the bootloader
fetching random seed material from the root file system or some other
trusted store to solve the inter-boot randomness problem.

Again, if platform code and device drivers want to call
add_device_randomness() early in the boot process --- they should
absolutely be encouraged to do this.  It is a good thing.  *AND* it is
not sufficient by itself.

Cheers,

- Ted


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Ingo Molnar wrote:
> * Willy Tarreau  wrote:
> 
> > Nowadays we could use similar methods using RDTSC providing more accurate
> > counting. This doesn't provide a lot of entropy of course, given that a
> > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > what matters during early boot is to transform something highly predictable
> > into something unlikely to be predicted (ie: an exploit having to scan 2^31
> > possible addresses will not be really usable). It's also possible to do the
> > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > correlation with PIT0 since they use the same 1.19 MHz source. These
> > methods increase the boot time by up to one second though, but my point
> > here is that when you have nothing it's always a bit better.
> 
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
> 
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
> 
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.
> 
> Both methods should be very fast and robust.

Actually using RDTSC is not the worst approach. See:

  https://lwn.net/images/conf/rtlws11/random-hardware.pdf

Below is a stupid implementation of that. Here are the resulting numbers
from a dozen of boot cycles:

10a3e7af4890c0ae
e7b08c8c18e6d5d9
951e12c77f79e000
ad88753ad11c9b80
db2d4dce466a3da4
b328c76d4e67368d
642edf2265e0c8a7
ef45a9f9326249d0
13e01119498797a6
0a537c8751e0349e
eb67c02dc09326dd
037d4b332020538d
793fbfda06718c69
2231535c514769e5

This mechanism could also be used to seed the random generator.

Thanks,

tglx

8<---

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
return 0;
 }
 #endif
+
+u64 __init tsc_early_random(void)
+{
+   u64 uninitialized_var(res);
+   int i;
+
+   if (!boot_cpu_has(X86_FEATURE_TSC))
+   return res;
+
+   res ^= rdtsc();
+   for (i = 0; i < BITS_PER_LONG; i++) {
+   res ^= ((rdtsc() & 0x04) >> 2) << i;
+   udelay(2);
+   }
+   return res;
+}


Re: early x86 unseeded randomness

2017-08-15 Thread Thomas Gleixner
On Tue, 15 Aug 2017, Ingo Molnar wrote:
> * Willy Tarreau  wrote:
> 
> > Nowadays we could use similar methods using RDTSC providing more accurate
> > counting. This doesn't provide a lot of entropy of course, given that a
> > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > what matters during early boot is to transform something highly predictable
> > into something unlikely to be predicted (ie: an exploit having to scan 2^31
> > possible addresses will not be really usable). It's also possible to do the
> > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > correlation with PIT0 since they use the same 1.19 MHz source. These
> > methods increase the boot time by up to one second though, but my point
> > here is that when you have nothing it's always a bit better.
> 
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
> 
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
> 
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.
> 
> Both methods should be very fast and robust.

Actually using RDTSC is not the worst approach. See:

  https://lwn.net/images/conf/rtlws11/random-hardware.pdf

Below is a stupid implementation of that. Here are the resulting numbers
from a dozen of boot cycles:

10a3e7af4890c0ae
e7b08c8c18e6d5d9
951e12c77f79e000
ad88753ad11c9b80
db2d4dce466a3da4
b328c76d4e67368d
642edf2265e0c8a7
ef45a9f9326249d0
13e01119498797a6
0a537c8751e0349e
eb67c02dc09326dd
037d4b332020538d
793fbfda06718c69
2231535c514769e5

This mechanism could also be used to seed the random generator.

Thanks,

tglx

8<---

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1360,3 +1360,19 @@ unsigned long calibrate_delay_is_known(v
return 0;
 }
 #endif
+
+u64 __init tsc_early_random(void)
+{
+   u64 uninitialized_var(res);
+   int i;
+
+   if (!boot_cpu_has(X86_FEATURE_TSC))
+   return res;
+
+   res ^= rdtsc();
+   for (i = 0; i < BITS_PER_LONG; i++) {
+   res ^= ((rdtsc() & 0x04) >> 2) << i;
+   udelay(2);
+   }
+   return res;
+}


Re: early x86 unseeded randomness

2017-08-15 Thread Ingo Molnar

* Willy Tarreau  wrote:

> On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote:
> > 
> > * Willy Tarreau  wrote:
> > 
> > > Nowadays we could use similar methods using RDTSC providing more accurate
> > > counting. This doesn't provide a lot of entropy of course, given that a
> > > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > > what matters during early boot is to transform something highly 
> > > predictable
> > > into something unlikely to be predicted (ie: an exploit having to scan 
> > > 2^31
> > > possible addresses will not be really usable). It's also possible to do 
> > > the
> > > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > > correlation with PIT0 since they use the same 1.19 MHz source. These
> > > methods increase the boot time by up to one second though, but my point
> > > here is that when you have nothing it's always a bit better.
> > 
> > One other thing besides trying to extract entropy via timing would be to 
> > utilize 
> > more of the machine's environment in seeding the random number generator.
> > 
> > For example on x86 the E820 table is available very early on and its 
> > addresses 
> > could be mixed into the random pool. An external attacker often would not 
> > know the 
> > precise hardware configuration.
> > 
> > Likewise the boot parameters string could be mixed into the initial random 
> > pool as 
> > well - and this way distributions could create per installation seed simply 
> > by 
> > appending a random number to the boot string.
> > 
> > Both methods should be very fast and robust.
> 
> Definitely, just like a simple MD5SUM on the first MB of RAM including
> the BIOS, and on the CMOS RAM contents, which also differ quite a bit
> between systems.

In practice it's much faster to process the e820 table and the boot string 
though 
- and should already give a way to gain very good inter-machine randomization. 
It 
would also cover odd things like weird machines such as virtualization 
environments that often have nothing in the first 1MB.

I.e. 'e820' and 'boot string' are two universally available pieces of 
environmental data that could be used in a robust and fast fashion.

Thanks,

Ingo


Re: early x86 unseeded randomness

2017-08-15 Thread Ingo Molnar

* Willy Tarreau  wrote:

> On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote:
> > 
> > * Willy Tarreau  wrote:
> > 
> > > Nowadays we could use similar methods using RDTSC providing more accurate
> > > counting. This doesn't provide a lot of entropy of course, given that a
> > > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > > what matters during early boot is to transform something highly 
> > > predictable
> > > into something unlikely to be predicted (ie: an exploit having to scan 
> > > 2^31
> > > possible addresses will not be really usable). It's also possible to do 
> > > the
> > > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > > correlation with PIT0 since they use the same 1.19 MHz source. These
> > > methods increase the boot time by up to one second though, but my point
> > > here is that when you have nothing it's always a bit better.
> > 
> > One other thing besides trying to extract entropy via timing would be to 
> > utilize 
> > more of the machine's environment in seeding the random number generator.
> > 
> > For example on x86 the E820 table is available very early on and its 
> > addresses 
> > could be mixed into the random pool. An external attacker often would not 
> > know the 
> > precise hardware configuration.
> > 
> > Likewise the boot parameters string could be mixed into the initial random 
> > pool as 
> > well - and this way distributions could create per installation seed simply 
> > by 
> > appending a random number to the boot string.
> > 
> > Both methods should be very fast and robust.
> 
> Definitely, just like a simple MD5SUM on the first MB of RAM including
> the BIOS, and on the CMOS RAM contents, which also differ quite a bit
> between systems.

In practice it's much faster to process the e820 table and the boot string 
though 
- and should already give a way to gain very good inter-machine randomization. 
It 
would also cover odd things like weird machines such as virtualization 
environments that often have nothing in the first 1MB.

I.e. 'e820' and 'boot string' are two universally available pieces of 
environmental data that could be used in a robust and fast fashion.

Thanks,

Ingo


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau  wrote:
> 
> > Nowadays we could use similar methods using RDTSC providing more accurate
> > counting. This doesn't provide a lot of entropy of course, given that a
> > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > what matters during early boot is to transform something highly predictable
> > into something unlikely to be predicted (ie: an exploit having to scan 2^31
> > possible addresses will not be really usable). It's also possible to do the
> > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > correlation with PIT0 since they use the same 1.19 MHz source. These
> > methods increase the boot time by up to one second though, but my point
> > here is that when you have nothing it's always a bit better.
> 
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
> 
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
> 
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.
> 
> Both methods should be very fast and robust.

Definitely, just like a simple MD5SUM on the first MB of RAM including
the BIOS, and on the CMOS RAM contents, which also differ quite a bit
between systems.

Willy


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
On Tue, Aug 15, 2017 at 09:42:54AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau  wrote:
> 
> > Nowadays we could use similar methods using RDTSC providing more accurate
> > counting. This doesn't provide a lot of entropy of course, given that a
> > 2 GHz machine will at most count 31 bits there. But I tend to think that
> > what matters during early boot is to transform something highly predictable
> > into something unlikely to be predicted (ie: an exploit having to scan 2^31
> > possible addresses will not be really usable). It's also possible to do the
> > same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> > the RTC by the way, and roughly provide 25 more bits. And if you expect
> > that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> > divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> > correlation with PIT0 since they use the same 1.19 MHz source. These
> > methods increase the boot time by up to one second though, but my point
> > here is that when you have nothing it's always a bit better.
> 
> One other thing besides trying to extract entropy via timing would be to 
> utilize 
> more of the machine's environment in seeding the random number generator.
> 
> For example on x86 the E820 table is available very early on and its 
> addresses 
> could be mixed into the random pool. An external attacker often would not 
> know the 
> precise hardware configuration.
> 
> Likewise the boot parameters string could be mixed into the initial random 
> pool as 
> well - and this way distributions could create per installation seed simply 
> by 
> appending a random number to the boot string.
> 
> Both methods should be very fast and robust.

Definitely, just like a simple MD5SUM on the first MB of RAM including
the BIOS, and on the CMOS RAM contents, which also differ quite a bit
between systems.

Willy


Re: early x86 unseeded randomness

2017-08-15 Thread Ingo Molnar

* Willy Tarreau  wrote:

> Nowadays we could use similar methods using RDTSC providing more accurate
> counting. This doesn't provide a lot of entropy of course, given that a
> 2 GHz machine will at most count 31 bits there. But I tend to think that
> what matters during early boot is to transform something highly predictable
> into something unlikely to be predicted (ie: an exploit having to scan 2^31
> possible addresses will not be really usable). It's also possible to do the
> same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> the RTC by the way, and roughly provide 25 more bits. And if you expect
> that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> correlation with PIT0 since they use the same 1.19 MHz source. These
> methods increase the boot time by up to one second though, but my point
> here is that when you have nothing it's always a bit better.

One other thing besides trying to extract entropy via timing would be to 
utilize 
more of the machine's environment in seeding the random number generator.

For example on x86 the E820 table is available very early on and its addresses 
could be mixed into the random pool. An external attacker often would not know 
the 
precise hardware configuration.

Likewise the boot parameters string could be mixed into the initial random pool 
as 
well - and this way distributions could create per installation seed simply by 
appending a random number to the boot string.

Both methods should be very fast and robust.

Thanks,

Ingo


Re: early x86 unseeded randomness

2017-08-15 Thread Ingo Molnar

* Willy Tarreau  wrote:

> Nowadays we could use similar methods using RDTSC providing more accurate
> counting. This doesn't provide a lot of entropy of course, given that a
> 2 GHz machine will at most count 31 bits there. But I tend to think that
> what matters during early boot is to transform something highly predictable
> into something unlikely to be predicted (ie: an exploit having to scan 2^31
> possible addresses will not be really usable). It's also possible to do the
> same with the PIT0 counter ticking at 18.2 Hz without any correlation with
> the RTC by the way, and roughly provide 25 more bits. And if you expect
> that the BIOS has emitted a 800 Hz beep at boot, you could still have a
> divider of 1491 in PIT2 providing 10 more bits, though with a bit of
> correlation with PIT0 since they use the same 1.19 MHz source. These
> methods increase the boot time by up to one second though, but my point
> here is that when you have nothing it's always a bit better.

One other thing besides trying to extract entropy via timing would be to 
utilize 
more of the machine's environment in seeding the random number generator.

For example on x86 the E820 table is available very early on and its addresses 
could be mixed into the random pool. An external attacker often would not know 
the 
precise hardware configuration.

Likewise the boot parameters string could be mixed into the initial random pool 
as 
well - and this way distributions could create per installation seed simply by 
appending a random number to the boot string.

Both methods should be very fast and robust.

Thanks,

Ingo


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
Hi Ted,

On Mon, Aug 14, 2017 at 09:31:24PM -0400, Theodore Ts'o wrote:
> The real fix is to do what OpenBSD does, which is to teach the
> bootloader (e.g., grub) to read from some file such as
> /var/lib/urandom/random-seed, and then to have the init scripts
> overwrite it with a new set of entropy generated from getrandom(2) as
> early as possible.
> 
> It won't solve the CD-ROM install problem (although there is enough
> entropy during the install process that after the install is done, we
> should be fine, and again, *hopefully* the distro people won't be
> stupid enough to generate high-value keys during the installation
> process, or certainly not early in the installation process).  But it
> does solve most of the problem.

In my opinion what matters is to combine multiple sources of entropy. I
remember in the good old days when I was coding under DOS, I used to
build my own random numbers using a phase detection method, counting
the time it takes for the RTC to switch to the next second, similar to
this :

  rand:
 xor edx, edx
 mov al, 0
 out 70h, al
 in al, 71h
 mov ah, al
  count:
 inc edx
 in al, 71h
 cmp al, ah
 jz count
 mov eax, edx
 ret

Nowadays we could use similar methods using RDTSC providing more accurate
counting. This doesn't provide a lot of entropy of course, given that a
2 GHz machine will at most count 31 bits there. But I tend to think that
what matters during early boot is to transform something highly predictable
into something unlikely to be predicted (ie: an exploit having to scan 2^31
possible addresses will not be really usable). It's also possible to do the
same with the PIT0 counter ticking at 18.2 Hz without any correlation with
the RTC by the way, and roughly provide 25 more bits. And if you expect
that the BIOS has emitted a 800 Hz beep at boot, you could still have a
divider of 1491 in PIT2 providing 10 more bits, though with a bit of
correlation with PIT0 since they use the same 1.19 MHz source. These
methods increase the boot time by up to one second though, but my point
here is that when you have nothing it's always a bit better.

Willy


Re: early x86 unseeded randomness

2017-08-15 Thread Willy Tarreau
Hi Ted,

On Mon, Aug 14, 2017 at 09:31:24PM -0400, Theodore Ts'o wrote:
> The real fix is to do what OpenBSD does, which is to teach the
> bootloader (e.g., grub) to read from some file such as
> /var/lib/urandom/random-seed, and then to have the init scripts
> overwrite it with a new set of entropy generated from getrandom(2) as
> early as possible.
> 
> It won't solve the CD-ROM install problem (although there is enough
> entropy during the install process that after the install is done, we
> should be fine, and again, *hopefully* the distro people won't be
> stupid enough to generate high-value keys during the installation
> process, or certainly not early in the installation process).  But it
> does solve most of the problem.

In my opinion what matters is to combine multiple sources of entropy. I
remember in the good old days when I was coding under DOS, I used to
build my own random numbers using a phase detection method, counting
the time it takes for the RTC to switch to the next second, similar to
this :

  rand:
 xor edx, edx
 mov al, 0
 out 70h, al
 in al, 71h
 mov ah, al
  count:
 inc edx
 in al, 71h
 cmp al, ah
 jz count
 mov eax, edx
 ret

Nowadays we could use similar methods using RDTSC providing more accurate
counting. This doesn't provide a lot of entropy of course, given that a
2 GHz machine will at most count 31 bits there. But I tend to think that
what matters during early boot is to transform something highly predictable
into something unlikely to be predicted (ie: an exploit having to scan 2^31
possible addresses will not be really usable). It's also possible to do the
same with the PIT0 counter ticking at 18.2 Hz without any correlation with
the RTC by the way, and roughly provide 25 more bits. And if you expect
that the BIOS has emitted a 800 Hz beep at boot, you could still have a
divider of 1491 in PIT2 providing 10 more bits, though with a bit of
correlation with PIT0 since they use the same 1.19 MHz source. These
methods increase the boot time by up to one second though, but my point
here is that when you have nothing it's always a bit better.

Willy


Re: early x86 unseeded randomness

2017-08-14 Thread Theodore Ts'o
On Mon, Aug 14, 2017 at 09:00:13PM +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote:
> > Ok, guys, you ALL need to learn that blindly just trying to get rid of
> > warnings IS A HORRIBLE IDEA.
> 
> Not blindly - I was actually suggesting/asking whether falling back to
> the TSC that early during boot might make more sense than using unseeded
> randomness. Especially add the least significant 32 bits to the most
> significant i.e., that thing:
> 
>   tsc + (tsc << 32UL)
> 
> as they're more unpredictable.

Part of the initialization step of the entropy pools and the crng
mixes in the jiffies, ktime_get_real() and the TSC.  So if you are one
of those people who believe in the magic of jitter-rng, we do
something somewhat similar to that a part of the initialization.  We
just don't credit any entropy to that step.

> 
> > But maybe those places that currently trigger the warning should just
> > use "get_random_u32()" instead. That at least gets rid of the warning
> > if there's a fast architected hardware random thing (ie modern x86).
> 
> Right, that is better, at least for the RDRAND machines.

Also note that the CRNG will use RDRAND if it is there; it just
doesn't depend on it.  The main thing about get_random_u32() is that
it has a policy decision embedded into it which is that we believe
that RDRAND, or whatever implements arch_get_random_{seed_}{long,int} is 
trusted.  (e.g., that we trust that Intel, Qualcomm, etc., didn't put a
back door into CPU-based random number generator, which we basically have
to take on faith because there is no way to audit it).

I tend to be a bit of pragmatist on these things, which is that I
believe it is *unlikely* that the NSA managed to pay off or blackmail
Intel into subverting RDRAND.  I can't prove it, of course, but if we
can get away with not trusting Intel, it's better if we can avoid
investing all of our trust in Intel.  On the flip side, for very early
boot, unless we can do what OpenBSD does and actually embed into the
boot loader reading a cryptographic seed from trusted store, it's
pretty much hopeless to think we can gather entropy, and we might as
well just use RDRAND, and hope for the best.  Realistically speaking,
the harm done from a bogus stack canary or even a subverted KASLR is
very small compared with introducing a wekaness into the keygen of
long-term public/private key.

And yes, spraying warning around just annoys users and if it inspires
them to do dumb(tm) things, it really doesn't help.  So suppressing
warnings during early boot is probably a good thing, since hopefully
people in early boot won't be stupid enough to generate high-value
keypairs.  :-)

>  We'd still need a proper fix for the older ones. And I don't see an
> easy way to change the init ordering for the stack canary as it gets
> setup very very early in start_kernel() vs crng_initialize() being
> an early_initcall()... Need to sleep on it.

The real fix is to do what OpenBSD does, which is to teach the
bootloader (e.g., grub) to read from some file such as
/var/lib/urandom/random-seed, and then to have the init scripts
overwrite it with a new set of entropy generated from getrandom(2) as
early as possible.

It won't solve the CD-ROM install problem (although there is enough
entropy during the install process that after the install is done, we
should be fine, and again, *hopefully* the distro people won't be
stupid enough to generate high-value keys during the installation
process, or certainly not early in the installation process).  But it
does solve most of the problem.

- Ted


Re: early x86 unseeded randomness

2017-08-14 Thread Theodore Ts'o
On Mon, Aug 14, 2017 at 09:00:13PM +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote:
> > Ok, guys, you ALL need to learn that blindly just trying to get rid of
> > warnings IS A HORRIBLE IDEA.
> 
> Not blindly - I was actually suggesting/asking whether falling back to
> the TSC that early during boot might make more sense than using unseeded
> randomness. Especially add the least significant 32 bits to the most
> significant i.e., that thing:
> 
>   tsc + (tsc << 32UL)
> 
> as they're more unpredictable.

Part of the initialization step of the entropy pools and the crng
mixes in the jiffies, ktime_get_real() and the TSC.  So if you are one
of those people who believe in the magic of jitter-rng, we do
something somewhat similar to that a part of the initialization.  We
just don't credit any entropy to that step.

> 
> > But maybe those places that currently trigger the warning should just
> > use "get_random_u32()" instead. That at least gets rid of the warning
> > if there's a fast architected hardware random thing (ie modern x86).
> 
> Right, that is better, at least for the RDRAND machines.

Also note that the CRNG will use RDRAND if it is there; it just
doesn't depend on it.  The main thing about get_random_u32() is that
it has a policy decision embedded into it which is that we believe
that RDRAND, or whatever implements arch_get_random_{seed_}{long,int} is 
trusted.  (e.g., that we trust that Intel, Qualcomm, etc., didn't put a
back door into CPU-based random number generator, which we basically have
to take on faith because there is no way to audit it).

I tend to be a bit of pragmatist on these things, which is that I
believe it is *unlikely* that the NSA managed to pay off or blackmail
Intel into subverting RDRAND.  I can't prove it, of course, but if we
can get away with not trusting Intel, it's better if we can avoid
investing all of our trust in Intel.  On the flip side, for very early
boot, unless we can do what OpenBSD does and actually embed into the
boot loader reading a cryptographic seed from trusted store, it's
pretty much hopeless to think we can gather entropy, and we might as
well just use RDRAND, and hope for the best.  Realistically speaking,
the harm done from a bogus stack canary or even a subverted KASLR is
very small compared with introducing a wekaness into the keygen of
long-term public/private key.

And yes, spraying warning around just annoys users and if it inspires
them to do dumb(tm) things, it really doesn't help.  So suppressing
warnings during early boot is probably a good thing, since hopefully
people in early boot won't be stupid enough to generate high-value
keypairs.  :-)

>  We'd still need a proper fix for the older ones. And I don't see an
> easy way to change the init ordering for the stack canary as it gets
> setup very very early in start_kernel() vs crng_initialize() being
> an early_initcall()... Need to sleep on it.

The real fix is to do what OpenBSD does, which is to teach the
bootloader (e.g., grub) to read from some file such as
/var/lib/urandom/random-seed, and then to have the init scripts
overwrite it with a new set of entropy generated from getrandom(2) as
early as possible.

It won't solve the CD-ROM install problem (although there is enough
entropy during the install process that after the install is done, we
should be fine, and again, *hopefully* the distro people won't be
stupid enough to generate high-value keys during the installation
process, or certainly not early in the installation process).  But it
does solve most of the problem.

- Ted


Re: early x86 unseeded randomness

2017-08-14 Thread Borislav Petkov
On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote:
> Ok, guys, you ALL need to learn that blindly just trying to get rid of
> warnings IS A HORRIBLE IDEA.

Not blindly - I was actually suggesting/asking whether falling back to
the TSC that early during boot might make more sense than using unseeded
randomness. Especially add the least significant 32 bits to the most
significant i.e., that thing:

tsc + (tsc << 32UL)

as they're more unpredictable.

> But maybe those places that currently trigger the warning should just
> use "get_random_u32()" instead. That at least gets rid of the warning
> if there's a fast architected hardware random thing (ie modern x86).

Right, that is better, at least for the RDRAND machines.

We'd still need a proper fix for the older ones. And I don't see an
easy way to change the init ordering for the stack canary as it gets
setup very very early in start_kernel() vs crng_initialize() being an
early_initcall()... Need to sleep on it.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-14 Thread Borislav Petkov
On Mon, Aug 14, 2017 at 11:17:37AM -0700, Linus Torvalds wrote:
> Ok, guys, you ALL need to learn that blindly just trying to get rid of
> warnings IS A HORRIBLE IDEA.

Not blindly - I was actually suggesting/asking whether falling back to
the TSC that early during boot might make more sense than using unseeded
randomness. Especially add the least significant 32 bits to the most
significant i.e., that thing:

tsc + (tsc << 32UL)

as they're more unpredictable.

> But maybe those places that currently trigger the warning should just
> use "get_random_u32()" instead. That at least gets rid of the warning
> if there's a fast architected hardware random thing (ie modern x86).

Right, that is better, at least for the RDRAND machines.

We'd still need a proper fix for the older ones. And I don't see an
easy way to change the init ordering for the stack canary as it gets
setup very very early in start_kernel() vs crng_initialize() being an
early_initcall()... Need to sleep on it.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 11:00 AM, Borislav Petkov  wrote:
> On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote:
>> Plus on modern x86, you'll always get at least the hardware
>> randomness, which is fundamentally much better anyway.
>
> Right, my only intention was to get rid of those:
>
> [0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 
> with crng_init=0

Ok, guys, you ALL need to learn that blindly just trying to get rid of
warnings IS A HORRIBLE IDEA.

People also need to learn that *adding* warnings isn't always a good
idea, exactly because then people will mindlessly react to them.

I *detest* bad compiler warnings for this reason. The number of
garbage patches that actually break working code that I've seen over
the year is mind-numbing.

> What do you propose? Keep 'em?

Keeping the warning (or removing the warning itself without changing
the code) is certainly preferable to trying to "fix" the warning by
bogus measures, yes.

> Or fix the above, snipped bit to conditionally do rdtsc() *once* or
> get_random_bytes() depending on the crng state?

Neither. Let's aim to make sure to fix the warning the *only* correct
way - by making sure the initialization _ordering_ is correct, not by
hacking around the caller code.

Maybe the warning message should be clarified to say that too. Make it
clear that the only acceptable fix is to change code ordering, not to
play games with randomness.

But maybe those places that currently trigger the warning should just
use "get_random_u32()" instead. That at least gets rid of the warning
if there's a fast architected hardware random thing (ie modern x86).

   Linus


Re: early x86 unseeded randomness

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 11:00 AM, Borislav Petkov  wrote:
> On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote:
>> Plus on modern x86, you'll always get at least the hardware
>> randomness, which is fundamentally much better anyway.
>
> Right, my only intention was to get rid of those:
>
> [0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 
> with crng_init=0

Ok, guys, you ALL need to learn that blindly just trying to get rid of
warnings IS A HORRIBLE IDEA.

People also need to learn that *adding* warnings isn't always a good
idea, exactly because then people will mindlessly react to them.

I *detest* bad compiler warnings for this reason. The number of
garbage patches that actually break working code that I've seen over
the year is mind-numbing.

> What do you propose? Keep 'em?

Keeping the warning (or removing the warning itself without changing
the code) is certainly preferable to trying to "fix" the warning by
bogus measures, yes.

> Or fix the above, snipped bit to conditionally do rdtsc() *once* or
> get_random_bytes() depending on the crng state?

Neither. Let's aim to make sure to fix the warning the *only* correct
way - by making sure the initialization _ordering_ is correct, not by
hacking around the caller code.

Maybe the warning message should be clarified to say that too. Make it
clear that the only acceptable fix is to change code ordering, not to
play games with randomness.

But maybe those places that currently trigger the warning should just
use "get_random_u32()" instead. That at least gets rid of the warning
if there's a fast architected hardware random thing (ie modern x86).

   Linus


Re: early x86 unseeded randomness

2017-08-14 Thread Borislav Petkov
On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote:
> Plus on modern x86, you'll always get at least the hardware
> randomness, which is fundamentally much better anyway.

Right, my only intention was to get rid of those:

[0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 
with crng_init=0

What do you propose? Keep 'em?

Or fix the above, snipped bit to conditionally do rdtsc() *once* or
get_random_bytes() depending on the crng state?

> So this patch is utter and absolute garbage, and should be shot in the
> head and buried very very deep.

/me takes out a 44 magnum...

> Please immediately delete it from the whole internet.

Haha, lemme call a guy.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-14 Thread Borislav Petkov
On Mon, Aug 14, 2017 at 10:47:47AM -0700, Linus Torvalds wrote:
> Plus on modern x86, you'll always get at least the hardware
> randomness, which is fundamentally much better anyway.

Right, my only intention was to get rid of those:

[0.00] random: get_random_bytes called from start_kernel+0x30/0x3d8 
with crng_init=0

What do you propose? Keep 'em?

Or fix the above, snipped bit to conditionally do rdtsc() *once* or
get_random_bytes() depending on the crng state?

> So this patch is utter and absolute garbage, and should be shot in the
> head and buried very very deep.

/me takes out a 44 magnum...

> Please immediately delete it from the whole internet.

Haha, lemme call a guy.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: early x86 unseeded randomness

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 10:35 AM, Borislav Petkov  wrote:
>
> how about we address that unseeded randomness usage during early boot by
> falling back on the TSC on x86? I mean, we already do that for the stack
> canary value anyway...

That patch is completely broken:


> +   if (crng_ready())
> +   get_random_bytes(, sizeof(canary));
> +   else
> +   canary = rdtsc();
> +
> tsc = rdtsc();
> canary += tsc + (tsc << 32UL);

So now you do rdtsc() twice, and then add them together.

Adding the same value together adds absolutely zero information.
Quite the reverse - it just makes the values cancel out and you're
shifting away one bit.

So the current code that just does an unconditional
"get_random_bytes()" and then adds the TSC into it for noise when it's
not ranom is actually *objectively* better than that broken crap you
just tried.

Plus on modern x86, you'll always get at least the hardware
randomness, which is fundamentally much better anyway.

So this patch is utter and absolute garbage, and should be shot in the
head and buried very very deep.

Please immediately delete it from the whole internet.

   Linus


Re: early x86 unseeded randomness

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 10:35 AM, Borislav Petkov  wrote:
>
> how about we address that unseeded randomness usage during early boot by
> falling back on the TSC on x86? I mean, we already do that for the stack
> canary value anyway...

That patch is completely broken:


> +   if (crng_ready())
> +   get_random_bytes(, sizeof(canary));
> +   else
> +   canary = rdtsc();
> +
> tsc = rdtsc();
> canary += tsc + (tsc << 32UL);

So now you do rdtsc() twice, and then add them together.

Adding the same value together adds absolutely zero information.
Quite the reverse - it just makes the values cancel out and you're
shifting away one bit.

So the current code that just does an unconditional
"get_random_bytes()" and then adds the TSC into it for noise when it's
not ranom is actually *objectively* better than that broken crap you
just tried.

Plus on modern x86, you'll always get at least the hardware
randomness, which is fundamentally much better anyway.

So this patch is utter and absolute garbage, and should be shot in the
head and buried very very deep.

Please immediately delete it from the whole internet.

   Linus