Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-18 Thread Paul.Koning



> On Jun 17, 2020, at 7:36 PM, Taylor R Campbell  wrote:
> 
> 
> 
> ...
> 1. Replace the variable-time AES reference implementation we've been
>   using by constant-time AES software from Thomas Pornin's
>   high-quality BearSSL library.
>  ...
>   Performance impact:  The cost is that constant-time AES software is
> much slower -- cgd AES-CBC encryption throughput is reduced to
> about 1/3, and decryption to about 1/2 (very roughly).  This is
> bad, obviously, but it is mostly addressed by the next two parts.

That's a pretty steep price.  It is worth it for some; it's not clear that it's 
worth it for others.  If I understand right, these are local attacks, not network 
attacks.  Users may judge that the risk from local attacks is not sufficient to 
pay this price.

> 2. Add support for CPU AES instructions on Intel, AMD, VIA, and
>   aarch64 CPUs to implement the kernel's synchronous AES API,
>   including machinery to allow the kernel to use the CPU's vector
>   unit.

Are those constant-time instructions?  They would need to be, I assume; 
otherwise we're moving the problem to a different place.

> ...
> 3. Add an alternative cgd cipher Adiantum[3], which is built out of
>   AES (used only once per disk sector), Poly1305, NH, and XChaCha12,
>   and has been deployed by Google for disk encryption on lower-end
>   ARM systems.
> 
>   Security impact:  Adiantum generally provides better disk
> encryption security than AES-CBC or AES-XTS because it encrypts
> an entire disk sector at a time, rather than individual cipher
> blocks independently like AES-XTS does or suffixes in units of
> cipher blocks like AES-CBC does, so two snapshots of a disk
> reveal less information with Adiantum than with AES-CBC or
> AES-XTS.  Of course, Adiantum is a different cipher so you have
> to create new cgd volumes if you want to use it.

Has this new system received enough scrutiny to justify its use in production?  
 I know AES but not the other bits, and in any case an insecure composite can 
be built out of secure building blocks.

paul


Re: [PATCH] Kernel entropy rework

2019-12-22 Thread Paul.Koning



> On Dec 21, 2019, at 5:08 PM, Taylor R Campbell  wrote:
> 
> 
> 
> The attached patch set reworks the kernel entropy subsystem.
> 
> ...
>  - For (e.g.) keyboard interrupt and network packet timings, this
>is zero, because an adversary can cause events to happen with
>timing that leads to predictable samples entering the pool.

That seems overly pessimistic, depending on the timer resolution.  If you have 
a CPU cycle timer, then it is perfectly reasonable to claim a bit or two of 
entropy, since an adversary doesn't have the ability to control the timing of 
those events to nanosecond accuracy, nor the ability to control internal 
processing delays (like memory cache misses) which introduce variability way in 
excess of a CPU cycle.
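
To make that concrete, here is a minimal sketch of the sort of sampling and
credit I have in mind, assuming an x86 TSC read via gcc-style inline assembly;
the pool mixing and the one-bit credit are illustrative assumptions, not any
particular kernel's policy:

    #include <stdint.h>

    /* Toy pool; a real kernel mixes into its CPRNG state and tracks the
     * credited entropy separately.  Illustrative only. */
    static uint32_t pool;
    static unsigned pool_entropy_bits;

    static void
    pool_add_sample(uint32_t sample, unsigned credit_bits)
    {
        pool ^= (pool << 5) ^ sample;   /* toy mixing, not a real hash */
        pool_entropy_bits += credit_bits;
    }

    static inline uint64_t
    cycle_counter(void)
    {
        uint32_t lo, hi;

        /* x86 TSC; other CPUs would read their own cycle register. */
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Called from the interrupt handler: mix in the cycle count at which
     * the event was observed.  Crediting a single bit per event encodes
     * the assumption that an adversary cannot steer keystrokes or packet
     * arrivals to single-cycle accuracy. */
    void
    event_timing_sample(void)
    {
        static uint64_t last;
        uint64_t now = cycle_counter();

        pool_add_sample((uint32_t)(now ^ last), 1);
        last = now;
    }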

paul


Re: racy access in kern_runq.c

2019-12-06 Thread Paul.Koning



> On Dec 6, 2019, at 10:21 AM, Mouse  wrote:
> 
> 
> 
>> Compilers have become much more aggressive over the years.  But they
>> are allowed to be so by the C standard.  Specifically, in addition to
>> code-level re-ordering, plain accesses (loads/stores) are subject to
>> load/store fusing, tearing as well as invented loads/stores.
> 
> Then, honestly, it sounds to me as though "the latest revision of C" is
> no longer an appropriate language for writing kernels.  I see no reason
> to twist the kernel code into a pretzel to work around latitude a new
> language gives to its compilers - and that's what C11 is, a new
> language, albeit one closely related to various previous languages.
> 
> One of the prime considerations when choosing a language and/or
> compiler for building a kernel is that it produce relatively
> predictable code, for exactly this kind of reason.  If the latest C and
> the latest gcc no longer do that, then IMO they are no longer
> appropriate for writing/compiling kernels.

C11 isn't all that new, of course.  And I don't know if the rules about 
optimization that you object to are anywhere near that new in any case.  What 
seems to be the case instead is that compilers are more and more doing what the 
language has for a long time allowed them to do.

Consider for example "Undefined behavior -- what happened to my code?" by Wang 
et al. (MIT and Tsinghua University).  It describes all sorts of allowed 
transformations that come as a surprise to programmers.

Yes, it certainly would be possible to create a programming language with less 
surprising semantics.  C is not in any way, shape, or form a clean language with 
clean semantics.  But as for wanting predictable code, that is certainly 
doable.  Unfortunately, it requires a sufficiently deep understanding of the 
rules of the language.  Typical textbooks are not all that good for this; they 
are too superficial.  The language standard tends to be better.  But again, a 
problem with most programming languages is that their semantics are not well 
defined and/or much more complex than they should be.  Programming in C++ is 
particularly scary for this reason, but C is also problematic.  For clean 
semantics, I like ALGOL; too bad it is no longer used, though it did at one 
time serve to build successful operating systems.

paul



Re: /dev/random is hot garbage

2019-07-23 Thread Paul.Koning



> On Jul 21, 2019, at 5:03 PM, Joerg Sonnenberger  wrote:
> 
> 
> 
> On Sun, Jul 21, 2019 at 08:50:30PM +, paul.kon...@dell.com wrote:
>> /dev/urandom is equivalent to /dev/random if there is adequate entropy,
>> but it will also deliver random numbers not suitable for cryptography before 
>> that time.
> 
> This is somewhat misleading. The problem is that with an unknown entropy
> state, the system cannot ensure that an attacker couldn't predict the
> seed used for the /dev/urandom stream. That doesn't mean that the stream
> itself is bad. It will still pass any statistical test etc.

That's exactly my point.  If you're interested in a statistically high-quality 
pseudo-random bit stream, /dev/urandom is a great source.  But if you need a 
cryptographically strong random number, then you can't safely proceed with an 
unknown entropy state for the reason you stated, which translates into "you 
must use /dev/random".

> Note that with the option of seeding the CPRNG at boot time, a lot of
> the distinction is actually moot.

Yes, if at boot time you get enough entropy then /dev/random is unblocked.  The 
distinction still matters because an application can't know this, so it should 
express its requirements by choosing the correct device.

paul


Re: /dev/random is hot garbage

2019-07-22 Thread Paul.Koning



> On Jul 22, 2019, at 4:55 PM, Joerg Sonnenberger  wrote:
> 
> 
> 
> On Mon, Jul 22, 2019 at 04:36:41PM +, paul.kon...@dell.com wrote:
>> 
>> 
>>> On Jul 22, 2019, at 10:52 AM, Joerg Sonnenberger  wrote:
>>> 
>>> 
>>> 
>>> On Sun, Jul 21, 2019 at 09:13:48PM +, paul.kon...@dell.com wrote:
 
 
> On Jul 21, 2019, at 5:03 PM, Joerg Sonnenberger  wrote:
> 
> 
> 
> On Sun, Jul 21, 2019 at 08:50:30PM +, paul.kon...@dell.com wrote:
>> /dev/urandom is equivalent to /dev/random if there is adequate entropy,
>> but it will also deliver random numbers not suitable for cryptography 
>> before that time.
> 
> This is somewhat misleading. The problem is that with an unknown entropy
> state, the system cannot ensure that an attacker couldn't predict the
> seed used for the /dev/urandom stream. That doesn't mean that the stream
> itself is bad. It will still pass any statistical test etc.
 
 That's exactly my point.  If you're interested in a statistically high
 quality pseudo-random bit stream, /dev/urandom is a great source.  But
 if you need a cryptographically strong random number, then you can't
 safely proceed with an unknown entropy state for the reason you stated,
 which translates into "you must use /dev/random".
>>> 
>>> That distinction makes no sense at all to me. /dev/urandom is *always* a
>>> cryptographically strong RNG. The only difference here is that without
>>> enough entropy during initialisation of the stream, you can brute force
>>> the entropy state and see if you get a matching output stream based on
>>> that seed.
>> 
>> I use a different definition of "cryptographically strong".  A bit string
>> that's guessable is never, by any useful definition, "cryptographically
>> strong" no matter what the properties of the string extender are.  The
>> only useful definition for the term I can see is as a synonym for
>> "suitable for security critical value in cryptographic algorithms".
>> An unseeded /dev/urandom output is not such a value.
> 
> Again, that's not really a sensible definition. It's always possible to
> guess the seed used by the /dev/urandom CPRNG. By definition. That
> doesn't change the core properties though: there is no sensible way to
> predict the output of CPRNG without knowing the initial seed and offset.
> There is no known correlation between variations of the seed. As in: the
> only thing partial knowledge of the seed gives you is reducing the
> probability of guessing the right seed. It's a similar situation to why
> the concept of entropy exhaustion doesn't really make sense.

I guess I didn't state the requirement for "cryptographically strong" clearly 
enough.

A different but equivalent way of stating it: it must not be feasible, given 
the RNG output up to this point, to predict (with probability better than a 
50/50 guess) the future output.

The /dev/urandom output is a function only of its internal state, for any span 
of time where no additional entropy is injected.  So the requirement translates 
into: (a) you can't guess the internal state from the output, (b) you can't 
predict future output from past output without knowing the internal state.

Ok, now suppose you have the output from /dev/urandom up to now, and only a few 
bits of entropy were injected since startup.  Is the output strong?  No, 
because the small number of entropy bits means that you can enumerate the 
possible combinations of entropy bits, construct the corresponding internal 
state, and generate trial output.  The trial output string that matches what 
you observed tells you the internal state, and consequently the future output.
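
A minimal sketch of that enumeration, using a stand-in splitmix64-style
expander rather than any real kernel CPRNG; the point is only that the cost is
2^(entropy bits), not anything about a particular generator:

    #include <stdint.h>
    #include <string.h>

    #define ENTROPY_BITS 20     /* trivially searchable; 256 would not be */

    /* Stand-in for the generator's seed-to-output expansion. */
    static void
    prng_expand(uint64_t seed, uint8_t *out, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            seed += 0x9e3779b97f4a7c15ULL;
            uint64_t z = seed;
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            out[i] = (uint8_t)(z ^ (z >> 31));
        }
    }

    /* Enumerate every possible seed, expand it, and compare with the
     * observed output; a match reveals the internal state and hence all
     * future output. */
    int
    recover_seed(const uint8_t *observed, size_t len, uint64_t *seed_out)
    {
        uint8_t trial[64];

        if (len > sizeof(trial))
            len = sizeof(trial);
        for (uint64_t s = 0; s < (1ULL << ENTROPY_BITS); s++) {
            prng_expand(s, trial, len);
            if (memcmp(trial, observed, len) == 0) {
                *seed_out = s;
                return 1;
            }
        }
        return 0;
    }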

This means that "enough entropy" translates to "so many bits of entropy that it 
is infeasible to guess the RNG internal state".

For an example of how you can create a real world security defect by using an 
RNG with insufficient entropy in its seed, look up the Debian SSL bug (for 
example https://www.schneier.com/blog/archives/2008/05/random_number_b.html).

paul



Re: /dev/random is hot garbage

2019-07-22 Thread Paul.Koning



> On Jul 22, 2019, at 10:52 AM, Joerg Sonnenberger  wrote:
> 
> 
> 
> On Sun, Jul 21, 2019 at 09:13:48PM +, paul.kon...@dell.com wrote:
>> 
>> 
>>> On Jul 21, 2019, at 5:03 PM, Joerg Sonnenberger  wrote:
>>> 
>>> 
>>> 
>>> On Sun, Jul 21, 2019 at 08:50:30PM +, paul.kon...@dell.com wrote:
 /dev/urandom is equivalent to /dev/random if there is adequate entropy,
 but it will also deliver random numbers not suitable for cryptography 
 before that time.
>>> 
>>> This is somewhat misleading. The problem is that with an unknown entropy
>>> state, the system cannot ensure that an attacker couldn't predict the
>>> seed used for the /dev/urandom stream. That doesn't mean that the stream
>>> itself is bad. It will still pass any statistical test etc.
>> 
>> That's exactly my point.  If you're interested in a statistically high
>> quality pseudo-random bit stream, /dev/urandom is a great source.  But
>> if you need a cryptographically strong random number, then you can't
>> safely proceed with an unknown entropy state for the reason you stated,
>> which translates into "you must use /dev/random".
> 
> That distinction makes no sense at all to me. /dev/urandom is *always* a
> cryptographically strong RNG. The only difference here is that without
> enough entropy during initialisation of the stream, you can brute force
> the entropy state and see if you get a matching output stream based on
> that seed.

I use a different definition of "cryptographically strong".  A bit string 
that's guessable is never, by any useful definition, "cryptographically strong" 
no matter what the properties of the string extender are.  The only useful 
definition for the term I can see is as a synonym for "suitable for security 
critical value in cryptographic algorithms".  An unseeded /dev/urandom output 
is not such a value.

RFC 1750 is still a useful resource even though it's 25 years old.  There is 
newer work by highly respected cryptographers, too.

paul


Re: Plentiful unpredictable randomness

2019-07-22 Thread Paul.Koning



> On Jul 22, 2019, at 8:17 AM, Andreas Gustafsson  wrote:
> 
> 
> 
> Taylor R Campbell wrote:
>> It has become popular to redefine the traditional semantics of
>> /dev/random or /dev/urandom so that one or both will block once at
>> boot until the OS thinks the entropy pool may have been seeded, and
>> then never block again.
> 
> IMO, those are the most useful semantics for a /dev/*random device,
> and NetBSD ought to provide a device that works this way.
> 
> This would combine the advantages of the existing /dev/random and
> /dev/urandom by providing randomness that is both unpredictable (like
> /dev/random) and plentiful (like /dev/urandom) once seeded.  It would
> not solve the problem of /dev/random blocking forever when a system
> has no entropy available at all, but it would solve the more common
> problem of it blocking due to becoming "exhausted" from use.

Yes.  It's a good idea to study the modern literature on cryptographic random 
number generators for this issue.

The fundamental requirement of a crypto RNG is that you can't predict (in the 
sense of "better than a 50/50 guess") any bit of the output stream given all 
the other bits of the stream.  That translates into a mathematical property of 
"indistinguishable from a random process", which is one of the modern tests for 
good ciphers.  Given the birthday paradox, a stream generated by a good cipher 
is likely to be distinguishable from a random process after about 2^(block 
size / 2) bits, so, for example, an RNG built around SHA-256 should not be 
used to generate more than 2^128 bits before reseeding.
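
As a hedged aside on where the 2^(block size / 2) figure comes from: it is the
usual birthday-style bound on distinguishing a random permutation from a
random function, stated here from memory in terms of n-bit output blocks
rather than bits:

    \[
      \mathrm{Adv}(q) \;\le\; \frac{q(q-1)}{2^{\,n+1}},
      \qquad \text{which becomes non-negligible once } q \approx 2^{n/2}.
    \]

For n = 256 that is on the order of 2^128 outputs, consistent with the figure
above.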

But apart from that, given that the underlying machinery is built correctly, 
enough seed bits will make the initial state unpredictable, and the cipher will 
make all the remaining bits up to the limit unpredictable.  That translates to 
the "block once" property you described as desirable.

Maybe NetBSD needs an algorithm upgrade for the /dev/random core, or maybe the 
current one is adequate; I haven't studied that.  I'm also not a cryptographer; 
the above comments are about at the limit of my knowledge.

paul


Re: /dev/random is hot garbage

2019-07-22 Thread Paul.Koning


> On Jul 21, 2019, at 4:55 PM, Edgar Fuß  wrote:
> 
> 
> 
> TRC> There is no reason in modern cryptography to read more than one byte
> TRC> from /dev/random ever in a single application; once you have done
> TRC> that, or confirmed some other way that the entropy pool is seeded,
> TRC> you should generate keys from /dev/urandom.
> 
> DAH> There should be some way to do that without throwing away 8 random
> DAH> bits.
> Isn't that called poll()/select() etc?
> As far as I understand, it's not about actually reading from /dev/random, 
> but checking whether you could read without blocking, isn't it?

I don't agree with this reasoning.

If /dev/random is implemented right, it won't block later once it unblocks for 
the first time.  Given that, an application that needs a cryptographic random 
number should simply fetch all the bits it needs from /dev/random.
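
Concretely, all I mean is something like the following userland sketch (error
handling abbreviated, EINTR ignored for brevity):

    #include <fcntl.h>
    #include <unistd.h>

    /* Fill buf with len bytes from /dev/random, handling short reads.
     * Returns 0 on success, -1 on error. */
    static int
    get_random_key(void *buf, size_t len)
    {
        int fd = open("/dev/random", O_RDONLY);
        size_t off = 0;

        if (fd == -1)
            return -1;
        while (off < len) {
            ssize_t n = read(fd, (char *)buf + off, len - off);
            if (n <= 0) {
                close(fd);
                return -1;
            }
            off += (size_t)n;
        }
        close(fd);
        return 0;
    }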

The only reason to read from /dev/urandom is that you want random numbers but 
they don't need to be strong.

If people do these hacks because we still have the "entropy is used up" notion 
in the code, the answer is to remove that.  

paul


Re: /dev/random is hot garbage

2019-07-21 Thread Paul.Koning



> On Jul 21, 2019, at 3:20 PM, Taylor R Campbell 
>  wrote:
> 
> 
> 
>> Date: Sun, 21 Jul 2019 20:52:52 +0200
>> From: Manuel Bouyer 
>> 
>> /dev/random actually works as documented, and if rust wants /dev/urandom
>> behavior it should use /dev/urandom. Also I'd like it explained why
>> a compiler needs that many random bits.
> 
> The difference is that /dev/random may block, and if it blocks, it
> doesn't wake up until the entropy pool is seeded.  In contrast,
> /dev/urandom never blocks, even if the entropy pool has not yet been
> seeded.
> 
> There is no reason in modern cryptography to read more than one byte
> from /dev/random ever in a single application; once you have done
> that, or confirmed some other way that the entropy pool is seeded,
> you should generate keys from /dev/urandom.

The way I see it:

/dev/random blocks until it has adequate entropy to deliver cryptographically 
strong random numbers.  Once unblocked it delivers such random numbers.

/dev/urandom is equivalent to /dev/random if there is adequate entropy, but it 
will also deliver random numbers not suitable for cryptography before that time.

In addition, the notion of "entropy being consumed" is obsolete (if it was ever 
valid), so once adequately seeded /dev/random should not block after that.

Do we have an implementation that does these things?  It's critical to have a 
good implementation of /dev/random; otherwise you can't run security products.

paul


Re: re-enabling debugging of 32 bit processes with 64 bit debugger

2019-06-28 Thread Paul.Koning



> On Jun 28, 2019, at 4:44 PM, Christos Zoulas  wrote:
> 
> 
> 
> 
> Background:
> 
> Max disabled the 32 bit code paths in process_machdep.c and matchdep.c 
> so trying to debug 32 bit processes from a 64 bit debugger now fails. From his commit
> message I think this was not intentional:
> 
>revision 1.36
>date: 2017-10-19 05:32:01 -0400;  author: maxv;  state: Exp;  lines: +35 
> -0;  commitid: 0ZqTTwMXhMd40EbA;
>Make sure we don't go farther with 32bit LWPs. There appears to be some
>confusion in the code - in part introduced by myself -, and clearly this
>place is not supposed to handle 32bit LWPs.
> 
>Right now we're returning EINVAL, but verily we would need to redirect
>these calls to their netbsd32 counterparts.
> 
> I've been asking him privately to re-add the code (I even gave him a patch),
> but after not responding for a few days we had the exchange (appended below)
> which leads me to believe that he does not believe the functionality is 
> useful.
> 
> I would like to have this functionality restored because as I explained below
> it is easier to use a 64 bit debugger on a 32 bit app (for various reasons),
> plus I want to add some unit-tests to make sure we don't break it in the
> future, since it is required for mips64 right now. It is harder to add
> a new testing platform than doing this on amd64.
> 
> What do you think? Should we make the code work like before? Or is this
> functionality that we don't want to have because it is "dumb"? (I think
> that Max here means that it adds complexity and it could be dangerous,
> but I am just guessing)

I'm baffled that this is even debatable.  The system supports running 32 bit 
code in a 64 bit system.  Obviously you must be able to debug such processes.

I suppose you could claim it would suffice to build two debuggers, one for each 
target.  But that makes no sense.  All the toolchains are multi-architecture: 
you can compile for 32 or 64 bit at the drop of a switch, and you can link all 
that with a single toolchain. GDB has supported multi-arch for a long time (in 
fact, not just multiple widths but entirely different ISAs from a single image). 
So it would be thoroughly strange to say that this sort of basic flexibility 
and user-friendliness is to be abandoned here.  And why would NetBSD want to 
regress like that?  Other platforms do this as a matter of course; it seems odd 
for NetBSD even to consider looking technically inferior in this respect.

paul


Re: Importing libraries for the kernel

2018-12-14 Thread Paul.Koning



> On Dec 14, 2018, at 2:16 PM, Joerg Sonnenberger  wrote:
> 
> On Fri, Dec 14, 2018 at 01:00:25PM -0500, Mouse wrote:
> ...
>> I also disagree that asymmetric crypto is necessarily all that complex.
>> Some asymmetric crypto algorithms require nothing more complex than
>> large-number arithmetic.  (Slow, yes, but not particularly complex.)
> 
> Correct and fast implementations of large number arithmetic are
> complex, esp. if you also want to avoid the typical set of timing leaks.
> This applies to operation sets used by RSA as well as those used by ECC.
> Different classes of operations, but a mine field to get right.

Indeed, side channel attacks of all kinds.  There are lots of ways to
get into trouble.  Consider the acoustic attack on RSA that allowed
researchers to recover private keys by using a cellphone to listen to the
sound made by a laptop running the RSA algorithm.
https://www.cs.tau.ac.il/~tromer/papers/acoustic-20131218.pdf

paul


Re: Support for tv_sec=-1 (one second before the epoch) timestamps?

2018-12-14 Thread Paul.Koning



> On Dec 14, 2018, at 9:30 AM, Joerg Sonnenberger  wrote:
> 
> 
> 
> On Thu, Dec 13, 2018 at 02:37:06AM +0100, Kamil Rytarowski wrote:
>> In real life it's often needed to store time_t pointing before the UNIX
>> epoch.
> 
> Again, I quite disagree and believe that you are confusing two different
> things. It makes perfect sense in certain applications to store time as
> relative to the UNIX epoch. But that's not the same as time_t which is a
> specific type for a *system* interface. I strongly question the
> sensibility of trying to put dates before 1970 in the context of time_t.
> 
> Joerg

I'm not sure if people care about this example, but here's one: if you want to 
archive old files with their original timestamps, and those files predate the 
epoch.

paul


Re: Support for tv_sec=-1 (one second before the epoch) timestamps?

2018-12-13 Thread Paul.Koning



> On Dec 13, 2018, at 6:06 AM, Martin Husemann  wrote:
> 
> 
> 
> On Thu, Dec 13, 2018 at 03:29:03AM +, David Holland wrote:
>> On Wed, Dec 12, 2018 at 10:27:04PM +0100, Joerg Sonnenberger wrote:
>>> On Wed, Dec 12, 2018 at 08:46:33PM +0100, Michał Górny wrote:
 While researching libc++ test failures, I've discovered that NetBSD
 suffers from the same issue as FreeBSD -- that is, both the userspace
 tooling and the kernel have problems with (time_t)-1 timestamp,
 i.e. one second before the epoch.
>>> 
>>> I see no reason why that should be valid or more general, why any
>>> negative value of time_t is required to be valid.
>> 
>> Are you Dan Pop? :-)
> 
> Not sure about that, but I agree that we should not extend the range of
> time_t (aka "seconds since the epoch") to negative values. It is a pandora
> box, keep it closed.
> 
> Martin

You could certainly make that restriction.  On the other hand, the TZ project 
maintains timezone offset rules for times long before the epoch.  Those time 
stamps, and the rules for processing them, are well defined.  At least until 
you get far enough back that Gregorian vs. Julian calendar becomes a 
consideration.
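
As an illustration (hedged, since whether negative time_t values are honoured
is exactly the question being debated): where they are accepted, -1 is simply
one second before the epoch, and older archive timestamps just count further
back from there.

    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        time_t t = -1;              /* one second before the epoch */
        struct tm tm;
        char buf[64];

        if (gmtime_r(&t, &tm) != NULL) {
            strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", &tm);
            printf("%s\n", buf);    /* 1969-12-31 23:59:59 UTC, if supported */
        }
        return 0;
    }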

paul


Re: Spectre on non-amd64

2018-01-19 Thread Paul.Koning


> On Jan 19, 2018, at 4:47 PM,   wrote:
> 
> Hi folks.
> 
> I think that the spectre variant 2 situation is a lot worse for:
> - Speculative CPU
> - Weak memory protection
> 
> Then I don't need a JIT for gadgets.
> 
> Architectures that fall into this:
> - default i386 netbsd, because it is missing NX bit (PAE is optional)
> - MIPS for us, because we don't use kseg2 and then it doesn't go through
>  MMU.

Which MIPS CPUs do speculative execution?

paul



Re: Spectre

2018-01-18 Thread Paul.Koning


> On Jan 18, 2018, at 10:31 AM, Mouse  wrote:
> 
>> ...
> 
>> The Spectre fixes all amount to a speculative barrier, which will do
>> the job just as well (though it requires code change).
> 
> Yes...but it requires a code change in the wrong place.
> 
> That "if (access is ok)" check that needs a spec ex barrier could well
> be inside a library that doesn't want to cripple performance for
> non-sandboxed applications.  See also the spectre paper's description
> of use of code that doesn't think it's making an access check but
> happens to contain an instruction sequence that can be used that way.
> 
> I'd prefer to have a spec ex disable bit which the sandbox could set
> for the duration of the sandboxed code. 

That's an option.  But for regular (not indirect) branches like the
example access check, most cases are not Spectre risks.  There is only
an issue if the speculatively loaded data is subjected to data-dependent
actions that are visible through side channels.  For a lot of code, that
doesn't apply.

So yes, a spec ex barrier after such checks can affect performance 
somewhat.  But a "spec ex disable" bit that utterly turns off speculative
execution for sandboxed code will have an impact that's massively larger.
Easier to apply, sure -- well, once you have those bits which of course
right now you do not.  But a great deal more costly than spec ex
barriers applied with skill.

paul



Re: Spectre

2018-01-18 Thread Paul.Koning


> On Jan 18, 2018, at 9:48 AM, Mouse  wrote:
> 
>> Since this involves a speculative load that is legal from the
>> hardware definition point of view (the load is done by kernel code),
>> this isn't a hardware bug the way Meltdown is.
> 
> Well, I'd say it's the same fundamental hardware bug as meltdown, but
> not compounded by an additional hardware property (which I'm not sure I
> would call a bug) which is made much worse by the actual bug.
> 
> To my mind, the bug here is that annulling spec ex doesn't annul _all_
> its effects.  That, fundamentally, is what's behind both spectre and
> meltdown.  In meltdown it's exacerbated by spec ex's failure to check
> permissions fully - but if the side effects were annulled correctly,
> even that failure wouldn't cause trouble.

That's true.  But the problem is that cache fill is only the most
obvious and easiest to exploit side channel.  There are others, such
as timing due to execution units being busy, that are harder to exploit
but also harder to cure.  It seems to me that blocking all observable
side effects of speculative execution can probably only be done by
disabling speculative execution outright.  That clearly isn't a good
thing.  The Spectre fixes all amount to a speculative barrier, which
will do the job just as well (though it requires code change).  The
Meltdown fix is more obvious: don't omit mode dependent access checks
before launching a speculative load, as most CPU designers already did.

paul



Re: Spectre

2018-01-18 Thread Paul.Koning


> On Jan 18, 2018, at 8:49 AM, Joerg Sonnenberger  wrote:
> 
> On Wed, Jan 17, 2018 at 09:38:27PM -0500, Mouse wrote:
>> But, on the other hand, I can easily imagine a CPU designer looking at
>> it and saying "What's the big deal if this code can read that location?
>> It can get it anytime it wants with a simple load instruction anyway.",
>> something I have trouble disagreeing with.
> 
> Consider something like BPF -- code executed in the kernel with an
> enforced security model to prevent "undesirable" accesses. It will create
> logic like:
> 
>void *p = ...;
>if (!is_accesible(p))
>  raise_error();
>load(p);
> 
> Now imagine that the expression for p is intentionally pointing into
> userland and depends on the speculative execution of something else.
> Loading the pointer speculatively results in a visible side effect that
> defeats in part the access check. In short, it can effectively invert
> access control checks for verified code.

Yes, you've just described Spectre.  Since this involves a speculative
load that is legal from the hardware definition point of view (the load
is done by kernel code), this isn't a hardware bug the way Meltdown is.
But it's an issue that requires a fix -- which is a speculative execution 
barrier between the software access check and the subsequent code that
is legal only if the check is successful.
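
As a minimal sketch of what such a barrier looks like, assuming x86 and
gcc-style inline assembly (the exact instruction varies by architecture --
lfence here, csdb on ARM):

    #include <stddef.h>
    #include <stdint.h>

    /* Access-checked read with a speculation barrier: the fence keeps
     * the dependent load from being issued speculatively past the
     * bounds/permission check. */
    static uint8_t
    checked_read(const uint8_t *table, size_t size, size_t untrusted_index)
    {
        if (untrusted_index >= size)
            return 0;
        __asm__ __volatile__("lfence" ::: "memory");
        return table[untrusted_index];
    }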

paul


Re: Spectre

2018-01-17 Thread Paul.Koning


> On Jan 17, 2018, at 8:08 PM, Mouse  wrote:
> 
> ...
>> - Even speculative execution obeys access restrictions,
> 
> In some respects.  Meltdown is possible because Intel spec ex does not
> obey access restrictions in one particular respect; I don't know what
> aspects may not be obeyed by what CPUs except for that.

Indeed.  I was surprised, but apparently that "obeys..." is wrong in
the case of Intel, though it is correct, as you might expect, for AMD
and ARM and probably most other architectures.

More precisely, speculative execution obeys access restrictions in
the sense that no architecturally visible (i.e., register/memory)
changes occur that are prohibited by the access controls.  But Intel
does launch a speculative load without checking access; apparently
the access check is done in parallel and will complete a while later,
by which time the speculatively loaded data is in the cache and some
other operations may be done based on it.

Obviously, if speculative loads check permissions prior to launching
the load, the issue goes away.  If so, Meltdown is completely 
prevented.

Spectre is unrelated and does not depend on a mistake of this kind,
since there you're dealing with speculative loads that ARE permitted
as far as access control goes; they just aren't wanted because they
are preceded by range checks or the like.

paul



Re: meltdown

2018-01-05 Thread Paul.Koning


> On Jan 4, 2018, at 6:01 PM, Warner Losh  wrote:
> 
> 
> 
> On Thu, Jan 4, 2018 at 2:58 PM, Mouse  wrote:
> > As I understand it, on intel cpus and possibly more, we'll need to
> > unmap the kernel on userret, or else userland can read arbitrary
> > kernel memory.
> 
> "Possibly more"?  Anything that does speculative execution needs a good
> hard look, and that's damn near everything these days.
> 
> > Also, I understand that to exploit this, one has to attempt to access
> > kernel memory a lot, and SEGV at least once per bit.
> 
> I don't think so.  Traps that would be taken during normal execution
> are not taken during speculative execution.  The problem is, to quote
> one writeup I found, "Intel CPUs are allowed to access kernel memory
> when performing speculative execution, even when the application in
> question is running in user memory space.  The CPU does check to see if
> an invalid memory access occurs, but it performs the check after
> speculative execution, not before.".  This means that things like cache
> line loads can occur based on values the currently executing process
> should not be able to access; timing access to data that cache-collides
> with the cache lines of interest reveals the leaked bit(s).
> 
> Nowhere in there is a SEGV generated.
> 
> That's the meltdown stuff.  Spectre targets other things (I've seen
> branch prediction mentioned) to leak information around protection
> barriers.
> 
> I think you are confusing spectre and meltdown.
> 
> meltdown requires a sequence like:
> 
> exception (*0 = 0 or a = 1 / 0);
> do speculative read
> 
> to force a trip into kernel land just before the speculative read so that 
> otherwise not readable stuff gets (or does not get) read into cache which can 
> then be probed for data.

No, that's not correct.  You were being misled by the "Toy example".
The toy example demonstrates that speculative operations are done
after the point in the code that generates an exception, but it in
itself is NOT the exploit.

The exploit has the form:

x = read(secret_memory_location);
touch (cacheline[x]);
while (1) ;

The first line will SEGV, of course, but in the vulnerable CPUs
the speculative load is issued before that happens.  And also before
the SEGV happens, cacheline[x] is touched, making that line resident
in the cache.  This "transmits to the side channel".

Next, the SEGV happens.  The exploit catches that, and then it
does a timing test on references to cacheline[i] to see which i is
now resident.  That i is the value of x.
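
For illustration, a minimal userland sketch of the probe side, assuming x86
and the gcc intrinsics __rdtscp/_mm_clflush; the page-sized stride and the
timing threshold are machine-dependent guesses:

    #include <stdint.h>
    #include <x86intrin.h>          /* __rdtscp, _mm_clflush */

    #define STRIDE 4096             /* one page per value, to defeat prefetch */
    static uint8_t probe[256 * STRIDE];

    /* Before the speculative access, flush every probe line. */
    static void
    flush_probe(void)
    {
        for (int i = 0; i < 256; i++)
            _mm_clflush(&probe[i * STRIDE]);
    }

    /* Time one load; a hit is tens of cycles, a miss a few hundred. */
    static int
    is_cached(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return (t1 - t0) < 120;     /* threshold is machine-dependent */
    }

    /* After the speculative "touch(cacheline[x])" has run and the SEGV
     * has been caught, the one resident line reveals the leaked byte. */
    static int
    recover_leaked_byte(void)
    {
        for (int i = 0; i < 256; i++)
            if (is_cached(&probe[i * STRIDE]))
                return i;
        return -1;
    }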

As the paper points out, it would be possible in principle to prefix
the exploit with

if (false) // predict_true

so the illegal read is also speculative, and is voided (exception
and all) when the wrong branch prediction is sorted out. But it
looks like the paper is saying that refinement has not been
demonstrated, though such branch prediction hacks have been shown
in other exploits.  Still, if that can be done, a test for
"SEGV too often" is no help.

The Meltdown paper clearly says that the KAISER fix cures this
vulnerability.  And while it doesn't say so, it is also clear that
the problem does not exist on CPUs where speculative memory references
do page protection checks.

All the above applies to Meltdown.  Spectre is unrelated in its
core mechanism.  The fact that both eventually end up using side
channels and were published at the same time seems to have caused
some confusion between the two.  It is important to understand they
are independent, stem from different underlying problems, apply
to a different set of vulnerable chips, and have different cures.

paul



Re: meltdown

2018-01-04 Thread Paul.Koning


> On Jan 4, 2018, at 4:58 PM, Mouse  wrote:
> 
>> As I understand it, on intel cpus and possibly more, we'll need to
>> unmap the kernel on userret, or else userland can read arbitrary
>> kernel memory.
> 
> "Possibly more"?  Anything that does speculative execution needs a good
> hard look, and that's damn near everything these days.
> 
>> Also, I understand that to exploit this, one has to attempt to access
>> kernel memory a lot, and SEGV at least once per bit.
> 
> I don't think so.  Traps that would be taken during normal execution
> are not taken during speculative execution.  The problem is, to quote
> one writeup I found, "Intel CPUs are allowed to access kernel memory
> when performing speculative execution, even when the application in
> question is running in user memory space.  The CPU does check to see if
> an invalid memory access occurs, but it performs the check after
> speculative execution, not before.".  This means that things like cache
> line loads can occur based on values the currently executing process
> should not be able to access; timing access to data that cache-collides
> with the cache lines of interest reveals the leaked bit(s).
> 
> Nowhere in there is a SEGV generated.

That depends.  The straightforward case of Meltdown starts with an
illegal load, which the CPU will execute anyway speculatively, resulting
in downstream code execution that can be used to change the cache state.
In that form, the load eventually aborts.

There's a discussion in the paper that the load could be preceded by
a branch not taken that's predicted taken.  If so, the SEGV would indeed
not happen, but it isn't clear how feasible this is.

In any case, the problem would not occur in any CPU that does protection
checks prior to issuing speculative memory references.  

paul



Re: mount_apfs?

2017-11-08 Thread Paul.Koning


> On Nov 8, 2017, at 5:07 AM, Edgar Fuß  wrote:
> 
>> here's a description of the APFS (Apple File System) format:
> So they didn't open-source the code?

Apparently not.  But an entry in "Hacker news" (on ycombinator) says:

An open source implementation is not available at this time. 
Apple plans to document and publish the APFS volume format when 
Apple File System is released in 2017.

No idea if that statement is accurate or not, but FWIW...

paul


Re: kaslr: better rng

2017-11-07 Thread Paul.Koning

> On Nov 7, 2017, at 11:21 AM, Taylor R Campbell 
>  wrote:
> 
>> Date: Tue, 7 Nov 2017 09:16:25 +0100
>> From: Maxime Villard 
>> ...
>> Well yes, my initial plan was two different files.
> 
> What's the security goal you hope to achieve by having two different
> files that cannot be achieved by using one and deriving two subkeys
> from it?

If you use two parts of a single file, that's equivalent to using two files.

If two RNGs use the same data from the file as the starting point, then you 
have to argue security from the strengths of the two derivations.  Presumably 
they use additional entropy to make that work. If so, is the additional entropy 
enough on its own?  If yes, then you don't need the stored file in the first 
place.

paul

Re: kaslr: better rng

2017-11-06 Thread Paul.Koning


> On Nov 6, 2017, at 12:51 PM, Maxime Villard  wrote:
> 
> Le 06/11/2017 à 18:28, Thor Lancelot Simon a écrit :
>> On Mon, Nov 06, 2017 at 07:30:35AM +0100, Maxime Villard wrote:
>>> I'm in a point where I need to have a better rng before continuing - and an
>>> rng that can be used in the bootloader, in the prekern and in the kernel
>>> (early).
>>> 
>>> I would like to use a system similar to the /var/db/entropy-file 
>>> implementation.
>>> That is to say, when running the system generates /var/db/random-file, which
>>> would contain at least 256bytes of random data. When booting the bootloader
>>> reads this file, can use some of its bytes to get random values. It then 
>>> gives
>>> the file to the prekern which will use some other parts of it. The prekern
>>> finally gives the file to the kernel which can use the rest.
>> What is the reason for using only part of the file, in any application?
> 
> I meant to say that the components don't take random values from the same
> area in the file, for them not to use the same random numbers twice.

Yes, that's critical if the other sources of entropy aren't sufficient by 
themselves.  Then again, if they are, there is no reason to bother with this 
file in the first place.

If you think you need this file, I would argue there should be two: the current 
entropy file for the kernel to use, and a separate one generated from a 
different chunk of random bit stream, exclusively for the use next time by the 
bootloader.

paul



Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Paul.Koning

> On Oct 27, 2017, at 10:36 AM, Mouse  wrote:
> 
>>> I would like to read the DMA buffer while DMA is still going on.
>>> [...]  I'm fine if the CPU's view lags the hardware's view slightly,
>>> but I do care about the CPU's view of the DMA write order matching
>>> the hardware's: that is, if the CPU sees the value written by a
>>> given DMA cycle, then the CPU must also see the values written by
>>> all previous DMA cycles.
>> I'm not sure if that requirement is necessarily supported by hardware.  For
>> example, in machines that have incoherent DMA, I would think it isn't.
> 
> Hm!  On such hardware, then, you can't count on any particular portion
> of a DMA transfer being visible until the whole transfer is finished?

Yes.  I'm assuming here that the driver would do a data cache invalidate
(for the address range, if possible) at DMA end.  Given that, during the
transfer you would see pieces that weren't in the cache before, and would
not see pieces for which there are cache hits.
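
In NetBSD bus_dma(9) terms, the mid-transfer read would look roughly like the
sketch below (hypothetical softc field names); it shows the invalidation
mechanism only and says nothing about DMA write ordering:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>

    struct mydev_softc {
        bus_dma_tag_t   sc_dmat;
        bus_dmamap_t    sc_dmamap;
        uint8_t        *sc_buf;     /* kernel VA of the DMA buffer */
    };

    /* Peek at the first 'len' bytes while the transfer is still running.
     * The POSTREAD sync on just that sub-range invalidates (or copies
     * from a bounce buffer) the CPU's stale view of those bytes; the
     * rest of the buffer stays undefined until its own sync. */
    static void
    mydev_peek(struct mydev_softc *sc, uint8_t *dst, bus_size_t len)
    {
        bus_dmamap_sync(sc->sc_dmat, sc->sc_dmamap, 0, len,
            BUS_DMASYNC_POSTREAD);
        memcpy(dst, sc->sc_buf, len);
    }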

> For my purposes, unless amd64 is such a platform, I'm willing to write
> off portability to such machines.  Is there any way to detect them from
> within the driver?  I could just ignore the issue, but I'd prefer to
> give an error at attach time.

I don't know much about x86 style platforms.  An example of the sort of
platform I mentioned would be the MIPS R5000.  I still have some scars
from building a fast router on top of its incoherent DMA...

paul


Re: Access to DMA memory while DMA in progress?

2017-10-27 Thread Paul.Koning

> On Oct 27, 2017, at 9:38 AM, Mouse  wrote:
> 
> ...
> I would like to read the DMA buffer while DMA is still going on.  That
> is, I have a buffer of (say) 64K and the hardware is busily writing
> into it; I want to read the buffer and see what the hardware has
> written in the memory it has written and what used to be there in the
> memory it hasn't.  I'm fine if the CPU's view lags the hardware's view
> slightly, but I do care about the CPU's view of the DMA write order
> matching the hardware's: that is, if the CPU sees the value written by
> a given DMA cycle, then the CPU must also see the values written by all
> previous DMA cycles. 

I'm not sure if that requirement is necessarily supported by hardware.  For 
example, in machines that have incoherent DMA, I would think it isn't.

paul



Re: how to tell if a process is 64-bit

2017-09-10 Thread Paul.Koning

> On Sep 10, 2017, at 10:31 AM, Thor Lancelot Simon  wrote:
> 
> On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
>>> In a cross-platform process utility tool the question came up how to
>>> decide if a process is 64-bit.
>> 
>> First, I have to ask: what does it mean to say that a particular
>> process is - or isn't - 64-bit?
> 
> I think the only simple answer is "it is 64-bit in the relevant sense if
> it uses the platform's 64-bit ABI for interaction with the kernel".
> 
> This actually raises a question for me about MIPS: do we have another
> process flag to indicate O32 vs. N32, or can we simply not run O32
> executables on 64-bit or N32 kernels (surely we don't use the O32 ABI
> for all kernel interaction by 32-bit processes)?

MIPS has four ABIs, if you include "O64".  Whether a particular OS allows
all four concurrently is another matter; it isn't clear that would make
sense.  Mixing "O" and "N" ABIs is rather messy.

Would you call N32 a 64-bit ABI?  It has 64 bit registers, so if a value
is passed to the kernel in a register it comes across as 64 bits.  But it
has 32 bit addresses.

paul



Re: how to tell if a process is 64-bit

2017-09-08 Thread Paul.Koning

> On Sep 8, 2017, at 4:00 PM, matthew green  wrote:
> 
>> Is the answer "it's using an ISA with 64-bit registers and addresses"?
>> This actually can be broken down into the "registers" and "addresses"
>> portion, but, in practice, the two tend to go together.  (Always true
>> on most "64-bit" ports, a real question on amd64 (and others, if any)
>> which support 32-bit userland.)
> 
> actually -- our mips64 ports largely use N32 userland, which
> is 64 bit registers and 32 bit addresses.  this is also what
> linux calls "x32" for x86 platforms.  obviously, this does
> require a 64 bit cpu.

That's why I asked "what does 'is 64-bit' mean".  Your previous reference to 
LP64 answers the question "does this program use 64 bit addresses".  There are 
at least two other possible questions: (a) does this program have access to 64 
bit registers, and (b) can this program do operations such as arithmetic on 64 
bit integers.  (a) presumably implies (b) but the two are not equivalent.  For 
example, N32 is (a) and (b) but O32 -- the old "mips32" port -- is (b) but not 
(a).
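
A trivial program shows which of those questions sizeof() can answer; the
expected figures are from memory, so treat them as approximate:

    #include <stdio.h>

    /*
     * Expected results, roughly:
     *              pointer  long  long long  64-bit registers?
     *   MIPS O32      4       4       8            no
     *   MIPS N32      4       4       8            yes
     *   MIPS N64      8       8       8            yes
     *
     * sizeof() answers the LP64/address question, but it cannot tell
     * O32 from N32: register width is an ABI property, not a C type.
     */
    int
    main(void)
    {
        printf("pointer=%zu long=%zu long long=%zu\n",
            sizeof(void *), sizeof(long), sizeof(long long));
        return 0;
    }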

paul



Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

2017-03-31 Thread Paul.Koning

> On Mar 31, 2017, at 4:16 PM, Thor Lancelot Simon  wrote:
> 
> On Fri, Mar 31, 2017 at 07:16:25PM +0200, Jaromír Doleček wrote:
>>> The problem is that it does not always use SIMPLE and ORDERED tags in a
>>> way that would facilitate the use of ORDERED tags to enforce barriers.
>> 
>> Our scsipi layer actually never issues ORDERED tags right now as far
>> as I can see, and there is currently no interface to get it set for an
>> I/O.
> 
> It's not obvious, but in fact ORDERED gets set for writes
> as a default, I believe -- in sd.c, I think?

Why would you do that?  I don't know that as standard SCSI practice, and it 
seems like a recipe for slow performance.

paul



Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Paul.Koning

> On Mar 28, 2017, at 2:37 PM, Taylor R Campbell 
>  wrote:
> 
> 
>> Date: Tue, 28 Mar 2017 16:58:58 +0200
>> From: Maxime Villard 
>> 
>> Having read several papers on the exploitation of cache latency to defeat
>> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
>> good mitigation on x86. However, some applications can legitimately use it,
>> so I would rather suggest restricting it to root instead.
> 
> Put barriers in the way of legitimate applications to thwart
> hypothetical attackers who will... step around them and use another
> time source, of which there are many options in the system?  This
> sounds more like cutting off the nose to spite the face than a good
> mitigation against real attacks.

More generally, it seems to me that the answer to timing attacks is not to 
attempt to make timing information unavailable (which is not doable, as has 
been explained already), but rather to fix the algorithm to remove the 
vulnerability.

paul



Re: "Wire" definitions and __packed

2016-10-06 Thread Paul.Koning

> On Oct 6, 2016, at 2:01 PM, Joerg Sonnenberger  wrote:
> 
> On Fri, Oct 07, 2016 at 04:59:30AM +1100, matthew green wrote:
>> John Nemeth writes:
>>> On Oct 6,  3:01pm, matthew green wrote:
>>> }
>>> } >  X86 doesn't have alignment restrictions.  The platform
>>> } > practically lets you get away with murder, and thus is not useful
>>> } > as a test platform.
>>> } 
>>> } FWIW, this hasn't been true since at least 1999 (SSE.)  also,
>>> 
>>> That only counts if somebody is using SSE, and I highly doubt
>>> that dhcpcd does.
>> 
>> GCC will emit SSE code even if you don't explicitly use them.
> 
> Like for inlined memset or memcpy...

Still, though, the original comment is largely valid: you can't do meaningful 
testing of changes that affect alignment on an x86 system, because for the most 
part it doesn't care.  (The same goes for various other CISC machines such as 
VAX.)  Also, structure padding is different for x86 than for most RISC machines.

The trouble with making a change in fundamental machinery and then doing a 
"test it to see if it breaks" pass is that it only exposes issues in the code 
paths that happened to be touched by the particular test.
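
A small illustration of the kind of bug that x86 testing cannot catch (the
function names are made up):

    #include <stdint.h>
    #include <string.h>

    /* Extract a 32-bit field that sits at byte offset 2 of a packet. */

    /* Classic bug: the cast produces a pointer that is not 4-byte
     * aligned.  x86 performs the misaligned load in hardware, so a test
     * there proves nothing; a strict-alignment CPU (SPARC, older ARM,
     * some MIPS configurations) faults on the same code. */
    uint32_t
    get_addr_buggy(const uint8_t *pkt)
    {
        return *(const uint32_t *)(pkt + 2);
    }

    /* Portable version: copy byte-wise, never relying on alignment. */
    uint32_t
    get_addr_portable(const uint8_t *pkt)
    {
        uint32_t a;

        memcpy(&a, pkt + 2, sizeof(a));
        return a;
    }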

paul



Re: Plan: journalling fixes for WAPBL

2016-09-28 Thread Paul.Koning

> On Sep 28, 2016, at 7:22 AM, Jaromír Doleček  
> wrote:
> 
> I think it's a fair assessment to say that on SATA with NCQ/31 tags (max
> is actually 31, not 32 tags), it's pretty much impossible to have
> acceptable write performance without using the write cache. We could never
> saturate even a drive with a 16MB cache with just 31 tags and 64k maxphys.
> So it's IMO not useful to design for a world without disk drive write
> cache.

I think that depends on the software.  In a SAN storage array I work on, we 
used to use SATA drives, always with cache disabled to avoid data loss due to 
power failure.  We had them running just fine with NCQ.  (For that matter, even 
without NCQ, though that takes major effort.)

So perhaps an optimization effort is called for, if people view this 
performance issue as worth the trouble.  Or you might decide that for 
performance SAS is the answer, and SATA is only for non-critical applications.

paul



Re: Plan: journalling fixes for WAPBL

2016-09-23 Thread Paul.Koning

> On Sep 23, 2016, at 10:51 AM, Warner Losh  wrote:
> 
> On Fri, Sep 23, 2016 at 7:38 AM, Thor Lancelot Simon  wrote:
>> On Fri, Sep 23, 2016 at 11:47:24AM +0200, Manuel Bouyer wrote:
>>> On Thu, Sep 22, 2016 at 09:33:18PM -0400, Thor Lancelot Simon wrote:
> AFAIK ordered tags only guarantees that the write will happen in order,
> but not that the writes are actually done to stable storage.
 
 The target's not allowed to report the command complete unless the data
 are on stable storage, except if you have write cache enable set in the
 relevant mode page.
 
 If you run SCSI drives like that, you're playing with fire.  Expect to get
 burned.  The whole point of tagged queueing is to let you *not* set that
 bit in the mode pages and still get good performance.
>>> 
>>> Now I remember that I did indeed disable disk write cache when I had
>>> scsi disks in production. It's been a while though.
>>> 
>>> But anyway, from what I remember you still need the disk cache flush
>>> operation for SATA, even with NCQ. It's not equivalent to the SCSI tags

Re: FUA and TCQ (was: Plan: journalling fixes for WAPBL)

2016-09-23 Thread Paul.Koning

> On Sep 23, 2016, at 5:49 AM, Edgar Fuß  wrote:
> 
>> The whole point of tagged queueing is to let you *not* set [the write 
>> cache] bit in the mode pages and still get good performance.
> I don't get that. My understanding was that TCQ allowed the drive to re-order 
> commands within the bounds described by the tags. With the write cache 
> disabled, all write commands must hit stable storage before being reported 
> completed. So what's the point of tagging with cacheing disabled?

I'm not sure.  But I have the impression that in the real world tagging is 
rarely, if ever, used.

paul