Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-27 Thread Dr. Greg Wettstein
On Thu, Feb 16, 2017 at 09:04:47AM -0500, Ken Goldman wrote:

Good morning to everyone, leveraging some time between planes.

> On 2/14/2017 9:38 AM, Dr. Greg Wettstein wrote:
> >
> >I don't think there is any doubt that running cryptographic primitives
> >in userspace is going to be faster than going to hardware.  Obviously
> >that also means there is no need for a TPM resource manager which has
> >been the subject of much discussion here.

> I don't understand that comment.
>
> The resource manager schedules user space access to the TPM.  It also
> handles swapping of objects in and out of the limited number of
> TPM slots.
> 
> Without a RM, either you'd have to permit only a single TPM connection,
> blocking all other connections, or you'd have different connections
> interfering with each other.

Yes, if multiple contexts of execution require access to the TPM a
resource manager is needed to arbitrate that access.

I think, however, that we are talking past one another a bit.

We design and build systems which implement autonomous
self-regulation.  As such we need a hardware based confirmation that
the machine is in a given behavioral state.  This requires that we
reference a hardware root of trust, i.e. the TPM.

Depending on the assurance granularity requirements, that may mean a
high rate of TPM verifications.  When I noticed you and James talking
about 'cloud based' levels of transactions I was assuming you were
operating at the transaction rates we build for, i.e. tens to hundreds
per second.  That didn't seem feasible given our hardware measurements
on Skylake and Kaby Lake based systems.

James had cited the CoreOS/Tectonic white paper as an example of TPMs
working at cloud scale.  Our conversation to date seems to indicate
that the accepted modality of security appears to be to do userspace
verification of container signatures.  Given the extensive dialogue in
the paper about using TPMs for security, we had mistakenly believed
that container verifications were being pinned to current platform
status, which didn't correlate with expected container start time
latencies.

Our behavioral assessment code is namespaced so a supervisory system
can make statements about the behavior of a container.  We have
concluded the only way that is possible is to use userspace TPM
implementations which can meet the necessary latency requirements.

Our point in all this is that it doesn't seem to make any sense to
implement anything in the kernel more than basic resource management.
If other 'virtualization' is needed, such as session state management
and the like, the community would seem to be served better by having a
solid userspace simulation environment, with appropriate hardware
security guarantees.  That would serve needs like re-keying support
for VPNaaS applications as well as high transaction rate environments,
i.e. why load the kernel with code to virtualize a resource when a
'user' can just be given its own TPM2 instance?

Just as an aside, has anyone given any thought about TPM2 resource
management in things like TXT/tboot environments?  The current tboot
code makes a rather naive assumption that it can take a handle slot to
protect its platform verification secret.  Doing resource management
correctly will require addressing extra-OS environments such as this
which may have TPM2 state requirement issues.

Our take away from all this is that it doesn't seem that we need to
worry about the fact that someone may have invented TPM2 hardware
which is faster than what we are developing on :-)

Have a good weekend.

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com
--
"If you ever teach a yodeling class, probably the hardest thing is to
 keep the students from just trying to yodel right off. You see, we build
 to that."
-- Jack Handey
   Deep Thoughts

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
tpmdd-devel mailing list
tpmdd-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tpmdd-devel


Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-27 Thread Dr. Greg Wettstein
On Fri, Feb 17, 2017 at 02:37:12PM +0200, Jarkko Sakkinen wrote:

Hi, I hope the week is ending well for everyone.

> On Fri, Feb 17, 2017 at 03:56:26AM -0600, Dr. Greg Wettstein wrote:
> > On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:
> > 
> > Good morning to everyone.
> > 
> > > On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > > > Just as an aside, has anyone given any thought about TPM2 resource
> > > > management in things like TXT/tboot environments?  The current tboot
> > > > code makes a rather naive assumption that it can take a handle slot to
> > > > protect its platform verification secret.  Doing resource management
> > > > correctly will require addressing extra-OS environments such as this
> > > > which may have TPM2 state requirement issues.
> > 
> > > The current implementation handles stuff created from regular
> > > /dev/tpm0 so I do not think this would be an issue. You can only
> > > access objects from a TPM space that are created within that space.
> > 
> > Unless I misunderstand, the number of transient objects which can be
> > managed is a characteristic of the hardware and is a limited resource,
> > hence our discussion on the notion of a resource manager to shuttle
> > context in and out of these limited slots.
> > 
> > On a Kabylake system, running the following command:
> > 
> > getcapability -cap 6 | grep trans
> > 
> > After booting into a TXT mediated measured launch environment (MLE) yields
> > the following:
> > 
> > TPM_PT 010e value 0003 TPM_PT_HR_TRANSIENT_MIN - the minimum number 
> > of transient objects that can be held in TPM RAM
> > 
> > TPM_PT 0207 value 0002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the 
> > number of additional transient objects that could be loaded into TPM RAM
> > 
> > Booting without TXT results in the getcapability call indicating that
> > three slots are available.  Based on that and reading the tboot code,
> > we are assuming the occupied slot is the ephemeral primary key
> > generated by tboot which seals the verification secret.
> > 
> > In an MLE it is possible to create and then flush a new ephemeral
> > primary key which results in the following getcapability output:
> > 
> > TPM_PT 0207 value 0003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
> > the number of additional transient objects that could be loaded into TPM RAM
> > 
> > Which is probably going to be pretty surprising to tboot in the event
> > that it tries to re-verify the system state after a suspend event.
> > 
> > So based on that it would seem there would need to be some semblance
> > of cooperation between the resource manager and an extra-OS
> > utilization of TPM2 resources such as tboot.
> > 
> > Thoughts?

> The driver swaps in and out all the objects for one send-receive
> cycle.  So unless the driver is sending a command to a TPM the
> resource manager occupies zero slots. I do not see a reason to
> change this pattern in the foreseeable future.
>
> I discussed some "lazier" schemes for swapping with James and
> Ken in the early Fall but came to the conclusion that it would make
> the RM really complicated. There would have to be a show-stopper
> workload to even start considering it.
>
> With the capacity of current TPMs and the amount of traffic and
> workloads it is really not worth the trouble.
>
> I guess the way we do swapping kind of indirectly sorts out the
> issue you described, doesn't it?

I'm not sure; we've pulled down your resource manager branch so we can
figure out the exact mechanics of how it works.  Based on a cursory
read of the code it appears as if it loops through all three transient
handle slots and attempts to context save each transient object it
finds.  So if it does that for each send/receive cycle it should
theoretically inter-operate with TXT/tboot.

As noted previously, with the current kernel driver, we can see that
tboot has allocated a slot for the ephemeral key which is used to seal
the memory verification secrets.  This key gets allocated to handle
8000 as one would anticipate.  However when we attempt to issue a
context save against that handle we get an error.

Interestingly, when we attempt to flush that handle manually we
receive an error as well, but the number of available transient
handles increases by one which suggests the context flush cleared the
slot.

It seems that we should be able to manually replicate what the
resource manager is doing with the standard kernel driver, or is this
an incorrect assumption?

We will have to spin up a kernel with your patches and see how it
reacts to the presence of the extra-OS handle allocation.

> /Jarkko

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com

Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-27 Thread Dr. Greg Wettstein
On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:

Good morning to everyone.

> On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > Just as an aside, has anyone given any thought about TPM2 resource
> > management in things like TXT/tboot environments?  The current tboot
> > code makes a rather naive assumption that it can take a handle slot to
> > protect its platform verification secret.  Doing resource management
> > correctly will require addressing extra-OS environments such as this
> > which may have TPM2 state requirement issues.

> The current implementation handles stuff created from regular
> /dev/tpm0 so I do not think this would be an issue. You can only
> access objects from a TPM space that are created within that space.

Unless I misunderstand, the number of transient objects which can be
managed is a characteristic of the hardware and is a limited resource,
hence our discussion on the notion of a resource manager to shuttle
context in and out of these limited slots.

On a Kabylake system, running the following command:

getcapability -cap 6 | grep trans

After booting into a TXT mediated measured launch environment (MLE) yields
the following:

TPM_PT 010e value 0003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of 
transient objects that can be held in TPM RAM

TPM_PT 0207 value 0002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the 
number of additional transient objects that could be loaded into TPM RAM

Booting without TXT results in the getcapability call indicating that
three slots are available.  Based on that and reading the tboot code,
we are assuming the occupied slot is the ephemeral primary key
generated by tboot which seals the verification secret.

In an MLE it is possible to create and then flush a new ephemeral
primary key which results in the following getcapability output:

TPM_PT 0207 value 0003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
the number of additional transient objects that could be loaded into TPM RAM

Which is probably going to be pretty surprising to tboot in the event
that it tries to re-verify the system state after a suspend event.

So based on that it would seem there would need to be some semblance
of cooperation between the resource manager and an extra-OS
utilization of TPM2 resources such as tboot.
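
The slot accounting above can be reproduced mechanically.  The
following Python sketch is illustrative only: it parses the two
property lines quoted above (with the wrapped lines rejoined) and
applies the same inference, namely that the difference between
TPM_PT_HR_TRANSIENT_MIN and TPM_PT_HR_TRANSIENT_AVAIL is the number of
slots held by something else.  It assumes the tool prints values in
hexadecimal; the function names are hypothetical and not part of any
TPM tooling.

```python
import re

# Text copied from the getcapability output quoted above, wrapped lines rejoined.
SAMPLE = (
    "TPM_PT 010e value 0003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of "
    "transient objects that can be held in TPM RAM\n"
    "TPM_PT 0207 value 0002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the "
    "number of additional transient objects that could be loaded into TPM RAM\n"
)

LINE_RE = re.compile(
    r"^TPM_PT\s+[0-9a-fA-F]+\s+value\s+([0-9a-fA-F]+)\s+(TPM_PT_\w+)"
)


def parse_properties(text):
    """Return {property_name: value} for every TPM_PT line in the text."""
    props = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Assumption: the value field is hexadecimal.
            props[match.group(2)] = int(match.group(1), 16)
    return props


def slots_in_use(props):
    """Slots apparently occupied, e.g. by tboot's ephemeral primary key."""
    return props["TPM_PT_HR_TRANSIENT_MIN"] - props["TPM_PT_HR_TRANSIENT_AVAIL"]


props = parse_properties(SAMPLE)
print(slots_in_use(props))
```

Since TPM_PT_HR_TRANSIENT_AVAIL is described as an estimate, the
result should be treated as a hint rather than an exact count.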

Thoughts?

> /Jarkko

Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com
--
"For a successful technology, reality must take precedence over public
 relations, for nature cannot be fooled."
-- Richard Feynman



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-17 Thread Jarkko Sakkinen
On Fri, Feb 17, 2017 at 03:56:26AM -0600, Dr. Greg Wettstein wrote:
> On Thu, Feb 16, 2017 at 10:33:04PM +0200, Jarkko Sakkinen wrote:
> 
> Good morning to everyone.
> 
> > On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> > > Just as an aside, has anyone given any thought about TPM2 resource
> > > management in things like TXT/tboot environments?  The current tboot
> > > code makes a rather naive assumption that it can take a handle slot to
> > > protect its platform verification secret.  Doing resource management
> > > correctly will require addressing extra-OS environments such as this
> > > which may have TPM2 state requirement issues.
> 
> > The current implementation handles stuff created from regular
> > /dev/tpm0 so I do not think this would be an issue. You can only
> > access objects from a TPM space that are created within that space.
> 
> Unless I misunderstand, the number of transient objects which can be
> managed is a characteristic of the hardware and is a limited resource,
> hence our discussion on the notion of a resource manager to shuttle
> context in and out of these limited slots.
> 
> On a Kabylake system, running the following command:
> 
> getcapability -cap 6 | grep trans
> 
> After booting into a TXT mediated measured launch environment (MLE) yields
> the following:
> 
> TPM_PT 010e value 0003 TPM_PT_HR_TRANSIENT_MIN - the minimum number 
> of transient objects that can be held in TPM RAM
> 
> TPM_PT 0207 value 0002 TPM_PT_HR_TRANSIENT_AVAIL - estimate of the 
> number of additional transient objects that could be loaded into TPM RAM
> 
> Booting without TXT results in the getcapability call indicating that
> three slots are available.  Based on that and reading the tboot code,
> we are assuming the occupied slot is the ephemeral primary key
> generated by tboot which seals the verification secret.
> 
> In an MLE it is possible to create and then flush a new ephemeral
> primary key which results in the following getcapability output:
> 
> TPM_PT 0207 value 0003 TPM_PT_HR_TRANSIENT_AVAIL - estimate of
> the number of additional transient objects that could be loaded into TPM RAM
> 
> Which is probably going to be pretty surprising to tboot in the event
> that it tries to re-verify the system state after a suspend event.
> 
> So based on that it would seem there would need to be some semblance
> of cooperation between the resource manager and an extra-OS
> utilization of TPM2 resources such as tboot.
> 
> Thoughts?

The driver swaps in and out all the objects for one send-receive cycle.
So unless the driver is sending a command to a TPM the resource manager
occupies zero slots. I do not see a reason to change this pattern in
the foreseeable future.
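
As a rough illustration of the pattern described above, the following
Python sketch models a chip with a fixed number of transient slots and
a per-client space that loads its objects only for the duration of one
send-receive cycle.  It is a toy model under stated assumptions, not
the driver code: ToyTPM, ToySpace and their methods are hypothetical
names, and save_and_flush stands in for a context save followed by a
flush.

```python
class ToyTPM:
    """Stand-in for the chip: a fixed number of transient object slots."""

    def __init__(self, slots=3):
        self.slots = slots
        self.loaded = set()

    def context_load(self, obj):
        if len(self.loaded) >= self.slots:
            raise RuntimeError("out of transient slots")
        self.loaded.add(obj)

    def save_and_flush(self, obj):
        # Models saving the context off-chip and then freeing the slot.
        self.loaded.remove(obj)


class ToySpace:
    """One client's objects; they live off-chip whenever no command runs."""

    def __init__(self, tpm):
        self.tpm = tpm
        self.saved = []

    def create(self, name):
        self.saved.append(name)

    def transmit(self, command):
        """Swap in, run one command, swap out; return peak slot usage."""
        swapped_in = []
        try:
            for obj in self.saved:
                self.tpm.context_load(obj)
                swapped_in.append(obj)
            # The command itself would be sent to the TPM here.
            return len(self.tpm.loaded)
        finally:
            for obj in swapped_in:
                self.tpm.save_and_flush(obj)


tpm = ToyTPM(slots=3)
space_a, space_b = ToySpace(tpm), ToySpace(tpm)
for name in ("a-key1", "a-key2", "a-key3"):
    space_a.create(name)
for name in ("b-key1", "b-key2"):
    space_b.create(name)

print(space_a.transmit("sign"), space_b.transmit("sign"), len(tpm.loaded))
```

In this model the spaces hold no slots between commands.  A slot
occupied from outside the OS (added by hand with
tpm.loaded.add("tboot-key")) would make the three-object space fail to
load, which is the interaction the tboot question is about.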

I discussed some "lazier" schemes for swapping with James and Ken
in the early Fall but came to the conclusion that it would make the RM
really complicated. There would have to be a show-stopper workload to
even start considering it.

With the capacity of current TPMs and the amount of traffic and
workloads it is really not worth the trouble.

I guess the way we do swapping kind of indirectly sorts out the issue
you described, doesn't it?

/Jarkko



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-16 Thread Jarkko Sakkinen
On Thu, Feb 16, 2017 at 02:06:42PM -0600, Dr. Greg Wettstein wrote:
> Just as an aside, has anyone given any thought about TPM2 resource
> management in things like TXT/tboot environments?  The current tboot
> code makes a rather naive assumption that it can take a handle slot to
> protect its platform verification secret.  Doing resource management
> correctly will require addressing extra-OS environments such as this
> which may have TPM2 state requirement issues.

The current implementation handles stuff created from regular /dev/tpm0
so I do not think this would be an issue. You can only access objects
from a TPM space that are created within that space.

/Jarkko



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-14 Thread James Bottomley
On Tue, 2017-02-14 at 08:38 -0600, Dr. Greg Wettstein wrote:
> On Fri, Feb 10, 2017 at 04:13:05PM -0500, Kenneth Goldman wrote:
> 
> Good morning to everyone.
> 
> > James Bottomley  wrote on 
> > 02/10/2017 11:46:03 AM:
> > 
> > > > quote: 810 milliseconds
> > > > verify signature: 635 milliseconds
> 
> For those who may be interested in this sort of thing I grabbed a few
> minutes and ran these basic verification primitives against a Kaby
> Lake system.
> 
> Average time for a quote is 600 milliseconds with a signature
> verification clocking in at 100 milliseconds.  The latter is
> consistent with what James found on his Skylake machine.
> 
> Latencies are still significant with things like container start
> times.
> 
> > > Part of the way of reducing the latency is not to use the TPM for
> > > things that don't require secrecy: 
> 
> > Agreed.  There are a few times one would verify a signature inside 
> > the TPM, but they're far from mainstream:
> > 
> > 1 - Early in the boot cycle, when there's no crypto library.
> > 
> > 2 - When the crypto library doesn't support the required algorithm.
> > 
> > 3 - When a ticket is needed to prove to the TPM later that it
> > verified the signature.
> 
> I don't think there is any doubt that running cryptographic
> primitives in userspace is going to be faster than going to hardware.
> Obviously that also means there is no need for a TPM resource
> manager which has been the subject of much discussion here.

That's a bit of a non-sequitur.  Ken's and my point was that although
you could run every crypto operation through the TPM, you don't (as you
say, because it's too slow), so you carefully select the ones that
preserve the confidentiality you're looking for.  To take the VPNaaS
use case again: the key material you're protecting is the client
identity key, so the only crypto operation you run through the TPM is
creation of the TLS client certificate verification signature.
Everything else, including the server certificate signature
verification, the symmetric key agreement and all the symmetric
encryption operations, you keep in userspace.  That means that instead
of requiring thousands of crypto operations per second from the TPM,
you basically require about one per hour per VPNaaS instance.

We need a RM because without one, given the constraints of TPM2, as few
as two VPNaaS instances can cause a resource exhaustion failure.

James

> The CoreOS paper makes significant reference to increased security
> guarantees inherent in the use of a TPM.  Obviously whatever uses
> those will have the noted latency constraints.
> 
> We have extended our behavior measurement verifications to the
> container level so we offer an explicit guarantee that a container
> has not operated in a manner which is inconsistent with the intent of
> its designer.  Getting the security guarantee we need requires a
> linkage to a hardware root of trust, hence our concerns about
> hardware latency.
> 
> Have a good day.
> 
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra-structure
> Fargo, ND  58102            development.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: g...@enjellic.com
> --
> "UNIX is simple and coherent, but it takes a genius (or at any rate,
>  a programmer) to understand and appreciate its simplicity."
> -- Dennis Ritchie
>    USENIX '87
> 




Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-12 Thread Ken Goldman
On 2/10/2017 11:46 AM, James Bottomley wrote:
> On Fri, 2017-02-10 at 04:03 -0600, Dr. Greg Wettstein wrote:
>> On Feb 9, 11:24am, James Bottomley wrote:

>> quote: 810 milliseconds
>> verify signature: 635 milliseconds
> ...
>
> Part of the way of reducing the latency is not to use the TPM for
> things that don't require secrecy: container signature verification is
> one such because the container is signed with a private key to which
> ...

Agreed.  There are a few times one would verify a signature inside the 
TPM, but they're far from mainstream:

1 - Early in the boot cycle, when there's no crypto library.

2 - When the crypto library doesn't support the required algorithm.

3 - When a ticket is needed to prove to the TPM later that it verified
the signature.





Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-11 Thread Kenneth Goldman
On Thu, Feb 09, 2017 at 12:04:26PM -0700, Jason Gunthorpe wrote:
Jarkko Sakkinen  wrote on 02/10/2017
03:48:37 AM:

> > This series should focus on allowing a user space RM to co-exist with
> > the in-kernel services - lets try and tackle the idea of a
> > policy-restricted or unpriv-safe cdev when someone comes up with a
> > comprehensive proposal..

First, does "coexist" mean in series (two layers of RM) or in parallel
(both have simultaneous access)?

Or does "in-kernel services" not include an RM?

Assuming in series, it will complicate the lower RM.  The main issue,
as always, is session context.  If you permit the upper RM to save context,
the lower RM has to track the mapping, because the lower layer can
alter the saved session context (regapping).


Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-10 Thread Kenneth Goldman
James Bottomley  wrote on 
02/10/2017 11:46:03 AM:

> > quote: 810 milliseconds
> > verify signature: 635 milliseconds
> 
> Part of the way of reducing the latency is not to use the TPM for
> things that don't require secrecy: 

Agreed.  There are a few times one would verify a signature inside the
TPM, but they're far from mainstream:

1 - Early in the boot cycle, when there's no crypto library.

2 - When the crypto library doesn't support the required algorithm.

3 - When a ticket is needed to prove to the TPM later that it verified
the signature.



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-10 Thread Kenneth Goldman
> > It does. My trusted keys implementation actually uses sessions.
>
> But as I read the code, I can't find where the kernel creates a
> session.  It looks like the session and hmac are passed in as option
> arguments, aren't they?

A bit of background.

In TPM 1.2, any authorization needed a session and an HMAC.

In TPM 2.0, authorization can be done using a plaintext password
(optionally) rather than an HMAC.  To me, kernel authorization
is a good use case for a plaintext password, since there is a
trusted path to the TPM.

When using a plaintext password, the caller does not require
startauthsession.  There is a special handle number that means
"plaintext password, no HMAC".  It's always available, and does
not occupy a session slot.

However, for the future ...

TPM 2.0 also has policy sessions.  E.g., use of the EK requires
a policy.

If the kernel ever wants to use policy, it needs startauthsession.

That's why I'm thinking that perhaps the space code should just
reserve ~2 sessions for its own use, so it never blocks
because user space has occupied all the session slots.
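
A minimal sketch of that reservation idea, in Python for brevity
rather than kernel C.  The SessionPool class, its method names and the
slot counts are hypothetical; the only value taken from the TPM 2.0
specification is TPM_RS_PW (0x40000009), the handle meaning "plaintext
password, no HMAC".

```python
TPM_RS_PW = 0x40000009  # password authorization handle; uses no session slot


def needs_session_slot(auth_handle):
    """Password authorization never consumes one of the session slots."""
    return auth_handle != TPM_RS_PW


class SessionPool:
    """Tracks session slots, holding some back for in-kernel users."""

    def __init__(self, total, kernel_reserved=2):
        self.total = total
        self.kernel_reserved = kernel_reserved
        self.user_in_use = 0
        self.kernel_in_use = 0

    def alloc(self, kernel=False):
        """Return True if a slot was granted to the caller."""
        free = self.total - self.user_in_use - self.kernel_in_use
        if free <= 0:
            return False
        if kernel:
            self.kernel_in_use += 1
            return True
        # User space may not take slots still owed to the kernel reserve.
        reserve_left = max(self.kernel_reserved - self.kernel_in_use, 0)
        if free - reserve_left <= 0:
            return False
        self.user_in_use += 1
        return True


pool = SessionPool(total=3, kernel_reserved=2)  # slot count is an assumption
print(pool.alloc(), pool.alloc(), pool.alloc(kernel=True))
```

With three slots and two reserved, the second user space request is
refused while a kernel request still succeeds, so a policy session
started by the kernel would not be blocked by user space.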


Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-10 Thread James Bottomley
On Fri, 2017-02-10 at 04:03 -0600, Dr. Greg Wettstein wrote:
> On Feb 9, 11:24am, James Bottomley wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for
> global sessi
> 
> Good morning to everyone.

Is there any way you could fix your email client?  It's setting
In-Reply-To: headers like this

In-reply-to: James Bottomley <james.bottom...@hansenpartnership.com> "Re: 
[tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion" 
(Feb  9, 11:24am)

Not using the message id breaks threading for everyone.

> > On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> > > Referring back to Ken's comments about having 20+ clients waiting
> > > to get access to the hardware.  Even with the focus in TPM2 on
> > > having it be more of a cryptographic accelerator are we convinced
> > > that the hardware is ever going to be fast enough for a model of
> > > having it directly service large numbers of transactions in
> > > something like a 'cloud' model?
> 
> > It's already in use as such today:
> > 
> > https://tectonic.com/assets/pdf/TectonicTrustedComputing.pdf
> 
> We are familiar with this work.  I'm not sure, however, that this 
> work is representative of the notion of using TPM hardware to support 
> a transactional environment, particularly at the cloud/container
> level.

It allows for cloud clients to request attestations.  The next step is
to allow containers to provision key material and PCR locked blobs
securely to the TPM for use by correctly attested containers.  All of
those are cloud scale use cases.

> There is not a great deal of technical detail on the CoreOS integrity
> architecture but it appears they are using TPM hardware to validate
> container integrity.  I'm not sure this type of environment reflects
> the ability of TPM hardware to support transactional throughputs in 
> an environment such as financial transaction processing.

OK, so in the cloud neither key provisioning nor attestation has a huge
latency requirement.  This appears to be your concern?  All I'd say is
that the fact that there are use cases that can work at cloud scale
doesn't mean that every use case can.

> Intel's Clear Container work cites the need to achieve container
> startup times of 150 milliseconds and they are currently claiming 45
> milliseconds as their optimal time.  This work was designed to
> demonstrate the feasibility of providing virtual machine isolation
> guarantees to containers and as such one of the mandates was to
> achieve container start times comparable to standard namespaces.

There are ephemeral container use cases where the lifetimes are of this
order, but they're not every use case (In fact, even in the devops
environment, they're still a minority).

> I ran some very rough timing metrics on one of our Skylake
> development systems with hardware TPM2 support.  Here are the elapsed
> times for two common verification operations which I assume would be
> at the heart of generating any type of reasonable integrity
> guarantee:
> 
> quote: 810 milliseconds
> verify signature: 635 milliseconds

That's interesting, my Skylake system has these figures down around
100ms or so ... however, I agree that 100ms is the order of this.
Which is still significant compared to container start times.

> This is with the verifying key loaded into the chip.  The elapsed
> time to load and validate a key into the chip averages 1200
> milliseconds. Since we are discussing a resource manager which would
> be shuttling context into and out of the limited resource slots on
> the chip I believe it is valid to consider this overhead as well.
> 
> This suggests that just a signature verification on the integrity of 
> a container is a factor of 4.2 times greater than a well accepted
> start time metric for container technology.

Part of the way of reducing the latency is not to use the TPM for
things that don't require secrecy: container signature verification is
one such because the container is signed with a private key to which
you know the public component ... you can verify it on the host without
needing to trouble the TPM.  We only use the TPM for state quotes,
unsealing and signature generation.

> Based on that I'm assuming that if TPM based integrity guarantees are
> being implemented they are only on ingress of the container into the
> cloud environment.  I'm assuming an alternate methodology must be in
> place to protect against time of measurement/time of use issues.
> 
> Maybe people have better TPM2 hardware than what we have.  I was
> going to run this on a Kaby Lake reference system but it appears that
> TXT is causing some type of context depletion problems which we
> need to r

Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-10 Thread Dr. Greg Wettstein
On Feb 9, 11:24am, James Bottomley wrote:
} Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi

Good morning to everyone.

> On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> > Referring back to Ken's comments about having 20+ clients waiting to
> > get access to the hardware.  Even with the focus in TPM2 on having it
> > be more of a cryptographic accelerator are we convinced that the
> > hardware is ever going to be fast enough for a model of having it
> > directly service large numbers of transactions in something like a
> > 'cloud' model?

> It's already in use as such today:
> 
> https://tectonic.com/assets/pdf/TectonicTrustedComputing.pdf

We are familiar with this work.  I'm not sure, however, that this work
is representative of the notion of using TPM hardware to support a
transactional environment, particularly at the cloud/container level.

There is not a great deal of technical detail on the CoreOS integrity
architecture but it appears they are using TPM hardware to validate
container integrity.  I'm not sure this type of environment reflects
the ability of TPM hardware to support transactional throughputs in an
environment such as financial transaction processing.

Intel's Clear Container work cites the need to achieve container
startup times of 150 milliseconds and they are currently claiming 45
milliseconds as their optimal time.  This work was designed to
demonstrate the feasibility of providing virtual machine isolation
guarantees to containers and as such one of the mandates was to
achieve container start times comparable to standard namespaces.

I ran some very rough timing metrics on one of our Skylake development
systems with hardware TPM2 support.  Here are the elapsed times for
two common verification operations which I assume would be at the
heart of generating any type of reasonable integrity guarantee:

quote: 810 milliseconds
verify signature: 635 milliseconds

This is with the verifying key loaded into the chip.  The elapsed time
to load and validate a key into the chip averages 1200 milliseconds.
Since we are discussing a resource manager which would be shuttling
context into and out of the limited resource slots on the chip I
believe it is valid to consider this overhead as well.

This suggests that just a signature verification on the integrity of a
container is a factor of 4.2 times greater than a well accepted start
time metric for container technology.
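
The 4.2 figure can be reproduced from the numbers above.  A quick
sketch, assuming the 150 millisecond Clear Containers target is the
start time metric being compared against; the variable and function
names are only illustrative:

```python
# Elapsed times reported above for the Skylake system, in milliseconds.
QUOTE_MS = 810
VERIFY_MS = 635
KEY_LOAD_MS = 1200

# Assumption: the "well accepted start time metric" is the 150 ms
# Clear Containers target, not the 45 ms optimal figure.
CONTAINER_START_MS = 150


def ratio_to_start(elapsed_ms: float) -> float:
    """How many container start times fit into one TPM operation."""
    return elapsed_ms / CONTAINER_START_MS


verify_only = ratio_to_start(VERIFY_MS)
verify_with_load = ratio_to_start(KEY_LOAD_MS + VERIFY_MS)

print(f"verify only:       {verify_only:.1f}x")
print(f"key load + verify: {verify_with_load:.1f}x")
```

If the key first has to be loaded into the chip, the same comparison
comes out at roughly 12 times the start time target.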

Based on that I'm assuming that if TPM based integrity guarantees are
being implemented they are only on ingress of the container into the
cloud environment.  I'm assuming an alternate methodology must be in
place to protect against time of measurement/time of use issues.

Maybe people have better TPM2 hardware than what we have.  I was going
to run this on a Kaby Lake reference system but it appears that TXT is
causing some type of context depletion problems which we need to
run down.

> We're also planning something like this in the IBM Cloud.

I assume if there is an expectation of true transactional times you
will either have better hardware than current generation TPM2
technology or you will be using userspace simulators anchored with a
hardware TPM trust root.

Ken's reflection of having 21-22 competing transactions would appear
to have problematic latency issues given our measurements.
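
A back-of-the-envelope sketch of that concern, assuming the TPM
services one command at a time and every waiting client issues a
single signature verification at the 635 millisecond cost measured
above (the function name is made up for illustration):

```python
VERIFY_MS = 635  # elapsed time for one signature verification, from above


def worst_case_wait_ms(clients: int, op_ms: int = VERIFY_MS) -> int:
    """Time until the last of `clients` queued requests completes.

    Assumes strictly serialized access to the chip and ignores any
    context swapping done by a resource manager.
    """
    return clients * op_ms


for n in (21, 22):
    print(f"{n} clients: {worst_case_wait_ms(n) / 1000:.1f} s")
```

On those assumptions the last client in line waits on the order of 13
to 14 seconds.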

I influence engineering for a company which builds deterministically
modeled Linux platforms.  We've spent a lot of time considering TPM2
hardware bottlenecks since they constrain the rate at which we can
validate platform behavioral measurements.

We have a variation of this work which allows SGX OCALL's to validate
platform behavior in order to provide a broader TCB resource spectrum
to the enclave and hardware TPM performance is problematic there as
well.

> James

Have a good weekend.

Greg

}-- End of excerpt from James Bottomley

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com
--
"After being a technician for 2 years, I've discovered if people took
 care of their health with the same reckless abandon as their computers,
 half would be at the kitchen table on the phone with the hospital, trying
 to remove their appendix with a butter knife."
-- Brian Jones

-- 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
tpmdd-devel mailing list
tpmdd-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tpmdd-devel


Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-10 Thread Jarkko Sakkinen
On Thu, Feb 09, 2017 at 12:04:26PM -0700, Jason Gunthorpe wrote:
> On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > > userspace instance with subsequent relinquishment of privilege.  At
> > > that point one has the freedom to implement all sorts of policy.
> > 
> > If you look at the patch set that I sent yesterday it exactly has a
> > feature that makes it more lean for a privileged process to implement
> > a resource manager.
> 
> I continue to think, based on comments like this, that you should not
> implement tpms0 in the first revision either. That is also something
> we have to live with forever, and it can never become the 'policy
> limited' or 'unpriv safe' access point to the kernel.  ie go back to
> something based on tpm0 with ioctl.

With /dev/tpms0 I'm fairly certain that it is the right way to go, as it
makes sense to have it as close to a drop-in replacement for /dev/tpm0
as possible. There is far more certainty that this API is something
that most people will want to have.

> This series should focus on allowing a user space RM to co-exist with
> the in-kernel services - let's try and tackle the idea of a
> policy-restricted or unpriv-safe cdev when someone comes up with a
> comprehensive proposal.

Sure. I do agree with this.

> > The current patch set does not define policy. The simple policy
> > addition that could be added soon is the limit of connections
> > because it is easy to implement in a non-intrusive way.
> 
> It is also trivial for a userspace RM to limit the number of sessions
> or connections or otherwise to manage this limitation. It is hard to
> see why we'd need kernel support for this.
> 
> The main issue from the kernel perspective is how to allow sessions
> to be used in-kernel and continue to make progress when they start to
> run out.
> 
> Jason

This is an issue but in the current patch set there's nothing that would
make it harder to sort out.

/Jarkko



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread Jason Gunthorpe
On Thu, Feb 09, 2017 at 11:29:51AM -0800, James Bottomley wrote:
> On Thu, 2017-02-09 at 12:04 -0700, Jason Gunthorpe wrote:
> > On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > > The current patch set does not define policy. The simple policy
> > > addition that could be added soon is the limit of connections
> > > > because it is easy to implement in a non-intrusive way.
> > 
> > It is also trivial for a userspace RM to limit the number of sessions
> > or connections or otherwise to manage this limitation. It is hard to
> > see why we'd need kernel support for this.
> 
> Because the kernel is a primary TPM user.

When I said 'this' I meant a kernel policy to limit the number of
user connections.

Jason



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread James Bottomley
On Thu, 2017-02-09 at 03:06 -0600, Dr. Greg Wettstein wrote:
> On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for
> global sessi
> 
> Good morning, I hope the day is going well for everyone.
> 
> > I'm kind of leaning toward the opinion that we would leave this
> > commit out from the first kernel release that will contain the
> > resource manager with similar rationale as Jason gave me for
> > whitelisting: get the basic stuff in and once it is used with some
> > workloads whitelisting and exhaustion will eventually take the
> > right form.
> > 
> > How would you feel about this?
> 
> I wasn't able to locate the exact context to include but we noted 
> with interest Ken's comments about his need to support a model where 
> a client needs a TPM session for transaction purposes which can last 
> a highly variable amount of time.  That and concerns about command
> white-listing, hardware denial of service and related issues tend to
> underscore our concerns about how much TPM resource management should
> go into the kernel.
> 
> Once an API is in the kernel we live with it forever.

This actually is far too strong a statement:  Once you make API
guarantees, you have to live with them forever, but there's a
considerable difference between an API guarantee and the API itself.
For instance the kernel overlay filesystem has gone through several
iterations of file whiteouts (showing a file as deleted above a read
only copy): we began with an inode flag, moved to an extended attribute
and finally ended up with a device.  Each of those three changes was
fairly radical to the VFS API, but didn't fundamentally alter the API
guarantee (that users wouldn't see a file after it was deleted on an
overlay).

The API guarantee /dev/tpms0 is adding is that you won't see TPM out of
memory errors based on what other people are doing, so I think it's a
simple isolation guarantee we can live with long term.  I think that's
a solidly defensible one.

However, right at the moment the guarantee isn't that you won't be
affected by *anything* another user does, so it's a weak guarantee: you
will see uncorrectable regapping errors based on what others are doing
and you will see global session exhaustion.

I think we begin with the defensible weak guarantee and discuss how to
strengthen it.

James




Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread James Bottomley
On Thu, 2017-02-09 at 12:04 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > The current patch set does not define policy. The simple policy
> > addition that could be added soon is the limit of connections
> > because it is easy to implement in a non-intrusive way.
> 
> It is also trivial for a userspace RM to limit the number of sessions
> or connections or otherwise to manage this limitation. It is hard to
> see why we'd need kernel support for this.

Because the kernel is a primary TPM user.  We can't have the kernel
call on the in-userspace resource manager without causing a deadlock,
so we need as much of the RM as is needed to support the kernel in the
kernel itself.

James




Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread Jason Gunthorpe
On Thu, Feb 09, 2017 at 05:19:22PM +0200, Jarkko Sakkinen wrote:
> > userspace instance with subsequent relinquishment of privilege.  At
> > that point one has the freedom to implement all sorts of policy.
> 
> If you look at the patch set that I sent yesterday it exactly has a
> feature that makes it more lean for a privileged process to implement
> a resource manager.

I continue to think, based on comments like this, that you should not
implement tpms0 in the first revision either. That is also something
we have to live with forever, and it can never become the 'policy
limited' or 'unpriv safe' access point to the kernel.  ie go back to
something based on tpm0 with ioctl.

This series should focus on allowing a user space RM to co-exist with
the in-kernel services - let's try and tackle the idea of a
policy-restricted or unpriv-safe cdev when someone comes up with a
comprehensive proposal.

> The current patch set does not define policy. The simple policy
> addition that could be added soon is the limit of connections
> because it is easy to implement in a non-intrusive way.

It is also trivial for a userspace RM to limit the number of sessions
or connections or otherwise to manage this limitation. It is hard to
see why we'd need kernel support for this.

The main issue from the kernel perspective is how to allow sessions
to be used in-kernel and continue to make progress when they start to
run out.

Jason



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread Jarkko Sakkinen
On Thu, Feb 09, 2017 at 03:06:38AM -0600, Dr. Greg Wettstein wrote:
> On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
> } Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi
> 
> Good morning, I hope the day is going well for everyone.
> 
> > I'm kind of leaning toward the opinion that we would leave this commit out
> > from the first kernel release that will contain the resource manager
> > with similar rationale as Jason gave me for whitelisting: get the
> > basic stuff in and once it is used with some workloads whitelisting
> > and exhaustion will eventually take the right form.
> >
> > How would you feel about this?
> 
> I wasn't able to locate the exact context to include but we noted with
> interest Ken's comments about his need to support a model where a
> client needs a TPM session for transaction purposes which can last a
> highly variable amount of time.  That and concerns about command
> white-listing, hardware denial of service and related issues tend to
> underscore our concerns about how much TPM resource management should
> go into the kernel.
> 
> Once an API is in the kernel we live with it forever.  Particularly
> with respect to TPM2, our field experiences suggest it is way too
> early to bake long term functionality into the kernel.
> 
> Referring back to Ken's comments about having 20+ clients waiting to
> get access to the hardware.  Even with the focus in TPM2 on having it
> be more of a cryptographic accelerator are we convinced that the
> hardware is ever going to be fast enough for a model of having it
> directly service large numbers of transactions in something like a
> 'cloud' model?

I doubt it. Personally I would rather just limit the number of
connections to /dev/tpms0 than have a complex lease model (like the one
implemented in this commit). That limit could have a '0' setting, which
would disable it so that it doesn't cause harm to those who do not
need it.
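
A minimal sketch of that idea, written in Python rather than kernel C
and using hypothetical names (`ConnectionLimiter`, `max_connections`);
it is not taken from the patch set:

```python
import threading


class ConnectionLimiter:
    """Caps the number of simultaneously open connections.

    A limit of 0 disables the cap entirely, so users who do not
    need the policy are unaffected by it.
    """

    def __init__(self, max_connections: int = 0) -> None:
        self._max = max_connections
        self._open = 0
        self._lock = threading.Lock()

    def try_open(self) -> bool:
        """Return True if a new connection may be opened."""
        with self._lock:
            if self._max != 0 and self._open >= self._max:
                # A kernel analogue would fail the open() call here;
                # which errno it would use is left open in this sketch.
                return False
            self._open += 1
            return True

    def close(self) -> None:
        """Release one previously opened connection."""
        with self._lock:
            if self._open > 0:
                self._open -= 1


limiter = ConnectionLimiter(max_connections=2)
results = [limiter.try_open() for _ in range(3)]  # third open is refused
limiter.close()
reopened = limiter.try_open()  # a slot is free again

unlimited = ConnectionLimiter(max_connections=0)
unlimited_results = [unlimited.try_open() for _ in range(100)]
```

The point of the '0' case is that the default behaviour stays exactly
as it is without the limit.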

> The industry has very solid userspace implementations of TPM2.  It
> seems that with respect to resource management about all we would want
> in the kernel is enough management to allow multiple privileged
> userspace processes to establish a root of trust for a TPM2 based
> userspace instance with subsequent relinquishment of privilege.  At
> that point one has the freedom to implement all sorts of policy.

If you look at the patch set that I sent yesterday it exactly has a
feature that makes it more lean for a privileged process to implement
a resource manager.

> Given the potential lifespan of these security technologies I think a
> kernel design needs to factor in the availability of trusted execution
> environments such as SGX as well.  Politics aside, such environments
> do have the ability to significantly modify the guarantees which can
> be afforded to architectural models which focus on using the hardware
> TPM as a root of trust for userspace implementations of 'TPM'
> functionality and policy.

Agreed.

> We can always add functionality to the kernel but we can never
> subtract.  It is way too early to lock security architecture decisions
> into the kernel.

The current patch set does not define policy. The simple policy
addition that could be added soon is the limit of connections
because it is easy to implement in a non-intrusive way.

> 
> > /Jarkko
> 
> Have a good weekend.
> 
> Greg

Likewise!

/Jarkko



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-02-09 Thread Dr. Greg Wettstein
On Jan 30, 11:58pm, Jarkko Sakkinen wrote:
} Subject: Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global sessi

Good morning, I hope the day is going well for everyone.

> I'm kind of leaning toward the opinion that we would leave this commit out
> from the first kernel release that will contain the resource manager
> with similar rationale as Jason gave me for whitelisting: get the
> basic stuff in and once it is used with some workloads whitelisting
> and exhaustion will eventually take the right form.
>
> How would you feel about this?

I wasn't able to locate the exact context to include but we noted with
interest Ken's comments about his need to support a model where a
client needs a TPM session for transaction purposes which can last a
highly variable amount of time.  That and concerns about command
white-listing, hardware denial of service and related issues tend to
underscore our concerns about how much TPM resource management should
go into the kernel.

Once an API is in the kernel we live with it forever.  Particularly
with respect to TPM2, our field experiences suggest it is way too
early to bake long term functionality into the kernel.

Referring back to Ken's comments about having 20+ clients waiting to
get access to the hardware.  Even with the focus in TPM2 on having it
be more of a cryptographic accelerator are we convinced that the
hardware is ever going to be fast enough for a model of having it
directly service large numbers of transactions in something like a
'cloud' model?

The industry has very solid userspace implementations of TPM2.  It
seems that with respect to resource management about all we would want
in the kernel is enough management to allow multiple privileged
userspace processes to establish a root of trust for a TPM2 based
userspace instance with subsequent relinquishment of privilege.  At
that point one has the freedom to implement all sorts of policy.

Given the potential lifespan of these security technologies I think a
kernel design needs to factor in the availability of trusted execution
environments such as SGX as well.  Politics aside, such environments
do have the ability to significantly modify the guarantees which can
be afforded to architectural models which focus on using the hardware
TPM as a root of trust for userspace implementations of 'TPM'
functionality and policy.

We can always add functionality to the kernel but we can never
subtract.  It is way too early to lock security architecture decisions
into the kernel.

> /Jarkko

Have a good weekend.

Greg

}-- End of excerpt from Jarkko Sakkinen

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: g...@enjellic.com
--
"If I'd listened to customers, I'd have given them a faster horse."
-- Henry Ford



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-31 Thread James Bottomley
On Tue, 2017-01-31 at 14:28 -0500, Ken Goldman wrote:
> On 1/30/2017 11:04 AM, James Bottomley wrote:
> > 
> > This depends what your threat model is.  For ssh keys, you worry
> > that someone might be watching, so you use HMAC authority even for 
> > a local TPM.
> 
> If someone can "watch" my local process, they can capture my password
> anyway.  Does using a password that the attacker knows to HMAC the 
> command help?

It's about attack surface.  If you want my password and I use TPM_RS_PW
then you either prise it out of my app or snoop the command path.  If I
always use HMAC, I know you can only prise it out of my app (reduction
in attack surface) and I can plan defences accordingly (not saying I'll
be successful, just saying I have a better idea where the attack is
coming from).

> > In the cloud, you don't quite know where the TPM is, so again you'd
> > use HMAC sessions ... however, in both use cases the sessions 
> > should be very short lived.
> 
> If your entire application is in the cloud, then I think the same 
> question as above applies.
> 
> If you have your application on one platform (that you trust) and the
> TPM is on another (that you don't trust), then I absolutely agree 
> that HMAC (and parameter encryption) are necessary.

It's attack surface again ... although lengthening the transmission
pathway, which happens in the cloud, correspondingly increases that
surface.

Look at it this way: if your TPM were network remote, would you still
think TPM_RS_PW to be appropriate?  I suspect not because the network
is seen as a very insecure pathway.  We can argue about the relative
security or insecurity of other pathways to the TPM, but it's
unarguable that using HMAC and parameter encryption means we don't have
to (and so is best practice).
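
To make the attack surface point concrete, here is a simplified
sketch.  It is not the TPM 2.0 wire protocol: the `password_auth` and
`hmac_auth` helpers, the nonce handling and the command bytes are all
assumptions, chosen only to show that a password authorization puts
the secret on the command path while an HMAC authorization does not.

```python
import hashlib
import hmac
import os


def password_auth(command: bytes, auth_value: bytes) -> bytes:
    """Password-style authorization: the secret travels with the command."""
    return command + auth_value


def hmac_auth(command: bytes, auth_value: bytes, nonce: bytes) -> bytes:
    """HMAC-style authorization: only a keyed digest travels with the command."""
    digest = hmac.new(auth_value, command + nonce, hashlib.sha256).digest()
    return command + digest


def device_checks_hmac(wire: bytes, command: bytes,
                       auth_value: bytes, nonce: bytes) -> bool:
    """Receiving side: recompute the digest and compare in constant time."""
    expected = hmac.new(auth_value, command + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(wire[len(command):], expected)


secret = b"correct horse"                # hypothetical authorization value
command = b"hypothetical-sign-command"   # stand-in for a marshalled command
nonce = os.urandom(16)                   # fresh value for this command

snooped_password = password_auth(command, secret)
snooped_hmac = hmac_auth(command, secret, nonce)

# What someone watching the command path could recover in each case.
password_leaked = secret in snooped_password
hmac_leaked = secret in snooped_hmac
hmac_accepted = device_checks_hmac(snooped_hmac, command, secret, nonce)
```

With the password form a snooper on the pathway reads the secret
directly; with the HMAC form the only place left to get it is the
application itself.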

James




Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-31 Thread Ken Goldman
On 1/30/2017 11:04 AM, James Bottomley wrote:
>
> This depends what your threat model is.  For ssh keys, you worry
> that someone might be watching, so you use HMAC authority even for a
> local TPM.

If someone can "watch" my local process, they can capture my password 
anyway.  Does using a password that the attacker knows to HMAC the 
command help?

> In the cloud, you don't quite know where the TPM is, so again you'd
> use HMAC sessions ... however, in both use cases the sessions should
> be very short lived.

If your entire application is in the cloud, then I think the same 
question as above applies.

If you have your application on one platform (that you trust) and the 
TPM is on another (that you don't trust), then I absolutely agree that 
HMAC (and parameter encryption) are necessary.








Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-31 Thread Jarkko Sakkinen
On Mon, Jan 30, 2017 at 02:13:08PM -0800, James Bottomley wrote:
> On Mon, 2017-01-30 at 23:58 +0200, Jarkko Sakkinen wrote:
> > On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> > > On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > > > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > > > 
> > > > > > Beware the nasty corner case:
> > > > > > 
> > > > > > - Application asks for a session and gets 0200
> > > > > > 
> > > > > > - Time elapses and 0200 gets forcibly flushed
> > > > > > 
> > > > > > - Later, app comes back, asks for a second session and again
> > > > > > gets
> > > > > > 0200.
> > > > > > 
> > > > > > - App gets very confused.
> > > > > > 
> > > > > > May it be better to close the connection completely, which
> > > > > > the
> > > > > > application can detect, than flush a session and give this
> > > > > > corner
> > > > > > case?
> > > > > 
> > > > > if I look at the code I've written, I don't know what the
> > > > > session
> > > > > number is, I just save sessionHandle in a variable for later
> > > > > use 
> > > > > (lets say to v1).  If I got the same session number returned at
> > > > > a 
> > > > > later time and placed it in v2, all I'd notice is that an 
> > > > > authorization using v1 would fail.  I'm not averse to killing
> > > > > the 
> > > > > entire connection but, assuming you have fallback, it might be 
> > > > > kinder simply to ensure that the operations with the reclaimed 
> > > > > session fail (which is what the code currently does).
> > > > 
> > > > My worry is that this session failure cannot be detected by the 
> > > > application.  An HMAC failure could cause the app to tell a user
> > > > that
> > > > they entered the wrong password.  Misleading.  On the TPM, it
> > > > could 
> > > > trigger the dictionary attack lockout.  For a PIN index, it could
> > > > consume a failure count.  Killing a policy session that has e.g.,
> > > > a 
> > > > policy signed term could cause the application to go back to some
> > > > external entity for another authorization signature.
> > > > 
> > > > Let's go up to the stack.  What's the attack?
> > > > 
> > > > If we're worried about many simultaneous applications (wouldn't
> > > > that 
> > > > be wonderful), why not just let startauthsession fail?  The 
> > > > application can just retry periodically.
> > > 
> > > How in that scenario do we ensure that a session becomes available?
> > >  Once that's established, there's no real difference between
> > > retrying
> > > the startauthsession in the kernel when we know the session is
> > > available and forcing userspace to do the retry except that the
> > > former
> > > has a far greater chance of success (and it's only about 6 lines of
> > > code).
> > > 
> > > >   Just allocate them in triples so there's no deadlock.
> > > 
> > > Is this the application or the kernel?  If it's the kernel, that
> > > adds a
> > > lot of complexity.
> > > 
> > > > If we're worried about a DoS attack, killing a session just helps
> > > > the
> > > > attacker.  The attacker can create a few connections and spin on 
> > > > startauthsession, locking everyone out anyway.
> > > 
> > > There are two considerations here: firstly we'd need to introduce a
> > > mechanism to "kill" the connection.  Probably we'd simply error
> > > every
> > > command on the space until it was closed.  The second is which
> > > scenario
> > > is more reasonable: Say the application simply forgot to flush the
> > > session and will never use it again.  Simply reclaiming the session
> > > would produce no effect at all on the application in this scenario.
> > >  However, I have no data to say what's likely.
> > > 
> > > > ~~
> > > > 
> > > > Also, let's remember that this is a rare application.  Sessions
> > > > are 
> > > > only needed for remote access (requiring encryption, HMAC or
> > > > salt), 
> > > > or policy sessions.
> > > 
> > > This depends what your threat model is.  For ssh keys, you worry
> > > that
> > > someone might be watching, so you use HMAC authority even for a
> > > local
> > > TPM.  In the cloud, you don't quite know where the TPM is, so again
> > > you'd use HMAC sessions ... however, in both use cases the sessions
> > > should be very short lived.
> > > 
> > > > ~~
> > > > 
> > > > Should the code also reserve a session for the kernel?  Mark it
> > > > not 
> > > > kill'able?
> > > 
> > > At the moment, the kernel doesn't use sessions, so let's worry
> > > about
> > > that problem at the point it arises (if it ever arises).
> > > 
> > > James
> > 
> > It does. My trusted keys implementation actually uses sessions.
> 
> But as I read the code, I can't find where the kernel creates a
> session.  It looks like the session and hmac are passed in as option
> arguments, aren't they?

Yes. Sorry, I mixed up things.

> > I'm kind of leaning toward the opinion that we would leave this commit out
> > from the first kernel release that will contain the resource manager 
> > with 

Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-30 Thread James Bottomley
On Mon, 2017-01-30 at 23:58 +0200, Jarkko Sakkinen wrote:
> On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> > On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > > 
> > > > > Beware the nasty corner case:
> > > > > 
> > > > > - Application asks for a session and gets 0200
> > > > > 
> > > > > - Time elapses and 0200 gets forcibly flushed
> > > > > 
> > > > > - Later, app comes back, asks for a second session and again
> > > > > gets
> > > > > 0200.
> > > > > 
> > > > > - App gets very confused.
> > > > > 
> > > > > May it be better to close the connection completely, which
> > > > > the
> > > > > application can detect, than flush a session and give this
> > > > > corner
> > > > > case?
> > > > 
> > > > if I look at the code I've written, I don't know what the
> > > > session
> > > > number is, I just save sessionHandle in a variable for later
> > > > use 
> > > > (lets say to v1).  If I got the same session number returned at
> > > > a 
> > > > later time and placed it in v2, all I'd notice is that an 
> > > > authorization using v1 would fail.  I'm not averse to killing
> > > > the 
> > > > entire connection but, assuming you have fallback, it might be 
> > > > kinder simply to ensure that the operations with the reclaimed 
> > > > session fail (which is what the code currently does).
> > > 
> > > My worry is that this session failure cannot be detected by the 
> > > application.  An HMAC failure could cause the app to tell a user
> > > that they entered the wrong password.  Misleading.  On the TPM, it
> > > could trigger the dictionary attack lockout.  For a PIN index, it
> > > could consume a failure count.  Killing a policy session that has
> > > e.g., a policy signed term could cause the application to go back
> > > to some external entity for another authorization signature.
> > > 
> > > Let's go up to the stack.  What's the attack?
> > > 
> > > If we're worried about many simultaneous applications (wouldn't
> > > that be wonderful), why not just let startauthsession fail?  The
> > > application can just retry periodically.
> > 
> > How in that scenario do we ensure that a session becomes available?
> > Once that's established, there's no real difference between retrying
> > the startauthsession in the kernel when we know the session is
> > available and forcing userspace to do the retry except that the
> > former has a far greater chance of success (and it's only about 6
> > lines of code).
> > 
> > >   Just allocate them in triples so there's no deadlock.
> > 
> > Is this the application or the kernel?  If it's the kernel, that
> > adds a lot of complexity.
> > 
> > > If we're worried about a DoS attack, killing a session just helps
> > > the attacker.  The attacker can create a few connections and spin
> > > on startauthsession, locking everyone out anyway.
> > 
> > There are two considerations here: firstly we'd need to introduce a
> > mechanism to "kill" the connection.  Probably we'd simply error
> > every command on the space until it was closed.  The second is which
> > scenario is more reasonable: Say the application simply forgot to
> > flush the session and will never use it again.  Simply reclaiming
> > the session would produce no effect at all on the application in
> > this scenario.  However, I have no data to say what's likely.
> > 
> > > ~~
> > > 
> > > Also, let's remember that this is a rare application.  Sessions
> > > are only needed for remote access (requiring encryption, HMAC or
> > > salt), or policy sessions.
> > 
> > This depends what your threat model is.  For ssh keys, you worry
> > that someone might be watching, so you use HMAC authority even for a
> > local TPM.  In the cloud, you don't quite know where the TPM is, so
> > again you'd use HMAC sessions ... however, in both use cases the
> > sessions should be very short lived.
> > 
> > > ~~
> > > 
> > > Should the code also reserve a session for the kernel?  Mark it
> > > not kill'able?
> > 
> > At the moment, the kernel doesn't use sessions, so let's worry
> > about that problem at the point it arises (if it ever arises).
> > 
> > James
> 
> It does. My trusted keys implementation actually uses sessions.

But as I read the code, I can't find where the kernel creates a
session.  It looks like the session and hmac are passed in as option
arguments, aren't they?

> I'm kind dilating to an opinion that we would leave this commit out 
> from the first kernel release that will contain the resource manager 
> with similar rationale as Jason gave me for whitelisting: get the 
> basic stuff in and once it is used with some workloads whitelisting 
> and exhaustion will take eventually the right form.
> 
> How would you feel about this?

As long as we get patch 1/2 then applications using sessions will
actually work with spaces, so taking more time with 

Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-30 Thread Jarkko Sakkinen
On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > On 1/27/2017 5:04 PM, James Bottomley wrote:
> > 
> > > > Beware the nasty corner case:
> > > > 
> > > > - Application asks for a session and gets 0200
> > > > 
> > > > - Time elapses and 0200 gets forcibly flushed
> > > > 
> > > > - Later, app comes back, asks for a second session and again gets
> > > > 0200.
> > > > 
> > > > - App gets very confused.
> > > > 
> > > > May it be better to close the connection completely, which the
> > > > application can detect, than flush a session and give this corner
> > > > case?
> > > 
> > > if I look at the code I've written, I don't know what the session
> > > number is, I just save sessionHandle in a variable for later use 
> > > (lets say to v1).  If I got the same session number returned at a 
> > > later time and placed it in v2, all I'd notice is that an 
> > > authorization using v1 would fail.  I'm not averse to killing the 
> > > entire connection but, assuming you have fallback, it might be 
> > > kinder simply to ensure that the operations with the reclaimed 
> > > session fail (which is what the code currently does).
> > 
> > My worry is that this session failure cannot be detected by the 
> > application.  An HMAC failure could cause the app to tell a user that
> > they entered the wrong password.  Misleading.  On the TPM, it could 
> > trigger the dictionary attack lockout.  For a PIN index, it could 
> > consume a failure count.  Killing a policy session that has e.g., a 
> > policy signed term could cause the application to go back to some 
> > external entity for another authorization signature.
> > 
> > Let's go up to the stack.  What's the attack?
> > 
> > If we're worried about many simultaneous applications (wouldn't that 
> > be wonderful), why not just let startauthsession fail?  The 
> > application can just retry periodically.
> 
> How in that scenario do we ensure that a session becomes available? 
>  Once that's established, there's no real difference between retrying
> the startauthsession in the kernel when we know the session is
> available and forcing userspace to do the retry except that the former
> has a far greater chance of success (and it's only about 6 lines of
> code).
> 
> >   Just allocate them in triples so there's no deadlock.
> 
> Is this the application or the kernel?  If it's the kernel, that adds a
> lot of complexity.
> 
> > If we're worried about a DoS attack, killing a session just helps the
> > attacker.  The attacker can create a few connections and spin on 
> > startauthsession, locking everyone out anyway.
> 
> There are two considerations here: firstly we'd need to introduce a
> mechanism to "kill" the connection.  Probably we'd simply error every
> command on the space until it was closed.  The second is which scenario
> is more reasonable: Say the application simply forgot to flush the
> session and will never use it again.  Simply reclaiming the session
> would produce no effect at all on the application in this scenario. 
>  However, I have no data to say what's likely.
> 
> > ~~
> > 
> > Also, let's remember that this is a rare application.  Sessions are 
> > only needed for remote access (requiring encryption, HMAC or salt), 
> > or policy sessions.
> 
> This depends what your threat model is.  For ssh keys, you worry that
> someone might be watching, so you use HMAC authority even for a local
> TPM.  In the cloud, you don't quite know where the TPM is, so again
> you'd use HMAC sessions ... however, in both use cases the sessions
> should be very short lived.
> 
> > ~~
> > 
> > Should the code also reserve a session for the kernel?  Mark it not 
> > kill'able?
> 
> At the moment, the kernel doesn't use sessions, so let's worry about
> that problem at the point it arises (if it ever arises).
> 
> James

It does. My trusted keys implementation actually uses sessions.

I'm kind of leaning toward the opinion that we should leave this commit
out of the first kernel release that contains the resource manager, for
much the same reason Jason gave me for whitelisting: get the basic stuff
in, and once it has been used with some workloads, whitelisting and
exhaustion handling will eventually take the right form.

How would you feel about this?

/Jarkko

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
tpmdd-devel mailing list
tpmdd-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tpmdd-devel


Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-30 Thread James Bottomley
On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> On 1/27/2017 5:04 PM, James Bottomley wrote:
> 
> > > Beware the nasty corner case:
> > > 
> > > - Application asks for a session and gets 0200
> > > 
> > > - Time elapses and 0200 gets forcibly flushed
> > > 
> > > - Later, app comes back, asks for a second session and again gets
> > > 0200.
> > > 
> > > - App gets very confused.
> > > 
> > > May it be better to close the connection completely, which the
> > > application can detect, than flush a session and give this corner
> > > case?
> > 
> > if I look at the code I've written, I don't know what the session
> > number is, I just save sessionHandle in a variable for later use 
> > (lets say to v1).  If I got the same session number returned at a 
> > later time and placed it in v2, all I'd notice is that an 
> > authorization using v1 would fail.  I'm not averse to killing the 
> > entire connection but, assuming you have fallback, it might be 
> > kinder simply to ensure that the operations with the reclaimed 
> > session fail (which is what the code currently does).
> 
> My worry is that this session failure cannot be detected by the 
> application.  An HMAC failure could cause the app to tell a user that
> they entered the wrong password.  Misleading.  On the TPM, it could 
> trigger the dictionary attack lockout.  For a PIN index, it could 
> consume a failure count.  Killing a policy session that has e.g., a 
> policy signed term could cause the application to go back to some 
> external entity for another authorization signature.
> 
> Let's go up to the stack.  What's the attack?
> 
> If we're worried about many simultaneous applications (wouldn't that 
> be wonderful), why not just let startauthsession fail?  The 
> application can just retry periodically.

How in that scenario do we ensure that a session becomes available?
Once that's established, there's no real difference between retrying
the startauthsession in the kernel when we know the session is
available and forcing userspace to do the retry except that the former
has a far greater chance of success (and it's only about 6 lines of
code).

>   Just allocate them in triples so there's no deadlock.

Is this the application or the kernel?  If it's the kernel, that adds a
lot of complexity.

> If we're worried about a DoS attack, killing a session just helps the
> attacker.  The attacker can create a few connections and spin on 
> startauthsession, locking everyone out anyway.

There are two considerations here: firstly we'd need to introduce a
mechanism to "kill" the connection.  Probably we'd simply error every
command on the space until it was closed.  The second is which scenario
is more reasonable: Say the application simply forgot to flush the
session and will never use it again.  Simply reclaiming the session
would produce no effect at all on the application in this scenario.
However, I have no data to say what's likely.

> ~~
> 
> Also, let's remember that this is a rare application.  Sessions are 
> only needed for remote access (requiring encryption, HMAC or salt), 
> or policy sessions.

This depends on what your threat model is.  For ssh keys, you worry that
someone might be watching, so you use HMAC authority even for a local
TPM.  In the cloud, you don't quite know where the TPM is, so again
you'd use HMAC sessions ... however, in both use cases the sessions
should be very short lived.

> ~~
> 
> Should the code also reserve a session for the kernel?  Mark it not 
> kill'able?

At the moment, the kernel doesn't use sessions, so let's worry about
that problem at the point it arises (if it ever arises).

James




Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-29 Thread Ken Goldman
On 1/27/2017 5:04 PM, James Bottomley wrote:

>> Beware the nasty corner case:
>>
>> - Application asks for a session and gets 0200
>>
>> - Time elapses and 0200 gets forcibly flushed
>>
>> - Later, app comes back, asks for a second session and again gets
>> 0200.
>>
>> - App gets very confused.
>>
>> May it be better to close the connection completely, which the
>> application can detect, than flush a session and give this corner
>> case?
>
> if I look at the code I've written, I don't know what the session
> number is, I just save sessionHandle in a variable for later use (lets
> say to v1).  If I got the same session number returned at a later time
> and placed it in v2, all I'd notice is that an authorization using v1
> would fail.  I'm not averse to killing the entire connection but,
> assuming you have fallback, it might be kinder simply to ensure that
> the operations with the reclaimed session fail (which is what the code
> currently does).

My worry is that this session failure cannot be detected by the 
application.  An HMAC failure could cause the app to tell a user that 
they entered the wrong password.  Misleading.  On the TPM, it could 
trigger the dictionary attack lockout.  For a PIN index, it could 
consume a failure count.  Killing a policy session that has e.g., a 
policy signed term could cause the application to go back to some 
external entity for another authorization signature.

Let's go up the stack.  What's the attack?

If we're worried about many simultaneous applications (wouldn't that be 
wonderful), why not just let startauthsession fail?  The application can 
just retry periodically.  Just allocate them in triples so there's no 
deadlock.
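
A minimal sketch of the "let it fail and retry periodically" idea above,
from the application's side.  Everything here is hypothetical:
start_auth_session stands in for whatever call the application uses to
open a session, and SessionsExhausted stands in for the TPM reporting
that it is out of session handles.  It is not taken from any real TSS
or from the patch under discussion.

```python
import time


class SessionsExhausted(Exception):
    """Hypothetical error: the TPM has no free session handles."""


def start_with_retry(start_auth_session, attempts=5, delay=0.5,
                     sleep=time.sleep):
    """Call start_auth_session() until it succeeds or attempts run out.

    start_auth_session is any callable that returns a session handle or
    raises SessionsExhausted.  The last failure is re-raised so the
    caller can tell exhaustion apart from an authorization failure.
    """
    for attempt in range(attempts):
        try:
            return start_auth_session()
        except SessionsExhausted:
            if attempt == attempts - 1:
                raise
            sleep(delay)  # back off before asking again
```

Note that nothing in this loop guarantees a session ever becomes
available; it only bounds how long the application keeps asking.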

If we're worried about a DoS attack, killing a session just helps the 
attacker.  The attacker can create a few connections and spin on 
startauthsession, locking everyone out anyway.

~~

Also, let's remember that this is a rare application.  Sessions are only 
needed for remote access (requiring encryption, HMAC or salt), or policy 
sessions.

~~

Should the code also reserve a session for the kernel?  Mark it not 
kill'able?






Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-27 Thread Jason Gunthorpe
On Fri, Jan 27, 2017 at 02:04:59PM -0800, James Bottomley wrote:

> if I look at the code I've written, I don't know what the session
> number is, I just save sessionHandle in a variable for later use (lets
> say to v1).  If I got the same session number returned at a later time
> and placed it in v2, all I'd notice is that an authorization using v1
> would fail.

Is there any way that could be used to cause an op thinking it is
using v1 to authorize something it shouldn't?

Jason



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-27 Thread James Bottomley
On Fri, 2017-01-27 at 16:20 -0500, Ken Goldman wrote:
> On 1/19/2017 7:41 AM, Jarkko Sakkinen wrote:
> > 
> > I actually think that the very best solution would be such that
> > sessions would be *always* lease based. So when you create a
> > session you would always loose within a time limit.
> > 
> > There would not be any special victim selection mechanism. You
> > would just loose your session within a time limit.
> 
> I worry about the time limit.
> 
> I have a proposed use case (policy signed) where the user sends the 
> session nonce along with a "payment" to a vendor and receives back a 
> signature authorization over the nonce.
> 
> The time could be minutes or even hours.

So the problem is that sessions are a limited resource and we need a
way to allocate them when under resource pressure.  Leasing is the
fairest way I can think of but I'm open to other mechanisms if you
propose them.

Note that the lease mechanism doesn't mean every session expires after
the limit, it just means that every session becomes eligible for
reclaim after the limit.  If there's no-one else waiting, you can keep
your session for hours.
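
To make the distinction concrete, here is a small sketch of "eligible
for reclaim" as opposed to "expires".  The names (Session,
reclaim_candidate, LEASE_SECONDS) are illustrative only and are not
from the patch; the 2 second figure simply mirrors the SESSION_TIMEOUT
value quoted in the commit message.

```python
from dataclasses import dataclass

LEASE_SECONDS = 2.0  # assumption: mirrors the quoted SESSION_TIMEOUT


@dataclass
class Session:
    handle: int
    created: float  # creation time in seconds


def reclaim_candidate(sessions, now, waiters, lease=LEASE_SECONDS):
    """Return the session to reclaim, or None if nothing should be taken.

    A session past its lease is only *eligible*; it is actually
    reclaimed only when somebody is waiting for a slot.
    """
    if waiters == 0:
        return None  # no pressure: even very old sessions are left alone
    eligible = [s for s in sessions if now - s.created >= lease]
    if not eligible:
        return None  # everyone is still inside their lease
    return min(eligible, key=lambda s: s.created)  # oldest first
```

With no waiters the function returns None however old the sessions
are, which is the "keep your session for hours" case.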

James





Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-27 Thread Ken Goldman
On 1/18/2017 3:48 PM, James Bottomley wrote:
> In a TPM2, sessions can be globally exhausted once there are
> TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> The Strategy for handling this is to keep a global count of all the
> sessions along with their creation time.  Then if we see the TPM run
> out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> to become free, but if it doesn't, we forcibly evict an existing one.
> The eviction strategy waits until the current command is repeated to
> evict the session which should guarantee there is an available slot.

Beware the nasty corner case:

- Application asks for a session and gets 0200

- Time elapses and 0200 gets forcibly flushed

- Later, app comes back, asks for a second session and again gets 0200.

- App gets very confused.

Might it be better to close the connection completely, which the
application can detect, than to flush a session and create this corner
case?
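
The corner case can be reproduced with a toy model.  This is only an
illustration: ToyTpm is a made-up class, the handle values are
arbitrary, and the assumption that the lowest free handle is handed out
again is mine, not something the thread or a specification states.

```python
class ToyTpm:
    """Toy model that always hands out the lowest free session handle."""

    def __init__(self, slots=3, base=0x02000000):
        self.free = [base + i for i in range(slots)]
        self.live = set()

    def start_session(self):
        handle = min(self.free)  # assumption: lowest free handle is reused
        self.free.remove(handle)
        self.live.add(handle)
        return handle

    def flush(self, handle):
        if handle in self.live:
            self.live.remove(handle)
            self.free.append(handle)


tpm = ToyTpm()
v1 = tpm.start_session()   # application saves its first handle
tpm.flush(v1)              # forced eviction; the application is not told
v2 = tpm.start_session()   # later request from the same application
print(hex(v1), hex(v2), v1 == v2)
```

In this model the application ends up with two variables holding the
same number, and nothing in the handle value tells it that the session
behind v1 is gone.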



Part of me says to defer this.  That is:

64 sessions / 3 = 21 simultaneous applications.  If we have 21
simultaneous TCG applications, we'll all celebrate.  For the DoS case,
chmod and chgrp /dev/tpm and let only well-behaved applications into
the group.
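
The arithmetic, spelled out as a sketch.  It assumes each application
reserves three sessions up front, as in the "allocate them in triples"
suggestion elsewhere in the thread; the function name is illustrative.

```python
def max_concurrent_apps(total_sessions, sessions_per_app):
    """Whole number of applications that can each hold a full allocation."""
    return total_sessions // sessions_per_app


print(max_concurrent_apps(64, 3))  # 21, with one session left over
```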

Agreed, it's not a long term solution.






Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-27 Thread Ken Goldman
On 1/19/2017 7:41 AM, Jarkko Sakkinen wrote:
>
> I actually think that the very best solution would be such that
> sessions would be *always* lease based. So when you create a
> session you would always loose within a time limit.
>
> There would not be any special victim selection mechanism. You
> would just loose your session within a time limit.

I worry about the time limit.

I have a proposed use case (policy signed) where the user sends the 
session nonce along with a "payment" to a vendor and receives back a 
signature authorization over the nonce.

The time could be minutes or even hours.







Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-19 Thread James Bottomley
On Thu, 2017-01-19 at 14:25 +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> > In a TPM2, sessions can be globally exhausted once there are
> > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> > The Strategy for handling this is to keep a global count of all the
> > sessions along with their creation time.  Then if we see the TPM run
> > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> > to become free, but if it doesn't, we forcibly evict an existing one.
> > The eviction strategy waits until the current command is repeated to
> > evict the session which should guarantee there is an available slot.
> > 
> > On the force eviction case, we make sure that the victim session is at
> > least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue for
> > session slots is a FIFO one, ensuring that once we run out of
> > sessions, everyone will get a session in a bounded time and once they
> > get one, they'll have SESSION_TIMEOUT to use it before it may be
> > subject to eviction.
> > 
> > Signed-off-by: James Bottomley <james.bottom...@hansenpartnership.com>
> 
> I didn't yet read the code properly. I'll do a more proper review
> once I have v4 of my patch set together. This comment is solely
> based on your commit message.
> 
> I'm just thinking that do we need this complicated timeout stuff
> or could you just kick a session out in LRU fashion as we run
> out of them?
> 
> Or one variation of what you are doing: couldn't the session that
> needs a session handle to do something sleep for 2 seconds and then
> take the oldest session? It would have essentially the same effect
> but no waitqueue needed.
> 
> Yeah, as I said, this is just commentary based on the description.

If you don't have a wait queue you lose fairness in resource allocation
under starvation.  What happens is that you get RC_SESSION_HANDLES,
sleep for 2s and retry.  Meanwhile someone frees a session, the next
user grabs it while you are sleeping, and when you wake you still get
RC_SESSION_HANDLES.  I can basically DoS your process if I understand
this.  The only way to allocate the resource fairly (i.e. the first
person to sleep waiting for a session is the one who gets it when they
wake) is to make sure that you wake one waiter as soon as a session is
freed so that, probabilistically, they get the session.  If you look,
there are two mechanisms for ensuring fairness: one is the FIFO wait
queue (probabilistic) and the other is the reserved session, which
really ensures the session belongs to you when you wake (deterministic
but expensive, so this is only activated on the penultimate go around).
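
A rough single-threaded model of the fairness argument, not the patch
code.  SessionPool and its methods are invented for illustration.  Note
that this toy hands a freed slot directly to the longest waiter, which
is closer to the deterministic "reserved session" behaviour than to a
bare wake-up where the woken task still has to race for the slot.

```python
from collections import deque


class SessionPool:
    """Toy pool where freed slots go to waiters in FIFO order."""

    def __init__(self, total):
        self.free = total
        self.waiters = deque()  # longest waiter at the left
        self.granted = []       # order in which callers got a session

    def acquire(self, who):
        # A newcomer may not jump ahead of anyone already waiting.
        if self.free > 0 and not self.waiters:
            self.free -= 1
            self.granted.append(who)
            return True
        self.waiters.append(who)
        return False

    def release(self):
        if self.waiters:
            # Hand the slot straight to whoever has waited longest.
            self.granted.append(self.waiters.popleft())
        else:
            self.free += 1
```

With a sleep-and-retry scheme there is no queue at all, so a caller
that arrives later can take a freed slot while an earlier caller is
still asleep.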

James





Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-19 Thread Jarkko Sakkinen
On Thu, Jan 19, 2017 at 02:25:33PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> > In a TPM2, sessions can be globally exhausted once there are
> > TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> > The Strategy for handling this is to keep a global count of all the
> > sessions along with their creation time.  Then if we see the TPM run
> > out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> > to become free, but if it doesn't, we forcibly evict an existing one.
> > The eviction strategy waits until the current command is repeated to
> > evict the session which should guarantee there is an available slot.
> > 
> > On the force eviction case, we make sure that the victim session is at
> > least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue for
> > session slots is a FIFO one, ensuring that once we run out of
> > sessions, everyone will get a session in a bounded time and once they
> > get one, they'll have SESSION_TIMEOUT to use it before it may be
> > subject to eviction.
> > 
> > Signed-off-by: James Bottomley 
> 
> I didn't yet read the code properly. I'll do a more proper review
> once I have v4 of my patch set together. This comment is solely
> based on your commit message.
> 
> I'm just thinking that do we need this complicated timeout stuff
> or could you just kick a session out in LRU fashion as we run
> out of them?
> 
> Or one variation of what you are doing: couldn't the session that
> needs a session handle to do something sleep for 2 seconds and then
> take the oldest session? It would have essentially the same effect
> but no waitqueue needed.
> 
> Yeah, as I said, this is just commentary based on the description.

I actually think that the very best solution would be for sessions to
*always* be lease based. So when you create a session you would always
lose it within a time limit.

There would not be any special victim selection mechanism. You
would just lose your session within a time limit.

This could already be part of the session isolation and would
actually make only isolation usable.

We do not have the API locked yet, so why not make an API that models
the nature of the resource? Given that the number of sessions is
always fixed, leases make sense.

You then just need a wait queue for those waiting for leases.
They don't need to do any victim selection or whatever. Everything
held longer than the lease gets flushed.

I strongly feel that this would be the best long term solution.

/Jarkko



Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

2017-01-19 Thread Jarkko Sakkinen
On Wed, Jan 18, 2017 at 03:48:09PM -0500, James Bottomley wrote:
> In a TPM2, sessions can be globally exhausted once there are
> TPM_PT_ACTIVE_SESSION_MAX of them (even if they're all context saved).
> The Strategy for handling this is to keep a global count of all the
> sessions along with their creation time.  Then if we see the TPM run
> out of sessions (via the TPM_RC_SESSION_HANDLES) we first wait for one
> to become free, but if it doesn't, we forcibly evict an existing one.
> The eviction strategy waits until the current command is repeated to
> evict the session which should guarantee there is an available slot.
> 
> On the force eviction case, we make sure that the victim session is at
> least SESSION_TIMEOUT old (currently 2 seconds).  The wait queue for
> session slots is a FIFO one, ensuring that once we run out of
> sessions, everyone will get a session in a bounded time and once they
> get one, they'll have SESSION_TIMEOUT to use it before it may be
> subject to eviction.
> 
> Signed-off-by: James Bottomley 

I haven't read the code properly yet. I'll do a more thorough review
once I have v4 of my patch set together. This comment is based solely
on your commit message.

I'm just wondering whether we need this complicated timeout stuff,
or could you just kick a session out in LRU fashion as we run
out of them?

Or one variation of what you are doing: couldn't the session that
needs a session handle to do something sleep for 2 seconds and then
take the oldest session? It would have essentially the same effect
but no waitqueue needed.
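
For comparison, a plain LRU eviction could look roughly like the sketch
below.  LruSessions is a made-up name and this is only a model of the
idea, with no timeout and no wait queue; it says nothing about how the
patch or the kernel would implement it.

```python
from collections import OrderedDict


class LruSessions:
    """Toy session table that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sessions = OrderedDict()  # handle -> owner, oldest use first

    def use(self, handle):
        # Mark the session as most recently used.
        self.sessions.move_to_end(handle)

    def add(self, handle, owner):
        """Insert a session; return the evicted handle, if any."""
        evicted = None
        if len(self.sessions) >= self.capacity:
            evicted, _ = self.sessions.popitem(last=False)
        self.sessions[handle] = owner
        return evicted
```

In this model a session can be evicted immediately after creation if
every other session has been used more recently.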

Yeah, as I said, this is just commentary based on the description.

/Jarkko
