Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-09-23 Thread Matt Riedemann



On 6/25/2015 3:59 AM, Sylvain Bauza wrote:



On 24/06/2015 19:56, Joe Gordon wrote:



On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza wrote:

Hi team,

Some discussion occurred over IRC about a publicly open bug
related to TrustedFilter [1].
I want to take the opportunity to raise my concerns about that
specific filter, explain why I dislike it and how I think we could
improve the situation (and clarify everyone's thoughts).

The current situation is this : Nova checks whether a host is
compromised only when the scheduler is called, i.e. only when
booting/migrating/evacuating/unshelving an instance (well, not
exactly all the evacuate/live-migrate cases, but let's not discuss
that now). When the request goes to the scheduler, all the
hosts are checked against all the enabled filters, and the
TrustedFilter makes an external HTTP(S) call to the
Attestation API service (not handled by Nova) for *each host* to
see whether the host is valid (not compromised) or not.

To be clear, that's the only in-tree scheduler filter which
explicitly makes an external call to a separate service that Nova
is not managing. I can see at least 3 reasons why that's bad :

#1 : that's a terrible bottleneck for performance, because we're
IO-blocking N times given N hosts (we're not even multiplexing the
HTTP requests)
#2 : all the other filters check an internal Nova state for the
host (called HostState), but the TrustedFilter does not, which means
that conceptually we defer the decision to a 3rd-party engine
#3 : that Attestation API service becomes a de facto dependency
for Nova (since it's an in-tree filter) while it's not listed as a
dependency and thus not gated.


All of these reasons could be acceptable if the filter covered the
use case given in [1] (i.e. I want to make sure that if my
host gets compromised, my instances will not be running on that
host), but it just doesn't, because the check only happens at
scheduling time, as described above.

So, given that, here are my thoughts :
a/ if a host gets compromised, we can just disable its service to
prevent its election as a valid destination host. There is no need
for a specialised filter.
b/ if a host is compromised, we can assume that the instances have
to be resurrected elsewhere, i.e. we can call a nova evacuate
c/ checking whether a host is compromised or not is not a Nova
responsibility since it's already perfectly done by [2]

In other words, I see that "security" use case as
analogous to the HA use case [3], where we need a 3rd-party
tool responsible for periodically checking the state of the hosts
and, if one is compromised, calling the Nova API to fence the host
and evacuate the instances running on it.
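
A minimal sketch of one polling pass of such an external tool, to make the proposed flow concrete. Everything here is an assumption for illustration: the function name and the three callables (`is_trusted`, `disable_service`, `evacuate_host`) are hypothetical stand-ins for an attestation client and Nova API calls, not part of Nova or OpenAttestation.

```python
from typing import Callable, Iterable, List


def fence_compromised_hosts(
    hosts: Iterable[str],
    is_trusted: Callable[[str], bool],
    disable_service: Callable[[str, str], None],
    evacuate_host: Callable[[str], None],
) -> List[str]:
    """Run one polling pass and return the hosts that were fenced.

    All three callables are hypothetical hooks supplied by the caller:
    is_trusted asks the attestation service about one host,
    disable_service disables the compute service with a reason,
    evacuate_host moves the instances off the host.
    """
    fenced = []
    for host in hosts:
        if is_trusted(host):
            continue
        # Fence first so the scheduler stops electing the host,
        # then move its instances elsewhere.
        disable_service(host, "attestation failed")
        evacuate_host(host)
        fenced.append(host)
    return fenced
```

The tool would call this on a timer, outside of the scheduling path, so the attestation round trips no longer block instance boot requests.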

Given that, I'm proposing to deprecate TrustedFilter and explicitly
mention that it will be dropped from in-tree in a later cycle
https://review.openstack.org/194592


Given people are using this, it is a negligible maintenance burden.  I
think deprecating with the intention of removing is not worth it.

Although it would be very useful to further document the risks with
this filter (live migration, possible performance issues etc.)


Well, I can understand that customers might not agree to removing
the filter, because there is no clear alternative for them. That said, I
think saying that the filter is deprecated, without saying when it will
be removed, would get some contributors thinking about it and working
on a better solution, exactly like we did for the EC2 API.

To be clear, I want to freeze the filter by deprecating it and
explaining why it's wrong (by amending the devref section and emitting a
LOG warning saying it's deprecated), and then leave the filter
in-tree until we are sure that there is a good solution outside of Nova.

-Sylvain





Thoughts ?
-Sylvain



[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-09-23 Thread Sylvain Bauza



On 23/09/2015 15:31, Matt Riedemann wrote:



(snip)





Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-09-23 Thread Matt Riedemann



On 9/23/2015 10:00 AM, Sylvain Bauza wrote:



(snip)





Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-25 Thread Dulko, Michal
 -Original Message-
 From: John Garbutt [mailto:j...@johngarbutt.com]
 Sent: Thursday, June 25, 2015 2:22 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
 compromised host (and why I dislike TrustedFilter)
 
 On 24 June 2015 at 09:35, Dulko, Michal michal.du...@intel.com wrote:
  -Original Message-
  From: Sylvain Bauza [mailto:sba...@redhat.com]
  Sent: Wednesday, June 24, 2015 9:39 AM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] How to properly detect and fence
  a compromised host (and why I dislike TrustedFilter)

(snip)

   So I would suggest using the 3rd-party tools as a way to enhance and
  supplement our TCP/TrustedFilter feature. The 3rd-party tools can
  also call the attestation API for host attestation.
 
  I don't see much benefit in keeping such a filter, for the reasons I
  mentioned below. Again, if you want to fence one host, you can just
  disable its service, that's enough.
 
  This won't address the case in which you have a heterogeneous environment
 and you want only some important VMs to run on trusted hosts (and for the
 rest of the VMs you don't care).
 
 This is an interesting one to dig into.
 
 I had assumed in this case you put all the VMs that want the attestation
 check in a subset of nodes that are set up to use that set.
 You can do that using host aggregates and our existing filters.
 
 An external system could then just disable hosts within that subset of hosts
 that have the attestation check working.
 
 Does that work for your use case?

It should be fine for this case.  But then - why not go further and remove the SG 
API? Let's leave monitoring of services to Pacemaker and Nagios and let them 
disable services if they consider them to be down.

My point is that following this logic we may use external services to replace 
any filter that has such simple logic. Is this the right direction?


Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-25 Thread John Garbutt
On 24 June 2015 at 09:35, Dulko, Michal michal.du...@intel.com wrote:
 -Original Message-
 From: Sylvain Bauza [mailto:sba...@redhat.com]
 Sent: Wednesday, June 24, 2015 9:39 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
 compromised host (and why I dislike TrustedFilter)

 (general point: could we please try not to top-post ? It makes it a little
 harder to follow the conversation)

 Replies inline.

 Le 24/06/2015 08:15, Wei, Gang a écrit :
  Only if all the hosts managed by OpenStack are capable of the measured boot
 process might letting a 3rd-party tool call the nova fencing API be better
 than using TrustedFilter.
 
  But if not all the hosts support measured boot, then with TrustedFilter we
 can schedule VMs only to measured and trusted hosts, whereas in the 3rd-party
 tool case only untrusted/compromised hosts will be fenced; a host with
 unknown trustworthiness will still be able to run VMs, even though the owner
 is not willing to allow that.
 You don't need a specific filter to fence one host from being scheduled.
 Calling the Nova os-services API to explicitly disable the service (and
 providing a reason) is enough to make the host that service runs on
 ineligible for election (thanks to the ComputeFilter)
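
As a rough sketch of what that call looks like on the wire: the helper below only builds the request, it does not send it. The helper name is hypothetical, and the `disable-log-reason` path and field names are my recollection of the compute API of that era (releases before microversion 2.53), so treat them as assumptions to check against the API reference for your release.

```python
import json
from typing import Tuple


def build_disable_request(
    host: str, reason: str, binary: str = "nova-compute"
) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for disabling a service with a reason.

    The path is relative to the compute endpoint. Authentication and the
    actual HTTP call are left to the caller.
    """
    body = {
        "host": host,
        "binary": binary,
        "disabled_reason": reason,
    }
    return "PUT", "/os-services/disable-log-reason", json.dumps(body, sort_keys=True)
```

Once the service is disabled, the ComputeFilter rejects the host on every subsequent scheduling request without any extra external call.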

 To be clear, I would love to see the logic inverted, i.e. something which
 would call the OAT service for a specific host and then fire a service disable.


  So I would suggest using the 3rd-party tools as a way to enhance and
 supplement our TCP/TrustedFilter feature. The 3rd-party tools can also
 call the attestation API for host attestation.

 I don't see much benefit in keeping such a filter, for the reasons I mentioned
 below. Again, if you want to fence one host, you can just disable its
 service, that's enough.

 This won't address the case in which you have a heterogeneous environment and
 you want only some important VMs to run on trusted hosts (and for the rest of
 the VMs you don't care).

This is an interesting one to dig into.

I had assumed in this case you put all the VMs that want the
attestation check in a subset of nodes that are set up to use that set.
You can do that using host aggregates and our existing filters.

An external system could then just disable hosts within that subset of
hosts that have the attestation check working.
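
A toy model of that combination, to show why no dedicated filter is needed. This is not Nova code: the `Host` and `Aggregate` classes and `candidate_hosts` are hypothetical, and they only mimic what I understand the in-tree ComputeFilter (skip disabled services) and AggregateInstanceExtraSpecsFilter (match flavor requirements against aggregate metadata) to do.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Host:
    name: str
    enabled: bool = True  # flipped to False by the external attestation tool


@dataclass
class Aggregate:
    metadata: Dict[str, str]
    hosts: Set[str] = field(default_factory=set)


def candidate_hosts(
    hosts: Iterable[Host],
    aggregates: Iterable[Aggregate],
    required: Dict[str, str],
) -> List[str]:
    """Return hosts that are enabled and satisfy the flavor's requirements.

    A host satisfies non-empty requirements only if it belongs to an
    aggregate whose metadata contains every required key/value pair.
    """
    aggregates = list(aggregates)
    result = []
    for host in hosts:
        if not host.enabled:
            continue  # fenced by the external tool
        if required and not any(
            host.name in agg.hosts
            and all(agg.metadata.get(k) == v for k, v in required.items())
            for agg in aggregates
        ):
            continue  # not in the trusted subset
        result.append(host.name)
    return result
```

With hosts `c1` and `c2` in an aggregate tagged `trusted=true`, `c2` disabled, and `c3` outside the aggregate, a flavor requiring `trusted=true` can only land on `c1`, while a flavor with no requirement can land on `c1` or `c3`.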

Does that work for your use case?

Thanks,
John

  -Original Message-
  From: Bhandaru, Malini K [mailto:malini.k.bhand...@intel.com]
  Sent: Wednesday, June 24, 2015 1:13 PM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
  compromised host (and why I dislike TrustedFilter)
 
  Would like to add to Shane's points below.
 
  1) The Trust filter can be treated as an API, with different underlying
 implementations. Its default could even be Not Implemented and always
 return false.
And nova.conf could specify the use of the OAT trust implementation. This
 would not break present-day users of the functionality.

 Don't get me wrong, I'm not against OAT, I'm just saying that the
 TrustedFilter design is wrong. Even if another alternative would come up to
 serve the TrustedComputePool model of things, it would still be bad for the
 reasons I mentioned below, and wouldn't cover the usecase I quoted.


  2) The issue in the original bug is a VM waking up after a reboot on a host
 that has not pre-determined whether the host is still trustable.
This is essentially begging for a feature to check that all constraints
 requested by a VM during launch are confirmed to hold when it re-awakens,
 even if it is not going through the Nova scheduler at this point.

 So I think we are in agreement that covering that use case can't be
 done at the scheduler level.
 Using TrustedFilter just ensures that the host is checked at instance
 creation time, but it confuses people because they think it will be
 enforced for the whole instance lifecycle.


This holds even for aggregates that might be specified by geo, or even by
 reservation, such as Coke or Pepsi.
What if a host, even without a reboot and certainly before a reboot, was
 reassigned from Coke to Pepsi? There would be cross-contamination.
Perhaps we need Nova hooks that can be registered with functions that
 check expected aggregate values.

 I honestly don't see the point of a host aggregate. Given that the failure
 domain is a host, you only need to trust that host or not. Whether the host
 belongs to an aggregate or not is orthogonal to our problem IMHO.

Better still, have libvirt functionality that makes a callback for each VM
 on a host to ensure its constraints are satisfied on start-up/boot, and
 re-start when it comes out of pause.

 Hum, doesn't it sound weird to have the host being the source of truth ?
 Also, if an host gets compromised, why couldn't we 

Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-25 Thread John Garbutt
On 25 June 2015 at 14:09, Dulko, Michal michal.du...@intel.com wrote:
 (snip)

 It should be fine for this case.  But then - why not go further and remove the SG 
 API? Let's leave monitoring of services to Pacemaker and Nagios and let them 
 disable services if they consider them to be down.

Honestly, I find that idea very attractive.

The mark down API is basically going down that route.
http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/mark-host-down.html
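
For illustration, a sketch of the request an external monitor might send once that spec is implemented. The helper name is hypothetical, and the `force-down` path, the `forced_down` field and microversion 2.11 are what I believe eventually shipped in Nova; they are assumptions here and should be verified against the API reference for your release. The helper only builds the request and sends nothing.

```python
import json
from typing import Dict, Tuple


def build_force_down_request(
    host: str, forced_down: bool = True, binary: str = "nova-compute"
) -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, path, headers, JSON body) for marking a host down.

    The path is relative to the compute endpoint. The microversion header
    asks for the API version assumed to expose forced_down.
    """
    headers = {
        "Content-Type": "application/json",
        "X-OpenStack-Nova-API-Version": "2.11",
    }
    body = {"host": host, "binary": binary, "forced_down": forced_down}
    return "PUT", "/os-services/force-down", headers, json.dumps(body, sort_keys=True)
```

Calling it with `forced_down=False` builds the request that would clear the flag once the host has been checked and brought back.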

 My point is that following this logic we may use external services to replace 
 any filter that has such simple logic. Is this the right direction?

If it's an external system, and you can integrate more efficiently by
disabling hosts, then yes, that's awesome.

That's not always going to be the correct direction, but we need to
look at whether something can be done externally first. Nova is too big
already; we are actively trying not to expand its scope.

Thanks,
John



Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-25 Thread Juvonen, Tomi (Nokia - FI/Espoo)
-Original Message-
From: ext John Garbutt [mailto:j...@johngarbutt.com] 
Sent: Thursday, June 25, 2015 4:39 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

On 25 June 2015 at 14:09, Dulko, Michal michal.du...@intel.com wrote:
 (snip)

 It should be fine for this case.  But then - why not go further and remove the 
 SG API? Let's leave monitoring of services to Pacemaker and Nagios and let them 
 disable services if they consider them to be down.

Honestly, I find that idea very attractive.

The mark down API is basically going down that route.
http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/mark-host-down.html

 My point is that following this logic we may use external services to replace 
 any filter that has such simple logic. Is this the right direction?

If it's an external system, and you can integrate more efficiently by
disabling hosts, then yes, that's awesome.

That's not always going to be the correct direction, but we need to
look at whether something can be done externally first. Nova is too big
already; we are actively trying not to expand its scope.

So I worked on this mark-down API spec and am now still working on the server 
states (VM states), as they stay in an incorrect state if the host suddenly goes 
down. I would appreciate comments on https://review.openstack.org/#/c/192246 to 
get on the right track. Maybe the VM states should be changed directly when the 
mark-down API is called, rather than as currently proposed. And yes, there are 
use cases where one does not evacuate the VMs, so it will be valuable to have 
those states correct.

Relatedly, I am working in OPNFV to bring in the Doctor project as an external 
system that could use, under the hood, different existing open-source projects 
like Pacemaker or Nagios to detect any kind of host fault fast and use this 
mark-down API to tell Nova about it. Doctor will be open source and for anybody 
to use. It also now has a Ceilometer BP approved to enhance direct alarming for 
users without polling. So let's see what happens when this work is completed. 
It could even become a component inside OpenStack someday, when it reaches that 
kind of maturity (detecting faults, fencing, and doing automatic correlation 
based on VM-specific and fault-specific configuration, if so desired).

Br,
Tomi

Thanks,
John





Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-25 Thread Sylvain Bauza



On 24/06/2015 19:56, Joe Gordon wrote:



On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza sba...@redhat.com wrote:


(snip)


Given people are using this, it is a negligible maintenance burden.  I 
think deprecating with the intention of removing is not worth it.


Although it would be very useful to further document the risks with 
this filter (live migration, possible performance issues etc.)


Well, I can understand that customers might not agree to removing
the filter because there is no clear alternative for them. That said, I
think marking the filter as deprecated without saying when it would
be removed would encourage some contributors to think about that and work
on a better solution, exactly like we did for the EC2 API.


To be clear, I want to freeze the filter by deprecating it and
explaining why it's wrong (by amending the devref section and emitting a
LOG warning saying it's deprecated) and then leave the filter
in-tree until we are sure that there is a good solution outside of Nova.


-Sylvain





Thoughts ?
-Sylvain



[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3]
http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-24 Thread Wei, Gang
Letting a 3rd-party tool call the Nova fencing API might be better than
using TrustedFilter only if all the hosts managed by OpenStack are capable
of measured boot.

But if not all the hosts support measured boot, then with TrustedFilter we can
schedule VMs only to measured and trusted hosts. In the 3rd-party tool case, only
untrusted/compromised hosts will be fenced, so hosts with unknown
trustworthiness will still be able to run VMs, even though the owner does not
want that.

So I would suggest using the 3rd-party tools to supplement our
TCP/TrustedFilter feature. The 3rd-party tools can also call the attestation
API for host attestation.

Thanks
Jimmy

-Original Message-
From: Bhandaru, Malini K [mailto:malini.k.bhand...@intel.com] 
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

Would like to add to Shane's points below.

1) The Trust filter can be treated as an API, with different underlying 
implementations. Its default could even be Not Implemented and always return 
false.
 And Nova.conf could specify use the OAT trust implementation. This would 
not break present day users of the functionality.

2) The issue in the original bug is a VM waking up after a reboot on a host
whose trustworthiness has not been re-verified.
 This essentially calls for a feature that checks that all constraints
requested by a VM at launch still hold when it re-awakens, even if it is
not going through the Nova scheduler at that point.

 This holds even for aggregates that might be specified by geo, or even 
reservation such as Coke or Pepsi.
 What if a host, even without a reboot and certainly before a reboot was 
assigned from Coke to Pepsi, there is cross contamination.
 Perhaps we need Nova hooks that can be registered with functions that 
check expected aggregate values.

 Better still have  libvirt functionality that makes a call back for each 
VM on a host to ensure its constraints are satisfied on start-up/boot, and 
re-start when it comes out of pause.

 Using an aggregate for trust with a cron job to check for trust is
inefficient in this case: trust status gets updated only on a host reboot,
because Intel TXT is a boot-time authentication.

Regards
Malini


-Original Message-
From: Wang, Shane [mailto:shane.w...@intel.com] 
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

AFAIK, TrustedFilter is using a sort of cache to cache the trusted state, which 
is designed to solve the performance issue mentioned here.

My thoughts for deprecating it are:
#1. We already have customers here in China who are using that filter. How are 
they going to do upgrade in the future?
#2. Dependency should not be a reason to deprecate a module in OpenStack, Nova 
is not a stand-alone module, and it depends on various technologies and 
libraries.

Intel is setting up the third party CI for TCP/OAT in Liberty, which is to 
address the concerns mentioned in the thread. And also, OAT is an open source 
project which is being maintained as the long-term strategy.

For the situation that a host gets compromised, OAT checks trusted or untrusted 
from the start point of boot/reboot, it is hard for OAT to detect whether a 
host gets compromised when it is running, I don't know how to detect that 
without the filter?
Back to Michael's question, the process of the verification is done by software 
automatically when a host boots or reboots, will that be an overhead for the 
admin to have a separate job?

Thanks.
--
Shane

-Original Message-
From: Michael Still [mailto:mi...@stillhq.com]
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

I agree. I feel like this is another example of functionality which is 
trivially implemented outside nova, and where it works much better if we don't 
do it. Couldn't an admin just have a cron job which verifies hosts, and then 
adds them to a compromised-hosts host aggregate if they're owned? I assume 
without testing it that you can migrate instances _out_ of a host aggregate you 
can't boot in?

Michael

On Tue, Jun 23, 2015 at 8:41 PM, Sylvain Bauza sba...@redhat.com wrote:
 Hi team,

 Some discussion occurred over IRC about a bug which was publicly open 
 related to TrustedFilter [1] I want to take the opportunity for 
 raising my concerns about that specific filter, why I dislike it and 
 how I think we could improve the situation - and clarify everyone's
 thoughts)


Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-24 Thread Sylvain Bauza
(general point, could we please try not to top-post? It makes it a little
harder to follow the conversation)


Replies inline.

Le 24/06/2015 08:15, Wei, Gang a écrit :

Only if all the hosts managed by OpenStack are capable for measured boot 
process, then let 3rd-party tool call nova fencing API might be better than 
using TrustedFilter.

But if not all the hosts support measured boot, then with TrustedFilter we can 
schedule VM to only measured and trusted host, but in 3rd-party tool case, only 
untrusted/compromised hosts will be fenced, the host with unknown 
trustworthiness will still be able to run VM but the owner is not willing to do 
it that way.
You don't need a specific filter to fence a host from being
scheduled. Calling the Nova os-services API to explicitly disable
the service (and providing a reason) makes the host running
that service ineligible for election (thanks to the ComputeFilter).
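
To make that concrete, here is a minimal sketch of the request such a call
would carry. It assumes the disable-log-reason action of os-services as the
compute API exposes it around the time of this thread; the host name and the
reason string are made up, and authentication and the actual HTTP call are
left out.

```python
import json


def build_disable_request(host, reason, binary="nova-compute"):
    """Build (method, path, body) for disabling a compute service with a reason.

    The path is relative to the compute API endpoint (an assumption of this
    sketch); sending the request is left to the caller.
    """
    body = {
        "host": host,
        "binary": binary,
        "disabled_reason": reason,
    }
    return "PUT", "/os-services/disable-log-reason", json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    method, path, body = build_disable_request("compute-12", "failed attestation")
    print(method, path)
    print(body)
```

Once the service is disabled, the ComputeFilter is what keeps the host out of
scheduling decisions, so no trust-specific filter is involved.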


To be clear, I would love to see the logic inverted, ie. something which 
would call the OAT service for a specific host would then fire a service 
disable.




So I would suggest using the 3rd-party tools as enhancing way to supplement our 
TCP/trustedfilter feature. And the 3rd party tools can also call attestation 
API for host attestation.


I don't see much benefit in keeping such a filter, for the reasons I
mentioned below. Again, if you want to fence one host, you can just
disable its service, that's enough.



Thanks
Jimmy

-Original Message-
From: Bhandaru, Malini K [mailto:malini.k.bhand...@intel.com]
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

Would like to add to Shane's points below.

1) The Trust filter can be treated as an API, with different underlying implementations. 
Its default could even be Not Implemented and always return false.
  And Nova.conf could specify use the OAT trust implementation. This would 
not break present day users of the functionality.


Don't get me wrong, I'm not against OAT, I'm just saying that the 
TrustedFilter design is wrong. Even if another alternative would come up 
to serve the TrustedComputePool model of things, it would still be bad 
for the reasons I mentioned below, and wouldn't cover the usecase I quoted.




2) The issue in the original bug is a VM waking up after a reboot on a host
whose trustworthiness has not been re-verified.
  This essentially calls for a feature that checks that all constraints
requested by a VM at launch still hold when it re-awakens, even if it is
not going through the Nova scheduler at that point.


So I think we are in agreement that for covering that usecase, it can't 
be done at the scheduler level.
Using TrustedFilter just ensures that the host is checked at instance
creation time, but it confuses people because they think it will be
enforced for the whole instance lifecycle.




  This holds even for aggregates that might be specified by geo, or even reservation such as 
Coke or Pepsi.
  What if a host, even without a reboot and certainly before a reboot was 
assigned from Coke to Pepsi, there is cross contamination.
  Perhaps we need Nova hooks that can be registered with functions that 
check expected aggregate values.


I honestly don't see the point of a host aggregate here. Given the failure
domain is a host, you only need to trust that host or not. Whether
the host belongs to an aggregate or not is orthogonal to our
problem IMHO.



  Better still have  libvirt functionality that makes a call back for each 
VM on a host to ensure its constraints are satisfied on start-up/boot, and 
re-start when it comes out of pause.


Hmm, doesn't it sound weird to have the host be the source of truth?
Also, if a host gets compromised, why couldn't we assume that the
instances can be compromised too and need to be resurrected (ie.
evacuated)?




  Using aggregate for trust with a cron job to check for trust is 
inefficient in this case, trust status gets updated only on a host reboot. 
Intel TXT is a boot
  time authentication.


Isn't that a specific implementation of OAT? Couldn't we assume some
alternative implementations are able to do live checks? I mean, however
you trigger a host check (at boot time or periodically), you can
then fire an alarm which would trigger the necessary remediation actions:
fence the host and evacuate the instances.




Regards
Malini


-Original Message-
From: Wang, Shane [mailto:shane.w...@intel.com]
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

AFAIK, TrustedFilter is using a sort of cache to cache the 

Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-24 Thread Dulko, Michal
 -Original Message-
 From: Sylvain Bauza [mailto:sba...@redhat.com]
 Sent: Wednesday, June 24, 2015 9:39 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
 compromised host (and why I dislike TrustedFilter)
 
 (general point, could we please try not top-posting ? It makes a little harder
 to follow the conversation)
 
 Replies inline.
 
 Le 24/06/2015 08:15, Wei, Gang a écrit :
  Only if all the hosts managed by OpenStack are capable for measured boot
 process, then let 3rd-party tool call nova fencing API might be better than
 using TrustedFilter.
 
  But if not all the hosts support measured boot, then with TrustedFilter we
 can schedule VM to only measured and trusted host, but in 3rd-party tool
 case, only untrusted/compromised hosts will be fenced, the host with
 unknown trustworthiness will still be able to run VM but the owner is not
 willing to do it that way.
 You don't need a specific filter for fencing one host from being scheduled.
 Just calling the Nova os-services API to explicitly disable the service (and
 providing a reason) just makes the hosts belonging to the service not able to
 be elected (thanks to the ComputeFilter)
 
 To be clear, I would love to see the logic inverted, ie. something which would
 call the OAT service for a specific host would then fire a service disable.
 
 
  So I would suggest using the 3rd-party tools as enhancing way to
 supplement our TCP/trustedfilter feature. And the 3rd party tools can also
 call attestation API for host attestation.
 
 I don't see much benefits of keeping such filter for the reasons I mentioned
 below. Again, if you want to fence one host, you can just disable its service,
 that's enough.

This won't address the case in which you have a heterogeneous environment and
you want only some important VMs to run on trusted hosts (and for the rest of
the VMs you don't care).

 
  Thanks
  Jimmy
 
  -Original Message-
  From: Bhandaru, Malini K [mailto:malini.k.bhand...@intel.com]
  Sent: Wednesday, June 24, 2015 1:13 PM
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
  compromised host (and why I dislike TrustedFilter)
 
  Would like to add to Shane's points below.
 
  1) The Trust filter can be treated as an API, with different underlying
 implementations. Its default could even be Not Implemented and always
 return false.
And Nova.conf could specify use the OAT trust implementation. This
 would not break present day users of the functionality.
 
 Don't get me wrong, I'm not against OAT, I'm just saying that the
 TrustedFilter design is wrong. Even if another alternative would come up to
 serve the TrustedComputePool model of things, it would still be bad for the
 reasons I mentioned below, and wouldn't cover the usecase I quoted.
 
 
  2) The issue in the original bug is a VM waking up after a reboot on a
  host whose trustworthiness has not been re-verified.
  This essentially calls for a feature that checks that all constraints
  requested by a VM at launch still hold when it re-awakens, even if it
  is not going through the Nova scheduler at that point.
 
 So I think we are in agreement that for covering that usecase, it can't be
 done at the scheduler level.
 Using TrustedFilter just ensures that the host is checked at instance
 creation time, but it confuses people because they think it will be
 enforced for the whole instance lifecycle.
 
 
This holds even for aggregates that might be specified by geo, or even
 reservation such as Coke or Pepsi.
What if a host, even without a reboot and certainly before a reboot 
  was
 assigned from Coke to Pepsi, there is cross contamination.
Perhaps we need Nova hooks that can be registered with functions that
 check expected aggregate values.
 
 I don't honestly see the point of an host aggregate. Given the failure domain
 is an host, you only need to trust that host or not. The fact that the host
 belongs to an aggregate or not is orthogonal to our problem IMHO.
 
Better still have  libvirt functionality that makes a call back for 
  each VM
 on a host to ensure its constraints are satisfied on start-up/boot, and 
 re-start
 when it comes out of pause.
 
 Hum, doesn't it sound weird to have the host being the source of truth ?
 Also, if an host gets compromised, why couldn't we assume that the
 instances can be compromised too and need to be resurrected (ie.
 evacuated) ?
 
 
Using aggregate for trust with a cron job to check for trust is 
  inefficient
 in this case, trust status gets updated only on a host reboot. Intel TXT is a
 boot
time authentication.
 
 Isn't that a specific implementation of OAT ? Couldn't we assume some
 alternative implementations able to do live checks ? I mean, whatever on
 how you 

Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-24 Thread Sylvain Bauza



Le 24/06/2015 10:35, Dulko, Michal a écrit :

-Original Message-
From: Sylvain Bauza [mailto:sba...@redhat.com]
Sent: Wednesday, June 24, 2015 9:39 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

(general point, could we please try not top-posting ? It makes a little harder
to follow the conversation)

Replies inline.

Le 24/06/2015 08:15, Wei, Gang a écrit :

Only if all the hosts managed by OpenStack are capable for measured boot

process, then let 3rd-party tool call nova fencing API might be better than
using TrustedFilter.

But if not all the hosts support measured boot, then with TrustedFilter we

can schedule VM to only measured and trusted host, but in 3rd-party tool
case, only untrusted/compromised hosts will be fenced, the host with
unknown trustworthiness will still be able to run VM but the owner is not
willing to do it that way.
You don't need a specific filter for fencing one host from being scheduled.
Just calling the Nova os-services API to explicitly disable the service (and
providing a reason) just makes the hosts belonging to the service not able to
be elected (thanks to the ComputeFilter)

To be clear, I would love to see the logic inverted, ie. something which would
call the OAT service for a specific host would then fire a service disable.



So I would suggest using the 3rd-party tools as enhancing way to

supplement our TCP/trustedfilter feature. And the 3rd party tools can also
call attestation API for host attestation.

I don't see much benefits of keeping such filter for the reasons I mentioned
below. Again, if you want to fence one host, you can just disable its service,
that's enough.

This won't address the case in which you have heterogenic environment and you 
want only some important VMs to run on trusted hosts (and for the rest of the 
VMs you don't care).


In that case, you don't care about fencing the host, but rather about
making sure that your trusted instances move. All of that is still not
needed in Nova: you can just identify the instances with a specific
metadata item such as 'trusted' and ask to evacuate them.


If you want to prevent new 'trusted' instances from being booted on
compromised hosts, you perhaps have to write a filter that compares the
instance metadata and the host metadata (which can be given by an
aggregate), but none of that requires the TrustedFilter.
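
As a rough illustration of that comparison, here is a minimal sketch of the
matching logic only. It is not an existing Nova filter: the function name, the
'trusted' key and the "true" value are all hypothetical, and the metadata is
passed in as plain dicts rather than Nova objects.

```python
def host_passes(host_metadata, instance_metadata, key="trusted"):
    """Return True if the host is acceptable for the instance.

    Instances that do not ask for trust can land anywhere; instances that
    do ask for it only pass on hosts tagged as trusted.
    """
    if instance_metadata.get(key) != "true":
        return True
    return host_metadata.get(key) == "true"


if __name__ == "__main__":
    hosts = {
        "compute-1": {"trusted": "true"},
        "compute-2": {},  # no tag: trustworthiness unknown
    }
    wanted = {"trusted": "true"}
    print([name for name, meta in hosts.items() if host_passes(meta, wanted)])
```

A host without the tag is rejected for trusted instances, so hosts of unknown
trustworthiness are handled the same way as compromised ones, while the other
instances can still use them.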


To be clear, maybe some pieces are missing for fulfilling your whole
story, but I'd rather identify what's missing within Nova for matching
instances and hosts and leave the tagging to a 3rd-party tool,
rather than promote a filter which is very specific and
doesn't really work on its own (it needs an external dependency).


-Sylvain




Thanks
Jimmy

-Original Message-
From: Bhandaru, Malini K [mailto:malini.k.bhand...@intel.com]
Sent: Wednesday, June 24, 2015 1:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
compromised host (and why I dislike TrustedFilter)

Would like to add to Shane's points below.

1) The Trust filter can be treated as an API, with different underlying

implementations. Its default could even be Not Implemented and always
return false.

   And Nova.conf could specify use the OAT trust implementation. This

would not break present day users of the functionality.

Don't get me wrong, I'm not against OAT, I'm just saying that the
TrustedFilter design is wrong. Even if another alternative would come up to
serve the TrustedComputePool model of things, it would still be bad for the
reasons I mentioned below, and wouldn't cover the usecase I quoted.



2) The issue in the original bug is a VM waking up after a reboot on a host
whose trustworthiness has not been re-verified.
This essentially calls for a feature that checks that all constraints
requested by a VM at launch still hold when it re-awakens, even if it is
not going through the Nova scheduler at that point.

So I think we are in agreement that for covering that usecase, it can't be
done at the scheduler level.
Using TrustedFilter just ensures that the host is checked at instance
creation time, but it confuses people because they think it will be
enforced for the whole instance lifecycle.



   This holds even for aggregates that might be specified by geo, or even

reservation such as Coke or Pepsi.

   What if a host, even without a reboot and certainly before a reboot was

assigned from Coke to Pepsi, there is cross contamination.

   Perhaps we need Nova hooks that can be registered with functions that

check expected aggregate values.

I don't honestly see the point of an host aggregate. Given the failure domain
is an host, you only need to trust that host or not. The fact that the host
belongs to 

Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-24 Thread Joe Gordon
On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza sba...@redhat.com wrote:

 Hi team,

 Some discussion occurred over IRC about a bug which was publicly open
 related to TrustedFilter [1]
 I want to take the opportunity for raising my concerns about that specific
 filter, why I dislike it and how I think we could improve the situation -
 and clarify everyone's thoughts)

 The current situation is that way : Nova only checks if one host is
 compromised only when the scheduler is called, ie. only when
 booting/migrating/evacuating/unshelving an instance (well, not exactly all
 the evacuate/live-migrate cases, but let's not discuss about that now).
 When the request goes in the scheduler, all the hosts are checked against
 all the enabled filters and the TrustedFilter is making an external HTTP(S)
 call to the Attestation API service (not handled by Nova) for *each host*
 to see if the host is valid (not compromised) or not.

 To be clear, that's the only in-tree scheduler filter which explicitly
 does an external call to a separate service that Nova is not managing. I
 can see at least 3 reasons for thinking about why it's bad :

 #1 : that's a terrible bottleneck for performance, because we're
 IO-blocking N times given N hosts (we're even not multiplexing the HTTP
 requests)
 #2 : all the filters check an internal Nova state for the host
 (called HostState) except the TrustedFilter, which means that
 conceptually we defer the decision to a 3rd-party engine
 #3 : the Attestation API service becomes a de facto dependency for Nova
 (since it's an in-tree filter) while it's not listed as a dependency and
 thus not gated.


 All of these reasons could be acceptable if that would cover the exposed
 usecase given in [1] (ie. I want to make sure that if my host gets
 compromised, my instances will not be running on that host) but that just
 doesn't work, due to the situation I mentioned above.

 So, given that, here are my thoughts :
 a/ if a host gets compromised, we can just disable its service to prevent
 its election as a valid destination host. There is no need for a
 specialised filter.
 b/ if a host is compromised, we can assume that the instances have to
 resurrect elsewhere, ie. we can call a nova evacuate
 c/ checking whether a host is compromised is not a Nova responsibility
 since it's already perfectly done by [2]

 In other words, I'm considering that security usecase as something
 analog as the HA usecase [3] where we need a 3rd-party tool responsible for
 periodically checking the state of the hosts, and if compromised then call
 the Nova API for fencing the host and evacuating the compromised instances.

 Given that, I'm proposing to deprecate TrustedFilter and explicitly mention
 that it will be dropped from in-tree in a later cycle
 https://review.openstack.org/194592


Given people are using this, it is a negligible maintenance burden.  I
think deprecating with the intention of removing is not worth it.

Although it would be very useful to further document the risks with this
filter (live migration, possible performance issues etc.)




 Thoughts ?
 -Sylvain



 [1] https://bugs.launchpad.net/nova/+bug/1456228
 [2] https://github.com/OpenAttestation/OpenAttestation
 [3]
 http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/




Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-23 Thread Bhandaru, Malini K
Would like to add to Shane's points below.

1) The Trust filter can be treated as an API, with different underlying 
implementations. Its default could even be Not Implemented and always return 
false.
 And Nova.conf could specify use the OAT trust implementation. This would 
not break present day users of the functionality.

2) The issue in the original bug is a VM waking up after a reboot on a host
whose trustworthiness has not been re-verified.
 This essentially calls for a feature that checks that all constraints
requested by a VM at launch still hold when it re-awakens, even if it is
not going through the Nova scheduler at that point.

 This holds even for aggregates that might be specified by geo, or even 
reservation such as Coke or Pepsi.
 What if a host, even without a reboot and certainly before a reboot was 
assigned from Coke to Pepsi, there is cross contamination.
 Perhaps we need Nova hooks that can be registered with functions that 
check expected aggregate values.

 Better still have  libvirt functionality that makes a call back for each 
VM on a host to ensure its constraints are satisfied on start-up/boot, and 
re-start when it comes out of pause.

 Using an aggregate for trust with a cron job to check for trust is
inefficient in this case: trust status gets updated only on a host reboot,
because Intel TXT is a boot-time authentication.

Regards
Malini


-Original Message-
From: Wang, Shane [mailto:shane.w...@intel.com] 
Sent: Tuesday, June 23, 2015 9:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

AFAIK, TrustedFilter is using a sort of cache to cache the trusted state, which 
is designed to solve the performance issue mentioned here.

My thoughts for deprecating it are:
#1. We already have customers here in China who are using that filter. How are 
they going to do upgrade in the future?
#2. Dependency should not be a reason to deprecate a module in OpenStack, Nova 
is not a stand-alone module, and it depends on various technologies and 
libraries.

Intel is setting up the third party CI for TCP/OAT in Liberty, which is to 
address the concerns mentioned in the thread. And also, OAT is an open source 
project which is being maintained as the long-term strategy.

For the situation that a host gets compromised, OAT checks trusted or untrusted 
from the start point of boot/reboot, it is hard for OAT to detect whether a 
host gets compromised when it is running, I don't know how to detect that 
without the filter?
Back to Michael's question, the process of the verification is done by software 
automatically when a host boots or reboots, will that be an overhead for the 
admin to have a separate job?

Thanks.
--
Shane

-Original Message-
From: Michael Still [mailto:mi...@stillhq.com]
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

I agree. I feel like this is another example of functionality which is 
trivially implemented outside nova, and where it works much better if we don't 
do it. Couldn't an admin just have a cron job which verifies hosts, and then 
adds them to a compromised-hosts host aggregate if they're owned? I assume 
without testing it that you can migrate instances _out_ of a host aggregate you 
can't boot in?

Michael

On Tue, Jun 23, 2015 at 8:41 PM, Sylvain Bauza sba...@redhat.com wrote:
 Hi team,

 Some discussion occurred over IRC about a bug which was publicly open 
 related to TrustedFilter [1] I want to take the opportunity for 
 raising my concerns about that specific filter, why I dislike it and 
 how I think we could improve the situation - and clarify everyone's
 thoughts)

 The current situation is that way : Nova only checks if one host is 
 compromised only when the scheduler is called, ie. only when 
 booting/migrating/evacuating/unshelving an instance (well, not exactly 
 all the evacuate/live-migrate cases, but let's not discuss about that 
 now). When the request goes in the scheduler, all the hosts are 
 checked against all the enabled filters and the TrustedFilter is 
 making an external HTTP(S) call to the Attestation API service (not 
 handled by Nova) for *each host* to see if the host is valid (not 
 compromised) or not.

 To be clear, that's the only in-tree scheduler filter which explicitly 
 does an external call to a separate service that Nova is not managing.
 I can see at least 3 reasons for thinking about why it's bad :

 #1 : that's a terrible bottleneck for performance, because we're 
 IO-blocking N times given N hosts (we're even not multiplexing the 
 HTTP requests)
 #2 : all the filters are checking an internal Nova state for the host 
 

[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-23 Thread Sylvain Bauza

Hi team,

Some discussion occurred over IRC about a publicly opened bug
related to TrustedFilter [1].
I want to take the opportunity to raise my concerns about that
specific filter, explain why I dislike it and how I think we could improve
the situation (and clarify everyone's thoughts).


The current situation is this: Nova only checks whether a host is
compromised when the scheduler is called, ie. only when
booting/migrating/evacuating/unshelving an instance (well, not exactly
all the evacuate/live-migrate cases, but let's not discuss that
now). When the request reaches the scheduler, all the hosts are checked
against all the enabled filters, and the TrustedFilter makes an
external HTTP(S) call to the Attestation API service (not handled by
Nova) for *each host* to see if the host is valid (not compromised) or not.


To be clear, that's the only in-tree scheduler filter which explicitly 
does an external call to a separate service that Nova is not managing. I 
can see at least 3 reasons for thinking about why it's bad :


#1 : that's a terrible bottleneck for performance, because we're 
IO-blocking N times given N hosts (we're not even multiplexing the HTTP 
requests)
#2 : all the other filters check an internal Nova state for the host 
(called HostState), but the TrustedFilter does not, which means that 
conceptually we defer the decision to a 3rd-party engine
#3 : the Attestation API service becomes a de facto dependency for 
Nova (since it's an in-tree filter) while it's not listed as a 
dependency and thus not gated.
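
Point #1 can be illustrated with a small sketch. This is not the 
TrustedFilter code: FakeAttestationService and hosts_passing_filter are 
hypothetical names, and the 200 ms round trip is an assumed figure that 
is simulated with a counter rather than real HTTP calls. It only shows 
that one blocking call per host makes the cost grow linearly with the 
number of hosts.

```python
from typing import Iterable, List, Set


class FakeAttestationService:
    """Hypothetical stand-in for an external attestation API.

    No network traffic happens here: each call just adds an assumed
    round-trip latency to a counter so the total cost can be inspected.
    """

    def __init__(self, untrusted: Set[str], latency_ms: int = 200) -> None:
        self.untrusted = untrusted
        self.latency_ms = latency_ms
        self.calls = 0
        self.blocked_ms = 0

    def is_trusted(self, host: str) -> bool:
        # One simulated blocking round trip per call.
        self.calls += 1
        self.blocked_ms += self.latency_ms
        return host not in self.untrusted


def hosts_passing_filter(hosts: Iterable[str],
                         service: FakeAttestationService) -> List[str]:
    """Check hosts one after another, as a per-host filter would."""
    return [host for host in hosts if service.is_trusted(host)]
```

With 5 hosts and the assumed 200 ms latency, a single scheduling 
request spends 1000 ms waiting on the external service.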



All of these reasons could be acceptable if the filter covered the 
use case given in [1] (i.e. I want to make sure that if my host gets 
compromised, my instances will not be running on that host), but it 
just doesn't, because of the situation I described above: the check 
only happens at scheduling time.


So, given that, here are my thoughts:
a/ if a host gets compromised, we can just disable its service to 
prevent its selection as a valid destination host. There is no need for a 
specialised filter.
b/ if a host is compromised, we can assume that its instances have to 
be resurrected elsewhere, i.e. we can call nova evacuate
c/ checking whether a host is compromised is not a Nova 
responsibility, since that's already done perfectly well by [2]


In other words, I consider this security use case analogous to the HA 
use case [3], where we need a 3rd-party tool responsible for 
periodically checking the state of the hosts and, if one is 
compromised, calling the Nova API to fence the host and evacuate the 
instances running on it.
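
The external workflow described above could be sketched as follows. 
Everything here is an assumption for illustration: AttestationClient and 
CloudClient are hypothetical interfaces, not the OpenAttestation or Nova 
client APIs. A real tool would implement them with whatever clients the 
deployment uses and run the function periodically.

```python
from typing import Iterable, List, Protocol


class AttestationClient(Protocol):
    """Hypothetical interface to the attestation service."""

    def is_trusted(self, host: str) -> bool: ...


class CloudClient(Protocol):
    """Hypothetical interface to the compute API (not a real client)."""

    def disable_compute_service(self, host: str) -> None: ...

    def list_instances(self, host: str) -> List[str]: ...

    def evacuate(self, instance_id: str) -> None: ...


def fence_compromised_hosts(hosts: Iterable[str],
                            attestation: AttestationClient,
                            cloud: CloudClient) -> List[str]:
    """Fence every untrusted host and evacuate its instances.

    Returns the list of hosts that were fenced during this run.
    """
    fenced = []
    for host in hosts:
        if attestation.is_trusted(host):
            continue
        # Step a/: disable the service so the host is no longer a
        # valid scheduling destination.
        cloud.disable_compute_service(host)
        # Step b/: move the instances elsewhere.
        for instance_id in cloud.list_instances(host):
            cloud.evacuate(instance_id)
        fenced.append(host)
    return fenced
```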


Given that, I'm proposing to deprecate TrustedFilter and explicitly 
mention that it will be dropped from the tree in a later cycle: 
https://review.openstack.org/194592


Thoughts ?
-Sylvain



[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-23 Thread Wang, Shane
AFAIK, TrustedFilter uses a cache for the trusted state, which 
is designed to address the performance issue mentioned here.
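
For illustration, a time-based cache of that kind could look like the 
sketch below. This is not the TrustedFilter's actual implementation: 
TrustCache, fetch, ttl_seconds and clock are hypothetical names, and 
fetch stands for whatever call asks the attestation service about one 
host.

```python
import time
from typing import Callable, Dict, Tuple


class TrustCache:
    """Remember each host's trust state for a limited time."""

    def __init__(self, fetch: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (trusted, time the answer was fetched)
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_trusted(self, host: str) -> bool:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[1] < self._ttl:
            # Fresh enough: answer from the cache, no external call.
            return entry[0]
        # Missing or expired: ask the external service and store it.
        trusted = self._fetch(host)
        self._entries[host] = (trusted, now)
        return trusted
```

Such a cache reduces the number of external calls, but a host that 
becomes untrusted is still reported as trusted until its entry expires.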

My concerns about deprecating it are:
#1. We already have customers here in China who are using that filter. How are 
they going to upgrade in the future?
#2. A dependency should not be a reason to deprecate a module in OpenStack. Nova 
is not a stand-alone module; it depends on various technologies and 
libraries.

Intel is setting up third-party CI for TCP/OAT in Liberty, which should 
address the concerns mentioned in the thread. Also, OAT is an open source 
project that is being maintained as a long-term strategy.

As for a host getting compromised: OAT determines trusted or untrusted 
status at boot/reboot time, so it is hard for OAT to detect that a 
host has been compromised while it is running. I don't know how to detect 
that without the filter.
Back to Michael's question: the verification is done automatically by software 
when a host boots or reboots, so wouldn't a separate job be extra overhead for 
the admin?

Thanks.
--
Shane

-Original Message-
From: Michael Still [mailto:mi...@stillhq.com] 
Sent: Wednesday, June 24, 2015 7:49 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] How to properly detect and fence a 
compromised host (and why I dislike TrustedFilter)

I agree. I feel like this is another example of functionality which is 
trivially implemented outside nova, and where it works much better if we don't 
do it. Couldn't an admin just have a cron job which verifies hosts, and then 
adds them to a compromised-hosts host aggregate if they're owned? I assume 
without testing it that you can migrate instances _out_ of a host aggregate you 
can't boot in?

Michael


Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

2015-06-23 Thread Michael Still
I agree. I feel like this is another example of functionality which is
trivially implemented outside nova, and where it works much better if
we don't do it. Couldn't an admin just have a cron job which verifies
hosts, and then adds them to a compromised-hosts host aggregate if
they're owned? I assume without testing it that you can migrate
instances _out_ of a host aggregate you can't boot in?
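
The cron job idea could be sketched like this. The function and its 
parameters are hypothetical: is_trusted stands for whatever verifies a 
host, and add_to_aggregate for whatever call adds a host to the 
compromised-hosts aggregate. Keeping new instances off that aggregate is 
assumed to be handled by scheduler configuration, which is not shown.

```python
from typing import Callable, Iterable, List, Set


def quarantine_untrusted_hosts(hosts: Iterable[str],
                               is_trusted: Callable[[str], bool],
                               quarantined: Set[str],
                               add_to_aggregate: Callable[[str], None]
                               ) -> List[str]:
    """Add every newly untrusted host to the quarantine aggregate.

    `quarantined` holds the hosts already in the aggregate; it is
    updated in place so repeated runs do not add a host twice.
    Returns the hosts added during this run.
    """
    newly_added = []
    for host in hosts:
        # Skip hosts that are already quarantined or still trusted.
        if host in quarantined or is_trusted(host):
            continue
        add_to_aggregate(host)
        quarantined.add(host)
        newly_added.append(host)
    return newly_added
```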

Michael



-- 
Rackspace Australia
