Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-26 Thread Day, Phil
 -Original Message-
 From: Ahmed RAHAL [mailto:ara...@iweb.com]
 Sent: 25 June 2014 20:25
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] should we have a stale data indication in
 nova list/show?
 
 Le 2014-06-25 14:26, Day, Phil a écrit :
  -Original Message-
  From: Sean Dague [mailto:s...@dague.net]
  Sent: 25 June 2014 11:49
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] should we have a stale data
  indication in nova list/show?
 
 
  +1 that the state shouldn't be changed.
 
  What about if we exposed the last updated time to users and allowed
 them to decide if its significant or not ?
 
 
 This would just indicate the last operation's time stamp.
 There already is a field in nova show called 'updated' that has some kind of
 indication. I honestly do not know who updates that field, but if anything,
 this existing field could/should be used.
 
 
Doh ! - yes that is the updated_at value in the DB.

I'd missed the last bit of my train of thought on this, which was that we could 
make the periodic task which checks (and corrects) the instance state update 
the updated_at timestamp even if the state is unchanged.

However, that does add another DB update per instance every 60 seconds, and I'm 
with Joe: I'm really not convinced this takes the Nova view of status 
in the right direction.   Better, I think, to document the limitations of status 
as they stand and educate users about them than to try and change it.
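
To put a rough number on that cost, here is a minimal sketch in Python. The function names and the fleet size in the usage note are illustrative assumptions, not Nova code; only the 60-second period and the 'updated' field come from the discussion above.

```python
from datetime import datetime, timezone

# Period of the state-sync task as mentioned in this thread.
PERIOD_SECONDS = 60


def extra_writes_per_second(instance_count, period_seconds=PERIOD_SECONDS):
    """Average extra DB updates per second if every instance is touched once per period."""
    return instance_count / period_seconds


def seconds_since_update(updated_at, now=None):
    """Age in seconds of the 'updated' value a user would see.

    Both datetimes are expected to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return (now - updated_at).total_seconds()
```

For a hypothetical cloud of 6000 instances, `extra_writes_per_second(6000)` gives 100.0, so roughly 100 additional updates per second on average, just to refresh a timestamp.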

Phil

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Belmiro Moreira
I like the current behavior of not changing the VM state if nova-compute
goes down.

The cloud operators can identify the issue in the compute node and try to
fix it without users noticing. Depending on the problem, I can inform users
if instances are affected and change the state if necessary.

What I wouldn't like is to expose any failure in nova-compute to users and be
contacted because the VM state changed.



Belmiro


On Wed, Jun 25, 2014 at 4:49 AM, Ahmed RAHAL ara...@iweb.com wrote:

 Hi,

 Le 2014-06-24 20:12, Joe Gordon a écrit :


 Finally, assuming the customer had access to this 'unknown' state
 information, what would he be able to do with it ? Usually he has no
 lever to 'evacuate' or 'recover' the VM. All he could do is spawn
 another instance to replace the lost one. But only if the VM really
 is currently unavailable, an information he must get from other
 sources.


 If I was a user, and my instance went to an 'UNKNOWN' state, I would
 check if its still operating, and if not delete it and start another
 instance.


 If I was a user and polled nova list/show on a regular basis just in case
 the management pane indicates a failure, I should have no expectation
 whatsoever. If service availability is my concern, I should monitor the
 service, nothing else. From there, once the service has failed, I can
 imagine checking if VM management is telling me something. However, if my
 service is down and I no longer have access to the VM ... simple case:
 destroy and respawn.

 My point is that we should not make the nova state an expected source of
 truth regarding service availability in the VM, as there is no way to tell
 such a thing. If my VM is being DDOSed, nova would still say everything is
 fine, while my service is really down. In that situation, console access
 would help me determine if the VM management is wrong by stating everything
 is ok or if there is another root cause.
 Similarly, should nova show a state change if load in the VM is through
 the roof and the service is not responsive ? or if OOM is killing all my
 processes because of a memory shortage ?

 As stated before, providing such state information is misleading because
 there are cases where node unavailability is not service disruptive, thus
 it would indicate a false positive, while the opposite (everything is ok) is
 not at all indicative of a healthy status of the service.

 Maybe I am overlooking a use case here where you absolutely need the user
 of the service to know about a potential problem with his hosting platform.

 Ahmed.

 --
 =
 Ahmed Rahal ara...@iweb.com / iWeb Technologies
 Spécialiste de l'Architecture TI
 / IT Architecture Specialist
 =




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Sean Dague
On 06/25/2014 04:28 AM, Belmiro Moreira wrote:
 I like the current behavior of not changing the VM state if nova-compute
 goes down. 
 
 The cloud operators can identify the issue in the compute node and try
 to fix it without users noticing. Depending in the problem I can inform
 users if instances are affected and change the state if necessary. 
 
 I wouldn't like is to expose any failure in nova-compute to users and be
 contacted because VM state changed. 

Agreed. Plus in the perfectly normal case of an upgrade of a compute
node, it's expected that nova-compute is going to be down for some
period of time, and it's 100% expected that the VMs remain up and ACTIVE
over that period.

Setting VMs to ERROR would totally gum that up.

-Sean

-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Day, Phil
 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 25 June 2014 11:49
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] should we have a stale data indication in
 nova list/show?
 
 On 06/25/2014 04:28 AM, Belmiro Moreira wrote:
  I like the current behavior of not changing the VM state if
  nova-compute goes down.
 
  The cloud operators can identify the issue in the compute node and try
  to fix it without users noticing. Depending in the problem I can
  inform users if instances are affected and change the state if necessary.
 
  I wouldn't like is to expose any failure in nova-compute to users and
  be contacted because VM state changed.
 
 Agreed. Plus in the perfectly normal case of an upgrade of a compute node,
 it's expected that nova-compute is going to be down for some period of
 time, and it's 100% expected that the VMs remain up and ACTIVE over that
 period.
 
 Setting VMs to ERROR would totally gum that up.
 
+1 that the state shouldn't be changed.

What if we exposed the last-updated time to users and let them 
decide whether it's significant?


Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Joe Gordon
On Wed, Jun 25, 2014 at 11:26 AM, Day, Phil philip@hp.com wrote:

  -Original Message-
  From: Sean Dague [mailto:s...@dague.net]
  Sent: 25 June 2014 11:49
  To: OpenStack Development Mailing List (not for usage questions)
  Subject: Re: [openstack-dev] [nova] should we have a stale data
 indication in
  nova list/show?
 
  On 06/25/2014 04:28 AM, Belmiro Moreira wrote:
   I like the current behavior of not changing the VM state if
   nova-compute goes down.
  
   The cloud operators can identify the issue in the compute node and try
   to fix it without users noticing. Depending in the problem I can
   inform users if instances are affected and change the state if
 necessary.
  
   I wouldn't like is to expose any failure in nova-compute to users and
   be contacted because VM state changed.
 
  Agreed. Plus in the perfectly normal case of an upgrade of a compute node
  it's expected that nova-compute is going to be down for some period of
  time, and it's 100% expected that the VMs remain up and ACTIVE over that
  period.
 
  Setting VMs to ERROR would totally gum that up.
 
 +1 that the state shouldn't be changed.

 What about if we exposed the last updated time to users and allowed them
 to decide if its significant or not ?


I have changed my mind on this one. I agree we shouldn't change any state,
and I also don't think we should show the last update time to the user.
I don't think that information would be very helpful to users, if at all,
and it would train users to poll nova more.

We don't want folks using nova list/show to check whether their instance is
functional. A user should care about whether their instance is operating
as expected, and nova misbehaving isn't the only reason an instance may go
haywire (the service they are running inside the instance can crash, etc.).
So we should expect users to be able to monitor the health of their
instance without needing to poll nova on a regular basis.
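
As a rough illustration of monitoring the service itself rather than polling nova, a check like the following could run from wherever the user's clients sit. This is only a sketch: the function name, host, and port are hypothetical, and a plain TCP connect is just one of many possible probes.

```python
import socket


def service_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to the service can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, etc.: treat all as "not reachable".
        return False
```

A call such as `service_is_reachable("my-vm.example.org", 443)` says something about what the user actually cares about, whatever status nova happens to report for the instance.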

Thoughts?




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-25 Thread Ahmed RAHAL

Le 2014-06-25 14:26, Day, Phil a écrit :

-Original Message-
From: Sean Dague [mailto:s...@dague.net]
Sent: 25 June 2014 11:49
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] should we have a stale data indication in
nova list/show?



+1 that the state shouldn't be changed.

What about if we exposed the last updated time to users and allowed them to 
decide if its significant or not ?



This would just indicate the last operation's time stamp.
There already is a field in nova show called 'updated' that has some 
kind of indication. I honestly do not know who updates that field, but 
if anything, this existing field could/should be used.



Ahmed.




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Joe Gordon
On Jun 18, 2014 3:03 PM, Chris Friesen chris.frie...@windriver.com
wrote:

 The output of nova list and nova show reflects the current status in
the database, not the actual state on the compute node.

 If the instances in question are on a compute node that is currently
down, then the information is stale and possibly incorrect.  Would there
be any benefit in adding some sort of indication of this in the nova list
output?  Or do we expect the end-user to check nova service-list (or
other health-monitoring mechanisms) to see if the compute node is up
before relying on the output of nova list?

Great question.  In general I don't think a regular user should ever need
to run any health monitoring command. I think the larger question here is
how we handle instances associated with a nova-compute that is
currently being reported as down.  If nova-compute is down we have no way
of knowing the actual state of the instances. Perhaps we should move those
instances to an error state and let the user respond accordingly (delete
the instance, etc.). And if the nova-compute service returns, we correct the
state.
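
One way to picture the kind of "indication" Chris asks about, without touching the stored status at all, is to pair each listed instance with a flag saying whether its compute host is currently reported down. The sketch below is purely illustrative: the `Instance` type, `annotate_stale`, and the set of down hosts are hypothetical names, not part of nova.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    host: str
    status: str  # status as stored in the database


def annotate_stale(instances, hosts_down):
    """Return (instance, possibly_stale) pairs.

    The stored status is never modified; the flag only says that the
    host which would normally refresh it is currently reported down.
    """
    return [(inst, inst.host in hosts_down) for inst in instances]
```

With two ACTIVE instances on `node1` and `node2` and `hosts_down={"node2"}`, only the second one is flagged, and both still show ACTIVE.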


 Chris



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Russell Bryant
On 06/24/2014 04:42 PM, Joe Gordon wrote:
 
 On Jun 18, 2014 3:03 PM, Chris Friesen chris.frie...@windriver.com
 mailto:chris.frie...@windriver.com wrote:

 The output of nova list and nova show reflects the current status
 in the database, not the actual state on the compute node.

 If the instances in question are on a compute node that is currently
 down, then the information is stale and possibly incorrect.  Would
 there be any benefit in adding some sort of indication of this in the
 nova list output?  Or do we expect the end-user to check nova
 service-list (or other health-monitoring mechanisms) to see if the
 compute node is up before relying on the output of nova list?
 
 Great question.  In general I don't think a regular user should never
 need to run any health monitoring command. I think the larger question
 here is what how do we handle instances associated with a nova-compute
 that is currently being reported as down.  If nova-compute is down we
 have no way of knowing the actual state of the instances. Perhaps we
 should move those instances to an error state and let the user respond
 accordingly (delete instance etc.). And if the Nova-compute service
 returns we correct the state.

There be dragons here.  Just because Nova doesn't see the node reporting
in, doesn't mean the VMs aren't actually still running.  I think this
needs to be left to logic outside of Nova.

For example, if your deployment monitoring really does think the host is
down, you want to make sure it's *completely* dead before taking further
action such as evacuating the host.  You certainly don't want to risk
having the VM running on two different hosts.  This is just a business I
don't think Nova should be getting in to.

-- 
Russell Bryant



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Joe Gordon
On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com wrote:

 On 06/24/2014 04:42 PM, Joe Gordon wrote:
 
  On Jun 18, 2014 3:03 PM, Chris Friesen chris.frie...@windriver.com
  mailto:chris.frie...@windriver.com wrote:
 
  The output of nova list and nova show reflects the current status
  in the database, not the actual state on the compute node.
 
  If the instances in question are on a compute node that is currently
  down, then the information is stale and possibly incorrect.  Would
  there be any benefit in adding some sort of indication of this in the
  nova list output?  Or do we expect the end-user to check nova
  service-list (or other health-monitoring mechanisms) to see if the
  compute node is up before relying on the output of nova list?
 
  Great question.  In general I don't think a regular user should never
  need to run any health monitoring command. I think the larger question
  here is what how do we handle instances associated with a nova-compute
  that is currently being reported as down.  If nova-compute is down we
  have no way of knowing the actual state of the instances. Perhaps we
  should move those instances to an error state and let the user respond
  accordingly (delete instance etc.). And if the Nova-compute service
  returns we correct the state.

 There be dragons here.  Just because Nova doesn't see the node reporting
 in, doesn't mean the VMs aren't actually still running.  I think this
 needs to be left to logic outside of Nova.

 For example, if your deployment monitoring really does think the host is
 down, you want to make sure it's *completely* dead before taking further
 action such as evacuating the host.  You certainly don't want to risk
 having the VM running on two different hosts.  This is just a business I
 don't think Nova should be getting in to.

I agree nova shouldn't take any actions. But I don't think leaving an
instance as 'active' is right either.  I was thinking we move the instance to
an error state (maybe an unknown state would be more accurate) and let the
user deal with it, versus just letting the user deal with everything. Since
nova knows something *may* be wrong, shouldn't we convey that to the user?
(I'm not 100% sure we should, myself.)


 --
 Russell Bryant



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Rick Jones

On 06/24/2014 02:38 PM, Joe Gordon wrote:

I agree nova shouldn't take any actions. But I don't think leaving an
instance as 'active' is right either.  I was thinking move instance to
error state (maybe an unknown state would be more accurate) and let the
user deal with it, versus just letting the user deal with everything.
Since nova knows something *may* be wrong shouldn't we convey that to
the user (I'm not 100% sure we should myself).


I suspect the user's first action will be to call Support asking "Hey, 
why is my perfectly usable instance showing up in the ERROR|UNKNOWN state?"


rick jones



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Joe Gordon
On Jun 24, 2014 2:47 PM, Rick Jones rick.jon...@hp.com wrote:

 On 06/24/2014 02:38 PM, Joe Gordon wrote:

 I agree nova shouldn't take any actions. But I don't think leaving an
 instance as 'active' is right either.  I was thinking move instance to
 error state (maybe an unknown state would be more accurate) and let the
 user deal with it, versus just letting the user deal with everything.
 Since nova knows something *may* be wrong shouldn't we convey that to
 the user (I'm not 100% sure we should myself).


 I suspect the user's first action will be to call Support asking Hey,
why is my perfectly usable instance showing-up in the ERROR|UNKNOWN state?

True, but the alternative is "why is this dead instance listed as ACTIVE,
and why am I being billed for it too?" I think this is a lose-lose.


 rick jones




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Steve Gordon
- Original Message -
 From: Rick Jones rick.jon...@hp.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 
 On 06/24/2014 02:38 PM, Joe Gordon wrote:
  I agree nova shouldn't take any actions. But I don't think leaving an
  instance as 'active' is right either.  I was thinking move instance to
  error state (maybe an unknown state would be more accurate) and let the
  user deal with it, versus just letting the user deal with everything.
  Since nova knows something *may* be wrong shouldn't we convey that to
  the user (I'm not 100% sure we should myself).
 
 I suspect the user's first action will be to call Support asking Hey,
 why is my perfectly usable instance showing-up in the ERROR|UNKNOWN state?
 
 rick jones

The existing alternative would be having the user call to ask why their 
non-responsive instance is showing as RUNNING, so you are kind of damned if you 
do, damned if you don't.

Steve



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Rick Jones

On 06/24/2014 02:53 PM, Steve Gordon wrote:

- Original Message -

From: Rick Jones rick.jon...@hp.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org

On 06/24/2014 02:38 PM, Joe Gordon wrote:

I agree nova shouldn't take any actions. But I don't think leaving an
instance as 'active' is right either.  I was thinking move instance to
error state (maybe an unknown state would be more accurate) and let the
user deal with it, versus just letting the user deal with everything.
Since nova knows something *may* be wrong shouldn't we convey that to
the user (I'm not 100% sure we should myself).


I suspect the user's first action will be to call Support asking Hey,
why is my perfectly usable instance showing-up in the ERROR|UNKNOWN state?

rick jones


The existing alternative would be having the user calling to ask why
their non-responsive instance is showing as RUNNING so you are kind
of damned if you do, damned if you don't.


There will be a call for a non-responsive instance regardless of what it 
shows.  However, a responsive instance not showing ERROR or UNKNOWN will 
not generate a call.  So, all in all, I think you will get fewer calls if 
you don't mark an instance that is not known to be non-responsive as ERROR or 
UNKNOWN.


rick




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Ahmed RAHAL

Le 2014-06-24 17:38, Joe Gordon a écrit :


On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com
mailto:rbry...@redhat.com wrote:



  There be dragons here.  Just because Nova doesn't see the node reporting
  in, doesn't mean the VMs aren't actually still running.  I think this
  needs to be left to logic outside of Nova.
 
  For example, if your deployment monitoring really does think the host is
  down, you want to make sure it's *completely* dead before taking further
  action such as evacuating the host.  You certainly don't want to risk
  having the VM running on two different hosts.  This is just a business I
  don't think Nova should be getting in to.

I agree nova shouldn't take any actions. But I don't think leaving an
instance as 'active' is right either.  I was thinking move instance to
error state (maybe an unknown state would be more accurate) and let the
user deal with it, versus just letting the user deal with everything.
Since nova knows something *may* be wrong shouldn't we convey that to
the user (I'm not 100% sure we should myself).


I have seen compute nodes go down from a management perspective (say, 
nova-compute disappeared) while the VMs were just fine. Reporting on the 
state may be misleading. The 'unknown' state would fit, but nothing lets 
us presume the VMs are non-functional or impacted.


As far as an operator is concerned, a compute node not responding is 
reason enough to check the situation.


To go further on other comments related to customer feedback: there 
are many reasons a customer may think his VM is down, so showing him 
'useful information' will in some cases only trigger more anxiety.
Besides, people will start hammering the API to check 'state' instead of 
using proper monitoring.

But, state is already reported if the customer shuts down a VM, so ...

Currently, compute node state reporting is done by the nova-compute 
process itself, reporting back with a time stamp to the database 
(through conductor, if I recall correctly). It's more like a watchdog than a 
reporting system.
For VMs (assuming we find it useful) the same kind of process could 
occur: nova-compute reporting back all states with time stamps for all 
VMs it hosts. This should then be optional, as I already sense 
scaling/performance issues here (ceilometer, anyone?).
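
The watchdog idea, and its possible per-VM extension, can be sketched roughly as follows. This is an illustration only, not nova's code: the names `is_up` and `stale_vm_reports`, the shape of `reports`, and the 60-second threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long without a report before we call it "down".
DOWN_AFTER = timedelta(seconds=60)


def is_up(last_report, now, down_after=DOWN_AFTER):
    """Watchdog check: a reporter counts as up only if it checked in recently."""
    return now - last_report <= down_after


def stale_vm_reports(reports, now, down_after=DOWN_AFTER):
    """Names of VMs whose last per-VM report is older than the threshold.

    `reports` maps VM name -> (state, last_report_time).
    """
    return sorted(
        name
        for name, (_state, seen) in reports.items()
        if not is_up(seen, now, down_after)
    )
```

Note that a stale report only says the reporter went quiet. It says nothing about whether the VM itself is still running, which is exactly the ambiguity discussed above.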


Finally, assuming the customer had access to this 'unknown' state 
information, what would he be able to do with it? Usually he has no 
lever to 'evacuate' or 'recover' the VM. All he could do is spawn 
another instance to replace the lost one. But only if the VM really is 
currently unavailable, which is information he must get from other sources.


So, I see how the state reporting could be useful information, but am 
not sure that nova Status is the right place for it.


Ahmed.



Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Joe Gordon
On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL ara...@iweb.com wrote:

 Le 2014-06-24 17:38, Joe Gordon a écrit :


 On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com
 mailto:rbry...@redhat.com wrote:


There be dragons here.  Just because Nova doesn't see the node
 reporting
   in, doesn't mean the VMs aren't actually still running.  I think this
   needs to be left to logic outside of Nova.
  
   For example, if your deployment monitoring really does think the host
 is
   down, you want to make sure it's *completely* dead before taking
 further
   action such as evacuating the host.  You certainly don't want to risk
   having the VM running on two different hosts.  This is just a business
 I
   don't think Nova should be getting in to.

 I agree nova shouldn't take any actions. But I don't think leaving an
 instance as 'active' is right either.  I was thinking move instance to
 error state (maybe an unknown state would be more accurate) and let the
 user deal with it, versus just letting the user deal with everything.
 Since nova knows something *may* be wrong shouldn't we convey that to
 the user (I'm not 100% sure we should myself).


 I saw compute nodes going down, from a management perspective (say,
 nova-compute disappeared), but VMs were just fine. Reporting on the state
 may be misleading. The 'unknown' state would fit, but nothing lets us
 presume the VMs are non-functional or impacted.


Nothing lets us presume the opposite either. We don't know if the instance
is still up.



 As far as an operator is concerned, a compute node not responding is a
 reason enough to check the situation.

 To go further about other comments related to customer feedback, there are
 many reasons a customer may think his VM is down, so showing him a 'useful
 information' in some cases will only trigger more anxiety.
 Besides people will start hammering the API to check 'state' instead of
 using proper monitoring.
 But, state is already reported if the customer shuts down a VM, so ...

 Currently, compute nodes state reporting is done by the nova-compute
 process himself, reporting back with a time stamp to the database (through
 conductor if I recall well). It's more like a watchdog than a reporting
 system.
 For VMs (assuming we find it useful) the same kind of process could occur:
 nova-compute reporting back all states with time stamps for all VMs he
 hosts. This shall then be optional, as I already sense scaling/performance
 issues here (ceilometer anyone ?).

 Finally, assuming the customer had access to this 'unknown' state
 information, what would he be able to do with it ? Usually he has no lever
 to 'evacuate' or 'recover' the VM. All he could do is spawn another
 instance to replace the lost one. But only if the VM really is currently
 unavailable, an information he must get from other sources.


If I were a user, and my instance went to an 'UNKNOWN' state, I would check
if it's still operating, and if not, delete it and start another instance.



 So, I see how the state reporting could be a useful information, but am
 not sure that nova Status is the right place for it.

 Ahmed.




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Joe Gordon
On Tue, Jun 24, 2014 at 5:12 PM, Joe Gordon joe.gord...@gmail.com wrote:




 On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL ara...@iweb.com wrote:

 Le 2014-06-24 17:38, Joe Gordon a écrit :


 On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com
 mailto:rbry...@redhat.com wrote:


There be dragons here.  Just because Nova doesn't see the node
 reporting
   in, doesn't mean the VMs aren't actually still running.  I think this
   needs to be left to logic outside of Nova.
  
   For example, if your deployment monitoring really does think the host
 is
   down, you want to make sure it's *completely* dead before taking
 further
   action such as evacuating the host.  You certainly don't want to risk
   having the VM running on two different hosts.  This is just a
 business I
   don't think Nova should be getting in to.

 I agree nova shouldn't take any actions. But I don't think leaving an
 instance as 'active' is right either.  I was thinking move instance to
 error state (maybe an unknown state would be more accurate) and let the
 user deal with it, versus just letting the user deal with everything.
 Since nova knows something *may* be wrong shouldn't we convey that to
 the user (I'm not 100% sure we should myself).


 I saw compute nodes going down, from a management perspective (say,
 nova-compute disappeared), but VMs were just fine. Reporting on the state
 may be misleading. The 'unknown' state would fit, but nothing lets us
 presume the VMs are non-functional or impacted.


 nothing lets us presume the opposite as well. We don't know if the
 instance is still up.



 As far as an operator is concerned, a compute node not responding is a
 reason enough to check the situation.

 To go further about other comments related to customer feedback, there
 are many reasons a customer may think his VM is down, so showing him a
 'useful information' in some cases will only trigger more anxiety.
 Besides people will start hammering the API to check 'state' instead of
 using proper monitoring.
 But, state is already reported if the customer shuts down a VM, so ...

 Currently, compute nodes state reporting is done by the nova-compute
 process himself, reporting back with a time stamp to the database (through
 conductor if I recall well). It's more like a watchdog than a reporting
 system.
 For VMs (assuming we find it useful) the same kind of process could
 occur: nova-compute reporting back all states with time stamps for all VMs
 he hosts. This shall then be optional, as I already sense
 scaling/performance issues here (ceilometer anyone ?).

 Finally, assuming the customer had access to this 'unknown' state
 information, what would he be able to do with it ? Usually he has no lever
 to 'evacuate' or 'recover' the VM. All he could do is spawn another
 instance to replace the lost one. But only if the VM really is currently
 unavailable, an information he must get from other sources.


 If I was a user, and my instance went to an 'UNKNOWN' state, I would check
 if its still operating, and if not delete it and start another instance.


The alternative is how things work today: if a nova-compute goes down we
don't change any instance states, and the user is responsible for making
sure their instance is still operating even if the instance is set to
ACTIVE.





 So, I see how the state reporting could be a useful information, but am
 not sure that nova Status is the right place for it.

 Ahmed.




Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Chris Behrens
I don't think we should be flipping states for instances on a potentially 
downed compute. We definitely should not set an instance to ERROR. I think a 
time associated with the last power state check might be nice and be good 
enough.
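
To make the idea concrete, here is a minimal Python sketch of how a client
could use such a time stamp. The field name `last_checked`, the function
name and the 120-second threshold are assumptions for illustration only;
nothing here is an existing Nova API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how old a power state check may be before the
# reported status is flagged as possibly stale.
STALE_AFTER = timedelta(seconds=120)


def describe_instance(status, last_checked, now=None):
    """Return the status unchanged, plus the age information a user needs."""
    now = now or datetime.now(timezone.utc)
    age = now - last_checked
    return {
        "status": status,  # never flipped, only annotated
        "last_checked": last_checked.isoformat(),
        "stale": age > STALE_AFTER,
    }
```

The status itself is left alone; the user gets the age of the data and
decides whether it is significant.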

- Chris

 On Jun 24, 2014, at 5:17 PM, Joe Gordon joe.gord...@gmail.com wrote:
 
 
 
 
 On Tue, Jun 24, 2014 at 5:12 PM, Joe Gordon joe.gord...@gmail.com wrote:
 
 
 
 On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL ara...@iweb.com wrote:
 On 2014-06-24 17:38, Joe Gordon wrote:
 
 On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com wrote:
 
   There be dragons here.  Just because Nova doesn't see the node reporting
   in, doesn't mean the VMs aren't actually still running.  I think this
   needs to be left to logic outside of Nova.
  
   For example, if your deployment monitoring really does think the host is
   down, you want to make sure it's *completely* dead before taking further
   action such as evacuating the host.  You certainly don't want to risk
   having the VM running on two different hosts.  This is just a business I
   don't think Nova should be getting into.
 
 I agree nova shouldn't take any actions. But I don't think leaving an
 instance as 'active' is right either.  I was thinking of moving the
 instance to an error state (maybe an unknown state would be more accurate)
 and letting the user deal with it, versus just letting the user deal with
 everything. Since nova knows something *may* be wrong, shouldn't we convey
 that to the user? (I'm not 100% sure we should, myself.)
 
 I have seen compute nodes go down from a management perspective (say,
 nova-compute disappeared) while the VMs were just fine. Reporting on the
 state may be misleading. The 'unknown' state would fit, but nothing lets us
 presume the VMs are non-functional or impacted.
 
 Nothing lets us presume the opposite either. We don't know if the instance
 is still up.
  
 
 As far as an operator is concerned, a compute node not responding is
 reason enough to check the situation.
 
 To go further on the other comments about customer feedback: there are
 many reasons a customer may think his VM is down, so showing him 'useful
 information' will in some cases only trigger more anxiety.
 Besides, people will start hammering the API to check 'state' instead of
 using proper monitoring.
 But state is already reported if the customer shuts down a VM, so ...
 
 Currently, compute node state reporting is done by the nova-compute
 process itself, reporting back with a time stamp to the database (through
 conductor, if I recall correctly). It's more like a watchdog than a
 reporting system.
 For VMs (assuming we find it useful) the same kind of process could occur:
 nova-compute reporting back all states with time stamps for all VMs it
 hosts. This should then be optional, as I already sense scaling/performance
 issues here (ceilometer, anyone?).
 
 Finally, assuming the customer had access to this 'unknown' state
 information, what would he be able to do with it? Usually he has no lever
 to 'evacuate' or 'recover' the VM. All he could do is spawn another
 instance to replace the lost one, but only if the VM really is currently
 unavailable, which is information he must get from other sources.
 
 If I were a user, and my instance went to an 'UNKNOWN' state, I would check
 whether it's still operating, and if not, delete it and start another instance.
 
 The alternative is how things work today: if a nova-compute goes down we
 don't change any instance states, and the user is responsible for making sure
 their instance is still operating even if the instance is set to ACTIVE.
  
  
 
 So, I see how the state reporting could be useful information, but I am not
 sure that nova Status is the right place for it.
 
 Ahmed.
 
 


Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?

2014-06-24 Thread Ahmed RAHAL

Hi,

On 2014-06-24 20:12, Joe Gordon wrote:


Finally, assuming the customer had access to this 'unknown' state
information, what would he be able to do with it? Usually he has no
lever to 'evacuate' or 'recover' the VM. All he could do is spawn
another instance to replace the lost one, but only if the VM really
is currently unavailable, which is information he must get from other
sources.


If I were a user, and my instance went to an 'UNKNOWN' state, I would
check whether it's still operating, and if not, delete it and start
another instance.


If I were a user and polled nova list/show on a regular basis just in
case the management pane indicates a failure, I should have no
expectation whatsoever. If service availability is my concern, I should
monitor the service, nothing else. From there, once the service has
failed, I can imagine checking whether VM management is telling me
something. However, if my service is down and I no longer have access to
the VM ... simple case: destroy and respawn.
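
The order of checks described above could be sketched roughly as follows,
in Python. The function names and the returned strings are made up for the
example, and a plain TCP connect is only one assumed way of probing a
service.

```python
import socket


def service_reachable(host, port, timeout=3.0):
    """Probe the service itself with a TCP connect, not the nova state."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_step(service_ok, vm_reachable):
    """Decide what the user does, looking at the service first."""
    if service_ok:
        return "nothing to do"
    if vm_reachable:
        # Service is down but the VM can still be accessed: look inside it.
        return "debug inside the VM"
    # Service down and no access to the VM: the simple case.
    return "destroy and respawn"
```

The nova status never enters the decision; at most it is something to look
at once the service is already known to have failed.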


My point is that we should not make the nova state an expected source of
truth regarding service availability in the VM, as there is no way to
tell such a thing. If my VM is being DDoSed, nova would still say
everything is fine while my service is really down. In that situation,
console access would help me determine whether VM management is wrong in
stating everything is OK or whether there is another root cause.
Similarly, should nova show a state change if load in the VM is through
the roof and the service is not responsive? Or if the OOM killer is
killing all my processes because of a memory shortage?


As stated before, providing such state information is misleading
because there are cases where node unavailability does not disrupt the
service, so it would produce a false positive, while the opposite
(everything is OK) is not at all indicative of a healthy service.


Maybe I am overlooking a use case here where you absolutely need the user
of the service to know about a potential problem with his hosting platform.


Ahmed.

--
=
Ahmed Rahal ara...@iweb.com / iWeb Technologies
Spécialiste de l'Architecture TI
/ IT Architecture Specialist
=
