Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-29 Thread Gary Kotton
Hi,
Following all of the discussion I have done the following:
1. Updated the wiki with all of the details -
https://wiki.openstack.org/wiki/Nova_VM_Diagnostics
2. Renamed the BP. It is now called v3-diagnostics
https://blueprints.launchpad.net/openstack/?searchtext=v3-diagnostics
3. Posted a patch with libvirt support -
https://review.openstack.org/#/c/61753/
The other drivers that support diagnostics will be updated in the coming
days.
I am not sure how tempest behaves with the V3 client, but I am in the
process of looking into that so that we can leverage this API with
tempest. Do we also want the same support in V2? I think it could be
very helpful with the spurious test failures that we have.
Thanks and a happy new year to all
Gary


On 12/19/13 6:21 PM, Vladik Romanovsky vladik.romanov...@enovance.com
wrote:

Ah, I think I've responded too fast, sorry.

meter-list provides a list of the various measurements that are being done
per resource.
sample-list provides a list of samples for a given meter: ceilometer
sample-list --meter cpu_util -q resource_id=vm_uuid
These samples can be aggregated over a period of time, per meter and
resource:
ceilometer statistics -m cpu_util -q
'timestamp>START;timestamp<=END;resource_id=vm_uuid' --period 3600

Vladik



- Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: Vladik Romanovsky vladik.romanov...@enovance.com
 Cc: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org, John
 Garbutt j...@johngarbutt.com
 Sent: Thursday, 19 December, 2013 10:37:27 AM
 Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
 
 On Thu, Dec 19, 2013 at 03:47:30PM +0100, Vladik Romanovsky wrote:
  I think it was:
  
  ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'
 
 Hmm, a standard devstack deployment of ceilometer doesn't seem to
 record any performance stats at all - just shows me the static
 configuration parameters :-(
 
  ceilometer meter-list -q 'resource_id=296b22c6-2a4d-4a8d-a7cd-2d73339f9c70'
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 | Name                | Type  | Unit     | Resource ID                          | User ID                          | Project ID                       |
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 | disk.ephemeral.size | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | disk.root.size      | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | instance            | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | instance:m1.small   | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | memory              | gauge | MB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | vcpus               | gauge | vcpu     | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 
 
 If the admin user can't rely on ceilometer guaranteeing availability of
 the performance stats at all, then I think having an API in nova to report
 them is in fact justifiable. In fact it is probably justifiable no matter
 what, as a fallback way to check what VMs are doing in the face of failure
 of ceilometer / part of the cloud infrastructure.
 
 Daniel
 --
 |: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org   -o-  http://virt-manager.org :|

Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-20 Thread Oleg Gelbukh
Hi everyone,

I'm sorry for being late to the thread, but what about baremetal driver?
Should it support the get_diagnostics() as well?

--
Best regards,
Oleg Gelbukh


On Thu, Dec 19, 2013 at 8:21 PM, Vladik Romanovsky 
vladik.romanov...@enovance.com wrote:

 Ah, I think I've responded too fast, sorry.

 meter-list provides a list of the various measurements that are being done per
 resource.
 sample-list provides a list of samples for a given meter: ceilometer
 sample-list --meter cpu_util -q resource_id=vm_uuid
 These samples can be aggregated over a period of time, per meter and
 resource:
 ceilometer statistics -m cpu_util -q
 'timestamp>START;timestamp<=END;resource_id=vm_uuid' --period 3600

 Vladik



 - Original Message -
  From: Daniel P. Berrange berra...@redhat.com
  To: Vladik Romanovsky vladik.romanov...@enovance.com
  Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, John
  Garbutt j...@johngarbutt.com
  Sent: Thursday, 19 December, 2013 10:37:27 AM
  Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
 
  On Thu, Dec 19, 2013 at 03:47:30PM +0100, Vladik Romanovsky wrote:
   I think it was:
  
   ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'
 
  Hmm, a standard devstack deployment of ceilometer doesn't seem to
  record any performance stats at all - just shows me the static
  configuration parameters :-(
 
   ceilometer meter-list -q 'resource_id=296b22c6-2a4d-4a8d-a7cd-2d73339f9c70'
  +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
  | Name                | Type  | Unit     | Resource ID                          | User ID                          | Project ID                       |
  +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
  | disk.ephemeral.size | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  | disk.root.size      | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  | instance            | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  | instance:m1.small   | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  | memory              | gauge | MB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  | vcpus               | gauge | vcpu     | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
  +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 
 
  If the admin user can't rely on ceilometer guaranteeing availability of
  the performance stats at all, then I think having an API in nova to report
  them is in fact justifiable. In fact it is probably justifiable no matter
  what, as a fallback way to check what VMs are doing in the face of failure
  of ceilometer / part of the cloud infrastructure.
 
  Daniel
  --
  |: http://berrange.com  -o-
 http://www.flickr.com/photos/dberrange/ :|
  |: http://libvirt.org  -o-
 http://virt-manager.org :|
  |: http://autobuild.org   -o-
 http://search.cpan.org/~danberr/ :|
  |: http://entangle-photo.org   -o-
 http://live.gnome.org/gtk-vnc :|
 

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-20 Thread Daniel P. Berrange
On Fri, Dec 20, 2013 at 12:56:47PM +0400, Oleg Gelbukh wrote:
 Hi everyone,
 
 I'm sorry for being late to the thread, but what about baremetal driver?
 Should it support the get_diagnostics() as well?

Of course, where practical, every driver should aim to support every
method in the virt driver class API.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-20 Thread Matt Riedemann



On Friday, December 20, 2013 3:57:15 AM, Daniel P. Berrange wrote:

On Fri, Dec 20, 2013 at 12:56:47PM +0400, Oleg Gelbukh wrote:

Hi everyone,

I'm sorry for being late to the thread, but what about baremetal driver?
Should it support the get_diagnostics() as well?


Of course, where practical, every driver should aim to support every
method in the virt driver class API.

Regards,
Daniel


Although isn't the baremetal driver moving to ironic, or is there an 
ironic driver moving into nova?  I'm a bit fuzzy on what's going on 
there.  Point is, if we're essentially halting feature development on 
the nova baremetal driver, I'd hold off on implementing get_diagnostics 
there for now.


--

Thanks,

Matt Riedemann


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-20 Thread Oleg Gelbukh
Matt,

My understanding is that there will be a nova.virt.baremetal.ironic driver
in Nova which will talk to the Ironic API to manage bare-metal instances. So,
Ironic will eventually be the one providing the diagnostics data about
bare-metal instances via its API.

Hope someone will correct me if I'm wrong.

--
Best regards,
Oleg Gelbukh


On Fri, Dec 20, 2013 at 7:12 PM, Matt Riedemann
mrie...@linux.vnet.ibm.comwrote:



 On Friday, December 20, 2013 3:57:15 AM, Daniel P. Berrange wrote:

 On Fri, Dec 20, 2013 at 12:56:47PM +0400, Oleg Gelbukh wrote:

 Hi everyone,

 I'm sorry for being late to the thread, but what about baremetal driver?
 Should it support the get_diagnostics() as well?


 Of course, where practical, every driver should aim to support every
 method in the virt driver class API.

 Regards,
 Daniel


 Although isn't the baremetal driver moving to ironic, or is there an
 ironic driver moving into nova?  I'm a bit fuzzy on what's going on there.
 Point is, if we're essentially halting feature development on the nova
 baremetal driver, I'd hold off on implementing get_diagnostics there for now.

 --

 Thanks,

 Matt Riedemann


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread John Garbutt
On 16 December 2013 15:50, Daniel P. Berrange berra...@redhat.com wrote:
 On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
 On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com wrote:
  On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
  I'd like to propose the following for the V3 API (we will not touch V2
  in case operators have applications that are written against this – this
  may be the case for libvirt or xen. The VMware API support was added
  in I1):
 
   1.  We formalize the data that is returned by the API [1]
 
  Before we debate what standard data should be returned we need
  detail of exactly what info the current 3 virt drivers return.
  IMHO it would be better if we did this all in the existing wiki
  page associated with the blueprint, rather than etherpad, so it
  serves as a permanent historical record for the blueprint design.

 +1

  While we're doing this I think we should also consider whether
  the 'get_diagnostics' API is fit for purpose more generally.
  eg currently it is restricted to administrators. Some, if
  not all, of the data libvirt returns is relevant to the owner
  of the VM but they can not get at it.

 Ceilometer covers that ground, we should ask them about this API.

 If we consider what is potentially in scope for ceilometer and
 subtract that from what the libvirt get_diagnostics impl currently
 returns, you pretty much end up with the empty set. This might cause
 us to question if 'get_diagnostics' should exist at all from the
 POV of the libvirt driver's impl. Perhaps vmware/xen return data
 that is out of scope for ceilometer ?

Hmm, a good point.

  For a cloud administrator it might be argued that the current
  API is too inefficient to be useful in many troubleshooting
  scenarios since it requires you to invoke it once per instance
  if you're collecting info on a set of guests, eg all VMs on
  one host. It could be that cloud admins would be better
  served by an API which returned info for all VMs on a host
  at once, if they're monitoring say, I/O stats across VM
  disks to identify one that is causing I/O trouble ? IOW, I
  think we could do with better identifying the usage scenarios
  for this API if we're to improve its design / impl.

 I like the API that helps you dig into info for a specific host that
 other system highlight as problematic.
 You can do things that could be expensive to compute, but useful for
 troubleshooting.

 If things get expensive to compute, then it may well be preferable to
 have separate APIs for distinct pieces of interesting diagnostic
 data. eg If they only care about one particular thing, they don't want
 to trigger expensive computations of things they don't care about seeing.

Maybe that is what we need:
* API to get what ceilometer would tell you, maybe using its format
* API to perform expensive diagnostics

But then, we would just be duplicating ceilometer, which goes back to
your original point. And we are trying to get rid of the APIs that
just proxy to another service, so let's not add another one.

Maybe we should just remove this from the v3 API for now, and see who shouts?

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Daniel P. Berrange
On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:
 On 16 December 2013 15:50, Daniel P. Berrange berra...@redhat.com wrote:
  On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
  On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com wrote:
   On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
   I'd like to propose the following for the V3 API (we will not touch V2
   in case operators have applications that are written against this – this
   may be the case for libvirt or xen. The VMware API support was added
   in I1):
  
1.  We formalize the data that is returned by the API [1]
  
   Before we debate what standard data should be returned we need
   detail of exactly what info the current 3 virt drivers return.
   IMHO it would be better if we did this all in the existing wiki
   page associated with the blueprint, rather than etherpad, so it
   serves as a permanent historical record for the blueprint design.
 
  +1
 
   While we're doing this I think we should also consider whether
   the 'get_diagnostics' API is fit for purpose more generally.
   eg currently it is restricted to administrators. Some, if
   not all, of the data libvirt returns is relevant to the owner
   of the VM but they can not get at it.
 
  Ceilometer covers that ground, we should ask them about this API.
 
  If we consider what is potentially in scope for ceilometer and
  subtract that from what the libvirt get_diagnostics impl currently
  returns, you pretty much end up with the empty set. This might cause
  us to question if 'get_diagnostics' should exist at all from the
  POV of the libvirt driver's impl. Perhaps vmware/xen return data
  that is out of scope for ceilometer ?
 
 Hmm, a good point.

So perhaps I'm just being dumb, but I deployed ceilometer and could
not figure out how to get it to print out the stats for a single
VM from its CLI ? eg, can someone show me a command line invocation
for ceilometer that displays CPU, memory, disk and network I/O stats
in one go ?


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
I think it was:

ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'

Vladik

- Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: John Garbutt j...@johngarbutt.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, 19 December, 2013 9:34:02 AM
 Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
 
 On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:
  On 16 December 2013 15:50, Daniel P. Berrange berra...@redhat.com wrote:
   On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
   On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com
   wrote:
On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
I'd like to propose the following for the V3 API (we will not touch
V2
in case operators have applications that are written against this –
this
may be the case for libvirt or xen. The VMware API support was added
in I1):
   
 1.  We formalize the data that is returned by the API [1]
   
Before we debate what standard data should be returned we need
detail of exactly what info the current 3 virt drivers return.
IMHO it would be better if we did this all in the existing wiki
page associated with the blueprint, rather than etherpad, so it
serves as a permanent historical record for the blueprint design.
  
   +1
  
While we're doing this I think we should also consider whether
the 'get_diagnostics' API is fit for purpose more generally.
eg currently it is restricted to administrators. Some, if
not all, of the data libvirt returns is relevant to the owner
of the VM but they can not get at it.
  
   Ceilometer covers that ground, we should ask them about this API.
  
   If we consider what is potentially in scope for ceilometer and
   subtract that from what the libvirt get_diagnostics impl currently
   returns, you pretty much end up with the empty set. This might cause
   us to question if 'get_diagnostics' should exist at all from the
   POV of the libvirt driver's impl. Perhaps vmware/xen return data
   that is out of scope for ceilometer ?
  
  Hmm, a good point.
 
 So perhaps I'm just being dumb, but I deployed ceilometer and could
 not figure out how to get it to print out the stats for a single
 VM from its CLI ? eg, can someone show me a command line invocation
 for ceilometer that displays CPU, memory, disk and network I/O stats
 in one go ?
 
 
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
Or

ceilometer meter-list -q resource_id='vm_uuid'

- Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: John Garbutt j...@johngarbutt.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, 19 December, 2013 9:34:02 AM
 Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
 
 On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:
  On 16 December 2013 15:50, Daniel P. Berrange berra...@redhat.com wrote:
   On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
   On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com
   wrote:
On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
I'd like to propose the following for the V3 API (we will not touch
V2
in case operators have applications that are written against this –
this
may be the case for libvirt or xen. The VMware API support was added
in I1):
   
 1.  We formalize the data that is returned by the API [1]
   
Before we debate what standard data should be returned we need
detail of exactly what info the current 3 virt drivers return.
IMHO it would be better if we did this all in the existing wiki
page associated with the blueprint, rather than etherpad, so it
serves as a permanent historical record for the blueprint design.
  
   +1
  
While we're doing this I think we should also consider whether
the 'get_diagnostics' API is fit for purpose more generally.
eg currently it is restricted to administrators. Some, if
not all, of the data libvirt returns is relevant to the owner
of the VM but they can not get at it.
  
   Ceilometer covers that ground, we should ask them about this API.
  
   If we consider what is potentially in scope for ceilometer and
   subtract that from what the libvirt get_diagnostics impl currently
   returns, you pretty much end up with the empty set. This might cause
   us to question if 'get_diagnostics' should exist at all from the
   POV of the libvirt driver's impl. Perhaps vmware/xen return data
   that is out of scope for ceilometer ?
  
  Hmm, a good point.
 
 So perhaps I'm just being dumb, but I deployed ceilometer and could
 not figure out how to get it to print out the stats for a single
 VM from its CLI ? eg, can someone show me a command line invocation
 for ceilometer that displays CPU, memory, disk and network I/O stats
 in one go ?
 
 
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Matt Riedemann



On Thursday, December 19, 2013 8:49:13 AM, Vladik Romanovsky wrote:

Or

ceilometer meter-list -q resource_id='vm_uuid'

- Original Message -

From: Daniel P. Berrange berra...@redhat.com
To: John Garbutt j...@johngarbutt.com
Cc: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: Thursday, 19 December, 2013 9:34:02 AM
Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

On Thu, Dec 19, 2013 at 02:27:40PM +, John Garbutt wrote:

On 16 December 2013 15:50, Daniel P. Berrange berra...@redhat.com wrote:

On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:

On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com
wrote:

On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:

I'd like to propose the following for the V3 API (we will not touch
V2
in case operators have applications that are written against this –
this
may be the case for libvirt or xen. The VMware API support was added
in I1):

  1.  We formalize the data that is returned by the API [1]


Before we debate what standard data should be returned we need
detail of exactly what info the current 3 virt drivers return.
IMHO it would be better if we did this all in the existing wiki
page associated with the blueprint, rather than etherpad, so it
serves as a permanent historical record for the blueprint design.


+1


While we're doing this I think we should also consider whether
the 'get_diagnostics' API is fit for purpose more generally.
eg currently it is restricted to administrators. Some, if
not all, of the data libvirt returns is relevant to the owner
of the VM but they can not get at it.


Ceilometer covers that ground, we should ask them about this API.


If we consider what is potentially in scope for ceilometer and
subtract that from what the libvirt get_diagnostics impl currently
returns, you pretty much end up with the empty set. This might cause
us to question if 'get_diagnostics' should exist at all from the
POV of the libvirt driver's impl. Perhaps vmware/xen return data
that is out of scope for ceilometer ?


Hmm, a good point.


So perhaps I'm just being dumb, but I deployed ceilometer and could
not figure out how to get it to print out the stats for a single
VM from its CLI ? eg, can someone show me a command line invocation
for ceilometer that displays CPU, memory, disk and network I/O stats
in one go ?


Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


I just wanted to point out, for anyone that hasn't reviewed it yet, that 
Gary's latest design wiki [1] is quite a departure from his original 
set of patches for this blueprint, which were pretty straightforward, 
just namespacing the diagnostics dict when using the nova v3 API.  The 
keys were all still hypervisor-specific.


The proposal now is much more generic and attempts to translate 
hypervisor-specific keys/data into a common standard versioned set and 
allows for some wiggle room for the drivers to still provide custom 
data if necessary.
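
As a rough sketch of the shape that could take (every field name below is
illustrative, not taken from the wiki):

  # Hedged sketch of the payload Matt describes: a versioned common set
  # plus a driver-namespaced section for custom data. All names here are
  # illustrative placeholders.
  diagnostics = {
      'version': 1,
      'state': 'running',
      'uptime': 46664,                            # seconds since boot
      'cpu_details': [{'time': 17300000000}],
      'nic_details': [{'mac_address': '01:23:45:67:89:ab',
                       'rx_octets': 1024, 'tx_octets': 2048}],
      'disk_details': [{'read_bytes': 262144, 'write_bytes': 5778432}],
      'driver_specific': {                        # the "wiggle room"
          'libvirt': {'memory-actual': 524288, 'memory-rss': 243188},
      },
  }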


I think this is a better long-term solution but it is a lot more work than 
the original blueprint, and given there seems to be some feeling of 
"does nova even need this API, can ceilometer provide it instead?", I'd 
like there to be some agreement within nova that this is the right way 
to go before Gary spends a bunch of time on it - and I as the bp 
sponsor spend a bunch of time reviewing it. :)


[1] https://wiki.openstack.org/wiki/Nova_VM_Diagnostics

--

Thanks,

Matt Riedemann


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Daniel P. Berrange
On Tue, Dec 17, 2013 at 04:28:30AM -0800, Gary Kotton wrote:
 Hi,
 Following the discussion yesterday I have updated the wiki - please see
 https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
 backwards compatible and will hopefully provide us with the tools to be
 able to troubleshoot VM issues.

Some comments

 If the driver is unable to return the value or does not have
  access to it at the moment then it should return 'n/a'.

I think it is better if the driver just omitted any key that
it doesn't support altogether. That avoids clients / users
having to do magic string comparisons to identify omitted
data.
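
In client code that difference looks roughly like this (a sketch; 'uptime'
is a made-up example key):

  # Sketch: with omitted keys a client tests for presence rather than
  # comparing against a magic 'n/a' string. 'uptime' is a made-up key.
  diagnostics = {'memory': 524288}      # driver chose not to report uptime

  uptime = diagnostics.get('uptime')
  if uptime is not None:
      print('uptime: %d seconds' % uptime)
  else:
      print('uptime not reported by this driver')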

 An ID for the diagnostics version. The structure defined below
  is version 1 (Integer)

What are the proposed semantics for version numbers. Do they incremented
on any change, or only on backwards incompatible changes ?

 The amount of time in seconds that the VM has been running (Integer)

I'd suggest nano-seconds here. I've been burnt too many times in the
past providing APIs where we rounded data to a coarse unit like seconds.

Let client programs convert from nanoseconds to seconds if they wish
to display it in that way, but keep the API with the full precision.

  The version of the raw data

Same question as previously.



The allowed keys in network/disk/memory details seem to be
unduly limited. Just having a boolean activity for disk
or NICs seems almost entirely useless. eg the VM might have
sent 1 byte when it first booted and nothing more for the
next 10 days, and an admin can't see this.

I'd suggest we should follow the much expanded set of possible
stats shown by the libvirt driver. These are pretty common
things to show for disk/nic activity and a driver wouldn't have
to support all of them if it doesn't have that info.

It would be nice to have CPU stats available too. 
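
For reference, the libvirt driver today returns flat, per-device keys along
these lines (a from-memory sketch, so treat exact key names as approximate):

  # Rough sketch of the flat dict nova's libvirt driver currently builds in
  # get_diagnostics(); key names reconstructed from memory and may differ.
  libvirt_raw = {
      'cpu0_time': 17300000000,      # ns of guest CPU time for vCPU 0
      'vda_read_req': 169,           # per-disk I/O counters
      'vda_read': 262144,
      'vda_write_req': 145,
      'vda_write': 5778432,
      'vnet0_rx': 2070139,           # per-NIC traffic counters
      'vnet0_rx_packets': 26701,
      'vnet0_tx': 140208,
      'vnet0_tx_packets': 662,
      'memory': 524288,              # kB
      'memory-actual': 524288,
      'memory-rss': 243188,
  }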


  |: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|

BTW it would be nice if you could get your email program not to
mangle URLs in mails you're replying to. In this case it was just
links in a signature so it didn't matter, but in other messages it is
mangled stuff in the body of the message :-( It makes it painful
to read the context.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Gary Kotton


On 12/19/13 5:50 PM, Daniel P. Berrange berra...@redhat.com wrote:

On Tue, Dec 17, 2013 at 04:28:30AM -0800, Gary Kotton wrote:
 Hi,
 Following the discussion yesterday I have updated the wiki - please see
 https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
 backwards compatible and will hopefully provide us with the tools to be
 able to troubleshoot VM issues.

Some comments

 If the driver is unable to return the value or does not have
  access to it at the moment then it should return 'n/a'.

I think it is better if the driver just omitted any key that
it doesn't support altogether. That avoids clients / users
having to do magic string comparisons to identify omitted
data.

I am fine with this. If the data is marked optional then whoever is
parsing the data should check whether the field exists first.


 An ID for the diagnostics version. The structure defined below
  is version 1 (Integer)

What are the proposed semantics for version numbers. Do they incremented
on any change, or only on backwards incompatible changes ?

The purpose of this was to be backward compatible. But I guess that if we
go with the optional approach then this is redundant.


 The amount of time in seconds that the VM has been running (Integer)

I'd suggest nano-seconds here. I've been burnt too many times in the
past providing APIs where we rounded data to a coarse unit like seconds.

Sure, sounds reasonable.


Let client programs convert from nanoseconds to seconds if they wish
to display it in that way, but keep the API with the full precision.

  The version of the raw data

I guess that this is redundant too.


Same question as previously.



The allowed keys in network/disk/memory details seem to be
unduly limited. Just having a boolean activity for disk
or NICs seems almost entirely useless. eg the VM might have
sent 1 byte when it first booted and nothing more for the
next 10 days, and an admin can't see this.

I'd suggest we should follow the much expanded set of possible
stats shown by the libvirt driver. These are pretty common
things to show for disk/nic activity and a driver wouldn't have
to support all of them if it doesn't have that info.

Ok. I was just trying to provide an indicator for the admin to dive into
the raw data. But I am fine with this.


It would be nice to have CPU stats available too.

At the moment libvirt only returns the cpu0_time. Can you please let me
know what other stats you would like here?

 


 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|

BTW it would be nice if you could get your email program not to
mangle URLs in mails you're replying to. In this case it was just
links in a signature so it didn't matter, but in other messages it is
mangled stuff in the body of the message :-( It makes it painful
to read the context.

Regards,
Daniel
-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org   -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|

Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Gary Kotton


On 12/19/13 6:07 PM, Daniel P. Berrange berra...@redhat.com wrote:

On Thu, Dec 19, 2013 at 08:02:16AM -0800, Gary Kotton wrote:
 
 
 On 12/19/13 5:50 PM, Daniel P. Berrange berra...@redhat.com wrote:
 
 On Tue, Dec 17, 2013 at 04:28:30AM -0800, Gary Kotton wrote:
  Hi,
  Following the discussion yesterday I have updated the wiki - please see
  https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
  backwards compatible and will hopefully provide us with the tools to be
  able to troubleshoot VM issues.
 
 Some comments
 
  If the driver is unable to return the value or does not have
   access to it at the moment then it should return 'n/a'.
 
 I think it is better if the driver just omitted any key that
 it doesn't support altogether. That avoids clients / users
 having to do magic string comparisons to identify omitted
 data.
 
  I am fine with this. If the data is marked optional then whoever is
  parsing the data should check whether the field exists first.
 
 
  An ID for the diagnostics version. The structure defined below
   is version 1 (Integer)
 
 What are the proposed semantics for version numbers. Do they
incremented
 on any change, or only on backwards incompatible changes ?
 
 The purpose of this was to be backward compatible. But I guess that if
we
 go with the optional approach then this is redundant.
 
 
  The amount of time in seconds that the VM has been running (Integer)
 
 I'd suggest nano-seconds here. I've been burnt too many times in the
 past providing APIs where we rounded data to a coarse unit like
seconds.
 
 Sure, sounds reasonable.

Oh hang on, when you say 'amount of time in seconds the VM has been
running'
you're meaning wall-clock time since boot.  Seconds is fine for wall clock
time actually.


I was getting mixed up with CPU utilization time, since libvirt doesn't
actually provide any way to get uptime.


 Let client programs convert from nanoseconds to seconds if they wish
 to display it in that way, but keep the API with the full precision.
 
   The version of the raw data
 
 I guess that this is redundant too.
 
 
 Same question as previously.
 
 
 
 The allowed keys in network/disk/memory details seem to be
 unduly limited. Just having a boolean activity for disk
 or NICs seems almost entirely useless. eg the VM might have
 sent 1 byte when it first booted and nothing more for the
 next 10 days, and an admin can't see this.
 
 I'd suggest we should follow the much expanded set of possible
 stats shown by the libvirt driver. These are pretty common
 things to show for disk/nic activity and a driver wouldn't have
 to support all of them if it doesn't have that info.
 
 Ok. I was just trying to provide an indicator for the admin to dive into
 the raw data. But I am fine with this.
 
 
 It would be nice to have CPU stats available too.
 
  At the moment libvirt only returns the cpu0_time. Can you please let me
 know what other stats you would like here?

Since we have numCpus, I'd suggest we allow for a list of cpus in the
same way we do for disk/nics and returning the execution time split
out for each vCPU.  We could still have a merged execution time too
since I can imagine some hypervisors won't be able to provide the
split out per-vcpu time.

Good call. I'll add this!
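
Something along these lines, perhaps (field names are placeholders, not
taken from the wiki):

  # Illustrative sketch of per-vCPU times plus a merged total, as suggested
  # above; all field names are placeholders.
  cpu_stats = {
      'num_cpus': 2,
      'cpu_details': [
          {'id': 0, 'time': 17300000000},   # ns, split out per vCPU
          {'id': 1, 'time': 16250000000},
      ],
      'cpu_time_total': 33550000000,        # merged figure, for hypervisors
                                            # that cannot split per-vCPU time
  }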


Daniel
-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org   -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|

Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-19 Thread Vladik Romanovsky
Ah, I think I've responded too fast, sorry.

meter-list provides a list of the various measurements that are being done per
resource.
sample-list provides a list of samples for a given meter: ceilometer sample-list
--meter cpu_util -q resource_id=vm_uuid
These samples can be aggregated over a period of time, per meter and
resource:
ceilometer statistics -m cpu_util -q
'timestamp>START;timestamp<=END;resource_id=vm_uuid' --period 3600
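
The same queries can also be issued programmatically; a minimal sketch
assuming python-ceilometerclient, with placeholder credentials:

  # Minimal sketch using python-ceilometerclient; every credential and
  # endpoint value below is a placeholder, not taken from this thread.
  from ceilometerclient import client

  cc = client.get_client('2',
                         os_username='admin',
                         os_password='secret',
                         os_tenant_name='admin',
                         os_auth_url='http://controller:5000/v2.0')

  query = [{'field': 'resource_id', 'op': 'eq', 'value': 'vm_uuid'}]

  # Equivalent of: ceilometer statistics -m cpu_util -q '...' --period 3600
  for stat in cc.statistics.list(meter_name='cpu_util', q=query, period=3600):
      print(stat.period_start, stat.avg)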

Vladik



- Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: Vladik Romanovsky vladik.romanov...@enovance.com
 Cc: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org, John
 Garbutt j...@johngarbutt.com
 Sent: Thursday, 19 December, 2013 10:37:27 AM
 Subject: Re: [openstack-dev] [nova] VM diagnostics - V3 proposal
 
 On Thu, Dec 19, 2013 at 03:47:30PM +0100, Vladik Romanovsky wrote:
  I think it was:
  
  ceilometer sample-list -m cpu_util -q 'resource_id=vm_uuid'
 
 Hmm, a standard devstack deployment of ceilometer doesn't seem to
 record any performance stats at all - just shows me the static
 configuration parameters :-(
 
  ceilometer meter-list -q 'resource_id=296b22c6-2a4d-4a8d-a7cd-2d73339f9c70'
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 | Name                | Type  | Unit     | Resource ID                          | User ID                          | Project ID                       |
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 | disk.ephemeral.size | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | disk.root.size      | gauge | GB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | instance            | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | instance:m1.small   | gauge | instance | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | memory              | gauge | MB       | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 | vcpus               | gauge | vcpu     | 296b22c6-2a4d-4a8d-a7cd-2d73339f9c70 | 96f9a624a325473daf4cd7875be46009 | ec26984024c1438e8e2f93dc6a8c5ad0 |
 +---------------------+-------+----------+--------------------------------------+----------------------------------+----------------------------------+
 
 
 If the admin user can't rely on ceilometer guaranteeing availability of
 the performance stats at all, then I think having an API in nova to report
 them is in fact justifiable. In fact it is probably justifiable no matter
 what, as a fallback way to check what VMs are doing in the face of failure
 of ceilometer / part of the cloud infrastructure.
 
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-17 Thread Gary Kotton
Hi,
Following the discussion yesterday I have updated the wiki - please see
https://wiki.openstack.org/wiki/Nova_VM_Diagnostics. The proposal is
backwards compatible and will hopefully provide us with the tools to be
able to troubleshoot VM issues.
Thanks
Gary

On 12/16/13 5:50 PM, Daniel P. Berrange berra...@redhat.com wrote:

On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
 On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com
wrote:
  On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
  I'd like to propose the following for the V3 API (we will not touch
V2
 in case operators have applications that are written against this –
this
  may be the case for libvirt or xen. The VMware API support was added
  in I1):
 
   1.  We formalize the data that is returned by the API [1]
 
  Before we debate what standard data should be returned we need
  detail of exactly what info the current 3 virt drivers return.
  IMHO it would be better if we did this all in the existing wiki
  page associated with the blueprint, rather than etherpad, so it
  serves as a permanent historical record for the blueprint design.
 
 +1
 
  While we're doing this I think we should also consider whether
  the 'get_diagnostics' API is fit for purpose more generally.
  eg currently it is restricted to administrators. Some, if
  not all, of the data libvirt returns is relevant to the owner
  of the VM but they can not get at it.
 
 Ceilometer covers that ground, we should ask them about this API.

If we consider what is potentially in scope for ceilometer and
subtract that from what the libvirt get_diagnostics impl currently
returns, you pretty much end up with the empty set. This might cause
us to question if 'get_diagnostics' should exist at all from the
POV of the libvirt driver's impl. Perhaps vmware/xen return data
that is out of scope for ceilometer ?

  For a cloud administrator it might be argued that the current
  API is too inefficient to be useful in many troubleshooting
  scenarios since it requires you to invoke it once per instance
  if you're collecting info on a set of guests, eg all VMs on
  one host. It could be that cloud admins would be better
  served by an API which returned info for all VMs ona host
  at once, if they're monitoring say, I/O stats across VM
  disks to identify one that is causing I/O trouble ? IOW, I
  think we could do with better identifying the usage scenarios
  for this API if we're to improve its design / impl.
 
 I like the API that helps you dig into info for a specific host that
 other system highlight as problematic.
 You can do things that could be expensive to compute, but useful for
 troubleshooting.

If things get expensive to compute, then it may well be preferable to
have separate APIs for distinct pieces of interesting diagnostic
data. eg If they only care about one particular thing, they don't want
to trigger expensive computations of things they don't care about seeing.

Regards,
Daniel
-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org   -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|

[openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-16 Thread Gary Kotton
Hi,
At the moment the administrator is able to retrieve diagnostics for a running 
VM. Currently the implementation is very loosely defined, that is, each driver 
returns whatever they have to return. This is problematic in a number of 
respects:

 1.  The tempest tests were written specifically for one driver and break with 
all other drivers (the test was removed to prevent this – bug 1240043)
 2.  An admin is unable to write tools that may work with a hybrid cloud
 3.  Adding support for get_diagnostics for drivers that do not support it is 
painful

I'd like to propose the following for the V3 API (we will not touch V2 in case 
operators have applications that are written against this – this may be the 
case for libvirt or xen. The VMware API support was added in I1):

 1.  We formalize the data that is returned by the API [1]
 2.  We enable the driver to add extra information that will assist the 
administrators in troubleshooting problems for VM's

I have proposed a BP for this - 
https://blueprints.launchpad.net/nova/+spec/diagnostics-namespace (I'd like to 
change the name to v3-api-diagnostics – which is more apt)

And as Nelson Mandela would have said: “It always seems impossible until it's 
done.”

Moving forwards we may decide to provide administrators the option of using 
this for V2 (it may be very helpful with debugging issues). But that is 
another discussion.

Thanks
Gary

[1] https://etherpad.openstack.org/p/vm-diagnostics
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-16 Thread Daniel P. Berrange
On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
 Hi,
 At the moment the administrator is able to retrieve diagnostics for a running 
 VM. Currently the implementation is very loosely defined, that is, each 
 driver returns whatever they have to return. This is problematic in a number 
 of respects:
 
  1.  The tempest tests were written specifically for one driver and break 
 with all other drivers (the test was removed to prevent this – bug 1240043)
  2.  An admin is unable to write tools that may work with a hybrid cloud
  3.  Adding support for get_diagnostics for drivers that do not support it is 
 painful

Technically 3 is currently easy, since currently you don't need to care
about what the other drivers have done - you can return any old info
for your new driver's get_diagnostics API impl ;-)
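
For instance, a new driver can satisfy the contract with something as loose
as this (a sketch; only the method name comes from the virt driver API, the
returned keys are made up):

  # Sketch of why point 3 is "easy" today: the virt driver API imposes no
  # schema on get_diagnostics(), so any flat dict passes.
  from nova.virt import driver

  class MyDriver(driver.ComputeDriver):
      def get_diagnostics(self, instance):
          """Return whatever hypervisor-specific data we happen to have."""
          return {'cpu0_time': 17300000000,
                  'memory': 524288,
                  'mydriver_special_stat': 42}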

Seriously though, I agree the current API is a big trainwreck.

 I'd like to propose the following for the V3 API (we will not touch V2
 in case operators have applications that are written against this – this
 may be the case for libvirt or xen. The VMware API support was added
 in I1):
 
  1.  We formalize the data that is returned by the API [1]

Before we debate what standard data should be returned we need
detail of exactly what info the current 3 virt drivers return.
IMHO it would be better if we did this all in the existing wiki
page associated with the blueprint, rather than etherpad, so it
serves as a permanent historical record for the blueprint design.

While we're doing this I think we should also consider whether
the 'get_diagnostics' API is fit for purpose more generally. 
eg currently it is restricted to administrators. Some, if
not all, of the data libvirt returns is relevant to the owner
of the VM but they can not get at it.

For a cloud administrator it might be argued that the current
API is too inefficient to be useful in many troubleshooting
scenarios since it requires you to invoke it once per instance
if you're collecting info on a set of guests, eg all VMs on
one host. It could be that cloud admins would be better
 served by an API which returned info for all VMs on a host
at once, if they're monitoring say, I/O stats across VM
disks to identify one that is causing I/O trouble ? IOW, I
think we could do with better identifying the usage scenarios
for this API if we're to improve its design / impl.


  2.  We enable the driver to add extra information that will assist the 
 administrators in troubleshooting problems for VM's
 
 I have proposed a BP for this - 
 https://blueprints.launchpad.net/nova/+spec/diagnostics-namespace (I'd like 
 to change the name to v3-api-diagnostics – which is more apt)

The bp rename would be a good idea.

 [1] https://etherpad.openstack.org/p/vm-diagnostics

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-16 Thread John Garbutt
On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com wrote:
 On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
 Hi,
 At the moment the administrator is able to retrieve diagnostics for a 
 running VM. Currently the implementation is very loosely defined, that is, 
 each driver returns whatever they have to return. This is problematic in a 
 number of respects:

  1.  The tempest tests were written specifically for one driver and break 
 with all other drivers (the test was removed to prevent this – bug 1240043)
  2.  An admin is unable to write tools that may work with a hybrid cloud
  3.  Adding support for get_diagnostics for drivers that do not support it is 
 painful

 Technically 3 is currently easy, since currently you don't need to care
 about what the other drivers have done - you can return any old info
 for your new driver's get_diagnostics API impl ;-)

 Seriously though, I agree the current API is a big trainwreck.

+1

 I'd like to propose the following for the V3 API (we will not touch V2
 in case operators have applications that are written against this – this
 may be the case for libvirt or xen. The VMware API support was added
 in I1):

  1.  We formalize the data that is returned by the API [1]

 Before we debate what standard data should be returned we need
 detail of exactly what info the current 3 virt drivers return.
 IMHO it would be better if we did this all in the existing wiki
 page associated with the blueprint, rather than etherpad, so it
 serves as a permanent historical record for the blueprint design.

+1

 While we're doing this I think we should also consider whether
 the 'get_diagnostics' API is fit for purpose more generally.
 eg currently it is restricted to administrators. Some, if
 not all, of the data libvirt returns is relevant to the owner
 of the VM but they can not get at it.

Ceilometer covers that ground, we should ask them about this API.

 For a cloud administrator it might be argued that the current
 API is too inefficient to be useful in many troubleshooting
 scenarios since it requires you to invoke it once per instance
 if you're collecting info on a set of guests, eg all VMs on
 one host. It could be that cloud admins would be better
 served by an API which returned info for all VMs on a host
 at once, if they're monitoring say, I/O stats across VM
 disks to identify one that is causing I/O trouble ? IOW, I
 think we could do with better identifying the usage scenarios
 for this API if we're to improve its design / impl.

I like the API that helps you dig into info for a specific host that
other system highlight as problematic.
You can do things that could be expensive to compute, but useful for
troubleshooting.

But you are right, we should think about it first.


  2.  We enable the driver to add extra information that will assist the 
 administrators in troubleshooting problems for VM's


I think we need to version this information, if possible. I don't like
the idea of the driver just changing the public API as it wishes.
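
(Something along these lines, say - a sketch only, with every field
name invented just to show the shape:

    {
        "version": "1.0",
        "state": "running",
        "cpu_details": [{"id": 0, "time": 17300000000}],
        "nic_details": [{"mac_address": "01:23:45:67:89:ab",
                         "rx_octets": 2070139}],
        "driver_specific": {"libvirt": {"memory_rss": 524288}}
    }

Clients coded against the versioned core fields would keep working,
while a driver stays free to evolve whatever it puts under its own
namespace.)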

 I have proposed a BP for this - 
 https://blueprints.launchpad.net/nova/+spec/diagnostics-namespace (I'd like 
 to change the name to v3-api-diagnostics – which is more apt)

 The bp rename would be a good idea.
+1

 [1] https://etherpad.openstack.org/p/vm-diagnostics

John



Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-16 Thread Daniel P. Berrange
On Mon, Dec 16, 2013 at 03:37:39PM +, John Garbutt wrote:
 On 16 December 2013 15:25, Daniel P. Berrange berra...@redhat.com wrote:
  On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
  I'd like to propose the following for the V3 API (we will not touch V2
  in case operators have applications that are written against this – this
  may be the case for libvirt or xen. The VMware API support was added
  in I1):
 
   1.  We formalize the data that is returned by the API [1]
 
  Before we debate what standard data should be returned we need
  detail of exactly what info the current 3 virt drivers return.
  IMHO it would be better if we did this all in the existing wiki
  page associated with the blueprint, rather than etherpad, so it
  serves as a permanent historical record for the blueprint design.
 
 +1
 
  While we're doing this I think we should also consider whether
  the 'get_diagnostics' API is fit for purpose more generally.
  e.g. currently it is restricted to administrators. Some, if
  not all, of the data libvirt returns is relevant to the owner
  of the VM, but they cannot get at it.
 
 Ceilometer covers that ground; we should ask them about this API.

If we consider what is potentially in scope for ceilometer and
subtract that from what the libvirt get_diagnostics impl currently
returns, you pretty much end up with the empty set. This might cause
us to question if 'get_diagnostics' should exist at all from the
POV of the libvirt driver's impl. Perhaps vmware/xen return data
that is out of scope for ceilometer?

  For a cloud administrator it might be argued that the current
  API is too inefficient to be useful in many troubleshooting
  scenarios since it requires you to invoke it once per instance
  if you're collecting info on a set of guests, e.g. all VMs on
  one host. It could be that cloud admins would be better
  served by an API which returned info for all VMs on a host
  at once, if they're monitoring, say, I/O stats across VM
  disks to identify one that is causing I/O trouble? IOW, I
  think we could do with better identifying the usage scenarios
  for this API if we're to improve its design / impl.
 
 I like the API that helps you dig into info for a specific host that
 other systems highlight as problematic.
 You can do things that could be expensive to compute, but useful for
 troubleshooting.

If things get expensive to compute, then it may well be preferable to
have separate APIs for distinct pieces of interesting diagnostic
data, e.g. if they only care about one particular thing, they don't want
to trigger expensive computations of things they don't care about seeing.
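
(i.e. rather than one kitchen-sink call, something like the following -
hypothetical URLs, purely to illustrate the split:

    GET /v3/servers/{server_id}/diagnostics/cpu
    GET /v3/servers/{server_id}/diagnostics/disk
    GET /v3/servers/{server_id}/diagnostics/network

so an admin chasing an I/O problem only triggers collection of the disk
stats.)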

Regards,
Daniel
-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] VM diagnostics - V3 proposal

2013-12-16 Thread Gary Kotton


On 12/16/13 5:25 PM, Daniel P. Berrange berra...@redhat.com wrote:

On Mon, Dec 16, 2013 at 06:58:24AM -0800, Gary Kotton wrote:
 Hi,
 At the moment the administrator is able to retrieve diagnostics for a
running VM. Currently the implementation is very loosely defined, that
is, each driver returns whatever data it chooses. This is
problematic in a number of respects:
 
  1.  The tempest tests were written specifically for one driver and
break with all other drivers (the test was removed to prevent this – bug
1240043)
  2.  An admin is unable to write tools that may work with a hybrid cloud
  3.  Adding support for get_diagnostics for drivers that do not support it
is painful

Technically 3 is currently easy, since you don't need to care
about what the other drivers have done - you can return any old info
for your new driver's get_diagnostics API impl ;-)

To be honest it was not easy at all.


Seriously though, I agree the current API is a big trainwreck.

 I'd like to propose the following for the V3 API (we will not touch V2
 in case operators have applications that are written against this – this
 may be the case for libvirt or xen. The VMware API support was added
 in I1):
 
  1.  We formalize the data that is returned by the API [1]

Before we debate what standard data should be returned we need
detail of exactly what info the current 3 virt drivers return.
IMHO it would be better if we did this all in the existing wiki
page associated with the blueprint, rather than etherpad, so it
serves as a permanent historical record for the blueprint design.

I will add this to the wiki. I am not sure what this will achieve, other
than crystallizing the fact that we need common data returned.


While we're doing this I think we should also consider whether
the 'get_diagnostics' API is fit for purpose more generally.
e.g. currently it is restricted to administrators. Some, if
not all, of the data libvirt returns is relevant to the owner
of the VM, but they cannot get at it.

This is configurable. The default is for an admin user. This is in the
policy.json file - 
https://github.com/openstack/nova/blob/master/etc/nova/policy.json#L202
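
For example, to open the call up to instance owners as well, the rule
would change along these lines (the key name here is the V2 extension's
- check the file for your release):

    "compute_extension:server_diagnostics": "rule:admin_api",

becomes

    "compute_extension:server_diagnostics": "rule:admin_or_owner",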


For a cloud administrator it might be argued that the current
API is too inefficient to be useful in many troubleshooting
scenarios since it requires you to invoke it once per instance
if you're collecting info on a set of guests, e.g. all VMs on
one host. It could be that cloud admins would be better
served by an API which returned info for all VMs on a host
at once, if they're monitoring, say, I/O stats across VM
disks to identify one that is causing I/O trouble? IOW, I
think we could do with better identifying the usage scenarios
for this API if we're to improve its design / impl.

Host diagnostics would be a nice feature to have. I do not think that it
is part of the scope of what we want to achieve here, but I will certainly
be happy to work on this afterwards.



  2.  We enable the driver to add extra information that will assist the
administrators in troubleshooting problems for VMs
 
 I have proposed a BP for this -
https://blueprints.launchpad.net/nova/+spec/diagnostics-namespace (I'd
like to change the name to v3-api-diagnostics – which is more apt)

The bp rename would be a good idea.

 [1] https://etherpad.openstack.org/p/vm-diagnostics

Regards,
Daniel
-- 
|: 
https://urldefense.proofpoint.com/v1/url?u=http://berrange.com/k=oIvRg1%2
BdGAgOoM1BIlLLqw%3D%3D%0Ar=eH0pxTUZo8NPZyF6hgoMQu%2BfDtysg45MkPhCZFxPEq8%
3D%0Am=xY7Bdz7UGQQFxbe2g6zVO%2FBUpkZfN%2BImM564xugLjsk%3D%0As=c421c25857
f1ca0294b5cc318e87a758a2b49ecc35b3ca9f75b57be574ce0299  -o-
https://urldefense.proofpoint.com/v1/url?u=http://www.flickr.com/photos/db
errange/k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=eH0pxTUZo8NPZyF6hgoMQu%2BfD
tysg45MkPhCZFxPEq8%3D%0Am=xY7Bdz7UGQQFxbe2g6zVO%2FBUpkZfN%2BImM564xugLjsk
%3D%0As=281520d30342d840da18dac821fdc235faf903c0bb7e8fcb51620217bf7b236a
:|
|: 
https://urldefense.proofpoint.com/v1/url?u=http://libvirt.org/k=oIvRg1%2B
dGAgOoM1BIlLLqw%3D%3D%0Ar=eH0pxTUZo8NPZyF6hgoMQu%2BfDtysg45MkPhCZFxPEq8%3
D%0Am=xY7Bdz7UGQQFxbe2g6zVO%2FBUpkZfN%2BImM564xugLjsk%3D%0As=9424295e978
7fe72415305745f36556f0b8167ba0da8ac9632a4f8a183a926aa  -o-
 
https://urldefense.proofpoint.com/v1/url?u=http://virt-manager.org/k=oIvR
g1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=eH0pxTUZo8NPZyF6hgoMQu%2BfDtysg45MkPhCZFxP