Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-22 Thread Wang, Shane
Daniel raised a good point, and I agree that it is not a good architecture.
That said, I don't think it is good for Nova to be unable to touch any monitoring data at all.
At the least, Ceilometer can be a monitoring hub for external utilities.

As for the options Lianhao raised:
Is a query on a single table with a JSON column faster than a query that joins two tables?
I have no experimental data, but I doubt it.
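
One way to get experimental data on this is a small benchmark. The following is only a sketch, assuming an in-memory SQLite database and hypothetical table and column names (compute_nodes, compute_node_metrics, metrics_json). It is not Nova's schema, and real numbers would depend on the actual DB backend and the amount of metric data.

```python
import json
import sqlite3
import time

NUM_NODES = 200          # hypothetical number of compute nodes
METRICS_PER_NODE = 20    # hypothetical number of metrics per node


def build_db():
    """Create both layouts side by side so they hold identical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compute_nodes ("
                 "id INTEGER PRIMARY KEY, host TEXT, metrics_json TEXT)")
    conn.execute("CREATE TABLE compute_node_metrics ("
                 "node_id INTEGER, name TEXT, value REAL)")
    for node_id in range(NUM_NODES):
        metrics = {"metric_%d" % i: float(node_id + i)
                   for i in range(METRICS_PER_NODE)}
        conn.execute("INSERT INTO compute_nodes VALUES (?, ?, ?)",
                     (node_id, "host-%d" % node_id, json.dumps(metrics)))
        conn.executemany("INSERT INTO compute_node_metrics VALUES (?, ?, ?)",
                         [(node_id, k, v) for k, v in metrics.items()])
    conn.commit()
    return conn


def load_with_join(conn):
    """Option 1: join compute_nodes with the separate metrics table."""
    result = {}
    rows = conn.execute(
        "SELECT n.id, m.name, m.value FROM compute_nodes n "
        "JOIN compute_node_metrics m ON m.node_id = n.id")
    for node_id, name, value in rows:
        result.setdefault(node_id, {})[name] = value
    return result


def load_with_json(conn):
    """Option 2: read one JSON-encoded column per compute node."""
    rows = conn.execute("SELECT id, metrics_json FROM compute_nodes")
    return {node_id: json.loads(blob) for node_id, blob in rows}


def timed(func, conn, repeat=20):
    """Return the wall-clock seconds spent calling func(conn) repeat times."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(conn)
    return time.perf_counter() - start


if __name__ == "__main__":
    conn = build_db()
    print("join: %.4fs" % timed(load_with_join, conn))
    print("json: %.4fs" % timed(load_with_json, conn))
```

Both loaders return the same dictionary, so the timings compare like with like. The result on SQLite says little about MySQL or PostgreSQL under load, so it should be read as a starting point only.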

Thanks.
--
Shane

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Jiang, Yunhong
By "lazy load" I mean that, for example, the framework doesn't need to 
fetch the PCI information if no PCI filter is specified.

The discussion at 
http://markmail.org/message/gxoqi6coscd2lhwo#query:+page:1+mid:7ksr6byyrpcgkqjv+state:results
gives a lot of information.
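
To illustrate the lazy-load idea, here is a minimal sketch. The class, the loader and the filter functions are all hypothetical names made up for this example; they are not Nova's compute node object or its scheduler filters.

```python
class ComputeNode:
    """Hypothetical compute node record with a lazily loaded PCI field."""

    def __init__(self, host, pci_loader):
        self.host = host
        self._pci_loader = pci_loader   # callable that does the expensive fetch
        self._pci_devices = None
        self.pci_fetches = 0            # counts fetches, for illustration only

    @property
    def pci_devices(self):
        # Fetch on first access only; later accesses reuse the cached value.
        if self._pci_devices is None:
            self.pci_fetches += 1
            self._pci_devices = self._pci_loader(self.host)
        return self._pci_devices


def fake_pci_loader(host):
    # Stand-in for a DB query; a real loader would read from storage.
    return ["%s-pci-0" % host]


def ram_filter(node):
    # A filter that never looks at PCI data, so nothing is fetched.
    return True


def pci_filter(node):
    # A filter that needs PCI data, which triggers the lazy fetch.
    return len(node.pci_devices) > 0
```

With only ram_filter configured, the PCI loader is never called. It runs once, on first access, when pci_filter is used.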

--jyh





Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Boris Pavlovic
Jiang,

I would like to reduce the "magic":

1) We are already using RPC, because all compute node updates are written to
the DB via the conductor (which means an RPC call).
So the number of RPC calls and the size of the messages will be the same.

2) Lazy loading does not help when you have to fetch all data about all compute
nodes on every request to the scheduler.

3) Object models are off topic.

Best regards,
Boris Pavlovic

Mirantis Inc.






Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Boris Pavlovic
Hi all,

We have too many different threads about the scheduler (so I have to repeat
myself here as well).

I am against adding extra tables that will be joined to the compute_nodes
table on each scheduler request (or adding large text columns),
because it makes our non-scalable scheduler even less scalable.

Also, if we just remove the DB between the scheduler and the compute nodes, we
will get a really good improvement in all aspects (performance, DB load,
network traffic, scalability).
It will also be easy to use other resource providers (e.g. cinder,
ceilometer) in the Nova scheduler.

One more thing: all of this could be implemented quite simply in current
Nova, without big changes:

https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?usp=sharing
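
As a rough illustration of the idea of removing the DB between the scheduler and the compute nodes, here is a minimal sketch. It does not reproduce the linked document. The class name, the method names and the free_ram_mb key are assumptions made up for this example, and the transport that delivers the updates is left out.

```python
import time


class HostStateCache:
    """Hypothetical in-memory view of compute nodes kept by the scheduler."""

    def __init__(self, max_age=60.0):
        self.max_age = max_age     # seconds before a host is considered stale
        self._hosts = {}

    def handle_update(self, host, resources, now=None):
        # Called when a compute node reports its state (e.g. over RPC),
        # instead of the node writing to the DB and the scheduler reading it.
        now = time.time() if now is None else now
        self._hosts[host] = {"resources": dict(resources), "updated": now}

    def candidates(self, min_free_ram_mb, now=None):
        # Scheduling reads only local memory; no DB query or join is involved.
        now = time.time() if now is None else now
        return sorted(
            host for host, state in self._hosts.items()
            if now - state["updated"] <= self.max_age
            and state["resources"].get("free_ram_mb", 0) >= min_free_ram_mb)
```

Because the state is a plain dictionary, updates from another resource provider could be fed through the same handle_update call.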


Best regards,
Boris Pavlovic

Mirantis Inc.




Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Jiang, Yunhong
Boris,
   I think you in fact covered two topics. The first is whether to use the DB 
or RPC for communication. This has been discussed a lot, but I couldn't find a 
conclusion. From the discussion, it seems the key issue is the fan-out 
messages. I'd suggest you bring this to the scheduler sub-meeting.

http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-06-11-14.59.log.html
http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg00070.html
http://comments.gmane.org/gmane.comp.cloud.openstack.devel/23

   The second topic is adding extra tables for compute nodes. I think we 
need lazy loading for the compute node, and I also think that with the object 
model we can improve it further if we use the compute node object.

Thanks
--jyh




Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Dan Smith
> IIUC, Ceilometer is currently a downstream consumer of data from
> Nova, but no functionality in Nova is a consumer of data from
> Ceilometer. This is a good split from a security separation point of
> view, since the security of Nova is self-contained in this
> architecture.
> 
> If the Nova scheduler becomes dependent on data from ceilometer, then
> the security of Nova depends on the security of Ceilometer, expanding
> the attack surface. This is not good architecture IMHO.

Agreed.
 
> At the same time, I hear your concerns about the potential for
> duplication of stats collection functionality between Nova &
> Ceilometer. I don't think we necessarily need to remove 100% of
> duplication. IMHO probably the key thing is for the virt drivers to
> expose a standard API for exporting the stats, and make sure that
> both ceilometer & the nova scheduler use the same APIs and ideally the
> same data feed, so we're not invoking the same APIs twice to get the
> same data.

I imagine there's quite a bit that could be shared, without dependency
between the two. Interfaces out of the virt drivers may be one, and the
code that boils numbers into useful values, as well as perhaps the
format of the JSON blobs that are getting shoved into the database.
Perhaps a ceilo-core library with some very simple primitives and
definitions could be carved out, which both nova and ceilometer could
import for consistency, without a runtime dependency?
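
As a sketch of what such very simple primitives might look like, here is a small module. The Metric fields, the function names and the JSON layout are assumptions made up for illustration; no such library is implied to exist in this form.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Metric:
    """Hypothetical shared definition of a single host metric sample."""
    name: str
    value: float
    unit: str
    timestamp: float


def dump_metrics(metrics):
    # Serialize to the JSON blob format that would be stored or sent.
    return json.dumps([asdict(m) for m in metrics], sort_keys=True)


def load_metrics(blob):
    # Parse a blob produced by dump_metrics back into Metric objects.
    return [Metric(**item) for item in json.loads(blob)]


def cpu_percent(busy_ticks, total_ticks):
    # Example of boiling raw numbers into a useful value: ticks to a percentage.
    if total_ticks <= 0:
        return 0.0
    return 100.0 * busy_ticks / total_ticks
```

Both sides would import only these definitions, so the blob format stays consistent while neither service has to be running for the other to work.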

--Dan



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Daniel P. Berrange
On Thu, Jul 18, 2013 at 07:05:10AM -0400, Sean Dague wrote:
> On 07/17/2013 10:54 PM, Lu, Lianhao wrote:
> >Hi fellows,
> >
> >Currently we're implementing the BP 
> >https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling. 
> >The main idea is to have an extensible plugin framework on nova-compute 
> >where every plugin can collect different metrics (e.g. CPU utilization, 
> >memory cache utilization, network bandwidth, etc.) to store in the DB, and 
> >the nova-scheduler will use that data from the DB for scheduling decisions.
> >
> >Currently we add a new table to store all the metric data and have 
> >nova-scheduler join-load the new table with the compute_nodes table to get 
> >all the data (https://review.openstack.org/35759). Someone is concerned 
> >about the performance penalty of the join-load operation when there is a lot 
> >of metric data stored in the DB for every single compute node. Don suggested 
> >adding a new column to the current compute_nodes table in the DB, putting all 
> >metric data into a key/value dictionary, and storing the JSON-encoded 
> >string of the dictionary in that new column.
> >
> >I'm just wondering which way has less performance impact: a join load with a 
> >new table with quite a lot of rows, or JSON encoding/decoding a dictionary 
> >with a lot of key/value pairs?
> >
> >Thanks,
> >-Lianhao
> 
> I'm really confused. Why are we talking about collecting host
> metrics in nova when we've got a whole project to do that in
> ceilometer? I think utilization based scheduling would be a great
> thing, but it really ought to be interfacing with ceilometer to get
> that data. Storing it again in nova (or even worse collecting it a
> second time in nova) seems like the wrong direction.
> 
> I think there was an equiv patch series at the end of Grizzly that
> was pushed out for the same reasons.
> 
> If there is a reason ceilometer can't be used in this case, we
> should have that discussion here on the list. Because my initial
> reading of this blueprint and the code patches is that it partially
> duplicates ceilometer function, which we definitely don't want to
> do. Would be happy to be proved wrong on that.

IIUC, Ceilometer is currently a downstream consumer of data from Nova, but
no functionality in Nova is a consumer of data from Ceilometer. This is a good
split from a security separation point of view, since the security of Nova
is self-contained in this architecture.

If the Nova scheduler becomes dependent on data from ceilometer, then the
security of Nova depends on the security of Ceilometer, expanding the attack
surface. This is not good architecture IMHO.

At the same time, I hear your concerns about the potential for duplication
of stats collection functionality between Nova & Ceilometer. I don't think
we necessarily need to remove 100% of duplication. IMHO probably the key
thing is for the virt drivers to expose a standard API for exporting the
stats, and make sure that both ceilometer & the nova scheduler use the same
APIs and ideally the same data feed, so we're not invoking the same APIs
twice to get the same data.
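
A minimal sketch of that shape, assuming a hypothetical stats interface and feed class; none of these names are taken from the actual virt driver API:

```python
from abc import ABC, abstractmethod


class VirtDriverStats(ABC):
    """Hypothetical standard interface a virt driver could expose for stats."""

    @abstractmethod
    def get_host_stats(self):
        """Return a dict of metric name -> value for the host."""


class FakeDriver(VirtDriverStats):
    """Stand-in driver that records how often it is queried."""

    def __init__(self):
        self.calls = 0

    def get_host_stats(self):
        self.calls += 1
        return {"cpu.percent": 12.5, "memory.free_mb": 2048}


class StatsFeed:
    """Polls the driver once and hands the same sample to every consumer."""

    def __init__(self, driver):
        self._driver = driver
        self._consumers = []

    def subscribe(self, callback):
        self._consumers.append(callback)

    def poll(self):
        stats = self._driver.get_host_stats()   # single driver call per poll
        for callback in self._consumers:
            callback(dict(stats))                # each consumer gets a copy
```

Here the scheduler and ceilometer would each subscribe a callback, and one poll serves both, so the driver API is not invoked twice for the same data.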


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Sandy Walsh


On 07/19/2013 09:43 AM, Sandy Walsh wrote:
> 
> 
> On 07/18/2013 11:12 PM, Lu, Lianhao wrote:
>> Using ceilometer as the source of those metrics was discussed in the
>> nova-scheduler subgroup meeting. (see #topic extending data in host
>> state in the following link).
>> http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-04-30-15.04.log.html
>>
>> In that meeting, all agreed that ceilometer would be a great source of
>> metrics for the scheduler, but many of them don't want to make
>> ceilometer a mandatory dependency for the nova scheduler.
> 
> This was also discussed at the Havana summit and rejected since we
> didn't want to introduce the external dependency of Ceilometer into Nova.
> 
> That said, we already have hooks at the virt layer for collecting host
> metrics and we're talking about removing the pollsters from nova compute
> nodes if the data can be collected from these existing hooks.
> 
> Whatever solution the scheduler group decides to use should utilize the
> existing (and maintained/growing) mechanisms we have in place there.
> That is, it should likely be a special notification driver that can get
> the data back to the scheduler in a timely fashion. It wouldn't have to
> use the rpc mechanism if it didn't want to, but it should be a plug-in
> at the notification layer.
> 
> Please don't add yet another way of pulling metric data out of the hosts.
> 
> -S

I should also add that going the notification route doesn't close the
door on ceilometer integration. All you need is a means to get the data
from the notification driver to the scheduler; that part could easily be
replaced with a ceilometer driver if an operator wanted to go that
route.

The benefits of using Ceilometer would be having access to the
downstream events/meters and generated statistics that could be produced
there. We certainly don't want to add an advanced statistical package or
event-stream manager to Nova, when Ceilometer already has aspirations of
that.

The out-of-the-box nova experience should be better scheduling when
simple host metrics are used internally but really great scheduling when
integrated with Ceilometer.


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Sandy Walsh


On 07/18/2013 11:12 PM, Lu, Lianhao wrote:
> Using ceilometer as the source of those metrics was discussed in the
> nova-scheduler subgroup meeting. (see #topic extending data in host
> state in the following link).
> http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-04-30-15.04.log.html
> 
> In that meeting, all agreed that ceilometer would be a great source of
> metrics for the scheduler, but many of them don't want to make
> ceilometer a mandatory dependency for the nova scheduler.

This was also discussed at the Havana summit and rejected since we
didn't want to introduce the external dependency of Ceilometer into Nova.

That said, we already have hooks at the virt layer for collecting host
metrics and we're talking about removing the pollsters from nova compute
nodes if the data can be collected from these existing hooks.

Whatever solution the scheduler group decides to use should utilize the
existing (and maintained/growing) mechanisms we have in place there.
That is, it should likely be a special notification driver that can get
the data back to the scheduler in a timely fashion. It wouldn't have to
use the rpc mechanism if it didn't want to, but it should be a plug-in
at the notification layer.
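
To make the plug-in idea concrete, here is a minimal sketch. The driver interface, the class names and the "compute.host.metrics" event type are all hypothetical and made up for this example; they are not the actual notifier API.

```python
class NotificationDriver:
    """Hypothetical plug-in interface at the notification layer."""

    def notify(self, event_type, payload):
        raise NotImplementedError


class SchedulerMetricsDriver(NotificationDriver):
    """Forwards host metric notifications to a scheduler-side callback."""

    def __init__(self, send_to_scheduler):
        # send_to_scheduler is any callable transport (RPC or otherwise).
        self._send = send_to_scheduler

    def notify(self, event_type, payload):
        # Only host metric events are forwarded; others are ignored here.
        if event_type == "compute.host.metrics":
            self._send(payload["host"], payload["metrics"])


class Notifier:
    """Fans a notification out to every configured driver."""

    def __init__(self, drivers):
        self._drivers = list(drivers)

    def emit(self, event_type, payload):
        for driver in self._drivers:
            driver.notify(event_type, payload)
```

Since the transport is just a callable passed to the driver, it can be swapped without touching the code that emits the notifications.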

Please don't add yet another way of pulling metric data out of the hosts.

-S

> Besides, currently ceilometer doesn't have "host metrics", like the 
> cpu/network/cache utilization data of the compute node host, which
> will affect the scheduling decision. What ceilometer has currently
> is the "VM metrics", like cpu/network utilization of each VM instance.
> 
> After the nova compute node collects the "host metrics", those metrics
> could also be fed into the ceilometer framework (e.g. through a ceilometer
> listener) for further processing, like alarming, etc.
> 
> -Lianhao

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Day, Phil
Ceilometer is a great project for taking metrics available in Nova and other 
systems and making them available for use by Operations, Billing, Monitoring, 
etc - and clearly we should try and avoid having multiple collectors of the 
same data.

But making the Nova scheduler dependent on Ceilometer seems to be the wrong way 
round to me - scheduling is such a fundamental operation that I want Nova to be 
self sufficient in this regard.   In particular I don't want the availability 
of my core compute platform to be constrained by the availability of my (still 
evolving) monitoring system.

If Ceilometer can be fed from the data used by the Nova scheduler then that's a 
good plus - but not the other way round.

Phil

> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: 18 July 2013 12:05
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [Nova] New DB column or new DB table?
> 
> On 07/17/2013 10:54 PM, Lu, Lianhao wrote:
> > Hi fellows,
> >
> > Currently we're implementing the BP
> > https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling.
> > The main idea is to have an extensible plugin framework on nova-compute
> > where every plugin can get different metrics (e.g. CPU utilization, memory
> > cache utilization, network bandwidth, etc.) to store into the DB, and the
> > nova-scheduler will use that data from the DB for scheduling decisions.
> >
> > Currently we add a new table to store all the metric data and have
> > nova-scheduler join-load the new table with the compute_nodes table to get
> > all the data (https://review.openstack.org/35759). Someone is concerned
> > about the performance penalty of the join load operation when a lot of
> > metric data is stored in the DB for every single compute node. Don
> > suggested adding a new column to the current compute_nodes table in the DB,
> > putting all metric data into a key/value dictionary, and storing the
> > JSON-encoded string of the dictionary in that new column.
> >
> > I'm just wondering which way has less performance impact, join load with a
> > new table with quite a lot of rows, or json encode/decode a dictionary with
> > a lot of key/value pairs?
> >
> > Thanks,
> > -Lianhao
> 
> I'm really confused. Why are we talking about collecting host metrics in nova
> when we've got a whole project to do that in ceilometer? I think utilization
> based scheduling would be a great thing, but it really ought to be interfacing
> with ceilometer to get that data. Storing it again in nova (or even worse
> collecting it a second time in nova) seems like the wrong direction.
>
> I think there was an equiv patch series at the end of Grizzly that was pushed
> out for the same reasons.
>
> If there is a reason ceilometer can't be used in this case, we should have
> that discussion here on the list. Because my initial reading of this blueprint
> and the code patches is that it partially duplicates ceilometer function,
> which we definitely don't want to do. Would be happy to be proved wrong on that.
> 
>   -Sean
> 
> --
> Sean Dague
> http://dague.net


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-18 Thread Lu, Lianhao
Sean Dague wrote on 2013-07-18:
> On 07/17/2013 10:54 PM, Lu, Lianhao wrote:
>> Hi fellows,
>> 
>> Currently we're implementing the BP
>> https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling.
>> The main idea is to have an extensible plugin framework on nova-compute
>> where every plugin can get different metrics (e.g. CPU utilization, memory
>> cache utilization, network bandwidth, etc.) to store into the DB, and the
>> nova-scheduler will use that data from the DB for scheduling decisions.
>>
>> Currently we add a new table to store all the metric data and have
>> nova-scheduler join-load the new table with the compute_nodes table to get
>> all the data (https://review.openstack.org/35759). Someone is concerned
>> about the performance penalty of the join load operation when a lot of
>> metric data is stored in the DB for every single compute node. Don
>> suggested adding a new column to the current compute_nodes table in the DB,
>> putting all metric data into a key/value dictionary, and storing the
>> JSON-encoded string of the dictionary in that new column.
>> 
>> I'm just wondering which way has less performance impact, join load
>> with a new table with quite a lot of rows, or json encode/decode a
>> dictionary with a lot of key/value pairs?
>> 
>> Thanks,
>> -Lianhao
> 
> I'm really confused. Why are we talking about collecting host metrics in
> nova when we've got a whole project to do that in ceilometer? I think
> utilization based scheduling would be a great thing, but it really ought
> to be interfacing with ceilometer to get that data. Storing it again in
> nova (or even worse collecting it a second time in nova) seems like the
> wrong direction.
> 
> I think there was an equiv patch series at the end of Grizzly that was
> pushed out for the same reasons.
> 
> If there is a reason ceilometer can't be used in this case, we should
> have that discussion here on the list. Because my initial reading of
> this blueprint and the code patches is that it partially duplicates
> ceilometer function, which we definitely don't want to do. Would be
> happy to be proved wrong on that.
> 
>   -Sean
>
Using ceilometer as the source of those metrics was discussed in the
nova-scheduler subgroup meeting. (see #topic extending data in host
state in the following link).
http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-04-30-15.04.log.html

In that meeting, all agreed that ceilometer would be a great source of
metrics for the scheduler, but many did not want to make ceilometer a
mandatory dependency of the nova scheduler.

Besides, currently ceilometer doesn't have "host metrics", like the 
cpu/network/cache utilization data of the compute node host, which
will affect the scheduling decision. What ceilometer has currently
is the "VM metrics", like cpu/network utilization of each VM instance.

After the nova compute node collects the "host metrics", those metrics
could also be fed into the ceilometer framework (e.g. through a ceilometer
listener) for further processing, like alarming, etc.

-Lianhao



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-18 Thread Murray, Paul (HP Cloud Services)
Hi Jay, Lianhao, All,

Sorry if this comes out of order - for some reason I am not receiving the 
messages so I'm cut-and-pasting from the archive :( 

I think I might mean something closer to Brian's blueprint (now I've seen it)
https://blueprints.launchpad.net/nova/+spec/heterogeneous-instance-types 

Really I want to do resource management the way vcpu, memory and disk do. The 
scheduler chooses where to place instances according to an understanding of the 
available and free resources (and updates that when scheduling multiple 
instances, as in the consume_from_instance method of 
nova.scheduler.host_manager.HostState). Likewise, the compute node checks (in 
the test method of nova.compute.claims.Claim ) that they are available before 
accepting an instance. When the instance is created, it reports the usage back
to the database via the resource tracker. This is actually accounting for what
has been allocated, not an ongoing measure of what is being used.
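
To make the allocation-accounting loop described above concrete, here is a deliberately simplified Python sketch. It is not Nova's `HostState` or `Claim` code: the class, its method names (`can_fit`, `consume`) and the `bandwidth_mbps` resource are hypothetical, chosen only to show how a scheduler could track allocated (not measured) amounts of an arbitrary set of resources.

```python
class HostState:
    """Tracks allocated resources for one host (illustrative only)."""

    def __init__(self, total):
        self.total = dict(total)                 # resource name -> capacity
        self.used = {name: 0 for name in total}  # resource name -> allocated

    def free(self, name):
        # Unknown resources are treated as having no capacity at all.
        return self.total.get(name, 0) - self.used.get(name, 0)

    def can_fit(self, request):
        """Check that every requested resource is still available."""
        return all(self.free(name) >= amount
                   for name, amount in request.items())

    def consume(self, request):
        """Record an allocation so that later placements see it."""
        if not self.can_fit(request):
            raise ValueError("insufficient resources for request")
        for name, amount in request.items():
            self.used[name] += amount
```

Because the resource names are just dictionary keys, a new resource such as network bandwidth can be added without hard-coding it, which is the generic mechanism asked for here. The numbers are what has been promised to instances, independent of what they actually use at runtime.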

Extra specs can certainly be used, but that does not provide the feedback loop 
between the compute nodes and the scheduler necessary to do the accounting of 
resource consumption.

What I would need for a generic way to do this is plugins at the compute node, 
a way to pass arbitrary resource consumption information back through the 
database, and plugins at the scheduler. So I am going beyond what is described 
here but the basic mechanisms are the same. The alternative is to code in each 
new resource we want to manage (which may not be that many really - but they 
may not be there for all installations).

Interestingly the 
https://blueprints.launchpad.net/nova/+spec/generic-host-state-for-scheduler 
blueprint (referenced in the patch) does talk about going to ceilometer. And 
that does seem to make sense to me. 

BTW, I'm getting all the other emails - just not this thread!

Bemused...
Paul


On 07/18/2013 10:44 AM, Murray, Paul (HP Cloud Services) wrote:
> Hi All,
>
> I would like to chip in with something from the side here (sorry to stretch 
> the discussion out).
>
> I was looking for a mechanism to do something like this in the context of 
> this blueprint on network aware scheduling: 
> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement 
> Essentially the problem is that I need to add network bandwidth resource 
> allocation information just like vcpu, memory and disk space already have. I 
> could hard code this just as they are, but I can also think of a couple of 
> others we would like to add that may be more specific to a given 
> installation. So I could do with a generic way to feed this information back 
> from the compute node to the scheduler just like this.
>
> However, my use case is not the same - it is not meant to be for 
> monitored/statistical utilization info. But I would like a similar mechanism 
> to allow the scheduler to keep track of more general / extensible resource 
> allocation.

How is that a different use case from Lianhao's? You mean instead of 
collected usage metrics you want to allocate based on the value of a 
transient statistic like current network bandwidth utilisation?

> Do you have any thoughts on that? Again, don't mean to deflect the discussion 
> - just I have another use case.

I tend to agree with both Brian and Sean on this. I agree with Sean in 
that it seems duplicative to store compute_node_resources in the Nova 
database when a simple REST call to Ceilometer would avoid the 
duplication. And I agree with Brian that the extra_specs scheduler 
filters seem like they would fit the "check a current bandwidth 
statistic" type use case you describe above, Paul.

Best,
-jay



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-18 Thread Jay Pipes

On 07/18/2013 10:44 AM, Murray, Paul (HP Cloud Services) wrote:
> Hi All,
>
> I would like to chip in with something from the side here (sorry to stretch
> the discussion out).
>
> I was looking for a mechanism to do something like this in the context of
> this blueprint on network aware scheduling:
> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
> Essentially the problem is that I need to add network bandwidth resource
> allocation information just like vcpu, memory and disk space already have. I
> could hard code this just as they are, but I can also think of a couple of
> others we would like to add that may be more specific to a given
> installation. So I could do with a generic way to feed this information back
> from the compute node to the scheduler just like this.
>
> However, my use case is not the same - it is not meant to be for
> monitored/statistical utilization info. But I would like a similar mechanism
> to allow the scheduler to keep track of more general / extensible resource
> allocation.

How is that a different use case from Lianhao's? You mean instead of 
collected usage metrics you want to allocate based on the value of a 
transient statistic like current network bandwidth utilisation?

> Do you have any thoughts on that? Again, don't mean to deflect the discussion
> - just I have another use case.

I tend to agree with both Brian and Sean on this. I agree with Sean in 
that it seems duplicative to store compute_node_resources in the Nova 
database when a simple REST call to Ceilometer would avoid the 
duplication. And I agree with Brian that the extra_specs scheduler 
filters seem like they would fit the "check a current bandwidth 
statistic" type use case you describe above, Paul.


Best,
-jay



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-18 Thread Murray, Paul (HP Cloud Services)
Hi All,

I would like to chip in with something from the side here (sorry to stretch the 
discussion out).

I was looking for a mechanism to do something like this in the context of this 
blueprint on network aware scheduling: 
https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement 
Essentially the problem is that I need to add network bandwidth resource 
allocation information just like vcpu, memory and disk space already have. I 
could hard code this just as they are, but I can also think of a couple of 
others we would like to add that may be more specific to a given installation. 
So I could do with a generic way to feed this information back from the compute 
node to the scheduler just like this.

However, my use case is not the same - it is not meant to be for 
monitored/statistical utilization info. But I would like a similar mechanism to 
allow the scheduler to keep track of more general / extensible resource 
allocation.

Do you have any thoughts on that? Again, don't mean to deflect the discussion - 
just I have another use case.

Paul.


>-Original Message-
>From: Sean Dague [mailto:s...@dague.net]
>Sent: 18 July 2013 12:05
>To: OpenStack Development Mailing List
>Subject: Re: [openstack-dev] [Nova] New DB column or new DB table?
>
>On 07/17/2013 10:54 PM, Lu, Lianhao wrote:
>> Hi fellows,
>>
>> Currently we're implementing the BP
>> https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling.
>> The main idea is to have an extensible plugin framework on nova-compute
>> where every plugin can get different metrics (e.g. CPU utilization, memory
>> cache utilization, network bandwidth, etc.) to store into the DB, and the
>> nova-scheduler will use that data from the DB for scheduling decisions.
>>
>> Currently we add a new table to store all the metric data and have
>> nova-scheduler join-load the new table with the compute_nodes table to get
>> all the data (https://review.openstack.org/35759). Someone is concerned
>> about the performance penalty of the join load operation when a lot of
>> metric data is stored in the DB for every single compute node. Don
>> suggested adding a new column to the current compute_nodes table in the DB,
>> putting all metric data into a key/value dictionary, and storing the
>> JSON-encoded string of the dictionary in that new column.
>>
>> I'm just wondering which way has less performance impact, join load with a
>> new table with quite a lot of rows, or json encode/decode a dictionary with
>> a lot of key/value pairs?
>>
>> Thanks,
>> -Lianhao
>
>I'm really confused. Why are we talking about collecting host metrics 
>in nova when we've got a whole project to do that in ceilometer? I 
>think utilization based scheduling would be a great thing, but it 
>really ought to be interfacing with ceilometer to get that data. Storing 
>it again in nova (or even worse collecting it a second time in nova) seems 
>like the wrong direction.
>
>I think there was an equiv patch series at the end of Grizzly that was 
>pushed out for the same reasons.
>
>If there is a reason ceilometer can't be used in this case, we should 
>have that discussion here on the list. Because my initial reading of 
>this blueprint and the code patches is that it partially duplicates 
>ceilometer function, which we definitely don't want to do. Would be happy to 
>be proved wrong on that.
>
>   -Sean
>
>--
>Sean Dague
>http://dague.net


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-18 Thread Sean Dague

On 07/17/2013 10:54 PM, Lu, Lianhao wrote:

> Hi fellows,
>
> Currently we're implementing the BP
> https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling.
> The main idea is to have an extensible plugin framework on nova-compute
> where every plugin can get different metrics (e.g. CPU utilization, memory
> cache utilization, network bandwidth, etc.) to store into the DB, and the
> nova-scheduler will use that data from the DB for scheduling decisions.
>
> Currently we add a new table to store all the metric data and have
> nova-scheduler join-load the new table with the compute_nodes table to get
> all the data (https://review.openstack.org/35759). Someone is concerned
> about the performance penalty of the join load operation when a lot of
> metric data is stored in the DB for every single compute node. Don
> suggested adding a new column to the current compute_nodes table in the DB,
> putting all metric data into a key/value dictionary, and storing the
> JSON-encoded string of the dictionary in that new column.
>
> I'm just wondering which way has less performance impact, join load with a
> new table with quite a lot of rows, or json encode/decode a dictionary with
> a lot of key/value pairs?
>
> Thanks,
> -Lianhao
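
For readers following the question, the two storage layouts being compared can be sketched side by side. This is only an illustration using Python's built-in `sqlite3` and `json` modules rather than Nova's actual database layer; the `compute_node_metrics` table, the `metrics` column and the function names are assumptions, not what the patch under review defines. It makes no claim about which option is faster, which would need measurement on a realistic data set.

```python
import json
import sqlite3


def build_db():
    """Create an in-memory DB holding both layouts at once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE compute_nodes (
            id INTEGER PRIMARY KEY,
            hypervisor_hostname TEXT,
            metrics TEXT              -- option 2: JSON-encoded dictionary
        );
        CREATE TABLE compute_node_metrics (   -- option 1: one row per metric
            compute_node_id INTEGER REFERENCES compute_nodes(id),
            name TEXT,
            value REAL
        );
    """)
    return conn


def store(conn, node_id, hostname, metrics):
    """Write the same metrics in both layouts."""
    conn.execute("INSERT INTO compute_nodes VALUES (?, ?, ?)",
                 (node_id, hostname, json.dumps(metrics)))
    conn.executemany("INSERT INTO compute_node_metrics VALUES (?, ?, ?)",
                     [(node_id, k, v) for k, v in metrics.items()])


def load_via_join(conn):
    """Option 1: join the metrics table, one result row per metric."""
    rows = conn.execute(
        "SELECT n.id, m.name, m.value FROM compute_nodes n "
        "LEFT JOIN compute_node_metrics m ON m.compute_node_id = n.id")
    result = {}
    for node_id, name, value in rows:
        node = result.setdefault(node_id, {})
        if name is not None:
            node[name] = value
    return result


def load_via_json(conn):
    """Option 2: one row per node, decode the JSON column in Python."""
    rows = conn.execute("SELECT id, metrics FROM compute_nodes")
    return {node_id: json.loads(blob) for node_id, blob in rows}
```

Both loaders return the same dictionary per node. The trade-off under discussion is that the join returns one row per metric per node, while the JSON column returns one row per node and moves the decoding cost into the scheduler process.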


I'm really confused. Why are we talking about collecting host metrics in 
nova when we've got a whole project to do that in ceilometer? I think 
utilization based scheduling would be a great thing, but it really ought 
to be interfacing with ceilometer to get that data. Storing it again in 
nova (or even worse collecting it a second time in nova) seems like the 
wrong direction.

I think there was an equiv patch series at the end of Grizzly that was 
pushed out for the same reasons.

If there is a reason ceilometer can't be used in this case, we should 
have that discussion here on the list. Because my initial reading of 
this blueprint and the code patches is that it partially duplicates 
ceilometer function, which we definitely don't want to do. Would be 
happy to be proved wrong on that.


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-17 Thread Brian Schott
How is this different than extra_specs used by the filter scheduler?
http://docs.openstack.org/developer/nova/devref/filter_scheduler.html

I did some very old blueprints related to heterogeneous architectures that had 
similar goals.  
https://blueprints.launchpad.net/nova/+spec/heterogeneous-instance-types


I fully support the idea, but we can probably adapt the existing
functionality to also target metric data.

On Wed, Jul 17, 2013 at 10:56 PM, Lu, Lianhao wrote:

> Hi fellows,
>
> Currently we're implementing the BP
> https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling.
> The main idea is to have an extensible plugin framework on nova-compute
> where every plugin can get different metrics (e.g. CPU utilization, memory
> cache utilization, network bandwidth, etc.) to store into the DB, and the
> nova-scheduler will use that data from the DB for scheduling decisions.
>
> Currently we add a new table to store all the metric data and have
> nova-scheduler join-load the new table with the compute_nodes table to get
> all the data (https://review.openstack.org/35759). Someone is concerned
> about the performance penalty of the join load operation when a lot of
> metric data is stored in the DB for every single compute node. Don
> suggested adding a new column to the current compute_nodes table in the DB,
> putting all metric data into a key/value dictionary, and storing the
> JSON-encoded string of the dictionary in that new column.
>
> I'm just wondering which way has less performance impact, join load with a
> new table with quite a lot of rows, or json encode/decode a dictionary with
> a lot of key/value pairs?
>
> Thanks,
> -Lianhao