Re: [openstack-dev] [TripleO] Undercloud Ceilometer

2013-10-07 Thread Ladislav Smola

Hello Chris,

That would be much appreciated, thank you. :-)

Kind Regards,
Ladislav

On 10/05/2013 12:12 AM, Chris Jones wrote:

Hi

On 4 October 2013 16:28, Ladislav Smola wrote:


test it. Does anybody volunteer for this task? The hard part will be
getting the configuration right (firewall, keystone, snmpd.conf) so
that everything is set up in a clean and secure way. That would
require a seasoned sysadmin to at least review it. Any volunteers
here? :-)


I'm not familiar at all with Ceilometer, but I'd be happy to discuss 
how/where things like snmpd are going to be exposed, and look over the 
resulting bits in tripleo :)


--
Cheers,

Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Undercloud Ceilometer

2013-10-07 Thread Ladislav Smola

Hello Clint,

thank you for your feedback.

On 10/04/2013 06:08 PM, Clint Byrum wrote:

Excerpts from Ladislav Smola's message of 2013-10-04 08:28:22 -0700:

Hello,

just a few words about the role of Ceilometer in the Undercloud and
the work in progress.

Why we need Ceilometer in the Undercloud:
---

In Tuskar-UI, we will display a number of statistics that show
Undercloud metrics. Later we will also display alerts and
notifications that come from Ceilometer.

But I do suspect that Heat will use Ceilometer Alarms, in a similar
way to how it uses them for auto-scaling in the Overcloud. Can anybody
confirm?

I have not heard of anyone wanting to "auto scale" baremetal for the
purpose of scaling out OpenStack itself. There is certainly a use case
for it when we run out of compute resources and happen to have spare
hardware around. But unlike on a cloud where you have several
applications all contending for the same hardware, in the undercloud we
have only one application, so it seems less likely that auto-scaling
will be needed. We definitely need "scaling", but I suspect it will not
be extremely elastic.


Yeah, that's probably true. What I had in mind was something like
suspending hardware that is not in use at the time (e.g. has no VMs
running on it) to save energy, and starting it again when we run out
of compute resources, as you say.


What will be needed, however, is metrics for the rolling updates feature
we plan to add to Heat. We want to make sure that a rolling update does
not adversely affect the service level of the running cloud. If we're
early in the process with our canary-based deploy and suddenly CPU load is
shooting up on all of the completed nodes, something, perhaps Ceilometer,
should be able to send a signal to Heat, and trigger a rollback.


That is how Alarms should work now: you just define the Alarm inside
the Heat template. Check the example:
https://github.com/openstack/heat-templates/blob/master/cfn/F17/AutoScalingCeilometer.yaml
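
To make the mechanism concrete, here is a rough Python sketch of what
a threshold alarm conceptually does once it is defined: periodically
compare a statistic against a threshold and, when it trips, POST to
the alarm action URL (a Heat webhook in the auto-scaling case). This
is not Ceilometer code; the meter name, threshold and webhook URL are
placeholders.

    import json
    import time
    import urllib.request

    # Placeholders -- in a real deployment these come from the alarm
    # definition in the Heat template (meter, threshold, period, action URL).
    METER = "cpu_util"
    THRESHOLD = 70.0
    PERIOD = 60  # seconds between evaluations
    ALARM_ACTION = "http://heat.example:8000/v1/signal"  # hypothetical webhook

    def get_statistic(meter):
        """Stand-in for querying the Ceilometer statistics API.

        A real evaluator would ask for the average of `meter` over the
        last PERIOD seconds; here we just return a dummy value.
        """
        return 42.0

    def evaluate_once():
        value = get_statistic(METER)
        if value > THRESHOLD:
            body = json.dumps({"alarm": METER, "current": value,
                               "threshold": THRESHOLD}).encode()
            req = urllib.request.Request(
                ALARM_ACTION, data=body,
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)  # notify Heat (scale or roll back)

    if __name__ == "__main__":
        while True:
            evaluate_once()
            time.sleep(PERIOD)

Roughly speaking, the linked template wires the same pieces together
declaratively: the alarm resource names the meter and threshold, and
its alarm action points at the scaling policy's webhook.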


What is planned in the near future
---

The Hardware Agent capable of obtaining statistics:
https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
It uses an SNMP inspector to obtain the stats. I have tested that with
the Devtest TripleO setup and it works.

The planned architecture is to have one Hardware Agent (it will be
merged into the central agent code) placed on the Control Node (or
basically anywhere). That agent will poll SNMP daemons running on the
hardware in the Undercloud (baremetals, network devices). Any
objections, or reasons why this is a bad idea?

We will have to create a Ceilometer image element; the snmpd element
is already there, but we should test it. Does anybody volunteer for
this task? The hard part will be getting the configuration right
(firewall, keystone, snmpd.conf) so that everything is set up in a
clean and secure way. That would require a seasoned sysadmin to at
least review it. Any volunteers here? :-)

The IPMI inspector for the Hardware Agent has just started:
https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
It seems it should query the Ironic API, which would provide the data
samples. Any objections? Any volunteers for implementing this on the
Ironic side?

devananda and lifeless had the greatest concern about the scalability
of a central agent. Ceilometer is not doing any scaling right now, but
horizontal scaling of the central agent is planned for the future. So
this is a very important task for us for larger deployments. Any
feedback about scaling, or about changing the architecture for better
scalability?


I share their concerns. For < 100 nodes it is no big deal. But centralized
monitoring has a higher cost than distributed monitoring. I'd rather see
agents on the machines themselves do a bit more than respond to polling
so that load is distributed as much as possible and non-essential
network chatter is reduced.


Right now, for the central agent, it should be a matter of
configuration. You can set up one central agent fetching all
baremetals from Nova. Or you can bake the central agent into each
baremetal and set it to poll only localhost. Or, one of the
distributed architectures planned as a configuration option is having
a node (a Management Leaf node) that manages a bunch of hardware, so
the central agent could be baked into it.

What the agent does then is process the data, pack it into a message
and send it to the OpenStack message bus (which should be heavily
scalable), where it is collected by a Collector (which should be able
to have many workers) and saved to the database.
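
As a purely illustrative Python sketch of that pipeline (not
Ceilometer's actual code, and all names here are made up): several
pollers pack samples into messages and push them onto a queue standing
in for the message bus, while a couple of collector workers drain it
and "save" the samples.

    import queue
    import random
    import threading
    import time

    # Stand-in for the OpenStack message bus (AMQP in a real deployment).
    bus = queue.Queue()

    def poller(node):
        """Stand-in for a central/hardware agent polling one node."""
        for _ in range(3):
            sample = {
                "resource_id": node,
                "meter": "hardware.cpu.load.1min",  # illustrative meter name
                "value": round(random.random() * 4, 2),
                "timestamp": time.time(),
            }
            bus.put(sample)  # publish the packed sample
            time.sleep(0.1)

    def collector(worker_id):
        """Stand-in for a collector worker writing samples to the database."""
        while True:
            try:
                sample = bus.get(timeout=1)
            except queue.Empty:
                return
            print("worker %d stored %r" % (worker_id, sample))  # pretend DB write

    if __name__ == "__main__":
        workers = ([threading.Thread(target=poller, args=("node-%d" % i,))
                    for i in range(4)] +
                   [threading.Thread(target=collector, args=(i,))
                    for i in range(2)])
        for t in workers:
            t.start()
        for t in workers:
            t.join()

The point of that shape is that the pollers and the collector workers
scale independently of each other; the shared pieces are the bus and
the database behind the collectors.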



I'm extremely interested in the novel approach that Assimilation
Monitoring [1] is taking to this problem, which is to have each node
monitor itself and two of its immediate neighbors on a switch and
some nodes monitor an additional node on a different switch. Failures
are reported to an API server which uses graph database queries to
determine at what level the failure occurred (single node, cascading,
or network level).

Re: [openstack-dev] [TripleO] Undercloud Ceilometer

2013-10-04 Thread Clint Byrum
Excerpts from Ladislav Smola's message of 2013-10-04 08:28:22 -0700:
> Hello,
> 
> just a few words about the role of Ceilometer in the Undercloud and
> the work in progress.
> 
> Why we need Ceilometer in the Undercloud:
> ---
> 
> In Tuskar-UI, we will display a number of statistics that show
> Undercloud metrics. Later we will also display alerts and
> notifications that come from Ceilometer.
> 
> But I do suspect that Heat will use Ceilometer Alarms, in a similar
> way to how it uses them for auto-scaling in the Overcloud. Can
> anybody confirm?

I have not heard of anyone wanting to "auto scale" baremetal for the
purpose of scaling out OpenStack itself. There is certainly a use case
for it when we run out of compute resources and happen to have spare
hardware around. But unlike on a cloud where you have several
applications all contending for the same hardware, in the undercloud we
have only one application, so it seems less likely that auto-scaling
will be needed. We definitely need "scaling", but I suspect it will not
be extremely elastic.

What will be needed, however, is metrics for the rolling updates feature
we plan to add to Heat. We want to make sure that a rolling update does
not adversely affect the service level of the running cloud. If we're
early in the process with our canary-based deploy and suddenly CPU load is
shooting up on all of the completed nodes, something, perhaps Ceilometer,
should be able to send a signal to Heat, and trigger a rollback.
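
Purely as an illustration of that check (nothing Heat or Ceilometer
implements today, and the numbers are invented), the canary logic
could be as simple as comparing the updated nodes against a pre-update
baseline:

    # Hypothetical canary check for a rolling update.
    BASELINE_LOAD = 0.8   # average CPU load before the update started
    TOLERANCE = 1.5       # allow updated nodes to be up to 50% worse

    def should_roll_back(updated_node_loads):
        """Return True if the updated (canary) nodes look unhealthy."""
        if not updated_node_loads:
            return False
        average = sum(updated_node_loads) / len(updated_node_loads)
        return average > BASELINE_LOAD * TOLERANCE

    # Three updated nodes whose load has shot up after the update.
    if should_roll_back([2.1, 1.9, 2.4]):
        print("signal Heat to pause the rolling update and roll back")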

> 
> What is planned in the near future
> ---
> 
> The Hardware Agent capable of obtaining statistics:
> https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
> It uses an SNMP inspector to obtain the stats. I have tested that
> with the Devtest TripleO setup and it works.
> 
> The planned architecture is to have one Hardware Agent (it will be
> merged into the central agent code) placed on the Control Node (or
> basically anywhere). That agent will poll SNMP daemons running on the
> hardware in the Undercloud (baremetals, network devices). Any
> objections, or reasons why this is a bad idea?
> 
> We will have to create a Ceilometer image element; the snmpd element
> is already there, but we should test it. Does anybody volunteer for
> this task? The hard part will be getting the configuration right
> (firewall, keystone, snmpd.conf) so that everything is set up in a
> clean and secure way. That would require a seasoned sysadmin to at
> least review it. Any volunteers here? :-)
> 
> The IPMI inspector for the Hardware Agent has just started:
> https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
> It seems it should query the Ironic API, which would provide the data
> samples. Any objections? Any volunteers for implementing this on the
> Ironic side?
> 
> devananda and lifeless had the greatest concern about the scalability
> of a central agent. Ceilometer is not doing any scaling right now,
> but horizontal scaling of the central agent is planned for the
> future. So this is a very important task for us for larger
> deployments. Any feedback about scaling, or about changing the
> architecture for better scalability?
> 

I share their concerns. For < 100 nodes it is no big deal. But centralized
monitoring has a higher cost than distributed monitoring. I'd rather see
agents on the machines themselves do a bit more than respond to polling
so that load is distributed as much as possible and non-essential
network chatter is reduced.

I'm extremely interested in the novel approach that Assimilation
Monitoring [1] is taking to this problem, which is to have each node
monitor itself and two of its immediate neighbors on a switch and
some nodes monitor an additional node on a different switch. Failures
are reported to an API server which uses graph database queries to
determine at what level the failure occurred (single node, cascading,
or network level).
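
To illustrate the shape of that scheme (just a sketch of the idea, not
Assimilation Monitoring's actual algorithm, and the inventory below is
invented):

    def ring_assignments(nodes):
        """Each node monitors its two immediate neighbors on the same switch."""
        n = len(nodes)
        return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]]
                for i in range(n)}

    def cross_switch_links(switches):
        """A few nodes also watch one node on a different switch."""
        names = sorted(switches)
        return [(switches[a][0], switches[b][0])
                for a, b in zip(names, names[1:] + names[:1])]

    # Hypothetical inventory: two switches with a handful of nodes each.
    switches = {
        "switch-a": ["node-1", "node-2", "node-3", "node-4"],
        "switch-b": ["node-5", "node-6", "node-7"],
    }

    for switch, nodes in switches.items():
        print(switch, ring_assignments(nodes))
    print("cross-switch watchers:", cross_switch_links(switches))

No central poller ever has to touch every node; the monitoring work
each node does stays proportional to its handful of neighbors.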

If Ceilometer could incorporate that type of light-weight high-scale
monitoring ethos, rather than implementing something we know does not
scale well to the level OpenStack needs, I'd feel a lot better about
pushing it out as part of the standard deployment.

[1] http://assimmon.org/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Undercloud Ceilometer

2013-10-04 Thread Chris Jones
Hi

On 4 October 2013 16:28, Ladislav Smola wrote:

> test it. Does anybody volunteer for this task? The hard part will be
> getting the configuration right (firewall, keystone, snmpd.conf) so
> that everything is set up in a clean and secure way. That would
> require a seasoned sysadmin to at least review it. Any volunteers
> here? :-)


I'm not familiar at all with Ceilometer, but I'd be happy to discuss
how/where things like snmpd are going to be exposed, and look over the
resulting bits in tripleo :)

-- 
Cheers,

Chris
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Undercloud Ceilometer

2013-10-04 Thread Ladislav Smola

Hello,

just a few words about the role of Ceilometer in the Undercloud and
the work in progress.


Why we need Ceilometer in the Undercloud:
---

In Tuskar-UI, we will display a number of statistics that show
Undercloud metrics. Later we will also display alerts and
notifications that come from Ceilometer.


But I do suspect that Heat will use Ceilometer Alarms, in a similar
way to how it uses them for auto-scaling in the Overcloud. Can anybody
confirm?

What is planned in the near future
---

The Hardware Agent capable of obtaining statistics:
https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
It uses an SNMP inspector to obtain the stats. I have tested that with
the Devtest TripleO setup and it works.

The planned architecture is to have one Hardware Agent (it will be
merged into the central agent code) placed on the Control Node (or
basically anywhere). That agent will poll SNMP daemons running on the
hardware in the Undercloud (baremetals, network devices). Any
objections, or reasons why this is a bad idea?
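
For the curious, the kind of query the agent performs against snmpd
looks roughly like this pysnmp sketch (the target address and
community string are placeholders, and this is a hand-rolled GET using
the classic pysnmp one-liner API rather than the inspector's actual
code):

    from pysnmp.entity.rfc3413.oneliner import cmdgen

    # Placeholders: a baremetal node running snmpd, and its community string.
    TARGET = ("192.0.2.10", 161)
    COMMUNITY = "public"

    # 1-minute load average from the UCD-SNMP-MIB (laLoad.1), the sort of
    # value the hardware agent turns into Ceilometer samples.
    LA_LOAD_1MIN = "1.3.6.1.4.1.2021.10.1.3.1"

    cmd_gen = cmdgen.CommandGenerator()
    error_indication, error_status, error_index, var_binds = cmd_gen.getCmd(
        cmdgen.CommunityData(COMMUNITY),
        cmdgen.UdpTransportTarget(TARGET),
        LA_LOAD_1MIN,
    )

    if error_indication:
        print("SNMP poll failed: %s" % error_indication)
    else:
        for name, value in var_binds:
            print("%s = %s" % (name.prettyPrint(), value.prettyPrint()))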


We will have to create a Ceilometer image element; the snmpd element
is already there, but we should test it. Does anybody volunteer for
this task? The hard part will be getting the configuration right
(firewall, keystone, snmpd.conf) so that everything is set up in a
clean and secure way. That would require a seasoned sysadmin to at
least review it. Any volunteers here? :-)


The IPMI inspector for the Hardware Agent has just started:
https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
It seems it should query the Ironic API, which would provide the data
samples. Any objections? Any volunteers for implementing this on the
Ironic side?

devananda and lifeless had the greatest concern about the scalability
of a central agent. Ceilometer is not doing any scaling right now, but
horizontal scaling of the central agent is planned for the future. So
this is a very important task for us for larger deployments. Any
feedback about scaling, or about changing the architecture for better
scalability?


Thank you for any feedback.

Kind Regards,
Ladislav


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev