[openstack-dev] [Nova][Heat] How to reliably detect VM failures? (Zane Bitter)

2014-03-19 Thread WICKES, ROGER
> On 03/18/2014 07:54 AM, Qiming Teng wrote:
>> Hi, Folks,
>>
>>I have been trying to implement a HACluster resource type in Heat. I
>> haven't created a BluePrint for this because I am not sure everything
>> will work as expected.
...
>>The most difficult issue here is to come up with a reliable VM failure
>> detection mechanism.  The service_group feature in Nova only concerns
>> about the OpenStack services themselves, not the VMs.  Considering that
>> in our customer's cloud environment, user provided images can be used,
>> we cannot assume some agents in the VMs to send heartbeat signals.

[Roger] My response is more user-oriented than developer-oriented, but it was 
asked on dev, so...here goes:

When enabled, the hypervisor continuously collects basic CPU and memory stats 
(and sends them to Ceilometer) that you can alarm on. 
http://docs.openstack.org/trunk/openstack-ops/content/logging_monitoring.html

For external monitoring, consider setting up a Nagios or Selenium server 
for agent-less monitoring. You can have it do the most basic heartbeat 
(ping) test; if the ping is slow for a period of, say, five minutes, or fails, 
alarm that you have a network problem. You can use Selenium to execute 
synthetic transactions against whatever the server is supposed to provide; if 
it does it for you, you can assume it is doing it for everyone else. If it 
fails, you can take action.
http://www.seleniumhq.org
You can also use Selenium to re-run selected OpenStack test cases to ensure 
your infrastructure is working properly.
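The ping-based heartbeat with a five-minute grace window can be sketched as a 
small state machine. This is illustrative only, not Nagios code; the class and 
parameter names are made up:

```python
import time

class HeartbeatMonitor:
    """Agent-less heartbeat tracker: alarm only when pings keep
    failing for longer than a grace window (e.g. five minutes)."""

    def __init__(self, grace_seconds=300):
        self.grace = grace_seconds
        self.first_failure = None  # start of the current outage, or None

    def record(self, ping_ok, now=None):
        """Feed one ping result; return True when the alarm should fire."""
        now = time.time() if now is None else now
        if ping_ok:
            self.first_failure = None  # outage over, reset
            return False
        if self.first_failure is None:
            self.first_failure = now   # outage begins
        return (now - self.first_failure) >= self.grace
```

A transient blip resets the clock, so only a sustained failure raises the 
alarm, which keeps the false-positive rate down on a noisy network.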

>>I have checked the 'instance' table in Nova database, it seemed that
>> the 'update_at' column is only updated when VM state changed and
>> reported.  If the 'heartbeat' messages are coming in from many VMs very
>> frequently, there could be a DB query performance/scalability issue,
>> right?

[Roger] For time-series, high-volume collection, consider going to a 
non-relational system like RRDTool, PyRRD, Graphite, etc. if you want to store 
the history and look for trends. 
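Graphite's plaintext protocol is simple enough to sketch here; the metric name 
and host below are made up for illustration:

```python
import time

def graphite_line(metric, value, timestamp=None):
    """Format one sample in Graphite's plaintext protocol:
    '<metric.path> <value> <unix-timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (metric, value, timestamp)

# To actually ship a sample, open a TCP socket to the Graphite
# carbon-cache listener (port 2003 by default) and send the line:
#   sock = socket.create_connection(("graphite.example.com", 2003))
#   sock.sendall(graphite_line("vm.i-0001.cpu_util", 0.75).encode())
#   sock.close()
```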

>>So, how can I detect VM failures reliably, so that I can notify Heat
>> to take the appropriate recovery action?

[Roger] When Nagios detects a problem, have it kick off an appropriate shell 
script that invokes the Heat API (or another mechanism) to fix the issue with 
the cluster. I think you were hoping that Heat could be coded to automagically 
fix any issue, but you may need to be more specific: develop concrete use 
cases for what you mean by "VM failure", as the desired action may differ 
depending on the type of failure. 
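The glue between a monitoring alert and Heat can be reduced to building the 
stack-actions request. A minimal sketch, assuming the Heat stack-actions 
endpoint (POST /v1/{tenant}/stacks/{name}/{id}/actions); the caller still 
needs a Keystone token and an HTTP client to actually send it, and the action 
name must be one Heat supports (e.g. suspend/resume):

```python
def heat_action_request(tenant_id, stack_name, stack_id, action):
    """Build the URL path and JSON body for a Heat stack-action
    call, e.g. from a Nagios event-handler script."""
    path = "/v1/%s/stacks/%s/%s/actions" % (tenant_id, stack_name, stack_id)
    body = {action: None}  # serializes to e.g. {"resume": null}
    return path, body
```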

> Qiming,
>
> Check out
>
> https://github.com/openstack/heat-templates/blob/master/cfn/F17/WordPress_Single_Instance_With_HA.template

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] Need unique ID for every Network Service

2014-03-03 Thread WICKES, ROGER
Maybe I am misunderstanding the debate, but IMHO every OpenStack service (XaaS) 
needs to be listed in the Service Catalog as being available (and stable and 
tested); every instance of that service, when started, needs a service ID; and 
every X created by that service needs a UUID, a.k.a. object ID. This holds 
regardless of how many of them exist per tenant or host or whatever. This 
discussion may be semantics, but just to be clear: LBaaS is the service that is 
called to create an LB. 

On the surface, it makes sense that you would only have one service running per 
tenant; every object or instantiation created by that service (a Load Balancer, 
in this case) must have a UUID. I can't imagine why you would want multiple 
LBaaS services running at the same time, but again my imagination is limited. I 
am sure someone else has more imagination, such as a tenant having two vApps 
located on hosts in two different data centers, wanting an LBaaS in each data 
center because their inventory system or whatever is restricted to a single 
data center. If there were two or three LBaaS instances running, how would 
Neutron or Heat etc. know which one to call (and on what criteria) when the 
network changes? It would be like having two butlers. 

A UUID on each Load Balancer is needed for alarming, callbacks, service 
assurance, service delivery, service availability monitoring and reporting, 
billing, compliance audits, and simply being able to modify the service. If 
there is an n-ary tuple relationship between an LB and anything, you might be 
inclined to restrict a vApp to only one LB. However, for ultra-high-volume and 
high-availability apps we may want cross-redundant LBs with a third LB in 
front of the first two; that way, if one gets overloaded or crashes, we can 
route to the other. A user might even want to mix and match hard and soft LBs 
in a hybrid environment. So, even in that use case, restricting the number of 
LBs or their tupleness is too limiting. 
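The per-instance UUID bookkeeping argued for above amounts to very little 
machinery. A minimal sketch, with illustrative names only (this is not any 
existing Neutron code):

```python
import uuid

class ServiceRegistry:
    """Per-tenant registry giving each network-service instance
    (e.g. each of several LBaaS instances) its own UUID."""

    def __init__(self):
        self._instances = {}  # service_id -> (tenant_id, service_type)

    def register(self, tenant_id, service_type):
        """Create a new service instance and return its UUID."""
        service_id = str(uuid.uuid4())
        self._instances[service_id] = (tenant_id, service_type)
        return service_id

    def instances_for(self, tenant_id, service_type):
        """List all instance IDs of one service type for a tenant."""
        return [sid for sid, (t, s) in self._instances.items()
                if t == tenant_id and s == service_type]
```

With this, two LBaaS instances for the same tenant are distinct, addressable 
objects, which is exactly what alarming, billing, and audits need.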

I also want to say to those who are struggling with reasonable n-ary 
relationship modeling: This is just a problem with global app development, 
where there are so many use cases out there. It's tough to never say never, as 
in, you would never want more than one LBaaS per tenant. 

[Roger] --
From: Srikanth Kumar Lingala [mailto:srikanth.ling...@freescale.com]
Sent: Monday, March 03, 2014 5:18 PM
To: Stephen Balukoff; Veera Reddy
Cc: openstack-dev@lists.openstack.org; openstack
Subject: Re: [openstack-dev] [Openstack] Need unique ID for every Network 
Service

Yes, I will send a mail to Eugene Nikanorov, requesting to add this to the 
agenda for the coming weekly discussion.
The detailed requirement is as follows:
In the current implementation, only one LBaaS configuration is possible per 
tenant. It would be better to have multiple LBaaS configurations for each 
tenant.
We are planning to configure haproxy as a VM in a Network Service Chain. In a 
chain, there may be multiple Network Services of the same type (e.g., 
HAProxy). For that, each Network Service should have a unique ID (UUID) for a 
tenant.

Regards,
Srikanth.

From: Stephen Balukoff [mailto:sbaluk...@bluebox.net]
Sent: Saturday, March 01, 2014 1:22 AM
To: Veera Reddy
Cc: Lingala Srikanth Kumar-B37208; 
openstack-dev@lists.openstack.org; 
openstack
Subject: Re: [Openstack] Need unique ID for every Network Service

Hi y'all!

The ongoing debate in the LBaaS group is whether the concept of a 
'Loadbalancer' needs to exist as an entity. If it is decided that we need it, 
I'm sure it'll have a unique ID. (And please feel free to join the discussion 
on this as well, eh!)

Stephen

On Thu, Feb 27, 2014 at 10:27 PM, Veera Reddy 
<veerare...@gmail.com> wrote:
Hi,

Good idea to have a unique ID for each entry of network functions, so that we 
can configure multiple network functions with different configurations.


Regards,
Veera.

On Fri, Feb 28, 2014 at 11:23 AM, Srikanth Kumar Lingala 
<srikanth.ling...@freescale.com> wrote:
Hi-
In the existing Neutron, we have FWaaS, LBaaS, VPNaaS, etc.
In FWaaS, each Firewall has its own UUID.
It is good to have a unique ID [UUID] for LBaaS also.

Please share your comments on the above.

Regards,
Srikanth.




Re: [openstack-dev] [Neutron][LBaaS] Proposal for model

2014-02-11 Thread WICKES, ROGER
[Roger] Hi Stephen! Great job! Obviously your experience is both awesome and 
essential here.

I would ask that we add a historical archive (physically implemented as a log 
file, probably) object to your model. When you mentioned sending data off to 
Ceilometer, that triggered me to think about one problem I have had to deal 
with: "what packet went where?" in diagnosing errors, usually related to 
having a bug on 1 out of 5 load-balanced servers, usually because of a 
deployed version mismatch, but possibly also due to a virus. When our customer 
sees "hey, every now and then this image is broken on a web page", that points 
us to an inconsistent farm, and having the ability to trace or see which 
server got that customer's packet (routed to it by the LB) would really help 
in pinpointing the errant server.  
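The historical-archive idea boils down to logging which farm member served 
each request, then grouping errors by backend. A sketch under those 
assumptions (record fields and names are made up, not part of any LBaaS 
model):

```python
def lb_log_record(client_ip, request_id, backend, status):
    """One archive entry: enough to answer 'which farm member got
    this customer's packet?' after the fact."""
    return {"client_ip": client_ip,
            "request_id": request_id,  # correlate with the app's own logs
            "backend": backend,        # the server the LB routed to
            "status": status}

def suspect_backends(records):
    """Count error responses per backend to pinpoint the errant
    server in an inconsistent farm."""
    errors = {}
    for rec in records:
        if rec["status"] >= 500:
            errors[rec["backend"]] = errors.get(rec["backend"], 0) + 1
    return errors
```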

> Benefits of a new model
>
> If we were to adopt either of these data models, this would enable us to
> eventually support the following feature sets, in the following ways (for
> example):
>
> Automated scaling of load-balancer services
>
[Roger] Would the Heat module be called on to add more LBs to the farm?

> I talked about horizontal scaling of load balancers above under "High
> Availability," but, at least in the case of a software appliance, vertical
> scaling should also be possible in an active-standby cluster_model by



[openstack-dev] [Neutron] Using Python-Neutronclient from

2014-02-10 Thread WICKES, ROGER
All dictionaries are not created equal; in some respects it's like we have a 
medical dictionary and an automotive-terms dictionary, and so on for each API. 
So we need to document our dictionaries: which key names (and possibly sample 
values) are required in the dictionary, and which are optional (and when), for 
that particular API call (e.g. create_credential) to work as expected. 
--

Message: 3
Date: Sat, 8 Feb 2014 19:20:02 +
From: "Collins, Sean" 
To: "OpenStack Development Mailing List (not for usage questions)"

Subject: [openstack-dev] [Neutron] Using Python-Neutronclient from
Python - docstrings needed?
Message-ID: <20140208192001.gb40...@hqsml-1081034.cable.comcast.com>
Content-Type: text/plain; charset="utf-8"

Hi,

I was writing a small script yesterday to parse a list of IP blocks and
create security groups and rules, by using python-neutronclient.

To be honest, it was very difficult - even though I have actually
written extensions to Python-Neutronclient for the QoS API. 

Those trying to use the client from inside their own code end up getting 
zero help as to how to actually call any of the functions, and what 
parameters they take. 


>>> neutron = client.Client('2.0', auth_url=os.environ['OS_AUTH_URL'],
... tenant_id=os.environ['OS_TENANT_ID'],
... username=os.environ['OS_USERNAME'],
... password=os.environ['OS_PASSWORD'])
>>> help(neutron)

   |  create_credential = 
   |  
   |  create_firewall = 
   |  
   |  create_firewall_policy = 
   |  
   |  create_firewall_rule = 
   |  
   |  create_floatingip = 
   |  
   |  create_health_monitor = 
   |  
   |  create_ikepolicy = 
   |  
   |  create_ipsec_site_connection = 
   |  
   |  create_ipsecpolicy = 
   |  
   |  create_member = 
   |  
   |  create_metering_label = 


Since there was nothing there, I decided to go check the source of
python-neutronclient and see if there are any examples.

https://github.com/openstack/python-neutronclient/blob/master/doc/source/index.rst

If you read closely enough, you'll find out that the function takes a
dictionary, that looks very similar to the request/response examples
listed in the API documentation. So, I went over and checked it out.

http://docs.openstack.org/api/openstack-network/2.0/content/POST_security-groups-v2.0_createSecGroup_v2.0_security-groups_security-groups-ext.html

So from there, I was able to remember enough that each of these
functions takes a single argument, that is a dictionary, that mimics
the same structure that you see in the API documentation, so from there
it was just some experimentation to get the structure right.
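For the security-group case, that single dictionary argument mirrors the JSON 
request body in the API reference. An untested sketch of the calling 
convention (field values here are made up):

```python
# Build the body dicts that python-neutronclient expects; each mirrors
# the JSON request shown in the Networking API documentation.

def security_group_body(name, description):
    return {"security_group": {"name": name,
                               "description": description}}

def icmp_rule_body(security_group_id, remote_cidr):
    return {"security_group_rule": {
        "security_group_id": security_group_id,
        "direction": "ingress",
        "protocol": "icmp",
        "remote_ip_prefix": remote_cidr,
    }}

# With an authenticated client (as in the snippet above), you would call:
#   sg = neutron.create_security_group(security_group_body("web", "web tier"))
#   neutron.create_security_group_rule(
#       icmp_rule_body(sg["security_group"]["id"], "10.0.0.0/24"))
```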

Honestly it wasn't easy to remember all this stuff, since
it had been a couple months since I had worked with
python-neutronclient, and it had been from inside the library itself.

This was my first experience using it "on the outside" and it was pretty
tough - so I'm going to try and look into how we can improve the
docstrings for the client object, to make it a bit easier to figure out.

Thoughts?

-- 
Sean M. Collins

--

Message: 4
Date: Sat, 8 Feb 2014 14:22:24 -0800
From: Georgy Okrokvertskhov 
To: "OpenStack Development Mailing List (not for usage questions)"

Subject: Re: [openstack-dev] [heat] non-trivial example - IBM
Connections [and Murano]
Message-ID:

Content-Type: text/plain; charset="iso-8859-1"

Hi Mike,

Thank you for the clarification. I like your approach with Ruby and I think
this is the right way to solve tasks like DSL creation. In Murano we use
YAML and Python simply to avoid introducing a whole new language like Ruby
to OpenStack.

As for software configurations in Heat, we are eager to have them available
for use. We use Heat in Murano and we want to pass as much work as possible
to the Heat engine. Murano itself is intended to be an Application Catalog for
managing available application packages, and it focuses on UI and user
experience rather than on deployment details. We still use our DSL for several
things: to have something working while waiting for the Heat implementations,
and to have an imperative workflow engine, which is useful when you need to
orchestrate complex workflows. The last part is very powerful when you need
explicit control of the deployment sequence, with conditional branches
orchestrated among several different instances. When Mistral is available, we
plan to use its workflow engine for task orchestration.

Again, thank you for sharing the work you are doing at IBM. This is very
good feedback for the OpenStack community and helps us understand how
OpenStack components are used in enterprise use cases.

Thanks
Georgy


On Sat, Feb 8, 2014 at 10:52 AM, Mike Spreitzer  wrote:

> > From: Georgy Okrokvertskhov 
> > ...
> > Thank you for sharing this. It looks pretty impressive. Could you,
> > please some details