Re: [openstack-dev] [omni] Has the Platform9's hybrid multicloud project been abandoned?

2018-11-20 Thread Steven Hardy
On Tue, Nov 20, 2018 at 6:13 AM Blake Covarrubias  wrote:
>
> Hi Andrea,
>
> Omni has not been abandoned by Platform9. We're still developing Omni 
> internally, and are working to open source additional code as well as improve 
> docs so that others may more easily test & contribute.

It sounds like you're approaching this using an internal design
process, so you may like to check:

https://governance.openstack.org/tc/reference/opens.html

The most recent commit was over a year ago, so it's understandably
confusing to have an apparently unmaintained project, hosted under the
openstack github org, which doesn't actually follow our community
design/development process?

> With that said, I too would be interested in hearing how others are enabling, 
> or looking to enable, hybrid cloud use cases with OpenStack. I'm not aware of 
> any other projects with similar goals as Omni; however, it's possible I just 
> haven't been looking in the right places.

Having design discussions in the open, and some way for others in the
community to understand the goals and roadmap of the project is really
the first step in engaging folks for this sort of discussion IME.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-19 Thread Steven Hardy
On Thu, Nov 15, 2018 at 3:54 PM Sagi Shnaidman  wrote:
>
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique 
> is actively involved in improvements and development of TripleO and TripleO 
> CI. He also helps in other projects including but not limited to 
> Infrastructure.
> He shows a very good understanding of how TripleO and CI work, and I'd like 
> to suggest him as a core reviewer of TripleO for CI-related code.
>
> Please vote!
> My +1 is here :)

+1!



Re: [openstack-dev] [tripleo] Proposing Bob Fournier as core reviewer

2018-10-22 Thread Steven Hardy
On Fri, Oct 19, 2018 at 1:24 PM Juan Antonio Osorio Robles
 wrote:
>
> Hello!
>
>
> I would like to propose Bob Fournier (bfournie) as a core reviewer in
> TripleO. His patches and reviews have spanned quite a wide range in our
> project, his reviews show great insight and quality, and I think he would
> be a great addition to the core team.
>
> What do you folks think?
+1!



Re: [openstack-dev] [TripleO] Removing global bootstrap_nodeid?

2018-09-25 Thread Steven Hardy
On Tue, Sep 25, 2018 at 2:06 PM Jiří Stránský  wrote:
>
> Hi Steve,

Thanks for the reply - summary of our follow-up discussion on IRC below:

> On 25/09/2018 10:51, Steven Hardy wrote:
> > Hi all,
> >
> > After some discussions with bandini at the PTG, I've been taking a
> > look at this bug and how to solve it:
> >
> > https://bugs.launchpad.net/tripleo/+bug/1792613
> > (Also more information in downstream bz1626140)
> >
> > The problem is that we always run container bootstrap tasks (as well
> > as a bunch of update/upgrade tasks) on the bootstrap_nodeid, which by
> > default is always the overcloud-controller-0 node (regardless of which
> > services are running there).
> >
> > This breaks a pattern we established a while ago for Composable HA,
> > where we work out the bootstrap node by
> > $service_short_bootstrap_hostname, which means we always run on the
> > first node that has the service enabled (even if it spans more than
> > one Role).
> >
> > This presents two problems:
> >
> > 1. service_config_settings only puts the per-service hieradata on
> > nodes where a service is enabled, hence data needed for the
> > bootstrapping (e.g keystone users etc) can be missing if, say,
> > keystone is running on some role that's not Controller (this, I think
> > is the root-cause of the bug/bz linked above)
> >
> > 2. Even when we by luck have the data needed to complete the bootstrap
> > tasks, we'll end up pulling service containers to nodes where the
> > service isn't running, potentially wasting both time and space.
> >
> > I've been looking at solutions, and it seems like we either have to
> > generate per-service bootstrap_nodeid's (I have a patch to do this
> > https://review.openstack.org/605010), or perhaps we could just remove
> > all the bootstrap node id's, and switch to using hostnames instead?
> > Seems like that could be simpler, but wanted to check if there's
> > anything I'm missing?
>
> I think we should recheck the initial assumptions, because based on my
> testing:
>
> * the bootstrap_nodeid is in fact a hostname already, despite its
> deceptive name,
>
> * it's not global, it is per-role.
>
>  From my env:
>
> [root@overcloud-controller-2 ~]# hiera -c /etc/puppet/hiera.yaml
> bootstrap_nodeid
> overcloud-controller-0
>
> [root@overcloud-novacompute-1 ~]# hiera -c /etc/puppet/hiera.yaml
> bootstrap_nodeid
> overcloud-novacompute-0
>
> This makes me think the problems (1) and (2) as stated above shouldn't
> be happening. The containers or tasks present in service definition
> should be executed on all nodes where a service is present, and if they
> additionally filter for bootstrap_nodeid, it would only pick one node
> per role. So, the service *should* be deployed on the selected bootstrap
> node, which means the service-specific hiera should be present there and
> needless container downloading should not be happening, AFAICT.

Ah, I'd missed that because we have another different definition of
the bootstrap node ID here:

https://github.com/openstack/tripleo-heat-templates/blob/master/common/deploy-steps.j2#L283

That one is global, because it only considers primary_role_name, which
I think explains the problem described in the LP/BZ.

> However, thinking about it this way, we probably have a different
> problem still:
>
> (3) The actions which use bootstrap_nodeid check are not guaranteed to
> execute once per service. In case the service is present on multiple
> roles, the bootstrap_nodeid check succeeds once per such role.
>
> Using per-service bootstrap host instead of per-role bootstrap host
> sounds like going the right way then.

Yeah it seems like both definitions of the bootstrap node described
above are wrong in different ways ;)

> However, none of the above provides a solid explanation of what's really
> happening in the LP/BZ mentioned above. Hopefully this info is at least
> a piece of the puzzle.

Yup, thanks for working through it - as mentioned above I think the
problem is the docker-puppet.py conditional that runs the bootstrap
tasks uses the deploy-steps.j2 global bootstrap ID, so it can run on
potentially the wrong role.

Unless there's other ideas, I think this will be a multi-step fix:

1. Replace all t-h-t references for bootstrap_nodeid with per-service
bootstrap node names (I pushed
https://review.openstack.org/#/c/605046/ which may make this easier to
do in the ansible tasks)
2. Ensure puppet-tripleo does the same
3. Rework docker-puppet.py so it can read all the per-service
bootstrap nodes (and filter to only run when appropriate), or perhaps
figure out a way to do the filtering in the ansible tasks before
running docker-puppet.py
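As a rough illustration of step 1, the idea would be to gate each service's bootstrap tasks on that service's own per-service hiera key (the `<service>_short_bootstrap_node_name` pattern shown by the hiera output elsewhere in this thread) rather than the global bootstrap_nodeid. A hypothetical ansible task sketch — the task name, command, and variable wiring are illustrative only:

```yaml
# Sketch: run a bootstrap action only on the node that hiera reports as
# the keystone service's bootstrap node, regardless of which role it's on.
# The command shown is illustrative, not the actual t-h-t task.
- name: Run keystone bootstrap only on the keystone bootstrap node
  command: docker exec keystone keystone-manage db_sync
  when: ansible_hostname == keystone_short_bootstrap_node_name
```

The same `when` condition would replace the current per-role bootstrap_nodeid comparison in the generated update/upgrade tasks.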

[openstack-dev] [TripleO] Removing global bootstrap_nodeid?

2018-09-25 Thread Steven Hardy
Hi all,

After some discussions with bandini at the PTG, I've been taking a
look at this bug and how to solve it:

https://bugs.launchpad.net/tripleo/+bug/1792613
(Also more information in downstream bz1626140)

The problem is that we always run container bootstrap tasks (as well
as a bunch of update/upgrade tasks) on the bootstrap_nodeid, which by
default is always the overcloud-controller-0 node (regardless of which
services are running there).

This breaks a pattern we established a while ago for Composable HA,
where we work out the bootstrap node by
$service_short_bootstrap_hostname, which means we always run on the
first node that has the service enabled (even if it spans more than
one Role).

This presents two problems:

1. service_config_settings only puts the per-service hieradata on
nodes where a service is enabled, hence data needed for the
bootstrapping (e.g keystone users etc) can be missing if, say,
keystone is running on some role that's not Controller (this, I think
is the root-cause of the bug/bz linked above)

2. Even when we by luck have the data needed to complete the bootstrap
tasks, we'll end up pulling service containers to nodes where the
service isn't running, potentially wasting both time and space.

I've been looking at solutions, and it seems like we either have to
generate per-service bootstrap_nodeid's (I have a patch to do this
https://review.openstack.org/605010), or perhaps we could just remove
all the bootstrap node id's, and switch to using hostnames instead?
Seems like that could be simpler, but wanted to check if there's
anything I'm missing?

[root@overcloud-controller-0 ~]# ansible -m setup localhost | grep hostname
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'
"ansible_hostname": "overcloud-controller-0",
"facter_hostname": "overcloud-controller-0",
[root@overcloud-controller-0 ~]# hiera -c /etc/puppet/hiera.yaml
xinetd_short_bootstrap_node_name
overcloud-controller-0
[root@overcloud-controller-0 ~]# hiera -c /etc/puppet/hiera.yaml
xinetd_bootstrap_nodeid
ede5f189-7149-4faf-a378-ac965a2a818c

This is the first part of the problem, when we agree the approach here
we can convert docker-puppet.py and all the *tasks to use the
per-service IDs/names instead of the global one to work properly with
composable roles/services.

Any thoughts on this appreciated before I go ahead and implement the fix.

Steve



Re: [openstack-dev] [Tripleo] Automating role generation

2018-09-04 Thread Steven Hardy
On Tue, Sep 4, 2018 at 9:48 AM, Jiří Stránský  wrote:
> On 4.9.2018 08:13, Janki Chhatbar wrote:
>>
>> Hi
>>
>> I am looking to automate role file generation in TripleO. The idea is
>> basically for an operator to create a simple yaml file (operator.yaml,
>> say)
>> listing services that are needed and then TripleO to generate
>> Controller.yaml enabling only those services that are mentioned.
>>
>> For example:
>> operator.yaml
>> services:
>>  Glance
>>  OpenDaylight
>>  Neutron ovs agent
>
>
> I'm not sure it's worth introducing a new file format as such, if the
> purpose is essentially to expand e.g. "Glance" into
> "OS::TripleO::Services::GlanceApi" and
> "OS::TripleO::Services::GlanceRegistry"? It would be another layer of
> indirection (additional mental work for the operator who wants to understand
> how things work), while the layer doesn't make too much difference in
> preparation of the role. At least that's my subjective view.
>
>>
>> Then TripleO should
>> 1. Fail because ODL and OVS agent are either-or services
>
>
> +1 i think having something like this would be useful.
>
>> 2. After operator.yaml is modified to remove Neutron ovs agent, it should
>> generate Controller.yaml with below content
>>
>> ServicesDefault:
>> - OS::TripleO::Services::GlanceApi
>> - OS::TripleO::Services::GlanceRegistry
>> - OS::TripleO::Services::OpenDaylightApi
>> - OS::TripleO::Services::OpenDaylightOvs
>>
>> Currently, operator has to manually edit the role file (specially when
>> deployed with ODL) and I have seen many instances of failing deployment
>> due
>> to variations of OVS, OVN and ODL services enabled when they are actually
>> exclusive.
>
>
> Having validations on the service list would be helpful IMO, e.g. "these
> services must not be in one deployment together", "these services must not
> be in one role together", "these services must be together", "we recommend
> this service to be in every role" (i'm thinking TripleOPackages, Ntp, ...)
> etc. But as mentioned above, i think it would be better if we worked
> directly with the "OS::TripleO::Services..." values rather than a new layer
> of proxy-values.
>
> Additional random related thoughts:
>
> * Operator should still be able to disobey what the validation suggests, if
> they decide so.
>
> * Would be nice to have the info about particular services (e.g what can't
> be together) specified declaratively somewhere (TripleO's favorite thing in
> the world -- YAML?).
>
> * We could start with just one type of validation, e.g. the mutual
> exclusivity rule for ODL vs. OVS, but would be nice to have the solution
> easily expandable for new rule types.

This is similar to how the UI uses the capabilities-map.yaml, so
perhaps we can use that as the place to describe service dependencies
and conflicts?

https://github.com/openstack/tripleo-heat-templates/blob/master/capabilities-map.yaml

Currently this isn't used at all for the CLI, but I can imagine some
kind of wizard interface being useful, e.g you could say enable
"Glance" group and it'd automatically pull in all glance dependencies?

Another thing to mention is this doesn't necessarily have to generate
a new role (although it could), the *Services parameter for existing
roles can be overridden, so it might be simpler to generate an
environment file instead.
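A minimal sketch of that simpler approach — an environment file overriding the existing Controller role's service list with just the services from the example above (ControllerServices is the real parameter; the exact service list an operator wants would obviously vary):

```yaml
# Sketch: restrict the existing Controller role to a chosen set of
# services by overriding its *Services parameter, rather than
# generating a whole new role file.
parameter_defaults:
  ControllerServices:
    - OS::TripleO::Services::GlanceApi
    - OS::TripleO::Services::GlanceRegistry
    - OS::TripleO::Services::OpenDaylightApi
    - OS::TripleO::Services::OpenDaylightOvs
```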

Steve



Re: [openstack-dev] [tripleo] quickstart for humans

2018-08-31 Thread Steven Hardy
On Thu, Aug 30, 2018 at 3:28 PM, Honza Pokorny  wrote:
> Hello!
>
> Over the last few months, it seems that tripleo-quickstart has evolved
> into a CI tool.  It's primarily used by computers, and not humans.
> tripleo-quickstart is a helpful set of ansible playbooks, and a
> collection of feature sets.  However, it's become less useful for
> setting up development environments by humans.  For example, devmode.sh
> was recently deprecated without a user-friendly replacement. Moreover,
> during some informal irc conversations in #oooq, some developers even
> mentioned the plan to merge tripleo-quickstart and tripleo-ci.

I was recently directed to the reproducer-quickstart.sh script that's
written in the logs directory for all oooq CI jobs - does that help as
a replacement for the previous devmode interface?

I'm not that familiar with it myself, but it seems to target many of
the use-cases you mention, e.g. a uniform reproducer for issues and a
potentially quicker way to replicate CI results?

Steve



Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls

2018-08-16 Thread Steven Hardy
On Wed, Aug 15, 2018 at 10:48 PM, Jay Pipes  wrote:
> On 08/15/2018 04:01 PM, Emilien Macchi wrote:
>>
>> On Wed, Aug 15, 2018 at 5:31 PM Emilien Macchi wrote:
>> More seriously here: there is an ongoing effort to converge the
>> tools around containerization within Red Hat, and we, TripleO are
>> interested to continue the containerization of our services (which
>> was initially done with Docker & Docker-Distribution).
>> We're looking at how these containers could be managed by k8s one
>> day but way before that we plan to swap out Docker and join CRI-O
>> efforts, which seem to be using Podman + Buildah (among other things).
>>
>> I guess my wording wasn't the best but Alex explained way better here:
>>
>> http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-08-15.log.html#t2018-08-15T17:56:52
>>
>> If I may have a chance to rephrase, I guess our current intention is to
>> continue our containerization and investigate how we can improve our tooling
>> to better orchestrate the containers.
>> We have a nice interface (openstack/paunch) that allows us to run multiple
>> container backends, and we're currently looking outside of Docker to see how
>> we could solve our current challenges with the new tools.
>> We're looking at CRI-O because it happens to be a project with a great
>> community, focusing on some problems that we, TripleO have been facing since
>> we containerized our services.
>>
>> We're doing all of this in the open, so feel free to ask any question.
>
>
> I appreciate your response, Emilien, thank you. Alex' responses to Jeremy on
> the #openstack-tc channel were informative, thank you Alex.
>
> For now, it *seems* to me that all of the chosen tooling is very Red Hat
> centric. Which makes sense to me, considering Triple-O is a Red Hat product.

Just as a point of clarification - TripleO is an OpenStack project,
and yes there is a downstream product derived from it, but we could
e.g support multiple container backends in TripleO if there was
community interest in supporting that.

Also I think Alex already explained fairly clearly in the IRC log
linked above that this is initially about proving our existing
abstractions work to enable alternate container backends.

Thanks,

Steve



Re: [openstack-dev] [tripleo] Mistral workflow cannot establish connection

2018-07-16 Thread Steven Hardy
On Sun, Jul 15, 2018 at 7:50 PM, Samuel Monderer
 wrote:
>
> Hi Remo,
>
> Attached are templates I used for the deployment. They are based on a 
> deployment we did with OSP11.
> I made the changes for it to work with OSP13.
>
> I do think it's the roles_data.yaml file that is causing the error because if 
> remove the " -r $TEMPLATES_DIR/roles_data.yaml" from the deployment script 
> the deployment passes the point it was failing before but fails much later 
> because of the missing definition of the role.

I can't see a problem with the roles_data.yaml you provided, it seems
to render ok using tripleo-heat-templates/tools/process-templates.py -
are you sure the error isn't related to uploading the roles_data file
to the swift container?

I'd check basic CLI access to swift as a sanity check, e.g something like:

openstack container list

and writing the roles data e.g:

openstack object create overcloud roles_data.yaml

If that works OK then it may be an haproxy timeout - you are
specifying quite a lot of roles, so I wonder if something is timing
out during the plan creation phase - we had some similar issues in CI
ref https://bugs.launchpad.net/tripleo-quickstart/+bug/1638908 where
increasing the haproxy timeouts helped.

Steve



Re: [openstack-dev] [tripleo] Proposing Alan Bishop tripleo core on storage bits

2018-06-19 Thread Steven Hardy
On Wed, Jun 13, 2018 at 4:50 PM, Emilien Macchi  wrote:
> Alan Bishop has been highly involved in the Storage backends integration in
> TripleO and Puppet modules, always here to update with new features, fix
> (nasty and untestable third-party backends) bugs and manage all the
> backports for stable releases:
> https://review.openstack.org/#/q/owner:%22Alan+Bishop+%253Cabishop%2540redhat.com%253E%22
>
> He's also well knowledgeable of how TripleO works and how containers are
> integrated, I would like to propose him as core on TripleO projects for
> patches related to storage things (Cinder, Glance, Swift, Manila, and
> backends).
>
> Please vote -1/+1,

+1



Re: [openstack-dev] [TripleO] config-download/ansible next steps

2018-06-18 Thread Steven Hardy
On Mon, Jun 18, 2018 at 1:51 PM, Dmitry Tantsur  wrote:
> On 06/13/2018 03:17 PM, James Slagle wrote:
>>
>> On Wed, Jun 13, 2018 at 6:49 AM, Dmitry Tantsur 
>> wrote:
>>>
>>> Slightly hijacking the thread to provide a status update on one of the
>>> items
>>> :)
>>
>>
>> Thanks for jumping in.
>>
>>
>>> The immediate plan right now is to wait for metalsmith 0.4.0 to hit the
>>> repositories, then start experimenting. I need to find a way to
>>> 1. make creating nova instances no-op
>>> 2. collect the required information from the created stack (I need
>>> networks,
>>> ports, hostnames, initial SSH keys, capabilities, images)
>>> 3. update the config-download code to optionally include the role [2]
>>> I'm not entirely sure where to start, so any hints are welcome.
>>
>>
>> Here are a couple of possibilities.
>>
>> We could reuse the OS::TripleO::{{role.name}}Server mappings that we
>> already have in place for pre-provisioned nodes (deployed-server).
>> This could be mapped to a template that exposes some Ansible tasks as
>> outputs that drives metalsmith to do the deployment. When
>> config-download runs, it would execute these ansible tasks to
>> provision the nodes with Ironic. This has the advantage of maintaining
>> compatibility with our existing Heat parameter interfaces. It removes
>> Nova from the deployment so that from the undercloud perspective you'd
>> roughly have:
>>
>> Mistral -> Heat -> config-download -> Ironic (driven via
>> ansible/metalsmith)
>
>
> One thing that came to my mind while planning this work is that I'd prefer
> all nodes to be processed in one step. This will help avoiding some issues
> that we have now. For example, the following does not work reliably:
>
>  compute-0: just any profile:compute
>  compute-1: precise node=abcd
>  control-0: any node
>
> This has two issues that will pop up randomly:
> 1. compute-0 can pick node abcd designated for compute-1
> 2. control-0 can pick a compute node, failing either compute-0 or compute-1
>
> This problem is hard to fix if all deployment requests are processed
> separately, but is quite trivial if the decision is done based on the whole
> deployment plan. I'm going to work on a bulk scheduler like that in
> metalsmith.
>
>>
>> A further (or completely different) iteration might look like:
>>
>> Step 1: Mistral -> Ironic (driven via ansible/metalsmith)
>> Step 2: Heat -> config-download
>
>
> Step 1 will still use provided environment to figure out the count of nodes
> for each role, their images, capabilities and (optionally) precise node
> scheduling?
> I'm a bit worried about the last bit: IIRC we rely on Heat's %index%
> variable currently. We can, of course, ask people to replace it with
> something more explicit on upgrade.
>
>>
>> Step 2 would use the pre-provisioned node (deployed-server)  feature
>> already existing in TripleO and treat the just provisioned by Ironic
>> nodes, as pre-provisioned from the Heat stack perspective. Step 1 and
>> Step 2 would also probably be driven by a higher level Mistral
>> workflow. This has the advantage of minimal impact to
>> tripleo-heat-templates, and also removes Heat from the baremetal
>> provisioning step. However, we'd likely need some python compatibility
>> libraries that could translate Heat parameter values such as
>> HostnameMap to ansible vars for some basic backwards compatibility.
>
>
> Overall, I like this option better. It will allow an operator to isolate the
> bare metal provisioning step from everything else.
>
>>
>>>
>>> [1] https://github.com/openstack/metalsmith
>>> [2] https://metalsmith.readthedocs.io/en/latest/user/ansible.html
>>>

 Obviously we have things to consider here such as backwards
 compatibility
 and
 upgrades, but overall, I think this would be a great simplification to
 our
 overall deployment workflow.

>>>
>>> Yeah, this is tricky. Can we make Heat "forget" about Nova instances?
>>> Maybe
>>> by re-defining them to OS::Heat::None?
>>
>>
>> Not exactly, as Heat would delete the previous versions of the
>> resources. We'd need some special migrations, or could support the
>> existing method forever for upgrades, and only deprecate it for new
>> deployments.
>
>
> Do I get it right that if we redefine OS::TripleO::{{role.name}}Server to be
> OS::Heat::None, Heat will delete the old {{role.name}}Server instances on
> the next update? This is sad..
>
> I'd prefer not to keep Nova support forever, this is going to be hard to
> maintain and cover by the CI. Should we extend Heat to support "forgetting"
> resources? I think it may have a use case outside of TripleO.

This is already supported, it's just not the default:

https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#resources-section

you can use e.g. deletion_policy: retain to skip deletion of the
underlying Heat-managed resource.
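For example, the policy is set per-resource in a HOT template, so in principle the Server resources could be marked before being switched over (the resource name and property values below are illustrative, not taken from t-h-t):

```yaml
# Sketch: Heat abandons rather than deletes this resource when it is
# removed from the stack (valid policies: Delete, Retain, Snapshot).
resources:
  compute_server:
    type: OS::Nova::Server
    deletion_policy: Retain
    properties:
      image: overcloud-full
      flavor: compute
```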

Steve


Re: [openstack-dev] [tripleo][heat][jinja] resources.RedisVirtualIP: Property error: resources.VipPort.properties.network: Error validating value 'internal_api': Unable to find network with name or id

2018-06-18 Thread Steven Hardy
On Thu, Jun 14, 2018 at 1:48 PM, Mark Hamzy  wrote:
> I am trying to delete the Storage, StorageMgmt, Tenant, and Management
> networks and trying to deploy using TripleO.
>
> The following patch
> https://hamzy.fedorapeople.org/0001-RedisVipPort-error-internal_api.patch
> applied on top of /usr/share/openstack-tripleo-heat-templates from
> openstack-tripleo-heat-templates-8.0.2-14.el7ost.noarch
>
> yields the following error:
>
> (undercloud) [stack@oscloud5 ~]$ openstack overcloud deploy --templates -e
> ~/templates/node-info.yaml -e ~/templates/overcloud_images.yaml -e
> ~/templates/environments/network-environment.yaml -e
> ~/templates/environments/network-isolation.yaml -e
> ~/templates/environments/config-debug.yaml --ntp-server pool.ntp.org
> --control-scale 1 --compute-scale 1 --control-flavor control
> --compute-flavor compute 2>&1 | tee output.overcloud.deploy
> ...
> overcloud.RedisVirtualIP:
>   resource_type: OS::TripleO::Network::Ports::RedisVipPort
>   physical_resource_id:
>   status: CREATE_FAILED
>   status_reason: |
> resources.RedisVirtualIP: Property error:
> resources.VipPort.properties.network: Error validating value 'internal_api':
> Unable to find network with name or id 'internal_api'
> ...
>
> The following patch seems to fix it:
>
> 8<-8<-8<-8<-8<-
> diff --git a/environments/network-isolation.j2.yaml
> b/environments/network-isolation.j2.yaml
> index 3d4f59b..07cb748 100644
> --- a/environments/network-isolation.j2.yaml
> +++ b/environments/network-isolation.j2.yaml
> @@ -20,7 +20,13 @@ resource_registry:
>{%- for network in networks if network.vip and
> network.enabled|default(true) %}
>OS::TripleO::Network::Ports::{{network.name}}VipPort:
> ../network/ports/{{network.name_lower|default(network.name.lower())}}.yaml
>{%- endfor %}
> +{%- for role in roles -%}
> +  {%- if internal_api in role.networks|default([]) and
> internal_api.enabled|default(true) %}
>OS::TripleO::Network::Ports::RedisVipPort: ../network/ports/vip.yaml
> +  {%- else %}
> +  # Avoid weird jinja2 bugs that don't output a newline...
> +  {%- endif %}
> +{%- endfor -%}

Does this actually work, or just suppress the error because your
network_data.yaml has deleted the internal_api network?

From the diff it looks like you're also deleting the internal_api and
external network which won't work with the default ServiceNetMap:

https://github.com/openstack/tripleo-heat-templates/blob/master/network/service_net_map.j2.yaml#L27

Can you please provide the network_data.yaml to confirm this?

If you really want to delete the internal_api network then you'll need
to pass a ServiceNetMap to specify a new bind network for every
service (and any other deleted networks where used as a value in the
ServiceNetMapDefaults).
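A sketch of such an override — re-homing services from the deleted networks onto a network that still exists. The keys follow the `<Service>Network` pattern from ServiceNetMapDefaults, but the particular entries and the target network here are illustrative; an operator would need to cover every service currently mapped to a deleted network:

```yaml
# Sketch: move services off deleted networks (e.g. internal_api) onto a
# remaining network such as ctlplane. Keys shown are examples only; the
# full default mapping lives in service_net_map.j2.yaml.
parameter_defaults:
  ServiceNetMap:
    KeystoneApiNetwork: ctlplane
    GlanceApiNetwork: ctlplane
    RedisNetwork: ctlplane
```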

Thanks,

Steve



Re: [openstack-dev] DeployArtifacts considered...complicated?

2018-06-18 Thread Steven Hardy
On Sat, Jun 16, 2018 at 3:06 AM, Lars Kellogg-Stedman  wrote:
> I've been working on a series of patches to enable support for
> keystone federation in tripleo.  I've been making good use of the
> DeployArtifacts support for testing puppet modules...until today.
>
> I have some patches that teach puppet-keystone about multi-valued
> configuration options (like trusted_dashboard).  They replace the
> keystone_config provider (and corresponding type) with ones that work
> with the 'openstackconfig' provider (instead of ini_settings).  These
> work great when I test them in isolation, but whenever I ran them as
> part of an "overcloud deploy" I would get erroneous output.
>
> After digging through the various layers I found myself looking at
> docker-puppet.py [1], which ultimately ends up calling puppet like
> this:
>
>   puppet apply ... 
> --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules ...
>
> It's that --modulepath argument that's the culprit.  DeployArtifacts
> (when using the upload-puppet-modules script) works by replacing the
> symlinks in /etc/puppet/modules with the directories from your upload
> directory.  Even though the 'keystone' module in /etc/puppet/modules
> takes precedence when doing something like 'include ::keystone', *all
> the providers and types* in lib/puppet/* in
> /usr/share/openstack-puppet/modules will be activated.

Is this the same issue Carlos is trying to fix via
https://review.openstack.org/#/c/494517/ ?

I think there was some confusion on that patch around the underlying
problem, but I think your explanation here helps, e.g the problem is
you can conceivably end up with a mix of old/new modules?

Thanks,

Steve



Re: [openstack-dev] [tripleO] quickstart with containers deployment failed

2018-01-15 Thread Steven Hardy
On Sun, Jan 14, 2018 at 7:46 AM, Moshe Levi  wrote:
> Hi,
>
>
>
> We are trying to add container support ovs hw offload.
>
> We were able to do the deployment a few weeks ago, but now we are getting
> errors.
>
> Jan 14 07:14:32 localhost os-collect-config: "2018-01-14 07:14:28,568
> WARNING: 18476 -- retrying pulling image:
> 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server:current-tripleo",
>
> Jan 14 07:14:32 localhost os-collect-config: "2018-01-14 07:14:31,587
> WARNING: 18476 -- docker pull failed: Get
>
> https://192.168.24.1:8787/v1/_ping: dial tcp 192.168.24.1:8787: getsockopt:
> connection refused",

Sounds like the docker registry is not running on the undercloud?

How does your environment compare to these outputs:

(undercloud) [stack@undercloud tripleo-heat-templates]$ sudo systemctl
status docker-distribution
● docker-distribution.service - v2 Registry server for Docker
   Loaded: loaded
(/usr/lib/systemd/system/docker-distribution.service; enabled; vendor
preset: disabled)
   Active: active (running) since Fri 2018-01-12 11:39:25 UTC; 2 days ago
 Main PID: 5859 (registry)
   CGroup: /system.slice/docker-distribution.service
   └─5859 /usr/bin/registry serve
/etc/docker-distribution/registry/config.yml


(undercloud) [stack@undercloud tripleo-heat-templates]$ sudo netstat
-taupen | grep registry
tcp0  0 192.168.24.1:8787   0.0.0.0:*
LISTEN  0  47530  5859/registry

(undercloud) [stack@undercloud tripleo-heat-templates]$ curl
http://192.168.24.1:8787/v2/_catalog
{"repositories":["tripleomaster/centos-binary-aodh-api","tripleomaster/centos-binary-aodh-evaluator","tripleomaster/centos-binary-aodh-listener","tripleomaster/centos-binary-aodh-notifier","tripleomaster/centos-binary-ceilometer-central","tripleomaster/centos-binary-ceilometer-compute","tripleomaster/centos-binary-ceilometer-notification","tripleomaster/centos-binary-cinder-api","tripleomaster/centos-binary-cinder-scheduler","tripleomaster/centos-binary-cinder-volume","tripleomaster/centos-binary-cron","tripleomaster/centos-binary-glance-api","tripleomaster/centos-binary-gnocchi-api","tripleomaster/centos-binary-gnocchi-metricd","tripleomaster/centos-binary-gnocchi-statsd","tripleomaster/centos-binary-haproxy","tripleomaster/centos-binary-heat-api","tripleomaster/centos-binary-heat-api-cfn","tripleomaster/centos-binary-heat-engine","tripleomaster/centos-binary-horizon","tripleomaster/centos-binary-iscsid","tripleomaster/centos-binary-keystone","tripleomaster/centos-binary-mariadb","tripleomaster/centos-binary-memcached","tripleomaster/centos-binary-neutron-dhcp-agent","tripleomaster/centos-binary-neutron-l3-agent","tripleomaster/centos-binary-neutron-metadata-agent","tripleomaster/centos-binary-neutron-openvswitch-agent","tripleomaster/centos-binary-neutron-server","tripleomaster/centos-binary-nova-api","tripleomaster/centos-binary-nova-compute","tripleomaster/centos-binary-nova-conductor","tripleomaster/centos-binary-nova-consoleauth","tripleomaster/centos-binary-nova-libvirt","tripleomaster/centos-binary-nova-novncproxy","tripleomaster/centos-binary-nova-placement-api","tripleomaster/centos-binary-nova-scheduler","tripleomaster/centos-binary-panko-api","tripleomaster/centos-binary-rabbitmq","tripleomaster/centos-binary-redis","tripleomaster/centos-binary-swift-account","tripleomaster/centos-binary-swift-container","tripleomaster/centos-binary-swift-object","tripleomaster/centos-binary-swift-proxy-server"]}
(undercloud) [stack@undercloud tripleo-heat-templates]$


> see log below [1].
>
> We tried to run
>
> openstack overcloud container image tag discover   --image
> trunk.registry.rdoproject.org/master/centos-binary-base:current-tripleo-rdo
> --tag-from-label rdo_version
>
>
>
> and we are getting the errors below [2]

Ok, this looks like a different problem. Does the quickstart-generated
overcloud-prep-containers.sh script work?  What --release argument did
you give to quickstart?

This may be a docs issue or a bug in the discover CLI, as running with
the same image/tag also fails for me, but the
overcloud-prep-containers.sh script (which doesn't use the discover
CLI) works fine.

Steve



Re: [openstack-dev] [Tripleo] Some historic digging

2017-12-07 Thread Steven Hardy
On Thu, Dec 7, 2017 at 7:22 AM, Cédric Jeanneret
 wrote:
> Hello,
>
> While trying to add some unit tests for some resources in the
> puppet-tripleo repository, I stumbled on a not-so-nice case: the
> tripleo::firewall::service_rules definition.
>
> In there, a resource is dynamically taken from hiera, using the
> following key structure:
> tripleo..firewall_rules
>
> Although this seems to work properly in the tripleo deploy process, it
> just doesn't work at all for a unit test.
>
> After some more digging, it appears the "dot" (".") in YAML is a
> reserved char for hashes. In the puppet world, we usually use the double
> colon, aka "::", as a separator.
>
> Sooo… I'm wondering what's the history behind that non-puppet-standard
> choice of "dot" separator, and what would be required in order to move
> to a more standard syntax for that kind of resources.

dprince may be the best person to answer as IIRC he implemented this originally.

However I suspect the choice of "." was deliberate to differentiate
from :: which implies the hiera values are consumed by puppet class
interfaces, vs parsed inside the class.

Can you work around the yaml issue by using quotes to force
"tripleo.foo.firewall" to be parsed as a single string?

We don't seem to require that for any templates in
tripleo-heat-templates though; is that just because the yaml key gets
cast to a string by hiera?
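
For illustration, this is roughly how a quoted dotted key could look in
a hieradata fixture (the key name and rule values here are hypothetical,
not taken from a real deployment):

```yaml
# Hypothetical hieradata fixture: quoting the whole key makes it an
# ordinary YAML string, so tooling that treats "." as a hash separator
# (e.g. hiera key.subkey lookups) won't try to descend into nested hashes.
'tripleo.neutron_api.firewall_rules':
  '114 neutron api':
    dport:
      - 9696
      - 13696
```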

> I've checked the puppet-tripleo repository, and apparently only two
> resources are using that kind of syntax:
> - tripleo::firewall::service_rules (already widely present in the
> tripleo-heat-templates)
> - tripleo::haproxy::service_endpoints
> (at least, those are the only resources using a "$underscore_name" variable)
>
> Currently, if anyone wants to change something close to those two
> resources, it won't be tested against regressions, and it's a really bad
> situation. Especially since it might break central services (closing all
> firewall ports isn't good, for example, or dropping an haproxy endpoint
> by mistake)… Unit tests are needed, especially in such a huge piece of
> software ;).

Yeah I think we need to know more about the reasons this syntax won't
work for unit tests, we could conceivably change it, but as you say
it's widely used for firewall rules, and could potentially break any
out of tree service templates that exist, so we'd have to follow a
deprecation process to change the interface.

Thanks,

Steve



Re: [openstack-dev] [tripleo] Proposing Wesley Hayutin core on TripleO CI

2017-12-07 Thread Steven Hardy
+1!

On Wed, Dec 6, 2017 at 3:45 PM, Emilien Macchi  wrote:
> Team,
>
> Wes has been consistently and heavily involved in TripleO CI work.
> He has a very good understanding of how tripleo-quickstart and
> tripleo-quickstart-extras work, his number and quality of reviews are
> excellent so far. His experience with testing TripleO is more than
> valuable.
> Also, he's always here to help on TripleO CI issues or just
> improvements (he's the guy filling bugs on a Saturday evening).
> I think he would be a good addition to the TripleO CI core team
> (tripleo-ci, t-q and t-q-e repos for now).
> Anyway, thanks a lot Wes for your hard work on CI, I think it's time
> to move on and get you +2 ;-)
>
> As usual, it's open for voting, feel free to bring any feedback.
> Thanks everyone,
> --
> Emilien Macchi
>



Re: [openstack-dev] [TripleO] Proposing Ronelle Landy for Tripleo-Quickstart/Extras/CI core

2017-11-30 Thread Steven Hardy
+1

On Thu, Nov 30, 2017 at 3:56 PM, Attila Darazs  wrote:
> On 11/29/2017 08:34 PM, John Trowbridge wrote:
>>
>> I would like to propose Ronelle be given +2 for the above repos. She has
>> been a solid contributor to tripleo-quickstart and extras almost since the
>> beginning. She has solid review numbers, but more importantly has always
>> done quality reviews. She also has been working in the very intense rover
>> role on the CI squad in the past CI sprint, and has done very well in that
>> role.
>
>
> +1, yep!
>
>



Re: [openstack-dev] [tripleo] Updates on the TripleO on Kubernetes work

2017-11-17 Thread Steven Hardy
On Thu, Nov 16, 2017 at 4:56 PM, James Slagle  wrote:
> On Thu, Nov 16, 2017 at 8:44 AM, Flavio Percoco  wrote:
>> Integration with TripleO Heat Templates
>> ===
>>
>> This work is on-going and you should eventually see some patches popping-up
>> on
>> the reviews list. One of the goals, besides consuming these ansible roles
>> from
>> t-h-t, is to be able to create a PoC for upgrades and have an end-to-end
>> test/demo of this work.
>>
>> As we progress, we are trying to nail down an end-to-end deployment before
>> creating roles for all the services that are currently supported by TripleO.
>> We
>> will be adding projects as needed with a focus on the end-to-end goal.
>
> When we consume these ansible-role-k8s-* roles from t-h-t, I think
> that should be a stepping stone towards migrating away from having to
> use Heat to deploy and configure those services. We know that these
> new ansible roles will be deployable standalone, and the interface to
> do that should be typical ansible best practices (role defaults, vars,
> etc).
>
> We can offer a mechanism such that one can migrate from a
> tripleo-heat-templates/docker/services/database/mysql.yaml deployed
> mariadb to one deployed via
> ansible-role-k8s-mariadb. The config-download mechanism could be
> updated to generate or pull from Heat the necessary ansible vars files
> for configuring the roles. We should make sure that the integration
> with tripleo-heat-templates results in the same inputs/outputs that
> someone would consume if using the roles standalone. Future iterations
> would then not have to require Heat for that service at all, unless
> the operator wanted to continue to configure the service via Heat
> parameters/environments.
>
> What I'm trying to propose is a path towards deprecating the Heat
> parameter/environment driven and hieradata driven approach to
> configuring the services. The ansible-role-k8s-* roles should offer a
> new interface, so I don't think we have to remain tied to Heat
> forever, so we should consider what we want the long term goal to be
> in an ideal world, and take some iterative steps to get there.

I agree this is a good time to discuss ways to rationalize the
toolchain, but I do suspect it may be premature to consider
deprecating puppet/hiera, as AFAIK this doesn't provide any drop-in
replacement for the config file generation?

I was thinking we'd probably maintain the current docker-puppet.py
model for this first pass, to reduce the risk of migrating containers
to k8s, and we could probably refactor things such that this config
generation via puppet+docker is orchestrated via the ansible roles and
kubernetes?

The current model is something like:

1. Run temporary docker container, run puppet, write config files to
host file system
2. Start service container, config files bind mounted into container
from host filesystem
3. Run temporary bootstrapping container (runs puppet, optional step)

(this is simplified for clarity as there are opportunities for some
other bootstrapping steps)

In the ansible/kubernetes model, it could work like:

1. Ansible role makes k8s API call creating pod with multiple containers
2. Pod starts temporary container that runs puppet, config files
written out to shared volume
3. Service container starts, config consumed from shared volume
4. Optionally run temporary bootstrapping container inside pod

This sort of pattern is documented here:

https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/

The main advantage is we don't have to reimplement config management
for every single service, but obviously we'd want this to be pluggable
in the ansible roles so other config management strategies/tools could
be used instead of our puppet model.
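
As a rough sketch of steps 2-3 above (illustrative only: the image and
volume names are made up, initContainers is used as the k8s-idiomatic
equivalent of the temporary puppet container, and a real implementation
would also need to handle things like hieradata injection):

```yaml
# Illustrative pod spec: an init container runs puppet to generate config
# into a shared emptyDir volume, then the service container consumes it.
apiVersion: v1
kind: Pod
metadata:
  name: keystone
spec:
  initContainers:
    - name: config-gen
      image: tripleo/centos-binary-keystone:latest   # hypothetical image
      command: ["puppet", "apply", "/etc/config.pp"] # hypothetical manifest
      volumeMounts:
        - name: config-data
          mountPath: /var/lib/config-data
  containers:
    - name: keystone
      image: tripleo/centos-binary-keystone:latest   # hypothetical image
      volumeMounts:
        - name: config-data
          mountPath: /etc/keystone
          readOnly: true
  volumes:
    - name: config-data
      emptyDir: {}
```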

> It's probably worthwhile as a thought experiment to update this
> diagram[0] as to how it might look at different future stages. The
> first stage might just be t-h-t driven ansible-role-k8s-* , followed
> by a migration to ansible-role-k8s-* as the primary interface, and
> then finally perhaps no Heat[1].

Agreed this is definitely a good time to discuss moving the service
configuration workflow to pure ansible, but as noted above I'm not
convinced we're yet ready to take puppet out of the mix, so it may be
safer to leave that (by now quite well proven in our heat+ansible
container architecture) pattern in place, at least initially?

Thanks!

Steve

> [0] https://slagle.fedorapeople.org/tripleo-ansible-arch.png
> [1] Except for perhaps deployment of baremetal resources, but even
> then I'm personally of the opinion that would be better serviced by
> Mistral->Ansible->Ironic directly.
>
> --
> -- James Slagle
> --
>

Re: [openstack-dev] [tripleo] Proposing John Fulton core on TripleO

2017-11-08 Thread Steven Hardy
On Wed, Nov 8, 2017 at 10:33 PM, Alex Schultz  wrote:
> On Wed, Nov 8, 2017 at 3:24 PM, Giulio Fidente  wrote:
>> Hi,
>>
>> I would like to propose John Fulton core on TripleO.
>>
>> I think John did an awesome work during the Pike cycle around the
>> integration of ceph-ansible as a replacement for puppet-ceph, for the
>> deployment of Ceph in containers.
>>
>> I think John has good understanding of many different parts of TripleO
>> given that the ceph-ansible integration has been a complicated effort
>> involving changes in heat/tht/mistral workflows/ci and last but not
>> least, docs and he is more recently getting busier with reviews outside
>> his main comfort zone.
>>
>> I am sure John would be a great addition to the team and I welcome him
>> first to tune into radioparadise with the rest of us when joining #tripleo
>>
>
> +1. Excellent work with the ceph-ansible items.

Agreed, +1!



Re: [openstack-dev] [TripleO] Next steps for pre-deployment workflows (e.g derive parameters)

2017-11-08 Thread Steven Hardy
On Wed, Nov 8, 2017 at 10:55 PM, James Slagle <james.sla...@gmail.com> wrote:
> On Wed, Nov 8, 2017 at 7:16 AM, James Slagle <james.sla...@gmail.com> wrote:
>> On Wed, Nov 8, 2017 at 12:09 AM, Steven Hardy <sha...@redhat.com> wrote:
>>> Hi all,
>>>
>>> Today I had a productive hallway discussion with jtomasek and
>>> stevebaker re $subject, so I wanted to elaborate here for the benefit
>>> of those folks not present.  Hopefully we can get feedback on the
>>> ideas and see if it makes sense to continue and work on some patches:
>>>
>>> The problem under discussion is how do we run pre-deployment workflows
>>> (such as those integrated recently to calculate derived parameters,
>>> and in future perhaps also those which download container images etc),
>>> and in particular how do we make these discoverable via the UI
>>> (including any input parameters).
>
> After chatting with jtomasek on irc, I wanted to clarify that the part
> of this proposal I'm hesitant about.
>
> Specifically, it's adding an interface for any service to specify a
> mistral workflow, that in theory could do anything, as part of a
> pre_deploy interface "contract". If we do offer such an interface, I
> think it ought to be driven via ansible tasks/plays that are available
> as Heat stack outputs to match the config-download pattern.

Thanks for the feedback, yes I agree if we can do this with pure
ansible that would be great, but we'll have to take a closer look at
the existing implementation, e.g as mentioned by Saravanan there is an
existing integration with mistral workflows, which we'll either have
to maintain or migrate away from.

> Perhaps for just deriving parameters, having a way to specify
> workflows for the UI is Ok. It's more of the generic interface I'm not
> so keen on. As it relates to your example of downloading container
> images, it seems we could have a generic ansible task to do that, that
> could then be executed with Mistral for API purposes instead of
> specifying the Mistral workflow directly in the templates/roles_data.

Yeah good point, and also the point about CI moving towards undercloud
deploy is a good one - if we can work out a way to do this via ansible
(even if that means ansible running a mistral workflow as a
transitional step?) that would certainly be easier.
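
For example, the transitional step might look something like this ansible
task (the workflow name, input key and CLI invocation are assumptions
here, not a tested interface):

```yaml
# Illustrative only: wrap the existing mistral workflow in an ansible task
# until a native ansible implementation of derive-params exists.
- name: Run derive params workflow via mistral (transitional step)
  command: >
    openstack workflow execution create
    tripleo.derive_params_formulas.v1.dpdk_derive_params
    '{"plan": "overcloud"}'
```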

Hopefully we can chat more about this on IRC next week and prototype
the ansible approach to see how it could work.

Thanks!

Steve



Re: [openstack-dev] [tripleo] Nominate akrivoka for tripleo-validations core

2017-11-07 Thread Steven Hardy
On Mon, Nov 6, 2017 at 2:32 PM, Honza Pokorny  wrote:
> Hello people,
>
> I would like to nominate Ana Krivokapić (akrivoka) for the core team for
> tripleo-validations.  She has really stepped up her game on that project
> in terms of helpful reviews, and great patches.
>
> With Ana's help as a core, we can get more done, and innovate faster.
>
> If there are no objections within a week, we'll proceed with adding Ana
> to the team.

+1!



[openstack-dev] [TripleO] Next steps for pre-deployment workflows (e.g derive parameters)

2017-11-07 Thread Steven Hardy
Hi all,

Today I had a productive hallway discussion with jtomasek and
stevebaker re $subject, so I wanted to elaborate here for the benefit
of those folks not present.  Hopefully we can get feedback on the
ideas and see if it makes sense to continue and work on some patches:

The problem under discussion is how do we run pre-deployment workflows
(such as those integrated recently to calculate derived parameters,
and in future perhaps also those which download container images etc),
and in particular how do we make these discoverable via the UI
(including any input parameters).

The idea we came up with has two parts:

1. Add a new optional section to roles_data for services that require
pre-deploy workflows

E.g something like this:

  pre_deploy_workflows:
    - derive_params:
        workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params
        inputs:
          ...

This would allow us to associate a specific mistral workflow with a
given service template, and also work around the fact that currently
mistral inputs don't have any schema (only key/value input) as we
could encode the required type and any constraints in the inputs block
(clearly this could be removed in future should typed parameters
become available in mistral).

2. Add a new workflow that calculates the enabled services and returns
all pre_deploy_workflows

This would take all enabled environments, then use heat to validate
the configuration and return the merged resource registry (which will
require https://review.openstack.org/#/c/509760/), then we would
iterate over all enabled services in the registry and extract a given
roles_data key (e.g pre_deploy_workflows)

The result of the workflow would be a list of all pre_deploy_workflows
for all enabled services, which the UI could then use to run the
workflows as part of the pre-deploy process.
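
To make the intent concrete, the merged result returned to the UI could
look something like this (the shape, service and input names below are
purely illustrative, not an agreed schema):

```yaml
# Hypothetical aggregated output of the proposed workflow
pre_deploy_workflows:
  - service: OS::TripleO::Services::NeutronOvsDpdkAgent   # hypothetical
    workflows:
      - derive_params:
          workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params
          inputs:
            - name: num_phy_cores_per_numa_node_for_pmd   # hypothetical
              type: number
              default: 1
```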

If this makes sense I can go ahead and push some patches so we can
iterate on the implementation?

Thanks,

Steve



Re: [openstack-dev] [TripleO] roles_data.yaml equivalent in containers

2017-10-25 Thread Steven Hardy
On Wed, Oct 25, 2017 at 6:41 AM, Abhishek Kane
 wrote:
>
> Hi,
>
>
>
> In THT I have an environment file and corresponding puppet service for 
> Veritas HyperScale.
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/veritas-hyperscale/veritas-hyperscale-config.yaml
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/veritas-hyperscale-controller.yaml
>
>
>
> This service needs a rabbitmq user; the hook for it is
> “veritas_hyperscale::hs_rabbitmq”:
>
> https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/rabbitmq.pp#L172
>
>
>
> In order to configure Veritas HyperScale, I add 
> “OS::TripleO::Services::VRTSHyperScale” to roles_data.yaml file and use 
> following command-
>
>
>
> # openstack overcloud deploy --templates -r /home/stack/roles_data.yaml -e 
> /usr/share/openstack-tripleo-heat-templates/environments/veritas-hyperscale/veritas-hyperscale-config.yaml
>  -e 
> /usr/share/openstack-tripleo-heat-templates/environments/veritas-hyperscale/cinder-veritas-hyperscale-config.yaml
>
>
>
> This command sets “veritas_hyperscale_controller_enabled” to true in 
> hieradata and all the hooks gets called.
>
>
>
> I am trying to containerize Veritas HyperScale services. I used following 
> config file in quickstart-
>
> http://paste.openstack.org/show/624438/
>
>
>
> It has the environment files-
>
>   -e 
> {{overcloud_templates_path}}/environments/veritas-hyperscale/cinder-veritas-hyperscale-config.yaml
>
>   -e 
> {{overcloud_templates_path}}/environments/veritas-hyperscale/veritas-hyperscale-config.yaml
>
>
>
> But this itself doesn’t set “veritas_hyperscale_controller_enabled” to true 
> in hieradata and veritas_hyperscale::hs_rabbitmq doesn’t get called.
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/roles_data.yaml#L56
>
>
>
>
>
> How do I add OS::TripleO::Services::VRTSHyperScale in case of containers?

The roles_data.yaml approach you used previously should still work in
the case of containers, but the service template referenced will be
different (the files linked above still refer to the puppet service
template)

e.g

https://github.com/openstack/tripleo-heat-templates/blob/master/environments/veritas-hyperscale/veritas-hyperscale-config.yaml#L18

defines:

  OS::TripleO::Services::VRTSHyperScale: ../../puppet/services/veritas-hyperscale-controller.yaml

Which overrides this default mapping to OS::Heat::None:

https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-resource-registry-puppet.j2.yaml#L297

For containerized services, there are different resource_registry
mappings that refer to the templates in
tripleo-heat-templates/docker/services. e.g like this:

https://github.com/openstack/tripleo-heat-templates/blob/master/environments/services-docker/sahara.yaml

I think you'll need to create similar new service templates under
docker/services, then create some new environment files which map to
the new implementation that defines the data needed to start the
containers.
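
A sketch of such an environment file, modeled on the sahara example
linked above (the docker/services template referenced here doesn't exist
yet and would need to be written first):

```yaml
# Hypothetical environments/services-docker/veritas-hyperscale.yaml
resource_registry:
  OS::TripleO::Services::VRTSHyperScale: ../../docker/services/veritas-hyperscale-controller.yaml
```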

You can get help with this in #tripleo on Freenode, and there are some
docs here:

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/README.rst
https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/index.html

There was also a deep-dive recorded which is linked from here:

https://etherpad.openstack.org/p/tripleo-deep-dive-topics

Hope that helps somewhat?

Thanks,

Steve



Re: [openstack-dev] [tc] [stable] [tripleo] [kolla] [ansible] [puppet] Proposing changes in stable policy for installers

2017-10-16 Thread Steven Hardy
On Mon, Oct 16, 2017 at 2:33 PM, Steven Dake (stdake)  wrote:
> Emilien,
>
> I generally thought the stable policy seemed reasonable enough for lifecycle 
> management tools.  I’m not sure what specific problems you had in TripleO 
> although I did read your review.  Kolla was just tagged with the stable 
> policy, and TMK, we haven’t run into trouble yet, although the Kolla project 
> is stable and has been following the stable policy for about 18 months.  If 
> the requirements are watered down, the tag could potentially be meaningless.  
> We haven’t experienced this specific tag enough to know if it needs some 
> refinement for the specific use case of lifecycle management tools.  That 
> said, the follows release policy was created to handle the special case of 
> lifecycle management tool’s upstream sources not being ready for lifecycle 
> management tools to release at one coordinated release time.

We initially felt the policy was reasonable too, but there are a
couple of specific recurring pain points:

1. Services land features which require installer/management tool
updates late in the cycle, or the work to update the configuration
tooling just doesn't happen fast enough during a given cycle.

2. Vendor integrations, similar to (1) but specific to enabling vendor
backends e.g for Neutron etc - the work to enable configuring specific
vendor plugins tends to lag the upstream releases (sometimes
significantly) because most vendors are focussed on consuming the
stable branch releases, not the development/master releases.

In an ideal world the answer would be for everyone working on these
integrations to land the installer (e.g puppet/TripleO/Kolla/...)
patches earlier, but even with the concessions around cycle-trailing
deadlines we're finding that there is ongoing pressure to backport
integrations which (according to stable-maint policy) are strictly
"features" but are actually more integration or enablement of features
which do exist in the version of OpenStack we're deploying.

Several releases ago (before we adopted stable: follows-policy) we
tried to solve this by allowing selected feature backports, but this
was insufficiently well defined (and thus abused) so we need some way
to enable vendor integrations and exposure of new features in the
underlying services, without allowing a backport-everything floodgate
to open ;)

I think one difference between TripleO/Puppet and Kolla here is AFAIK
Kolla has several ways to customize the configuration of deployed
services in a fairly unconstrained way, whereas the openstack puppet
modules and TripleO publish interfaces via a somewhat more static
module and "service plugin" model, which improves discoverability of
features e.g for the TripleO UI but causes a headache when you
discover support for a new vendor Neutron plugin is required well
after the upstream release deadline has passed.

Hope that helps clarify somewhat?

Steve



Re: [openstack-dev] [tripleo][quickstart][rdo] shipping python-virtualbmc in Newton to allow undercloud upgrades from Newton to Queens

2017-10-05 Thread Steven Hardy
On Wed, Oct 4, 2017 at 1:08 PM, Lee Yarwood  wrote:
> Hello all,
>
> I'm currently working to get the tripleo-spec for fast-forward upgrades
> out of WIP and merged ahead of the Queens M-1 milestone next week. One
> of the documented pre-requisite steps for fast-forward upgrades is for
> an operator to linearly upgrade the undercloud from Newton (N) to Queens
> (N+3):
>
> https://review.openstack.org/#/c/497257/
>
> This is not possible at present with tripleo-quickstart deployed virtual
> environments thanks to our use of the pxe_ssh Ironic driver in Newton
> that has now been removed in Pike:
>
> https://docs.openstack.org/releasenotes/ironic/pike.html#id14
>
> I briefly looked into migrating between pxe_ssh and the new default of
> vbmc during the Ocata to Pike undercloud upgrade but I'd much rather
> just deploy Newton using vbmc. AFAICT the only issue here is packaging
> with the python-virtualbmc package not present in the Newton repos.
>
> With that in mind I've submitted the following changes that remove the
> various conditionals in tripleo-quickstart that block the use of vbmc in
> Newton and verified that this works by using the Ocata python-virtualbmc
> package:
>
> https://review.openstack.org/#/q/topic:allow_vbmc_newton+(status:open+OR+status:merged)
>
> FWIW I can deploy successfully on Newton with these changes and then
> upgrade the undercloud to Pike just fine.
>
> Would anyone be able to confirm *if* we could ship python-virtualbmc in
> the Newton relevant repos?

This sounds reasonable to me, but note another option for testing
fast-forward overcloud upgrades would be to deploy a trunk/pike
undercloud, then use it to deploy a newton overcloud (use newton
overcloud-full image and tripleo-heat-templates).

We already do a mixed-version deploy like this in the upgrade CI jobs,
although those only deploy the N-1 release rather than N-3, but I think
in theory it should work the same.

Steve



Re: [openstack-dev] [tripleo] plans on testing minor updates?

2017-09-28 Thread Steven Hardy
On Thu, Sep 28, 2017 at 8:04 AM, Marios Andreou  wrote:
>
>
> On Thu, Sep 28, 2017 at 9:50 AM, mathieu bultel  wrote:
>>
>> Hi,
>>
>>
>> On 09/28/2017 05:05 AM, Emilien Macchi wrote:
>> > I was reviewing https://review.openstack.org/#/c/487496/ and
>> > https://review.openstack.org/#/c/487488/ when I realized that we still
>> > didn't have any test coverage for minor updates.
>> > We never had this coverage AFICT but this is not a reason to not push
>> > forward it.
>> Thank you for the review and the -2! :)
>> So I'm agree with you, we need CI coverage for that part, and I was
>> wondering how I can put quickly a test in CI for the minor update.
>> But before that, just few things to take in account regarding those
>> reviews:
>>
>
> agree on the need for the ci coverage, but disagree on blocking this. by the
> same logic we should not have landed anything minor update related during
> the previous cycle. This is the very last part for
> https://bugs.launchpad.net/tripleo/+bug/1715557 - wiring up the mechanism
> into client and what's more matbu has managed to do it 'properly' with a
> tripleo-common mistral action wired up to the tripleoclient cli.
>
> I don't think its right we don't have coverage but I also don't think its
> right to block these last patches,

Yeah I agree - FWIW we have discussed this before, and AIUI the plan was:

1 - Get multinode coverage of an HA deployment with more than one
controller (e.g the 3nodes job) but with containers enabled
2 - Implement a rolling minor update test based on that
multi-controller HA-with-containers test

AFAIK we're only starting to get containers+pacemaker CI scenarios
working with one controller, so it's not really reasonable to block
this, since that is a prerequisite to the multi-controller test, which
is a prerequisite to the rolling update test.

Personally I think we'd be best to aim directly for the rolling update
test in CI, as doing a single node minor update doesn't really test
the most important aspect (e.g zero downtime).

The other challenge here is the walltime relative to the CI timeout -
we've been running into that for the containers upgrade job, and I
think we need to figure out optimizations there which may also be
required for minor update testing (maybe we can work around that by
only updating a very small number of containers, but that will reduce
the test coverage considerably?)

I completely agree we need this coverage, and honestly we should have
had it a long time ago, but we need to make progress on this last
critical blocker for pike, while continuing to make progress on the CI
coverage (which should certainly be a top priority for the Lifecycle
squad, as soon as we have this completely new-for-pike minor updates
workflow fully implemented and debugged).

Thanks,

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] CLI extensions for the undercloud

2017-09-15 Thread Steven Hardy
On Thu, Sep 14, 2017 at 8:02 PM, Ricardo Noriega De Soto
 wrote:
> Hello guys,
>
> After integrating BGPVPN and L2GW Neutron drivers in TripleO, I've realized
> I always have to jump to the overcloud controller (and copy-pasting the
> overcloudrc file) in order to use the CLI extensions such "neutron
> l2-gateway list".

That option seems to be available on my undercloud:

(undercloud) [stack@undercloud tmp]$ neutron help | grep l2
neutron CLI is deprecated and will be removed in the future. Use
openstack CLI instead.
  l2-gateway-connection-create   [l2_gateway_connection] Create
l2gateway-connection information.
  l2-gateway-connection-delete   [l2_gateway_connection] Delete a
given l2gateway-connection.
  l2-gateway-connection-list [l2_gateway_connection] List
l2gateway-connections.
  l2-gateway-connection-show [l2_gateway_connection] Show
information of a given l2gateway-connection.
  l2-gateway-create  [l2_gateway] Create l2gateway information.
  l2-gateway-delete  [l2_gateway] Delete a given l2gateway.
  l2-gateway-list[l2_gateway] List l2gateway that
belongs to a given tenant.
  l2-gateway-show[l2_gateway] Show information of
a given l2gateway.
  l2-gateway-update  [l2_gateway] Update a given l2gateway.

So is it not just a case of sourcing the overcloudrc on the undercloud
(or any other node with the correct client packages installed and
access to the overcloud endpoints) ?
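To be clear, the normal workflow is simply to source the overcloud credentials and run the client from the undercloud (or any client host) - no SSH to the controllers is needed. A hypothetical session (paths are illustrative):

```shell
# On the undercloud, or any host with the client packages installed
# and network access to the overcloud API endpoints:
source ~/overcloudrc        # overcloud credentials, not stackrc
neutron l2-gateway-list     # extension CLI now talks to the overcloud API
```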

Maybe I'm missing something but I'm not clear why you need to SSH to
the overcloud controller - that's never expected for any tenant
API/CLI operations, but clearly wherever they do connect from needs an
appropriate client - is your undercloud build missing whatever package
provides l2-gateway-list perhaps?

> I'd like to know what's the current strategy on other services and if it's
> possible to use those extensions from the undercloud. From the UX
> perspective it would be much appreciated! :-)

AFAIK the expectation is, much like any OpenStack cloud, that the user
interacts with the ReST APIs exposed by the overcloud endpoints, and
TripleO itself doesn't really care about how that happens, even though
we do install many of the clients on the undercloud (which isn't
intended for use of end users, only cloud administrators).

Hope that helps,

Steve



Re: [openstack-dev] [tripleo] pingtest vs tempest

2017-09-06 Thread Steven Hardy
On Thu, Apr 6, 2017 at 12:29 PM, Justin Kilpatrick  wrote:
> Maybe I'm getting a little off topic with this question, but why was
> Tempest removed last time?
>
> I'm not well versed in the history of this discussion, but from what I
> understand Tempest in the gate has
> been an off and on again thing for a while but I've never heard the
> story of why it got removed.

I think the main reason has always been CI job runtime - we have a
hard limit for job timeouts, and historically we've had better luck
with our minimal tripleo "pingtest" functional test, as it was just
much faster than running even the tempest smoke tests - that may no
longer be the case though; I've not used tempest myself for a while.

Steve



Re: [openstack-dev] [TripleO] Forming our plans around Ansible

2017-07-10 Thread Steven Hardy
On Sun, Jul 9, 2017 at 8:44 AM, Yolanda Robla Mota  wrote:
> What I'd like to dig into more is how Ansible and Heat can live together.
> And what features does Heat offer that are not covered by Ansible? Is
> there still the need to have Heat as the main engine, or could it be
> replaced by Ansible entirely in the future?

The main interface provided by Heat which AFAIK cannot currently be
replaced by Ansible is the parameters schema, where the template
parameters are exposed (that include description, type and constraint
data) in a format that is useful to e.g building the interfaces for
tripleo-ui

Ansible has a different approach to role/playbook parameters AFAICT,
which is more a global namespace with no type validation, no way to
include description data or tags with variable declarations, and no
way to specify constraints (other than perhaps having custom modules
or playbook patterns that perform the validations early in the
deployment).
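To illustrate the schema difference described above (a minimal sketch, not taken from tripleo-heat-templates - the parameter name is made up): a Heat template parameter carries a typed, self-describing schema that a UI can be generated from, while the equivalent Ansible role default is just a bare value:

```yaml
# Heat template: type, description and constraints are part of the
# parameter declaration itself
parameters:
  worker_count:
    type: number
    description: Number of API workers to spawn
    default: 4
    constraints:
      - range: {min: 1, max: 32}

# Ansible role default (roles/myrole/defaults/main.yml): a bare value
# in a flat namespace, with no type, description or constraint metadata
worker_count: 4
```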

This is kind of similar to how the global namespace for hiera works
with our puppet model, although that at least has the advantage of
namespacing foo::something::variable, which again doesn't have a
direct equivalent in the ansible role model AFAIK (happy to be
corrected here, I'm not an ansible expert :)

For these reasons (as mentioned in my reply to James), I think a first
step of a "split stack" model where heat deploys the nodes/networks
etc, then outputs data that can be consumed by Ansible is reasonable -
it leaves the operator interfaces alone for now, and gives us time to
think about the interface changes that may be needed long term, while
still giving most of the operator-debug and usability/scalabilty
benefits that I think folks pushing for Ansible are looking for.

Steve




> On Sat, Jul 8, 2017 at 12:20 AM, James Slagle 
> wrote:
>>
>> On Fri, Jul 7, 2017 at 5:31 PM, David Moreau Simard 
>> wrote:
>> > On Fri, Jul 7, 2017 at 1:50 PM, James Slagle 
>> > wrote:
>> >> (0) tripleo-quickstart which follows the common and well accepted
>> >> approach to bundling a set of Ansible playbooks/roles.
>> >
>> > I don't want to de-rail the thread but I really want to bring some
>> > attention to a pattern that tripleo-quickstart has been using across
>> > its playbooks and roles.
>> > I sincerely hope that we can find a better implementation should we
>> > start developing new things from scratch.
>>
>> Yes, just to clarify...by "well accepted" I just meant how the git
>> repo is organized and how you are expected to interface with those
>> playbooks and roles as opposed to what those playbooks/roles actually
>> do.
>>
>> > I'll sound like a broken record for those that have heard me mention
>> > this before but for those that haven't, here's a concrete example of
>> > how things are done today:
>> > (Sorry for the link overload, making sure the relevant information is
>> > available)
>> >
>> > For an example tripleo-quickstart job, here's the console [1] and it's
>> > corresponding ARA report [2]:
>> > - A bash script is created [3][4][5] from a jinja template [6]
>> > - A task executes the bash script [7][8][9]
>>
>> From my limited experience, I believe the intent was that the
>> playbooks should do what a user is expected to do so that it's as
>> close to reproducing the user interface of TripleO 1:1.
>>
>> For example, we document users running commands from a shell prompt.
>> Therefore, oooq ought to do the same thing as close as possible.
>> Obviously there will be gaps, just as there is with tripleo.sh, but I
>> feel that both tools (tripleo.sh/oooq) were trying to be faithful to
>> our published docs as mush as possible, and I think there's something
>> to be commended there.
>>
>> Not saying it's right or wrong, just that I believe that was the intent.
>>
>> An alternative would be custom ansible modules that exposed tasks for
>> interfacing with our API directly. That would also be valuable, as
>> that code path is mostly untested now outside of the UI and CLI.
>>
>> I think that tripleo-quickstart is a slightly different class of
>> "thing" from the other current Ansible uses I mentioned, in that it
>> sits at a layer above everything else. It's meant to automate TripleO
>> itself vs TripleO automating things. Regardless, we should certainly
>> consider how it fits into a larger plan.
>>
>> --
>> -- James Slagle
>> --
>>
>
>
>
>
> --
>
> Yolanda Robla Mota
>
> Principal Software Engineer, RHCE
>
> Red Hat
>
> C/Avellana 213
>
> Urb Portugal
>
> yrobl...@redhat.com  M: +34605641639
>
>

Re: [openstack-dev] [TripleO] Forming our plans around Ansible

2017-07-10 Thread Steven Hardy
On Fri, Jul 7, 2017 at 6:50 PM, James Slagle  wrote:
> I proposed a session for the PTG
> (https://etherpad.openstack.org/p/tripleo-ptg-queens) about forming a
> common plan and vision around Ansible in TripleO.
>
> I think it's important however that we kick this discussion off more
> broadly before the PTG, so that we can hopefully have some agreement
> for deeper discussions and prototyping when we actually meet in
> person.

Thanks for starting this James, it's a topic that I've also been
giving quite a lot of thought to lately (and as you've seen, have
pushed some related patches) so it's good to get some broader
discussions going.

> Right now, we have multiple uses of Ansible in TripleO:
>
> (0) tripleo-quickstart which follows the common and well accepted
> approach to bundling a set of Ansible playbooks/roles.

FWIW I agree with Giulio that quickstart is a separate case, and while
I also agree with David that there's plenty of scope for improvement
of the oooq user experience, I'm going to focus on the TripleO
deployment aspects below.

> (1) Mistral calling Ansible. This is the approach used by
> tripleo-validations where Mistral directly executes ansible playbooks
> using a dynamic inventory. The inventory is constructed from the
> server related stack outputs of the overcloud stack.
>
> (2) Ansible running playbooks against localhost triggered by the
> heat-config Ansible hook. This approach is used by
> tripleo-heat-templates for upgrade tasks and various tasks for
> deploying containers.
>
> (3) Mistral calling Heat calling Mistral calling Ansible. In this
> approach, we have Mistral resources in tripleo-heat-templates that are
> created as part of the overcloud stack and in turn, the created
> Mistral action executions run ansible. This has been prototyped
> using ceph-ansible to install Ceph as part of the overcloud
> deployment, and some of the work has already landed. There are also
> proposed WIP patches using this approach to install Kubernetes.
>
> There are also some ideas forming around pulling the Ansible playbooks
> and vars out of Heat so that they can be rerun (or run initially)
> independently from the Heat SoftwareDeployment delivery mechanism:
>
> (4) https://review.openstack.org/#/c/454816/
>
> (5) Another idea I'd like to prototype is a local tool that runs on
> the undercloud and pulls all of the SoftwareDeployment data out of
> Heat as the stack is being created and generates corresponding Ansible
> playbooks to apply those deployments. Once a given playbook is
> generated by the tool, the tool would signal back to Heat that the
> deployment is complete. Heat then creates the whole stack without
> actually applying a single deployment to an overcloud node. At that
> point, Ansible (or Mistral->Ansible for an API) would be used to do
> the actual deployment of the Overcloud with the Undercloud as the
> ansible runner.

Yeah so my idea with (4), and subsequent patches such as [1], is to
gradually move the deploy steps that configure services (on baremetal
and in containers) into a single ansible playbook.

There's currently still heat orchestration around the host preparation
(although this is performed via ansible) and iteration over each step
(where we re-apply the same deploy-steps playbook with an incrementing
step variable, but this could be replaced by e.g an ansible or mistral
loop). My idea was to enable end-to-end configuration of nodes via
ansible-playbook, without the need for any special tools (i.e. we
refactor t-h-t enough that no extra tooling is required, and we make
deploy-steps-playbook.yaml the only method of deployment, for both
baremetal and container cases).

[1] https://review.openstack.org/#/c/462211/
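As a rough sketch of the "incrementing step variable" pattern described above (illustrative only - the real tasks are generated from the heat stack outputs, and the file names here are made up; this uses Ansible 2.4+ `include_tasks`/`loop` syntax, older playbooks would use `include`/`with_items`):

```yaml
# deploy-steps-playbook.yaml (sketch): re-apply the same set of
# service configuration tasks with an incrementing step variable
- hosts: overcloud
  tasks:
    - include_tasks: common_deploy_steps_tasks.yaml
      vars:
        step: "{{ item }}"
      loop: [1, 2, 3, 4, 5]
```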

> All of this work has merit as we investigate longer term plans, and
> it's all at different stages with some being for dev/CI (0), some
> being used already in production (1 and 2), some just at the
> experimental stage (3 and 4), and some does not exist other than an
> idea (5).

I'd like to get the remaining work for (4) done so it's a supportable
option for minor updates. There's still a bit more t-h-t refactoring
required to enable it, but I think we're already pretty close to being
able to run end-to-end ansible for most of the PostDeploy steps
without any special tooling.

Note this related patch from Matthieu:

https://review.openstack.org/#/c/444224/

I think we'll need to go further here but it's a starting point which
shows how we could expose ansible tasks from the heat stack outputs as
a first step to enabling standalone configuration via ansible (or
mistral->ansible)

> My intent with this mail is to start a discussion around what we've
> learned from these approaches and start discussing a consolidated plan
> around Ansible. And I'm not saying that whatever we come up with
> should only use Ansible a certain way. Just that we ought to look at
> how users/operators interact with Ansible 

Re: [openstack-dev] [tripleo] proposing Alex Schultz tripleo-core in all projects

2017-07-10 Thread Steven Hardy
+1

On Fri, Jul 7, 2017 at 6:39 PM, Emilien Macchi  wrote:
> Alex has demonstrated high technical and community skills in TripleO -
> where he's already core on THT, instack-undercloud, and puppet-tripleo
> - but also very involved in other repos.
> I propose that we extend his core status to all TripleO projects and
> of course trust him (like we trust all core members) to review patches
> where we feel comfortable.
>
> He has shown a high interest in reviewing other TripleO projects and I
> think he would be ready for this change.
> As usual, this is an open proposal, any feedback is welcome.
>
> Thanks,
> --
> Emilien Macchi
>



Re: [openstack-dev] How to test private patches on overcloud ?

2017-06-29 Thread Steven Hardy
On Thu, Jun 29, 2017 at 4:21 PM, Dnyaneshwar Pawar
 wrote:
> Hi,
>
> while testing my patches i am getting errors with command 'openstack deploy
> --templates' any idea what is wrong here?
>
> http://paste.openstack.org/show/614074/

This shows rabbit is failing to start:

 Error: Systemd start for rabbitmq-server failed!

Your patches modify the rabbit configuration I think?

My next step to debug this would be to run rabbit manually and see why
it won't start, fix the config until it does, then align the puppet
configuration with the working settings.
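For example, a first pass at that debugging might look like this (standard systemd/rabbitmq commands; the log path is the usual default, adjust to your environment):

```shell
# On the failing controller -- see why the unit failed to start
systemctl status rabbitmq-server
journalctl -u rabbitmq-server --no-pager | tail -50
# RabbitMQ's own log is usually more specific about config errors
tail -50 /var/log/rabbitmq/rabbit@$(hostname -s).log
```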

Steve



Re: [openstack-dev] [heat] Deprecate/Remove deferred_auth_method=password config option

2017-06-29 Thread Steven Hardy
On Thu, Jun 29, 2017 at 4:13 AM, Kaz Shinohara  wrote:
> Hi,
>
>> No, I think we still need this, because it is disabled by default -
>> this option allows you to enable defeating token expiry via trusts,
> My understanding of the current implementation is:
> `deferred_auth_method=trust` triggers getting a trust_id and storing it
> in the db.
> `reauthentication_method=trust` triggers getting a trust-scoped token
> using that trust_id (allowing reauthentication).
> Those look somewhat duplicated, because the trust_id is required only
> when you want a trust-scoped token - so it would be ok for heat to get
> the trust_id only when it needs a trust-scoped token, wouldn't it?

No they are two different uses for the trust_id;

1. reauthentication_method unset (default) - we can get a trust scoped
token for deferred operations such as autoscaling, but we cannot
defeat the token expiry set by keystone by reauthenticating.

2. reauthentication_method=trusts - we can get a trust scoped token
for any operation (including those initiated by a user with a real not
trust scoped token), such that the token expiry set by keystone can be
defeated.

(2) is not a safe default, but it's useful for certain use-cases such
as TripleO where stack operations can take many hours.
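In heat.conf terms, the opt-in described above looks something like this (a sketch of the relevant options, not a complete config):

```ini
[DEFAULT]
# trusts are used for deferred operations such as autoscaling
deferred_auth_method = trusts
# opt-in: allow heat to defeat keystone token expiry by re-authenticating
# via the stored trust -- only enable where stack operations may run
# longer than the keystone token lifetime (e.g. TripleO)
reauthentication_method = trusts
```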

> In case of removing the password authentication, why don't we remove
> `deferred_auth_method` from heat.conf and unify
> 'reauthentication_method' to triggers getting trust_id and getting
> trust scoped token.

Yes as I said before, we could remove deferred_auth_method but we
cannot remove reauthentication_method.

HTH,

Steve



Re: [openstack-dev] [tripleo]: overcloud deploy --templates fails.

2017-06-29 Thread Steven Hardy
On Thu, Jun 29, 2017 at 7:07 AM, Dnyaneshwar Pawar
 wrote:
> Hi,
>
>
>
> I am getting following error. Do I need to specify anything else in the
> command?
>
> This command was working fine before commit
> 3e6147dee4497bbb607e6377bd588e32f62be2b7 .

There is a Depends-On: I961624723d127aebbaacd0c2b481211d83dde3f6 -
have you updated tripleo-common and re-run openstack undercloud
install?

There is some coupling between changes in tripleo-heat-templates and
tripleo-common mistral actions/workbooks, so if you don't update both
parts it's likely things won't work as expected.

Steve



Re: [openstack-dev] [TripleO] Pt. 2 of Passing along some field feedback

2017-06-28 Thread Steven Hardy
On Wed, Jun 28, 2017 at 8:06 PM, Ben Nemec  wrote:
> A few weeks later than I had planned, but here's the other half of the field
> feedback I mentioned in my previous email:
>
> * They very emphatically want in-place upgrades to work when moving from
> non-containerized to containerized.  I think this is already the plan, but I
> told them I'd make sure development was aware of the desire.

It is the plan, and already has basic CI coverage via
gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv

At this point we need more testing of production-like deployments but
in general this is expected to work.

> * There was also great interest in contributing back some of the custom
> templates that they've had to write to get advanced features working in the
> field.  Here again we recommended that they start with an RFE so things
> could be triaged appropriately.  I'm hoping we can find some developer time
> to help polish and shepherd these things through the review process.
>
> * Policy configuration was discussed, and I pointed them at some recent work
> we have done around that:
> https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/api_policies.html
> I'm not sure it fully addressed their issues, but I suggested they take a
> closer look and provide feedback on any ways it doesn't meet their needs.
>
> The specific use case they were looking at right now was adding a read-only
> role.  They did provide me with a repo containing their initial work, but
> unfortunately it's private to Red Hat so I can't share it here.
>
> * They wanted to be able to maintain separate role files instead of one
> monolithic roles_data.yaml.  Apparently they have a pre-deploy script now
> that essentially concatenates some individual files to get this
> functionality.  I think this has already been addressed by
> https://review.openstack.org/#/c/445687

Yes this is already possible, but only via the CLI - that feature
needs porting to tripleo-common so that it can be consumed by
tripleo-ui, which was discussed but I'm not sure on the latest status.

> * They've also been looking at ways to reorganize the templates in a more
> intuitive fashion.  At first glance the changes seemed reasonable, but they
> were still just defining the layout.  I don't know that they've actually
> tried to use the reorganized templates yet and given the number of relative
> paths in tht I suspect it may be a bigger headache than they expect, but I
> thought it was interesting.  There may at least be elements of this work
> that we can use to make the templates easier to understand for deployers.

More information on this would be helpful, e.g what specific issues
they are trying to solve and the layout they found to be better and
why?

Thanks,

Steve



Re: [openstack-dev] [heat][mistral][deployment] how to add deployment roles

2017-06-27 Thread Steven Hardy
Hi Dan,

On Tue, Jun 27, 2017 at 9:19 PM, Dan Trainor  wrote:
> Hi -
>
> I'm looking for the glue that populates the overcloud role list.
>
> I can get a list of roles via 'openstack overcloud role list', however I'm
> looking to create new roles to incorporate in to this list.
>
> I got as far as using 'mistral action-update' against what I believe to be
> the proper action (tripleo.role.list), but am not sure what to use as the
> source of what I would be updating, nor am I finding any information about
> how that runs and where it gets its data from.  I also had a nice exercise
> pruning the output of 'mistral action-*' commands, which was pretty
> insightful and helped me home in on what I was looking for, but I'm still
> uncertain.

I think perhaps the confusion is because this was implemented in
tripleoclient, and porting it to tripleo-common is not yet completed?
(Alex can confirm the status of this but it was planned I think).

Related ML discussion which includes links to the patches:

http://lists.openstack.org/pipermail/openstack-dev/2017-June/118157.html

http://lists.openstack.org/pipermail/openstack-dev/2017-June/118213.html

HTH,

Steve



Re: [openstack-dev] [heat] Deprecate/Remove deferred_auth_method=password config option

2017-06-21 Thread Steven Hardy
On Fri, Jun 16, 2017 at 10:09 AM, Kaz Shinohara  wrote:
> Hi Rabi,
>
>
> I still use `deferred_auth_method=password` instead of trusts, because we
> don't enable trusts on the Keystone side due to some internal reason.
> The issues you pointed out are correct (e.g. user_domain_id); we don't
> make much use of domains, and we added some patches to work around those
> issues.
> But I guess the majority of heat users have already moved to trusts, and
> it is obviously the better solution in terms of security and granular
> role control.
> As an edge case (perhaps), if a user wants to use password auth it would
> be quite tricky for them to introduce it, so I agree with your 2nd option.
>
> If we remove `deferred_auth_method=password` from heat.conf, should we
> keep `deferred_auth_method` itself, or replace it with a new config
> option just to enable/disable trusts? Do you have any idea on this?

I don't think it makes sense to have an enable/disable trusts config
option unless there is an alternative (e.g we've discussed oauth in
the past and in future there may be alternatives to trusts).

I guess if there was sufficient interest we could have some option
that blacklists all resources that require deferred authentication,
but I'm not sure folks are actually asking for that right now?

My preference is to deprecate deferred_auth_method, since right now
there's not really any alternative that works for us.

> Also I'm thinking that `reauthentication_method` might be
> changed/merged too?

No, I think we still need this, because it is disabled by default -
this option allows you to enable defeating token expiry via trusts,
which is something an operator must opt-in to IMO (we should not
enable this by default, as it's really only intended for certain edge
cases such as TripleO where there are often very long running stack
operations that may exceed the keystone token expiry).

Steve



Re: [openstack-dev] [heat] Making stack outputs static

2017-06-12 Thread Steven Hardy
On Mon, Jun 12, 2017 at 6:18 PM, Ben Nemec  wrote:
>
>
> On 06/09/2017 03:10 PM, Zane Bitter wrote:
>>
>> History lesson: a long, long time ago we made a very big mistake. We
>> treated stack outputs as things that would be resolved dynamically when
>> you requested them, instead of having values fixed at the time the
>> template was created or updated. This makes performance of reading
>> outputs slow, especially for e.g. large stacks, because it requires
>> making ReST calls, and it can result in inconsistencies between Heat's
>> internal model of the world and what it actually outputs.
>>
>> As unfortunate as this is, it's difficult to change the behaviour and be
>> certain that no existing users will get broken. For that reason, this
>> issue has never been addressed. Now is the time to address it.
>>
>> Here's the tracker bug: https://bugs.launchpad.net/heat/+bug/1660831
>>
>> It turns out that the correct fix is to store the attributes of a
>> resource in the DB - this accounts for the fact that outputs may contain
>> attributes of multiple resources, and that these resources might get
>> updated at different times. It also solves a related consistency issue,
>> which is that during a stack update a resource that is not updated may
>> nevertheless report new attribute values, and thus cause things
>> downstream to be updated, or to fail, unexpectedly (e.g.
>> https://bugzilla.redhat.com/show_bug.cgi?id=1430753#c13).
>>
>> The proposal[1] is to make this change in Pike for convergence stacks
>> only. This is to allow some warning for existing users who might be
>> relying on the current behaviour - at least if they control their own
>> cloud then they can opt to keep convergence disabled, and even once they
>> opt to enable it for new stacks they can keep using existing stacks in
>> legacy mode until they are ready to convert them to convergence or
>> replace them. In addition, it avoids the difficulty of trying to get
>> consistency out of the legacy path's crazy backup-stack shenanigans -
>> there's basically no way to get the outputs to behave in exactly the
>> same way in the legacy path as they will in convergence.
>>
>> This topic was raised at the Forum, and there was some feedback that:
>>
>> 1) There are users who require the old behaviour even after they move to
>> convergence.
>> 2) Specifically, there are users who don't have public API endpoints for
>> services other than Heat, and who rely on Heat proxying requests to
>> other services to get any information at all about their resources o.O
>> 3) There are users still using the legacy path (*cough*TripleO) that
>> want the performance benefits of quick output resolution.
>>
>> The suggestion is that instead of tying the change to the convergence
>> flag, we should make it configurable by the user on a per-stack basis.
>>
>> I am vehemently opposed to this suggestion.
>>
>> It's a total cop-out to make the user decide. The existing behaviour is
>> clearly buggy and inconsistent. Users are not, and should not have to
>> be, sufficiently steeped in the inner workings of Heat to be able to
>> decide whether and when to subject themselves to random inconsistencies
>> and hope for the best. If we make the change the default then we'll
>> still break people, and if we don't we'll still be saying "OMG, you
>> forgot to enable the --not-suck option??!" 10 years from now.
>>
>> Instead, this is what I'm proposing as the solution to the above feedback:
>>
>> 1) The 'show' attribute of each resource will be marked CACHE_NONE[2].
>> This ensures that the live data is always available via this attribute.
>> 2) When showing a resource's attributes via the API (as opposed to
>> referencing them from within a template), always return live values.[3]
>> Since we only store the attribute values that are actually referenced in
>> the template anyway, we more or less have to do this if we want the
>> attributes output through this API to be consistent with each other.
>> 3) Move to convergence. Seriously, the memory and database usage are
>> much improved, and there are even more memory improvements in the
>> pipeline,[4] and they might even get merged in Pike as long as we don't
>> have to stop and reimplement the attribute storage patches that they
>> depend on. If TripleO were to move to convergence in Queens, which I
>> believe is 100% feasible, then it would get the performance improvements
>> at least as soon as it would if we tried to implement attribute storage
>> in the legacy path.
>
>
> I think we wanted to move to convergence anyway so I don't see a problem
> with this.  I know there was some discussion about starting to test with
> convergence in tripleo-ci, does anyone know what, if anything, happened with
> that?

There's an experimental job that runs only on the heat repo
(gate-tripleo-ci-centos-7-ovb-nonha-convergence)

But yeah now seems like a good time to get something running more
regularly in tripleo-ci.
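For readers less familiar with the issue under discussion, a stack output is just a template section like the following (minimal sketch); the point of the proposed change is that the `get_attr` here would be resolved and stored at create/update time, rather than re-resolved live on every output read:

```yaml
outputs:
  server_ip:
    description: IP address of the deployed server
    value: {get_attr: [my_server, first_address]}
```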

Steve


Re: [openstack-dev] [requirements][tripleo][heat] Projects holding back requirements updates

2017-05-26 Thread Steven Hardy
On Thu, May 25, 2017 at 03:15:13PM -0500, Ben Nemec wrote:
> Tagging with tripleo and heat teams for the os-*-config projects.  I'm not
> sure which owns them now, but it sounds like we need a new release.

I think they're still owned by the TripleO team, but we missed them in the
pike-1 release, I pushed this patch aiming to resolve that:

https://review.openstack.org/#/c/468292/

Steve



Re: [openstack-dev] [all] Onboarding rooms postmortem, what did you do, what worked, lessons learned

2017-05-22 Thread Steven Hardy
On Fri, May 19, 2017 at 09:22:07AM -0400, Sean Dague wrote:
> This is a thread for anyone that participated in the onboarding rooms,
> on either the presenter or audience side. Because we all went into this
> creating things from whole cloth, I'm sure there are lots of lessons
> learned.
> 
> If you ran a room, please post the project, what you did in the room,
> what you think worked, what you would have done differently. If you
> attended a room you didn't run, please provide feedback about which one
> it was, and what you thought worked / didn't work from the other side of
> the table.

TripleO:
Attendees - nearly full room (~30 people?)

We took an informal approach to our session: we polled the room for
questions, gave an architectural overview and some code/template
walkthroughs on request, then had open questions/discussion for the
remainder of the session.

Overall it worked quite well, but next time I would like visibility of
some specific questions/topics ahead of time to enable better preparation
of demo/slide content, and also we should have prepared a demo environment
prior to the session to enable easier hands-on examples/demos.

Overall I thought the new track was a good idea, and the feedback I got
from those attending was positive.

The slides we used are linked from this blog post:

http://hardysteven.blogspot.co.uk/2017/05/openstack-summit-tripleo-project.html

Steve



Re: [openstack-dev] [tripleo] Issue while applying customs configuration to overcloud.

2017-05-16 Thread Steven Hardy
On Tue, May 16, 2017 at 04:33:33AM +, Dnyaneshwar Pawar wrote:
> Hi TripleO team,
> 
> I am trying to apply custom configuration to an existing overcloud. (using 
> openstack overcloud deploy command)
> Though there is no error, the configuration is not applied to overcloud.
> Am I missing anything here?
> http://paste.openstack.org/show/609619/

In your paste you have the resource_registry like this:

OS::TripleO::ControllerServer: /home/stack/test/heat3_ocata.yaml

The problem is OS::TripleO::ControllerServer isn't a resource type we use,
e.g. it's not a valid hook to enable additional node configuration.

Instead try something like this:

OS::TripleO::NodeExtraConfigPost: /home/stack/test/heat3_ocata.yaml

Which will run the script on all nodes, as documented here:

https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/extra_config.html

Out of interest, where did you find OS::TripleO::ControllerServer, do we
have a mistake in our docs somewhere?

Also, in your template the type: OS::Heat::SoftwareDeployment should be
either type: OS::Heat::SoftwareDeployments (as in the docs) or type:
OS::Heat::SoftwareDeploymentGroup (the newer name for SoftwareDeployments;
we should switch the docs to that...).
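For reference, here's a minimal sketch of what such an extra-config template could look like (hypothetical content - the script and resource names are made up for illustration, not taken from your paste):

```yaml
# Sketch of a NodeExtraConfigPost template, mapped in an environment file as:
#   resource_registry:
#     OS::TripleO::NodeExtraConfigPost: /home/stack/test/heat3_ocata.yaml
heat_template_version: ocata

parameters:
  # tripleo-heat-templates passes the map of deployed servers in here
  servers:
    type: json

resources:
  ExtraConfig:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/bash
        echo "extra config ran" > /tmp/extra_config_done

  ExtraDeployments:
    type: OS::Heat::SoftwareDeploymentGroup
    properties:
      servers: {get_param: servers}
      config: {get_resource: ExtraConfig}
```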

Hope that helps!

-- 
Steve Hardy
Red Hat Engineering, Cloud



Re: [openstack-dev] [tripleo] Validations before upgrades and updates

2017-05-15 Thread Steven Hardy
On Mon, May 08, 2017 at 02:45:08PM +0300, Marios Andreou wrote:
>Hi folks, after some discussion locally with colleagues about improving
>the upgrades experience, one of the items that came up was pre-upgrade and
>update validations. I took an AI to look at the current status of
>tripleo-validations [0] and posted a simple WIP [1] intended to be run
>before an undercloud update/upgrade and which just checks service status.
>It was pointed out by shardy that for such checks it is better to instead
>continue to use the per-service  manifests where possible like [2] for
>example where we check status before N..O major upgrade. There may still
>be some undercloud specific validations that we can land into the
>tripleo-validations repo (thinking about things like the neutron
>networks/ports, validating the current nova nodes state etc?).
>So do folks have any thoughts about this subject - for example the kinds
>of things we should be checking - Steve said he had some reviews in
>progress for collecting the overcloud ansible puppet/docker config into an
>ansible playbook that the operator can invoke for upgrade of the 'manual'
>nodes (for example compute in the N..O workflow) - the point being that we
>can add more per-service ansible validation tasks into the service
>manifests for execution when the play is run by the operator - but I'll
>let Steve point at and talk about those. 

Thanks for starting this thread Marios, sorry for the slow reply due to
Summit etc.

As we discussed, I think adding validations is great, but I'd prefer we
kept any overcloud validations specific to services in t-h-t instead of
trying to manage service specific things over multiple repos.

This would also help with the idea of per-step validations I think, where
e.g. you could have an "is the service active" test and run it after the step
where we expect the service to start. A blueprint was raised a while back
asking for exactly that:

https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation

One way we could achieve this is to add ansible tasks that perform some
validation after each step, where we combine the tasks for all services,
similar to how we already do upgrade_tasks and host_prep_tasks:

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/redis.yaml#L92

With the benefit of hindsight, using ansible tags for upgrade_tasks wasn't
the best approach, because you can't change the tags via SoftwareDeployment
(e.g. you need a SoftwareConfig per step). It's better if we either generate
the list of tasks by merging maps, e.g.

  validation_tasks:
step3:
  - sometask

Or via ansible conditionals where we pass a step value in to each run of
the tasks:

  validation_tasks:
- sometask
  when: step == 3

The latter approach is probably my preference, because it'll require less
complex merging in the heat layer.
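To illustrate the latter, the combined task list could then be driven once per step with something like this (a hypothetical sketch - the file and play names are invented, not the actual t-h-t implementation):

```yaml
# Run the merged per-service validation tasks after each deploy step,
# passing the step number in so "when: step == N" conditionals fire
# at the right point.
- hosts: overcloud
  tasks:
    - include: validation_tasks.yaml
      vars:
        step: "{{ item | int }}"
      with_sequence: start=1 end=5
```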

As you mentioned, I've been working on ways to make the deployment steps
more ansible driven, so having these tasks integrated with the t-h-t model
would be well aligned with that I think:

https://review.openstack.org/#/c/454816/

https://review.openstack.org/#/c/462211/

Happy to discuss further when you're ready to start integrating some
overcloud validations.

Thanks!

Steve



Re: [openstack-dev] [Heat] Heat template example repository

2017-05-15 Thread Steven Hardy
On Mon, May 15, 2017 at 04:46:28PM +0200, Lance Haig wrote:
> Hi Steve,
> 
> I am happy to assist in any way to be honest.
> 
> The backwards compatibility is not always correct as I have seen when
> developing our library of templates on Liberty and then trying to deploy it
> on Mitaka for example.

Yeah, I guess it's true that there are sometimes deprecated resource
interfaces that get removed on upgrade to a new OpenStack version, and that
is independent of the HOT version.

As we've proven, maintaining these templates has been a challenge given the
available resources, so I guess I'm still in favor of not duplicating a bunch
of templates, e.g. perhaps we could focus on a target of CI testing
templates on the current stable release as a first step?

> As you guys mentioned in our discussions the Networking example I quoted is
> not something you guys can deal with as the source project affects this.
> 
> Unless we can use this exercise to test these and fix them then I am
> happier.
> 
> My vision would be to have a set of templates and examples that are tested
> regularly against a running OS deployment so that we can make sure the
> combinations still run. I am sure we can agree on a way to do this with CICD
> so that we test the feature set.

Agreed, getting the approach to testing agreed seems like the first step -
FYI we do already have automated scenario tests in the main heat tree that
consume templates similar to many of the examples:

https://github.com/openstack/heat/tree/master/heat_integrationtests/scenario

So, in theory, getting a similar test running on heat_templates should be
fairly simple, but getting all the existing templates working is likely to
be a bigger challenge.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] Heat template example repository

2017-05-15 Thread Steven Hardy
On Mon, May 15, 2017 at 03:21:30AM -0400, Lance Haig wrote:
>Good to know that there is interest.

Thanks for starting this effort - I agree it would be great to see the
example templates we provide improved and over time become better
references for heat features (as well as being more well tested).

>I was thinking that we should perhaps create a directory for each
>openstack version.

I'm personally not keen on this - Heat should handle old HOT versions in a
backwards compatible way, and we can use the template version (which
supports using the release name in recent heat versions) to document the
required version, e.g. if demonstrating some new resource or function.

FWIW we did already try something similar in the early days of heat, where
we had duplicate wordpress examples for different releases (operating
systems not OpenStack versions but it's the same problem).  We found that
old versions quickly became unmaintained, and ultimately got broken anyway
due to changes unrelated to Heat or OpenStack versions.

>so we start say with a mitaka directory and then move the files there and
>test them all so that they work with Liberty.
>Then we can copy it over to Mitaka and do the same but add the extra
>functionality.

While doing some manual testing each release is better than nothing, honestly
I feel like CI testing some (or ideally all) examples is the only way to
ensure they're not broken.  Clearly that's going to be more work initially,
but it'd be worth considering I think.

To make this simple for template authors, we could perhaps just create the
template with the default parameters, and codify some special place to
define the expected output values (we could for example have a special
expected_output parameter which the CI test consumes and compares after the
stack create completes).
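Purely as an illustration of that idea (the expected_output convention doesn't exist yet, so everything here is hypothetical):

```yaml
heat_template_version: 2016-10-14

parameters:
  # Ignored by Heat itself - a CI job would read this parameter and diff it
  # against the actual stack outputs once the create completes.
  expected_output:
    type: json
    default:
      greeting: "Hello World"

resources:
  greeting_value:
    type: OS::Heat::Value
    properties:
      value: "Hello World"

outputs:
  greeting:
    value: {get_attr: [greeting_value, value]}
```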

>and then Newton etc...
>That way if someone is on a specific version they only have to go to a
>specific directory to get the examples they need.

As mentioned above, I think just using the template version should be
enough - we could even generate some docs using this to highlight example
templates that are specific to a release?

Steve



Re: [openstack-dev] [all] Project On-Boarding Info Collection

2017-05-10 Thread Steven Hardy
On Mon, May 08, 2017 at 06:57:38PM +, Kendall Nelson wrote:
>Hello!
> 
>If you are running a project onboarding session and have etherpads/slides/
>etc you are using to educate new contributors  please send them to me! I
>am collecting all the resources you are sharing into a single place for
>people that weren't able to attend sessions.

Thanks, this sounds like a good idea :)

Here are the slides for the TripleO onboarding session we had yesterday:

https://github.com/hardys/presentations/blob/master/TripleOProjectOnboarding.pdf

(there is also an odp version for anyone wishing to reuse/modify)

Thanks!

Steve Hardy



Re: [openstack-dev] [qa][heat][murano][daisycloud] Removing Heat support from Tempest

2017-05-04 Thread Steven Hardy
On Wed, May 03, 2017 at 11:56:56PM -0400, Matthew Treinish wrote:
> On Wed, May 03, 2017 at 11:51:13AM +, Andrea Frittoli wrote:
> > On Tue, May 2, 2017 at 5:33 PM Matthew Treinish wrote:
> > 
> > 
> > > On Tue, May 02, 2017 at 09:49:14AM +0530, Rabi Mishra wrote:
> > > > On Fri, Apr 28, 2017 at 2:17 PM, Andrea Frittoli <
> > > andrea.fritt...@gmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > On Fri, Apr 28, 2017 at 10:29 AM Rabi Mishra 
> > > wrote:
> > > > >
> > > > >> On Thu, Apr 27, 2017 at 3:55 PM, Andrea Frittoli <
> > > > >> andrea.fritt...@gmail.com> wrote:
> > > > >>
> > > > >>> Dear stackers,
> > > > >>>
> > > > >>> starting in the Liberty cycle Tempest has defined a set of projects
> > > > >>> which are in scope for direct
> > > > >>> testing in Tempest [0]. The current list includes keystone, nova,
> > > > >>> glance, swift, cinder and neutron.
> > > > >>> All other projects can use the same Tempest testing infrastructure
> > > (or
> > > > >>> parts of it) by taking advantage
> > > > >>> the Tempest plugin and stable interfaces.
> > > > >>>
> > > > >>> Tempest currently hosts a set of API tests as well as a service
> > > client
> > > > >>> for the Heat project.
> > > > >>> The Heat service client is used by the tests in Tempest, which run 
> > > > >>> in
> > > > >>> Heat gate as part of the grenade
> > > > >>> job, as well as in the Tempest gate (check pipeline) as part of the
> > > > >>> layer4 job.
> > > > >>> According to code search [3] the Heat service client is also used by
> > > > >>> Murano and Daisycore.
> > > > >>>
> > > > >>
> > > > >> For the heat grenade job, I've proposed two patches.
> > > > >>
> > > > >> 1. To run heat tree gabbi api tests as part of grenade 'post-upgrade'
> > > > >> phase
> > > > >>
> > > > >> https://review.openstack.org/#/c/460542/
> > > > >>
> > > > >> 2. To remove tempest tests from the grenade job
> > > > >>
> > > > >> https://review.openstack.org/#/c/460810/
> > > > >>
> > > > >>
> > > > >>
> > > > >>> I proposed a patch to Tempest to start the deprecation counter for
> > > Heat
> > > > >>> / orchestration related
> > > > >>> configuration items in Tempest [4], and I would like to make sure
> > > that
> > > > >>> all tests and the service client
> > > > >>> either find a new home outside of Tempest, or are removed, by the 
> > > > >>> end
> > > > >>> the Pike cycle at the latest.
> > > > >>>
> > > > >>> Heat has in-tree integration tests and Gabbi based API tests, but I
> > > > >>> don't know if those provide
> > > > >>> enough coverage to replace the tests on Tempest side.
> > > > >>>
> > > > >>>
> > > > >> Yes, the heat gabbi api tests do not yet have the same coverage as 
> > > > >> the
> > > > >> tempest tree api tests (lacks tests using nova, neutron and swift
> > > > >> resources),  but I think that should not stop us from *not* running
> > > the
> > > > >> tempest tests in the grenade job.
> > > > >>
> > > > >> I also don't know if the tempest tree heat tests are used by any 
> > > > >> other
> > > > >> upstream/downstream jobs. We could surely add more tests to bridge
> > > the gap.
> > > > >>
> > > > >> Also, It's possible to run the heat integration tests (we've enough
> > > > >> coverage there) with tempest plugin after doing some initial setup,
> > > as we
> > > > >> do in all our dsvm gate jobs.
> > > > >>
> > > > >> I would propose to move tests and the client to a Tempest plugin owned /
> > > > >>> maintained by
> > > > >>> the Heat team, so that the Heat team can have full flexibility in
> > > > >>> consolidating their integration
> > > > >>> tests. For Murano and Daisycloud - and any other team that may want
> > > to
> > > > >>> use the Heat service
> > > > >>> client in their tests, even if the client is removed from Tempest, 
> > > > >>> it
> > > > >>> would still be available via
> > > > >>> the Heat Tempest plugin. As long as the plugin implements the 
> > > > >>> service
> > > > >>> client interface,
> > > > >>> the Heat service client will register automatically in the service
> > > > >>> client manager and be available
> > > > >>> for use as today.
> > > > >>>
> > > > >>>
> > > > >> if I understand correctly, you're proposing moving the existing
> > > tempest
> > > > >> tests and service clients to a separate repo managed by heat team.
> > > Though
> > > > >> that would be collective decision, I'm not sure that's something I
> > > would
> > > > >> like to do. To start with we may look at adding some of the missing
> > > pieces
> > > > >> in heat tree itself.
> > > > >>
> > > > >
> > > > > I'm proposing to move tests and the service client outside of tempest
> > > to a
> > > > > new home.
> > > > >
> > > > > I also suggested that the new home could be a dedicate repo, since 
> > > > > that
> > > > > would allow you to maintain the
> > > > > current branchless nature of those tests. A more detailed discussion
> > > about
> > > > > the topic can be found
> > > > > in the 

Re: [openstack-dev] [heat][ironic][tripleo] ironic resources in heat

2017-04-27 Thread Steven Hardy
On Thu, Apr 27, 2017 at 09:39:51AM +0300, Pavlo Shchelokovskyy wrote:
>HI all,
>from some conversations I had during Pike PTG and recently in IRC I
>understood that the need for ironic resources in heat has arisen again. I
>remember back when it was proposed for the first time there was some
>opposition from ironic community (although I personally find it reasonable
>to have those) but AFAIU this is no longer the case.
>I would gladly revive old Steven Hardy patches [0] (have them starred on
>Gerrit since then :) ) and make it happen if there are no objections.
>I also see that the spec itself to this regard has been recently
>re-proposed [1], so if someone is already working on those, I'm
>volunteering to help with it with my both ironic and heat hats on :)

Feel free to revive my patches, or reuse any parts of them which may be
useful.

FWIW (as I think I mentioned in the spec review and some previous threads
on this topic), I reached the conclusion that the really interesting thing
about driving ironic via heat is not so much registering the inventory, but
driving the workflow around the various node states.

I personally think mistral is a better fit for that, so although I'm fine
with some Ironic heat resources landing, I think it'd be great if we could
define the workflow to e.g do introspection, or deploy an image to some
node via mistral, instead of hard-coded via a heat resource.

I started that here, but never had time to finish it, any help welcome! :)

https://review.openstack.org/#/c/313048/
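The shape of such a workflow would be roughly as below (a sketch only - the action names here are assumed from mistral's auto-generated OpenStack client actions, see the review above for the real thing):

```yaml
version: '2.0'

deploy_baremetal_node:
  description: Drive a single ironic node from available to active
  input:
    - node_uuid
  tasks:
    deploy:
      # Assumed action name, mirroring ironicclient's set_provision_state
      action: ironic.node_set_provision_state node_uuid=<% $.node_uuid %> state='active'
      on-success: wait_active
    wait_active:
      # Poll the node until the provision state settles
      action: ironic.node_get node_id=<% $.node_uuid %>
      retry:
        delay: 15
        count: 40
        continue-on: <% task().result.provision_state != 'active' %>
```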

Also, Giulio has started working on a Heat resource that can run mistral
workflows, which could be combined with a workflow similar to the above to
deploy nodes via heat.

https://review.openstack.org/#/c/420664/

Perhaps those are some other ideas we can consider while also looking at
any heat native plugins which may be useful.

Thanks!

Steve



Re: [openstack-dev] [TripleO][Heat] Conditionally passing properties in Heat

2017-04-20 Thread Steven Hardy
On Wed, Apr 19, 2017 at 02:51:28PM -0700, Dan Sneddon wrote:
> On 04/13/2017 12:01 AM, Rabi Mishra wrote:
> > On Thu, Apr 13, 2017 at 2:14 AM, Dan Sneddon wrote:
> > 
> > On 04/12/2017 01:22 PM, Thomas Herve wrote:
> > > On Wed, Apr 12, 2017 at 9:00 PM, Dan Sneddon wrote:
> > >> I'm implementing predictable control plane IPs for spine/leaf,
> > and I'm
> > >> running into a problem implementing this in the TripleO Heat
> > templates.
> > >>
> > >> I have a review in progress [1] that works, but fails on upgrade,
> > so I'm
> > >> looking for an alternative approach. I'm trying to influence the IP
> > >> address that is selected for overcloud nodes' Control Plane IP.
> > Here is
> > >> the current construct:
> > >>
> > >>   Controller:
> > >> type: OS::TripleO::Server
> > >> metadata:
> > >>   os-collect-config:
> > >> command: {get_param: ConfigCommand}
> > >> properties:
> > >>   image: {get_param: controllerImage}
> > >>   image_update_policy: {get_param: ImageUpdatePolicy}
> > >>   flavor: {get_param: OvercloudControlFlavor}
> > >>   key_name: {get_param: KeyName}
> > >>   networks:
> > >> - network: ctlplane  # <- Here's where the port is created
> > >>
> > >> If I add fixed_ip: to the networks element at the end of the above, I
> > >> can select an IP address from the 'ctlplane' network, like this:
> > >>
> > >>   networks:
> > >> - network: ctlplane
> > >>   fixed_ip: {get_attr: [ControlPlanePort, ip_address]}
> > >>
> > >> But the problem is that if I pass a blank string to fixed_ip, I
> > get an
> > >> error on deployment. This means that the old behavior of
> > automatically
> > >> selecting an IP doesn't work.
> > >>
> > >> I thought I has solved this by passing an external Neutron port,
> > like this:
> > >>
> > >>   networks:
> > >> - network: ctlplane
> > >>   port: {get_attr: [ControlPlanePort, port_id]}
> > >>
> > >> Which works for deployments, but that fails on upgrades, since the
> > >> original port was created as part of the Nova::Server resource,
> > instead
> > >> of being an external resource.
> > >
> > > Can you detail how it fails? I was under the impression we never
> > > replaced servers no matter what (or we try to do that, at least). Is
> > > the issue that your new port is not the correct one?
> > >
> > >> I'm now looking for a way to use Heat conditionals to apply the
> > fixed_ip
> > >> only if the value is not unset. Looking at the intrinsic
> > functions [2],
> > >> I don't see a way to do this. Is what I'm trying to do with Heat
> > possible?
> > >
> > > You should be able to write something like that (not tested):
> > >
> > > networks:
> > >   if:
> > > - 
> > > - network: ctlplane
> > >   fixed_ip: {get_attr: [ControlPlanePort, ip_address]}
> > > - network: ctlplane
> > >
> > > The question is how to define your condition. Maybe:
> > >
> > > conditions:
> > >   fixed_ip_condition:
> > >  not:
> > > equals:
> > >   - {get_attr: [ControlPlanePort, ip_address]}
> > >   - ''
> > >
> > > To get back to the problem you stated first.
> > >
> > >
> > >> Another option I'm exploring is conditionally applying resources. It
> > >> appears that would require duplicating the entire TripleO::Server
> > stanza
> > >> in *-role.yaml so that there is one that uses fixed_ip and one
> > that does
> > >> not. Which one is applied would be based on a condition that tested
> > >> whether fixed_ip was blank or not. The downside of that is that
> > it would
> > >> make the role definition confusing because there would be a large
> > >> resource that was implemented twice, with only one line difference
> > >> between them.
> > >
> > > You can define properties with conditions, so you shouldn't need to
> > > rewrite everything.
> > >
> > 
> > Thomas,
> > 
> > Thanks, I will try your suggestions and that should get me closer.
> > 
> > The full error log is available here:
> > 
> > http://logs.openstack.org/78/413278/11/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/8d91762/console.html
> > 
> > 
> > 
> > We do an interface_detach/attach when a port is replaced.
> > It seems to be failing[1] as this is not implemented for
> > ironic/baremetal driver.  I could see a patch[2] to add that
> > functionality though.
> > 
> > [1]
> > 

Re: [openstack-dev] [tripleo] pingtest vs tempest

2017-04-18 Thread Steven Hardy
On Mon, Apr 17, 2017 at 12:48:32PM -0400, Justin Kilpatrick wrote:
On Mon, Apr 17, 2017 at 12:28 PM, Ben Nemec wrote:
> > Tempest isn't really either of those things.  According to another message
> > in this thread it takes around 15 minutes to run just the smoke tests.
> > That's unacceptable for a lot of our CI jobs.
> 
> Ben, is the issue merely the time it takes? Is it the effect that time
> taken has on hardware availability?

It's both, but the main constraint is the infra job timeout, which is about
2.5hrs - if you look at our current jobs, many regularly get close to (and
sometimes exceed) this, so we just don't have the time budget available to
run exhaustive tests every commit.

> Should we focus on how much testing we can get into N time period?
> Then how do we decide an optimal N
> for our constraints?

Well yeah, but that's pretty much how/why we ended up with pingtest; it's
simple, fast, and provides an efficient way to do smoke tests, e.g. creating
just one heat resource is enough to prove multiple OpenStack services are
running, as well as the DB/RPC etc.
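As an illustration of how little is needed, even a template as small as this (a sketch, not the actual tripleo-ci pingtest template) exercises Heat, Keystone auth, Neutron, the DB and RPC in a single stack create:

```yaml
heat_template_version: 2015-04-30
description: Minimal smoke test - one resource touches several services at once

resources:
  test_net:
    type: OS::Neutron::Net
    properties:
      name: pingtest-net

outputs:
  net_id:
    value: {get_resource: test_net}
```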

> I've been working on a full up functional test for OpenStack CI builds
> for a long time now, it works but takes
> more than 10 hours. If you're interested in the results, kick through to
> Kibana here [0]. Let me know off list if you
> have any issues, the presentation of this data is all experimental still.

This kind of thing is great, and I'd support more exhaustive testing via
periodic jobs etc, but the reality is we need to focus on "bang for buck"
e.g the deepest possible coverage in the most minimal amount of time for
our per-commit tests - we rely on the project gates to provide a full API
surface test, and we need to focus on more basic things like "did the service
start", and "is the API accessible".  Simple crud operations on a subset of
the API's is totally fine for this IMO, whether via pingtest or some other
means.

Steve



Re: [openstack-dev] [all] blog post about being PTL in OpenStack

2017-04-18 Thread Steven Hardy
On Thu, Apr 13, 2017 at 08:22:50PM -0400, Emilien Macchi wrote:
> Exceptionally, I'll self-promote a blog post that I wrote about my
> personal experience of being PTL in OpenStack.
> 
> http://my1.fr/blog/my-journey-as-an-openstack-ptl/

Thanks for writing this up Emilien - it aligns pretty well with my previous
experiences as both Heat and TripleO PTL, particularly the part about
dealing with interruptions and requests for help.

I agree the more autonomous squads approach has helped in that regard as we
have to acknowledge that individuals and their time only scales so far.

> My hope is to engage some discussion about what our community thinks
> about this role and how we could bring more leaders in OpenStack.
> This blog post also explains why I won't run for PTL during the next cycle.

Thanks for your great work as TripleO PTL over the last two cycles, and in
particular for your constant focus on improving CI coverage and team-building.

Thanks,

Steve



Re: [openstack-dev] [tripleo] Proposing Florian Fuchs for tripleo-validations core

2017-04-18 Thread Steven Hardy
On Thu, Apr 06, 2017 at 11:53:04AM +0200, Martin André wrote:
> Hellooo,
> 
> I'd like to propose we extend Florian Fuchs +2 powers to the
> tripleo-validations project. Florian is already core on tripleo-ui
> (well, tripleo technically so this means there is no changes to make
> to gerrit groups).
> 
> Florian took over many of the stalled patches in tripleo-validations
> and is now the principal contributor in the project [1]. He has built
> a good expertise over the last months and I think it's time he has
> officially the right to approve changes in tripleo-validations.
> 
> Consider this my +1 vote.

+1



Re: [openstack-dev] [tripleo] Roadmap for Container CI work

2017-04-05 Thread Steven Hardy
On Tue, Apr 04, 2017 at 04:01:48PM -0400, Emilien Macchi wrote:
> After our weekly meeting of today, I found useful to share and discuss
> our roadmap for Container CI jobs in TripleO.
> They are ordered by priority from the highest to lowest:
> 
> 1. Swap ovb-nonha job with ovb-containers, enable introspection on the
> container job and shuffle other coverage (e.g ssl) to other jobs
> (HA?). It will help us to get coverage for ovb-containers scenario
> again, without consuming more rh1 resources and keep existing
> coverage.
> 2. Get multinode coverage of deployments - this should integrate with
> the scenarios we already have defined for non-container deployment.
> This is super important to cover all overcloud services, like we did
> with classic deployments. It should be non voting to start and then
> voting once it works. We should find a way to keep the same templates
> as we have now, and just include the docker environment. In other
> words, find a way to keep using:
> https://github.com/openstack/tripleo-heat-templates/blob/master/ci/environments/scenario001-multinode.yaml
> so we don't duplicate scenario environments.
> 3. Implement container upgrade job, which for Pike will be deploy a
> baremetal overcloud, then migrate on upgrade to containers. Use
> multinode jobs for this task. Start with a non-voting job and move to
> the gate once it work. I also suggest to use scenarios framework, so
> we keep good coverage.
> 4. After we implement the workflow for minor updates, have a job with
> tests container-to-container updates for minor (rolling) updates, this
> ideally should add some coverage to ensure no downtime of APIs and
> possibly checks for service restarts (ref recent bugs about bouncing
> services on minor updates)
> 5. Once Pike is released and Queens starts, let's work on container to
> containers upgrade job.
> 
> Any feedback or question is highly welcome,

+1, I think this makes sense and is well aligned with what we discussed in
the meeting.

I agree the priority is roughly in the order listed above, but provided we
have sufficient folks willing to help we can probably work on some of these
tasks in parallel, as really we need at least (1), (2) and (3) ASAP.

I've started looking at (4), but there is significant work required to
enable this, as our current breakpoint-based update workflow won't work, and
it looks like we also can't use the rolling update feature of
SoftwareDeploymentGroup, because we want each node to be fully updated
before moving on to the next.
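For context, the SoftwareDeploymentGroup rolling update feature mentioned looks roughly like this (a sketch - exact property names may differ); the catch is it batches per deployment resource, not per fully-updated node:

```yaml
  UpdateDeployment:
    type: OS::Heat::SoftwareDeploymentGroup
    update_policy:
      rolling_update:
        max_batch_size: 1   # only touch one server per batch
        pause_time: 30      # seconds to wait between batches
    properties:
      servers: {get_param: servers}
      config: {get_resource: UpdateConfig}
      actions: ['UPDATE']
```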

Steve



Re: [openstack-dev] [stable][heat] Nominate huangtianhua for heat-stable-maint

2017-04-03 Thread Steven Hardy
On Thu, Mar 30, 2017 at 04:45:38PM -0400, Zane Bitter wrote:
> We are feeling the pinch on stable-branch reviewers in Heat, so now that I
> understand the process a bit better, let's try this again.
> 
> I'd like to nominate Huang Tianhua to join the heat-stable-maint team.
> Tianhua is a heat-core member and one of our most prolific stable branch
> reviewers:
> 
> https://review.openstack.org/#/q/reviewer:huangtianhua+-owner:huangtianhua+projects:openstack/heat+branch:%22%255Estable/.*%22
> 
> IMHO her track record displays an understanding of the stable branch
> policies appropriate to a stable branch core. e.g.
> 
> * https://review.openstack.org/#/c/434030/
> * https://review.openstack.org/#/c/371135/
> * https://review.openstack.org/#/c/244948/

+1!

> Also, I suggest we take this opportunity to remove Angus Salkeld, since he
> is no longer actively working on OpenStack
> (http://stackalytics.com/?release=all_id=asalkeld)

+1

Thanks!

Steve



Re: [openstack-dev] [tripleo] container jobs are unstable

2017-03-30 Thread Steven Hardy
On Wed, Mar 29, 2017 at 10:07:24PM -0400, Paul Belanger wrote:
> On Thu, Mar 30, 2017 at 09:56:59AM +1300, Steve Baker wrote:
> > On Thu, Mar 30, 2017 at 9:39 AM, Emilien Macchi  wrote:
> > 
> > > On Mon, Mar 27, 2017 at 8:00 AM, Flavio Percoco  wrote:
> > > > On 23/03/17 16:24 +0100, Martin André wrote:
> > > >>
> > > >> On Wed, Mar 22, 2017 at 2:20 PM, Dan Prince  wrote:
> > > >>>
> > > >>> On Wed, 2017-03-22 at 13:35 +0100, Flavio Percoco wrote:
> > > 
> > >  On 22/03/17 13:32 +0100, Flavio Percoco wrote:
> > >  > On 21/03/17 23:15 -0400, Emilien Macchi wrote:
> > >  > > Hey,
> > >  > >
> > >  > > I've noticed that container jobs look pretty unstable lately; to
> > >  > > me,
> > >  > > it sounds like a timeout:
> > >  > > http://logs.openstack.org/19/447319/2/check-tripleo/gate-tripleo-
> > >  > > ci-centos-7-ovb-containers-oooq-nv/bca496a/console.html#_2017-03-
> > >  > > 22_00_08_55_358973
> > >  >
> > >  > There are different hypothesis on what is going on here. Some
> > >  > patches have
> > >  > landed to improve the write performance on containers by using
> > >  > hostpath mounts
> > >  > but we think the real slowness is coming from the images download.
> > >  >
> > >  > This said, this is still under investigation and the containers
> > >  > squad will
> > >  > report back as soon as there are new findings.
> > > 
> > >  Also, to be more precise, Martin André is looking into this. He also
> > >  fixed the
> > >  gate in the last 2 weeks.
> > > >>>
> > > >>>
> > > >>> I spoke w/ Martin on IRC. He seems to think this is the cause of some
> > > >>> of the failures:
> > > >>>
> > > >>> http://logs.openstack.org/32/446432/1/check-tripleo/gate-
> > > tripleo-ci-cen
> > > >>> tos-7-ovb-containers-oooq-nv/543bc80/logs/oooq/overcloud-controller-
> > > >>> 0/var/log/extra/docker/containers/heat_engine/log/heat/heat-
> > > >>> engine.log.txt.gz#_2017-03-21_20_26_29_697
> > > >>>
> > > >>>
> > > >>> Looks like Heat isn't able to create Nova instances in the overcloud
> > > >>> due to "Host 'overcloud-novacompute-0' is not mapped to any cell'. 
> > > >>> This
> > > >>> means our cells initialization code for containers may not be quite
> > > >>> right... or there is a race somewhere.
> > > >>
> > > >>
> > > >> Here are some findings. I've looked at time measures from CI for
> > > >> https://review.openstack.org/#/c/448533/ which provided the most
> > > >> recent results:
> > > >>
> > > >> * gate-tripleo-ci-centos-7-ovb-ha [1]
> > > >>undercloud install: 23
> > > >>overcloud deploy: 72
> > > >>total time: 125
> > > >> * gate-tripleo-ci-centos-7-ovb-nonha [2]
> > > >>undercloud install: 25
> > > >>overcloud deploy: 48
> > > >>total time: 122
> > > >> * gate-tripleo-ci-centos-7-ovb-updates [3]
> > > >>undercloud install: 24
> > > >>overcloud deploy: 57
> > > >>total time: 152
> > > >> * gate-tripleo-ci-centos-7-ovb-containers-oooq-nv [4]
> > > >>undercloud install: 28
> > > >>overcloud deploy: 48
> > > >>total time: 165 (timeout)
> > > >>
> > > >> Looking at the undercloud & overcloud install times, the most task
> > > >> consuming tasks, the containers job isn't doing that bad compared to
> > > >> other OVB jobs. But looking closer I could see that:
> > > >> - the containers job pulls docker images from dockerhub, this process
> > > >> takes roughly 18 min.
> > > >
> > > >
> > > > I think we can optimize this a bit by having the script that populates
> > > the
> > > > local
> > > > registry in the overcloud job to run in parallel. The docker daemon can
> > > do
> > > > multiple pulls w/o problems.
> > > >
> > > >> - the overcloud validate task takes 10 min more than it should because
> > > >> of the bug Dan mentioned (a fix is in the queue at
> > > >> https://review.openstack.org/#/c/448575/)
> > > >
> > > >
> > > > +A
> > > >
> > > >> - the postci takes a long time with quickstart, 13 min (4 min alone
> > > >> spent on docker log collection) whereas it takes only 3 min when using
> > > >> tripleo.sh
> > > >
> > > >
> > > > mmh, does this have anything to do with ansible being in between? Or is
> > > that
> > > > time specifically for the part that gets the logs?
> > > >
> > > >>
> > > >> Adding all these numbers, we're at about 40 min of additional time for
> > > >> oooq containers job which is enough to cross the CI job limit.
> > > >>
> > > >> There is certainly a lot of room for optimization here and there and
> > > >> I'll explore how we can speed up the containers CI job over the next
> > > >
> > > >
> > > > Thanks a lot for the update. The time break down is fantastic,
> > > > Flavio
> > >
> > > TBH the problem is far from being solved:
> > >
> > > 1. Click on https://status-tripleoci.rhcloud.com/
> > > 2. Select gate-tripleo-ci-centos-7-ovb-containers-oooq-nv
> > >
> > > Container job has been 

Re: [openstack-dev] [TripleO] spec-lite process for tripleo

2017-03-29 Thread Steven Hardy
On Tue, Mar 28, 2017 at 12:09:43PM -0400, Emilien Macchi wrote:
> Bringing an old topic on the table.
> 
> We might have noticed:
> 
> 1. Some tripleo-specs take huge amount of time before getting merged
> (or even reviewed). We have been asking folks to review them every
> week but unfortunately they don't get much attraction (# of core
> reviewers versus # of folks actually reviewing specs).
> 2. Some folks spend a lot of time writing TripleO specs and wait for
> feedback before starting some implementation (like proof of concept).
> 
> Because TripleO like innovation and also moving fast, I think it's
> time to bring the tripleo-specs topic on the table:
> 
> 1. If you have an idea, don't feel obliged to write a specs. Create a
> blueprint on launchpad, announce it on the ML and start writing code
> (can be real implementation or just a simple PoC). Feedback will be
> given in the classic code review.

+1. I for one have been burnt more than once, spending significant time
on a spec only to find our collective understanding changed once actual
code existed.

For things related to interfaces a spec can be helpful, but I think it's
often faster to raise a blueprint with relatively few details and work on a
prototype that clarifies the direction, particularly if such code patches
can be generated fairly quickly.

> 2. If you still want to write a spec, please make it readable and
> communicate about it. If your spec is 900 lines long and not announced
> anywhere, there is an high change that it will never been reviewed.

I agree - I think a common mistake is to get bogged down in implementation
detail when writing (and reviewing) a spec, so I definitely favor a
clearly expressed summary of the problem, an overview of the proposed
direction (including any major interface changes), and clarification of any
user/dev visible impact.  None of this requires much focus at all on the
details of the implementation IMO.

Thanks for raising this Emilien, hopefully this will help us move a little
faster in future!

Steve



Re: [openstack-dev] [Tripleo] Status of monitoring tools

2017-03-28 Thread Steven Hardy
On Tue, Mar 28, 2017 at 04:34:09PM +0530, Sanjay Upadhyay wrote:
>Hi Folks,
>I recently started work with a requirement to integrate a security tool
>with monitoring framework for alerting and logging. I fail to  find
>relevant docs in this direction. After looking around for more
>information, it seems we do not have the server side implementation at
>all. 
>I have created a bug for
>this https://bugs.launchpad.net/tripleo/+bug/1676407. IMO,
>architecturally we have no place for opstools in the tripleo deployment,
>yet. 

I think we are mixing two different problems here. One is that the current
composable services integration for fluentd and sensu isn't documented; we
should fix that.

>Ideally, there could be a role like MonServer, which can be a specific
>role for all the monitoring services (server side). If one wants to reduce
>the no of nodes can probably have all the monitoring services on the
>controller node. 

This is a different problem, and IIRC we agreed that only client-side
support for monitoring tools was in scope for TripleO.  The expectation is
that folks will stand up a separate logging/monitoring environment, or that
they will have existing servers to integrate with.

There have been discussions around how to deploy such services, and AFAIK
the suggestion is to use this ansible solution:

https://github.com/centos-opstools/opstools-ansible

AFAIK there is no plan to deploy these tools via TripleO (and IMO that's
correct, we don't really want to duplicate efforts here).

Thanks!

Steve



Re: [openstack-dev] [TripleO] How to Preview the Overcloud Stack?

2017-03-28 Thread Steven Hardy
On Mon, Mar 27, 2017 at 01:43:39PM -0700, Dan Sneddon wrote:
> I've been trying to figure out a workflow for previewing the results of
> importing custom templates in an overcloud deployment (without actually
> deploying). For instance, I am overriding some parameters using custom
> templates, and I want to make sure those parameters will be expressed
> correctly when I deploy.
> 
> I know about "heat stack-preview", but between the complexity of the
> overcloud stack and the jinja2 template processing, I can't figure out a
> way to preview the entire overcloud stack.
> 
> Is this possible? If not, any hints on what would it take to write a
> script that would accomplish this?

Yes this is possible, but when I tested it I ran into this bug:

https://bugs.launchpad.net/heat/+bug/1676823

Which it seems from IRC discussion may be a duplicate of:

https://bugs.launchpad.net/heat/+bug/1669571

I'll re-test later with the proposed patch applied, but the basic steps
are:

1. Get a rendered copy of the tripleo-heat-templates

There are two ways to do this: either run ./tools/process-templates.py
in a local t-h-t tree, or create a plan (either via openstack overcloud
plan create, or openstack overcloud plan deploy --update-plan-only).

If you do this via creating a plan, you can download the rendered files from
swift, e.g. mkdir tmp-tht; cd tmp-tht; swift download overcloud

2. Run the stack preview

To do this, you need to generate an environment file with all the passwords
normally created by tripleo-common.  The easiest way to do it is to look
at an existing deployment and run "mistral environment-get overcloud", then
copy/paste and sed an environment like: http://paste.openstack.org/show/604475/
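If you'd rather script that step than hand-edit, a rough sketch follows. The parameter names below are illustrative only; the authoritative list comes from the mistral environment output on an existing deployment.

```python
import secrets
import string

# Illustrative subset of password parameters; take the real list from
# "mistral environment-get overcloud" on an existing deployment.
PASSWORD_PARAMS = ["AdminPassword", "KeystonePassword", "NovaPassword"]

def make_dummy_passwords_env(params, length=16):
    """Render a parameter_defaults environment file body with a random
    dummy value for each password parameter."""
    alphabet = string.ascii_letters + string.digits
    lines = ["parameter_defaults:"]
    for name in params:
        value = "".join(secrets.choice(alphabet) for _ in range(length))
        lines.append("  %s: %s" % (name, value))
    return "\n".join(lines) + "\n"

with open("dummy_passwords.yaml", "w") as f:
    f.write(make_dummy_passwords_env(PASSWORD_PARAMS))
```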

Then you just run the preview like:

openstack stack create test --dry-run -t overcloud.yaml \
  -e overcloud-resource-registry-puppet.yaml -e dummy_passwords.yaml

This will break due to the bug above, but in the past it's worked fine for
me, and as mentioned by Saravanan it's also possible to do a template
validate:

openstack orchestration template validate --show-nested -t overcloud.yaml \
  -e overcloud-resource-registry-puppet.yaml -e dummy_passwords.yaml

Hopefully we can confirm the Heat bugfix later, then you'll be able to use
one of the above to do what you need.

Thanks,

Steve



Re: [openstack-dev] [dib][heat] dib-utils/dib-run-parts/dib v2 concern

2017-03-16 Thread Steven Hardy
On Thu, Mar 16, 2017 at 10:30:48AM -0500, Gregory Haynes wrote:
> On Thu, Mar 16, 2017, at 05:18 AM, Steven Hardy wrote:
> > On Wed, Mar 15, 2017 at 04:22:37PM -0500, Ben Nemec wrote:
> > > While looking through the dib v2 changes after the feature branch was 
> > > merged
> > > to master, I noticed this commit[1], which bring dib-run-parts back into 
> > > dib
> > > itself.  Unfortunately I missed the original proposal to do this, but I 
> > > have
> > > some concerns about the impact of this change.
> > > 
> > > Originally the split was done so that dib-run-parts and one of the
> > > os-*-config projects (looks like os-refresh-config) that depends on it 
> > > could
> > > be included in a stock distro cloud image without pulling in all of dib.
> > > Note that it is still present in the requirements of orc: 
> > > https://github.com/openstack/os-refresh-config/blob/master/requirements.txt#L5
> > > 
> > > Disk space in a distro cloud image is at a premium, so pulling in a 
> > > project
> > > like diskimage-builder to get one script out of it was not acceptable, at
> > > least from what I was told at the time.
> > > 
> > > I believe this was done so a distro cloud image could be used with Heat 
> > > out
> > > of the box, hence the heat tag on this message.  I don't know exactly what
> > > happened after we split out dib-utils, so I'm hoping someone can confirm
> > > whether this requirement still exists.  I think Steve was the one who made
> > > the original request.  There were a lot of Steves working on Heat at the
> > > time though, so it's possible I'm wrong. ;-)
> > 
> > I don't think I'm the Steve you're referring to, but I do have some
> > additional info as a result of investigating this bug:
> > 
> > https://bugs.launchpad.net/tripleo/+bug/1673144
> > 
> > It appears we have three different versions of dib-run-parts on the
> > undercloud (and, presumably overcloud nodes) at the moment, which is a
> > pretty major headache from a maintenance/debugging perspective.
> > 
> 
> I looked at the bug and I think there may only be two different
> versions? The versions in /bin and /usr/bin seem to come from the same
> package (so I hope they are the same version). I don't understand what
> is going on with the ./lib version but that seems like either a local
> package / checkout or something else non-dib related.
> 
> Two versions is certainly less than ideal, though :).

No, I think there are four copies, three unique versions:

(undercloud) [stack@undercloud ~]$ rpm -qf /usr/bin/dib-run-parts
dib-utils-0.0.11-1.el7.noarch
(undercloud) [stack@undercloud ~]$ rpm -qf /bin/dib-run-parts
dib-utils-0.0.11-1.el7.noarch
(undercloud) [stack@undercloud ~]$ rpm -qf 
/usr/lib/python2.7/site-packages/diskimage_builder/lib/dib-run-parts
diskimage-builder-2.0.1-0.20170314023517.756923c.el7.centos.noarch
(undercloud) [stack@undercloud ~]$ rpm -qf /usr/local/bin/dib-run-parts
file /usr/local/bin/dib-run-parts is not owned by any package

/usr/bin/dib-run-parts and /bin/dib-run-parts are the same file, owned by
dib-utils

/usr/lib/python2.7/site-packages/diskimage_builder/lib/dib-run-parts is
owned by diskimage-builder

/usr/local/bin/dib-run-parts is the mystery file presumed from image
building

But the exciting thing from a rolling-out-bugfixes perspective is that the
one actually running via o-r-c isn't either of the packaged versions (doh!),
so we probably need to track down which element is installing it.

This is a little OT for this thread (sorry), but hopefully provides more
context around my concerns about creating another fork etc.

> > However we resolve this, *please* can we avoid permanently forking the
> > tool, as e.g in that bug, where do I send the patch to fix leaking
> > profiledir directories?  What package needs an update?  What is
> > installing
> > the script being run that's not owned by any package?
> > 
> > Yes, I know the answer to some of those questions, but I'm trying to
> > point
> > out duplicating this script and shipping it from multiple repos/packages
> > is
> > pretty horrible from a maintenance perspective, especially for new or
> > casual contributors.
> > 
> 
> I agree. You answered my previous question of whether os-refresh-config
> is still in use (sounds like it definitely is) so this complicates
> things a bit.
> 
> > If we have to fork it, I'd suggest we should rename the script to avoid
> > the
> > confusion I outline in the bug above, e.g one script -> one repo -> one
> > package?
> 
> I really like this idea of renaming.

Re: [openstack-dev] [dib][heat] dib-utils/dib-run-parts/dib v2 concern

2017-03-16 Thread Steven Hardy
On Wed, Mar 15, 2017 at 04:22:37PM -0500, Ben Nemec wrote:
> While looking through the dib v2 changes after the feature branch was merged
> to master, I noticed this commit[1], which bring dib-run-parts back into dib
> itself.  Unfortunately I missed the original proposal to do this, but I have
> some concerns about the impact of this change.
> 
> Originally the split was done so that dib-run-parts and one of the
> os-*-config projects (looks like os-refresh-config) that depends on it could
> be included in a stock distro cloud image without pulling in all of dib.
> Note that it is still present in the requirements of orc: 
> https://github.com/openstack/os-refresh-config/blob/master/requirements.txt#L5
> 
> Disk space in a distro cloud image is at a premium, so pulling in a project
> like diskimage-builder to get one script out of it was not acceptable, at
> least from what I was told at the time.
> 
> I believe this was done so a distro cloud image could be used with Heat out
> of the box, hence the heat tag on this message.  I don't know exactly what
> happened after we split out dib-utils, so I'm hoping someone can confirm
> whether this requirement still exists.  I think Steve was the one who made
> the original request.  There were a lot of Steves working on Heat at the
> time though, so it's possible I'm wrong. ;-)

I don't think I'm the Steve you're referring to, but I do have some
additional info as a result of investigating this bug:

https://bugs.launchpad.net/tripleo/+bug/1673144

It appears we have three different versions of dib-run-parts on the
undercloud (and, presumably overcloud nodes) at the moment, which is a
pretty major headache from a maintenance/debugging perspective.

However we resolve this, *please* can we avoid permanently forking the
tool, as e.g. in that bug, where do I send the patch to fix leaking
profiledir directories?  What package needs an update?  What is installing
the script being run that's not owned by any package?

Yes, I know the answer to some of those questions, but I'm trying to point
out duplicating this script and shipping it from multiple repos/packages is
pretty horrible from a maintenance perspective, especially for new or
casual contributors.

If we have to fork it, I'd suggest we should rename the script to avoid the
confusion I outline in the bug above, e.g one script -> one repo -> one
package?
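For context, the script all these packages are shipping copies of implements run-parts-style semantics. A rough Python sketch of the core loop follows; the real dib-run-parts does considerably more (environment handling, output formatting, profile directories), so this is an illustration of the duplicated behavior, not a replacement.

```python
import os
import stat
import subprocess

def run_parts(directory):
    """Minimal run-parts semantics: execute every executable regular
    file in the directory in lexical order, aborting on the first
    failure (subprocess.run raises CalledProcessError via check=True)."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        mode = os.stat(path).st_mode
        if stat.S_ISREG(mode) and mode & stat.S_IXUSR:
            subprocess.run([path], check=True)
```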

Thanks!

Steve



Re: [openstack-dev] [tripleo][heat] Heat memory usage in the TripleO gate: Ocata edition

2017-03-15 Thread Steven Hardy
On Tue, Mar 14, 2017 at 10:21:54PM -0400, Emilien Macchi wrote:
> On Tue, Mar 14, 2017 at 4:06 PM, Zane Bitter  wrote:
> > Following up on the previous thread:
> >
> > http://lists.openstack.org/pipermail/openstack-dev/2017-January/109748.html
> >
> > Here is the latest data, which includes the Ocata release:
> >
> > https://fedorapeople.org/~zaneb/tripleo-memory/20170314/heat_memused.png
> >
> > As you can see, there has been one jump in memory usage. This was due to the
> > TripleO patch https://review.openstack.org/#/c/425717/
> 
> Since Contrail is optional when deploying TripleO, is there an easy
> way to disable the endpoints by default and activate them only when we
> enable Contrail services?

Not any easy way at the moment, AFAIK. I think we need a way to fix this
though, so I raised a bug:

https://bugs.launchpad.net/tripleo/+bug/1673042

I think there are a few ways we could potentially rework things to
dynamically generate the endpoint_map.yaml, but the easiest (and least
expensive from a heat memory perspective) is probably to use jinja2.

If we try to do the transformation in heat we have a chicken/egg
problem (the EndpointMap is created before the per-role *ServiceChain
resources in overcloud.j2.yaml, so we can't do the normal pattern of
composing lists of per-service things from *ServiceChain), which is
unfortunate, but I can't currently think of a good way around it.
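As a rough illustration of the jinja2 idea (the service and endpoint names below are invented), the rendered endpoint map would only carry entries for services that are actually enabled, rather than a static map covering every optional service:

```python
def build_endpoint_map(all_endpoints, enabled_services):
    """Keep only the endpoint entries whose owning service is enabled,
    so disabled optional services add no weight to the EndpointMap."""
    return {name: entry for name, entry in all_endpoints.items()
            if entry["service"] in enabled_services}

endpoints = {
    "NovaPublic": {"service": "nova"},
    "ContrailConfigPublic": {"service": "contrail-config"},
}
# With Contrail disabled, its endpoints never reach the rendered map.
assert build_endpoint_map(endpoints, {"nova"}) == {
    "NovaPublic": {"service": "nova"}}
```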

> > Unlike previous increases in memory usage, I was able to warn of this one in
> > the advance, and it was deemed an acceptable trade-off. The reasons for the
> > increase are unknown - the addition of more stuff to the endpoint map seemed
> > like a good bet, but one attempt to mitigate that[1] had no effect and I'm
> > increasingly unconvinced that this could account for the magnitude of the
> > increase.
> >
> > In any event, memory usage remains around the 1GiB level, none of the other
> > complexity increases during Ocata have had any discernible effect, and Heat
> > has had no memory usage regressions.
> >
> > Stay tuned for the next exciting edition, in which I try to figure out how
> > to do more than 3 colors on the plot.

Thanks for the analysis Zane, it's much appreciated!

Steve



[openstack-dev] [TripleO][release][deployment] Packaging problems due to branch/release ordering

2017-03-08 Thread Steven Hardy
Hi all,

I wanted to raise visibility of a problem we're experiencing in TripleO CI,
which I think will potentially affect other projects that consume trunk
builds from the RDO repositories (and potentially other distros too, I
guess).

The problem is that we tag our final ocata releases after branching
stable/ocata, but there is then a period prior to cutting pike-1 milestone
releases (or some other intermediate release for projects not following the
milestone model) where the n-v-r of trunk package builds is older than that
of the stable branches.

This presents a big problem if you want to test package-derived updates
from stable/$foo to master, as the packages don't get updated when the
installed stable one is newer than the one built from master.
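To make the ordering concrete (the version numbers below are invented for illustration): a stable branch tag compares newer than any trunk snapshot whose version still derives from the older, pre-branch tag. A deliberately simplified numeric comparison is enough to show it; real RPM comparison (rpmvercmp) also handles epochs, alpha segments, and the release field.

```python
def vercmp(a, b):
    """Very simplified comparison of dot-separated numeric versions.
    Returns -1, 0 or 1 like a classic comparator."""
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    return (pa > pb) - (pa < pb)

# stable/ocata gets the final tag (say 6.0.1), while master trunk builds
# are still versioned from the pre-branch tag (6.0.0), so the "update"
# target looks older than what is already installed.
assert vercmp("6.0.1", "6.0.0") == 1
```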

I raised this bug initially thinking it was specific to the puppet-*
projects, but it seems from subsequent discussion that it's a broader issue
that may impact many OpenStack projects.

https://bugs.launchpad.net/tripleo/+bug/1669462

I'm not clear on the best path forward at this point, but the simplest one
suggested so far is to simply tag a new pre-milestone/alpha release for all
master branches, which will enable testing upgrades to master.

I know we don't expect to fully support upgrades to pre-milestone releases,
but the requirement here is to simply enable testing them.

A side benefit of this regular testing, e.g. via CI, is that we'll find
upgrade issues much faster than waiting for one or more milestone releases
to happen and then doing an upgrade-debug fire drill later in the cycle
(which has been bad for project and deployment teams IMO, so it'd be good to
figure out this first step to enable earlier testing of upgrades to the
development release).

Any thoughts on how we can resolve this would be much appreciated, thanks!

Steve



Re: [openstack-dev] [TripleO][Heat] Selectively disabling deployment resources

2017-03-08 Thread Steven Hardy
On Tue, Mar 07, 2017 at 02:34:50PM -0500, James Slagle wrote:
> I've been working on this spec for TripleO:
> https://review.openstack.org/#/c/431745/
>
> which allows users to selectively disable Heat deployment resources
> for a given server (or server in the case of a *DeloymentGroup
> resource).
> 
> Some of the main use cases in TripleO for such a feature are scaling
> out compute nodes where you do not need to rerun Puppet (or make any
> changes at all) on non-compute nodes, or to exclude nodes from hanging
> a stack-update if you know they are unreachable or degraded for some
> reason. There are others, but those are 2 of the major use cases.

Thanks for raising this, I know it's been a pain point for some users of
TripleO.

However I think we're conflating two different issues here:

1. Don't re-run puppet (or yum update) when no other changes have happened

2. Disable deployment resources when changes have happened

(1) is actually very simple, and is the default behavior of Heat
(SoftwareDeployment resources never update unless either the referenced
config or the input_values change).  We just need to provide an option in
tripleoclient to disable generating the DeployIdentifier/UpdateIdentifier
timestamps.
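A toy model of that behavior (not Heat's actual implementation): a deployment's effective identity is a function of its config and inputs, and injecting a fresh DeployIdentifier timestamp into the inputs on every update is precisely what perturbs that identity and forces a re-run.

```python
import hashlib
import json

def deployment_signature(config, input_values):
    """A deployment only re-triggers when this signature changes; with
    identical config and inputs, nothing happens on stack update."""
    payload = json.dumps({"config": config, "inputs": input_values},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

sig1 = deployment_signature("run-puppet", {"DeployIdentifier": "1488"})
sig2 = deployment_signature("run-puppet", {"DeployIdentifier": "1488"})
sig3 = deployment_signature("run-puppet", {"DeployIdentifier": "1499"})
assert sig1 == sig2   # unchanged inputs: no re-deploy
assert sig1 != sig3   # fresh timestamp: re-deploy is forced
```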

(2) is harder, because the whole point of SoftwareDeploymentGroup is to run
the exact same configuration on a group of servers, with no exceptions.

As Zane mentions, (2) is related to the way ResourceGroup works, but the
problem here isn't ResourceGroup per se, as it would in theory be pretty
easy to reimplement SoftwareDeploymentGroup to generate its nested stack
without inheriting from ResourceGroup (which may be needed if you want a
flag to make existing Deployments in the group immutable).

I'd suggest we solve (1) and do some testing; it may be enough to solve the
"don't change computes on scale-out" case at least?

One way to potentially solve (2) would be to unroll the
SoftwareDeploymentGroup resources and instead generate the Deployment
resources via jinja2; this would enable completely removing them on update
if that's what is desired, similar to what we already do for upgrades,
e.g. to not upgrade any compute nodes.
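As an illustration of the unrolled approach (resource and server names invented), once Deployment resources are generated per server it becomes trivial to simply omit the disabled ones, so an update never touches them:

```python
def render_deployments(servers, disabled):
    """Return one deployment resource definition per enabled server;
    disabled servers get no resource generated at all."""
    disabled = set(disabled)
    return {
        "%sDeployment" % s.replace("-", ""): {"server": s}
        for s in servers if s not in disabled
    }

deps = render_deployments(["controller-0", "compute-0", "compute-1"],
                          ["compute-1"])
assert "compute1Deployment" not in deps
assert sorted(d["server"] for d in deps.values()) == ["compute-0",
                                                      "controller-0"]
```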

Steve

> 
> I started by taking an approach that would be specific to TripleO.
> Basically mapping all the deployment resources to a nested stack
> containing the logic to selectively disable servers from the
> deployment (using yaql) based on a provided parameter value. Here's
> the main patch: https://review.openstack.org/#/c/442681/
> 
> After considering that complexity, particularly the yaql expression,
> I'm wondering if it would be better to add this support natively to
> Heat.
> 
> I was looking at the restricted_actions key in the resource_registry
> and was thinking this might be a reasonable place to add such support.
> It would require some changes to how restricted_actions work.
> 
> One change would be a method for specifying that restricted_actions
> should not fail the stack operation if an action would have otherwise
> been triggered. Currently the behavior is to raise an exception and
> mark the stack failed if an action needs to be taken but has been
> marked restricted. That would need to be tweaked to allow specifying
> that that we don't want the stack to fail. One thought would be to
> change the allowed values of restricted_actions to:
> 
> replace_fail
> replace_ignore
> update_fail
> update_ignore
> replace
> update
> 
> where replace and update were synonyms for replace_fail/update_fail to
> maintain backwards compatibility.
> 
> Another change would be to add logic to the Deployment resources
> themselves to consider if any restricted_actions have been set on an
> Server resources before triggering an updated deployment for a given
> server.
> 
> It also might be nice to allow specifying restricted_actions on the
> server's name property (which typically is the hostname) instead of
> having to use the resource name. The reason being is that it is not
> really feasibly to expect operators/users to have to represent the
> full nested_stack structure in their resource_registry. They would
> have to query and record nested_stack names just to refer to a given
> server resource. Each ResourceGroup nested stack would be have to be
> individually represented, etc. Unless there is another way I'm
> overlooking.
> 
> Whether or not the restricted_actions approach is taken, is Heat
> interested in this functionality natively? I think it would make for a
> much cleaner implementation than something TripleO specific. I can
> work on a Heat spec if there's interest, though I'd like to get some
> early feedback.
> 
> Thanks.
> 
> -- 
> -- James Slagle
> --
> 


Re: [openstack-dev] [deployment][TripleO][kolla][ansible][fuel] Next steps for cross project collaboration

2017-03-02 Thread Steven Hardy
On Thu, Mar 02, 2017 at 05:06:34AM +, Brandon B. Jozsa wrote:
>+1 for the monthly meetings and a long standing, cross-team, collaborative
>etherpad.

Yes the etherpad is a good idea, thanks!

I guess we'll want one per release, so I created one for Pike here:

https://etherpad.openstack.org/p/deployment-pike

Everyone please feel free to add relevant content/links there, thanks!

I also went ahead and created the WG wiki page:

https://wiki.openstack.org/wiki/Deployment

Which is linked from:

https://wiki.openstack.org/wiki/Category:Working_Groups

I referenced the agreed [deployment] openstack-dev tag, and the new IRC
channel which Steve set up (thanks), #openstack-deployment.

Again, please feel free to edit if I missed anything.

Can anyone wanting to help with organizing (e.g. chairing meetings if
we have them, proactively seeking cross-project things to discuss, and
helping with sessions when we meet f2f next time) please add your name and
email to the Deployment wiki page.

>It’s challenging for some projects who already collaborate heavily with
>communities outside of OpenStack to take on additional heavy meeting
>cycles.

Yeah, I think there's enough interest in semi-regular meetings that we may
want to arrange them, but let's see what topics are added (to the etherpad
above); then we can poll for a suitable day/time when there's enough
content?

I think in many cases ML discussion combined with IRC will be enough, but
I'm also happy to arrange a regular monthly meeting if folks feel that will
be worthwhile.

>The PTG deployment cross-team collaboration was really awesome. I can’t
>wait to see what this team is able to do together! Very happy to be a part
>of this effort!

Agreed, I'm really happy to see these first steps toward more effective
collaboration; let's keep it going! :)

Thanks,

Steve



Re: [openstack-dev] [deployment][TripleO][kolla][ansible][fuel] Next steps for cross project collaboration

2017-02-27 Thread Steven Hardy
On Mon, Feb 27, 2017 at 09:25:46AM -0700, Steven Dake wrote:
>comments inline.
>On Mon, Feb 27, 2017 at 9:02 AM, Steven Hardy <sha...@redhat.com> wrote:
> 
>  Hi all,
> 
>  Over the recent PTG, and previously at the design summit in Barcelona,
>  we've had some productive cross-project discussions amongst the various
>  deployment teams.
> 
>  It's clear that we share many common problems, such as patterns for
>  major
>  version upgrades (even if the workflow isn't identical we've all
>  duplicated
>  effort e.g around basic nova upgrade workflow recently), container
>  images
>  and other common building blocks for configuration management.
> 
>  Here's a non-exhaustive list of sessions where we had some good
>  cross-project discussion, and agreed a number of common problems where
>  collaboration may be possible:
> 
>  https://etherpad.openstack.org/p/ansible-config-mgt
> 
>  https://etherpad.openstack.org/p/tripleo-kolla-kubernetes
> 
>  https://etherpad.openstack.org/p/kolla-pike-ptg-images
> 
>  https://etherpad.openstack.org/p/fuel-ocata-fuel-tripleo-integration
> 
>  If there is interest in continuing the discussions on a more regular
>  basis,
>  I'd like to propose we start a cross-project working group:
> 
>  https://wiki.openstack.org/wiki/Category:Working_Groups
> 
>  If I go ahead and do this, is "deployment" a sufficiently project-neutral
>  term to proceed with?
> 
>WFM.  Anything longer such as "openstack-deployment-tools" doesn't show
>up very well in IRC clients.  Forgive the bikeshedding;
>"openstack-deploy-tools" is very project-neutral and shows up well in IRC
>clients.
> 
> 
>  I'd suggest we start with an informal WG, which it seems just requires
>  an
>  update to the wiki, e.g no need for any formal project team at this
>  point?
> 
>WFM.  Since we aren't really a project team but a collection of projects
>working together, I don't think we need further formalization.
> 
> 
>  Likewise I know some folks have expressed an interest in an IRC channel
>  (openstack-deployment?), I'm happy to start with the ML but open to IRC
>  also if someone is willing to set up the channel.
> 
>+1 - I think an IRC channel would be the best way for real time
>communication.
> 
> 
>  Perhaps we can start by using the tag "deployment" in all cross-project
>  ML
>  traffic, then potentially discuss IRC (or even regular meetings) if it
>  becomes apparent these would add value beyond ML discussion?
> 
>[deploy-tools] may be better unless that breaks people's email clients.
>I am out of bandwidth personally for meetings, although others may be
>interested in a meeting.  I'm not sure what value a regular meeting would
>have and would need a chair, which may result in an inability to obtain
>neutral ground.
>IMO IRC and ML would be sufficient for this CP effort, however others may
>have different viewpoints.

No strong opinion, but FWIW I chose "deployment" because I'd like to see
collaboration not only around tools, but also around experiences and
abstract workflow (e.g. we could have all shared experiences around, say,
nova upgrades without necessarily focusing on any one tool).

"deployment" seems like a catch-all and it uses fewer characters in the
subject line ;)  But I'm happy to go with the consensus here.

I agree ML/IRC should be sufficient, at least in the first instance.

Thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [deployment][TripleO][kolla][ansible][fuel][openstack-ansible] Next steps for cross project collaboration

2017-02-27 Thread Steven Hardy
Adding openstack-ansible to the subject tags (apologies, sdake pointed out
I got that wrong in the initial mail, here's the top of this thread):

http://lists.openstack.org/pipermail/openstack-dev/2017-February/112960.html

On Mon, Feb 27, 2017 at 04:02:54PM +, Steven Hardy wrote:
> Hi all,
> 
> Over the recent PTG, and previously at the design summit in Barcelona,
> we've had some productive cross-project discussions amongst the various
> deployment teams.
> 
> It's clear that we share many common problems, such as patterns for major
> version upgrades (even if the workflow isn't identical we've all duplicated
> effort e.g around basic nova upgrade workflow recently), container images
> and other common building blocks for configuration management.
> 
> Here's a non-exhaustive list of sessions where we had some good
> cross-project discussion, and agreed a number of common problems where
> collaboration may be possible:
> 
> https://etherpad.openstack.org/p/ansible-config-mgt
> 
> https://etherpad.openstack.org/p/tripleo-kolla-kubernetes
> 
> https://etherpad.openstack.org/p/kolla-pike-ptg-images
> 
> https://etherpad.openstack.org/p/fuel-ocata-fuel-tripleo-integration
> 
> If there is interest in continuing the discussions on a more regular basis,
> I'd like to propose we start a cross-project working group:
> 
> https://wiki.openstack.org/wiki/Category:Working_Groups
> 
> If I go ahead and do this, is "deployment" a sufficiently project-neutral
> term to proceed with?
> 
> I'd suggest we start with an informal WG, which it seems just requires an
> update to the wiki, e.g no need for any formal project team at this point?
> 
> Likewise I know some folks have expressed an interest in an IRC channel
> (openstack-deployment?), I'm happy to start with the ML but open to IRC
> also if someone is willing to set up the channel.
> 
> Perhaps we can start by using the tag "deployment" in all cross-project ML
> traffic, then potentially discuss IRC (or even regular meetings) if it
> becomes apparent these would add value beyond ML discussion?
> 
> Please follow up here if anyone has other/better ideas on how to facilitate
> ongoing cross-team discussion and I'll do my best to help move things
> forward.
> 
> Thanks!
> 
> Steve
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- 
Steve Hardy
Red Hat Engineering, Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [deployment][TripleO][kolla][ansible][fuel] Next steps for cross project collaboration

2017-02-27 Thread Steven Hardy
Hi all,

Over the recent PTG, and previously at the design summit in Barcelona,
we've had some productive cross-project discussions amongst the various
deployment teams.

It's clear that we share many common problems, such as patterns for major
version upgrades (even if the workflow isn't identical we've all duplicated
effort e.g around basic nova upgrade workflow recently), container images
and other common building blocks for configuration management.

Here's a non-exhaustive list of sessions where we had some good
cross-project discussion, and agreed a number of common problems where
collaboration may be possible:

https://etherpad.openstack.org/p/ansible-config-mgt

https://etherpad.openstack.org/p/tripleo-kolla-kubernetes

https://etherpad.openstack.org/p/kolla-pike-ptg-images

https://etherpad.openstack.org/p/fuel-ocata-fuel-tripleo-integration

If there is interest in continuing the discussions on a more regular basis,
I'd like to propose we start a cross-project working group:

https://wiki.openstack.org/wiki/Category:Working_Groups

If I go ahead and do this, is "deployment" a sufficiently project-neutral
term to proceed with?

I'd suggest we start with an informal WG, which it seems just requires an
update to the wiki, e.g no need for any formal project team at this point?

Likewise I know some folks have expressed an interest in an IRC channel
(openstack-deployment?), I'm happy to start with the ML but open to IRC
also if someone is willing to set up the channel.

Perhaps we can start by using the tag "deployment" in all cross-project ML
traffic, then potentially discuss IRC (or even regular meetings) if it
becomes apparent these would add value beyond ML discussion?

Please follow up here if anyone has other/better ideas on how to facilitate
ongoing cross-team discussion and I'll do my best to help move things
forward.

Thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [stable][heat] Heat stable-maint additions

2017-02-20 Thread Steven Hardy
On Fri, Feb 17, 2017 at 10:07:38PM +0530, Rabi Mishra wrote:
>On Fri, Feb 17, 2017 at 8:44 PM, Matt Riedemann <mriede...@gmail.com>
>wrote:
> 
>  On 2/15/2017 12:40 PM, Zane Bitter wrote:
> 
>Traditionally Heat has given current and former PTLs of the project +2
>rights on stable branches for as long as they remain core reviewers.
>Usually I've done that by adding them to the heat-release group.
> 
>At some point the system changed so that the review rights for these
>branches are no longer under the team's control (instead, the
>stable-maint core team is in charge), and as a result at least the
>current PTL (Rico Lin) and the previous PTL (Rabi Mishra), and
>possibly
>others (Thomas Herve, Sergey Kraynev), haven't been added to the
>group.
>That's slowing down getting backports merged, amongst other things.
> 
>I'd like to request that we update the membership to be the same as
>https://review.openstack.org/#/admin/groups/152,members
> 
>Rabi Mishra
>    Rico Lin
>Sergey Kraynev
>Steve Baker
>Steven Hardy
>Thomas Herve
>Zane Bitter
> 
>I also wonder if the stable-maint team would consider allowing the
>Heat
>team to manage the group membership again if we commit to the criteria
>above (all current/former PTLs who are also core reviewers) by just
>adding that group as a member of heat-stable-maint?
> 
>thanks,
>Zane.
> 
>
> __
>OpenStack Development Mailing List (not for usage questions)
>Unsubscribe:
>openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
>  Reviewing patches on stable branches has different guidelines,
>  expressed here [1]. In the past when this comes up I've asked if the
>  people being asked to be added to the stable team for a project have
>  actually been doing reviews on the stable branches to show they are
>  following the guidelines, and at times when this has come up the people
>  proposed (usually PTLs) haven't, so I've declined at that time until
>  they start actually doing reviews and can show they are following the
>  guidelines.
> 
>  There are reviewstats tools for seeing the stable review numbers for
>  Heat, I haven't run that though to check against those proposed above,
>  but it's probably something I'd do first before just adding a bunch of
>  people.
> 
>Would it not be appropriate to trust the stable cross-project liaison for
>heat when he nominates stable cores? Having been the PTL for Ocata and one
>who struggled to get the backports on time for a stable release as
>planned, I don't recall seeing many reviews from stable maintenance core
>team for them to be able to judge the quality of reviews. So I don't think
>it's fair to decide eligibility only based on the review numbers and
>stats.

I agree - those nominated by Zane are all highly experienced reviewers and
as ex-PTLs are well aware of the constraints around stable backports and
stable release management.

I do agree the requirements around reviews for stable branches are very
different, but I think we need to assume good faith here and accept we have
a bottleneck which can be best fixed by adding some folks we *know* are
capable of exercising sound judgement to the stable-maint team for heat.

I respect the arguments made by the stable-maint core folks, and I think we
all understand the reason for these concerns, but ultimately unless folks
outside the heat core team are offering to help with reviews directly, I
think it's a little unreasonable to block the addition of these reviewers,
given they've been proposed by the current stable liaison, who I think is in
the best position to judge the suitability of candidates.

Thanks,

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] FFE request for composable upgrades

2017-02-03 Thread Steven Hardy
Hi all,

There's been some discussion on IRC about this, but I wanted to formally
request a FFE for the remaining composable upgrades work and provide a
status update:

https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-per-service

We've made pretty good progress on this, and the ansible based upgrade
architecture, along with tasks for most overcloud services has landed.

However we've been blocked for a while due to the complexity of upgrading
nova - some work was required to get puppet support for cellsv2, and to
figure out the required steps related to the placement API integration.

I think that's all close now, and folks have been testing locally with some
success:

https://review.openstack.org/#/c/405241/

The other missing piece is pacemaker support, which again has been locally
tested but not yet proven via CI:

https://review.openstack.org/#/c/403397/

The final part is deploying the upgrade script for operator driven upgrades
when automated upgrades are disabled (this is similar to previous releases,
but required some rework due to the new architecture):

https://review.openstack.org/#/c/424905/
https://review.openstack.org/#/c/419886/

There are also a few other various fixes and cleanups outstanding:

https://review.openstack.org/#/c/428309
https://review.openstack.org/#/c/428310/
https://review.openstack.org/#/c/428348/
https://review.openstack.org/#/c/428349/
https://review.openstack.org/#/c/424715/

I think that's all of the outstanding work, but we're still testing and
trying to flush out any remaining issues - if I've missed anything
hopefully other folks on the upgrades squad can follow up with more
details.

Overall I think this is OK for the final release, but we should aim to land
the patches above ASAP so we can focus on testing and any remaining
bugfixes.

Thanks!

-- 
Steve Hardy
Red Hat Engineering, Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] tripleo-heat-templates, vendor plugins and the new hiera hook

2017-01-25 Thread Steven Hardy
On Wed, Jan 25, 2017 at 02:59:42PM +0200, Marios Andreou wrote:
> Hi, as part of the composable upgrades workflow shaping up for Newton to
> Ocata, we need to install the new hiera hook that was first added with
> [1] and disable the old hook and data as part of the upgrade
> initialization [2]. Most of the existing hieradata was ported to use the
> new hook in [3]. The deletion of the old hiera data is necessary for the
> Ocata upgrade, but it also means it will break any plugins still using
> the 'old' os-apply-config hiera hook.
> 
> In order to be able to upgrade to Ocata any templates that define hiera
> data need to be using the new hiera hook and then the overcloud nodes
> need to have the new hook installed (installing is done in [2] as a
> matter of necessity, and that is what prompted this email in the first
> place). I've had a go at updating all the plugin templates that are
> still using the old hiera data with a review at [4], which I have -1'd for now.
> 
> I'll try and reach out to some individuals more directly as well but
> wanted to get the review at [4] and this email out as a first step,

Thanks for raising this, marios, and yeah, it's unfortunate that we've had
to switch from the old to the new hiera hook this release without a
transition period where both work.

I think we probably need to do the following:

1. Convert anything in t-h-t referring to the old hook to the new (seems you
have this in progress; we need to ensure it all lands before Ocata)

2. Write a good release note for t-h-t explaining the change, referencing
docs which show how to convert to use the new hook

3. Figure out a way to make the 99-refresh-completed script signal failure
if anyone tries to deploy with the old hook (vs potentially silently
failing then hanging the deploy, which I think is what will happen atm).

I think ensuring a good error path should mitigate this change, since it's
fairly simple for folks to switch to the new hook, provided we can document
it and point to those docs in the error message.

Be good to get input from Dan on this too, as he might have ideas on how we
could maintain both hooks for one release.

Thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Honza Pokorny core on tripleo-ui

2017-01-25 Thread Steven Hardy
On Tue, Jan 24, 2017 at 08:52:51AM -0500, Emilien Macchi wrote:
> I have been discussed with TripleO UI core reviewers and it's pretty
> clear Honza's work has been valuable so we can propose him part of
> Tripleo UI core team.
> His quality of code and reviews make him a good candidate and it would
> also help the other 2 core reviewers to accelerate the review process
> in UI component.
> 
> Like usual, this is open for discussion, Tripleo UI core and TripleO
> core, please vote.

+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Proposing Sergey (Sagi) Shnaidman for core on tripleo-ci

2017-01-25 Thread Steven Hardy
On Tue, Jan 24, 2017 at 07:03:56PM +0200, Juan Antonio Osorio wrote:
>  Sagi (sshnaidm on IRC) has done significant work in TripleO CI (both
>  on the current CI solution and in getting tripleo-quickstart jobs for
>  it); So I would like to propose him as part of the TripleO CI core team.
>  I think he'll make a great addition to the team and will help move CI
>  issues forward quicker.

+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Update TripleO core members

2017-01-25 Thread Steven Hardy
On Mon, Jan 23, 2017 at 02:03:28PM -0500, Emilien Macchi wrote:
> Greeting folks,
> 
> I would like to propose some changes in our core members:
> 
> - Remove Jay Dobies who has not been active in TripleO for a while
> (thanks Jay for your hard work!).
> - Add Flavio Percoco core on tripleo-common and tripleo-heat-templates
> docker bits.
> - Add Steve Backer on os-collect-config and also docker bits in
> tripleo-common and tripleo-heat-templates.
> 
> Indeed, both Flavio and Steve have been involved in deploying TripleO
> in containers, their contributions are very valuable. I would like to
> encourage them to keep doing more reviews in and out container bits.
> 
> As usual, core members are welcome to vote on the changes.

+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Upstream backwards compatibility job for Newton oooq

2017-01-17 Thread Steven Hardy
On Tue, Jan 17, 2017 at 10:42:18AM -0500, Emilien Macchi wrote:
> On Tue, Jan 17, 2017 at 9:34 AM, mathieu bultel  wrote:
> > Hi Adriano
> >
> > On 01/17/2017 03:05 PM, Adriano Petrich wrote:
> >
> > So I want to make a backwards compatibility job upstream. From the last
> > scrum I got the feeling that we should not be adding more stuff to the
> > experimental jobs due to lack of resources (and large queues).
> >
> > What kind of "test" do you want to add?
> > I ask because since few days we have upstream an upgrade job that does:
> > master UC -> deploying a Newton OC with Newton OC + tht stable/newton ->
> > then upgrade the OC to master with tht master branch.
> > It sounds like a "small backward compatibility" validation, but I'm not
> > sure if it covers what you need.
> 
> While I understand the idea, I don't see the use case.
> In which case would you want to deploy an old version of the overcloud
> using a recent undercloud?
> Why not deploy a stable undercloud to deploy a stable overcloud?

For development & test usage it's actually really useful - I can deploy any
overcloud version locally for testing (any kind of overcloud bugfix, etc.,
not only testing upgrades), and it's even possible to deploy two overclouds
at once, with different versions, to easily do comparative testing.

This is something we get "for free" because TripleO is using Heat/Glance
etc which provide stable interfaces, and although I accept it's something
of a specialist use-case, I think it is a valid one.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Upstream backwards compatibility job for Newton oooq

2017-01-17 Thread Steven Hardy
On Tue, Jan 17, 2017 at 02:48:27PM +, Adriano Petrich wrote:
>Mathieu,
>    That sounds like exactly what we need. Do we run tempest or something on
>those to validate it?

It doesn't currently run tempest, only some basic sanity tests (crud
operations where we create some resources for each service before the
upgrade, then check they are still there after the upgrade is completed).

In future we could probably add more validation, but we're constrained by
walltime of the job.

As Mathieu says, this does provide at least partial coverage of deploying an
old overcloud version (e.g. Newton) with a latest (trunk/ocata) undercloud -
hopefully we can adjust the upgrade test coverage to meet your needs and
avoid the overhead of a completely new job.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

2017-01-05 Thread Steven Hardy
On Thu, Jan 05, 2017 at 02:56:15PM +, arkady.kanev...@dell.com wrote:
> I have concerns about relying on the undercloud for overcloud Swift.
> The undercloud is not HA (yet), so it may not be operational when a disk
> fails or a Swift overcloud node is added/deleted.

I think the proposal is only for a deploy-time dependency; after the
overcloud is deployed there should be no dependency on the undercloud
Swift, because the ring data will have been copied to all the nodes.

During create/update operations you need the undercloud operational by
definition, so I think this is probably OK?

Steve
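For reference, the temporary URLs in the proposal quoted below are most
naturally Swift TempURLs: HMAC-SHA1 signatures over method, expiry and
object path. A minimal sketch (the account/container path and key are
made-up examples, not anything from the actual patches):

```python
import hmac
from hashlib import sha1
from time import time


def make_temp_url(path, key, method="GET", ttl=3600):
    """Build a Swift TempURL query suffix for `path`.

    `key` is the account's X-Account-Meta-Temp-URL-Key.  The signature
    covers method, expiry and path, so the GET url handed to the nodes
    cannot be reused to overwrite the rings, and vice versa.
    """
    expires = int(time()) + ttl
    hmac_body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%d" % (path, sig, expires)


# One URL per direction of the ring sync (illustrative path only):
ring_path = "/v1/AUTH_demo/overcloud-rings/rings.tar.gz"
get_url = make_temp_url(ring_path, "secret-key", "GET")
put_url = make_temp_url(ring_path, "secret-key", "PUT")
```

Because the method is part of the signed body, handing the nodes separate
GET and PUT URLs (as in steps 1, 2 and 4) limits what each phase of the
deployment can do with the undercloud Swift container.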
> 
> -Original Message-
> From: Christian Schwede [mailto:cschw...@redhat.com] 
> Sent: Thursday, January 05, 2017 6:14 AM
> To: OpenStack Development Mailing List 
> Subject: [openstack-dev] [TripleO] Fixing Swift rings when 
> upscaling/replacing nodes in TripleO deployments
> 
> Hello everyone,
> 
> there was an earlier discussion on $subject last year [1] regarding a bug 
> when upscaling or replacing nodes in TripleO [2].
> 
> Shortly summarized: Swift rings are built on each node separately, and if 
> adding or replacing nodes (or disks) this will break the rings because they 
> are no longer consistent across the nodes. What's needed are the previous 
> ring builder files on each node before changing the rings.
> 
> My former idea in [1] was to build the rings in advance on the undercloud, 
> and also using introspection data to gather a set of disks on each node for 
> the rings.
> 
> However, this changes the current way of deploying significantly, and also 
> requires more work in TripleO and Mistral (for example to trigger a ring 
> build on the undercloud after the nodes have been started, but before the 
> deployment triggers the Puppet run).
> 
> I prefer smaller steps to keep everything stable for now, and therefore I 
> changed my patches quite a bit. This is my updated proposal:
> 
> 1. Two temporary undercloud Swift URLs (one PUT, one GET) will be computed 
> before Mistral starts the deployments. A new Mistral action to create such 
> URLs is required for this [3].
> 2. Each overcloud node will try to fetch rings from the undercloud Swift 
> deployment before updating its set of rings locally using the temporary GET
> url. This guarantees that each node uses the same source set of builder
> files. This happens in step 2. [4]
> 3. puppet-swift runs like today, updating the rings if required.
> 4. Finally, at the end of the deployment (in step 5) the nodes will upload 
> their modified rings to the undercloud using the temporary PUT urls. 
> swift-recon will run before this, ensuring that all rings across all nodes 
> are consistent.
> 
> The two required patches [3][4] are not overly complex IMO, but they solve 
> the problem of adding or replacing nodes without changing the current 
> workflow significantly. It should be even easy to backport them if needed.
> 
> I'll continue working on an improved way of deploying Swift rings (using 
> introspection data), but using this approach it could be even done using 
> todays workflow, feeding data into puppet-swift (probably with some updates 
> to puppet-swift/tripleo-heat-templates to allow support for regions, zones, 
> different disk layouts and the like). However, all of this could be built on 
> top of these two patches.
> 
> I'm curious about your thoughts and welcome any feedback or reviews!
> 
> Thanks,
> 
> -- Christian
> 
> 
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2016-August/100720.html
> [2] https://bugs.launchpad.net/tripleo/+bug/1609421
> [3] https://review.openstack.org/#/c/413229/
> [4] https://review.openstack.org/#/c/414460/
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- 
Steve Hardy
Red Hat Engineering, Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Adding a LateServices ResourceChain

2017-01-04 Thread Steven Hardy
On Tue, Jan 03, 2017 at 03:20:37PM -0500, Lars Kellogg-Stedman wrote:
> On Fri, Dec 23, 2016 at 02:21:00PM +0000, Steven Hardy wrote:
> > I commented on the bug, I'm not sure about this as it seems to overlap with
> > our service_config_settings interface, which IIRC landed slightly after
> > your previous patches for opstools things, and potentially provides a
> > cleaner way to approach this.
> 
> I'm not sure I see how to apply that, but let me further describe the
> use case and you can perhaps point me in the right direction.
> 
> > Perhaps you can point to some examples of this usage, then we can compare
> > with the service_config_settings approach?
> > 
> > I suspect the main difference is you need to append data for each service
> > to e.g the collectd configuration?
> 
> Let's take the existing Fluentd support as an example.  We want the
> ability for every service to provide a logging source configuation for
> Fluentd, which will get aggregated into a logging_sources
> list and then ultimately used in puppet-tripleo to populate a series
> of ::fluentd::config resources.
> 
> Currently, the aggregation happens in a combination of
> puppet/services/services.yaml (which aggregates the logging_source
> attribute from all the services in the service chain) and in
> overcloud.j2.yaml (which actually instantiates the hiera data).
> 
> With the LateServiceChain I've proposed, this could all be isolated
> inside the fluentd composable service: it would not be necessary to
> expose any of this logic in either services.yaml or overcloud.j2.yaml.
> Additionally, it would not require the fluentd service to
> have any a priori knowledge of what services were in use; it would
> simply aggregate any configuration information that is provided in the
> primary service chain.  It would also allow us to get rid of the
> "LoggingConfiguration" resource, which only exists as a way to expose
> certain parameter_defaults inside services.yaml.

Ok, so it's very similar to service_config_settings, except you need to
append to a list of settings for all services?

The problem I have with your current solution (as just mentioned on the
review) is that it forces operators to care about service ordering, because
it duplicates the per-role *Services parameters.

A key part of the custom roles work was to provide a simple interface that
enables operators to select specific services for each role, without having
to care at all about ordering (this is handled by the deployment steps in
the puppet profiles).

Thinking of ways around this, a couple of options come to mind:

1. Add a new "append" version of service_config_settings

(example based on https://review.openstack.org/#/c/411048)

  # In services/nova-compute.yaml
  service_config_settings_list:
  collectd:
tripleo::profile::base::metrics::collectd::collectd_plugins:
  - virt

  # In services/foo.yaml
  service_config_settings_list:
  collectd:
tripleo::profile::base::metrics::collectd::collectd_plugins:
  - foo

  Then on the role running collectd, you'd get the hieradata key set
with a list containing "virt" and "foo"?

This would require some data mangling in services.yaml, and a new interface
to the services templates, but it would leave the operator interfaces
unchanged?
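To make the data mangling concrete, here's a rough Python sketch of the
append-merge semantics option (1) implies (plain Python, not the actual
t-h-t/yaql code; the interface name is the one proposed above):

```python
def merge_service_config_settings_lists(services):
    """Merge per-service settings fragments, appending list values.

    `services` maps service name -> {target_service: {hiera_key: [values]}},
    i.e. the proposed service_config_settings_list sections.  The result
    maps target_service -> merged hieradata, with list values appended in
    the order the services appear -- so operators never have to care about
    ordering in the per-role *Services parameters.
    """
    merged = {}
    for fragments in services.values():
        for target, settings in fragments.items():
            target_settings = merged.setdefault(target, {})
            for key, values in settings.items():
                target_settings.setdefault(key, []).extend(values)
    return merged


PLUGINS = "tripleo::profile::base::metrics::collectd::collectd_plugins"
services = {
    "nova-compute": {"collectd": {PLUGINS: ["virt"]}},
    "foo": {"collectd": {PLUGINS: ["foo"]}},
}
hieradata = merge_service_config_settings_lists(services)
# hieradata["collectd"][PLUGINS] is now ["virt", "foo"]
```

The equivalent of this loop would live in services.yaml, leaving each
service template declaring only its own fragment.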

2. Do the list manipulation in puppet, like we do for firewall rules

E.g see:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ceilometer-api.yaml#L62

https://github.com/openstack/puppet-tripleo/blob/master/manifests/firewall/service_rules.pp#L32

This achieves the same logical result as the above, but it does the list
manipulation in the puppet profile instead of t-h-t.

I think either approach would be fine, but I've got a slight preference for
(1) as I think it may be more reusable in a future non-puppet world, e.g.
for container deployments etc., where we may not always want to use puppet.

Open to other suggestions, but would either of the above solve your
problem?

Thanks,

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] [ci] TripleO-Quickstart Transition to TripleO-CI Update and Invite:

2017-01-04 Thread Steven Hardy
Hi Harry,

On Tue, Jan 03, 2017 at 04:04:51PM -0500, Harry Rybacki wrote:
> Greetings All,
> 
> Folks have been diligently working on the blueprint[1] to prepare
> TripleO-Quickstart (OOOQ)[2] and TripleO-Quickstart-Extras[3] for
> their transition into TripleO-CI. Presently, our aim is to begin the
> actual transition to OOOQ on 4-Feb-2017. We are tracking our work on
> the RDO-Infra Trello board[4] and holding public discussion of key
> blockers on the team’s scrum etherpad[5].

Thanks for the update - can you please describe what "transition into
TripleO-CI" means?

I'm happy to see this work proceeding, but we have to be mindful that the
end of the development cycle (around the time you're proposing) is always
a crazy-busy time where folks are trying to land features and fixes.

So, we absolutely must avoid any CI outages around this time, thus I get
nervous talking about major CI transitions around the release-candidate
weeks ;)

https://releases.openstack.org/ocata/schedule.html

If we're talking about getting the jobs ready, then switching over to
primarily oooq jobs in early Pike, that's great, but please let's ensure we
don't make any disruptive changes before the end of this (very short and
really busy) cycle.

> We are hosting weekly transition update meetings (1600-1700 UTC) and
> would like to invite folks to participate. Specifically, we are
> looking for at least one stakeholder in the existing TripleO-CI to
> join us as we prepare to migrate OOOQ. Attend and map out job/feature
> coverage to identify any holes so we can begin plugging them. Please
> reply off-list or reach out to me (hrybacki) on IRC to be added to the
> transition meeting calendar invite.

Why can't we discuss this in the weekly TripleO IRC meeting?

I think folks would be fine with having a standing item where we discuss
this transition (there is already a CI item, but I've rarely seen this
topic raised there).

https://wiki.openstack.org/wiki/Meetings/TripleO

Thanks!

Steve



Re: [openstack-dev] [tripleo] Adding a LateServices ResourceChain

2016-12-23 Thread Steven Hardy
On Thu, Dec 22, 2016 at 02:02:12PM -0500, Lars Kellogg-Stedman wrote:
> I've been working along with a few others on some "opstools"
> composable services for TripleO: that is, services that provide
> things like centralized logging, performance monitoring, or
> availability/health monitoring.
> 
> We've been running into a persistent problem with TripleO's
> architecture: our services in many cases need configuration
> information to be provided by other services in the stack, but there's
> no way to introspect this data from inside a composable service
> template.  This has led to some rather messy and invasive changes to
> things like puppet/services/services.yaml.
> 
> In https://review.openstack.org/#/c/413748/, I've proposed the
> addition of a secondary chain of services called LateServiceChain.
> This is, like the existing ServiceChain resource, just a Heat
> ResourceChain.  Unlike the existing ServiceChain, it receives an
> additional "RoleData" parameter that contains the role_data outputs
> from all the services realized in ServiceChain.

I commented on the bug, I'm not sure about this as it seems to overlap with
our service_config_settings interface, which IIRC landed slightly after
your previous patches for opstools things, and potentially provides a
cleaner way to approach this.

We originally added this because there was a need to wire configuration
data from any node to the nodes running e.g keystone, but it seems like the
use-case you're describing is very similar?

> This permits composable services in the LateServices chain to access
> per-service configuration information provided by the other services,
> leading to much cleaner implementations of these auxiliary services.
> 
> I am attempting to use this right now for a collectd composable
> service implementation, but this model would ultimately allow us to
> remove several of the changes made in services.yaml to support Sensu
> and Fluentd and put them back into the appropriate composable service
> templates.

Perhaps you can point to some examples of this usage, then we can compare
with the service_config_settings approach?

I suspect the main difference is you need to append data for each service
to e.g the collectd configuration?

Steve



Re: [openstack-dev] [TripleO] Network Configuration in TripleO UI

2016-12-17 Thread Steven Hardy
On Thu, Dec 15, 2016 at 03:46:05PM -0600, Ben Nemec wrote:
> 
> 
> On 12/08/2016 08:05 AM, Jiri Tomasek wrote:
> > Hi all,
> > 
> > I've been investigating how to implement TripleO network configuration
> > in TripleO UI. Based on my findings I'd like to propose a solution.
> > 
> > tl;dr proposal: Slightly refactor Network environment files to match GUI
> > usage, Use Jinja Templating to generate dynamic parts of the
> > templates/environments
> > 
> > 
> > # Overview
> > 
> > I've used Ben Nemec's amazing Network template generator as a reference
> > to help me understand how the network configuration works [1]. In
> > general the process of configuring the network in TripleO is:
> > 
> > Define which Networks we intend to use -> Assign Roles to the Networks
> > (+ Assign Role Services to the Network) -> Generate NIC config templates
> > based on previous information
> > 
> > 
> [snip]
> > # The proposal
> > 
> > So having described previous, here is the approach I think we should use
> > to achieve network configuration using TripleO UI:
> > 
> > 1. Put networks definitions into separate environment for each network:
> > - this way GUI can provide a list of networks available to use and let
> > user select which of them he wants to use. These environments are not
> > dynamic and if user wants to add a new network, he does so by creating
> > new templates and environment for it. UI also provides means to
> > configure parameters for each network at this point (if needed).
> > 
> > For example the environment for a Storage Network looks like this:
> > 
> > resource_registry:
> >   OS::TripleO::Network::Storage: ../network/storage.yaml
> >   OS::TripleO::Network::Ports::StorageVipPort:
> > ../network/ports/storage.yaml
> 
> This seems like an obvious improvement, and is essentially how my tool works
> too except that it munges all of the individual environments back together
> at the end.
> 
> Definite +1 from me.
> 
> > 
> > 2. Assign Roles to Networks
> > Having the Networks selected as well as Roles defined, TripleO UI
> > provides user with means to assign Roles to Networks. This step involves
> > generating the network-environment.yaml file. So TripleO UI sends the
> > mapping of roles to network in json format to tripleo-common which in
> > turn uses network-isolation.j2.yaml Jinja template to generate the
> > environment file. I expect that pre-defined network-isolation.yaml will
> > be included in default plan so the user does not need to start from
> > scratch. Tripleo-common also provides an action to fetch network-roles
> > assignment data by parsing the network-isolation.yaml
> > 
> > In addition, user is able to assign individual Role Services to a
> > Network. ServiceNetMap parameter is currently used for this. GUI needs
> > to make sure that it represents Services-Networks assignment grouped by
> > Role so it is ensured that user assigns Services to only networks where
> > their Role is assigned.
> 
> This sounds reasonable to me, but I do want to note that assigning roles to
> networks feels a little backwards to me.  I tend to think of a role as kind
> of the top level thing here, to which we assign other things (services and
> networks, for example).  So my mental model kind of looks like:

I agree - we already have a list of roles (in roles_data.yaml), so the
approach I've been experimenting with is to also define a list of networks,
then we can assign those networks to each role in roles_data.yaml (or some
other file containing the same data).

> 
> roles:
>   Controller:
> networks:
>   - provision
>   - external
>   - internal
>   ...

+1 this is pretty much what I have proposed here:

https://review.openstack.org/#/c/409920/2/roles_data.yaml

Then the missing piece is a list of networks (so that in future we can have
fully composable or custom networks), I proposed a network_data.yaml here:

https://review.openstack.org/#/c/409921/

There is some discussion about if we do that, or instead have a single file
e.g "plan_data.yaml" or similar, but logically I think both of these fit
with your preferred mental model? :)
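To illustrate the network_data.yaml idea, entries in such a file might look roughly like the sketch below. Names and CIDRs are only examples, and the exact schema was still under review at the time of writing:

```yaml
# Sketch of possible network_data.yaml entries (illustrative only;
# see https://review.openstack.org/#/c/409921/ for the actual proposal).
- name: Storage
  vip: true
  ip_subnet: '172.16.1.0/24'
- name: StorageMgmt
  vip: true
  ip_subnet: '172.16.3.0/24'
```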

>   Compute:
> networks:
>   -provision
>   ...
> 
> as opposed to what I think you're describing, which is
> 
> networks:
>   Provision:
> roles:
>   - controller
>   - compute
>   External:
> roles:
>   - controller
>   ...
> 
> Maybe there are benefits to modeling it as the latter, but I think the
> former fits better with the composable roles architecture.

I agree, this second approach is awkward wrt the custom roles architecture,
and I'd prefer the former.

Thanks!

Steve



Re: [openstack-dev] [Heat][TripleO] How to run mistral workflows via templates

2016-12-16 Thread Steven Hardy
On Fri, Dec 16, 2016 at 04:02:51PM +0100, Thomas Herve wrote:
> On Fri, Dec 16, 2016 at 2:57 PM, Steven Hardy <sha...@redhat.com> wrote:
> > On Fri, Dec 16, 2016 at 02:03:10PM +0100, Thomas Herve wrote:
> >> On Fri, Dec 16, 2016 at 1:17 PM, Giulio Fidente <gfide...@redhat.com> 
> >> wrote:
> >> > I was wondering if it would make sense to have a property for the 
> >> > existing
> >> > Workflow resource to let the user decide if the workflow should *also* be
> >> > triggered on CREATE/UPDATE? And if it would make sense to block the 
> >> > Workflow
> >> > resource until the execution result is returned in that case?
> >>
> >> I'm not super in favor of that. It's conflating 2 different concepts here.
> >
> > Well, they're already conflated in the mistral workflow resource, because
> > we allow creating an execution by sending a signal:
> >
> > https://github.com/openstack/heat/blob/master/heat/engine/resources/openstack/mistral/workflow.py#L586
> 
> Right, I mentioned that elsewhere. But it doesn't change the resource
> interface, for better or worse.
> 
> > That satisfies the requirement for asynchronous (signal/alarm driven)
> > workflow execution, but we need the synchronous version that can be fully
> > integrated into the heat stack lifecycle (without external dependencies on
> > alarms etc).
> >
> > I know we've previously tried to steer execute/run type actions to signal
> > driven interfaces (and I myself have opposed these kinds of resources in
> > the past, to be honest).  However, I can't currently see a better way to
> > handle this requirement, and given it may be pretty easy to fix (refactor
> > handle_signal and add a boolean to each handle_foo function), I think this
> > discussion warrants revisiting.
> 
> It's unclear what changed since the discussion happened, just that you
> have a use case without another approach possible?

Well, honestly yes - my perspective has changed somewhat as I've had recent
first-hand exposure to several problems which would be very nicely solved
by the additional Execution resource (or interface) under discussion.

Also, it's unclear if the hinted-at future model involving zaqar
notifications actually solves this (or at least not in its current form),
so I'm looking at this from a pragmatic standpoint and trying to find a
relatively low-impact way to just solve the problem and move forward.

E.g in https://review.openstack.org/#/c/267770/ the execution resource was
rejected based partly on the expectation of a future Zaqar driven model
satisfying the same requirement, which was futher described in this summit
talk:

https://www.openstack.org/videos/video/building-self-healing-applications-with-aodh-zaqar-and-mistral

While the patterns discussed there are definitely compelling, they still
don't address the requirement of wanting to just run some mistral workflow
as part of the heat stack lifecycle?

AFAICT it's describing an independent (of Heat) system which can take
actions in an event/alarm driven manner e.g the interaction between
telemetry services, Zaqar and Mistral are configured by, but essentially
asynchronous to Heat - all great, but doesn't solve our problem?

> >> > Alternatively, would an ex-novo Execution resource make more sense?
> >>
> >> We had some discussions here: https://review.openstack.org/267770.
> >> Executing things as part of a template is a tricky proposition. I
> >> think we'd like it to be more akin to software deployments, where it
> >> runs on actions. We also were talking about doing something like AWS
> >> CustomResource in Heat, which may look like WorkflowExecution (at
> >> least one part of it).
> >
> > Yeah I think whichever approach we take, a list of actions similar to
> > SoftwareDeployment makes sense, then you can elect to run a specific
> > workflow at any state transition in the lifecycle.
> 
> Among the various solutions, I think that's the one I like the best
> for now. It doesn't touch the workflow resource interface, and it
> seems to fit relatively naturally (an API to call, a state to check,
> etc).

Yeah, honestly I think it's a simple step which some folks will find
useful, and it's likely to be low-impact from a heat perspective (folks can
continue to prefer the Aodh/Zaqar approach if it suits their use-case
better).

As always, open to suggestions of a better way to approach this, but atm
I'm feeling like just going ahead with the WorkflowExecution resource
(ensuring it can handle all lifecycle actions, e.g just like
SoftwareDeployment/SoftwareComponent, which is conceptually quite similar,
as noted in 267770 ...) is the most workable option.

Thanks!

Steve



Re: [openstack-dev] [Heat][TripleO] How to run mistral workflows via templates

2016-12-16 Thread Steven Hardy
On Fri, Dec 16, 2016 at 02:03:10PM +0100, Thomas Herve wrote:
> On Fri, Dec 16, 2016 at 1:17 PM, Giulio Fidente  wrote:
> > hi,
> >
> > we're trying to address in TripleO a couple of use cases for which we'd like
> > to trigger a Mistral workflow from a Heat template.
> >
> > One example where this would be useful is the creation of the Swift rings,
> > which need some data related to the Heat stack (like the list of Swift nodes
> > and their disks), so it can't be executed in advance, yet it provides data
> > which is needed to complete successfully the deployment of the overcloud.
> >
> > Currently we can create a workflow from Heat, but we can't trigger its
> > execution and also we can't block Heat on the result of the execution.
> 
> You can trigger it out of band with resource signal. But you're right,
> Heat won't wait for the result.
> 
> > I was wondering if it would make sense to have a property for the existing
> > Workflow resource to let the user decide if the workflow should *also* be
> > triggered on CREATE/UPDATE? And if it would make sense to block the Workflow
> > resource until the execution result is returned in that case?
> 
> I'm not super in favor of that. It's conflating 2 different concepts here.

Well, they're already conflated in the mistral workflow resource, because
we allow creating an execution by sending a signal:

https://github.com/openstack/heat/blob/master/heat/engine/resources/openstack/mistral/workflow.py#L586

That satisfies the requirement for asynchronous (signal/alarm driven)
workflow execution, but we need the synchronous version that can be fully
integrated into the heat stack lifecycle (without external dependencies on
alarms etc).

I know we've previously tried to steer execute/run type actions to signal
driven interfaces (and I myself have opposed these kinds of resources in
the past, to be honest).  However, I can't currently see a better way to
handle this requirement, and given it may be pretty easy to fix (refactor
handle_signal and add a boolean to each handle_foo function), I think this
discussion warrants revisiting.

> > Alternatively, would an ex-novo Execution resource make more sense?
> 
> We had some discussions here: https://review.openstack.org/267770.
> Executing things as part of a template is a tricky proposition. I
> think we'd like it to be more akin to software deployments, where it
> runs on actions. We also were talking about doing something like AWS
> CustomResource in Heat, which may look like WorkflowExecution (at
> least one part of it).

Yeah I think whichever approach we take, a list of actions similar to
SoftwareDeployment makes sense, then you can elect to run a specific
workflow at any state transition in the lifecycle.
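As a straw-man, a hypothetical execution resource modelled on SoftwareDeployment's actions property might look like the sketch below. The resource type name and properties here are illustrative, not an agreed interface:

```yaml
resources:
  ring_build:
    type: OS::Mistral::WorkflowExecution   # hypothetical type name
    properties:
      workflow: {get_resource: swift_ring_workflow}
      # run the workflow synchronously, but only on these stack
      # lifecycle actions (mirroring SoftwareDeployment semantics)
      actions: [CREATE, UPDATE]
      input:
        swift_nodes: {get_attr: [swift_servers, networks]}
```

Heat would then block on the execution result before marking the resource CREATE_COMPLETE, which is the synchronous behaviour the Swift rings use-case needs.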

> > Or are there different ideas, approaches to the problem?
> 
> If you could define the event outside of Heat (in your example,
> publish something when a swift node is available), then you could use
> Zaqar to trigger your workflow. If you want Heat to block that won't
> do it though.

Yeah, that doesn't solve our use-case: we want to run a workflow during an
overcloud stack create, wait for the result, then continue (or fail).

Thanks!

Steve



Re: [openstack-dev] [TripleO] Re-defining network templates/isolation

2016-12-12 Thread Steven Hardy
On Mon, Dec 12, 2016 at 12:12:30PM -0500, Tim Rozet wrote:
> Hello,
> I wanted to get thoughts about re-thinking how users configure and create new 
> networks with OOO.  The current way to configure network settings for a 
> deployment requires creating nic + network environment templates, and 
> updating the network isolation resource registry.  I think a better approach 
> could consolidate all of the network settings for a deployment into a single 
> yaml file, and then parse that information to create the appropriate nic and 
> network env templates.  We do that in OPNFV Apex with a combination of python 
> and jinja2 using this unified template format:
> 
> https://github.com/opnfv/apex/blob/master/config/network/network_settings.yaml

Thanks for sharing, and for raising this issue Tim.

Strangely enough I was thinking along similar lines recently and I started
hacking on some prototype code, just pushed here:


https://review.openstack.org/#/c/409920
https://review.openstack.org/#/c/409921

That was originally related to fixing this bug where network isolation is
a little inconvenient to use when defining custom roles:

https://bugs.launchpad.net/tripleo/+bug/1633090

Basically I agree we need some way to define per-network data that can then
be consumed by jinja2 when we render templates for each role.

> Furthermore, consider defining new networks in OOO.  Think about how much is 
> involved in creating a new network, subnet, port definition + net_ip_map for 
> that network, VIP. If you look at the tht/network directory, almost all of 
> the templates for ports and networks have the exact same format.  I think you 
> could make the example above dynamic so that a user could define any new 
> network there and the corresponding port, network + subnet template files 
> could be created on the fly.

Yes, I agree, this could be the next step after enabling the current
networks for custom roles.  If we do the j2 implementation right for fixing
the bug above, I think enabling arbitrary additional networks e.g via some
j2 loops shouldn't be too much additional work.
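For example, once a list of networks is available at template-rendering time, the per-network port resources could be emitted with a simple j2 loop along these lines. This is a sketch only, assuming a `networks` list is passed to the renderer:

```yaml
# Sketch of a j2-rendered t-h-t fragment (e.g. some foo.j2.yaml);
# assumes the rendering step is given a list of network definitions.
{%- for network in networks %}
  {{network.name}}Port:
    type: OS::TripleO::Controller::Ports::{{network.name}}Port
    properties:
      ControlPlaneIP: {get_attr: [Controller, networks, ctlplane, 0]}
{%- endfor %}
```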

> I think this creates a much more simple interface for users by exposing 
> networking configuration they need, but also hiding redundant OOO/heat 
> template syntax they don't necessarily care about.  Thoughts?

So, yeah basically I agree - we should reduce the duplication between
templates e.g for nic configuration, and j2 render them where possible for
each role/network.

The trick here will be doing it so that we maintain backwards compatibility
- if we're careful that's probably possible, but we'll have to figure out
ways of testing that ensure we don't break existing users.

My suggestion would be to refactor things to resolve the bug above, and
possibly also https://bugs.launchpad.net/tripleo/+bug/1625558 which I think
should really be fixed by generating the nic configs, not adding even more
example templates.

If we can do some of that during the Ocata timeframe, I expect fully
composable/custom networks may be possible during Pike?

Steve



Re: [openstack-dev] [tripleo] Requesting files from the overcloud from the undercloud

2016-12-12 Thread Steven Hardy
On Wed, Nov 30, 2016 at 01:54:34PM -0700, Alex Schultz wrote:
> Hey folks,
> 
> So I'm in the process of evaluating options for implementing the
> capture-environment-status-and-logs[0] blueprint.  At the moment my
> current plan is to implement a mistral workflow to execute the
> sosreport to bundle the status and logs up on the requested nodes.
> I'm leveraging a similar concept to the remote execution[1] method
> we currently expose via 'openstack overcloud execute'.  The issue I'm
> currently running into is getting the files off the overcloud node(s)
> so that they can be returned to the tripleoclient.  The files can be
> large so I don't think they are something that can just be returned as
> output from Heat.  So I wanted to ask for some input on the best path
> forward.
> 
> IDEA 1: Write something (script or utility) to be executed via Heat on
> the nodes to push the result files to a container on the undercloud.
> Pros:
> - The swift container can be used by the mistral workflow for other
> actions as part of this bundling
> - The tripleoclient will be able to just pull the result files
> straight from swift
> - No additional user access needs to be created to perform operations
> against the overcloud from the undercloud
> Cons:
> - Swift credentials (or token) need to be passed to the script being
> executed by Heat on the overcloud nodes which could lead to undercloud
> credentials being leaked to the overcloud

I think we can just use a swift tempurl?  That's in alignment with what we
already do for polling metadata from heat (the metadata is put into swift,
then we give a tempurl to the nodes; see /etc/os-collect-config.conf on the
overcloud nodes).

It's also well aligned with what we do for the DeployArtifactURLs
interface.

I guess the main difference here is we're only allowing GET access for
those cases, but here there's probably more scope for abuse, e.g POSTing
giant files from the overcloud nodes could impact e.g disk space on the
undercloud?

> - I'm not sure if all overcloud nodes would have access to the
> undercloud swift endpoint

I think they will, or the tempurl transport we use for heat won't work.

> IDEA 2: Write additional features into undercloud deployment for ssh
> key generation and inclusion into the deployment specifically for this
> functionality to be able to reach into the nodes and pull files out
> (via ssh).
> Pros:
> - We would be able to leverage these 'support' credentials for future
> support features (day 2 operations?)
> - ansible (or similar tooling) could be used to perform operations
> against the overcloud from the undercloud nodes
> Cons:
> - Complexity and issues around additional user access
> - Depending on where the ssh file transfer occurs (client vs mistral),
> additional network access might be needed.
> 
> IDEA 2a: Leverage the validations ssh key to pull files off of the
> overcloud nodes
> Pros:
> - ssh keys already exist when enable_validations = true so we can
> leverage existing
> Cons:
> - Validations can be disabled, possibly preventing 'support' features
> from working
> - Probably should not leverage the same key for multiple functions.
> 
> I'm leaning towards idea 1, but wanted to see if there was some other
> form of existing functionality I'm not aware of.

Yeah I think (1) is probably the way to go, although cases could be argued
for all approaches you mention.

My main reason for preferring (1) is I think we'll want the data to end up
in swift anyway, e.g so UI users can access it (which won't be possible if
we e.g scp some tarball from overcloud nodes into the undercloud filesystem
directly, so we may as well just push it into swift from the nodes?)
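As a sketch of idea 1, the per-node piece could be a simple script deployment that PUTs the sosreport tarball to a pre-signed tempurl generated on the undercloud and passed in as an input. All names and paths below are illustrative, not an agreed design:

```yaml
resources:
  collect_logs_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        # pre-signed swift tempurl generated by the mistral workflow
        - name: upload_url
      config: |
        #!/bin/bash
        set -eu
        sosreport --batch --tmp-dir /var/tmp
        # Upload the newest report; no credentials are needed on the node,
        # since the tempurl embeds a time-limited signature.
        curl -f -X PUT \
          --upload-file "$(ls -t /var/tmp/sosreport-*.tar.xz | head -1)" \
          "$upload_url"
```

The corresponding SoftwareDeployment would supply `upload_url`, so no undercloud credentials ever reach the overcloud nodes.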

Steve



Re: [openstack-dev] [ironic][heat][mistral][magnum] Manage Ironic resource in Orchestration.

2016-12-12 Thread Steven Hardy
On Mon, Dec 12, 2016 at 04:18:59PM +0800, Rico Lin wrote:
>Think about bare metal (ironic) deployment: we can call ironic directly,
>or use nova to deploy.
>Right now I have a spec about implemented Ironic resources [1] in Heat.
>Including Chassis, Port and some state setting feature for Node
>(including node-set-xxx). In heat team discussion, we tend to not
>implement Node deployment feature.
>
>If we break the Ironic actions into three parts:
>1. Create the node in the ironic DB
>2. Do a series of steps to get the node in a state where it's ready for
>deployment (this may include introspection via ironic-inspector, or just
>changing the state)
>3. Select a node and drive a workflow that deploys an image on it.

Yes, I think a heat resource is a good fit for (1), but less so for (2) and
(3), because these are really workflows.  I started looking into ways of
automating those workflows via mistral here (WIP, needs more work):

https://review.openstack.org/#/c/313048/

So what I would like is to finish that deployment workflow, then have some
way to drive it via heat, e.g:

resources:
  the_node:
type: OS::Ironic::Node
properties:
  

  node_deploy_workflow:
type: OS::Mistral::Workflow
properties:
  input:
node: {get_resource: the_node}
tasks:
  

There are two missing pieces (other than fixing the workflow example I
referenced above):

1. The Ironic node heat resource
2. Some way to actually execute the mistral workflow during the stack create

I think (1) is pretty simple, I wrote some resources that do that
previously ref https://review.openstack.org/#/c/104223/ - but that effort
stalled because at that time we didn't have a good answer to how we'd drive
the deployment workflow (IMO we do now, it's mistral).

The missing part for (2) is that currently OS::Mistral::Workflow expects a
signal to actually create a workflow execution and run the workflow.  I
think we need some other option (a boolean that says run once when we
create the resource perhaps?) to make it more convenient to drive a one-off
workflow during a stack create/update in a synchronous way.

>What we can do in Heat is use Nova for (3) and hope someone already
>handles (1) and (2). If we consider (1) and (2) also as Heat
>resources, we can actually have the entire management done in a heat template.

While we could implement the workflow directly in the heat resources, IMO
it'd be nice to consider mistral instead, unless there are objections to
making that a dependency when solving this problem.

I actually think Nova is much less interesting for many Ironic use-cases,
e.g for TripleO all we use nova for is to schedule to groups of nodes using
a very simple filter.  It'd be very nice to remove any dependency on Nova
and just drive deployments with explicit node placement directly via Ironic
(either via Heat, or Heat->Mistral, or just Mistral depending on your
preferences).

>The use case in my head was ironic+magnum case:
>Ironic resource handles state we need, then through magnum resource, nova
>will deploy that baremetal node and config it as part of COE.
>The open question is if heat really implemented such feature, who will
>benefit from it and how are they going to use it? We certainly don't want
>to implement something that no one will use it or not even think it's a
>good idea.

I think TripleO definitely would benefit from this work too - I'd just like
to see it done in a way which makes depending on Nova optional (it's a
major overhead, and for some baremetal deployment use-cases, it's not
providing much value).

>And which projects might be a good fit if it's not a good idea to do in
>heat?
>We can also think about the possibility of implementing it in the
>Nova resource in heat if it's a baremetal case, Heat+Mistral, or just
>Mistral will do.

My preference would be Heat+Mistral as discussed above, but open to other
ideas.

I don't think conflating any of this with the Nova resource is a good idea
- if we decide to implement the workflow directly in heat as an alternative
to depending on Mistral it should probably be a new resource, perhaps
implemented with a properties schema that makes overriding the normal nova
server resource easy.

I still think Heat+Mistral provides a cleaner solution tho, so I'd like to
see that further explored before committing to an internal reimplementation
of such workflow.

Thanks for reviving this topic - I'm certainly interested in helping move
this forward and/or discussing further.

Steve



Re: [openstack-dev] [heat] On allowing null as a parameter default

2016-12-08 Thread Steven Hardy
On Mon, Dec 05, 2016 at 04:27:50PM -0500, Zane Bitter wrote:
> Any parameter in a Heat template that has a default other than None is
> considered optional, so the user is not required to pass a value. Otherwise,
> however, the parameter is required and creating the stack will fail pretty
> immediately if the user does not pass a value.
> 
> I've noticed that this presents a giant pain, particularly when trying to
> create what we used to call provider templates. If you do e.g.
> 
> `openstack orchestration resource type show -f yaml --template-type hot
> OS::Nova::Server`


Yes, indeed, this is a giant pain - e.g in TripleO where we make very heavy
use of nested "provider" templates, it's really inconvenient having to work
around spurious validation issues related to get_attr returning None, and
also figuring out what values are passed as parameters via the parent
templates.  So yes, *please* lets fix this :)

> then you get back a template with dozens of parameters, most of which don't
> have defaults (because the corresponding resource properties don't have
> defaults) and are therefore not optional. I consider that a bug, because in
> many cases the corresponding resource properties are *not* required
> (properties have a "required" flag that is independent from the "default"
> value).
> 
> The result is that it's effectively impossible for our users to build
> re-usable child templates; they have to know which properties the parent
> template does and does not want to specify values for.
> 
> Using a default that corresponds to the parameter type ("", [], {}, 0,
> false) doesn't work, I don't think, because there are properties that treat
> None differently to e.g. an empty dict.
> 
> The obvious alternative is to use a different sentinel value, other than
> None, for determining whether a parameter default is provided and then
> allowing users to pass null as default. We could then adjust the properties
> code to treat this sentinel as if no value were specified for the property.
> 
> The difficulty of this is knowing how to handle other places that get_param
> might be used, especially in arguments to other functions. I guess we have
> that problem now in some ways, because get_attr often returns None up to the
> point where the resource it refers to is created. I hoped that we might get
> away from that with the placeholders spec though :/

Maybe I'm oversimplifying, but I was expecting a solution similar to the one
you describe above with a per-type sentinel, but with a reworked base class for
all intrinsic functions, so that the new sentinel value can be
transparently passed around instead of evaluated?

I guess the tricky part about this is we'd always need a type for any
attributes (e.g. including outputs, which are currently untyped).

We'd possibly also need to modify parameter constraint evaluation to ensure
the sentinel doesn't fail constraint checks, but IIRC we disable value
validation for most non-runtime validation anyway?

More thought required here but for sure I'd love to see this fixed and will
be happy to help with testing/reviews on TripleO if anyone has any WIP
patches based on the above ideas :)
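
As a rough Python sketch of the sentinel idea (the class and function names
here are illustrative only, not actual Heat internals): a unique object
distinguishes "no default at all" from an explicit null default, and the
property-application step treats the sentinel as "not specified" rather than
as a value:

```python
# Illustrative sketch only - these names are NOT actual Heat internals.
NO_DEFAULT = object()  # sentinel: "no default was given at all"


class Parameter:
    def __init__(self, name, default=NO_DEFAULT):
        self.name = name
        self.default = default

    def resolve(self, user_values):
        # An explicit user-supplied value always wins.
        if self.name in user_values:
            return user_values[self.name]
        # An explicit default (even None/null) is used as-is.
        if self.default is not NO_DEFAULT:
            return self.default
        # Otherwise propagate the sentinel instead of failing validation.
        return NO_DEFAULT


def apply_property(props, key, value):
    # Treat the sentinel as "property not specified", so the resource
    # falls back to its own behaviour rather than receiving None.
    if value is not NO_DEFAULT:
        props[key] = value
    return props


params = [Parameter("flavor", default="m1.small"),
          Parameter("metadata", default=None),  # explicit null default
          Parameter("network")]                 # no default at all

props = {}
for p in params:
    apply_property(props, p.name, p.resolve({"flavor": "m1.large"}))

print(props)  # {'flavor': 'm1.large', 'metadata': None}
```

The key point is that `metadata` (explicit null default) and `network` (no
default) end up behaving differently, which is exactly the distinction a
plain `default: null` can't express today.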

Thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Alex Schultz core on puppet-tripleo

2016-12-08 Thread Steven Hardy
On Thu, Dec 01, 2016 at 05:26:31PM -0500, Emilien Macchi wrote:
> Team,
> 
> Alex Schultz (mwhahaha on IRC) has been active in TripleO for a few
> months now.  While he's very active in different areas of TripleO, his
> reviews and contributions on puppet-tripleo have been very useful.
> Alex is a Puppet guy and also the current PTL of Puppet OpenStack. I
> think he perfectly understands how puppet-tripleo works. His
> involvement in the project and contributions on puppet-tripleo deserve
> that we allow him to +2 puppet-tripleo.
> 
> Thanks Alex for your involvement and hard work in the project, this is
> very appreciated!
> 
> As usual, I'll let the team vote about this proposal.

+1!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Proposing Julie Pichon for tripleo core

2016-11-22 Thread Steven Hardy
On Tue, Nov 22, 2016 at 05:01:49PM +, Dougal Matthews wrote:
>Hi all,
> 
>I would like to propose we add Julie (jpich) to the TripleO core team for
>python-tripleoclient and tripleo-common. This nomination is based
>partially on review stats[1] and also my experience with her reviews and
>contributions.
> 
>Julie has consistently provided thoughtful and detailed reviews since the
>start of the Newton cycle. She has made a number of contributions which
>improve the CLI and has been extremely helpful with other tasks that don't
>often get enough attention (backports, bug triaging/reporting and
>improving our processes[2]).
> 
>I think she will be a valuable addition to the review team

+1!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] proposing Michele Baldessari part of core team

2016-11-07 Thread Steven Hardy
On Fri, Nov 04, 2016 at 01:40:43PM -0400, Emilien Macchi wrote:
> Michele Baldessari (bandini on IRC) has consistently demonstrated high
> levels of contribution in TripleO projects, specifically in the High
> Availability area, where he's a guru for us (I still don't understand
> how pacemaker works, but hopefully he does).
> 
> He has done incredible work on composable services and also on
> improving our HA configuration by following reference architectures.
> Always here during meetings, and on #tripleo to give support to our
> team, he's a great team player and we are lucky to have him onboard.
> I believe he would be a great core reviewer on HA-related work and we
> expect his review stats to continue improving as his scope broadens
> over time.
> 
> As usual, feedback is welcome and please vote for this proposal!

+1 agreed, Michele has done a lot of great work recently and will make an
excellent addition to the core reviewers group.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Ocata specs

2016-11-02 Thread Steven Hardy
On Tue, Nov 01, 2016 at 05:46:48PM -0400, Zane Bitter wrote:
> On 01/11/16 15:13, James Slagle wrote:
> > On Tue, Nov 1, 2016 at 7:21 PM, Emilien Macchi  wrote:
> > > Hi,
> > > 
> > > TripleO (like some other projects in OpenStack) has not always done
> > > a good job of merging specs on time during a cycle.
> > > I would like to make progress on this topic and for that, I propose we
> > > set a deadline to get a spec approved for Ocata release.
> > > This deadline would be Ocata-1 which is week of November 14th.
> > > 
> > > So if you have a specs under review, please make sure it's well
> > > communicated to our team (IRC, mailing-list, etc); comments are
> > > addressed.
> > > 
> > > Also, I would ask our team to spend some time to review them when they
> > > have time. Here is the link:
> > > https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
> > 
> > Given that we don't always require specs, should we make the same
> > deadline for blueprints to get approved for Ocata as well?
> > 
> > In fact, we haven't even always required blueprints for all features.
> > In order to avoid any surprise FFE's towards the end of the cycle, I
> > think it might be wise to start doing so. The overhead of creating a
> > blueprint is very small, and it actually works to the implementer's
> > advantage as it helps to focus review attention at the various
> > milestones.
> > 
> > So, we could say:
> > - All features require a blueprint
> > - They may require a spec if we need to reach consensus about the feature
> > first
> > - All Blueprints and Specs for Ocata not approved by November 14th
> > will be deferred to Pike.
> > 
> > Given we reviewed all the blueprints at the summit, and discussed all
> > the features we plan to implement for Ocata, I think it would be
> > reasonable to go with the above. However, I'm interested in any
> > feedback or if anyone feels that requiring a blueprint for features is
> > undesirable.
> 
> The blueprint interface in Launchpad is kind of horrible for our purposes
> (too many irrelevant fields to fill out). For features that aren't
> big/controversial enough to require a spec, some projects have adopted a
> 'spec-lite' process. Basically you raise a *bug* in Launchpad, give it
> 'Wishlist' priority and tag it with 'spec-lite'.

I think either approach is fine and IIRC we did previously discuss the
spec-lite process and agree it was acceptable for tracking smaller
features for TripleO.

The point is we absolutely need some way to track stuff that isn't yet
landed - and I think folks probably don't care much re (Bug|Blueprint)
provided it's correctly targeted.

We had a very rough time at the end of Newton because $many folks showed up
late with features we didn't know about and/or weren't correctly tracked,
so I think a feature proposal freeze is reasonable.  Given the number of
BPs targeted at Ocata is already pretty high, I think Nov 14th is probably
justifiable, but it is on the more conservative side relative to other
projects[2].

Regarding the specs process - tbh I feel like that hasn't been working well
for a while (for all the same reasons John referenced in [1]).

So I've been leaning towards not requiring (or writing) specs in the
majority of cases, instead often we've just linked an etherpad with notes
or had a ML discussion to gain consensus on direction. (This seems pretty
similar to the wiki based approach adopted by the swift team).

Thanks,

Steve

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-May/094026.html
[2] https://releases.openstack.org/ocata/schedule.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Your draft logo & a sneak peek

2016-10-27 Thread Steven Hardy
On Wed, Oct 26, 2016 at 05:09:54PM +0200, Carlos Camacho Gonzalez wrote:
> Here you have my 2 cents,
> 
> I think the proposed draft does not fit with what we currently have/want.
> 
> I'm 100% sure that can be improved but here it goes.

Thanks to everyone for the feedback (both on and off list), but a quick
reminder, please do provide your feedback via the link in the original mail
(http://tinyurl.com/OSmascot).  This will ensure that your concerns are
seen by the team producing the logos, so hopefully another revision can be
produced which everyone is happy with.

FWIW I do agree this initial logo does somewhat miss the mark and I'm
hoping we'll be able to reach another revision which is more in keeping
with (and/or more directly derived from) our current logo.

Thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Your draft logo & a sneak peek

2016-10-25 Thread Steven Hardy
Hi team,

I recently received a draft version of our project logo, using the mascot
we selected together. A final version (and some cool swag) will be ready
for us before the Project Team Gathering in February. Before they make our
logo final, they want to be sure we're happy with our mascot.

We can discuss any concerns in Barcelona and you can also provide direct
feedback to the designers: http://tinyurl.com/OSmascot . Logo feedback is
due Friday, Nov. 11.

To get a sense of how ours stacks up to others, check out this sneak
preview of several dozen draft logos from our community:
https://youtu.be/JmMTCWyY8Y4.

The only comment I have made is this logo does lose some of the OoO imagery
we had with the previous owl version - please feel free to provide feedback
of your own via the url above, thanks!

Thanks!

Steve
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] PTG space request

2016-10-13 Thread Steven Hardy
On Wed, Oct 12, 2016 at 11:28:00AM +0200, Thierry Carrez wrote:
> Emilien Macchi wrote:
> > I would like to request for some space dedicated to TripleO project
> > for the first OpenStack PTG.
> > 
> > https://www.openstack.org/ptg/
> > 
> > The event will happen in February 2017 during the next PTG in Atlanta.
> > Any feedback is welcome,
> 
> Just a quick note: as you can imagine we have finite space at the event,
> and the OpenStack Foundation wants to give priority to teams which have
> a diverse affiliation (or which are not tagged "single-vendor").
> Depending on which teams decide to take advantage of the event and which
> don't, we may or may not be able to offer space to single-vendor
> projects -- and TripleO is currently tagged single-vendor.

Thanks for the feedback Thierry, I can understand the need to somehow keep
PTG space requirements bounded, but I would agree with Emilien and Eoghan
that perhaps the single-vendor tag is too coarse a metric with which to
judge all projects (it really doesn't capture cross-project collaboration
at all IMO).

One of the main goals of TripleO is using OpenStack projects where
possible, and as such we have a very broad range of cross project
collaboration happening, and clearly the PTG is the ideal forum for such
discussions.

> The rationale is, the more organizations are involved in a given project
> team, the more value there is to offer common meeting space to that team
> for them to sync on priorities and get stuff done. If more than 90% of
> contributions / reviews / core reviewers come from a single
> organization, there is less coordination needs and less value in having
> all those people from a single org to travel to a distant place to have
> a team meeting.

One aspect not considered by this is that some projects have a large number
of contributors who are also involved with other projects e.g we have
regular contributors who are also deeply involved in Heat, Ironic, Mistral,
Keystone, Kolla, etc.  So we've typically got huge value from cross-project
discussion during the design summit (and attendance of folks from other
teams in the TripleO sessions).

> I hope we'll be able to accommodate you, though. And in all cases
> TripleO people are more than welcome to join the event to coordinate
> with other teams. It's just not 100% sure we'll be able to give you a
> dedicated room for multiple days. We should know better in a week or so,
> once we get a good idea of who plans to meet at the event and who doesn't.

Sure, we could have TripleO folks go to these project PTGs but it's going
to be something of a fragmented set of discussions if we can't then tie the
cross-project requirements together into design discussions related to the
TripleO roadmap (and have the relevant folks from other teams in the
TripleO sessions).

Hopefully it will work out and there will be space available, otherwise
we'll have to consider our plan-b, thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Roles count and flavors inside Heat environment file

2016-10-03 Thread Steven Hardy
On Mon, Oct 03, 2016 at 02:23:08PM +0200, Marius Cornea wrote:
> Hello everyone,
> 
> In Newton we've deprecated the *-scale and *-flavor deploy command
> arguments in favor of using Heat environment files. In the context of
> testing the composable roles where the custom roles' node count and
> flavor need to be passed inside an environment file I would like to
> build the test plan by using an environment containing all nodes count
> and flavors, including the preexisting roles.
> 
> A deploy command example would look like:
> 
> openstack overcloud deploy --stack cloudy --templates -e nodes.yaml
> 
> where the nodes environment file contains something like:
> 
> parameter_defaults:
>   ControllerCount: 3
>   ComputeCount: 2
>   CephStorageCount: 3
>   ServiceApiCount: 3
> 
>   OvercloudControlFlavor: controller
>   OvercloudComputeFlavor: compute
>   OvercloudCephStorageFlavor: ceph
>   OvercloudServiceApiFlavor: serviceapi
> 
> 
> I would like to get some feedback about this approach. I think it's
> better to keep all the roles count/flavors in the same place for
> consistency reasons.

+1 - I think this is well aligned with the interfaces we want to encourage
(vs the hard-coded CLI options which we want to move away from).

The only disadvantage of this approach is there's a few special-cases where
the parameter name isn't intuitive (OvercloudControlFlavor is an example).

It'd be better if we moved to a consistent interface (e.g. $roleFlavor) in
due course, but I still think encouraging this pattern is good, as it'll help
us identify the parameter interfaces which are inconsistent, then we can
fix them (deprecate the old parameters, add new consistent ones).
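
As an illustration of what a consistent scheme could look like, here's a
hypothetical Python sketch deriving $roleCount/$roleFlavor parameter names,
with an alias table so deprecated spellings keep working.  The helper names
and the alias mapping are made up for this example, not actual TripleO code:

```python
# Hypothetical sketch - not actual TripleO code.
# Consistent per-role parameter names, plus aliases for legacy spellings.

LEGACY_ALIASES = {
    # deprecated parameter    -> consistent replacement
    "OvercloudControlFlavor": "ControllerFlavor",
    "OvercloudComputeFlavor": "ComputeFlavor",
}


def role_params(role):
    """Derive the consistent count/flavor parameter names for a role."""
    return {"count": role + "Count", "flavor": role + "Flavor"}


def resolve_param(name, values):
    """Prefer the new name, falling back to a deprecated legacy alias."""
    if name in values:
        return values[name]
    for legacy, new in LEGACY_ALIASES.items():
        if new == name and legacy in values:
            return values[legacy]  # deprecated spelling still honoured
    raise KeyError(name)


env = {"ControllerCount": 3, "OvercloudControlFlavor": "controller"}
print(role_params("Controller"))
# {'count': 'ControllerCount', 'flavor': 'ControllerFlavor'}
print(resolve_param("ControllerFlavor", env))  # controller
```

The alias table is the deprecation path: once the old spellings are removed,
only the derived names remain and every role is handled uniformly.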

Thanks,

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] TripleO Core nominations

2016-09-26 Thread Steven Hardy
On Thu, Sep 15, 2016 at 10:20:07AM +0100, Steven Hardy wrote:
> Hi all,
> 
> As we work to finish the last remaining tasks for Newton, it's a good time
> to look back over the cycle, and recognize the excellent work done by
> several new contributors.
> 
> We've seen a different contributor pattern develop recently, where many
> folks are subsystem experts and mostly focus on a particular project or
> area of functionality.  I think this is a good thing, and it's hopefully
> going to allow our community to scale more effectively over time (and it
> fits pretty nicely with our new composable/modular architecture).
> 
> We do still need folks who can review with the entire TripleO architecture
> in mind, but I'm very confident folks will start out as subsystem experts
> and over time broaden their area of experience to encompass more of
> the TripleO projects (we're already starting to see this IMO).
> 
> We've had some discussion in the past[1] about strictly defining subteams,
> vs just adding folks to tripleo-core and expecting good judgement to be
> used (e.g only approve/+2 stuff you're familiar with - and note that it's
> totally fine for a core reviewer to continue to +1 things if the patch
> looks OK but is outside their area of experience).
> 
> So, I'm in favor of continuing that pattern and just welcoming some of our
> subsystem expert friends to tripleo-core, let me know if folks feel
> strongly otherwise :)
> 
> The nominations, are based partly on the stats[2] and partly on my own
> experience looking at reviews, patches and IRC discussion with these folks
> - I've included details of the subsystems I expect these folks to focus
> their +2A power on (at least initially):
> 
> 1. Brent Eagles
> 
> Brent has been doing some excellent work mostly related to Neutron this
> cycle - his reviews have been increasingly detailed, and show a solid
> understanding of our composable services architecture.  He's also provided
> a lot of valuable feedback on specs such as dpdk and sr-iov.  I propose
> Brent continues this excellent Neutron-focussed work, while also expanding
> his review focus such as the good feedback he's been providing on new
> Mistral actions in tripleo-common for custom-roles.
> 
> 2. Pradeep Kilambi
> 
> Pradeep has done a large amount of pretty complex work around Ceilometer
> and Aodh over the last two cycles - he's dealt with some pretty tough
> challenges around upgrades and has consistently provided good review
> feedback and solid analysis via discussion on IRC.  I propose Prad
> continues this excellent Ceilometer/Aodh focussed work, while also
> expanding review focus aiming to cover more of t-h-t and other repos over
> time.
> 
> 3. Carlos Camacho
> 
> Carlos has been mostly focussed on composability, and has done a great job
> of working through the initial architecture implementation, including
> writing some very detailed initial docs[3] to help folks make the transition
> to the new architecture.  I'd suggest that Carlos looks to maintain this
> focus on composable services, while also building depth of reviews in other
> repos.
> 
> 4. Ryan Brady
> 
> Ryan has been one of the main contributors implementing the new Mistral
> based API in tripleo-common.  His reviews, patches and IRC discussion have
> consistently demonstrated that he's an expert on the mistral
> actions/workflows and I think it makes sense for him to help with review
> velocity in this area, and also look to help with those subsystems
> interacting with the API such as tripleoclient.
> 
> 5. Dan Sneddon
> 
> For many cycles, Dan has been driving direction around our network
> architecture, and he's been consistently doing a relatively small number of
> very high-quality and insightful reviews on both os-net-config and the
> network templates for tripleo-heat-templates.  I'd suggest Dan continues
> this focus, and he's indicated he may have more bandwidth to help with
> reviews around networking in future.
> 
> Please can I get feedback from existing core reviewers - you're free to +1
> these nominations (or abstain), but any -1 will veto the process.  I'll
> wait one week, and if we have consensus add the above folks to
> tripleo-core.

Ok, so we got quite a few +1s and no objections, so I will go ahead and add
the folks listed above to tripleo-core, congratulations (and thanks!) guys,
keep up the great work! :)

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ci] Temporary increase for the OVB undercloud instance memory

2016-09-22 Thread Steven Hardy
On Thu, Sep 22, 2016 at 04:36:30PM +0200, Gabriele Cerami wrote:
> Hi,
> 
> As reported on this bug
> 
> https://bugs.launchpad.net/tripleo/+bug/1626483
> 
> HA gate and periodic jobs for master and sometimes newton started to
> fail for errors related to memory shortage. Memory on undercloud
> instance was increased to 8G less than a month ago, so the problem
> needs a different approach to be solved. 
> 
> We have some solutions in store. However, with the release date so
> close, I don't think it's time for this kind of changes. So I thought
> it could be a good compromise to temporarily increase the undercloud
> instance memory to 12G, just for this week, unless there's a rapid way
> to reduce memory footprint for heat-engine (usually the biggest memory
> consumer on the undercloud instance)

If we can avoid it, I'd rather we avoided increasing the ram again - I
suspect there is an issue with a heat regression as I'm seeing much higher
memory usage in my local test environment too.

I did a quick re-test of some local monitoring I did earlier in the cycle
when we experienced some high memory usage:

http://people.redhat.com/~shardy/heat/plots/heat_before_after_end_newton.png

There are three plots there, one early in the cycle, one after some fixes
which reduced memory usage a lot, and the highest (leakiest) plot is the one I
just did today.

So I'm pretty sure we have another heat memory leak to track down.

If anyone has any historical data of memory usage e.g from periodic CI
runs, that would be helpful, otherwise we'll have to bisect testing locally
or derive it from scraping our dstat data from CI run logs.
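
As a rough sketch of that kind of scraping, the following estimates memory
growth from periodic samples.  The (epoch_seconds,used_bytes) CSV layout is
assumed for illustration and is not the actual dstat output format, which
would need its own column mapping:

```python
import csv
import io

# Assumed sample layout for illustration: epoch_seconds,used_bytes.
# Real dstat logs from CI would need their own column mapping.
SAMPLES = """\
0,500000000
600,620000000
1200,750000000
1800,880000000
"""


def growth_rate(text):
    """Average memory growth (bytes/second) over the sampled period."""
    rows = [(float(t), float(m)) for t, m in csv.reader(io.StringIO(text))]
    (t0, m0), (t1, m1) = rows[0], rows[-1]
    return (m1 - m0) / (t1 - t0)


rate = growth_rate(SAMPLES)
print("%.0f bytes/s" % rate)  # sustained growth over a run suggests a leak
```

Comparing this rate across periodic CI runs (or across a local bisection)
would help pin down which commit introduced the regression.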

Steve.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] let's talk (development) environment deployment tooling and workflows

2016-09-19 Thread Steven Hardy
Hi Alex,

Firstly, thanks for this detailed feedback - it's very helpful to have
someone with a fresh perspective look at the day-1 experience for TripleO,
and while some of what follows are "known issues", it's great to get some
perspective on them, as well as ideas re how we might improve things.

On Thu, Sep 15, 2016 at 09:09:24AM -0600, Alex Schultz wrote:
> Hi all,
> 
> I've recently started looking at the various methods for deploying and
> developing tripleo.  What I would like to bring up is the current
> combination of the tooling for managing the VM instances and the
> actual deployment method to launch the undercloud/overcloud
> installation.  While running through the various methods and reading
> up on the documentation, I'm concerned that they are not currently
> flexible enough for a developer (or operator for that matter) to be
> able to setup the various environment configurations for testing
> deployments and doing development.  Additionally I ran into issues
> just trying get them working at all so this probably doesn't help when
> trying to attract new contributors as well.  The focus of this email
> and of my experience seems to relate with workflow-simplification
> spec[0].  I would like to share my experiences with the various
> tooling available and raise some ideas.
> 
> Example Situation:
> 
> For example, I have a laptop with 16G of RAM and an SSD and I'd like
> to get started with tripleo.  How can I deploy tripleo?

So, this is probably problem #1, because while I have managed to deploy a
minimal TripleO environment on a laptop with 16G of RAM, I think it's
pretty widely known that it's not really enough (certainly with our default
configuration, which has unfortunately grown over time as more and more
things got integrated).

I see two options here:

1. Document the reality (which is really you need a physical machine with
at least 32G RAM unless you're prepared to deal with swapping).

2. Look at providing a "TripleO lite" install option, which disables some
services (both on the undercloud and default overcloud install).

Either of these is definitely possible, but (2) seems like the best
long-term solution (although it probably means another CI job).

> Tools:
> 
> instack:
> 
> I started with the tripleo docs[1] that reference using the instack
> tools for virtual environment creation while deploying tripleo.   The
> docs say you need at least 12G of RAM[2].  The docs lie (step 7[3]).
> So after basically shutting everything down and letting it deploy with
> all my RAM, the deployment fails because the undercloud runs out of
> RAM and OOM killer kills off heat.  This was not because I had reduced
> the amount of ram for the undercloud node or anything.  It was because
> by default, 6GB of RAM with no swap is configured for the undercloud
> (not sure if this is a bug?).  So I added a swap file to the
> undercloud and continued. My next adventure was having the overcloud
> deployment fail because lack of memory as puppet fails trying to spawn
> a process and gets denied.  The instack method does not configure swap
> for the VMs that are deployed and the deployment did not work with 5GB
> RAM for each node.  So for a full 16GB I was unable to follow the
> documentation and use instack to successfully deploy.  At this point I
> switched over to trying to use tripleo-quickstart.  Eventually I was
> able to figure out a configuration with instack to get it to deploy
> when I figured out how to enable swap for the overcloud deployment.

Yeah, so this definitely exposes that we need to update the docs, and also
provide an easy install-time option to enable swap on all-the-things for
memory contrained environments.

> tripleo-quickstart:
> 
> The next thing I attempted to use was the tripleo-quickstart[4].
> Following the directions I attempted to deploy against my localhost.
> It turns out that doesn't work as expected since ansible likes to do
> magic when dealing with localhost[5].  Ultimately I was unable to get
> it working against my laptop locally because I ran into some libvirt
> issues.  But I was able to get it to work when I pointed it at a
> separate machine.  It should be noted that tripleo-quickstart creates
> an undercloud with swap which was nice because then it actually works,
> but is an inconsistent experience depending on which tool you used for
> your deployment.

Yeah, so while a lot of folks have good luck with tripleo-quickstart, it
has the disadvantage of not currently being the tool used in upstream
TripleO CI (which folks have looked at fixing, but it's not yet happened).

The original plan was for tripleo-quickstart to completely replace the
instack-virt-setup workflow:

https://blueprints.launchpad.net/tripleo/+spec/tripleo-quickstart

But for a variety of reasons, we never quite got to that - we may need a
summit discussion on the path forward here.

For me (as an upstream developer) it really boils down to the CI usage
issue - at all times I want to 

Re: [openstack-dev] [TripleO] *ExtraConfig, backwards compatibility & deprecation

2016-09-19 Thread Steven Hardy
On Wed, Sep 14, 2016 at 06:32:07PM +0200, Giulio Fidente wrote:
> On 09/14/2016 05:59 PM, Giulio Fidente wrote:
> > On 09/14/2016 02:31 PM, Steven Hardy wrote:
> > > Related to this is the future of all of the per-role customization
> > > interfaces.  I'm thinking these don't really make sense to maintain
> > > long-term now we have the new composable services architecture, and it
> > > would be better if we can deprecate them and move folks towards the
> > > composable services templates instead?
> > 
> > my experience is that the ExtraConfig interfaces have been useful to
> > provide arbitrary hiera and class includes
> > 
> > I wonder if we could ship by default some roles parsing those parameters?
> 
> thinking more about it, the *ExtraConfig interfaces also offer a simple
> mechanism to *override* any hiera setting we push via the templates ...
> which isn't easy to achieve with roles
> 
> a simple short-term solution could be to merge ExtraConfig in the $role
> mapped_data, thoughts?

Thanks for the feedback, so yeah I agree there are reasons to keep the
ExtraConfig *parameters* around, or some similar interface.

I probably should have clarified this in my original post, but there are
two types of *ExtraConfig interfaces, the parameters you refer to, which
simply override some hieradata (we probably want to keep this, but it still
means we have ExtraConfig tied to the role (not the service), but
presumably an operator will know what services are deployed on what role).

The second (and more problematic from a containers point of view) is the
ExtraConfig *resources*, where you can pass an arbitrary heat template,
which typically is used to run stuff on the host (which will be impossible,
or at least not useful on an atomic host in a fully containerized
deployment).

I think your concerns are mostly around the ExtraConfig *parameters* thus,
provided we maintain some way to do those hiera overrides, e.g the
documented interfaces for Ceph ExtraConfig can still be used?
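
Merging the ExtraConfig parameter into the role's mapped_data, as Giulio
suggests above, amounts to a dict merge where the ExtraConfig keys take
precedence.  A rough sketch of that idea (not actual tripleo-heat-templates
code, and the hiera keys below are just examples):

```python
# Rough sketch of "merge ExtraConfig into the $role mapped_data":
# operator-supplied ExtraConfig hiera keys override the composed role data.
# Not actual tripleo-heat-templates code; keys below are examples only.

def merged_hieradata(role_mapped_data, extra_config):
    data = dict(role_mapped_data)  # start from the composed role hiera
    data.update(extra_config)      # ExtraConfig overrides win
    return data


role_data = {"nova::compute::reserved_host_memory": 2048,
             "tripleo::firewall::manage_firewall": True}
extra = {"nova::compute::reserved_host_memory": 4096}  # operator override

print(merged_hieradata(role_data, extra))
```

Because the merge happens last, operators keep the simple "override any hiera
setting" behaviour without needing to know which service composed the
original value.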

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] TripleO Core nominations

2016-09-15 Thread Steven Hardy
Hi all,

As we work to finish the last remaining tasks for Newton, it's a good time
to look back over the cycle, and recognize the excellent work done by
several new contributors.

We've seen a different contributor pattern develop recently, where many
folks are subsystem experts and mostly focus on a particular project or
area of functionality.  I think this is a good thing, and it's hopefully
going to allow our community to scale more effectively over time (and it
fits pretty nicely with our new composable/modular architecture).

We do still need folks who can review with the entire TripleO architecture
in mind, but I'm very confident folks will start out as subsystem experts
and over time broaden their area of experience to encompass more of
the TripleO projects (we're already starting to see this IMO).

We've had some discussion in the past[1] about strictly defining subteams,
vs just adding folks to tripleo-core and expecting good judgement to be
used (e.g only approve/+2 stuff you're familiar with - and note that it's
totally fine for a core reviewer to continue to +1 things if the patch
looks OK but is outside their area of experience).

So, I'm in favor of continuing that pattern and just welcoming some of our
subsystem expert friends to tripleo-core, let me know if folks feel
strongly otherwise :)

The nominations, are based partly on the stats[2] and partly on my own
experience looking at reviews, patches and IRC discussion with these folks
- I've included details of the subsystems I expect these folks to focus
their +2A power on (at least initially):

1. Brent Eagles

Brent has been doing some excellent work mostly related to Neutron this
cycle - his reviews have been increasingly detailed, and show a solid
understanding of our composable services architecture.  He's also provided
a lot of valuable feedback on specs such as dpdk and sr-iov.  I propose
Brent continues this excellent Neutron-focussed work, while also expanding
his review focus such as the good feedback he's been providing on new
Mistral actions in tripleo-common for custom-roles.

2. Pradeep Kilambi

Pradeep has done a large amount of pretty complex work around Ceilometer
and Aodh over the last two cycles - he's dealt with some pretty tough
challenges around upgrades and has consistently provided good review
feedback and solid analysis via discussion on IRC.  I propose Prad
continues this excellent Ceilometer/Aodh focussed work, while also
expanding review focus aiming to cover more of t-h-t and other repos over
time.

3. Carlos Camacho

Carlos has been mostly focussed on composability, and has done a great job
of working through the initial architecture implementation, including
writing some very detailed initial docs[3] to help folks make the transition
to the new architecture.  I'd suggest that Carlos looks to maintain this
focus on composable services, while also building depth of reviews in other
repos.

4. Ryan Brady

Ryan has been one of the main contributors implementing the new Mistral
based API in tripleo-common.  His reviews, patches and IRC discussion have
consistently demonstrated that he's an expert on the Mistral
actions/workflows and I think it makes sense for him to help with review
velocity in this area, and also look to help with those subsystems
interacting with the API such as tripleoclient.

5. Dan Sneddon

For many cycles, Dan has been driving direction around our network
architecture, and he's been consistently doing a relatively small number of
very high-quality and insightful reviews on both os-net-config and the
network templates for tripleo-heat-templates.  I'd suggest Dan continues
this focus, and he's indicated he may have more bandwidth to help with
reviews around networking in future.

Please can I get feedback from existing core reviewers - you're free to +1
these nominations (or abstain), but any -1 will veto the process.  I'll
wait one week, and if we have consensus add the above folks to
tripleo-core.

Finally, there are quite a few folks doing great work that are not on this
list, but seem to be well on track towards core status.  Some of those
folks I've already reached out to, but if you're not nominated now, please
don't be disheartened, and feel free to chat to me on IRC about it.  Also
note the following:

 - We need folks to regularly show up, establishing a long-term pattern of
   doing useful reviews, but core status isn't about raw number of reviews,
    it's about consistent downvotes and detailed, well-considered and
   insightful feedback that helps increase quality and catch issues early.

 - Try to spend some time reviewing stuff outside your normal area of
   expertise, to build understanding of the broader TripleO system - as
   discussed above subsystem experts are a good thing, but we also need
    to see some appreciation of the broader TripleO architecture &
   interfaces (all the folks above have demonstrated solid knowledge of one
   or more of our primary interfaces, e.g 

[openstack-dev] [TripleO] *ExtraConfig, backwards compatibility & deprecation

2016-09-14 Thread Steven Hardy
Hi all,

I wanted to draw attention to this patch:

https://review.openstack.org/#/c/367295/

As part of the custom-roles work, I had to break backwards compatibility
for the OS::TripleO::AllNodesExtraConfig resource.

I'm not happy about that, but I couldn't find any way to avoid it if we
want to allow existing roles to be optional (such as removing the *Storage
role resources from the deployment completely).

The adjustments for any out-of-tree users should be simple, and I'm
planning to write a script to help folks migrate but we'll need to document
this in the release notes/docs (I'll write these).

Related to this is the future of all of the per-role customization
interfaces.  I'm thinking these don't really make sense to maintain
long-term now we have the new composable services architecture, and it
would be better if we can deprecate them and move folks towards the
composable services templates instead?

In particular, when moving to a fully containerized deployment using an
atomic host image, configuration of the host directly via these interfaces
will no longer be possible, so it will be necessary to get folks onto the
composable services interfaces ahead of such a move (as these will fit much
better with a container based deployment):

https://review.openstack.org/#/c/330659/

What do folks think about this?  I suspect there's going to be some work
required to achieve it, but a first step would be to convert all the
in-tree ExtraConfig examples to the new format & update the docs to show how
customizations via composable services would work.
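
To make that concrete, here's a rough sketch (all names here are hypothetical, not an agreed interface) of how a customization that today lives in an *ExtraConfig hook could instead be expressed as a composable service template, wired in via the resource_registry:

```yaml
# my-custom-config.yaml - minimal composable service sketch carrying
# hieradata that would previously have been injected via *ExtraConfig.
# Mapped in an environment file with a hypothetical name, e.g.:
#   resource_registry:
#     OS::TripleO::Services::MyCustomConfig: my-custom-config.yaml
heat_template_version: 2016-04-08

description: Example composable service for custom hieradata

outputs:
  role_data:
    description: Role data for the custom config service
    value:
      service_name: my_custom_config
      config_settings:
        # settings formerly passed as *ExtraConfig hieradata
        my_app::some_setting: 'value'
```

The service then just needs adding to the relevant role's service list, the same as any in-tree service.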

Then later we can update the docs & mark these interfaces deprecated
(during Ocata).

Any thoughts appreciated, thanks!

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Overriding internal_api network name

2016-09-12 Thread Steven Hardy
On Mon, Sep 12, 2016 at 04:21:43PM +0200, Dmitry Tantsur wrote:
> Hi folks!
> 
> I'm looking into support for multiple overclouds with shared control plane.
> I'm porting a downstream guide: https://review.openstack.org/368840.
> 
> However, this no longer works, probably because "internal_api" network name
> is hardcoded in ServiceNetMapDefaults: 
> https://github.com/openstack/tripleo-heat-templates/blob/dfe74b211267cde7a1da4e1fe9430127eda234c6/network/service_net_map.yaml#L14.
> So deployment fails with
> 
> CREATE_FAILED resources.RedisVirtualIP: Property error:
> resources.VipPort.properties.network: Error validating value 'internal_api':
> Unable to find network with name or id 'internal_api'
> 
> Is it a bug? Or is there another way to change the network name? I need it
> to avoid overlap between networks from two overclouds. I'd prefer to avoid
> overriding everything from ServiceNetMapDefaults in my network environment
> file.

IMO this isn't a bug, but an RFE perhaps.

The reason is that until a couple of weeks ago, you always had to fully
define all services in ServiceNetMap, so this is basically just a case
where the optimization introduced here (which allows you to partially
specify ServiceNetMap which is then merged with ServiceNetMapDefaults)
doesn't work:

https://review.openstack.org/#/c/353032/

I'd say overriding everything is an OK workaround, but we can definitely
discuss ways to do it more cleanly - I'll give it some thought (probably
we'll need another mapping that defines the network names that can be
easily overridden).
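
For reference, the workaround looks something like this (network names are illustrative) - every entry from ServiceNetMapDefaults has to be repeated so nothing still points at the default 'internal_api' name:

```yaml
# environment file - fully overriding ServiceNetMap for a second overcloud
parameter_defaults:
  ServiceNetMap:
    RedisNetwork: internal_api_cloud2
    MysqlNetwork: internal_api_cloud2
    CeilometerApiNetwork: internal_api_cloud2
    AodhApiNetwork: internal_api_cloud2
    NeutronTenantNetwork: tenant_cloud2
    # ... and so on for every remaining entry in ServiceNetMapDefaults
```

That's obviously verbose, hence the suggestion above of a separate mapping for the network names themselves.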

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

