Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-22 Thread Christian Schwede
On 04.08.16 15:39, Giulio Fidente wrote:
> On 08/04/2016 01:26 PM, Christian Schwede wrote:
>> On 04.08.16 10:27, Giulio Fidente wrote:
>>> On 08/02/2016 09:36 PM, Christian Schwede wrote:
>>>> Hello everyone,
>>>
>>> thanks Christian,
>>>
>>>> I'd like to improve the Swift deployments done by TripleO. There are a
>>>> few problems today when deployed with the current defaults:
>>>>
>>>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>>>> because the rings are built locally on each host and a new node doesn't
>>>> know about the "history" of the rings. Therefore rings might become
>>>> different on the nodes, and that results in an unusable state eventually.
>>>
>>> one of the ideas for this was to use a tempurl in the undercloud swift
>>> for uploading the rings built by a single overcloud node, not by the
>>> undercloud
>>>
>>> so I proposed a new heat resource which would permit us to create a
>>> swift tempurl in the undercloud during the deployment
>>>
>>> https://review.openstack.org/#/c/350707/
>>>
>>> if we build the rings on the undercloud we can ignore this and use a
>>> mistral action instead, as pointed out by Steven
>>>
>>> the good thing about building rings in the overcloud is that it doesn't
>>> force us to have a static node mapping for each and every deployment, but
>>> it makes it hard to cope with heterogeneous environments
>>
>> That's true. However - we still need to collect the device data from all
>> the nodes from the undercloud, push it to at least one overcloud node,
>> build/update the rings there, push it to the undercloud Swift and use
>> that on all overcloud nodes. Or not?
> 
> sure, let's build on the undercloud; when automated with mistral it
> shouldn't make a big difference for the user
> 
>> I was also thinking more about the static node mapping and how to avoid
>> this. Could we add a host alias using the node UUIDs? That would never
>> change, it's available from the introspection data and therefore could
>> be used in the rings.
>>
>> http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_specific_hieradata.html#collecting-the-node-uuid
>>
> 
> right, this is the mechanism I wanted to use to provide per-node disk
> maps; it's how it works for ceph disks as well

I looked into this further and proposed a patch upstream:

https://review.openstack.org/358643

This worked fine in my tests; an example /etc/hosts looks like this:

http://paste.openstack.org/show/562206/
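
In case the paste expires: the idea is that each node gets an additional
/etc/hosts alias based on its system uuid, so an entry would look roughly
like this (address, hostname and uuid here are made up):

    192.168.24.15 overcloud-objectstorage-0 32E87B4C-C4A7-41BE-865B-191684A6883B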

Based on that patch we could build the Swift rings even if the nodes
are down and have never been deployed, because the system uuid never
changes and is unique. I updated my tripleo-swift-ring-tool and just ran
a successful deployment with the patch (also using the merged patches
from Giulio).
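
To make the idea concrete, here's a stripped-down sketch of what the ring
tool does conceptually - this is not the actual tripleo-swift-ring-tool
code, and the node data layout is assumed; it just shells out to the
standard swift-ring-builder CLI:

    import subprocess

    def build_object_ring(nodes, part_power=10, replicas=3, min_part_hours=1):
        """Build an object ring from per-node device data (assumed layout:
        each node is a dict with 'ip', 'zone' and a list of 'devices')."""
        builder = 'object.builder'
        subprocess.check_call(['swift-ring-builder', builder, 'create',
                               str(part_power), str(replicas),
                               str(min_part_hours)])
        for node in nodes:
            for dev in node['devices']:
                # device spec: r<region>z<zone>-<ip>:<port>/<device> <weight>
                subprocess.check_call(
                    ['swift-ring-builder', builder, 'add',
                     'r1z%d-%s:6000/%s' % (node['zone'], node['ip'], dev),
                     '100'])
        subprocess.check_call(['swift-ring-builder', builder, 'rebalance'])

Keeping the .builder files on the undercloud preserves the ring history,
which is exactly what's missing today when each node builds its own rings.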

Let me know what you think about it - I think with that patch we could
integrate the tripleo-swift-ring-tool.

-- Christian

>>>> 2. The rings are only using a single device, and it seems that this is
>>>> just a directory and not a mountpoint with a real device. Therefore data
>>>> is stored on the root device - even if you have 100 TB of disk space
>>>> available. If not fixed manually your root device will run out of space
>>>> eventually.
>>>
>>> for the disks instead I am thinking to add a create_resources wrapper in
>>> puppet-swift:
>>>
>>> https://review.openstack.org/#/c/350790
>>> https://review.openstack.org/#/c/350840/
>>>
>>> so that we can pass via hieradata per-node swift::storage::disks maps
>>>
>>> we have a mechanism to push per-node hieradata based on the system uuid,
>>> we could extend the tool to capture the nodes' (system) uuids and generate
>>> per-node maps
>>
>> Awesome, thanks Giulio!
>>
>> I will test that today. So the tool could generate the mapping
>> automatically, and we don't need to filter puppet facts on the nodes
>> themselves. Nice!
> 
> and we could re-use the same tool to generate the ceph::osds disk maps
> as well :)
> 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-04 Thread Giulio Fidente

On 08/04/2016 01:26 PM, Christian Schwede wrote:
> On 04.08.16 10:27, Giulio Fidente wrote:
>> On 08/02/2016 09:36 PM, Christian Schwede wrote:
>>> Hello everyone,
>>
>> thanks Christian,
>>
>>> I'd like to improve the Swift deployments done by TripleO. There are a
>>> few problems today when deployed with the current defaults:
>>>
>>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>>> because the rings are built locally on each host and a new node doesn't
>>> know about the "history" of the rings. Therefore rings might become
>>> different on the nodes, and that results in an unusable state eventually.
>>
>> one of the ideas for this was to use a tempurl in the undercloud swift
>> for uploading the rings built by a single overcloud node, not by the
>> undercloud
>>
>> so I proposed a new heat resource which would permit us to create a
>> swift tempurl in the undercloud during the deployment
>>
>> https://review.openstack.org/#/c/350707/
>>
>> if we build the rings on the undercloud we can ignore this and use a
>> mistral action instead, as pointed out by Steven
>>
>> the good thing about building rings in the overcloud is that it doesn't
>> force us to have a static node mapping for each and every deployment, but
>> it makes it hard to cope with heterogeneous environments
>
> That's true. However - we still need to collect the device data from all
> the nodes from the undercloud, push it to at least one overcloud node,
> build/update the rings there, push it to the undercloud Swift and use
> that on all overcloud nodes. Or not?

sure, let's build on the undercloud; when automated with mistral it
shouldn't make a big difference for the user

> I was also thinking more about the static node mapping and how to avoid
> this. Could we add a host alias using the node UUIDs? That would never
> change, it's available from the introspection data and therefore could
> be used in the rings.
>
> http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_specific_hieradata.html#collecting-the-node-uuid

right, this is the mechanism I wanted to use to provide per-node disk
maps; it's how it works for ceph disks as well

>>> 2. The rings are only using a single device, and it seems that this is
>>> just a directory and not a mountpoint with a real device. Therefore data
>>> is stored on the root device - even if you have 100 TB of disk space
>>> available. If not fixed manually your root device will run out of space
>>> eventually.
>>
>> for the disks instead I am thinking to add a create_resources wrapper in
>> puppet-swift:
>>
>> https://review.openstack.org/#/c/350790
>> https://review.openstack.org/#/c/350840/
>>
>> so that we can pass via hieradata per-node swift::storage::disks maps
>>
>> we have a mechanism to push per-node hieradata based on the system uuid,
>> we could extend the tool to capture the nodes' (system) uuids and generate
>> per-node maps
>
> Awesome, thanks Giulio!
>
> I will test that today. So the tool could generate the mapping
> automatically, and we don't need to filter puppet facts on the nodes
> themselves. Nice!

and we could re-use the same tool to generate the ceph::osds disk maps
as well :)


--
Giulio Fidente
GPG KEY: 08D733BA | IRC: gfidente

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-04 Thread Christian Schwede
On 04.08.16 10:27, Giulio Fidente wrote:
> On 08/02/2016 09:36 PM, Christian Schwede wrote:
>> Hello everyone,
> 
> thanks Christian,
> 
>> I'd like to improve the Swift deployments done by TripleO. There are a
>> few problems today when deployed with the current defaults:
>>
>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>> because the rings are built locally on each host and a new node doesn't
>> know about the "history" of the rings. Therefore rings might become
>> different on the nodes, and that results in an unusable state eventually.
> 
> one of the ideas for this was to use a tempurl in the undercloud swift
> for uploading the rings built by a single overcloud node, not by the
> undercloud
> 
> so I proposed a new heat resource which would permit us to create a
> swift tempurl in the undercloud during the deployment
> 
> https://review.openstack.org/#/c/350707/
> 
> if we build the rings on the undercloud we can ignore this and use a
> mistral action instead, as pointed out by Steven
> 
> the good thing about building rings in the overcloud is that it doesn't
> force us to have a static node mapping for each and every deployment, but
> it makes it hard to cope with heterogeneous environments

That's true. However - we still need to collect the device data from all
the nodes from the undercloud, push it to at least one overcloud node,
build/update the rings there, push it to the undercloud Swift and use
that on all overcloud nodes. Or not?

That leaves some room for new inconsistencies IMO: how do we ensure that
the overcloud node uses the latest rings to start with? Also, ring building
has to be limited to a single overcloud node, otherwise we might end up
with multiple ring-building nodes. And how can an operator manually modify
the rings?

The tool to build the rings on the undercloud could be further improved
later; for example, I'd like to be able to move data to new nodes slowly
over time, and also query existing storage servers about the progress.
Therefore we need more functionality than is currently available in the
ring-building part of puppet-swift IMO.

I think if we move this step to the undercloud we could solve a lot of
these challenges in a consistent way. WDYT?

I was also thinking more about the static node mapping and how to avoid
this. Could we add a host alias using the node UUIDs? That would never
change, it's available from the introspection data and therefore could
be used in the rings.

http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/node_specific_hieradata.html#collecting-the-node-uuid

>> 2. The rings are only using a single device, and it seems that this is
>> just a directory and not a mountpoint with a real device. Therefore data
>> is stored on the root device - even if you have 100 TB of disk space
>> available. If not fixed manually your root device will run out of space
>> eventually.
> 
> for the disks instead I am thinking to add a create_resources wrapper in
> puppet-swift:
> 
> https://review.openstack.org/#/c/350790
> https://review.openstack.org/#/c/350840/
>
> so that we can pass via hieradata per-node swift::storage::disks maps
> 
> we have a mechanism to push per-node hieradata based on the system uuid,
> we could extend the tool to capture the nodes' (system) uuids and generate
> per-node maps

Awesome, thanks Giulio!

I will test that today. So the tool could generate the mapping
automatically, and we don't need to filter puppet facts on the nodes
themselves. Nice!

> then, with the above puppet changes and having the per-node map and the
> rings download url, we could feed them to the templates, replace the
> ring-building implementation via an environment, and deploy without
> further customizations
> 
> what do you think?

Yes, that sounds like a good plan to me.

I'll continue working on the ringbuilder tool for now and see how I
integrate this into the Mistral actions.

-- Christian

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-04 Thread Giulio Fidente

On 08/02/2016 09:36 PM, Christian Schwede wrote:

> Hello everyone,


thanks Christian,


> I'd like to improve the Swift deployments done by TripleO. There are a
> few problems today when deployed with the current defaults:
>
> 1. Adding new nodes (or replacing existing nodes) is not possible,
> because the rings are built locally on each host and a new node doesn't
> know about the "history" of the rings. Therefore rings might become
> different on the nodes, and that results in an unusable state eventually.


one of the ideas for this was to use a tempurl in the undercloud swift
for uploading the rings built by a single overcloud node, not by the
undercloud


so I proposed a new heat resource which would permit us to create a 
swift tempurl in the undercloud during the deployment


https://review.openstack.org/#/c/350707/
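
for reference, the tempurl mechanics themselves are simple; a sketch of
how such a signed URL is computed (this is the standard swift tempurl
scheme, all key/path values below are made up):

    import hmac
    import time
    from hashlib import sha1

    key = 'secret-temp-url-key'  # the account's X-Account-Meta-Temp-URL-Key
    method = 'PUT'
    expires = int(time.time()) + 3600  # valid for one hour
    path = '/v1/AUTH_demo/rings/rings.tar.gz'  # hypothetical object path
    sig = hmac.new(key.encode(),
                   ('%s\n%s\n%s' % (method, expires, path)).encode(),
                   sha1).hexdigest()
    url = ('http://undercloud:8080%s?temp_url_sig=%s&temp_url_expires=%s'
           % (path, sig, expires))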

if we build the rings on the undercloud we can ignore this and use a
mistral action instead, as pointed out by Steven


the good thing about building rings in the overcloud is that it doesn't
force us to have a static node mapping for each and every deployment, but
it makes it hard to cope with heterogeneous environments



> 2. The rings are only using a single device, and it seems that this is
> just a directory and not a mountpoint with a real device. Therefore data
> is stored on the root device - even if you have 100 TB of disk space
> available. If not fixed manually your root device will run out of space
> eventually.


for the disks instead I am thinking to add a create_resources wrapper in 
puppet-swift:


https://review.openstack.org/#/c/350790
https://review.openstack.org/#/c/350840/

so that we can pass via hieradata per-node swift::storage::disks maps

we have a mechanism to push per-node hieradata based on the system uuid,
we could extend the tool to capture the nodes' (system) uuids and generate
per-node maps
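
to illustrate, a sketch of how the tool could emit one hieradata file per
system uuid (the disk data below is made up; the exact keys consumed by
puppet-swift are the ones being defined in the reviews above):

    import yaml

    # hypothetical per-node device map, e.g. derived from introspection data
    node_disks = {
        '32e87b4c-c4a7-41be-865b-191684a6883b': ['/dev/sdb', '/dev/sdc'],
    }

    for uuid, disks in node_disks.items():
        hieradata = {'swift::storage::disks': {d: {} for d in disks}}
        # one hieradata file per node, named after the system uuid
        with open('%s.yaml' % uuid, 'w') as f:
            yaml.safe_dump(hieradata, f, default_flow_style=False)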


then, with the above puppet changes and having the per-node map and the
rings download url, we could feed them to the templates, replace the
ring-building implementation via an environment, and deploy without
further customizations


what do you think?
--
Giulio Fidente
GPG KEY: 08D733BA | IRC: gfidente

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-03 Thread Christian Schwede
Thanks Steven for your feedback! Please see my answers inline.

On 02.08.16 23:46, Steven Hardy wrote:
> On Tue, Aug 02, 2016 at 09:36:45PM +0200, Christian Schwede wrote:
>> Hello everyone,
>>
>> I'd like to improve the Swift deployments done by TripleO. There are a
>> few problems today when deployed with the current defaults:
> 
> Thanks for digging into this, I'm aware this has been something of a
> known issue for some time, so it's great to see it getting addressed :)
>
> Some comments inline:
> 
>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>> because the rings are built locally on each host and a new node doesn't
>> know about the "history" of the rings. Therefore rings might become
>> different on the nodes, and that results in an unusable state eventually.
>>
>> 2. The rings are only using a single device, and it seems that this is
>> just a directory and not a mountpoint with a real device. Therefore data
>> is stored on the root device - even if you have 100 TB of disk space
>> available. If not fixed manually your root device will run out of space
>> eventually.
>>
>> 3. Even if a real disk is mounted in /srv/node, replacing a faulty disk
>> is much more troublesome. Normally you would simply unmount a disk, and
>> then replace the disk sometime later. But because mount_check is set to
>> False in the storage servers, data will be written to the root device in
>> the meantime; and when you finally mount the disk again, you can't
>> simply clean up.
>>
>> 4. In general, it's not possible to change the cluster layout (using
>> different zones/regions/partition power/device weights, slowly adding new
>> devices to avoid 25% of the data being moved immediately when adding
>> new nodes to a small cluster, ...). You could manually manage your
>> rings, but they will eventually be overwritten when updating your overcloud.
>>
>> 5. Missing erasure coding support (or storage policies in general)
>>
>> This sounds bad; however, most of the current issues can be fixed using
>> customized templates and some tooling to create the rings in advance on
>> the undercloud node.
>>
>> The information about all the devices can be collected from the
>> introspection data, and by using node placement the nodenames in the
>> rings are known in advance, even if the nodes are not yet powered on.
>> This ensures a consistent ring state, and an operator can modify the
>> rings if needed to customize the cluster layout.
>>
>> Using some customized templates we can already do the following:
>> - disable ringbuilding on the nodes
>> - create filesystems on the extra blockdevices
>> - copy ringfiles from the undercloud, using pre-built rings
>> - enable mount_check by default
>> - (define storage policies if needed)
>>
>> I started working on a POC using tripleo-quickstart, some custom
>> templates and a small Python tool to build rings based on the
>> introspection data:
>>
>> https://github.com/cschwede/tripleo-swift-ring-tool
>>
>> I'd like to get some feedback on the tool and templates.
>>
>> - Does this make sense to you?
> 
> Yes, I think the basic workflow described should work, and it's good to see
> that you're passing the ring data via swift as this is consistent with how
> we already pass some data to nodes via our DeployArtifacts interface:
> 
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/deploy-artifacts.yaml
> 
> Note however that there are no credentials to access the undercloud swift
> on the nodes, so you'll need to pass a tempurl reference in (which is what
> we do for deploy artifacts, obviously you will have credentials to create
> the container & tempurl on the undercloud).

Ah, that's very useful! I updated my POC; it means one less customized
template and less code to support in the Python tool. Works as expected!

> One slight concern I have is mandating the use of predictable placement -
> it'd be nice to think about ways we might avoid that but the undercloud
> centric approach seems OK for a first pass (in either case I think the
> delivery via swift will be the same).

Do you mean the predictable artifact filename? We could just add a
randomized prefix to the filename IMO.

>> - How (and where) could we integrate this upstream?
> 
> So I think the DeployArtifacts interface may work for this, and we have a
> helper script that can upload data to swift:
> 
> https://github.com/openstack/tripleo-common/blob/master/scripts/upload-swift-artifacts
> 
> This basically pushes a tarball to swift, creates a tempurl, then creates a
> file ($HOME/.tripleo/environments/deployment-artifacts.yaml) which is
> automatically read by tripleoclient on deployment.
> 
> DeployArtifactURLs is already a list, but we'll need to test and confirm we
> can pass both e.g. swift ring data and updated puppet modules at the same
> time.

If I see this correctly, the artifacts are deployed just before Puppet
runs; and the Swift rings don't affect the Puppet modules, so that
should be fine.

Re: [openstack-dev] [TripleO] Improving Swift deployments with TripleO

2016-08-02 Thread Steven Hardy
On Tue, Aug 02, 2016 at 09:36:45PM +0200, Christian Schwede wrote:
> Hello everyone,
> 
> I'd like to improve the Swift deployments done by TripleO. There are a
> few problems today when deployed with the current defaults:

Thanks for digging into this, I'm aware this has been something of a
known issue for some time, so it's great to see it getting addressed :)

Some comments inline:

> 1. Adding new nodes (or replacing existing nodes) is not possible,
> because the rings are built locally on each host and a new node doesn't
> know about the "history" of the rings. Therefore rings might become
> different on the nodes, and that results in an unusable state eventually.
> 
> 2. The rings are only using a single device, and it seems that this is
> just a directory and not a mountpoint with a real device. Therefore data
> is stored on the root device - even if you have 100 TB of disk space
> available. If not fixed manually your root device will run out of space
> eventually.
> 
> 3. Even if a real disk is mounted in /srv/node, replacing a faulty disk
> is much more troublesome. Normally you would simply unmount a disk, and
> then replace the disk sometime later. But because mount_check is set to
> False in the storage servers, data will be written to the root device in
> the meantime; and when you finally mount the disk again, you can't
> simply clean up.
> 
> 4. In general, it's not possible to change the cluster layout (using
> different zones/regions/partition power/device weights, slowly adding new
> devices to avoid 25% of the data being moved immediately when adding
> new nodes to a small cluster, ...). You could manually manage your
> rings, but they will eventually be overwritten when updating your overcloud.
> 
> 5. Missing erasure coding support (or storage policies in general)
> 
> This sounds bad; however, most of the current issues can be fixed using
> customized templates and some tooling to create the rings in advance on
> the undercloud node.
> 
> The information about all the devices can be collected from the
> introspection data, and by using node placement the nodenames in the
> rings are known in advance, even if the nodes are not yet powered on.
> This ensures a consistent ring state, and an operator can modify the
> rings if needed to customize the cluster layout.
> 
> Using some customized templates we can already do the following:
> - disable ringbuilding on the nodes
> - create filesystems on the extra blockdevices
> - copy ringfiles from the undercloud, using pre-built rings
> - enable mount_check by default
> - (define storage policies if needed)
> 
> I started working on a POC using tripleo-quickstart, some custom
> templates and a small Python tool to build rings based on the
> introspection data:
> 
> https://github.com/cschwede/tripleo-swift-ring-tool
> 
> I'd like to get some feedback on the tool and templates.
> 
> - Does this make sense to you?

Yes, I think the basic workflow described should work, and it's good to see
that you're passing the ring data via swift as this is consistent with how
we already pass some data to nodes via our DeployArtifacts interface:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/deploy-artifacts.yaml

Note however that there are no credentials to access the undercloud swift
on the nodes, so you'll need to pass a tempurl reference in (which is what
we do for deploy artifacts, obviously you will have credentials to create
the container & tempurl on the undercloud).

One slight concern I have is mandating the use of predictable placement -
it'd be nice to think about ways we might avoid that but the undercloud
centric approach seems OK for a first pass (in either case I think the
delivery via swift will be the same).

> - How (and where) could we integrate this upstream?

So I think the DeployArtifacts interface may work for this, and we have a
helper script that can upload data to swift:

https://github.com/openstack/tripleo-common/blob/master/scripts/upload-swift-artifacts

This basically pushes a tarball to swift, creates a tempurl, then creates a
file ($HOME/.tripleo/environments/deployment-artifacts.yaml) which is
automatically read by tripleoclient on deployment.
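
A rough Python equivalent of what that helper does, for illustration
(assuming python-swiftclient; the credentials, container and key below
are placeholders, not what the script actually uses):

    import swiftclient
    from swiftclient.utils import generate_temp_url

    conn = swiftclient.Connection(authurl='http://undercloud:5000/v2.0',
                                  user='admin', key='password',
                                  tenant_name='admin', auth_version='2')
    conn.put_container('overcloud-artifacts')
    with open('artifacts.tar.gz', 'rb') as f:
        conn.put_object('overcloud-artifacts', 'artifacts.tar.gz', f)

    # Set a temp-url key and sign a GET path the nodes can fetch without
    # credentials; generate_temp_url returns the path plus query string.
    conn.post_account(headers={'x-account-meta-temp-url-key': 'secret'})
    path = '/v1/AUTH_admin/overcloud-artifacts/artifacts.tar.gz'
    tempurl = generate_temp_url(path, 3600, 'secret', 'GET')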

DeployArtifactURLs is already a list, but we'll need to test and confirm we
can pass both e.g. swift ring data and updated puppet modules at the same
time.

The part that actually builds the rings on the undercloud will probably
need to be created as a custom mistral action:

https://github.com/openstack/tripleo-common/tree/master/tripleo_common/actions
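
A minimal sketch of what such an action could look like (the class name
and its inputs are hypothetical, assuming mistral's Action base class):

    from mistral.actions import base

    class BuildSwiftRingsAction(base.Action):
        """Hypothetical action: build/update rings from device data."""

        def __init__(self, devices, part_power=10, replicas=3):
            self.devices = devices
            self.part_power = part_power
            self.replicas = replicas

        def run(self):
            # Would call the ring-building tool here and upload the
            # result to the undercloud swift; the return value becomes
            # the task result in the workflow.
            return {'rings_container': 'overcloud-artifacts'}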

These are then driven as part of the deployment workflow (although the
final workflow this will wire into hasn't yet landed):

https://review.openstack.org/#/c/298732/

> - Templates might be included in tripleo-heat-templates?

Yes, although by the look of it there may be only a few template changes required.

If you want to remove the current ringbuilder puppet step completely, you
can simply remove