Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 21/01/14 13:14 -0500, Joe Gordon wrote: On Jan 17, 2014 12:24 AM, Flavio Percoco fla...@redhat.com wrote: On 16/01/14 17:32 -0500, Doug Hellmann wrote: On Thu, Jan 16, 2014 at 3:19 PM, Ben Nemec openst...@nemebean.com wrote: On 2014-01-16 13:48, John Griffith wrote: Hey Everyone, A review came up today that cherry-picked a specific commit to OSLO Incubator, without updating the rest of the files in the module. I rejected that patch, because my philosophy has been that when you update/pull from oslo-incubator it should be done as a full sync of the entire module, not a cherry pick of the bits and pieces that you may or may not be interested in. As it turns out I've received a bit of push back on this, so it seems maybe I'm being unreasonable, or that I'm mistaken in my understanding of the process here. To me it seems like a complete and total waste to have an oslo-incubator and common libs if you're going to turn around and just cherry pick changes, but maybe I'm completely out of line. Thoughts?? I suppose there might be exceptions, but in general I'm with you. For one thing, if someone tries to pull out a specific change in the Oslo code, there's no guarantee that code even works. Depending on how the sync was done it's possible the code they're syncing never passed the Oslo unit tests in the form being synced, and since unit tests aren't synced to the target projects it's conceivable that completely broken code could get through Jenkins. Obviously it's possible to do a successful partial sync, but for the sake of reviewer sanity I'm -1 on partial syncs without a _very_ good reason (like it's blocking the gate and there's some reason the full module can't be synced). I agree. Cherry picking a single (or even partial) commit really should be avoided. 
The update tool does allow syncing just a single module, but that should be used very VERY carefully, especially because some of the changes we're making as we work on graduating some more libraries will include cross-dependent changes between oslo modules. Agreed. Syncing on master should be a complete synchronization from Oslo incubator. IMHO, the only case where cherry-picking from oslo should be allowed is when backporting patches to stable branches. Master branches should try to keep up-to-date with Oslo and sync everything every time. When we started Oslo incubator, we treated that code as trusted. But since then there have been occasional issues when syncing the code. So Oslo incubator code has lost *my* trust. Therefore I am always hesitant to do a full Oslo sync, because I am not an expert on the Oslo code and I risk breaking something when doing it (and the issue may not appear 100% of the time, either). Syncing code in becomes the first time that code is run against tempest, which scares me. While this might be true in some cases, I think we should address it differently. Just dropping trust in the project won't help much. I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. But isn't this what other gates are for? I mean, when proposing an oslo sync, each project has its own gate plus integrated tests that do this exact job. Additionally, what about a periodic jenkins job that does the Oslo syncs and is managed by the Oslo team itself? This would be awesome. It would take the burden of doing the sync off the project maintainers. Before doing this, though, we need to improve the `update` script. Currently, there's no good way to generate useful commit messages out of the sync.
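For readers unfamiliar with the `update` script being discussed, here is a hedged sketch of what it does conceptually: copy modules from oslo-incubator into the target project tree and rewrite the package prefix so `openstack.common` resolves inside the project. This is an illustration only, not the real openstack/oslo-incubator script; the function name is hypothetical.

```python
# Hedged sketch: the core transformation the oslo-incubator update tool
# performs when syncing a module into a consuming project.
def rewrite_imports(source, project):
    """Rewrite oslo-incubator package references for a target project."""
    return source.replace("openstack.common",
                          "%s.openstack.common" % project)

line = "from openstack.common import log"
print(rewrite_imports(line, "cinder"))
# from cinder.openstack.common import log
```

A full-module sync, as advocated above, amounts to running this rewrite over every file in the module rather than cherry-picking individual commits.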
Cheers, FF -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Hi Dong, Can you elaborate with an example of what you get, and what you were expecting exactly? I have a similar problem with one operator, where they assign you sparse blocks of IP addresses (floating IPs), directly routed to your machine, and they also assign the virtual MAC addresses from their API. Direct routing means that the subnet router will route your IP from outside the subnet directly through your subnet to your machine, and the traffic (with the external IP) is routed back through the subnet to this router. Cheers, - Original Message - From: Dong Liu willowd...@gmail.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Tuesday, January 21, 2014 9:52:44 AM Subject: [openstack-dev] [nova][neutron]About creating vms without ip address Hi fellow OpenStackers, I found that we could not create VMs without an IP address. But in the telecom scene, the IP address is usually managed by the telecom network elements themselves. So they need a VM without an IP address, to be configured through some specific method. How can we provide this kind of VM? I think providing the ability for tenants to create VMs without an IP address is necessary. What's your opinion? Regards, Dong Liu
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 00:00, Robert Collins robe...@robertcollins.net wrote: I think dropping frames that can't be forwarded is entirely sane - at a guess it's what a physical ethernet switch would do if you try to send a 1600 byte frame (on a non-jumbo-frame switched network) - but perhaps there is an actual standard for this we could follow? Speaking from bitter experience, if you've misconfigured your switch so that it's dropping packets for this reason, you will have a period of hair tearing out to solve the problem before you work it out. Believe me, been there, rabbit messages that don't turn up because they're the first ones that were too big are not a helpful diagnostic indicator. Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. (c) we require Neutron plugins to work out the MTU, which for any encap except VLAN is (host interface MTU - header size). do you mean tunnel wrap overheads? (What if a particular tunnel has a trailer.. crazy talk I know). Yup, basically. Unfortunately, thinking about this a bit more, you can't easily be certain what the max packet size allowed in a GRE tunnel is going to be, because you don't know which interface it's going over (or what's between), but to a certain extent we can use config items to fix what we can't discover. -- Ian.
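The "host interface MTU - header size" rule above can be sketched numerically. The overhead figures below are typical values only; the actual numbers depend on options (GRE keys, IPv6 outer headers, and so on), which is exactly why the thread concludes the value can't always be discovered automatically.

```python
# Hedged sketch of per-encapsulation MTU arithmetic. Overheads are
# common defaults, not guarantees.
ENCAP_OVERHEAD = {
    "vlan": 0,                  # tagging happens below the IP MTU
    "gre": 20 + 4,              # outer IPv4 header + basic GRE header
    "vxlan": 20 + 8 + 8 + 14,   # outer IPv4 + UDP + VXLAN + inner Ethernet
}

def tenant_mtu(host_mtu, encap):
    """MTU a guest should use so encapsulated frames fit the host MTU."""
    return host_mtu - ENCAP_OVERHEAD[encap]

print(tenant_mtu(1500, "gre"))    # 1476
print(tenant_mtu(1500, "vxlan"))  # 1450
```

If the tunnel traverses a path with a lower MTU than the host interface, even this arithmetic is optimistic, hence the suggestion later in the thread to fall back on a config option.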
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda
[openstack-dev] Questions regarding image location and glanceclient behaviour ...
Hi All, I have two questions ... 1) Glance v1 APIs can take a --location argument when creating an image but v2 APIs can't - bug or feature? (Details below) 2) How should glanceclient (v2 commands) handle reserved attributes? a) status quo: (Apparently) let the user set them, but the server will return an 'attribute is reserved' error. Pros: No missing functionality, no damage done. Cons: Bad usability. b) hard-code the list of reserved attributes in the client and don't expose them to the user. Pros: quick to implement. Cons: Need to track reserved attributes in the server implementation. c) get reserved words from the schema downloaded from the server (and don't expose them to the user). Pros: Don't need to track the server implementation. Cons: Complex - reserved words can vary from command to command. I personally favor (b) on the grounds that a client implementation needs to closely understand server behaviour anyway, so the syncing of reserved attributes shouldn't be a big problem (*provided* the list of reserved attributes is made available in the reference documentation, which doesn't seem to be the case currently). So what does everybody think? Details: When using the glance client's v1 interface I can image-create an image and specify the image file's location via the --location parameter. Alternatively I can image-create an empty image and then image-update the image's location to some url. However, when using the client's v2 commands I can neither image-create the file using the --location parameter, nor image-update the file later. When using image-create with --location, the client gives the following error (printed by warlock): Unable to set 'locations' to '[u'http://192.168.1.111/foo/bar']' This is because the schema dictates that the location should be an object of the form [{url: string, metadata: object}, ...]
but there is no way to specify such an object from the command line - I cannot specify a string like '{url: 192.168.1.111/foo/bar, metadata: {}}' for there is no conversion from command line strings to python dicts, nor is there any conversion from a simple URL string to a suitable location object. If I modify glanceclient.v2.images.Controller.create to convert the locations parameter from a URL string to the desired object, then the request goes through to the glance server, where it fails with a 403 error (Attribute 'locations' is reserved). So is this discrepancy between v1 and v2 deliberate (a feature :)) or is it a bug?
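The URL-to-locations-object conversion described above is trivial to express; the gap is purely that neither the client CLI nor the server permits it. A hedged sketch of the shape the v2 schema expects (the helper name is hypothetical, not part of glanceclient):

```python
# Wrap a bare image URL into the v2 'locations' entry shape quoted from
# the schema above: [{"url": string, "metadata": object}, ...]
def url_to_location(url, metadata=None):
    return {"url": url, "metadata": metadata or {}}

locations = [url_to_location("http://192.168.1.111/foo/bar")]
print(locations)
# [{'url': 'http://192.168.1.111/foo/bar', 'metadata': {}}]
```

Even with this conversion in place, the request still fails with the 403 described above, because the server treats 'locations' as a reserved attribute in v2.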
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us Python 3 hackers will be able to handle the transition as needed. -- Julien Danjou # Free Software hacker # independent consultant # http://julien.danjou.info
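For context, here is a hedged sketch of the compatibility shim in question: six>=1.5 exposes `six.moves.urllib`, which papers over the Python 2/3 urllib split the same way oslo-incubator's py3kcompat did. The try/except fallback to the Python 3 stdlib is only so this snippet runs without six installed; real code would just import from six.moves.

```python
# What six 1.5 provides and py3kcompat duplicated: one import path for
# urllib/urlparse on both Python 2 and 3.
try:
    from six.moves.urllib import parse as urlparse  # six >= 1.5
except ImportError:
    from urllib import parse as urlparse  # Python 3 stdlib fallback

parts = urlparse.urlparse("http://192.168.1.111/foo/bar?x=1")
print(parts.netloc)  # 192.168.1.111
print(parts.path)    # /foo/bar
```

Once every consuming project imports through six.moves, py3kcompat can be deleted from oslo-incubator, which is the "upgrade and remove our own code" position above.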
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break API compatibility. -- Julien Danjou ;; Free Software hacker ; independent consultant ;; http://julien.danjou.info
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
Le 22/01/2014 02:50, Jay Pipes a écrit : Yup, agreed. It's difficult to guess what the capacity implications would be without having solid numbers on customer demand for this functionality, including hard data on how long such instances would typically live (see my previous point about re-using compute hosts for other purposes once the last dedicated instance is terminated on that host). Best, -jay My personal opinion (but I could be wrong) is that such a feature would only be accepted by operators if there is some termination period defined when you create a dedicated instance. Again, what happens when the lease (or the lock-in) ends should be defined by the operator, at their own convenience, and that's why Climate is behaviour-driven by configuration flags for lease termination. Back to the initial subject, I think it's pretty good having such a dedicated-instance model in Nova (thanks to an API extension, which could be non-core), but the instance lifecycle (in case of a termination period) should stay in Climate, IMHO. -Sylvain
Re: [openstack-dev] [gantt] How to include nova modules in unit tests
Le 22/01/2014 01:37, Dugger, Donald D a écrit : Sylvain- Tnx, that worked great. (Now if I can just find a way to get the affinity tests working, all the other tests pass. I only have 17 tests failing out of 254.) I'm pretty busy these days with Climate 0.1 to deliver, but if I find some time, I will take a look at these. -Sylvain
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On Wed, Jan 22, 2014 at 11:17 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us [...] +1, less code we maintain is a good thing :) Chmouel.
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 21:28, Ian Wells ijw.ubu...@cack.org.uk wrote: On 22 January 2014 00:00, Robert Collins robe...@robertcollins.net wrote: I think dropping frames that can't be forwarded is entirely sane - at a guess it's what a physical ethernet switch would do if you try to send a 1600 byte frame (on a non-jumbo-frame switched network) - but perhaps there is an actual standard for this we could follow? Speaking from bitter experience, if you've misconfigured your switch so that it's dropping packets for this reason, you will have a period of hair tearing out to solve the problem before you work it out. Believe me, been there, rabbit messages that don't turn up because they're the first ones that were too big are not a helpful diagnostic indicator. PMTU blackhole problems show the same symptoms :) - been there, done that. Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. (c) we require Neutron plugins to work out the MTU, which for any encap except VLAN is (host interface MTU - header size). do you mean tunnel wrap overheads? (What if a particular tunnel has a trailer.. crazy talk I know). Yup, basically. Unfortunately, thinking about this a bit more, you can't easily be certain what the max packet size allowed in a GRE tunnel is going to be, because you don't know which interface it's going over (or what's between), but to a certain extent we can use config items to fix what we can't discover. One thing we could do is encourage OS vendors to turn /proc/sys/net/ipv4/tcp_mtu_probing (http://www.ietf.org/rfc/rfc4821.txt) on in combination with dropping over-size frames.
That should detect the actual MTU. Another thing would be for encapsulation failures in the switch to be reflected in the vNIC in the instance - export back media errors (e.g. babbles) so that users can diagnose problems. Note that IPv6 doesn't *have* a DF bit, because routers are not permitted to fragment - arguably encapsulating an ipv6 frame in GRE and then fragmenting the outer layer is a violation of that. As for automatically determining the size - we can determine the PMTU between all hosts in the mesh, report those back centrally and take the lowest, then subtract the GRE overhead. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
[openstack-dev] [Heat] Reducing pep8 ignores
Hi all, we have an approved blueprint that concerns reducing the number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've already been warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what you think about enabling these checks. The style rules currently being ignored are: F841 local variable 'json_template' is assigned to but never used H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) H302 do not import objects, only modules (this I don't like myself, as it can clutter the code beyond reasonable limit) H306 imports not in alphabetical order H404 multi line docstring should start with a summary Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am now puzzled about how to propose such a change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I can currently think of is to split it into 4-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question of how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass the pep8 check? Or will there be a 'race condition' between my final patch and any other that breaks the style rules? I would really appreciate any thoughts and comments about this. Best regards, Pavlo Shchelokovskyy.
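To make H306 concrete, here is a hedged toy check in the spirit of the rule - imports within a group must be alphabetical. This is an illustration only, not the actual hacking-plugin implementation.

```python
# Toy H306-style check: are plain 'import X' lines alphabetically sorted?
def imports_sorted(lines):
    mods = [line.split()[1] for line in lines if line.startswith("import ")]
    return mods == sorted(mods)

good = ["import json", "import os", "import sys"]
bad = ["import sys", "import json"]
print(imports_sorted(good), imports_sorted(bad))  # True False
```

The real check also handles `from X import Y` forms and the stdlib/third-party/project grouping, which is what makes a 163-file cleanup so mechanical yet so large.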
Re: [openstack-dev] [savanna] savannaclient v2 api
Current EDP config-hints are not only plugin-specific. Several types of jobs must have certain key/values, and without them the job will fail. For instance, the MapReduce (former Jar) job type requires Mapper/Reducer class parameters to be set [1]. Moreover, for such kinds of jobs we already have separate configuration defaults [2]. Also, initial versions of the patch implementing config-hints contained plugin-independent defaults for each job type [3]. I remember we postponed the decision about which configs are common for all plugins and agreed to show users all vanilla-specific defaults. That's why we now have several TODOs in the code about config-hints being plugin-specific. So I propose to leave the config-hints REST call in EDP internal and make it plugin-independent (or job-specific) by removing the parsing of all vanilla-specific defaults and defining a small list of configs which is definitely common for each type of job. The first things that come to mind: - For MapReduce jobs it's already defined in [1] - Configs like the number of map and reduce tasks are common for all types of jobs - At least the user always has the ability to set any key/value(s) as params/arguments for a job [1] http://docs.openstack.org/developer/savanna/userdoc/edp.html#workflow [2] https://github.com/openstack/savanna/blob/master/savanna/service/edp/resources/mapred-job-config.xml [3] https://review.openstack.org/#/c/45419/10 Regards, Alexander Ignatov On 20 Jan 2014, at 22:04, Matthew Farrellee m...@redhat.com wrote: On 01/20/2014 12:50 PM, Andrey Lazarev wrote: Inlined.
On Mon, Jan 20, 2014 at 8:15 AM, Matthew Farrellee m...@redhat.com wrote: (inline, trying to make this readable by a text-only mail client that doesn't use tabs to indicate quoting) On 01/20/2014 02:50 AM, Andrey Lazarev wrote: -- FIX - @rest.get('/jobs/config-hints/job_type') - should move to GET /plugins/plugin_name/plugin_version, similar to get_node_processes and get_required_image_tags -- Not sure if it should be plugin specific right now. EDP uses it to show some configs to users in the dashboard. It's just a cosmetic thing. Also, when a user starts defining configs for a job, he might not have defined a cluster yet, and thus no plugin to run this job. I think we should leave it as is and leave only abstract configs like Mapper/Reducer class, and allow users to apply any key/value configs if needed. FYI, the code contains comments suggesting it should be plugin specific. https://github.com/openstack/savanna/blob/master/savanna/service/edp/workflow_creator/workflow_factory.py#L179 IMHO, the EDP should have no plugin specific dependencies. If it currently does, we should look into why and see if we can't eliminate this entirely. [AL] EDP uses plugins in two ways: 1. for HDFS user 2. for config hints I think both items should not be plugin specific on the EDP API level. But the implementation should go to the plugin and call the plugin API for the result. In fact they are both plugin specific. The user is forced to click through a plugin selection (when launching a job on a transient cluster) or the plugin selection has already occurred (when launching a job on an existing cluster).
Since the config is something that is plugin specific - you might not have hbase hints from vanilla but you would from hdp - and you already have plugin information whenever you ask for a hint, my view that this should be under the /plugins namespace is growing stronger. [AL] Disagree. They are plugin specific, but EDP itself could have additional plugin-independent logic inside. Right now config hints return EDP properties (like mapred.input.dir) as well as plugin-specific properties. Placing it under the /plugins namespace will give the impression that it is fully plugin specific. I'd like to see the EDP API fully plugin independent and in one namespace. If the core side needs some information internally, it can easily go to the plugin. I'm not sure if we're disagreeing. We may, in fact, be in violent agreement. The EDP API is fully plugin independent, and should stay that way as a
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On 22/01/14 11:40 +0100, Chmouel Boudjnah wrote: On Wed, Jan 22, 2014 at 11:17 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us [...] +1, less code we maintain is a good thing :) +1 :) Chmouel. -- @flaper87 Flavio Percoco
Re: [openstack-dev] Disabling file injection *by default*
On 01/22/2014 03:27 AM, Robert Collins wrote: On 22 January 2014 10:50, Kashyap Chamarthy kcham...@redhat.com wrote: [CC'ed libguestfs author, Rich Jones] Heya, On 01/21/2014 07:59 AM, Robert Collins wrote: I was reminded of this while I cleaned up failed file injection nbd devices on ci-overcloud.tripleo.org :/ - what needs to happen for us to change the defaults around file injection so that it's disabled? I presume you're talking about libguestfs based file injection. I remember recently debugging/testing by disabling it to isolate a different problem: inject_partition=-2 No, the default is nbd based injection, which is terrible on two counts: - it's got horrible security ramifications - it's a horrible thing to be doing. libguestfs based injection is only terrible on one count: - it's a horrible thing to be doing. That said, I'm trying to understand the rationale of your proposal in this case. Can you point me to a URL or some such? I'm just curious, as a heavy user of libguestfs. There's nothing wrong with libguestfs; this is about the feature, which has been discussed, here, a lot :) - for delivering metadata to images, config-drive || metadata service are much better. Hypervisors shouldn't be in the business of tinkering inside VM file systems at all. Thanks for the details, Robert and Rich. -- /kashyap
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 12:01, Robert Collins robe...@robertcollins.net wrote: Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. To the best of my knowledge, no. And it wants to be a part of the static config too. derail And the static config, the last I checked, also sucks - we really want the data to be in a metadata format that cloud-init absorbs, but the last I checked there's a feature in config-drive et al that writes /etc/network/interfaces. Which is no use to anyone on Windows, or Redhat, or... /derail One thing we could do is encourage OS vendors to turn /proc/sys/net/ipv4/tcp_mtu_probing (http://www.ietf.org/rfc/rfc4821.txt) on in combination with dropping over-size frames. That should detect the actual MTU. Though it's really a bit of a workaround. Another thing would be for encapsulation failures in the switch to be reflected in the vNIC in the instance - export back media errors (e.g. babbles) so that users can diagnose problems. Ditto. Note that IPv6 doesn't *have* a DF bit, because routers are not permitted to fragment - arguably encapsulating an ipv6 frame in GRE and then fragmenting the outer layer is a violation of that. Fragmentation is fine for the tunnel, *if* the tunnel also reassembles. The issue of fragmentation is it's horrible to implement on all your endpoints, aiui, and used to lead to innumerable fragmentation attacks. As for automatically determining the size - we can determine the PMTU between all hosts in the mesh, report those back centrally and take the lowest then subtract the GRE overhead. 
If there's one path, and if there's no lower MTU on the GRE path (which can go via routers)... We can make an educated guess at the MTU, but we can't know it without testing each GRE tunnel as we set it up (and multiple routes defeat even that), so I would recommend a config option as the best of a nasty set of choices. It can still go wrong, but it's then blatantly and obviously a config fault rather than some code guessing wrong, which would be harder for an end user to work around. -- Ian.
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
On 21 January 2014 22:46, Veiga, Anthony anthony_ve...@cable.comcast.com wrote: Hi, Sean and Xuhan: I totally agree. This is not the ultimate solution, given the assumption that we had to use "enable_dhcp". We haven't decided the name of the other parameter; however, we are open to any suggestions. As we mentioned during the meeting, the second parameter should highlight the need for addressing. If so, it should have at least four values: 1) off (i.e. address is assigned by external devices out of OpenStack control) 2) slaac (i.e. address is calculated based on RA sent by OpenStack dnsmasq) 3) dhcpv6-stateful (i.e. address is obtained from OpenStack dnsmasq acting as DHCPv6 stateful server) 4) dhcpv6-stateless (i.e. address is calculated based on RA sent from either OpenStack dnsmasq or an external router, and optional information is retrieved from OpenStack dnsmasq acting as DHCPv6 stateless server) So how does this work if I have an external DHCPv6 server and an internal router? (How baroque do we have to get?) enable_dhcp, for backward compatibility reasons, should probably disable *both* RA and DHCPv6, despite the name, so we can't use that to disable the DHCP server. We could add a *third* attribute, which I hate as an idea but it does resolve the problem - one flag for each of the servers, one for the mode the servers are operating in, and enable_dhcp, which needs to DIAF but will persist till the API is revved. -- Ian.
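The four-value addressing attribute enumerated above can be sketched as a simple validator. The attribute and function names here are illustrative only; the thread has not settled on a name for the parameter.

```python
# Hedged sketch of the proposed four-value IPv6 addressing attribute.
ADDRESS_MODES = ("off", "slaac", "dhcpv6-stateful", "dhcpv6-stateless")

def validate_address_mode(mode):
    """Reject anything outside the four proposed values."""
    if mode not in ADDRESS_MODES:
        raise ValueError("unknown IPv6 address mode: %s" % mode)
    return mode

print(validate_address_mode("slaac"))  # slaac
```

Whether this lives as one attribute, or is split across the two (or three) flags debated above, is exactly the open question in the thread.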
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing the number of *instances* instead of resources? -- Jarda And one more thing - Resource is a very broad term, just as Role is. The only difference is that Heat accepted 'Resource' as a specific term for them (you see? they used a broad term for their concept). So I am asking myself, where is the difference between the generic term Resource and Role? Why can't we accept Roles? It's short and well describing... I am leaning towards Role. We can be more specific by adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of the undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Hello, Jaromir On Wed, Jan 22, 2014 at 4:09 PM, Jaromir Coufal jcou...@redhat.com wrote: I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role We use this term a lot internally for the very similar purpose, so it looks reasonable to me. Just my 2c. -- Best regards, Oleg Gelbukh * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda
Re: [openstack-dev] [Heat] Reducing pep8 ignores
On Wed, Jan 22, 2014 at 01:23:05PM +0200, Pavlo Shchelokovskyy wrote: Hi all, we have an approved blueprint that concerns reducing number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've been already warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand what is the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what do you think about enabling these checks. The style rules being currently ignored are: F841 local variable 'json_template' is assigned to but never used This was fixed and enabled in https://review.openstack.org/#/c/62827/ H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) This sounds reasonable, we made an effort to purge naked excepts a while back so hopefully it shouldn't be too difficult to enable. However there are a couple of remaining instances (in resource.py and scheduler.py in particular), so we need to evaluate if these are justifiable or need to be reworked. H302 do not import objects, only modules (this I don't like myself as it can clutter the code beyond reasonable limit) H306 imports not in alphabetical order H404 multi line docstring should start with a summary Personally I don't care much about any of these, in particular the import ones seem to me unnecessarily inconvenient so I'd prefer to leave these disabled. H404 is probably a stronger argument, as it would help improve the quality of our auto-generated docs, but again I see it as of marginal value considering the (probably large) effort involved.
I'd rather see that effort used to provide a better, more automated way to keep our API documentation updated (since that's the documentation users really need, combined with the existing template/resource documentation). Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am now puzzled with how to propose such a change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I currently can think of is to split it in 4-5-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question on how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass pep8 check? Or there will be a 'race condition' between my final patch and any other that breaks the style rules? I would really appreciate any thoughts and comments about this. If you do proceed with the work, then I think those reviewing will just have to police the queue and ensure we don't merge patches which break the style rules after you've fixed them. Steve
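For readers unfamiliar with H201, a minimal illustration of what the check flags and what passes. The helper below is hypothetical, not Heat code:

```python
# H201 flags a bare "except:". Note the caveat from the thread: the check
# only matches bare except lines, so "except BaseException:" slips through
# even though it is almost as broad.

def parse_port_bad(value):
    try:
        return int(value)
    except:             # H201 would flag this line
        return None

def parse_port_good(value):
    try:
        return int(value)
    except ValueError:  # catch only the failure we actually expect
        return None
```

The practical difference: the bare except also swallows KeyboardInterrupt and SystemExit, which is why the purge of naked excepts was worthwhile beyond mere style.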
Re: [openstack-dev] [Heat] Reducing pep8 ignores
On 01/22/2014 06:23 AM, Pavlo Shchelokovskyy wrote: Hi all, we have an approved blueprint that concerns reducing number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've been already warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand what is the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what do you think about enabling these checks. The style rules being currently ignored are: F841 local variable 'json_template' is assigned to but never used H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) H302 do not import objects, only modules (this I don't like myself as it can clutter the code beyond reasonable limit) Realize you can do import aliases. import sqlalchemy as sa That looks to be best practice in the python community right now. H306 imports not in alphabetical order H404 multi line docstring should start with a summary Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am being now puzzled with how to propose such change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I currently can think of is to split it in 4-5-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question on how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass pep8 check? Or there will be a 'race condition' between my final patch and any other that breaks the style rules? 
I would really appreciate any thoughts and comments about this. As long as it is all done in a git patch series on your side, with the patches stacked on top of each other in the correct order, it will be fine. The system that we have won't let you merge a pep8 error, you are protected from that. When I was doing similar cleanups for nova last year I just ended up with a 17 deep patch queue, which tended to merge in chunks of 4 then need rebasing, as something else landed and changed in front of me. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
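Sean's aliasing tip in practice: H302 bans importing objects, but a module import can still be aliased to keep call sites short. The `import sqlalchemy as sa` line is from the thread; a stdlib module stands in below so the snippet is self-contained:

```python
# H302: "do not import objects, only modules". The rule still permits
# aliasing the module itself, which is what keeps call sites readable:
#
#   flagged:    from collections.abc import Mapping
#   preferred:  import collections.abc as abc_mod  (reference via the module)
#
# The same pattern is the one Sean cites for sqlalchemy: import sqlalchemy as sa
import collections.abc as abc_mod

def is_mapping(obj):
    # The object is reached through the aliased module, satisfying H302.
    return isinstance(obj, abc_mod.Mapping)
```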
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 01/22/2014 05:19 AM, Julien Danjou wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having a integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. I'm starting to feel like we need to revisit that point. Because what happens now is a chunk of code gets worked off in a corner, possibly randomly changing interfaces, not running unit tests in a way that we know it's multi process safe. So there ends up being a ton of blind trust in the sync right now. Which is why the syncs are coming slower, and you'll have nova 4 - 6 months behind on many modules, missing a critical bug fix that's buried some where inside a bunch of other interface changes that are expensive. (Not theoretical, I just tripped over this in Dec). I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
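The "standard deprecation techniques" Sean mentions can be as simple as a warnings-based decorator. This is a generic sketch of the idea, not an actual oslo interface:

```python
import functools
import warnings

def deprecated(replacement):
    """Mark a callable as deprecated, pointing callers at its replacement.

    Illustrative only: a stable-but-changeable interface keeps the old
    entry point working for a cycle while emitting DeprecationWarning.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s is deprecated; use %s instead" % (func.__name__, replacement),
                DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("new_helper")
def old_helper(x):
    # old behavior preserved while consumers migrate
    return x * 2
```

With this pattern an interface change still lands, but consuming projects get a full deprecation cycle of warnings instead of a silent break at sync time.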
Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV
Sounds great! Let's do it on Thursday. --Robert On 1/22/14 12:46 AM, Irena Berezovsky ire...@mellanox.com wrote: Hi Robert, all, I would suggest not to delay the SR-IOV discussion to the next week. Let’s try to cover the SRIOV side and especially the nova-neutron interaction points and interfaces this Thursday. Once we have the interaction points well defined, we can run parallel patches to cover the full story. Thanks a lot, Irena From: Robert Li (baoli) [mailto:ba...@cisco.com] Sent: Wednesday, January 22, 2014 12:02 AM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [nova][neutron] PCI passthrough SRIOV Hi Folks, As the debate about PCI flavor versus host aggregate goes on, I'd like to move forward with the SRIOV side of things in the same time. I know that tomorrow's IRC will be focusing on the BP review, and it may well continue into Thursday. Therefore, let's start discussing SRIOV side of things on Monday. Basically, we need to work out the details on: -- regardless it's PCI flavor or host aggregate or something else, how to use it to specify a SRIOV port. -- new parameters for —nic -- new parameters for neutron net-create/neutron port-create -- interface between nova and neutron -- nova side of work -- neutron side of work We should start coding ASAP. Thanks, Robert
Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR
Thanks for your input, Carl. You're right, it seems the more appropriate place for this is _validate_subnet(). It checks ip version, gateway, etc... but not the size of the subnet. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 09:22:55 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 01/21/2014 09:27 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR The bottom line is that the method you mentioned shouldn't validate the subnet. It should assume the subnet has been validated and validate the pool. It seems to do an adequate job of that. Perhaps there is a _validate_subnet method that you should be focused on? (I'd check but I don't have convenient access to the code at the moment) Carl On Jan 21, 2014 6:16 PM, Paul Ward wpw...@us.ibm.com wrote: You beat me to it. :) I just responded about not checking the allocation pool start and end but rather, checking subnet_first_ip and subnet_last_ip, which is set as follows: subnet = netaddr.IPNetwork(subnet_cidr) subnet_first_ip = netaddr.IPAddress(subnet.first + 1) subnet_last_ip = netaddr.IPAddress(subnet.last - 1) However, I'm curious about your contention that we're ok... I'm assuming you mean that this should already be handled. I don't believe anything is really checking to be sure the allocation pool leaves room for a gateway, I think it just makes sure it fits in the subnet. A member of our test team successfully created a network with a subnet of 255.255.255.255, so it got through somehow. I will look into that more tomorrow. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 05:27:49 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 01/21/2014 05:32 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR I think there may be some confusion between the two concepts: subnet and allocation pool.
You are right that an ipv4 subnet smaller than /30 is not useable on a network. However, this method is checking the validity of an allocation pool. These pools should not include room for a gateway nor broadcast address. Their relation to subnets is that the range of ips contained in the pool must fit within the allocatable IP space on the subnet from which they are allocated. Other than that, they are simple ranges; they don't need to be cidr aligned or anything. A pool of a single IP is valid. I just checked the method's implementation now. It does check that the pool fits within the allocatable range of the subnet. I think we're good. Carl On Tue, Jan 21, 2014 at 3:35 PM, Paul Ward wpw...@us.ibm.com wrote: Currently, NeutronDbPluginV2._validate_allocation_pools() does some very basic checking to be sure the specified subnet is valid. One thing that's missing is checking for a CIDR of /32. A subnet with one IP address in it is unusable as the sole IP address will be allocated to the gateway, and thus no IPs are left over to be allocated to VMs. The fix for this is simple. In NeutronDbPluginV2._validate_allocation_pools(), we'd check for start_ip == end_ip and raise an exception if that's true. I've opened lauchpad bug report 1271311 (https://bugs.launchpad.net/neutron/+bug/1271311) for this, but wanted to start a discussion here to see if others find this enhancement to be a valuable addition. 
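The arithmetic behind the /32 complaint can be checked with the stdlib ipaddress module. This is a rough stand-in for what a _validate_subnet check would compute, not Neutron's actual code (which uses netaddr, as quoted above):

```python
import ipaddress

def usable_vm_ips(cidr):
    """Addresses left for VMs after reserving the network address, the
    broadcast address, and one gateway address.

    Illustrative only: mirrors the subnet.first + 1 .. subnet.last - 1
    range quoted in the thread, minus one IP for the gateway.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    allocatable = max(net.num_addresses - 2, 0)  # drop network + broadcast
    return max(allocatable - 1, 0)               # drop the gateway
```

A /32 yields zero usable addresses (one address total, which the gateway would consume), which is exactly why the bug report argues it should be rejected; a /30 is the smallest IPv4 subnet that leaves any room for a VM.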
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Mainn - Original Message - On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
- Original Message - On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! -- Jarda
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 22/01/14 07:32 -0500, Sean Dague wrote: On 01/22/2014 05:19 AM, Julien Danjou wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having a integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. I'm starting to feel like we need to revisit that point. Because what happens now is a chunk of code gets worked off in a corner, possibly randomly changing interfaces, not running unit tests in a way that we know it's multi process safe. This is not true. If there have been abrupt changes on the interfaces then we failed at keeping backwards compatibility. However, that doesn't mean the API is not considered when reviewing nor that it doesn't matter because the library is incubated. This kind of changes usually get filed on all projects using a specific functionality. For example, https://bugs.launchpad.net/oslo/+bug/1266962 Again, if there have been cases where an API has been changed without even notifying others - either with a good commit message, m-l thread or bug report - then there's definitely something wrong in the process and it should be fixed. Also, I'd expect these errors to be raised as soon as they're noted because they may also affect other projects as well. The above is very different than just saying oslo-incubator is not trustworthy because things get copied around and changes to the libraries are made randomly. So there ends up being a ton of blind trust in the sync right now. Which is why the syncs are coming slower, and you'll have nova 4 - 6 months behind on many modules, missing a critical bug fix that's buried some where inside a bunch of other interface changes that are expensive. (Not theoretical, I just tripped over this in Dec). I'm sorry but this is not an excuse to avoid syncing from oslo-incubator. 
Actually, if things like this can happen, the bigger the gap is the harder it'll be to sync from oslo. My suggestion has always been to do periodic syncs from oslo and keep up to date. Interface changes that *just* break other projects without a good way forward have to be raised here. I know we're talking about incubated libraries that are supposed to change but as mentioned above, we always consider backwards compatibility even on incubated libs because they're on their way to stability and breaking other projects is not fun. I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. Agreed. This is something that we've been working on during Icehouse. We should probably define more clearly the incubation path of modules that land in oslo-incubator. For example, determine where they would fit, how long they should be around based on their functionality and/or complexity, etc. We talked about having a meeting on this matter after I-2. Not sure when it'll happen but it'll be a perfect time to discuss this further. Cheers, FF -- @flaper87 Flavio Percoco
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Mainn -- Jarda
[openstack-dev] Gate Update - Wed Morning Edition
Things aren't great, but they are actually better than yesterday. Vital Stats: Gate queue length: 107 Check queue length: 107 Head of gate entered: 45hrs ago Changes merged in last 24hrs: 58 The 58 changes merged is actually a good number, not a great number, but best we've seen in a number of days. I saw at least a 6 streak merge yesterday, so zuul is starting to behave like we expect it should. = Previous Top Bugs = Our previous top 2 issues - 1270680 and 1270608 (not confusing at all) are under control. Bug 1270680 - v3 extensions api inherently racey wrt instances Russell managed the second part of the fix for this, we've not seen it come back since that was ninja merged. Bug 1270608 - n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail Turning off the test that was triggering this made it completely go away. We'll have to revisit if that's because there is a cinder bug or a tempest bug, but we'll do that once the dust has settled. = New Top Bugs = Note: all fail numbers are across all queues Bug 1253896 - Attempts to verify guests are running via SSH fails. SSH connection to guest does not work. 83 fails in 24hrs Bug 1224001 - test_network_basic_ops fails waiting for network to become available 51 fails in 24hrs Bug 1254890 - Timed out waiting for thing causes tempest-dsvm-* failures 30 fails in 24hrs We are now sorting - http://status.openstack.org/elastic-recheck/ by failures in the last 24hrs, so we can use it more as a hit list. The top 3 issues are fingerprinted against infra, but are mostly related to normal restart operations at this point. = Starvation Update = with 214 jobs across queues, and averaging 7 devstack nodes per job, our working set is 1498 nodes (i.e. if we had that number we'd be able to be running all the jobs right now in parallel). Our current quota of nodes gives us ~ 480. Which is 1/3 our working set, and part of the reasons for delays.
Rackspace has generously increased our quota in 2 of their availability zones, and Monty is going to prioritize getting those online. Because of Jenkins scaling issues (it starts generating failures when talking to too many build slaves), that means spinning up more Jenkins masters. We've found a 1 / 100 ratio makes Jenkins basically stable, pushing beyond that means new fails. Jenkins is not inherently elastic, so this is a somewhat manual process. Monty is diving on that. There is also a TCP slow start algorithm for zuul that Clark was working on yesterday, which we'll put into production as soon as it is good. This will prevent us from speculating all the way down the gate queue, just to throw it all away on a reset. It acts just like TCP, on every success we grow our speculation length, on every fail we reduce it, with a sane minimum so we don't over throttle ourselves. Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it, is help us fix the bugs resetting the gate. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
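The TCP-style window Sean describes amounts to growing the speculation depth on success and cutting it back on failure, with a floor. An illustrative sketch under those assumptions, not Zuul's actual implementation:

```python
class SpeculationWindow:
    """TCP-slow-start-style limit on how far down the gate queue to
    speculate. Hypothetical: names and policy (additive increase,
    multiplicative decrease) are inferred from the description above.
    """

    def __init__(self, floor=3, ceiling=100):
        self.floor = floor      # sane minimum so we never over-throttle
        self.ceiling = ceiling  # never speculate past the whole queue
        self.size = floor

    def on_success(self):
        # every merged change grows the speculation length by one
        self.size = min(self.size + 1, self.ceiling)

    def on_failure(self):
        # a gate reset halves the window, never dropping below the floor
        self.size = max(self.size // 2, self.floor)
```

The payoff is the same as TCP congestion control: a reset no longer throws away work speculated across the entire 100+ change queue, only across the current window.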
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 22/01/14 14:31, Tzu-Mainn Chen wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Mainn Yup, I think the concern was that it could be confused with User Roles. However, Resource Role is probably clear enough IMO.
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
From: Sylvain Bauza sylvain.ba...@bull.net On 22/01/2014 02:50, Jay Pipes wrote: Yup, agreed. It's difficult to guess what the capacity implications would be without having solid numbers on customer demands for this functionality, including hard data on how long such instances would typically live (see my previous point about re-using compute hosts for other purposes once the last dedicated instance is terminated on that host). Best, -jay My personal opinion (but I can be wrong) is that such feature would only be accepted by operators only if there is some termination period defined when you create a dedicated instance. Again, what happens when the lease (or the lock-in) ends should be defined by the operator, on his own convenience, and that's why Climate is behaviour-driven by configuration flags for lease termination. Is that enough? Remember that some of us are concerned with business workloads, rather than HPC jobs. While it might be acceptable in a business workload to plan on regularly recycling every individual instance, it is definitely not acceptable to plan on a specific end to a given workload. And if the workload lives on, then usage of particular hosts can live on (at least for a pretty large amount of time like the product of (lifetime of a VM) * (number of VMs on the host) ). Thanks, Mike
Re: [openstack-dev] Gate Update - Wed Morning Edition
On 01/22/2014 09:38 AM, Sean Dague wrote: Things aren't great, but they are actually better than yesterday. Vital Stats: Gate queue length: 107 Check queue length: 107 Head of gate entered: 45hrs ago Changes merged in last 24hrs: 58 The 58 changes merged is actually a good number, not a great number, but best we've seen in a number of days. I saw at least a 6 streak merge yesterday, so zuul is starting to behave like we expect it should. = Previous Top Bugs = Our previous top 2 issues - 1270680 and 1270608 (not confusing at all) are under control. Bug 1270680 - v3 extensions api inherently racey wrt instances Russell managed the second part of the fix for this, we've not seen it come back since that was ninja merged. Bug 1270608 - n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail Turning off the test that was triggering this made it completely go away. We'll have to revisit if that's because there is a cinder bug or a tempest bug, but we'll do that once the dust has settled. = New Top Bugs = Note: all fail numbers are across all queues Bug 1253896 - Attempts to verify guests are running via SSH fails. SSH connection to guest does not work. 83 fails in 24hrs Bug 1224001 - test_network_basic_ops fails waiting for network to become available 51 fails in 24hrs Bug 1254890 - Timed out waiting for thing causes tempest-dsvm-* failures 30 fails in 24hrs We are now sorting - http://status.openstack.org/elastic-recheck/ by failures in the last 24hrs, so we can use it more as a hit list. The top 3 issues are fingerprinted against infra, but are mostly related to normal restart operations at this point. = Starvation Update = with 214 jobs across queues, and averaging 7 devstack nodes per job, our working set is 1498 nodes (i.e. if we had than number we'd be able to be running all the jobs right now in parallel). Our current quota of nodes gives us ~ 480. Which is 1/3 our working set, and part of the reasons for delays. 
Rackspace has generously increased our quota in 2 of their availability zones, and Monty is going to prioritize getting those online. Because of Jenkins scaling issues (it starts generating failures when talking to too many build slaves), that means spinning up more Jenkins masters. We've found a 1 / 100 ratio makes Jenkins basically stable, pushing beyond that means new fails. Jenkins is not inherently elastic, so this is a somewhat manual process. Monty is diving on that. There is also a TCP slow start algorithm for zuul that Clark was working on yesterday, which we'll put into production as soon as it is good. This will prevent us from speculating all the way down the gate queue, just to throw it all away on a reset. It acts just like TCP, on every success we grow our speculation length, on every fail we reduce it, with a sane minimum so we don't over throttle ourselves. Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it, is help us fix the bugs resetting the gate. One last thing, Anita has also gotten on top of pruning out all the neutron changes from the gate. Something is very wrong in the neutron isolated jobs right now, so their chance of passing is close enough to 0, that we need to keep them out of the gate. This is a new regression in the last couple of days. This is a contributing factor in the gates moving again. She and Mark are rallying the Neutron folks to sort this one out. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
Re: [openstack-dev] Gate Update - Wed Morning Edition
It's worth noting that elastic recheck is signalling bug 1253896 and bug 1224001, but they actually have the same signature. I also found it interesting that neutron is triggering bug 1254890 a lot, which appears to be a hang on /dev/nbdX during key injection; so far I have no explanation for that. As suggested on IRC, the neutron isolated job had a failure rate of about 5-7% last week (until Thursday, I think). It might therefore be worth also looking at tempest/devstack patches which might be triggering failures or uncovering issues in neutron. I shared a few findings on the mailing list yesterday ([1]). I hope people actively looking at failures will find them helpful. Salvatore [1] http://lists.openstack.org/pipermail/openstack-dev/2014-January/025013.html
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 4:02 AM, Jaromir Coufal jcou...@redhat.com wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? Yeah, great question Jarda. When I test out the “Stacks” functionality in Horizon the user doesn’t create a Stack that spins up resources, it spins up instances. Maybe there is a difference around the terms being used behind the scenes and in Horizon? Liz -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 9:52 AM, Dougal Matthews dou...@redhat.com wrote: On 22/01/14 14:31, Tzu-Mainn Chen wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Yeah, that was me :) Mainn Yup, I think the concern was that it could be confused with User Roles. However, Resource Role is probably clear enough IMO. Exactly. If we add something to make “Role” more specific to the user it would be much more clear. 
Liz
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 7:09 AM, Jaromir Coufal jcou...@redhat.com wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role +1 to Node Role. I agree that “role” is being used as a generic term here. I’m still convinced it’s important to use “Node” in the name since this is the item we are describing by assigning it a certain type of role. Liz * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
+1 to Node Role. I agree that “role” is being used as a generic term here. I’m still convinced it’s important to use “Node” in the name since this is the item we are describing by assigning it a certain type of role. I'm *strongly* against Node Role. In Ironic, a Node has no explicit Role assigned to it; whatever Role it has is implicit through the Instance running on it (which maps to a Heat Resource). In that sense, we're not really monitoring Nodes; we're monitoring Resources, and a Node just happens to be one attribute of a Resource. Mainn
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Yeah, great question Jarda. When I test out the “Stacks” functionality in Horizon the user doesn’t create a Stack that spins up resources, it spins up instances. Maybe there is a difference around the terms being used behind the scenes and in Horizon? Maybe we're looking at different parts of the UI, but when I look at a Stack detail page in Horizon, I see a tab for Resources, and not Instances. The resource table might link to an Instance, but that information is retrieved from the Resource. Mainn
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Oh dear user... :) I'll step a little bit back. We need to agree whether we want to name concepts one way in the background and another way in the UI for the user (did we already agree on this point?). We all know the pros and cons. And I will still fight for users to get global infrastructure terminology (e.g. defining Node Profiles instead of Flavors), because I received a lot of negative feedback on mixing overcloud terms into the undercloud, confusion about the overcloud/undercloud terms themselves, etc. If it would be easier for developers to name the concepts differently in the background then that's fine - we just need to talk about 2 terms per concept then. And I would be a bit afraid of schizophrenia... On 2014/22/01 15:10, Tzu-Mainn Chen wrote: That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Honestly it would get super confusing for me if somebody tells me - you have 5 compute resources. (And I am talking from the user's world, not the developer's one.) But a resource itself can be anything. -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] [UX] Infrastructure Management UI - Icehouse scoped wireframes
Hey everybody, I am sending updated wireframes. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-22_tripleo-ui-icehouse.pdf Updates: * p15-18 for down-scaling deployment Any questions are welcome, I am happy to answer them. -- Jarda On 2014/16/01 01:50, Jaromir Coufal wrote: Hi folks, thanks everybody for the feedback. Based on that I updated the wireframes and tried to provide a minimum scope for the Icehouse timeframe. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-16_tripleo-ui-icehouse.pdf Hopefully we are able to deliver the described set of features. But if you find something missing which is critical for the first release (or that we are implementing a feature which should not have such high priority), please speak up now. The wireframes are very close to implementation. In time, more views will appear and we will see if we can get them in as well. Thanks all for participating -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 10:53 AM, Jaromir Coufal jcou...@redhat.com wrote: Oh dear user... :) ... And I would be a bit afraid of schizophrenia… Haha, sorry if this is my fault for reviving this whole thread :) Terminology is always tough. It probably makes sense to start with where we initially agreed and get some Operators' eyes on the design so that they can weigh in. Liz
Re: [openstack-dev] [TripleO] [Tuskar] [UX] Infrastructure Management UI - Icehouse scoped wireframes
On Jan 20, 2014, at 3:02 AM, Jaromir Coufal jcou...@redhat.com wrote: Hello everybody, based on feedback which I received last week, I am sending updated wireframes. They are still not completely final, more use-cases and smaller updates will occur, but I believe that we are going forward pretty well. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-20_tripleo-ui-icehouse.pdf What has changed? * 'Architecture' dropdown was added for all node descriptions * New views for Deployed and Free nodes * Removed Configuration part from Deployment Overview page (will be happening under Configuration tab (under construction)) * Added progressing page of overcloud being deployed + Deployment Log * Added Overcloud Horizon UI link to Deployment Overview page * Added view for down-scaling (need more work) * Added Implementation guide for developers New versions of wireframes, supporting other use-cases will occur in time, but I hope that without huge changes. Hi Jarda, Nice job keeping up with all of the changes on these. They definitely look to me like they are getting to a state of reality for this release. Just a few nitpicks: 1) Looking at page 6, the user can see that 1 node is down. I think it’s important that they can click on the link and be taken to the table to view details about this node. Would the table be filtered on the nodes that are down (just this one in this example)? I wonder if we should be consistent and add in a 33% of nodes are down? 2) How does the user get back to the list of registered nodes after they’ve clicked on the “Deployment Overview” section of navigation? It seems like they are floating a little bit in the navigation in pages 2-8. Would it make sense to have this be some sort of subsection of the Deployment Overview? 3) Would the “See and change defaults” link switch the user over to the configuration tab? I’m not sure this section even needs to be here if the user doesn’t see anything here in line. 
Best, Liz Cheers -- Jarda
[openstack-dev] [Trove] how to list available configuration parameters for datastores
Hey everyone, I have run into an issue with the configuration parameter URI. I'd like some input on what the URI might look like for getting the list of configuration parameters for a specific datastore. Problem: Configuration parameters need to be selected per datastore. Currently: It's set up to use the default (mysql) datastore and this won't work for other datastores like redis/cassandra/etc. /configurations/parameters - parameter list for mysql /configurations/parameters/parameter_name - details of parameter We need to be able to request the parameter list per datastore. Here are some suggestions that outline how each method may work. ONE: /configurations/parameters?datastore=mysql - list parameters for mysql /configurations/parameters?datastore=redis - list parameters for redis - we do not use query parameters for anything other than pagination (limit and marker) - this requires some finagling with the context to add the datastore. https://gist.github.com/cp16net/8547197 TWO: /configurations/parameters - list of datastores that have configuration parameters /configurations/parameters/datastore - list of parameters for datastore THREE: /datastores/datastore/configuration/parameters - list the parameters for the datastore FOUR: /datastores/datastore - add an href on the return to the configuration parameter list for the datastore /configurations/parameters/datastore - list of parameters for datastore FIVE: * Require a configuration be created with a datastore. Then a user may list the configuration parameters allowed on that configuration. /configurations/config_id/parameters - parameter list for mysql - after some thought I think this method (5) might be the best way to handle this. I've outlined a few ways we could make this work. Let me know if you agree or why you may disagree with strategy 5. Thanks, Craig Vyvial
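For illustration, the datastore-scoped URIs from options THREE and FOUR above could be built like this. This is a hypothetical sketch with invented helper names, not Trove's actual routing code:

```python
# Hypothetical sketch of the datastore-scoped parameter URIs from
# options THREE and FOUR in the email above -- not Trove's real router.

def parameters_uri(datastore, style="three"):
    """Build the configuration-parameters URI for a datastore."""
    if style == "three":
        # Option THREE: parameters nested under the datastore resource.
        return "/datastores/%s/configuration/parameters" % datastore
    # Option FOUR: parameters keyed by datastore under /configurations.
    return "/configurations/parameters/%s" % datastore

def datastore_with_links(name):
    """Option FOUR also adds an href on the datastore resource itself,
    so a client can discover the parameter list without guessing URIs."""
    return {
        "name": name,
        "links": [{"rel": "configuration-parameters",
                   "href": parameters_uri(name, style="four")}],
    }

print(parameters_uri("mysql"))
# /datastores/mysql/configuration/parameters
print(datastore_with_links("redis")["links"][0]["href"])
# /configurations/parameters/redis
```

A discoverability argument for option FOUR: the client never needs to construct the parameters URI itself, it just follows the link returned from /datastores/datastore.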
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
I don't know if it's reasonable to expect a deployment of OpenStack that has an *external* DHCP server. It's certainly hard to imagine how you'd get the Neutron API and an external DHCP server to agree on an IP assignment, since OpenStack expects to be the source of truth. -- Sean M. Collins ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Questions regarding image location and glanceclient behaviour ...
On Wed, Jan 22, 2014 at 1:05 AM, Public Mail kpublicm...@gmail.com wrote: Hi All, I have two questions ... 1) Glance v1 APIs can take a --location argument when creating an image but v2 APIs can't - bug or feature? (Details below) I'd call that a missing feature. I think we probably need a glance image-location-add command somewhere in the client. But fair warning, this is typically a role-restricted operation. 2) How should glanceclient (v2 commands) handle reserved attributes? a) status quo: (Apparently) let the user set them but the server will return an attribute is reserved error. Pros: No missing functionality, no damage done. Cons: Bad usability. b) hard-code the list of reserved attributes in the client and don't expose them to the user. Pros: quick to implement. Cons: Need to track reserved attributes in the server implementation. c) get reserved words from the schema downloaded from the server (and don't expose them to the user). Pros: Don't need to track the server implementation. Cons: Complex - reserved words can vary from command to command. I personally favor (b) on the grounds that a client implementation needs to closely understand server behaviour anyway, so the syncing of reserved attributes shouldn't be a big problem (*provided* the list of reserved attributes is made available in the reference documentation, which doesn't seem to be the case currently). We are in a bit of a bind with schemas--what's needed is schema resources to represent each request and response, not just each resource. Because, obviously, the things you can PATCH and POST are necessarily different than the things you can GET in any service api. However, it is not clear to me how we get from one schema per resource to one schema per request and response in a backwards compatible way. So b) might be the only way to go. So what does everybody think? details When using glance client's v1 interface I can image-create an image and specify the image file's location via the --location parameter.
Alternatively I can image-create an empty image and then image-update the image's location to some url. However, when using the client's v2 commands I can neither image-create the file using the --location parameter, nor image-update the file later. When using image-create with --location, the client gives the following error (printed by warlock): Unable to set 'locations' to '[u'http://192.168.1.111/foo/bar']' This is because the schema dictates that the location should be an object of the form [{url: string, metadata: object}, ...] but there is no way to specify such an object from the command line - I cannot specify a string like '{url: 192.168.1.111/foo/bar, metadata: {}}' for there is no conversion from command line strings to python dicts, nor is there any conversion from a simple URL string to a suitable location object. If I modify glanceclient.v2.images.Controller.create to convert the locations parameter from a URL string to the desired object then the request goes through to the glance server, where it fails with a 403 error (Attribute 'locations' is reserved). So is this discrepancy between v1 and v2 deliberate (a feature :)) or is it a bug? /details
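The shape mismatch described above - a bare URL string on the command line versus the [{url: string, metadata: object}] shape the v2 schema expects - could be bridged client-side with a small conversion helper. This is a sketch of the kind of conversion the poster describes patching into glanceclient.v2.images.Controller.create; the function name is invented and it is not actual glanceclient code (and, as noted, the server would still reject 'locations' as reserved):

```python
def to_location_objects(locations):
    """Convert bare URL strings into the location objects the v2 image
    schema expects: [{"url": <string>, "metadata": <object>}, ...].
    Bare strings are wrapped with empty metadata; dicts that already
    carry a "url" key are normalized and passed through.
    """
    converted = []
    for loc in locations:
        if isinstance(loc, str):
            converted.append({"url": loc, "metadata": {}})
        elif isinstance(loc, dict) and "url" in loc:
            converted.append({"url": loc["url"],
                              "metadata": loc.get("metadata", {})})
        else:
            raise ValueError("unsupported location: %r" % (loc,))
    return converted

print(to_location_objects(["http://192.168.1.111/foo/bar"]))
# [{'url': 'http://192.168.1.111/foo/bar', 'metadata': {}}]
```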
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 01/22/2014 03:01 AM, Robert Collins wrote: I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. Can LLDP be used here somehow? It might require stretching things a bit - not all LLDP agents seem to include the information, and it might require some sort of cascade. It would also require the VM to pay attention to the frames as they arrive, but in broad, hand-waving, blue-sky theory it could communicate maximum frame size information within the broadcast domain. rick ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
I like #4 over #5 because it seems weird to have to create a configuration first to see what parameters are allowed. With #4 you could look up what is allowed first then create your configuration. Robert
[openstack-dev] [TripleO] our update story: can people live with it?
I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes - what is commonly called the update story, I think. As I understand it, we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver, and I need to deploy it to *all* my compute instances. Do we really expect people to have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/, or perhaps this is also an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that a compute host reboot can be avoided for each new deploy. Dan
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 2014-01-22 06:32, Sean Dague wrote: I think we need to graduate things to stable interfaces a lot faster. Realizing that "stable" just means you have to deprecate to change it. So the interface is still changeable, it just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean Big +1 to this. Eliminating the sync process is going to be the cleanest solution for the code that is stable enough to be usable with things like automatic syncs. The less code that is left in the incubator, the easier the syncs will be. That said, I think there are only a few people (Doug, Mark, and Thierry?) who have done the promote-to-library thing, and I will admit I don't have a good handle on what is involved. It may be that we need better documentation of that process so more people can help out with it. I know Michael Still mentioned he was planning to graduate lockutils but didn't know exactly how. -Ben
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 22/01/14 10:59 -0600, Ben Nemec wrote: On 2014-01-22 06:32, Sean Dague wrote: I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean Big +1 to this. Eliminating the sync process is going to be the cleanest solution for the code that is stable enough to be usable with things like automatic syncs. The less code that is left in incubator, the easier the syncs will be. That said, I think there's only a few people (Doug, Mark, and Thierry?) who have done the promote to library thing, and I will admit I don't have a good handle on what is involved. It may be that we need better documentation of that process so more people can help out with it. I know Michael Still mentioned he was planning to graduate lockutils but didn't know exactly how. We're in the process of grouping independent modules into modules that actually make sense to avoid having 1 python package per module on pypi. Some of the graduation status is being tracked here[0] and here's[1] a graph of the current dependencies. As mentioned in my last email, I fully agree with this and we should definitely establish what the process is. oslo.config was the first package that graduated from the incubator. Other packages will come out of there during Icehouse. Cheers, FF [0] https://wiki.openstack.org/wiki/Oslo/GraduationStatus [1] https://wiki.openstack.org/wiki/Oslo/Dependencies -- @flaper87 Flavio Percoco
[openstack-dev] FW: [horizon] hypervisor summary page shows incorrect stats on overcommitting cpu/disk/ram in openstack
Hi, Although cpu/disk/ram can be overcommitted in OpenStack, the hypervisor summary page in the Horizon UI displays the actual stats of the compute node instead of the overcommitted values calculated by OpenStack. This gives incorrect data to the user while provisioning instances, as the used value of cpu/disk/ram is shown as more than the total value once usage passes the actual physical limits of the compute node. Is this a defect, or is it intentional to show the actual stats instead of the overcommitted stats of the compute node on the hypervisor summary page in the Horizon UI? Raju
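For context, the overcommitted values the scheduler works with are just the physical stats scaled by Nova's allocation ratios; a minimal sketch (16.0 and 1.5 are Nova's default cpu and ram allocation ratios):

```python
# Overcommit means the scheduler multiplies physical capacity by an
# allocation ratio (Nova defaults: cpu 16.0, ram 1.5, disk 1.0).
def effective_capacity(physical, allocation_ratio):
    """Capacity the scheduler sees after overcommit is applied."""
    return physical * allocation_ratio

# A node with 8 physical cores and 32768 MB of RAM:
schedulable_vcpus = effective_capacity(8, 16.0)      # 128.0
schedulable_ram_mb = effective_capacity(32768, 1.5)  # 49152.0
```

The complaint above is that Horizon reports the physical values (8 cores, 32768 MB) as "total" while "used" is tracked against the scheduler's larger effective capacity, so used can exceed total.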
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw  # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro  # ditto
fi

No reboot required. 
This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. I would call your e-mail a documentation/roadmap bug. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;). ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
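The "comparing kernels" heuristic Clint mentions could be as small as a byte-for-byte comparison of the running image's kernel against the candidate image's; a rough sketch (the vmlinuz paths in the comment are illustrative assumptions):

```python
import filecmp

def needs_reboot(running_kernel, candidate_kernel):
    """A new image forces a reboot only when it ships a different
    kernel; otherwise rsyncing the root filesystem in place (as in
    the script above) is enough."""
    return not filecmp.cmp(running_kernel, candidate_kernel, shallow=False)

# e.g. needs_reboot("/boot/vmlinuz", "/tmp/new_image/boot/vmlinuz")
```

A real tool might compare kernel version strings instead of file contents, but the decision point is the same: identical kernel, no reboot.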
Re: [openstack-dev] [TripleO] our update story: can people live with it?
- Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. 
Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this: if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then; download_new_image mount_image /tmp/new_image mount / -o remount,rw # Assuming we've achieved ro root rsync --one-file-system -a /tmp/new_image/ / mount / -o remount,ro # ditto fi No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. 
(entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) I would call your e-mail a documentation/roadmap bug. Fair enough. Thanks for the info. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;).
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
My thoughts so far:

/datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated to exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)

/configurations/config_id/parameters (Option Five)
+ smaller, canonical route to a configuration set
- the datastore/config relationship is much more ambiguous

I'm planning on working on a blueprint for this feature soon, so I'd like any feedback anyone has. - kpom From: Craig Vyvial [cp16...@gmail.com] Sent: Wednesday, January 22, 2014 10:10 AM To: OpenStack Development Mailing List Subject: [openstack-dev] [Trove] how to list available configuration parameters for datastores Hey everyone, I have run into an issue with the configuration parameter URI. I'd like some input on what the URI might look like for getting the list of configuration parameters for a specific datastore. Problem: Configuration parameters need to be selected per datastore. Currently it's set up to use the default (mysql) datastore, and this won't work for other datastores like redis/cassandra/etc.

/configurations/parameters - parameter list for mysql
/configurations/parameters/parameter_name - details of parameter

We need to be able to request the parameter list per datastore. Here are some suggestions that outline how each method may work.

ONE:
/configurations/parameters?datastore=mysql - list parameters for mysql
/configurations/parameters?datastore=redis - list parameters for redis
- we do not use query parameters for anything other than pagination (limit and marker)
- this requires some finagling with the context to add the datastore. 
https://gist.github.com/cp16net/8547197

TWO:
/configurations/parameters - list of datastores that have configuration parameters
/configurations/parameters/datastore - list of parameters for datastore

THREE:
/datastores/datastore/configuration/parameters - list the parameters for the datastore

FOUR:
/datastores/datastore - add an href on the return to the configuration parameter list for the datastore
/configurations/parameters/datastore - list of parameters for datastore

FIVE:
* Require a configuration be created with a datastore. Then a user may list the configuration parameters allowed on that configuration.
/configurations/config_id/parameters - parameter list for mysql
- after some thought I think this method (5) might be the best way to handle this.

I've outlined a few ways we could make this work. Let me know if you agree or why you may disagree with strategy 5. Thanks, Craig Vyvial
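To make option THREE concrete, here is a hypothetical sketch of what the datastore-scoped parameter listing could return; the field names are invented for illustration, not Trove's actual schema:

```python
# Hypothetical payload for option THREE:
#   GET /datastores/mysql/configuration/parameters
example_response = {
    "configuration-parameters": [
        {"name": "max_connections", "type": "integer",
         "min": 1, "max": 100000, "restart_required": False},
        {"name": "autocommit", "type": "boolean",
         "restart_required": False},
    ]
}

def parameter_names(response):
    """Names of every tunable parameter the datastore exposes."""
    return [p["name"] for p in response["configuration-parameters"]]
```

The same payload would work for redis or cassandra by swapping the datastore segment of the URI, which is the point of scoping parameters under /datastores.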
[openstack-dev] Icehouse-2 milestone candidates available
Hi everyone, Milestone-proposed branches were created for Keystone, Glance, Nova, Neutron, Cinder, Ceilometer, Heat and Trove in preparation for the icehouse-2 milestone publication tomorrow. Horizon should be there in a few hours. You can find candidate tarballs at:

http://tarballs.openstack.org/keystone/keystone-milestone-proposed.tar.gz
http://tarballs.openstack.org/glance/glance-milestone-proposed.tar.gz
http://tarballs.openstack.org/nova/nova-milestone-proposed.tar.gz
http://tarballs.openstack.org/neutron/neutron-milestone-proposed.tar.gz
http://tarballs.openstack.org/cinder/cinder-milestone-proposed.tar.gz
http://tarballs.openstack.org/ceilometer/ceilometer-milestone-proposed.tar.gz
http://tarballs.openstack.org/heat/heat-milestone-proposed.tar.gz
http://tarballs.openstack.org/trove/trove-milestone-proposed.tar.gz

You can also access the milestone-proposed branches directly at:

https://github.com/openstack/keystone/tree/milestone-proposed
https://github.com/openstack/glance/tree/milestone-proposed
https://github.com/openstack/nova/tree/milestone-proposed
https://github.com/openstack/neutron/tree/milestone-proposed
https://github.com/openstack/cinder/tree/milestone-proposed
https://github.com/openstack/ceilometer/tree/milestone-proposed
https://github.com/openstack/heat/tree/milestone-proposed
https://github.com/openstack/trove/tree/milestone-proposed

Regards, -- Thierry Carrez (ttx)
[openstack-dev] [Swift] 1.12.0 release candidate
Hi everyone, A milestone-proposed branch was created for Swift in preparation for the 1.12.0 release. Please test the proposed delivery to ensure no critical regression found its way in. Release-critical fixes might be backported to the milestone-proposed branch until final release, and will be tracked using the 1.12.0 milestone targeting: https://launchpad.net/swift/+milestone/1.12.0 You can find the candidate tarball at: http://tarballs.openstack.org/swift/swift-milestone-proposed.tar.gz You can also access the milestone-proposed branch directly at: https://github.com/openstack/swift/tree/milestone-proposed Regards, -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin From: Dan Prince [dpri...@redhat.com] Sent: Wednesday, January 22, 2014 10:15 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. 
When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this: if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then; download_new_image mount_image /tmp/new_image mount / -o remount,rw # Assuming we've achieved ro root rsync --one-file-system -a /tmp/new_image/ / mount / -o remount,ro # ditto fi No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. 
Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) I would call your e-mail a documentation/roadmap bug. Fair enough. Thanks for the info. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;).
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On 01/22/2014 12:17 PM, Dan Prince wrote: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? FWIW, I agree that this is going to be considered unacceptable by most people. Hopefully everyone is on the same page with that. It sounds like that's the case so far in this thread, at least... If you have to reboot the compute node, ideally you also have support for live migrating all running VMs on that compute node elsewhere before doing so. That's not something you want to have to do for *every* little change to *every* compute node. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Gate Update - Wed Morning Edition
On 01/22/2014 09:38 AM, Sean Dague wrote: Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it is to help us fix the bugs resetting the gate. I've got a couple more gate bug fixes up for review this morning: 1) https://review.openstack.org/#/c/68443/ 13 fails in 24hrs / 22 fails in 7 days Needs nova-core review. 2) https://review.openstack.org/#/c/68275/ 8 fails in 24hrs / 17 fails in 7 days This one currently needs review from oslo-core to get into oslo-incubator. Then I'll sync it into nova (that's where I saw the bug in the gate, at least). -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Dan Prince's message of 2014-01-22 10:15:20 -0800: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. 
I could see a case for explicitly not wanting to reboot here as well. I prefer choosing what to update at image build time. This is the time where it is most clear how, from a developer and deployer standpoint, to influence the update. I can diff images, I can freeze mirrors... etc., all decoupled from anybody else and from production or test cycles. That said, I do think it would be good for deployers to be able to have a way to control when reboots are and aren't allowed. That seems like policy, which may be best handled in Nova.. so we can have a user that can do updates to Heat Metadata/stacks, but not rebuilds in Nova. I have no idea if Heat's trust model will allow us to have such separation, though. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw  # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro  # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I think in most cases transfer cost is worth it to know you're deploying what you tested. Also it is pretty easy to just do this optimization but still be rsyncing the contents of the image. Instead of downloading the whole thing we could have a box expose the mounted image via rsync and then all of the machines can just rsync changes. 
Also rsync has a batch mode where if you know for sure the end-state of machines you can pre-calculate that rsync and just ship that. Lots of optimization possible that will work fine in your just-update-one-file scenario. But really, how much does downtime cost? How much do 10Gb NICs and switches cost? I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above,
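The batch-mode optimization Clint describes, pre-calculating the delta to a known end state once and shipping only that to every node, can be sketched in a few lines; this is a conceptual stand-in for `rsync --write-batch`, not production transfer code:

```python
import hashlib
import os

def tree_digests(root):
    """Map each file under root (by relative path) to a content digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = (
                    hashlib.sha256(f.read()).hexdigest())
    return digests

def build_batch(old_root, new_root):
    """Pre-calculate, once, what a node at old_root needs in order to
    reach new_root: only these paths get shipped, not the whole image."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    changed = sorted(p for p, d in new.items() if old.get(p) != d)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

In the single-file-in-Nova scenario this batch contains exactly one path, which is why the approach stays efficient even though the deploy unit is a full image.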
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
- Original Message - Oh dear user... :) I'll step a little bit back. We need to agree whether we want to name concepts one way in the background and another way in the UI for the user (did we already agree on this point?). We all know the pros and cons. And I will still fight for users to get global infrastructure terminology (e.g. defining Node Profiles instead of Flavors). Because I received Jarda, side point - could you explain again what the attributes of a node profile are? Beyond the Flavor, does it also define an image...? Mainn a lot of negative feedback on mixing overcloud terms into undercloud, confusion about the overcloud/undercloud terms themselves, etc. If it would be easier for developers to name the concepts in the background differently then that's fine - we just need to talk about 2 terms per concept then. And I would be a bit afraid of schizophrenia... On 2014/22/01 15:10, Tzu-Mainn Chen wrote: That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Honestly, it would get super confusing for me if somebody told me "you have 5 compute resources". (And I am talking from the user's world, not the developer's one.) But a resource itself can be anything. -- Jarda
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. 
I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

    if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ]; then
        download_new_image
        mount_image /tmp/new_image
        mount / -o remount,rw   # Assuming we've achieved ro root
        rsync --one-file-system -a /tmp/new_image/ /
        mount / -o remount,ro   # ditto
    fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/, or perhaps this is also an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that a compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so "just ensure the root filesystems match" seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update.
(entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) ++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
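The "heuristic should be as simple as comparing kernels" idea above can be sketched in a few lines. This is a hypothetical illustration, not TripleO code: the paths under boot/ and the decision rule are assumptions for the sake of the example.

```python
# Hypothetical sketch: decide whether a candidate image requires a reboot by
# comparing the kernels shipped in the current and new image trees. If the
# vmlinuz set differs, a reboot is needed; otherwise an rsync of the root
# filesystem (as in the script above) is enough.
import glob
import os


def kernel_set(image_root):
    """Return the sorted kernel file names found under boot/ of an image tree."""
    pattern = os.path.join(image_root, "boot", "vmlinuz*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def needs_reboot(current_root, new_root):
    """A changed kernel implies a reboot; identical kernels allow a live sync."""
    return kernel_set(current_root) != kernel_set(new_root)
```

An operator-facing tool could layer an explicit override on top of this, per the "might prefer it not to be auto-magic" comment.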
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Agreed, it is tricky if we try to only restart what we've changed. OR, just restart everything. We can make endpoints HA and use rolling updates to avoid spurious faults. There are complex ways to handle things even smoother.. but I go back to What does complexity cost? Excerpts from Fox, Kevin M's message of 2014-01-22 10:32:02 -0800: Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin
[openstack-dev] [Swift] domain-level quotas
Greetings, I'd be interested in your opinions and feedback on the following blueprint: https://blueprints.launchpad.net/swift/+spec/domain-level-quotas The idea is to have a middleware checking a domain's current usage against a limit set in the configuration before allowing an upload. The domain id can be extracted from the token, then used to query keystone for a list of projects belonging to the domain. Swift would then compute the domain usage in a similar fashion to the way it is currently done for accounts, and proceed from there. Do you think it is viable? Thoughts? Thanks, Matthieu Huin m...@enovance.com http://www.enovance.com eNovance SaS - 10 rue de la Victoire 75009 Paris - France
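The middleware described above can be sketched as plain WSGI. This is only an illustration of the check, under stated assumptions: the usage_lookup and domain_lookup callables are hypothetical stand-ins for the Keystone token inspection and per-project usage aggregation the blueprint describes; real Swift middleware would be built on swift.common.swob.

```python
# Minimal WSGI sketch: reject an upload once a domain's total usage plus the
# incoming object size would exceed a configured byte limit. The two injected
# callables are hypothetical; they stand in for Keystone/Swift queries.
class DomainQuotaMiddleware(object):
    def __init__(self, app, quota_bytes, usage_lookup, domain_lookup):
        self.app = app
        self.quota_bytes = quota_bytes
        self.get_domain_usage = usage_lookup   # domain_id -> bytes currently used
        self.get_domain_id = domain_lookup     # WSGI environ -> domain_id

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") == "PUT":
            domain_id = self.get_domain_id(environ)
            incoming = int(environ.get("CONTENT_LENGTH") or 0)
            if self.get_domain_usage(domain_id) + incoming > self.quota_bytes:
                start_response("413 Request Entity Too Large",
                               [("Content-Type", "text/plain")])
                return [b"Domain quota exceeded\n"]
        # Under quota (or not an upload): pass the request through unchanged.
        return self.app(environ, start_response)
```

Computing get_domain_usage cheaply (rather than summing account stats on every PUT) is the interesting part of the blueprint and is deliberately left abstract here.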
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
Sean, I agree with you. I prefer OpenStack as the single source of truth. What the end user chooses may be different. But with this pair of keywords, at least we provide comprehensive coverage of all scenarios. For Icehouse, I suggest we only consider support for the scenarios where OpenStack has full control of address assignment, plus one or two scenarios Comcast needs, in order to cover most of the deployments. We can leave other cases for future releases, or professional service opportunities. Shixiong On Jan 22, 2014, at 11:20 AM, Collins, Sean sean_colli...@cable.comcast.com wrote: I don't know if it's reasonable to expect a deployment of OpenStack that has an *external* DHCP server. It's certainly hard to imagine how you'd get the Neutron API and an external DHCP server to agree on an IP assignment, since OpenStack expects to be the source of truth. -- Sean M. Collins
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
That is correct, Xu Han! On Jan 22, 2014, at 11:14 AM, Xuhan Peng pengxu...@gmail.com wrote: Ian, I think the last-two-attributes PDF from Shixiong's last email is trying to solve the problem you are describing, right? — Xu Han Peng (xuhanp) On Wed, Jan 22, 2014 at 8:15 PM, Ian Wells ijw.ubu...@cack.org.uk wrote: On 21 January 2014 22:46, Veiga, Anthony anthony_ve...@cable.comcast.com wrote: Hi, Sean and Xuhan: I totally agree. This is not the ultimate solution, given the assumption that we had to use “enable_dhcp”. We haven’t decided the name of the other parameter, however, and we are open to any suggestions. As we mentioned during the meeting, the second parameter should highlight the need for addressing. If so, it should have at least four values:
1) off (i.e. address is assigned by external devices out of OpenStack control)
2) slaac (i.e. address is calculated based on RAs sent by OpenStack dnsmasq)
3) dhcpv6-stateful (i.e. address is obtained from OpenStack dnsmasq acting as a DHCPv6 stateful server)
4) dhcpv6-stateless (i.e. address is calculated based on RAs sent from either OpenStack dnsmasq or an external router, and optional information is retrieved from OpenStack dnsmasq acting as a DHCPv6 stateless server)
So how does this work if I have an external DHCPv6 server and an internal router? (How baroque do we have to get?) enable_dhcp, for backward compatibility reasons, should probably disable *both* RA and DHCPv6, despite the name, so we can't use that to disable the DHCP server. We could add a *third* attribute, which I hate as an idea but does resolve the problem - one flag for each of the servers, one for the mode the servers are operating in, and enable_dhcp which needs to DIAF but will persist till the API is revved. -- Ian.
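The four proposed values for the second attribute can be pinned down as a small constant set with a toy validator. The value names come from the thread above; the function itself is a hypothetical illustration, not Neutron code.

```python
# Sketch of the four proposed values for the second IPv6 subnet attribute.
# Comments paraphrase the definitions given in the discussion.
ADDRESS_MODES = (
    "off",               # address assigned by devices outside OpenStack control
    "slaac",             # address derived from RAs sent by OpenStack dnsmasq
    "dhcpv6-stateful",   # address leased by dnsmasq acting as a stateful server
    "dhcpv6-stateless",  # address via SLAAC; extra options from a stateless server
)


def validate_address_mode(mode):
    """Reject anything outside the agreed value set."""
    if mode not in ADDRESS_MODES:
        raise ValueError("unsupported IPv6 address mode: %s" % mode)
    return mode
```

A real implementation would also have to validate combinations with the RA-related flag (and the legacy enable_dhcp), which is exactly the open question in the thread.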
[openstack-dev] [ceilometer] per domain/project/user limits on alarms
Greetings, I'd be interested in some opinions and feedback on the following blueprint: https://blueprints.launchpad.net/ceilometer/+spec/quotas-on-alarms I think it'd be interesting to allow admins to limit the number of running alarms at any of the three levels defined by keystone. Thoughts? Thanks, Matthieu Huin m...@enovance.com http://www.enovance.com eNovance SaS - 10 rue de la Victoire 75009 Paris - France
Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR
Agreed. That would be a good place for that check. Carl On Wed, Jan 22, 2014 at 6:40 AM, Paul Ward wpw...@us.ibm.com wrote: Thanks for your input, Carl. You're right, it seems the more appropriate place for this is _validate_subnet(). It checks ip version, gateway, etc... but not the size of the subnet. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 09:22:55 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 01/21/2014 09:27 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR The bottom line is that the method you mentioned shouldn't validate the subnet. It should assume the subnet has been validated and validate the pool. It seems to do an adequate job of that. Perhaps there is a _validate_subnet method that you should be focused on? (I'd check but I don't have convenient access to the code at the moment) Carl On Jan 21, 2014 6:16 PM, Paul Ward wpw...@us.ibm.com wrote: You beat me to it. :) I just responded about not checking the allocation pool start and end but rather, checking subnet_first_ip and subnet_last_ip, which are set as follows:

    subnet = netaddr.IPNetwork(subnet_cidr)
    subnet_first_ip = netaddr.IPAddress(subnet.first + 1)
    subnet_last_ip = netaddr.IPAddress(subnet.last - 1)

However, I'm curious about your contention that we're ok... I'm assuming you mean that this should already be handled. I don't believe anything is really checking to be sure the allocation pool leaves room for a gateway; I think it just makes sure it fits in the subnet. A member of our test team successfully created a network with a subnet of 255.255.255.255, so it got through somehow. I will look into that more tomorrow.
Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 05:27:49 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 01/21/2014 05:32 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR I think there may be some confusion between the two concepts: subnet and allocation pool. You are right that an ipv4 subnet smaller than /30 is not useable on a network. However, this method is checking the validity of an allocation pool. These pools should not include room for a gateway nor a broadcast address. Their relation to subnets is that the range of IPs contained in the pool must fit within the allocatable IP space on the subnet from which they are allocated. Other than that, they are simple ranges; they don't need to be CIDR-aligned or anything. A pool of a single IP is valid. I just checked the method's implementation now. It does check that the pool fits within the allocatable range of the subnet. I think we're good. Carl On Tue, Jan 21, 2014 at 3:35 PM, Paul Ward wpw...@us.ibm.com wrote: Currently, NeutronDbPluginV2._validate_allocation_pools() does some very basic checking to be sure the specified subnet is valid. One thing that's missing is checking for a CIDR of /32. A subnet with one IP address in it is unusable, as the sole IP address will be allocated to the gateway, and thus no IPs are left over to be allocated to VMs. The fix for this is simple. In NeutronDbPluginV2._validate_allocation_pools(), we'd check for start_ip == end_ip and raise an exception if that's true. I've opened launchpad bug report 1271311 (https://bugs.launchpad.net/neutron/+bug/1271311) for this, but wanted to start a discussion here to see if others find this enhancement to be a valuable addition.
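The check the thread converges on (subnet-size validation, mirroring the subnet_first_ip/subnet_last_ip computation quoted above) can be sketched as follows. The thread's code uses netaddr; this sketch uses the stdlib ipaddress module instead, and the function names are illustrative, not Neutron's.

```python
# Sketch of the proposed _validate_subnet-style size check: after excluding
# the network and broadcast addresses (subnet.first + 1 / subnet.last - 1 in
# the netaddr version), a subnet must still leave at least a gateway plus one
# allocatable address. /32 and /31 both fail this.
import ipaddress


def usable_host_range(cidr):
    """Return (first_ip, last_ip), mirroring subnet.first + 1 / subnet.last - 1."""
    net = ipaddress.ip_network(cidr, strict=True)
    return net.network_address + 1, net.broadcast_address - 1


def validate_subnet_size(cidr):
    first, last = usable_host_range(cidr)
    # first == last means a single usable IP, which the gateway consumes,
    # leaving nothing for VMs; first > last means no usable IPs at all.
    if first >= last:
        raise ValueError("subnet %s leaves no allocatable addresses" % cidr)
```

Note this validates the subnet, not the allocation pool: as Carl points out, a pool of a single IP is legitimately valid, so start_ip == end_ip must only be rejected at the subnet level.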
Re: [openstack-dev] [trove] Datastore Type/Version Migration
Hi, This looks like a good approach. Let's start the discussion. I've proposed an API spec for it: https://gist.github.com/andreyshestakov/8559309 Please take a look and add your advice and comments. Thanks On Thu, Nov 21, 2013 at 2:44 AM, McReynolds, Auston amcreyno...@ebay.com wrote: With Multiple Datastore Types/Versions merged to master, the conversation around how to support migrating from one datastore version to another has begun. Please see https://gist.github.com/amcrn/dfd493200fcdfdb61a23 for a consolidation of thoughts thus far.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Hello. On Tue, Jan 21, 2014 at 12:52 PM, Dong Liu willowd...@gmail.com wrote: What's your opinion? We've just discussed a use case for this today. I want to create a sandbox for Fuel, but I can't do it with OpenStack. The reason is a bit different from the telecom case: Fuel needs to manage nodes directly via DHCP and PXE, and you can't do that with Neutron since you can't make its dnsmasq service quiet. So, it's a great idea. We could have either VMs with no IP address associated or networks with no fixed IP range; either could work. There can be a problem with handling floating IPs, though. -- Kind regards, Yuriy.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Yuriy Taraday wrote: Fuel needs to manage nodes directly via DHCP and PXE and you can't do that with Neutron since you can't make its dnsmasq service quiet. Can you elaborate on what you mean by this? You can turn off Neutron’s dnsmasq on a per-network basis, correct? Do you mean something else by “make its dnsmasq service quiet”?
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated to exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)
+1 to option 3, given what Kaleb and Craig have outlined so far. I don't see the above minus as a valid concern either, Kaleb.
[openstack-dev] [Neutron][LBaaS] Status update and weekly meeting
Hi folks, At this point we have a few major action items, mostly patches on review. Please note that the gate is in pretty bad shape, so don't expect anything to be approved/merged until this is sorted out.
1) SSL extension https://review.openstack.org/#/c/63510/ The code here is in good shape IMO, but we are as yet undecided on the general approach. In my opinion, while we lack a good and stable open source solution (i.e., until HAProxy 1.5 is released), this can be made a vendor extension with a prospect of moving into the core LBaaS API.
2) Loadbalancer instance https://review.openstack.org/#/c/60207/ The new API is fully backward-compatible. As new drivers appear (like https://review.openstack.org/#/c/67405/ ), the code shows the need for a container entity to bind entities like routers, devices, and agents.
3) Multiple providers with the same driver https://review.openstack.org/#/c/64139/ The code is good to merge; we just need the gate to be stable.
4) L7 rules https://review.openstack.org/#/c/61721/ My concern here is how Vip and Pool are associated. I think it could be made more generic. I've left corresponding comments.
5) We have an LBaaS scenario test which is still waiting to be merged: https://review.openstack.org/#/c/58697/
We'll have a regular IRC meeting tomorrow at 14:00 UTC on #openstack-meeting. I'd like to primarily discuss one item, the 'uneven API experience', which could be divided into two distinct parts:
- Presenting different APIs for different drivers (e.g. justification for a vendor extension framework)
- Generic API experience
On (2) I'd like to discuss the concern that I've raised in https://review.openstack.org/#/c/68190/ Thanks, Eugene.
Re: [openstack-dev] [TripleO] milestone-proposed branches
On Thu, Jan 16, 2014 at 10:32 AM, Thierry Carrez thie...@openstack.org wrote: James Slagle wrote: [...] And yes, I'm volunteering to do the work to support the above, and the release work :). Let me know if you have any question or need help. The process and tools used for the integrated release are described here: https://wiki.openstack.org/wiki/Release_Team/How_To_Release Thanks Thierry, I wanted to give this a go for icehouse milestone 2, but given that those were cut yesterday and there are still some outstanding doc updates in review, I'd like to shoot for milestone 3 instead. Is there anything additional we need to do to make that happen? I read through that wiki page. I did have a couple of questions: Who usually runs through the steps there? You, or a project member? When repo_tarball_diff.sh is run, are there any acceptable missing files? I'm seeing an AUTHORS and ChangeLog file showing up in the output from our repos; those are automatically generated, so I assume they are ok. There are also some egg_info files showing up, which I also think can be safely ignored. (I submitted a patch that updates the grep command used in the script: https://review.openstack.org/#/c/68471/ ) Thanks. Also note that we were considering switching from using milestone-proposed to using proposed/*, to avoid reusing branch names: https://review.openstack.org/#/c/65103/ -- Thierry Carrez (ttx) -- James Slagle
[openstack-dev] [Trove][Discussion] Are we using troveclient/tools/install_venv_common.py ?
Hi All, Are we using tools/install_venv_common.py in python-troveclient? If so, just let us know. Otherwise, it may be cleaned up (removed from openstack-common.conf). Thanks.
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Jay Pipes's message of 2014-01-22 10:53:14 -0800: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) ++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I do understand that little tweaks are more common than whole software updates. I also think that little tweaks must be tested just like big ones. So I would argue that it is more important to optimize for trusting that what you tested is what is in production, and then to address any issues if that work-flow needs optimization. A system that leaves operators afraid to do a big update because it will trigger the bad path is a system that doesn't handle big updates well. Ideally we'd optimize all 3 in all of the obvious ways before determining that the one file update just isn't fast enough. 
Re: [openstack-dev] [TripleO] our update story: can people live with it?
I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. Thanks, Kevin From: Clint Byrum [cl...@fewbar.com] Sent: Wednesday, January 22, 2014 10:55 AM To: openstack-dev Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Agreed, it is tricky if we try to only restart what we've changed. OR, just restart everything. We can make endpoints HA and use rolling updates to avoid spurious faults. There are complex ways to handle things even smoother.. but I go back to What does complexity cost? Excerpts from Fox, Kevin M's message of 2014-01-22 10:32:02 -0800: Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin
Re: [openstack-dev] [qa][Neutron][Tempest][Network] Break down NetworkBasicOps to smaller test cases
On Tue, 2014-01-21 at 01:15 -0500, Yair Fried wrote: I seem to be unable to convey my point using generalization, so I will give a specific example: I would like to have update dns server as an additional network scenario. Currently I could add it to the existing module: 1. tests connectivity 2. re-associate floating ip 3. update dns server In which case, failure to re-associate ip will prevent my test from running, even though these are completely unrelated scenarios, and (IMO) we would like to get feedback on both of them. Another way, is to copy the entire network_basic_ops module, remove re-associate floating ip and add update dns server. For the obvious reasons - this also seems like the wrong way to go. I am looking for an elegant way to share the code of these scenarios. Well, unfortunately, there are no very elegant answers at this time :) The closest thing we have would be to create a fixtures.Fixture that constructed a VM and associated the floating IP address to the instance. You could then have separate tests that for checking connectivity and updating the DNS server for that instance. However, fixtures are for resources that are shared between test methods and are not modified during those test methods. They cannot be modified, because then parallel execution of the test methods may yield non-deterministic results. There would need to be a separate fixture for the instance that would have the floating IP re-associated with it (because re-associating the floating IP by nature is a modification to the resource). Having a separate fixture means essentially doubling the amount of resources used by the test case class in question, which is why we're pushing to just have all of the tests done serially in a single test method, even though that means that a failure to re-associate the floating IP would mean that the update DNS server test would not be executed. Choose your poison. 
Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
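Jay's shared-fixture constraint can be seen in a small stdlib-only sketch (names are invented, and setUpClass merely stands in for a fixtures.Fixture; Tempest's real fixtures work differently):

```python
import unittest


class FakeServer:
    """Stand-in for a booted VM with an associated floating IP (invented)."""

    def __init__(self):
        self.floating_ip = "172.24.4.10"
        self.dns_server = "8.8.8.8"


class SharedServerTests(unittest.TestCase):
    # setUpClass plays the role of a fixtures.Fixture here: the server is
    # built once and shared across test methods, so those methods must
    # treat it as read-only for parallel runs to stay deterministic.
    @classmethod
    def setUpClass(cls):
        cls.server = FakeServer()

    def test_connectivity(self):
        # Read-only check: safe against a shared fixture.
        self.assertTrue(self.server.floating_ip.startswith("172."))

    def test_update_dns_server(self):
        # This MUTATES the shared resource -- the case Jay warns about.
        # Run in parallel with other tests against the same fixture, the
        # outcome becomes non-deterministic, which is why a mutating
        # scenario would need its own dedicated (duplicated) fixture.
        self.server.dns_server = "10.0.0.2"
        self.assertEqual("10.0.0.2", self.server.dns_server)
```

The second test illustrates why "separate fixture per mutating scenario" doubles the resource cost: each mutator needs its own VM.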
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
Good day to all. #3 looks more than acceptable: /datastores/datastore/configuration/parameters. According to the configuration parameters design, a configuration set must be associated with exactly one datastore. Best regards, Denis Makogon. 2014/1/22 Michael Basnight mbasni...@gmail.com On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated with exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)
+1 to option 3, given what Kaleb and Craig have outlined so far. I don't see the above minus as a valid concern either, Kaleb.
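Kaleb's constraints for option three can be modelled in a few lines (a toy sketch, not actual Trove code; class and method names are invented):

```python
class ConfigurationRegistry:
    """Toy model of option three's constraints (not actual Trove code)."""

    def __init__(self):
        # datastore -> its single configuration set (0-1 per datastore)
        self._by_datastore = {}

    def attach(self, datastore, config_set):
        # Enforce "each datastore must have 0-1 configuration set".
        if datastore in self._by_datastore:
            raise ValueError("%s already has a configuration set" % datastore)
        self._by_datastore[datastore] = config_set

    def parameters_url(self, datastore):
        # The 1-to-0..1 relationship is readable straight off the URL,
        # which is the "immediately apparent" point in Kaleb's list.
        return "/datastores/%s/configuration/parameters" % datastore
```

The "minus" in the list is also visible here: listing all configuration sets means walking every datastore rather than hitting a single top-level collection.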
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Jan 22, 2014, at 1:53 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes. What is commonly called the update story, I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. 
Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw   # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro   # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! 
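Clint's compare-kernels heuristic is small enough to sketch (the field names are invented for illustration, not Nova's actual metadata):

```python
def update_plan(running, desired):
    """Pick a deployment strategy for a compute node.

    `running` and `desired` are dicts describing images, e.g.
    {"image_id": "abc", "kernel": "3.12.0-7"} -- illustrative only.
    """
    if running["image_id"] == desired["image_id"]:
        return "noop"     # already running the desired image
    if running["kernel"] != desired["kernel"]:
        return "rebuild"  # new kernel: a reboot is unavoidable
    return "rsync"        # same kernel: in-place filesystem sync suffices
```

Dan's point about not wanting auto-magic would amount to letting the operator override the "rsync" answer and force a rebuild (or vice versa).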
In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I don't understand the aversion to using existing, well-known tools to handle this? A hybrid model (blending 2 and 3, above) here I think would work best, where TripleO lays down a baseline image and the cloud operator would employ a well-known and supported configuration tool for
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Fox, Kevin M's message of 2014-01-22 12:19:56 -0800: I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. If we could get kexec reliable.. but I have no evidence that it is anything but a complete flake. What it saves you is losing running processes that you don't end up killing, which is expensive on many types of services.. Nova Compute being a notable example.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
On Thu, Jan 23, 2014 at 12:04 AM, CARVER, PAUL pc2...@att.com wrote: Can you elaborate on what you mean by this? You can turn off Neutron’s dnsmasq on a per-network basis, correct? Do you mean something else by “make its dnsmasq service quiet”? What I meant is for dnsmasq to not send offers to specific VMs so that Fuel's DHCP service will serve them. We shouldn't shut off the network's DHCP entirely though, since we still need the Fuel VM to receive some address for external connectivity. -- Kind regards, Yuriy.
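Per the dnsmasq documentation, individual hosts can be excluded from DHCP service with a per-host `ignore` entry, which is the kind of selective quieting Yuriy describes (the MAC address below is purely illustrative):

```
# dnsmasq.conf fragment: never answer DHCP requests from this host,
# leaving it to be served by Fuel's own DHCP service instead.
dhcp-host=52:54:00:12:34:56,ignore
```

The rest of the network keeps normal DHCP service, so other VMs are unaffected.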
Re: [openstack-dev] [qa][Neutron][Tempest][Network] Break down NetworkBasicOps to smaller test cases
On 01/22/2014 03:19 PM, Jay Pipes wrote: On Tue, 2014-01-21 at 01:15 -0500, Yair Fried wrote: I seem to be unable to convey my point using generalization, so I will give a specific example: I would like to have update dns server as an additional network scenario. Currently I could add it to the existing module: 1. test connectivity 2. re-associate floating ip 3. update dns server In which case, failure to re-associate the ip will prevent my test from running, even though these are completely unrelated scenarios, and (IMO) we would like to get feedback on both of them. Another way is to copy the entire network_basic_ops module, remove re-associate floating ip, and add update dns server. For the obvious reasons, this also seems like the wrong way to go. I am looking for an elegant way to share the code of these scenarios.

Well, unfortunately, there are no very elegant answers at this time :) The closest thing we have would be to create a fixtures.Fixture that constructed a VM and associated the floating IP address with the instance. You could then have separate tests for checking connectivity and for updating the DNS server for that instance. However, fixtures are for resources that are shared between test methods and are not modified during those test methods. They cannot be modified, because then parallel execution of the test methods may yield non-deterministic results. There would need to be a separate fixture for the instance that would have the floating IP re-associated with it (because re-associating the floating IP is by nature a modification to the resource). Having a separate fixture means essentially doubling the amount of resources used by the test case class in question, which is why we're pushing to just have all of the tests done serially in a single test method, even though that means that a failure to re-associate the floating IP would mean that the update DNS server test would not be executed. Choose your poison. 
Best, -jay

Thanks, Jay. So to close this loop, I think Yair started down this road after receiving feedback that this test was getting too much stuff in it. Sounds like you are advocating putting more stuff in it as the least of evils. Which is fine by me because it is a lot easier. -David
[openstack-dev] Changes coming in gate structure
Changes coming in gate structure Unless you've been living under a rock, on the moon, around Saturn, you'll have noticed that the gate has been quite backed up the last 2 weeks. Every time we get towards a milestone this gets measurably worse, and the expectation is that at i3 we're going to see at least 40% more load than we are dealing with now (if history is any indication), which doesn't bode well. It turns out, when you have a huge and rapidly growing Open Source project, you keep finding scaling limits in existing software, your software, and approaches in general. It also turns out that you find out that you need to act defensively on situations that you didn't think you'd have to worry about. Like code reviews with 3-month-old test results being put into the review queue. Or code that *can't* pass (which a look at the logs would show) being reverified in the gate. All of these things compound on the fact that there are real bugs in OpenStack, which end up having a non-linear failure effect. Once you get past a certain point the failure rates multiply to the point where everything stops (which happened Sunday, when we only merged 4 changes in 24 hrs). The history of the gate structure is a long one. It was added in Diablo when there was a project which literally would not run with the other OpenStack components. The idea of gating merge of everything on everything else is to ensure we have some understanding that OpenStack actually works, all together, for some set of configurations. It wasn't until the Folsom cycle that we started running these tests before human review (kind of amazing). The gate is also based on an assumption that most of the bugs we are catching are outside the project, vs. bugs that are already in the project. However, in an asynchronous system, bugs can show up only very occasionally, and get past our best efforts to detect them, then pile up in the code base until we root them out. 
= Towards a Svelter Gate - Leaning on Check = We've got a current plan of attack to try to maintain nearly the same level of integration test guarantees, and hope to make it so on the merge side we're able to get more throughput. This is a set of things that all have to happen at once to not completely blow out the guarantees we've got in the source. Make a clean recent Check prereq for entering gate == A huge compounding problem has been patches that can't pass being promoted to the gate. So we're going to make Zuul able to enforce a recent clean check scorecard before going into the gate. Our working theory of recent is the last 24 hrs. If it doesn't have a recent set of check results on +A, we'll trigger a check rerun, and if clean, it gets sent to the gate. We'll also probably add a sweeper to zuul so it will automatically refresh results on changes that are getting comments on them and are older than some number of days. Svelte Gate == The gate jobs will be trimmed down immensely. Nothing project specific, so pep8 / unit tests all ripped out, no functional test runs. Fewer overall configs. Exactly how minimal we'll figure out as we decide what we can live without. The floor for this would be devstack-tempest-full and grenade. This is basically a sanity check that the combination of patches in flight doesn't ruin the world for everyone. Idle Cloud for Elastic Recheck Bugs === We have actually been using the gate for double duty, both as ensuring integration, but also as a set of clean test results to figure out what bugs are in OpenStack that only show up from time to time. The check queue is way too noisy, as our system actually blocks tons of bad code from getting in. With the Svelte gate, we'll need a set of background nodes to build that dataset. But with elastic search we now have the technology, so this is good. It will let us work these issues in parallel. These issues will still cause people pain in getting clean results in check. 
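The "recent clean check" rule Sean describes is essentially a two-line predicate; a sketch of the policy (illustrative only, not actual Zuul code):

```python
from datetime import datetime, timedelta


def gate_ready(last_check_clean, last_check_time, now,
               max_age=timedelta(hours=24)):
    """Return True if a change may enter the gate directly on +A.

    Policy as described: a clean check result from the last 24 hours is
    required; anything stale or failing triggers a fresh check run first.
    """
    return last_check_clean and (now - last_check_time) <= max_age
```

A change rejected by this predicate would be sent for a check rerun and only enter the gate once that rerun came back clean.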
= Timelines, Dangers, and Opportunities = We need changes soon. Every past experience says milestone 3 is 40% heavier than milestone 2, and nothing indicates that icehouse is going to be any different. So Jim's put getting these required bits into Zuul at the top of his list, and we're hoping we'll have them within a week. With this approach, wedging the gate is highly unlikely. However, as we won't be testing every check test again in gate, it means there is a possibility that a combination of patches might make the check results wedge for everyone (like the pg job getting wedged). So it moves that issue around. Right now it's hard to say if that particular issue will get better or worse. However, the Sherlock rule of gate blocks remains in effect: once you've eliminated the impossible, any
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Wed, 2014-01-22 at 12:12 -0800, Clint Byrum wrote: Excerpts from Jay Pipes's message of 2014-01-22 10:53:14 -0800: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I do understand that little tweaks are more common than whole software updates. I also think that little tweaks must be tested just like big ones. So I would argue that it is more important to optimize for trusting that what you tested is what is in production, and then to address any issues if that work-flow needs optimization. A system that leaves operators afraid to do a big update because it will trigger the bad path is a system that doesn't handle big updates well. 
Ideally we'd optimize all 3 in all of the obvious ways before determining that the one file update just isn't fast enough. Well said. No disagreement from me. Best, -jay
Re: [openstack-dev] Changes coming in gate structure
On Wed, Jan 22, 2014 at 1:39 PM, Sean Dague s...@dague.net wrote: Changes coming in gate structure Unless you've been living under a rock, on the moon, around Saturn, you'll have noticed that the gate has been quite backed up the last 2 weeks. Every time we get towards a milestone this gets measurably worse, and the expectation is that at i3 we're going to see at least 40% more load than we are dealing with now (if history is any indication), which doesn't bode well. It turns out, when you have a huge and rapidly growing Open Source project, you keep finding scaling limits in existing software, your software, and approaches in general. It also turns out that you find out that you need to act defensively on situations that you didn't think you'd have to worry about. Like code reviews with 3-month-old test results being put into the review queue. Or code that *can't* pass (which a look at the logs would show) being reverified in the gate. All of these things compound on the fact that there are real bugs in OpenStack, which end up having a non-linear failure effect. Once you get past a certain point the failure rates multiply to the point where everything stops (which happened Sunday, when we only merged 4 changes in 24 hrs). The history of the gate structure is a long one. It was added in Diablo when there was a project which literally would not run with the other OpenStack components. The idea of gating merge of everything on everything else is to ensure we have some understanding that OpenStack actually works, all together, for some set of configurations. It wasn't until the Folsom cycle that we started running these tests before human review (kind of amazing). The gate is also based on an assumption that most of the bugs we are catching are outside the project, vs. bugs that are already in the project. 
However, in an asynchronous system, bugs can show up only very occasionally, and get past our best efforts to detect them, then pile up in the code base until we root them out. = Towards a Svelter Gate - Leaning on Check = We've got a current plan of attack to try to maintain nearly the same level of integration test guarantees, and hope to make it so on the merge side we're able to get more throughput. This is a set of things that all have to happen at once to not completely blow out the guarantees we've got in the source. Make a clean recent Check prereq for entering gate == A huge compounding problem has been patches that can't pass being promoted to the gate. So we're going to make Zuul able to enforce a recent clean check scorecard before going into the gate. Our working theory of recent is the last 24 hrs. If it doesn't have a recent set of check results on +A, we'll trigger a check rerun, and if clean, it gets sent to the gate. We'll also probably add a sweeper to zuul so it will automatically refresh results on changes that are getting comments on them and are older than some number of days. Svelte Gate == The gate jobs will be trimmed down immensely. Nothing project specific, so pep8 / unit tests all ripped out, no functional test runs. Fewer overall configs. Exactly how minimal we'll figure out as we decide what we can live without. The floor for this would be devstack-tempest-full and grenade. This is basically a sanity check that the combination of patches in flight doesn't ruin the world for everyone. Idle Cloud for Elastic Recheck Bugs === We have actually been using the gate for double duty, both as ensuring integration, but also as a set of clean test results to figure out what bugs are in OpenStack that only show up from time to time. The check queue is way too noisy, as our system actually blocks tons of bad code from getting in. With the Svelte gate, we'll need a set of background nodes to build that dataset. 
But with elastic search we now have the technology, so this is good. It will let us work these issues in parallel. These issues will still cause people pain in getting clean results in check. = Timelines, Dangers, and Opportunities = We need changes soon. Every past experience says milestone 3 is 40% heavier than milestone 2, and nothing indicates that icehouse is going to be any different. So Jim's put getting these required bits into Zuul at the top of his list, and we're hoping we'll have them within a week. With this approach, wedging the gate is highly unlikely. However, as we won't be testing every check test again in gate, it means there is a possibility that a combination of patches might make the check results wedge for everyone (like the pg job getting wedged). So it moves that issue around. Right now it's hard to say
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Keith Basil's message of 2014-01-22 12:27:50 -0800: On Jan 22, 2014, at 1:53 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes. What is commonly called the update story, I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. 
Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw   # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro   # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! 
In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I don't understand the aversion to using existing, well-known tools to handle this? These tools are of
Re: [openstack-dev] [gantt] How to include nova modules in unit tests
On Tue, Jan 21, 2014 at 7:35 PM, Dugger, Donald D donald.d.dug...@intel.com wrote: Well, the first goal is to get the scheduler code into a separate tree, even though that code is still utilizing common code from nova. Right now just about every scheduler file includes some nova modules. Ultimately, yes, we want to remove the dependency on nova but that is a future effort and would create way too many changes for the immediate future. The nova code you are trying to use isn't a public API and can change at any time. Before considering using gantt we would have to fully remove any nova imports in gantt. When we want to cut the cord from nova it'll be easy, just remove that line from the `test-requirements.txt' file and we'll be forced to replace all of the nova code. I'm not sure it will actually be that easy. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: Tuesday, January 21, 2014 5:16 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [gantt] How to include nova modules in unit tests On 22 January 2014 11:57, Dugger, Donald D donald.d.dug...@intel.com wrote: I almost have the unit tests for gantt working except for one problem - is there a way to have the test infrastructure allow the gantt tree to import objects from the nova tree? The problem is that we want to break out just the scheduler code into the gantt tree without duplicating all of nova. The current scheduler has many imports of nova objects, which is not a problem except for the unit tests. The unit tests run in an environment that doesn't include the nova tree, so all of those imports wind up failing. The goal though is to have an independent system; perhaps marking all the tests that still depend on tendrils of nova 'skipped' and then working on burning down the skips to 0 is a better approach than making it easy to have such dependencies? 
-Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
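Robert's suggestion of skipping, rather than silently importing, tests that still reach into nova is a standard guarded-import pattern. A sketch (the test class and method names are invented):

```python
import unittest

# Guarded import: in a standalone gantt tree, nova may simply not be
# installed, and that absence should read as a skip, not an error.
try:
    import nova  # noqa: F401 -- presence check only
    HAS_NOVA = True
except ImportError:
    HAS_NOVA = False


class SchedulerFilterTest(unittest.TestCase):
    @unittest.skipUnless(HAS_NOVA, "still depends on nova internals")
    def test_filter_uses_nova_objects(self):
        # Body would exercise nova-dependent scheduler code; the point
        # here is only that the dependency is tracked as an explicit,
        # countable skip that can be burned down to zero over time.
        pass
```

Running the suite without nova installed then reports the remaining nova tendrils as skips, giving a concrete number to burn down.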
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Maybe I misunderstand, but I thought: kexec - lets you boot a new kernel/initrd starting at the point a boot loader would, skipping the BIOS init. All previously running processes are not running in the new boot, just like after a normal reboot. CRIU - lets you snapshot/restart running processes. While you could use both together to upgrade the kernel while leaving all the processes running after the reboot, I don't think that's very well tested at the moment. Checkpointing the system memory is not without cost either; restarting the services may be faster. I think we're pretty far off on a tangent though. My main point was, if you can't selectively restart services as needed, I'm not sure how useful patching the image really is over a full reboot. It should mean the same order of magnitude of service unavailability, I think. Thanks, Kevin From: Clint Byrum [cl...@fewbar.com] Sent: Wednesday, January 22, 2014 12:36 PM To: openstack-dev Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Fox, Kevin M's message of 2014-01-22 12:19:56 -0800: I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. If we could get kexec reliable.. but I have no evidence that it is anything but a complete flake. What it saves you is losing running processes that you don't end up killing, which is expensive on many types of services.. Nova Compute being a notable example.
Re: [openstack-dev] Changes coming in gate structure
On Wed, 2014-01-22 at 15:39 -0500, Sean Dague wrote: snip == Executive Summary == To summarize, the effects of these changes will be: - 1) Decrease the impact of failures resetting the entire gate queue by doing the heavy testing in the check queue where changes are not dependent on each other. - 2) Run a slimmer set of jobs in the gate queue to maintain sanity, but not block as much on existing bugs in OpenStack. - 3) As a result, this should increase our confidence that changes put into the gate will pass. This will help prevent gate resets, and the disruption they cause by needing to invalidate and restart the whole gate queue. All good things, Sean ++. Might I also suggest one other thing that, IMO, would reduce gate contention? What if we added an option to git review that would inject something into the git commit message that would indicate the patch author felt the patch does not need to have integration testing run against it. Lots of patches make no substantive code changes and just clean up style, comments, or documentation. Having integration tests run for these patches is just noise and provides no value. There should be a way to indicate to Zuul not to run integration testing if some marker is in the commit message. For example, let us imagine that issuing: git review --skip-integration-tests would cause a git commit hook to execute that injected this marker into the commit message: Skip-Integration-Tests in the same way that the Change-Id commit hook injects the Change-Id: Ix line into the commit message. A -core reviewer would see Skip-Integration-Tests in the commit message. If the -core reviewer disagreed with the patch author that the patch did not have substantive code changes and actually wanted integration tests to be run for the patch, they could simply ask the patch author to run a git commit --amend and remove the Skip-Integration-Tests line from the commit message. 
If a -core reviewer did a +1A on a patch that had a Skip-Integration-Tests marker in the commit message, Zuul would simply not execute the integration tests and would only execute things like rebase/merge conflict checks, and if all those basic tests succeeded, merge the patch into the target branch. This should significantly reduce the gate contention IMO, and should not be difficult at all to implement. Best, -jay
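Jay's proposal could be sketched on the Zuul side roughly as follows. This is purely illustrative: Zuul has no such feature today, and both the footer name and the `gate-tempest` job-name prefix used here to identify integration jobs are assumptions taken from the proposal above.

```python
import re

# Hypothetical commit-message footer from the proposal; not a real
# Zuul or git-review feature.
SKIP_MARKER = re.compile(r'^Skip-Integration-Tests\s*$', re.MULTILINE)

def jobs_to_run(commit_message, all_jobs):
    """Return the jobs Zuul would run for a change.

    When the commit message carries the (hypothetical) opt-out footer,
    drop integration jobs and keep only the basic checks (style, unit
    tests, merge checks). Job naming is illustrative.
    """
    if SKIP_MARKER.search(commit_message):
        return [j for j in all_jobs if not j.startswith('gate-tempest')]
    return list(all_jobs)
```

A reviewer who disagrees with the marker would simply ask for it to be removed via `git commit --amend`, after which the full job list runs again.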
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Wed, Jan 22, 2014 at 3:22 AM, Flavio Percoco fla...@redhat.com wrote: On 21/01/14 13:14 -0500, Joe Gordon wrote: On Jan 17, 2014 12:24 AM, Flavio Percoco fla...@redhat.com wrote: On 16/01/14 17:32 -0500, Doug Hellmann wrote: On Thu, Jan 16, 2014 at 3:19 PM, Ben Nemec openst...@nemebean.com wrote: On 2014-01-16 13:48, John Griffith wrote: Hey Everyone, A review came up today that cherry-picked a specific commit to OSLO Incubator, without updating the rest of the files in the module. I rejected that patch, because my philosophy has been that when you update/pull from oslo-incubator it should be done as a full sync of the entire module, not a cherry pick of the bits and pieces that you may or may not be interested in. As it turns out I've received a bit of push back on this, so it seems maybe I'm being unreasonable, or that I'm mistaken in my understanding of the process here. To me it seems like a complete and total waste to have an oslo-incubator and common libs if you're going to turn around and just cherry pick changes, but maybe I'm completely out of line. Thoughts?? I suppose there might be exceptions, but in general I'm with you. For one thing, if someone tries to pull out a specific change in the Oslo code, there's no guarantee that code even works. Depending on how the sync was done it's possible the code they're syncing never passed the Oslo unit tests in the form being synced, and since unit tests aren't synced to the target projects it's conceivable that completely broken code could get through Jenkins. Obviously it's possible to do a successful partial sync, but for the sake of reviewer sanity I'm -1 on partial syncs without a _very_ good reason (like it's blocking the gate and there's some reason the full module can't be synced). I agree. Cherry picking a single (or even partial) commit really should be avoided. 
The update tool does allow syncing just a single module, but that should be used very VERY carefully, especially because some of the changes we're making as we work on graduating some more libraries will include cross-dependent changes between oslo modules. Agreed. Syncing on master should be a complete synchronization from Oslo incubator. IMHO, the only case where cherry-picking from oslo should be allowed is when backporting patches to stable branches. Master branches should try to keep up to date with Oslo and sync everything every time. When we started Oslo incubator, we treated that code as trusted. But since then there have been occasional issues when syncing the code. So Oslo incubator code has lost *my* trust. Therefore I am always hesitant to do a full Oslo sync, because I am not an expert on the Oslo code and I risk breaking something when doing it (and the issue may not appear 100% of the time, either). Syncing code in becomes the first time that code is run against tempest, which scares me. While this might be true in some cases, I think we should address it differently. Just dropping trust in the project won't help much. How else would you address it? I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. But isn't this what other gates are for? I mean, when proposing an oslo sync, each project has its own gate plus integrated tests that do this exact job. Sort of. There are two possible failure modes here: 1) An oslo-incubator sync is attempted by Alice and the patch fails integration tests in the check queue. Alice doesn't know why it failed and has to go and resolve the issue with the oslo folks. Alice now thinks doing oslo-incubator syncs is a hassle, stops doing them, and moves on to something else. 2) An oslo-incubator sync is merged, but introduces a non-deterministic bug. This wasn't caught in oslo-incubator because there are no integration tests there. 
The risk here is that the oslo-incubator code isn't run enough to detect non-deterministic bugs. In fact, we just found one yesterday (https://review.openstack.org/#/c/68275/). Additionally, what about a periodic Jenkins job that does the Oslo syncs and is managed by the Oslo team itself? This would be awesome. It would take the burden of doing the sync off the project maintainers. Before doing this, though, we need to improve the `update` script. Currently, there's no good way to generate useful commit messages out of the sync. Cheers, FF -- @flaper87 Flavio Percoco
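To make that last point concrete, an improved `update` script could draft the sync commit message from the incubator's own history. This is only a sketch: the function names, the `openstack/common/<module>.py` path layout, and the message format are illustrative assumptions, not the actual `update` script.

```python
import subprocess

def incubator_log(oslo_repo, last_synced_sha, modules):
    """One-line git log of incubator commits touching the synced modules."""
    paths = ['openstack/common/%s.py' % m for m in modules]
    return subprocess.check_output(
        ['git', 'log', '--oneline', '%s..HEAD' % last_synced_sha, '--'] + paths,
        cwd=oslo_repo, universal_newlines=True)

def format_sync_message(modules, oneline_log):
    """Turn the raw log into a reviewable sync commit message."""
    body = '\n'.join('  %s' % line for line in oneline_log.splitlines())
    return ('Sync %s from oslo-incubator\n\n'
            'Included incubator commits:\n\n%s\n'
            % (', '.join(modules), body))
```

A message generated this way would let reviewers see exactly which incubator commits a sync pulls in, instead of a bare "sync from oslo" subject line.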
Re: [openstack-dev] [Swift] domain-level quotas
Hi Matthieu, On 22.01.14 20:02, Matthieu Huin wrote: The idea is to have a middleware checking a domain's current usage against a limit set in the configuration before allowing an upload. The domain id can be extracted from the token, then used to query keystone for a list of projects belonging to the domain. Swift would then compute the domain usage in a similar fashion as the way it is currently done for accounts, and proceed from there. The problem might be to compute the current usage of all accounts within a domain. It won't be a problem if you have only a few accounts in a domain, but with tens, hundreds, or even thousands of accounts in a domain there will be a performance impact, because you need to iterate over all accounts (doing a HEAD on every account) and sum up the total usage. I think some performance tests would be helpful (doing a HEAD on all accounts repeatedly, with some PUTs in between) to see if the performance impact is an issue at all (since there will be a lot of caching involved). Christian
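A rough sketch of the accounting Christian describes, with `head_account` standing in for an internal HEAD request per project account (`X-Account-Bytes-Used` is the usage header Swift returns on an account HEAD; everything else here, including the function names, is illustrative):

```python
def domain_bytes_used(project_ids, head_account):
    """Sum the bytes used by every account belonging to a domain.

    head_account(project_id) is a stand-in for issuing a HEAD request
    against that project's Swift account and returning its headers.
    Note this iterates over every account in the domain, which is
    exactly the performance concern raised above.
    """
    total = 0
    for project_id in project_ids:
        headers = head_account(project_id)
        total += int(headers.get('X-Account-Bytes-Used', 0))
    return total

def upload_allowed(project_ids, head_account, domain_quota, incoming_bytes):
    """The quota check a middleware might run before accepting an upload."""
    used = domain_bytes_used(project_ids, head_account)
    return used + incoming_bytes <= domain_quota
```

With thousands of accounts per domain, each upload would pay the cost of that loop unless results are cached aggressively, which is why the performance tests suggested above matter.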
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Wed, Jan 22, 2014 at 5:19 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. If oslo-incubator can break APIs whenever it wants, how can a downstream project stay in sync with oslo-incubator? -- Julien Danjou ;; Free Software hacker ; independent consultant ;; http://julien.danjou.info
[openstack-dev] [QA] Meeting Thursday January 23rd at 22:00UTC
Just a quick reminder that the weekly OpenStack QA team IRC meeting will be tomorrow, Thursday, January 23rd at 22:00 UTC in the #openstack-meeting channel. The agenda for tomorrow's meeting can be found here: https://wiki.openstack.org/wiki/Meetings/QATeamMeeting Anyone is welcome to add an item to the agenda. Also, I was asked to add the meeting time in other timezones to this weekly reminder email. So tomorrow's meeting will be at: 17:00 EST 07:00 JST 08:30 ACDT 23:00 CET 16:00 CST 14:00 PST -Matt Treinish
Re: [openstack-dev] [gantt] Sync up patches
On 01/21/2014 04:43 PM, Joe Gordon wrote: On Thu, Jan 16, 2014 at 4:42 PM, Dugger, Donald D donald.d.dug...@intel.com wrote: OK, it looks like the consensus is that we don't try to keep the gantt tree in sync with nova; instead we: 1) Get the current gantt tree to pass unit tests 2) Get gantt to pass integration tests (e.g. get it working as the nova scheduler) 3) Modify devstack to optionally use gantt 4) Freeze scheduler changes to nova as we: This should be covered by the standard feature freeze for Icehouse a) Extract all the changes that were needed to get gantt working b) Recreate the gantt tree from the current nova tree c) Apply all the patches from step 4.a 5) Unfreeze scheduler work, but now all work is targeted exclusively to the gantt tree LGTM, although once we have a working gantt for Icehouse I think we should have another round of discussion about deprecating nova-scheduler in favor of gantt. On a high level that is something I think we all support, but the devil is in the details. Right, I don't think it's worth talking about until gantt is demonstrated to be working. Otherwise the deprecation path discussion is a moot point. It's really a switch I'd rather flip at the beginning of a release cycle, anyway. It seems quite likely that any deprecation cycle will start in Juno and not Icehouse at this point. -- Russell Bryant
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Hi On 22 January 2014 21:33, Fox, Kevin M kevin@pnnl.gov wrote: I think we're pretty far off on a tangent though. My main point was, if you can't selectively restart services as needed, I'm not sure how useful patching the image really is over a full reboot. It should take the same order of magnitude of service unavailability, I think. The in-place upgrade (currently known as takeovernode) is not yet well designed, and while there is a script in tripleo-incubator called takeovernode, nobody is likely to resume working on it until a little later in our roadmap. The crude hack we have at the moment does no detection of services that need to be restarted, but that is not intended to suggest that we don't care about such a feature :) I think Clint has covered pretty much all the bases here, but I would reiterate that in no way do we think the kind of upgrade we're working on at the moment (i.e. a nova rebuild driven instance reboot) is the only one that should exist. We know that in-place upgrades need to happen for TripleO's full story to be taken seriously, and we will get to it. If folks have suggestions for behaviours/techniques/tools, those would be great to capture, probably in https://etherpad.openstack.org/p/tripleo-image-updates . http://manpages.ubuntu.com/manpages/oneiric/man1/checkrestart.1.html is one such tool that we turned up in earlier research about how to detect services that need to be restarted after an upgrade. It's not a complete solution on its own, but it gets us some of the way. (Also, just because we favour low-entropy golden images for all software changes doesn't mean that any given user can't choose to roll out an upgrade to some piece(s) of software via any other mechanism they choose, if that is what they feel is right for their operation. A combination of the two strategies is entirely possible.) -- Cheers, Chris
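For reference, the core heuristic behind tools like checkrestart can be sketched in a few lines: a process whose memory maps still reference a deleted shared object is running code that was replaced on disk. The `/proc` scanning below is Linux-specific and illustrative, not the actual checkrestart implementation:

```python
import glob

def needs_restart(map_lines):
    """True if any mapped shared object has been deleted on disk,
    i.e. the process is still running pre-upgrade code."""
    return any(line.rstrip().endswith('(deleted)') and '.so' in line
               for line in map_lines)

def stale_pids():
    """Scan /proc for processes that should be restarted (Linux only)."""
    pids = set()
    for maps_path in glob.glob('/proc/[0-9]*/maps'):
        try:
            with open(maps_path) as maps_file:
                if needs_restart(maps_file):
                    pids.add(int(maps_path.split('/')[2]))
        except IOError:  # process exited or is unreadable; skip it
            continue
    return pids
```

As noted above, this is only part of the story: it catches replaced libraries but not, say, changed configuration files, so it supplements rather than replaces a proper service-restart policy.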
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
OK, with overwhelming support for #3: what if we modified #3 slightly? Looking at it again, it seems like we could shorten the path, since /datastores/datastore/configuration doesn't do anything by itself. Instead of #1 /datastores/datastore/configuration/parameters, maybe: #2 /datastores/datastore/parameters #3 /datastores/datastore/configurationparameters On Wed, Jan 22, 2014 at 2:27 PM, Denis Makogon dmako...@mirantis.com wrote: Good day to all. #3 looks more than acceptable. /datastores/datastore/configuration/parameters. According to the configuration parameters design, a configuration set must be associated with exactly one datastore. Best regards, Denis Makogon. 2014/1/22 Michael Basnight mbasni...@gmail.com On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three) + configuration set without an associated datastore is meaningless + a configuration set must be associated with exactly one datastore + each datastore must have 0-1 configuration sets + All above relationships are immediately apparent - Listing all configuration sets becomes more difficult (which I don't think is a valid concern) +1 to option 3, given what kaleb and craig have outlined so far. I don't see the above minus as a valid concern either, kaleb.
Re: [openstack-dev] Changes coming in gate structure
On 01/22/2014 04:43 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 15:39 -0500, Sean Dague wrote: snip == Executive Summary == To summarize, the effects of these changes will be: - 1) Decrease the impact of failures resetting the entire gate queue by doing the heavy testing in the check queue where changes are not dependent on each other. - 2) Run a slimmer set of jobs in the gate queue to maintain sanity, but not block as much on existing bugs in OpenStack. - 3) As a result, this should increase our confidence that changes put into the gate will pass. This will help prevent gate resets, and the disruption they cause by needing to invalidate and restart the whole gate queue. All good things, Sean ++. Might I also suggest one other thing that, IMO, would reduce gate contention? What if we added an option to git review that would inject something into the git commit message that would indicate the patch author felt the patch does not need to have integration testing run against it. Lots of patches make no substantive code changes and just clean up style, comments, or documentation. Having integration tests run for these patches is just noise and provides no value. There should be a way to indicate to Zuul not to run integration testing if some marker is in the commit message. For example, let us imagine that issuing: git review --skip-integration-tests would cause a git commit hook to execute that injected this marker into the commit message: Skip-Integration-Tests in the same way that the Change-Id commit hook injects the Change-Id: Ix line into the commit message. A -core reviewer would see Skip-Integration-Tests in the commit message. If the -core reviewer disagreed with the patch author that the patch did not have substantive code changes and actually wanted integration tests to be run for the patch, they could simply ask the patch author to run a git commit --amend and remove the Skip-Integration-Tests line from the commit message. 
If a -core reviewer did a +1A on a patch that had a Skip-Integration-Tests marker in the commit message, Zuul would simply not execute the integration tests and would only execute things like rebase/merge conflict checks, and if all those basic tests succeeded, merge the patch into the target branch. This should significantly reduce the gate contention IMO, and should not be difficult at all to implement. Best, -jay To my unlearned eye, this feels like something that would be abused fairly quickly. Then we would have to look at some form of rollback plan. Thanks, Anita.
Re: [openstack-dev] Changes coming in gate structure
Could you consider issuing the check job before forwarding to the gate only if the current patch set is not already (re)based against master? That way, if it is rebased and there was a successful check job, even a days-old one, a new run would not be needed. Perhaps?