Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 21/01/14 13:14 -0500, Joe Gordon wrote: On Jan 17, 2014 12:24 AM, Flavio Percoco fla...@redhat.com wrote: On 16/01/14 17:32 -0500, Doug Hellmann wrote: On Thu, Jan 16, 2014 at 3:19 PM, Ben Nemec openst...@nemebean.com wrote: On 2014-01-16 13:48, John Griffith wrote: Hey Everyone, A review came up today that cherry-picked a specific commit to OSLO Incubator, without updating the rest of the files in the module. I rejected that patch, because my philosophy has been that when you update/pull from oslo-incubator it should be done as a full sync of the entire module, not a cherry pick of the bits and pieces that you may or may not be interested in. As it turns out I've received a bit of push back on this, so it seems maybe I'm being unreasonable, or that I'm mistaken in my understanding of the process here. To me it seems like a complete and total waste to have an oslo-incubator and common libs if you're going to turn around and just cherry pick changes, but maybe I'm completely out of line. Thoughts?? I suppose there might be exceptions, but in general I'm with you. For one thing, if someone tries to pull out a specific change in the Oslo code, there's no guarantee that code even works. Depending on how the sync was done it's possible the code they're syncing never passed the Oslo unit tests in the form being synced, and since unit tests aren't synced to the target projects it's conceivable that completely broken code could get through Jenkins. Obviously it's possible to do a successful partial sync, but for the sake of reviewer sanity I'm -1 on partial syncs without a _very_ good reason (like it's blocking the gate and there's some reason the full module can't be synced). I agree. Cherry picking a single (or even partial) commit really should be avoided. 
The update tool does allow syncing just a single module, but that should be used very VERY carefully, especially because some of the changes we're making as we work on graduating some more libraries will include cross-dependent changes between oslo modules. Agreed. Syncing on master should be a complete synchronization from Oslo incubator. IMHO, the only case where cherry-picking from oslo should be allowed is when backporting patches to stable branches. Master branches should try to keep up-to-date with Oslo and sync everything every time. When we started Oslo incubator, we treated that code as trusted. But since then there have been occasional issues when syncing the code. So Oslo incubator code has lost *my* trust. Therefore I am always hesitant to do a full Oslo sync, because I am not an expert on the Oslo code and I risk breaking something when doing it (and the issue may not appear 100% of the time, either). Syncing code in becomes the first time that code is run against tempest, which scares me. While this might be true in some cases, I think we should address it differently. Just dropping trust in the project won't help much. I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. But isn't this what other gates are for? I mean, when proposing an oslo sync, each project has its own gate plus integrated tests that do this exact job. Additionally, what about a periodic jenkins job that does the Oslo syncs and is managed by the Oslo team itself? This would be awesome. It would take the burden of doing the sync off the project maintainers. Before doing this, though, we need to improve the `update` script. Currently, there's no good way to generate useful commit messages out of the sync.
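For readers unfamiliar with the `update` script being discussed, here is a hedged sketch of what it does conceptually: copy modules from oslo-incubator into the target project tree and rewrite the package prefix so `openstack.common` resolves inside the project. This is an illustration only, not the real openstack/oslo-incubator script; the function name is hypothetical.

```python
# Hedged sketch: the core transformation the oslo-incubator update tool
# performs when syncing a module into a consuming project.
def rewrite_imports(source, project):
    """Rewrite oslo-incubator package references for a target project."""
    return source.replace("openstack.common",
                          "%s.openstack.common" % project)

line = "from openstack.common import log"
print(rewrite_imports(line, "cinder"))
# from cinder.openstack.common import log
```

A full-module sync, as advocated above, amounts to running this rewrite over every file in the module rather than cherry-picking individual commits.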
Cheers, FF -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Hi Dong, Can you elaborate with an example of what you get, and what you were expecting exactly? I have a similar problem with one operator, where they assign you sparse blocks of IP addresses (floating IPs), directly routed to your machine, and they also assign the virtual MAC addresses from their API. Direct routing means that the subnet router will route your IP from outside the subnet directly through your subnet to your machine, and the traffic (with the external IP) is routed back through the subnet to this router. Cheers, - Original Message - From: Dong Liu willowd...@gmail.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Tuesday, January 21, 2014 9:52:44 AM Subject: [openstack-dev] [nova][neutron]About creating vms without ip address Hi fellow OpenStackers, I found that we could not create VMs without an IP address. But in the telecom scene, the IP address is usually managed by the telecom network elements themselves. So they need a VM without an IP address, to be configured through some specific method. How can we provide this kind of VM? I think providing the ability for tenants to create VMs without an IP address is necessary. What's your opinion? Regards, Dong Liu
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 00:00, Robert Collins robe...@robertcollins.net wrote: I think dropping frames that can't be forwarded is entirely sane - at a guess it's what a physical ethernet switch would do if you try to send a 1600 byte frame (on a non-jumbo-frame switched network) - but perhaps there is an actual standard for this we could follow? Speaking from bitter experience, if you've misconfigured your switch so that it's dropping packets for this reason, you will have a period of hair tearing out to solve the problem before you work it out. Believe me, been there, rabbit messages that don't turn up because they're the first ones that were too big are not a helpful diagnostic indicator. Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. (c) we require Neutron plugins to work out the MTU, which for any encap except VLAN is (host interface MTU - header size). do you mean tunnel wrap overheads? (What if a particular tunnel has a trailer.. crazy talk I know). Yup, basically. Unfortunately, thinking about this a bit more, you can't easily be certain what the max packet size allowed in a GRE tunnel is going to be, because you don't know which interface it's going over (or what's between), but to a certain extent we can use config items to fix what we can't discover. -- Ian.
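The "host interface MTU - header size" rule above can be sketched numerically. The overhead figures below are typical values only; the actual numbers depend on options (GRE keys, IPv6 outer headers, and so on), which is exactly why the thread concludes the value can't always be discovered automatically.

```python
# Hedged sketch of per-encapsulation MTU arithmetic. Overheads are
# common defaults, not guarantees.
ENCAP_OVERHEAD = {
    "vlan": 0,                  # tagging happens below the IP MTU
    "gre": 20 + 4,              # outer IPv4 header + basic GRE header
    "vxlan": 20 + 8 + 8 + 14,   # outer IPv4 + UDP + VXLAN + inner Ethernet
}

def tenant_mtu(host_mtu, encap):
    """MTU a guest should use so encapsulated frames fit the host MTU."""
    return host_mtu - ENCAP_OVERHEAD[encap]

print(tenant_mtu(1500, "gre"))    # 1476
print(tenant_mtu(1500, "vxlan"))  # 1450
```

If the tunnel traverses a path with a lower MTU than the host interface, even this arithmetic is optimistic, hence the suggestion later in the thread to fall back on a config option.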
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda
[openstack-dev] Questions regarding image location and glanceclient behaviour ...
Hi All, I have two questions ... 1) Glance v1 APIs can take a --location argument when creating an image but v2 APIs can't - bug or feature? (Details below) 2) How should glanceclient (v2 commands) handle reserved attributes? a) status quo: (Apparently) let the user set them, but the server will return an 'attribute is reserved' error. Pros: No missing functionality, no damage done. Cons: Bad usability. b) hard-code the list of reserved attributes in the client and don't expose them to the user. Pros: quick to implement. Cons: Need to track reserved attributes in the server implementation. c) get reserved words from the schema downloaded from the server (and don't expose them to the user). Pros: Don't need to track the server implementation. Cons: Complex - reserved words can vary from command to command. I personally favor (b) on the grounds that a client implementation needs to closely understand server behaviour anyway, so the syncing of reserved attributes shouldn't be a big problem (*provided* the list of reserved attributes is made available in the reference documentation, which doesn't seem to be the case currently). So what does everybody think? Details: When using the glance client's v1 interface I can image-create an image and specify the image file's location via the --location parameter. Alternatively I can image-create an empty image and then image-update the image's location to some url. However, when using the client's v2 commands I can neither image-create the file using the --location parameter, nor image-update the file later. When using image-create with --location, the client gives the following error (printed by warlock): Unable to set 'locations' to '[u'http://192.168.1.111/foo/bar']' This is because the schema dictates that the location should be an object of the form [{url: string, metadata: object}, ...]
but there is no way to specify such an object from the command line - I cannot specify a string like '{url: 192.168.1.111/foo/bar, metadata: {}}' for there is no conversion from command line strings to python dicts, nor is there any conversion from a simple URL string to a suitable location object. If I modify glanceclient.v2.images.Controller.create to convert the locations parameter from a URL string to the desired object, then the request goes through to the glance server, where it fails with a 403 error (Attribute 'locations' is reserved). So is this discrepancy between v1 and v2 deliberate (a feature :)) or is it a bug?
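The URL-to-locations-object conversion described above is trivial to express; the gap is purely that neither the client CLI nor the server permits it. A hedged sketch of the shape the v2 schema expects (the helper name is hypothetical, not part of glanceclient):

```python
# Wrap a bare image URL into the v2 'locations' entry shape quoted from
# the schema above: [{"url": string, "metadata": object}, ...]
def url_to_location(url, metadata=None):
    return {"url": url, "metadata": metadata or {}}

locations = [url_to_location("http://192.168.1.111/foo/bar")]
print(locations)
# [{'url': 'http://192.168.1.111/foo/bar', 'metadata': {}}]
```

Even with this conversion in place, the request still fails with the 403 described above, because the server treats 'locations' as a reserved attribute in v2.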
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us Python 3 hackers will be able to handle the transition as needed. -- Julien Danjou # Free Software hacker # independent consultant # http://julien.danjou.info
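For context, here is a hedged sketch of the compatibility shim in question: six>=1.5 exposes `six.moves.urllib`, which papers over the Python 2/3 urllib split the same way oslo-incubator's py3kcompat did. The try/except fallback to the Python 3 stdlib is only so this snippet runs without six installed; real code would just import from six.moves.

```python
# What six 1.5 provides and py3kcompat duplicated: one import path for
# urllib/urlparse on both Python 2 and 3.
try:
    from six.moves.urllib import parse as urlparse  # six >= 1.5
except ImportError:
    from urllib import parse as urlparse  # Python 3 stdlib fallback

parts = urlparse.urlparse("http://192.168.1.111/foo/bar?x=1")
print(parts.netloc)  # 192.168.1.111
print(parts.path)    # /foo/bar
```

Once every consuming project imports through six.moves, py3kcompat can be deleted from oslo-incubator, which is the "upgrade and remove our own code" position above.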
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break API compatibility. -- Julien Danjou ;; Free Software hacker ; independent consultant ;; http://julien.danjou.info
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
Le 22/01/2014 02:50, Jay Pipes a écrit : Yup, agreed. It's difficult to guess what the capacity implications would be without having solid numbers on customer demand for this functionality, including hard data on how long such instances would typically live (see my previous point about re-using compute hosts for other purposes once the last dedicated instance is terminated on that host). Best, -jay My personal opinion (but I could be wrong) is that such a feature would only be accepted by operators if there is some termination period defined when you create a dedicated instance. Again, what happens when the lease (or the lock-in) ends should be defined by the operator, at their own convenience, and that's why Climate is behaviour-driven by configuration flags for lease termination. Back to the initial subject, I think it's pretty good having such a dedicated-instance model in Nova (thanks to an API extension, which could be non-core), but the instance lifecycle (in case of a termination period) should stay in Climate, IMHO. -Sylvain
Re: [openstack-dev] [gantt] How to include nova modules in unit tests
Le 22/01/2014 01:37, Dugger, Donald D a écrit : Sylvain- Tnx, that worked great. (Now if I can just find a way to get the affinity tests working, all the other tests pass. I only have 17 tests failing out of 254.) I'm pretty busy these days with Climate 0.1 to deliver, but if I find some time, I will take a look at these. -Sylvain
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On Wed, Jan 22, 2014 at 11:17 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us [...] +1, less code we maintain is a good thing :) Chmouel.
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 21:28, Ian Wells ijw.ubu...@cack.org.uk wrote: On 22 January 2014 00:00, Robert Collins robe...@robertcollins.net wrote: I think dropping frames that can't be forwarded is entirely sane - at a guess it's what a physical ethernet switch would do if you try to send a 1600 byte frame (on a non-jumbo-frame switched network) - but perhaps there is an actual standard for this we could follow? Speaking from bitter experience, if you've misconfigured your switch so that it's dropping packets for this reason, you will have a period of hair tearing out to solve the problem before you work it out. Believe me, been there, rabbit messages that don't turn up because they're the first ones that were too big are not a helpful diagnostic indicator. PMTU blackhole problems show the same symptoms :) - been there, done that. Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. (c) we require Neutron plugins to work out the MTU, which for any encap except VLAN is (host interface MTU - header size). do you mean tunnel wrap overheads? (What if a particular tunnel has a trailer.. crazy talk I know). Yup, basically. Unfortunately, thinking about this a bit more, you can't easily be certain what the max packet size allowed in a GRE tunnel is going to be, because you don't know which interface it's going over (or what's between), but to a certain extent we can use config items to fix what we can't discover. One thing we could do is encourage OS vendors to turn /proc/sys/net/ipv4/tcp_mtu_probing (http://www.ietf.org/rfc/rfc4821.txt) on in combination with dropping over-size frames.
That should detect the actual MTU. Another thing would be for encapsulation failures in the switch to be reflected in the vNIC in the instance - export back media errors (e.g. babbles) so that users can diagnose problems. Note that IPv6 doesn't *have* a DF bit, because routers are not permitted to fragment - arguably encapsulating an ipv6 frame in GRE and then fragmenting the outer layer is a violation of that. As for automatically determining the size - we can determine the PMTU between all hosts in the mesh, report those back centrally and take the lowest, then subtract the GRE overhead. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
[openstack-dev] [Heat] Reducing pep8 ignores
Hi all, we have an approved blueprint that concerns reducing the number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've already been warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what you think about enabling these checks. The style rules currently being ignored are: F841 local variable 'json_template' is assigned to but never used H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) H302 do not import objects, only modules (this I don't like myself, as it can clutter the code beyond reasonable limit) H306 imports not in alphabetical order H404 multi line docstring should start with a summary Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am now puzzled about how to propose such a change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I can currently think of is to split it into 4-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question of how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass the pep8 check? Or will there be a 'race condition' between my final patch and any other that breaks the style rules? I would really appreciate any thoughts and comments about this. Best regards, Pavlo Shchelokovskyy.
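To make H306 concrete, here is a hedged toy check in the spirit of the rule - imports within a group must be alphabetical. This is an illustration only, not the actual hacking-plugin implementation.

```python
# Toy H306-style check: are plain 'import X' lines alphabetically sorted?
def imports_sorted(lines):
    mods = [line.split()[1] for line in lines if line.startswith("import ")]
    return mods == sorted(mods)

good = ["import json", "import os", "import sys"]
bad = ["import sys", "import json"]
print(imports_sorted(good), imports_sorted(bad))  # True False
```

The real check also handles `from X import Y` forms and the stdlib/third-party/project grouping, which is what makes a 163-file cleanup so mechanical yet so large.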
Re: [openstack-dev] [savanna] savannaclient v2 api
Current EDP config-hints are not only plugin-specific. Several types of jobs must have certain key/values, and without them the job will fail. For instance, the MapReduce (former Jar) job type requires Mapper/Reducer class parameters to be set [1]. Moreover, for such kinds of jobs we already have separate configuration defaults [2]. Also, initial versions of the patch implementing config-hints contained plugin-independent defaults for each job type [3]. I remember we postponed the decision about which configs are common for all plugins and agreed to show users all vanilla-specific defaults. That's why we now have several TODOs in the code about config-hints being plugin-specific. So I propose to leave the config-hints REST call in EDP internal and make it plugin-independent (or job-specific) by removing the parsing of all vanilla-specific defaults and defining a small list of configs which is definitely common for each type of job. The first things that come to mind: - For MapReduce jobs it's already defined in [1] - Configs like the number of map and reduce tasks are common for all types of jobs - At least the user always has the ability to set any key/value(s) as params/arguments for a job [1] http://docs.openstack.org/developer/savanna/userdoc/edp.html#workflow [2] https://github.com/openstack/savanna/blob/master/savanna/service/edp/resources/mapred-job-config.xml [3] https://review.openstack.org/#/c/45419/10 Regards, Alexander Ignatov On 20 Jan 2014, at 22:04, Matthew Farrellee m...@redhat.com wrote: On 01/20/2014 12:50 PM, Andrey Lazarev wrote: Inlined.
On Mon, Jan 20, 2014 at 8:15 AM, Matthew Farrellee m...@redhat.com wrote: (inline, trying to make this readable by a text-only mail client that doesn't use tabs to indicate quoting) On 01/20/2014 02:50 AM, Andrey Lazarev wrote: -- FIX - @rest.get('/jobs/config-hints/job_type') - should move to GET /plugins/plugin_name/plugin_version, similar to get_node_processes and get_required_image_tags -- Not sure if it should be plugin specific right now. EDP uses it to show some configs to users in the dashboard. It's just a cosmetic thing. Also, when a user starts defining configs for a job, he might not have defined a cluster yet, and thus no plugin to run this job. I think we should leave it as is and leave only abstract configs like Mapper/Reducer class, and allow users to apply any key/value configs if needed. FYI, the code contains comments suggesting it should be plugin specific. https://github.com/openstack/savanna/blob/master/savanna/service/edp/workflow_creator/workflow_factory.py#L179 IMHO, the EDP should have no plugin specific dependencies. If it currently does, we should look into why and see if we can't eliminate this entirely. [AL] EDP uses plugins in two ways: 1. for HDFS user 2. for config hints I think both items should not be plugin specific on the EDP API level. But the implementation should go to the plugin and call the plugin API for the result. In fact they are both plugin specific. The user is forced to click through a plugin selection (when launching a job on a transient cluster) or the plugin selection has already occurred (when launching a job on an existing cluster).
Since the config is something that is plugin specific - you might not have hbase hints from vanilla but you would from hdp - and you already have plugin information whenever you ask for a hint, my view that this should be under the /plugins namespace is growing stronger. [AL] Disagree. They are plugin specific, but EDP itself could have additional plugin-independent logic inside. Right now config hints return EDP properties (like mapred.input.dir) as well as plugin-specific properties. Placing it under the /plugins namespace will give the impression that it is fully plugin specific. I'd like to see the EDP API fully plugin independent and in one namespace. If the core side needs some information internally, it can easily go to the plugin. I'm not sure if we're disagreeing. We may, in fact, be in violent agreement. The EDP API is fully plugin independent, and should stay that way as a
Re: [openstack-dev] [requirements][oslo] Upgrade six to 1.5.2?
On 22/01/14 11:40 +0100, Chmouel Boudjnah wrote: On Wed, Jan 22, 2014 at 11:17 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, ZhiQiang Fan wrote: six 1.5.2 has been released on 2014-01-06, it provides urllib/urlparse compatibility. Is there any plan to upgrade six to 1.5.2? (since it is fresh new, it may need some time to test) six 1.4.1 lacks urllib/urlparse support, so oslo-incubator/py3kcompat is needed, and it is used in some projects. If we upgrade six, should we remove py3kcompat at the same time, or just leave that code there? Upgrade and remove our own code, that'd be better. I think all of us [...] +1, less code we maintain is a good thing :) +1 :) Chmouel. -- @flaper87 Flavio Percoco
Re: [openstack-dev] Disabling file injection *by default*
On 01/22/2014 03:27 AM, Robert Collins wrote: On 22 January 2014 10:50, Kashyap Chamarthy kcham...@redhat.com wrote: [CC'ed libguestfs author, Rich Jones] Heya, On 01/21/2014 07:59 AM, Robert Collins wrote: I was reminded of this while I cleaned up failed file injection nbd devices on ci-overcloud.tripleo.org :/ - what needs to happen for us to change the defaults around file injection so that it's disabled? I presume you're talking about libguestfs based file injection. I remember recently debugging/testing by disabling it to isolate a different problem: inject_partition=-2 No, the default is nbd based injection, which is terrible on two counts: - it's got horrible security ramifications - it's a horrible thing to be doing. libguestfs based injection is only terrible on one count: - it's a horrible thing to be doing. That said, I'm trying to understand the rationale of your proposal in this case. Can you point me to a URL or some such? I'm just curious, as a heavy user of libguestfs. There's nothing wrong with libguestfs; this is about the feature, which has been discussed, here, a lot :) - for delivering metadata to images, config-drive || metadata service are much better. Hypervisors shouldn't be in the business of tinkering inside VM file systems at all. Thanks for the details, Robert and Rich. -- /kashyap
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 22 January 2014 12:01, Robert Collins robe...@robertcollins.net wrote: Getting the MTU *right* on all hosts seems to be key to keeping your hair attached to your head for a little longer. Hence the DHCP suggestion to set it to the right value. I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. To the best of my knowledge, no. And it wants to be a part of the static config too. derail And the static config, the last I checked, also sucks - we really want the data to be in a metadata format that cloud-init absorbs, but the last I checked there's a feature in config-drive et al that writes /etc/network/interfaces. Which is no use to anyone on Windows, or Redhat, or... /derail One thing we could do is encourage OS vendors to turn /proc/sys/net/ipv4/tcp_mtu_probing (http://www.ietf.org/rfc/rfc4821.txt) on in combination with dropping over-size frames. That should detect the actual MTU. Though it's really a bit of a workaround. Another thing would be for encapsulation failures in the switch to be reflected in the vNIC in the instance - export back media errors (e.g. babbles) so that users can diagnose problems. Ditto. Note that IPv6 doesn't *have* a DF bit, because routers are not permitted to fragment - arguably encapsulating an ipv6 frame in GRE and then fragmenting the outer layer is a violation of that. Fragmentation is fine for the tunnel, *if* the tunnel also reassembles. The issue of fragmentation is it's horrible to implement on all your endpoints, aiui, and used to lead to innumerable fragmentation attacks. As for automatically determining the size - we can determine the PMTU between all hosts in the mesh, report those back centrally and take the lowest then subtract the GRE overhead. 
If there's one path, and if there's no lower MTU on the GRE path (which can go via routers)... We can make an educated guess at the MTU, but we can't know it without testing each GRE tunnel as we set it up (and multiple routes defeat even that), so I would recommend a config option as the best of a nasty set of choices. It can still go wrong, but it's then blatantly and obviously a config fault rather than some code guessing wrong, which would be harder for an end user to work around. -- Ian.
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
On 21 January 2014 22:46, Veiga, Anthony anthony_ve...@cable.comcast.com wrote: Hi, Sean and Xuhan: I totally agree. This is not the ultimate solution, given the assumption that we had to use "enable_dhcp". We haven't decided the name of the other parameter; however, we are open to any suggestions. As we mentioned during the meeting, the second parameter should highlight the need for addressing. If so, it should have at least four values: 1) off (i.e. address is assigned by external devices out of OpenStack control) 2) slaac (i.e. address is calculated based on RA sent by OpenStack dnsmasq) 3) dhcpv6-stateful (i.e. address is obtained from OpenStack dnsmasq acting as DHCPv6 stateful server) 4) dhcpv6-stateless (i.e. address is calculated based on RA sent from either OpenStack dnsmasq or an external router, and optional information is retrieved from OpenStack dnsmasq acting as DHCPv6 stateless server) So how does this work if I have an external DHCPv6 server and an internal router? (How baroque do we have to get?) enable_dhcp, for backward compatibility reasons, should probably disable *both* RA and DHCPv6, despite the name, so we can't use that to disable the DHCP server. We could add a *third* attribute, which I hate as an idea but it does resolve the problem - one flag for each of the servers, one for the mode the servers are operating in, and enable_dhcp, which needs to DIAF but will persist till the API is revved. -- Ian.
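The four-value addressing attribute enumerated above can be sketched as a simple validator. The attribute and function names here are illustrative only; the thread has not settled on a name for the parameter.

```python
# Hedged sketch of the proposed four-value IPv6 addressing attribute.
ADDRESS_MODES = ("off", "slaac", "dhcpv6-stateful", "dhcpv6-stateless")

def validate_address_mode(mode):
    """Reject anything outside the four proposed values."""
    if mode not in ADDRESS_MODES:
        raise ValueError("unknown IPv6 address mode: %s" % mode)
    return mode

print(validate_address_mode("slaac"))  # slaac
```

Whether this lives as one attribute, or is split across the two (or three) flags debated above, is exactly the open question in the thread.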
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing the number of *instances* instead of resources? -- Jarda And one more thing - Resource is a very broad term, just as Role is. The only difference is that Heat accepted 'Resource' as a specific term for them (you see? they used a broad term for their concept). So I am asking myself, where is the difference between the generic term Resource and Role? Why can't we accept Roles? It's short and well describing... I am leaning towards Role. We can be more specific by adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of the undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Hello, Jaromir On Wed, Jan 22, 2014 at 4:09 PM, Jaromir Coufal jcou...@redhat.com wrote: I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role We use this term a lot internally for the very similar purpose, so it looks reasonable to me. Just my 2c. -- Best regards, Oleg Gelbukh * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda
Re: [openstack-dev] [Heat] Reducing pep8 ignores
On Wed, Jan 22, 2014 at 01:23:05PM +0200, Pavlo Shchelokovskyy wrote: Hi all, we have an approved blueprint that concerns reducing number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've been already warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand what is the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what do you think about enabling these checks. The style rules being currently ignored are: F841 local variable 'json_template' is assigned to but never used This was fixed and enabled in https://review.openstack.org/#/c/62827/ H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) This sounds reasonable, we made an effort to purge naked excepts a while back so hopefully it shouldn't be too difficult to enable. However there are a couple of remaining instances (in resource.py and scheduler.py in particular), so we need to evaluate if these are justifiable or need to be reworked. H302 do not import objects, only modules (this I don't like myself as it can clutter the code beyond reasonable limit) H306 imports not in alphabetical order H404 multi line docstring should start with a summary Personally I don't care much about any of these, in particular the import ones seem to me unnecessarily inconvenient so I'd prefer to leave these disabled. H404 is probably a stronger argument, as it would help improve the quality of our auto-generated docs, but again I see it as of marginal value considering the (probably large) effort involved.
I'd rather see that effort used to provide a better, more automated way to keep our API documentation updated (since that's the documentation users really need, combined with the existing template/resource documentation). Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am now puzzled with how to propose such a change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I currently can think of is to split it in 4-5-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question on how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass pep8 check? Or there will be a 'race condition' between my final patch and any other that breaks the style rules? I would really appreciate any thoughts and comments about this. If you do proceed with the work, then I think those reviewing will just have to police the queue and ensure we don't merge patches which break the style rules after you've fixed them. Steve
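For readers unfamiliar with H201, a minimal illustration of what the check flags and what passes. The helper below is hypothetical, not Heat code:

```python
# H201 flags a bare "except:". Note the caveat from the thread: the check
# only matches bare except lines, so "except BaseException:" slips through
# even though it is almost as broad.

def parse_port_bad(value):
    try:
        return int(value)
    except:             # H201 would flag this line
        return None

def parse_port_good(value):
    try:
        return int(value)
    except ValueError:  # catch only the failure we actually expect
        return None
```

The practical difference: the bare except also swallows KeyboardInterrupt and SystemExit, which is why the purge of naked excepts was worthwhile beyond mere style.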
Re: [openstack-dev] [Heat] Reducing pep8 ignores
On 01/22/2014 06:23 AM, Pavlo Shchelokovskyy wrote: Hi all, we have an approved blueprint that concerns reducing number of ignored PEP8 and openstack/hacking style checks for heat (https://blueprints.launchpad.net/heat/+spec/reduce-flake8-ignored-rules). I've been already warned that enabling some of these rules will be quite controversial, and personally I do not like some of these rules myself either. In order to understand what is the opinion of the community, I would like to ask you to leave a comment on the blueprint page about what do you think about enabling these checks. The style rules being currently ignored are: F841 local variable 'json_template' is assigned to but never used H201 no 'except:' at least use 'except Exception:' (this actually checks for bare 'except:' lines, so 'except BaseException:' will pass too) H302 do not import objects, only modules (this I don't like myself as it can clutter the code beyond reasonable limit) Realize you can do import aliases. import sqlalchemy as sa That looks to be best practice in the python community right now. H306 imports not in alphabetical order H404 multi line docstring should start with a summary Another question I have is how to proceed with such changes. I've already implemented H306 (order of imports) and am being now puzzled with how to propose such change to Gerrit. This change naturally touches many files (163 so far) and as such is clearly not suited for review in one piece. The only solution I currently can think of is to split it in 4-5-6 patches without actually changing tox.ini, and after all of them are merged, issue a final patch that updates tox.ini and any files breaking the rule that were introduced in between. But there is still a question on how Jenkins works with verify and merge jobs. Can it happen that we end up with code in master that does not pass pep8 check? Or there will be a 'race condition' between my final patch and any other that breaks the style rules? 
I would really appreciate any thoughts and comments about this. As long as it is all done in a git patch series on your side, with the patches stacked on top of each other in the correct order, it will be fine. The system that we have won't let you merge a pep8 error, you are protected from that. When I was doing similar cleanups for nova last year I just ended up with a 17 deep patch queue, which tended to merge in chunks of 4 then need rebasing, as something else landed and changed in front of me. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
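Sean's aliasing tip in practice: H302 bans importing objects, but a module import can still be aliased to keep call sites short. The `import sqlalchemy as sa` line is from the thread; a stdlib module stands in below so the snippet is self-contained:

```python
# H302: "do not import objects, only modules". The rule still permits
# aliasing the module itself, which is what keeps call sites readable:
#
#   flagged:    from collections.abc import Mapping
#   preferred:  import collections.abc as abc_mod  (reference via the module)
#
# The same pattern is the one Sean cites for sqlalchemy: import sqlalchemy as sa
import collections.abc as abc_mod

def is_mapping(obj):
    # The object is reached through the aliased module, satisfying H302.
    return isinstance(obj, abc_mod.Mapping)
```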
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 01/22/2014 05:19 AM, Julien Danjou wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having a integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. I'm starting to feel like we need to revisit that point. Because what happens now is a chunk of code gets worked off in a corner, possibly randomly changing interfaces, not running unit tests in a way that we know it's multi process safe. So there ends up being a ton of blind trust in the sync right now. Which is why the syncs are coming slower, and you'll have nova 4 - 6 months behind on many modules, missing a critical bug fix that's buried some where inside a bunch of other interface changes that are expensive. (Not theoretical, I just tripped over this in Dec). I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
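The "standard deprecation techniques" Sean mentions can be as simple as a warnings-based decorator. This is a generic sketch of the idea, not an actual oslo interface:

```python
import functools
import warnings

def deprecated(replacement):
    """Mark a callable as deprecated, pointing callers at its replacement.

    Illustrative only: a stable-but-changeable interface keeps the old
    entry point working for a cycle while emitting DeprecationWarning.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s is deprecated; use %s instead" % (func.__name__, replacement),
                DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("new_helper")
def old_helper(x):
    # old behavior preserved while consumers migrate
    return x * 2
```

With this pattern an interface change still lands, but consuming projects get a full deprecation cycle of warnings instead of a silent break at sync time.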
Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV
Sounds great! Let's do it on Thursday. --Robert On 1/22/14 12:46 AM, Irena Berezovsky ire...@mellanox.com wrote: Hi Robert, all, I would suggest not to delay the SR-IOV discussion to the next week. Let’s try to cover the SRIOV side and especially the nova-neutron interaction points and interfaces this Thursday. Once we have the interaction points well defined, we can run parallel patches to cover the full story. Thanks a lot, Irena From: Robert Li (baoli) [mailto:ba...@cisco.com] Sent: Wednesday, January 22, 2014 12:02 AM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [nova][neutron] PCI passthrough SRIOV Hi Folks, As the debate about PCI flavor versus host aggregate goes on, I'd like to move forward with the SRIOV side of things in the same time. I know that tomorrow's IRC will be focusing on the BP review, and it may well continue into Thursday. Therefore, let's start discussing SRIOV side of things on Monday. Basically, we need to work out the details on: -- regardless it's PCI flavor or host aggregate or something else, how to use it to specify a SRIOV port. -- new parameters for —nic -- new parameters for neutron net-create/neutron port-create -- interface between nova and neutron -- nova side of work -- neutron side of work We should start coding ASAP. Thanks, Robert
Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR
Thanks for your input, Carl. You're right, it seems the more appropriate place for this is _validate_subnet(). It checks ip version, gateway, etc... but not the size of the subnet. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 09:22:55 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 01/21/2014 09:27 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR The bottom line is that the method you mentioned shouldn't validate the subnet. It should assume the subnet has been validated and validate the pool. It seems to do an adequate job of that. Perhaps there is a _validate_subnet method that you should be focused on? (I'd check but I don't have convenient access to the code at the moment) Carl On Jan 21, 2014 6:16 PM, Paul Ward wpw...@us.ibm.com wrote: You beat me to it. :) I just responded about not checking the allocation pool start and end but rather, checking subnet_first_ip and subnet_last_ip, which is set as follows: subnet = netaddr.IPNetwork(subnet_cidr) subnet_first_ip = netaddr.IPAddress(subnet.first + 1) subnet_last_ip = netaddr.IPAddress(subnet.last - 1) However, I'm curious about your contention that we're ok... I'm assuming you mean that this should already be handled. I don't believe anything is really checking to be sure the allocation pool leaves room for a gateway, I think it just makes sure it fits in the subnet. A member of our test team successfully created a network with a subnet of 255.255.255.255, so it got through somehow. I will look into that more tomorrow. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 05:27:49 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 01/21/2014 05:32 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR I think there may be some confusion between the two concepts: subnet and allocation pool.
You are right that an ipv4 subnet smaller than /30 is not useable on a network. However, this method is checking the validity of an allocation pool. These pools should not include room for a gateway nor broadcast address. Their relation to subnets is that the range of ips contained in the pool must fit within the allocatable IP space on the subnet from which they are allocated. Other than that, they are simple ranges; they don't need to be cidr aligned or anything. A pool of a single IP is valid. I just checked the method's implementation now. It does check that the pool fits within the allocatable range of the subnet. I think we're good. Carl On Tue, Jan 21, 2014 at 3:35 PM, Paul Ward wpw...@us.ibm.com wrote: Currently, NeutronDbPluginV2._validate_allocation_pools() does some very basic checking to be sure the specified subnet is valid. One thing that's missing is checking for a CIDR of /32. A subnet with one IP address in it is unusable as the sole IP address will be allocated to the gateway, and thus no IPs are left over to be allocated to VMs. The fix for this is simple. In NeutronDbPluginV2._validate_allocation_pools(), we'd check for start_ip == end_ip and raise an exception if that's true. I've opened lauchpad bug report 1271311 (https://bugs.launchpad.net/neutron/+bug/1271311) for this, but wanted to start a discussion here to see if others find this enhancement to be a valuable addition. 
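The arithmetic behind the /32 complaint can be checked with the stdlib ipaddress module. This is a rough stand-in for what a _validate_subnet check would compute, not Neutron's actual code (which uses netaddr, as quoted above):

```python
import ipaddress

def usable_vm_ips(cidr):
    """Addresses left for VMs after reserving the network address, the
    broadcast address, and one gateway address.

    Illustrative only: mirrors the subnet.first + 1 .. subnet.last - 1
    range quoted in the thread, minus one IP for the gateway.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    allocatable = max(net.num_addresses - 2, 0)  # drop network + broadcast
    return max(allocatable - 1, 0)               # drop the gateway
```

A /32 yields zero usable addresses (one address total, which the gateway would consume), which is exactly why the bug report argues it should be rejected; a /30 is the smallest IPv4 subnet that leaves any room for a VM.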
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Mainn - Original Message - On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
- Original Message - On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! -- Jarda
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 22/01/14 07:32 -0500, Sean Dague wrote: On 01/22/2014 05:19 AM, Julien Danjou wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having a integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. I'm starting to feel like we need to revisit that point. Because what happens now is a chunk of code gets worked off in a corner, possibly randomly changing interfaces, not running unit tests in a way that we know it's multi process safe. This is not true. If there have been abrupt changes on the interfaces then we failed at keeping backwards compatibility. However, that doesn't mean the API is not considered when reviewing nor that it doesn't matter because the library is incubated. This kind of changes usually get filed on all projects using a specific functionality. For example, https://bugs.launchpad.net/oslo/+bug/1266962 Again, if there have been cases where an API has been changed without even notifying others - either with a good commit message, m-l thread or bug report - then there's definitely something wrong in the process and it should be fixed. Also, I'd expect these errors to be raised as soon as they're noted because they may also affect other projects as well. The above is very different than just saying oslo-incubator is not trustworthy because things get copied around and changes to the libraries are made randomly. So there ends up being a ton of blind trust in the sync right now. Which is why the syncs are coming slower, and you'll have nova 4 - 6 months behind on many modules, missing a critical bug fix that's buried some where inside a bunch of other interface changes that are expensive. (Not theoretical, I just tripped over this in Dec). I'm sorry but this is not an excuse to avoid syncing from oslo-incubator. 
Actually, if things like this can happen, the bigger the gap is the harder it'll be to sync from oslo. My suggestion has always been to do periodic syncs from oslo and keep up to date. Interface changes that *just* break other projects without a good way forward have to be raised here. I know we're talking about incubated libraries that are supposed to change but as mentioned above, we always consider backwards compatibility even on incubated libs because they're on their way to stability and breaking other projects is not fun. I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. Agreed. This is something that we've been working on during Icehouse. We should probably define more clearly the incubation path of modules that land in oslo-incubator. For example, determine where they would fit, how long they should be around based on their functionality and/or complexity, etc. We talked about having a meeting on this matter after I-2. Not sure when it'll happen but it'll be a perfect time to discuss this further. Cheers, FF -- @flaper87 Flavio Percoco
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Mainn -- Jarda
[openstack-dev] Gate Update - Wed Morning Edition
Things aren't great, but they are actually better than yesterday. Vital Stats: Gate queue length: 107 Check queue length: 107 Head of gate entered: 45hrs ago Changes merged in last 24hrs: 58 The 58 changes merged is actually a good number, not a great number, but best we've seen in a number of days. I saw at least a 6 streak merge yesterday, so zuul is starting to behave like we expect it should. = Previous Top Bugs = Our previous top 2 issues - 1270680 and 1270608 (not confusing at all) are under control. Bug 1270680 - v3 extensions api inherently racey wrt instances Russell managed the second part of the fix for this, we've not seen it come back since that was ninja merged. Bug 1270608 - n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail Turning off the test that was triggering this made it completely go away. We'll have to revisit if that's because there is a cinder bug or a tempest bug, but we'll do that once the dust has settled. = New Top Bugs = Note: all fail numbers are across all queues Bug 1253896 - Attempts to verify guests are running via SSH fails. SSH connection to guest does not work. 83 fails in 24hrs Bug 1224001 - test_network_basic_ops fails waiting for network to become available 51 fails in 24hrs Bug 1254890 - Timed out waiting for thing causes tempest-dsvm-* failures 30 fails in 24hrs We are now sorting - http://status.openstack.org/elastic-recheck/ by failures in the last 24hrs, so we can use it more as a hit list. The top 3 issues are fingerprinted against infra, but are mostly related to normal restart operations at this point. = Starvation Update = with 214 jobs across queues, and averaging 7 devstack nodes per job, our working set is 1498 nodes (i.e. if we had that number we'd be able to be running all the jobs right now in parallel). Our current quota of nodes gives us ~ 480. Which is 1/3 our working set, and part of the reasons for delays.
Rackspace has generously increased our quota in 2 of their availability zones, and Monty is going to prioritize getting those online. Because of Jenkins scaling issues (it starts generating failures when talking to too many build slaves), that means spinning up more Jenkins masters. We've found a 1 / 100 ratio makes Jenkins basically stable, pushing beyond that means new fails. Jenkins is not inherently elastic, so this is a somewhat manual process. Monty is diving on that. There is also a TCP slow start algorithm for zuul that Clark was working on yesterday, which we'll put into production as soon as it is good. This will prevent us from speculating all the way down the gate queue, just to throw it all away on a reset. It acts just like TCP, on every success we grow our speculation length, on every fail we reduce it, with a sane minimum so we don't over throttle ourselves. Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it, is help us fix the bugs resetting the gate. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
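The TCP-style window Sean describes amounts to growing the speculation depth on success and cutting it back on failure, with a floor. An illustrative sketch under those assumptions, not Zuul's actual implementation:

```python
class SpeculationWindow:
    """TCP-slow-start-style limit on how far down the gate queue to
    speculate. Hypothetical: names and policy (additive increase,
    multiplicative decrease) are inferred from the description above.
    """

    def __init__(self, floor=3, ceiling=100):
        self.floor = floor      # sane minimum so we never over-throttle
        self.ceiling = ceiling  # never speculate past the whole queue
        self.size = floor

    def on_success(self):
        # every merged change grows the speculation length by one
        self.size = min(self.size + 1, self.ceiling)

    def on_failure(self):
        # a gate reset halves the window, never dropping below the floor
        self.size = max(self.size // 2, self.floor)
```

The payoff is the same as TCP congestion control: a reset no longer throws away work speculated across the entire 100+ change queue, only across the current window.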
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On 22/01/14 14:31, Tzu-Mainn Chen wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Mainn Yup, I think the concern was that it could be confused with User Roles. However, Resource Role is probably clear enough IMO.
Re: [openstack-dev] Next steps for Whole Host allocation / Pclouds
From: Sylvain Bauza sylvain.ba...@bull.net On 22/01/2014 02:50, Jay Pipes wrote: Yup, agreed. It's difficult to guess what the capacity implications would be without having solid numbers on customer demands for this functionality, including hard data on how long such instances would typically live (see my previous point about re-using compute hosts for other purposes once the last dedicated instance is terminated on that host). Best, -jay My personal opinion (but I can be wrong) is that such feature would only be accepted by operators only if there is some termination period defined when you create a dedicated instance. Again, what happens when the lease (or the lock-in) ends should be defined by the operator, on his own convenience, and that's why Climate is behaviour-driven by configuration flags for lease termination. Is that enough? Remember that some of us are concerned with business workloads, rather than HPC jobs. While it might be acceptable in a business workload to plan on regularly recycling every individual instance, it is definitely not acceptable to plan on a specific end to a given workload. And if the workload lives on, then usage of particular hosts can live on (at least for a pretty large amount of time like the product of (lifetime of a VM) * (number of VMs on the host) ). Thanks, Mike
Re: [openstack-dev] Gate Update - Wed Morning Edition
On 01/22/2014 09:38 AM, Sean Dague wrote: Things aren't great, but they are actually better than yesterday. Vital Stats: Gate queue length: 107 Check queue length: 107 Head of gate entered: 45hrs ago Changes merged in last 24hrs: 58 The 58 changes merged is actually a good number, not a great number, but best we've seen in a number of days. I saw at least a 6 streak merge yesterday, so zuul is starting to behave like we expect it should. = Previous Top Bugs = Our previous top 2 issues - 1270680 and 1270608 (not confusing at all) are under control. Bug 1270680 - v3 extensions api inherently racey wrt instances Russell managed the second part of the fix for this, we've not seen it come back since that was ninja merged. Bug 1270608 - n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail Turning off the test that was triggering this made it completely go away. We'll have to revisit if that's because there is a cinder bug or a tempest bug, but we'll do that once the dust has settled. = New Top Bugs = Note: all fail numbers are across all queues Bug 1253896 - Attempts to verify guests are running via SSH fails. SSH connection to guest does not work. 83 fails in 24hrs Bug 1224001 - test_network_basic_ops fails waiting for network to become available 51 fails in 24hrs Bug 1254890 - Timed out waiting for thing causes tempest-dsvm-* failures 30 fails in 24hrs We are now sorting - http://status.openstack.org/elastic-recheck/ by failures in the last 24hrs, so we can use it more as a hit list. The top 3 issues are fingerprinted against infra, but are mostly related to normal restart operations at this point. = Starvation Update = with 214 jobs across queues, and averaging 7 devstack nodes per job, our working set is 1498 nodes (i.e. if we had than number we'd be able to be running all the jobs right now in parallel). Our current quota of nodes gives us ~ 480. Which is 1/3 our working set, and part of the reasons for delays. 
Rackspace has generously increased our quota in 2 of their availability zones, and Monty is going to prioritize getting those online. Because of Jenkins scaling issues (it starts generating failures when talking to too many build slaves), that means spinning up more Jenkins masters. We've found a 1 / 100 ratio makes Jenkins basically stable, pushing beyond that means new fails. Jenkins is not inherently elastic, so this is a somewhat manual process. Monty is diving on that. There is also a TCP slow start algorithm for zuul that Clark was working on yesterday, which we'll put into production as soon as it is good. This will prevent us from speculating all the way down the gate queue, just to throw it all away on a reset. It acts just like TCP, on every success we grow our speculation length, on every fail we reduce it, with a sane minimum so we don't over throttle ourselves. Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it, is help us fix the bugs resetting the gate. One last thing, Anita has also gotten on top of pruning out all the neutron changes from the gate. Something is very wrong in the neutron isolated jobs right now, so their chance of passing is close enough to 0, that we need to keep them out of the gate. This is a new regression in the last couple of days. This is a contributing factor in the gates moving again. She and Mark are rallying the Neutron folks to sort this one out. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net
Re: [openstack-dev] Gate Update - Wed Morning Edition
It's worth noting that elastic recheck is signalling bug 1253896 and bug 1224001, but they actually have the same signature. I also found it interesting that neutron is triggering bug 1254890 a lot, which appears to be a hang on /dev/nbdX during key injection; so far I have no explanation for that. As suggested on IRC, the neutron isolated job had a failure rate of about 5-7% last week (until Thursday, I think). It might therefore be worth also looking at tempest/devstack patches which might be triggering failures or uncovering issues in neutron. I shared a few findings on the mailing list yesterday ([1]). I hope people actively looking at failures will find them helpful. Salvatore [1] http://lists.openstack.org/pipermail/openstack-dev/2014-January/025013.html
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 4:02 AM, Jaromir Coufal jcou...@redhat.com wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? Yeah, great question Jarda. When I test out the “Stacks” functionality in Horizon the user doesn’t create a Stack that spins up resources, it spins up instances. Maybe there is a difference around the terms being used behind the scenes and in Horizon? Liz -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 9:52 AM, Dougal Matthews dou...@redhat.com wrote: On 22/01/14 14:31, Tzu-Mainn Chen wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... True, but Heat was creating something new, while it seems like (to me), our intention is mostly to consume other Openstack APIs and expose the results in the UI. If I call a Heat API which returns something that they call a Resource, I think it's confusing to developers to rename that. I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. I'd be okay with Resource Role! Actually - didn't someone raise the objection that Role was a defined term within Keystone and potentially a source of confusion? Yeah, that was me :) Mainn Yup, I think the concern was that it could be confused with User Roles. However, Resource Role is probably clear enough IMO. Exactly. If we add something to make “Role” more specific to the user it would be much more clear. 
Liz
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 7:09 AM, Jaromir Coufal jcou...@redhat.com wrote: On 2014/22/01 10:00, Jaromir Coufal wrote: On 2014/22/01 00:56, Tzu-Mainn Chen wrote: Hiya - Resource is actually a Heat term that corresponds to what we're deploying within the Overcloud Stack - i.e., if we specify that we want an Overcloud with 1 Controller and 3 Compute, Heat will create a Stack that contains 1 Controller and 3 Compute Resources. Then a quick question - why do we design deployment by increasing/decreasing number of *instances* instead of resources? -- Jarda And one more thing - Resource is very broad term as well as Role is. The only difference is that Heat accepted 'Resource' as specific term for them (you see? they used broad term for their concept). So I am asking myself, where is difference between generic term Resource and Role? Why cannot we accept Roles? It's short, well describing... I am leaning towards Role. We can be more specific with adding some extra word, e.g.: * Node Role +1 to Node Role. I agree that “role” is being used as a generic term here. I’m still convinced it’s important to use “Node” in the name since this is the item we are describing by assigning it a certain type of role. Liz * Deployment Role ... and if we are in the context of undercloud, people can shorten it to just Roles. But 'Resource Category' seems to me that it doesn't solve anything. -- Jarda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
+1 to Node Role. I agree that “role” is being used as a generic term here. I’m still convinced it’s important to use “Node” in the name since this is the item we are describing by assigning it a certain type of role. I'm *strongly* against Node Role. In Ironic, a Node has no explicit Role assigned to it; whatever Role it has is implicit through the Instance running on it (which maps to a Heat Resource). In that sense, we're not really monitoring Nodes; we're monitoring Resources, and a Node just happens to be one attribute of a Resource. Mainn
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Yeah, great question Jarda. When I test out the “Stacks” functionality in Horizon the user doesn’t create a Stack that spins up resources, it spins up instances. Maybe there is a difference around the terms being used behind the scenes and in Horizon? Maybe we're looking at different parts of the UI, but when I look at a Stack detail page in Horizon, I see a tab for Resources, and not Instances. The resource table might link to an Instance, but that information is retrieved from the Resource. Mainn
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
Oh dear user... :) I'll step a little bit back. We need to agree whether we want to name concepts one way in the background and another way in the UI for the user (did we already agree on this point?). We all know the pros and cons. And I will still fight for users to get global infrastructure terminology (e.g. defining Node Profiles instead of Flavors), because I received a lot of negative feedback on mixing overcloud terms into the undercloud, confusion about the overcloud/undercloud terms themselves, etc. If it would be easier for developers to name the concepts differently in the background then that's fine - we just need to talk about 2 terms per concept then. And I would be a bit afraid of schizophrenia... On 2014/22/01 15:10, Tzu-Mainn Chen wrote: That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Honestly it would get super confusing for me if somebody tells me - you have 5 compute resources. (And I am talking from the user's world, not the developer's one.) But a resource itself can be anything. -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] [UX] Infrastructure Management UI - Icehouse scoped wireframes
Hey everybody, I am sending updated wireframes. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-22_tripleo-ui-icehouse.pdf Updates: * p15-18 for down-scaling deployment Any questions are welcome, I am happy to answer them. -- Jarda On 2014/16/01 01:50, Jaromir Coufal wrote: Hi folks, thanks everybody for the feedback. Based on that I updated the wireframes and tried to provide a minimum scope for the Icehouse timeframe. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-16_tripleo-ui-icehouse.pdf Hopefully we are able to deliver the described set of features. But if you find something missing which is critical for the first release (or that we are implementing a feature which should not have such high priority), please speak up now. The wireframes are very close to implementation. In time, more views will appear and we will see if we can get them in as well. Thanks all for participating -- Jarda
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
On Jan 22, 2014, at 10:53 AM, Jaromir Coufal jcou...@redhat.com wrote: Oh dear user... :) ... And I would be a bit afraid of schizophrenia… Haha, sorry if this is my fault for reviving this whole thread :) Terminology is always tough. It probably makes sense to start with where we initially agreed and get some Operators' eyes on the design so that they can weigh in. Liz
Re: [openstack-dev] [TripleO] [Tuskar] [UX] Infrastructure Management UI - Icehouse scoped wireframes
On Jan 20, 2014, at 3:02 AM, Jaromir Coufal jcou...@redhat.com wrote: Hello everybody, based on feedback which I received last week, I am sending updated wireframes. They are still not completely final, more use-cases and smaller updates will occur, but I believe that we are going forward pretty well. http://people.redhat.com/~jcoufal/openstack/tripleo/2014-01-20_tripleo-ui-icehouse.pdf What has changed? * 'Architecture' dropdown was added for all node descriptions * New views for Deployed and Free nodes * Removed Configuration part from Deployment Overview page (will be happening under Configuration tab (under construction)) * Added progressing page of overcloud being deployed + Deployment Log * Added Overcloud Horizon UI link to Deployment Overview page * Added view for down-scaling (need more work) * Added Implementation guide for developers New versions of wireframes, supporting other use-cases will occur in time, but I hope that without huge changes. Hi Jarda, Nice job keeping up with all of the changes on these. They definitely look to me like they are getting to a state of reality for this release. Just a few nitpicks: 1) Looking at page 6, the user can see that 1 node is down. I think it’s important that they can click on the link and be taken to the table to view details about this node. Would the table be filtered on the nodes that are down (just this one in this example)? I wonder if we should be consistent and add in a 33% of nodes are down? 2) How does the user get back to the list of registered nodes after they’ve clicked on the “Deployment Overview” section of navigation? It seems like they are floating a little bit in the navigation in pages 2-8. Would it make sense to have this be some sort of subsection of the Deployment Overview? 3) Would the “See and change defaults” link switch the user over to the configuration tab? I’m not sure this section even needs to be here if the user doesn’t see anything here in line. 
Best, Liz Cheers -- Jarda
[openstack-dev] [Trove] how to list available configuration parameters for datastores
Hey everyone, I have run into an issue with the configuration parameter URI. I'd like some input on what the URI might look like for getting the list of configuration parameters for a specific datastore. Problem: Configuration parameters need to be selected per datastore. Currently: It's set up to use the default (mysql) datastore and this won't work for other datastores like redis/cassandra/etc. /configurations/parameters - parameter list for mysql /configurations/parameters/parameter_name - details of parameter We need to be able to request the parameter list per datastore. Here are some suggestions that outline how each method may work. ONE: /configurations/parameters?datastore=mysql - list parameters for mysql /configurations/parameters?datastore=redis - list parameters for redis - we do not use query parameters for anything other than pagination (limit and marker) - this requires some finagling with the context to add the datastore. https://gist.github.com/cp16net/8547197 TWO: /configurations/parameters - list of datastores that have configuration parameters /configurations/parameters/datastore - list of parameters for datastore THREE: /datastores/datastore/configuration/parameters - list the parameters for the datastore FOUR: /datastores/datastore - add an href on the return to the configuration parameter list for the datastore /configurations/parameters/datastore - list of parameters for datastore FIVE: * Require a configuration be created with a datastore. Then a user may list the configuration parameters allowed on that configuration. /configurations/config_id/parameters - parameter list for mysql - after some thought I think this method (5) might be the best way to handle this. I've outlined a few ways we could make this work. Let me know if you agree or why you may disagree with strategy 5. Thanks, Craig Vyvial
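For illustration, the datastore-scoped URIs from options THREE and FOUR above could be built like this. This is a hypothetical sketch with invented helper names, not Trove's actual routing code:

```python
# Hypothetical sketch of the datastore-scoped parameter URIs from
# options THREE and FOUR in the email above -- not Trove's real router.

def parameters_uri(datastore, style="three"):
    """Build the configuration-parameters URI for a datastore."""
    if style == "three":
        # Option THREE: parameters nested under the datastore resource.
        return "/datastores/%s/configuration/parameters" % datastore
    # Option FOUR: parameters keyed by datastore under /configurations.
    return "/configurations/parameters/%s" % datastore

def datastore_with_links(name):
    """Option FOUR also adds an href on the datastore resource itself,
    so a client can discover the parameter list without guessing URIs."""
    return {
        "name": name,
        "links": [{"rel": "configuration-parameters",
                   "href": parameters_uri(name, style="four")}],
    }

print(parameters_uri("mysql"))
# /datastores/mysql/configuration/parameters
print(datastore_with_links("redis")["links"][0]["href"])
# /configurations/parameters/redis
```

A discoverability argument for option FOUR: the client never needs to construct the parameters URI itself, it just follows the link returned from /datastores/datastore.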
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
I don't know if it's reasonable to expect a deployment of OpenStack that has an *external* DHCP server. It's certainly hard to imagine how you'd get the Neutron API and an external DHCP server to agree on an IP assignment, since OpenStack expects to be the source of truth. -- Sean M. Collins ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Questions regarding image location and glanceclient behaviour ...
On Wed, Jan 22, 2014 at 1:05 AM, Public Mail kpublicm...@gmail.com wrote: Hi All, I have two questions ... 1) Glance v1 APIs can take a --location argument when creating an image but v2 APIs can't - bug or feature? (Details below) I'd call that a missing feature. I think we probably need a glance image-location-add command somewhere in the client. But fair warning, this is typically a role-restricted operation. 2) How should glanceclient (v2 commands) handle reserved attributes? a) status quo: (Apparently) let the user set them but the server will return an attribute is reserved error. Pros: No missing functionality, no damage done. Cons: Bad usability. b) hard-code the list of reserved attributes in the client and don't expose them to the user. Pros: quick to implement. Cons: Need to track reserved attributes in the server implementation. c) get reserved words from the schema downloaded from the server (and don't expose them to the user). Pros: Don't need to track the server implementation. Cons: Complex - reserved words can vary from command to command. I personally favor (b) on the grounds that a client implementation needs to closely understand server behaviour anyway, so the syncing of reserved attributes shouldn't be a big problem (*provided* the list of reserved attributes is made available in the reference documentation, which doesn't seem to be the case currently). We are in a bit of a bind with schemas--what's needed is schema resources to represent each request and response, not just each resource. Because, obviously, the things you can PATCH and POST are necessarily different than the things you can GET in any service api. However, it is not clear to me how we get from one schema per resource to one schema per request and response in a backwards compatible way. So b) might be the only way to go. So what does everybody think? details When using glance client's v1 interface I can image-create an image and specify the image file's location via the --location parameter.
Alternatively I can image-create an empty image and then image-update the image's location to some url. However, when using the client's v2 commands I can neither image-create the file using the --location parameter, nor image-update the file later. When using image-create with --location, the client gives the following error (printed by warlock): Unable to set 'locations' to '[u'http://192.168.1.111/foo/bar']' This is because the schema dictates that the location should be an object of the form [{url: string, metadata: object}, ...] but there is no way to specify such an object from the command line - I cannot specify a string like '{url: 192.168.1.111/foo/bar, metadata: {}}' for there is no conversion from command line strings to python dicts, nor is there any conversion from a simple URL string to a suitable location object. If I modify glanceclient.v2.images.Controller.create to convert the locations parameter from a URL string to the desired object then the request goes through to the glance server, where it fails with a 403 error (Attribute 'locations' is reserved). So is this discrepancy between v1 and v2 deliberate (a feature :)) or is it a bug? /details
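The shape mismatch described above - a bare URL string on the command line versus the [{url: string, metadata: object}] shape the v2 schema expects - could be bridged client-side with a small conversion helper. This is a sketch of the kind of conversion the poster describes patching into glanceclient.v2.images.Controller.create; the function name is invented and it is not actual glanceclient code (and, as noted, the server would still reject 'locations' as reserved):

```python
def to_location_objects(locations):
    """Convert bare URL strings into the location objects the v2 image
    schema expects: [{"url": <string>, "metadata": <object>}, ...].
    Bare strings are wrapped with empty metadata; dicts that already
    carry a "url" key are normalized and passed through.
    """
    converted = []
    for loc in locations:
        if isinstance(loc, str):
            converted.append({"url": loc, "metadata": {}})
        elif isinstance(loc, dict) and "url" in loc:
            converted.append({"url": loc["url"],
                              "metadata": loc.get("metadata", {})})
        else:
            raise ValueError("unsupported location: %r" % (loc,))
    return converted

print(to_location_objects(["http://192.168.1.111/foo/bar"]))
# [{'url': 'http://192.168.1.111/foo/bar', 'metadata': {}}]
```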
Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks
On 01/22/2014 03:01 AM, Robert Collins wrote: I certainly think having the MTU set to the right value is important. I wonder if there's a standard way we can signal the MTU (e.g. in the virtio interface) other than DHCP. Not because DHCP is bad, but because that would work with statically injected network configs as well. Can LLDP be used here somehow? It might require stretching things a bit - not all LLDP agents seem to include the information, and it might require some sort of cascade. It would also require the VM to pay attention to the frames as they arrive, but in broad, hand-waving, blue-sky theory it could communicate maximum frame size information within the broadcast domain. rick ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
I like #4 over #5 because it seems weird to have to create a configuration first to see what parameters are allowed. With #4 you could look up what is allowed first then create your configuration. Robert
[openstack-dev] [TripleO] our update story: can people live with it?
I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes - what is commonly called the update story, I think. As I understand it, we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver, and I need to deploy it to *all* my compute instances. Do we really expect people to have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/, or perhaps this is also an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that a compute host reboot can be avoided for each new deploy. Dan
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 2014-01-22 06:32, Sean Dague wrote: I think we need to graduate things to stable interfaces a lot faster. Realizing that "stable" just means you have to deprecate to change it. So the interface is still changeable, it just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean Big +1 to this. Eliminating the sync process is going to be the cleanest solution for the code that is stable enough to be usable with things like automatic syncs. The less code that is left in the incubator, the easier the syncs will be. That said, I think there are only a few people (Doug, Mark, and Thierry?) who have done the promote-to-library thing, and I will admit I don't have a good handle on what is involved. It may be that we need better documentation of that process so more people can help out with it. I know Michael Still mentioned he was planning to graduate lockutils but didn't know exactly how. -Ben
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On 22/01/14 10:59 -0600, Ben Nemec wrote: On 2014-01-22 06:32, Sean Dague wrote: I think we need to graduate things to stable interfaces a lot faster. Realizing that stable just means have to deprecate to change it. So the interface is still changeable, just requires standard deprecation techniques. Which we are trying to get more python libraries to do anyway, so it would be good if we built up a bunch of best practices here. -Sean Big +1 to this. Eliminating the sync process is going to be the cleanest solution for the code that is stable enough to be usable with things like automatic syncs. The less code that is left in incubator, the easier the syncs will be. That said, I think there's only a few people (Doug, Mark, and Thierry?) who have done the promote to library thing, and I will admit I don't have a good handle on what is involved. It may be that we need better documentation of that process so more people can help out with it. I know Michael Still mentioned he was planning to graduate lockutils but didn't know exactly how. We're in the process of grouping independent modules into modules that actually make sense to avoid having 1 python package per module on pypi. Some of the graduation status is being tracked here[0] and here's[1] a graph of the current dependencies. As mentioned in my last email, I fully agree with this and we should definitely establish what the process is. oslo.config was the first package that graduated from the incubator. Other packages will come out of there during Icehouse. Cheers, FF [0] https://wiki.openstack.org/wiki/Oslo/GraduationStatus [1] https://wiki.openstack.org/wiki/Oslo/Dependencies -- @flaper87 Flavio Percoco
[openstack-dev] FW: [horizon] hypervisor summary page shows incorrect stats on overcommitting cpu/disk/ram in openstack
Hi, Although cpu/disk/ram can be overcommitted in OpenStack, the hypervisor summary page in the Horizon UI displays the actual stats of the compute node instead of the overcommitted values calculated by OpenStack. This gives incorrect data to the user while provisioning instances, as the used value of cpu/disk/ram is shown as more than the total value once usage passes the actual physical limits of the compute node. Is this a defect, or is it intentional to show the actual stats instead of the overcommitted stats of the compute node on the hypervisor summary page in the Horizon UI? Raju
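For context, the overcommitted values the scheduler works with are just the physical stats scaled by Nova's allocation ratios; a minimal sketch (16.0 and 1.5 are Nova's default cpu and ram allocation ratios):

```python
# Overcommit means the scheduler multiplies physical capacity by an
# allocation ratio (Nova defaults: cpu 16.0, ram 1.5, disk 1.0).
def effective_capacity(physical, allocation_ratio):
    """Capacity the scheduler sees after overcommit is applied."""
    return physical * allocation_ratio

# A node with 8 physical cores and 32768 MB of RAM:
schedulable_vcpus = effective_capacity(8, 16.0)      # 128.0
schedulable_ram_mb = effective_capacity(32768, 1.5)  # 49152.0
```

The complaint above is that Horizon reports the physical values (8 cores, 32768 MB) as "total" while "used" is tracked against the scheduler's larger effective capacity, so used can exceed total.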
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw  # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro  # ditto
fi

No reboot required. 
This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. I would call your e-mail a documentation/roadmap bug. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;). ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
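The "comparing kernels" heuristic Clint mentions could be as small as a byte-for-byte comparison of the running image's kernel against the candidate image's; a rough sketch (the vmlinuz paths in the comment are illustrative assumptions):

```python
import filecmp

def needs_reboot(running_kernel, candidate_kernel):
    """A new image forces a reboot only when it ships a different
    kernel; otherwise rsyncing the root filesystem in place (as in
    the script above) is enough."""
    return not filecmp.cmp(running_kernel, candidate_kernel, shallow=False)

# e.g. needs_reboot("/boot/vmlinuz", "/tmp/new_image/boot/vmlinuz")
```

A real tool might compare kernel version strings instead of file contents, but the decision point is the same: identical kernel, no reboot.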
Re: [openstack-dev] [TripleO] our update story: can people live with it?
- Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. 
Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this: if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then; download_new_image mount_image /tmp/new_image mount / -o remount,rw # Assuming we've achieved ro root rsync --one-file-system -a /tmp/new_image/ / mount / -o remount,ro # ditto fi No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. 
(entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) I would call your e-mail a documentation/roadmap bug. Fair enough. Thanks for the info. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;).
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
My thoughts so far:

/datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated to exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)

/configurations/config_id/parameters (Option Five)
+ smaller, canonical route to a configuration set
- the datastore/config relationship is much more ambiguous

I'm planning on working on a blueprint for this feature soon, so I'd like any feedback anyone has. - kpom From: Craig Vyvial [cp16...@gmail.com] Sent: Wednesday, January 22, 2014 10:10 AM To: OpenStack Development Mailing List Subject: [openstack-dev] [Trove] how to list available configuration parameters for datastores Hey everyone, I have run into an issue with the configuration parameter URI. I'd like some input on what the URI might look like for getting the list of configuration parameters for a specific datastore. Problem: Configuration parameters need to be selected per datastore. Currently it's set up to use the default (mysql) datastore, and this won't work for other datastores like redis/cassandra/etc.

/configurations/parameters - parameter list for mysql
/configurations/parameters/parameter_name - details of parameter

We need to be able to request the parameter list per datastore. Here are some suggestions that outline how each method may work.

ONE:
/configurations/parameters?datastore=mysql - list parameters for mysql
/configurations/parameters?datastore=redis - list parameters for redis
- we do not use query parameters for anything other than pagination (limit and marker)
- this requires some finagling with the context to add the datastore. 
https://gist.github.com/cp16net/8547197

TWO:
/configurations/parameters - list of datastores that have configuration parameters
/configurations/parameters/datastore - list of parameters for datastore

THREE:
/datastores/datastore/configuration/parameters - list the parameters for the datastore

FOUR:
/datastores/datastore - add an href on the return to the configuration parameter list for the datastore
/configurations/parameters/datastore - list of parameters for datastore

FIVE:
* Require a configuration be created with a datastore. Then a user may list the configuration parameters allowed on that configuration.
/configurations/config_id/parameters - parameter list for mysql
- after some thought I think this method (5) might be the best way to handle this.

I've outlined a few ways we could make this work. Let me know if you agree or why you may disagree with strategy 5. Thanks, Craig Vyvial
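To make option THREE concrete, here is a hypothetical sketch of what the datastore-scoped parameter listing could return; the field names are invented for illustration, not Trove's actual schema:

```python
# Hypothetical payload for option THREE:
#   GET /datastores/mysql/configuration/parameters
example_response = {
    "configuration-parameters": [
        {"name": "max_connections", "type": "integer",
         "min": 1, "max": 100000, "restart_required": False},
        {"name": "autocommit", "type": "boolean",
         "restart_required": False},
    ]
}

def parameter_names(response):
    """Names of every tunable parameter the datastore exposes."""
    return [p["name"] for p in response["configuration-parameters"]]
```

The same payload would work for redis or cassandra by swapping the datastore segment of the URI, which is the point of scoping parameters under /datastores.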
[openstack-dev] Icehouse-2 milestone candidates available
Hi everyone, Milestone-proposed branches were created for Keystone, Glance, Nova, Neutron, Cinder, Ceilometer, Heat and Trove in preparation for the icehouse-2 milestone publication tomorrow. Horizon should be there in a few hours. You can find candidate tarballs at:

http://tarballs.openstack.org/keystone/keystone-milestone-proposed.tar.gz
http://tarballs.openstack.org/glance/glance-milestone-proposed.tar.gz
http://tarballs.openstack.org/nova/nova-milestone-proposed.tar.gz
http://tarballs.openstack.org/neutron/neutron-milestone-proposed.tar.gz
http://tarballs.openstack.org/cinder/cinder-milestone-proposed.tar.gz
http://tarballs.openstack.org/ceilometer/ceilometer-milestone-proposed.tar.gz
http://tarballs.openstack.org/heat/heat-milestone-proposed.tar.gz
http://tarballs.openstack.org/trove/trove-milestone-proposed.tar.gz

You can also access the milestone-proposed branches directly at:

https://github.com/openstack/keystone/tree/milestone-proposed
https://github.com/openstack/glance/tree/milestone-proposed
https://github.com/openstack/nova/tree/milestone-proposed
https://github.com/openstack/neutron/tree/milestone-proposed
https://github.com/openstack/cinder/tree/milestone-proposed
https://github.com/openstack/ceilometer/tree/milestone-proposed
https://github.com/openstack/heat/tree/milestone-proposed
https://github.com/openstack/trove/tree/milestone-proposed

Regards, -- Thierry Carrez (ttx)
[openstack-dev] [Swift] 1.12.0 release candidate
Hi everyone, A milestone-proposed branch was created for Swift in preparation for the 1.12.0 release. Please test the proposed delivery to ensure no critical regression found its way in. Release-critical fixes might be backported to the milestone-proposed branch until final release, and will be tracked using the 1.12.0 milestone targeting: https://launchpad.net/swift/+milestone/1.12.0 You can find the candidate tarball at: http://tarballs.openstack.org/swift/swift-milestone-proposed.tar.gz You can also access the milestone-proposed branch directly at: https://github.com/openstack/swift/tree/milestone-proposed Regards, -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin From: Dan Prince [dpri...@redhat.com] Sent: Wednesday, January 22, 2014 10:15 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. 
When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this: if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then; download_new_image mount_image /tmp/new_image mount / -o remount,rw # Assuming we've achieved ro root rsync --one-file-system -a /tmp/new_image/ / mount / -o remount,ro # ditto fi No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. 
Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) I would call your e-mail a documentation/roadmap bug. Fair enough. Thanks for the info. This plan may have been recorded somewhere, but for me it has just always been in my head as the end goal (thanks to Robert Collins for drilling the hole and pouring it in there btw ;).
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On 01/22/2014 12:17 PM, Dan Prince wrote: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? FWIW, I agree that this is going to be considered unacceptable by most people. Hopefully everyone is on the same page with that. It sounds like that's the case so far in this thread, at least... If you have to reboot the compute node, ideally you also have support for live migrating all running VMs on that compute node elsewhere before doing so. That's not something you want to have to do for *every* little change to *every* compute node. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Gate Update - Wed Morning Edition
On 01/22/2014 09:38 AM, Sean Dague wrote: Thanks to everyone that's been pitching in digging on reset bugs. More help is needed. Many core reviewers are at this point completely ignoring normal reviews until the gate is back, so if you are waiting for a review on some code, the best way to get it is to help us fix the bugs resetting the gate. I've got a couple more gate bug fixes up for review this morning: 1) https://review.openstack.org/#/c/68443/ 13 fails in 24hrs / 22 fails in 7 days Needs nova-core review. 2) https://review.openstack.org/#/c/68275/ 8 fails in 24hrs / 17 fails in 7 days This one currently needs review from oslo-core to get into oslo-incubator. Then I'll sync it into nova (that's where I saw the bug in the gate, at least). -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Dan Prince's message of 2014-01-22 10:15:20 -0800: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. 
I could see a case for explicitly not wanting to reboot here as well. I prefer choosing what to update at image build time. This is the time where it is most clear how, from a developer and deployer standpoint, to influence the update. I can diff images, I can freeze mirrors... etc., all decoupled from anybody else and from production or test cycles. That said, I do think it would be good for deployers to be able to have a way to control when reboots are and aren't allowed. That seems like policy, which may be best handled in Nova.. so we can have a user that can do updates to Heat Metadata/stacks, but not rebuilds in Nova. I have no idea if Heat's trust model will allow us to have such separation, though. Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ $(cat /etc/image_id) != $(os-apply-config --key image_id) ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw  # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro  # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. I think in most cases transfer cost is worth it to know you're deploying what you tested. Also it is pretty easy to just do this optimization but still be rsyncing the contents of the image. Instead of downloading the whole thing we could have a box expose the mounted image via rsync and then all of the machines can just rsync changes. 
Also rsync has a batch mode where if you know for sure the end-state of machines you can pre-calculate that rsync and just ship that. Lots of optimization possible that will work fine in your just-update-one-file scenario. But really, how much does downtime cost? How much do 10Gb NICs and switches cost? I understand the whole read only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above,
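The batch-mode optimization Clint describes, pre-calculating the delta to a known end state once and shipping only that to every node, can be sketched in a few lines; this is a conceptual stand-in for `rsync --write-batch`, not production transfer code:

```python
import hashlib
import os

def tree_digests(root):
    """Map each file under root (by relative path) to a content digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = (
                    hashlib.sha256(f.read()).hexdigest())
    return digests

def build_batch(old_root, new_root):
    """Pre-calculate, once, what a node at old_root needs in order to
    reach new_root: only these paths get shipped, not the whole image."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    changed = sorted(p for p, d in new.items() if old.get(p) != d)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

In the single-file-in-Nova scenario this batch contains exactly one path, which is why the approach stays efficient even though the deploy unit is a full image.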
Re: [openstack-dev] [TripleO] [Tuskar] Terminology Revival #1 - Roles
- Original Message - Oh dear user... :) I'll step a little bit back. We need to agree whether we want to name concepts one way in the background and another way in the UI for the user (did we already agree on this point?). We all know the pros and cons. And I will still fight for users to get global infrastructure terminology (e.g. defining Node Profiles instead of Flavors). Because I received Jarda, side point - could you explain again what the attributes of a node profile are? Beyond the Flavor, does it also define an image...? Mainn a lot of negative feedback on mixing overcloud terms into undercloud, confusion about the overcloud/undercloud terms themselves, etc. If it would be easier for developers to name the concepts in the background differently then that's fine - we just need to talk about 2 terms per concept then. And I would be a bit afraid of schizophrenia... On 2014/22/01 15:10, Tzu-Mainn Chen wrote: That's a fair question; I'd argue that it *should* be resources. When we update an overcloud deployment, it'll create additional resources. Honestly, it would get super confusing for me if somebody told me "you have 5 compute resources". (And I am talking from the user's world, not the developer's one.) But a resource itself can be anything. -- Jarda
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing specifically with regards to compute nodes. What is commonly called the update story I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Lets say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing. And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. 
I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

    if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ]; then
        download_new_image
        mount_image /tmp/new_image
        mount / -o remount,rw   # Assuming we've achieved ro root
        rsync --one-file-system -a /tmp/new_image/ /
        mount / -o remount,ro   # ditto
    fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/, or perhaps this is also an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that a compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so "just ensure the root filesystems match" seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update.
(entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) ++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
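The "heuristic should be as simple as comparing kernels" idea above can be sketched in a few lines. This is a hypothetical illustration, not TripleO code: the paths under boot/ and the decision rule are assumptions for the sake of the example.

```python
# Hypothetical sketch: decide whether a candidate image requires a reboot by
# comparing the kernels shipped in the current and new image trees. If the
# vmlinuz set differs, a reboot is needed; otherwise an rsync of the root
# filesystem (as in the script above) is enough.
import glob
import os


def kernel_set(image_root):
    """Return the sorted kernel file names found under boot/ of an image tree."""
    pattern = os.path.join(image_root, "boot", "vmlinuz*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def needs_reboot(current_root, new_root):
    """A changed kernel implies a reboot; identical kernels allow a live sync."""
    return kernel_set(current_root) != kernel_set(new_root)
```

An operator-facing tool could layer an explicit override on top of this, per the "might prefer it not to be auto-magic" comment.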
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Agreed, it is tricky if we try to only restart what we've changed. OR, just restart everything. We can make endpoints HA and use rolling updates to avoid spurious faults. There are complex ways to handle things even smoother.. but I go back to What does complexity cost? Excerpts from Fox, Kevin M's message of 2014-01-22 10:32:02 -0800: Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin
[openstack-dev] [Swift] domain-level quotas
Greetings, I'd be interested in your opinions and feedback on the following blueprint: https://blueprints.launchpad.net/swift/+spec/domain-level-quotas The idea is to have a middleware checking a domain's current usage against a limit set in the configuration before allowing an upload. The domain id can be extracted from the token, then used to query keystone for a list of projects belonging to the domain. Swift would then compute the domain usage in a similar fashion to the way it is currently done for accounts, and proceed from there. Do you think it is viable? Thoughts? Thanks, Matthieu Huin m...@enovance.com http://www.enovance.com eNovance SaS - 10 rue de la Victoire 75009 Paris - France
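The middleware described above can be sketched as plain WSGI. This is only an illustration of the check, under stated assumptions: the usage_lookup and domain_lookup callables are hypothetical stand-ins for the Keystone token inspection and per-project usage aggregation the blueprint describes; real Swift middleware would be built on swift.common.swob.

```python
# Minimal WSGI sketch: reject an upload once a domain's total usage plus the
# incoming object size would exceed a configured byte limit. The two injected
# callables are hypothetical; they stand in for Keystone/Swift queries.
class DomainQuotaMiddleware(object):
    def __init__(self, app, quota_bytes, usage_lookup, domain_lookup):
        self.app = app
        self.quota_bytes = quota_bytes
        self.get_domain_usage = usage_lookup   # domain_id -> bytes currently used
        self.get_domain_id = domain_lookup     # WSGI environ -> domain_id

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") == "PUT":
            domain_id = self.get_domain_id(environ)
            incoming = int(environ.get("CONTENT_LENGTH") or 0)
            if self.get_domain_usage(domain_id) + incoming > self.quota_bytes:
                start_response("413 Request Entity Too Large",
                               [("Content-Type", "text/plain")])
                return [b"Domain quota exceeded\n"]
        # Under quota (or not an upload): pass the request through unchanged.
        return self.app(environ, start_response)
```

Computing get_domain_usage cheaply (rather than summing account stats on every PUT) is the interesting part of the blueprint and is deliberately left abstract here.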
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
Sean, I agree with you. I prefer OpenStack as the single source of truth. What the end user chooses may be different. But with this pair of keywords, at least we provide comprehensive coverage of all scenarios. For Icehouse, I suggest we only consider support for the scenarios where OpenStack has full control of address assignment, plus one or two scenarios Comcast needs, in order to cover most of the deployments. We can leave other cases for future releases, or professional service opportunities. Shixiong On Jan 22, 2014, at 11:20 AM, Collins, Sean sean_colli...@cable.comcast.com wrote: I don't know if it's reasonable to expect a deployment of OpenStack that has an *external* DHCP server. It's certainly hard to imagine how you'd get the Neutron API and an external DHCP server to agree on an IP assignment, since OpenStack expects to be the source of truth. -- Sean M. Collins
Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords
That is correct, Xu Han! On Jan 22, 2014, at 11:14 AM, Xuhan Peng pengxu...@gmail.com wrote: Ian, I think the last-two-attributes PDF from Shixiong's last email is trying to solve the problem you are describing, right? — Xu Han Peng (xuhanp) On Wed, Jan 22, 2014 at 8:15 PM, Ian Wells ijw.ubu...@cack.org.uk wrote: On 21 January 2014 22:46, Veiga, Anthony anthony_ve...@cable.comcast.com wrote: Hi, Sean and Xuhan: I totally agree. This is not the ultimate solution, given the assumption that we had to use “enable_dhcp”. We haven’t decided the name of the other parameter, however, and we are open to any suggestions. As we mentioned during the meeting, the second parameter should highlight the need for addressing. If so, it should have at least four values:
1) off (i.e. address is assigned by external devices out of OpenStack control)
2) slaac (i.e. address is calculated based on RAs sent by OpenStack dnsmasq)
3) dhcpv6-stateful (i.e. address is obtained from OpenStack dnsmasq acting as a DHCPv6 stateful server)
4) dhcpv6-stateless (i.e. address is calculated based on RAs sent from either OpenStack dnsmasq or an external router, and optional information is retrieved from OpenStack dnsmasq acting as a DHCPv6 stateless server)
So how does this work if I have an external DHCPv6 server and an internal router? (How baroque do we have to get?) enable_dhcp, for backward compatibility reasons, should probably disable *both* RA and DHCPv6, despite the name, so we can't use that to disable the DHCP server. We could add a *third* attribute, which I hate as an idea but does resolve the problem - one flag for each of the servers, one for the mode the servers are operating in, and enable_dhcp which needs to DIAF but will persist till the API is revved. -- Ian.
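The four proposed values for the second attribute can be pinned down as a small constant set with a toy validator. The value names come from the thread above; the function itself is a hypothetical illustration, not Neutron code.

```python
# Sketch of the four proposed values for the second IPv6 subnet attribute.
# Comments paraphrase the definitions given in the discussion.
ADDRESS_MODES = (
    "off",               # address assigned by devices outside OpenStack control
    "slaac",             # address derived from RAs sent by OpenStack dnsmasq
    "dhcpv6-stateful",   # address leased by dnsmasq acting as a stateful server
    "dhcpv6-stateless",  # address via SLAAC; extra options from a stateless server
)


def validate_address_mode(mode):
    """Reject anything outside the agreed value set."""
    if mode not in ADDRESS_MODES:
        raise ValueError("unsupported IPv6 address mode: %s" % mode)
    return mode
```

A real implementation would also have to validate combinations with the RA-related flag (and the legacy enable_dhcp), which is exactly the open question in the thread.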
[openstack-dev] [ceilometer] per domain/project/user limits on alarms
Greetings, I'd be interested in some opinions and feedback on the following blueprint: https://blueprints.launchpad.net/ceilometer/+spec/quotas-on-alarms I think it'd be interesting to allow admins to limit the number of running alarms at any of the three levels defined by keystone. Thoughts? Thanks, Matthieu Huin m...@enovance.com http://www.enovance.com eNovance SaS - 10 rue de la Victoire 75009 Paris - France
Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR
Agreed. That would be a good place for that check. Carl On Wed, Jan 22, 2014 at 6:40 AM, Paul Ward wpw...@us.ibm.com wrote: Thanks for your input, Carl. You're right, it seems the more appropriate place for this is _validate_subnet(). It checks ip version, gateway, etc... but not the size of the subnet. Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 09:22:55 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 01/21/2014 09:27 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR The bottom line is that the method you mentioned shouldn't validate the subnet. It should assume the subnet has been validated and validate the pool. It seems to do an adequate job of that. Perhaps there is a _validate_subnet method that you should be focused on? (I'd check but I don't have convenient access to the code at the moment) Carl On Jan 21, 2014 6:16 PM, Paul Ward wpw...@us.ibm.com wrote: You beat me to it. :) I just responded about not checking the allocation pool start and end but rather, checking subnet_first_ip and subnet_last_ip, which are set as follows:

    subnet = netaddr.IPNetwork(subnet_cidr)
    subnet_first_ip = netaddr.IPAddress(subnet.first + 1)
    subnet_last_ip = netaddr.IPAddress(subnet.last - 1)

However, I'm curious about your contention that we're ok... I'm assuming you mean that this should already be handled. I don't believe anything is really checking to be sure the allocation pool leaves room for a gateway; I think it just makes sure it fits in the subnet. A member of our test team successfully created a network with a subnet of 255.255.255.255, so it got through somehow. I will look into that more tomorrow.
Carl Baldwin c...@ecbaldwin.net wrote on 01/21/2014 05:27:49 PM: From: Carl Baldwin c...@ecbaldwin.net To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 01/21/2014 05:32 PM Subject: Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR I think there may be some confusion between the two concepts: subnet and allocation pool. You are right that an ipv4 subnet smaller than /30 is not useable on a network. However, this method is checking the validity of an allocation pool. These pools should not include room for a gateway nor a broadcast address. Their relation to subnets is that the range of IPs contained in the pool must fit within the allocatable IP space on the subnet from which they are allocated. Other than that, they are simple ranges; they don't need to be CIDR-aligned or anything. A pool of a single IP is valid. I just checked the method's implementation now. It does check that the pool fits within the allocatable range of the subnet. I think we're good. Carl On Tue, Jan 21, 2014 at 3:35 PM, Paul Ward wpw...@us.ibm.com wrote: Currently, NeutronDbPluginV2._validate_allocation_pools() does some very basic checking to be sure the specified subnet is valid. One thing that's missing is checking for a CIDR of /32. A subnet with one IP address in it is unusable, as the sole IP address will be allocated to the gateway, and thus no IPs are left over to be allocated to VMs. The fix for this is simple. In NeutronDbPluginV2._validate_allocation_pools(), we'd check for start_ip == end_ip and raise an exception if that's true. I've opened launchpad bug report 1271311 (https://bugs.launchpad.net/neutron/+bug/1271311) for this, but wanted to start a discussion here to see if others find this enhancement to be a valuable addition.
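The check the thread converges on (subnet-size validation, mirroring the subnet_first_ip/subnet_last_ip computation quoted above) can be sketched as follows. The thread's code uses netaddr; this sketch uses the stdlib ipaddress module instead, and the function names are illustrative, not Neutron's.

```python
# Sketch of the proposed _validate_subnet-style size check: after excluding
# the network and broadcast addresses (subnet.first + 1 / subnet.last - 1 in
# the netaddr version), a subnet must still leave at least a gateway plus one
# allocatable address. /32 and /31 both fail this.
import ipaddress


def usable_host_range(cidr):
    """Return (first_ip, last_ip), mirroring subnet.first + 1 / subnet.last - 1."""
    net = ipaddress.ip_network(cidr, strict=True)
    return net.network_address + 1, net.broadcast_address - 1


def validate_subnet_size(cidr):
    first, last = usable_host_range(cidr)
    # first == last means a single usable IP, which the gateway consumes,
    # leaving nothing for VMs; first > last means no usable IPs at all.
    if first >= last:
        raise ValueError("subnet %s leaves no allocatable addresses" % cidr)
```

Note this validates the subnet, not the allocation pool: as Carl points out, a pool of a single IP is legitimately valid, so start_ip == end_ip must only be rejected at the subnet level.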
Re: [openstack-dev] [trove] Datastore Type/Version Migration
Hi, This looks like a good approach. Let's start the discussion. I've proposed an API spec for it: https://gist.github.com/andreyshestakov/8559309 Please take a look and add your advice and comments. Thanks On Thu, Nov 21, 2013 at 2:44 AM, McReynolds, Auston amcreyno...@ebay.com wrote: With Multiple Datastore Types/Versions merged to master, the conversation around how to support migrating from one datastore version to another has begun. Please see https://gist.github.com/amcrn/dfd493200fcdfdb61a23 for a consolidation of thoughts thus far.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Hello. On Tue, Jan 21, 2014 at 12:52 PM, Dong Liu willowd...@gmail.com wrote: What's your opinion? We've just discussed a use case for this today. I want to create a sandbox for Fuel, but I can't do it with OpenStack. The reason is a bit different from the telecom case: Fuel needs to manage nodes directly via DHCP and PXE, and you can't do that with Neutron since you can't make its dnsmasq service quiet. So, it's a great idea. We could have either VMs with no IP address associated or networks with no fixed IP range; either could work. There can be a problem with handling floating IPs, though. -- Kind regards, Yuriy.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
Yuriy Taraday wrote: Fuel needs to manage nodes directly via DHCP and PXE and you can't do that with Neutron since you can't make its dnsmasq service quiet. Can you elaborate on what you mean by this? You can turn off Neutron’s dnsmasq on a per-network basis, correct? Do you mean something else by “make its dnsmasq service quiet”?
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated to exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)
+1 to option 3, given what Kaleb and Craig have outlined so far. I don't see the above minus as a valid concern either, Kaleb.
[openstack-dev] [Neutron][LBaaS] Status update and weekly meeting
Hi folks, At this point we have a few major action items, mostly patches on review. Please note that the gate is in pretty bad shape, so don't expect anything to be approved/merged until this is sorted out.
1) SSL extension https://review.openstack.org/#/c/63510/ The code here is in good shape IMO, but we are as yet undecided on the general approach. In my opinion, while we lack a good and stable open source solution (i.e., until HAProxy 1.5 is released), this can be made a vendor extension with a prospect of moving into the core LBaaS API.
2) Loadbalancer instance https://review.openstack.org/#/c/60207/ The new API is fully backward-compatible. As new drivers appear (like https://review.openstack.org/#/c/67405/ ), the code shows the need for a container entity to bind entities like routers, devices, and agents.
3) Multiple providers with the same driver https://review.openstack.org/#/c/64139/ The code is good to merge; we just need the gate to be stable.
4) L7 rules https://review.openstack.org/#/c/61721/ My concern here is how Vip and Pool are associated. I think it could be made more generic. I've left corresponding comments.
5) We have an LBaaS scenario test which is still waiting to be merged: https://review.openstack.org/#/c/58697/
We'll have a regular IRC meeting tomorrow at 14:00 UTC on #openstack-meeting. I'd like to primarily discuss one item, the 'uneven API experience', which could be divided into two distinct parts:
- Presenting different APIs for different drivers (e.g. justification for a vendor extension framework)
- Generic API experience
On (2) I'd like to discuss the concern that I've raised in https://review.openstack.org/#/c/68190/ Thanks, Eugene.
Re: [openstack-dev] [TripleO] milestone-proposed branches
On Thu, Jan 16, 2014 at 10:32 AM, Thierry Carrez thie...@openstack.org wrote: James Slagle wrote: [...] And yes, I'm volunteering to do the work to support the above, and the release work :). Let me know if you have any question or need help. The process and tools used for the integrated release are described here: https://wiki.openstack.org/wiki/Release_Team/How_To_Release Thanks Thierry, I wanted to give this a go for icehouse milestone 2, but given that those were cut yesterday and there are still some outstanding doc updates in review, I'd like to shoot for milestone 3 instead. Is there anything additional we need to do to make that happen? I read through that wiki page. I did have a couple of questions: Who usually runs through the steps there? You, or a project member? When repo_tarball_diff.sh is run, are there any acceptable missing files? I'm seeing an AUTHORS and ChangeLog file showing up in the output from our repos; those are automatically generated, so I assume they are ok. There are also some egg_info files showing up, which I also think can be safely ignored. (I submitted a patch that updates the grep command used in the script: https://review.openstack.org/#/c/68471/ ) Thanks. Also note that we were considering switching from using milestone-proposed to using proposed/*, to avoid reusing branch names: https://review.openstack.org/#/c/65103/ -- Thierry Carrez (ttx) -- James Slagle
[openstack-dev] [Trove][Discussion] Are we using troveclient/tools/install_venv_common.py ?
Hi All, Are we using tools/install_venv_common.py in python-troveclient? If so, just let us know. Otherwise, it may be cleaned up (removed from openstack-common.conf). Thanks.
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Jay Pipes's message of 2014-01-22 10:53:14 -0800: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensure the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful: 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied) 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied) 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed) ++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I do understand that little tweaks are more common than whole software updates. I also think that little tweaks must be tested just like big ones. So I would argue that it is more important to optimize for trusting that what you tested is what is in production, and then to address any issues if that work-flow needs optimization. A system that leaves operators afraid to do a big update because it will trigger the bad path is a system that doesn't handle big updates well. Ideally we'd optimize all 3 in all of the obvious ways before determining that the one file update just isn't fast enough. 
Re: [openstack-dev] [TripleO] our update story: can people live with it?
I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. Thanks, Kevin From: Clint Byrum [cl...@fewbar.com] Sent: Wednesday, January 22, 2014 10:55 AM To: openstack-dev Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Agreed, it is tricky if we try to only restart what we've changed. OR, just restart everything. We can make endpoints HA and use rolling updates to avoid spurious faults. There are complex ways to handle things even smoother.. but I go back to What does complexity cost? Excerpts from Fox, Kevin M's message of 2014-01-22 10:32:02 -0800: Another tricky bit left is how to handle service restarts as needed? Thanks, Kevin
Re: [openstack-dev] [qa][Neutron][Tempest][Network] Break down NetworkBasicOps to smaller test cases
On Tue, 2014-01-21 at 01:15 -0500, Yair Fried wrote: I seem to be unable to convey my point using generalization, so I will give a specific example: I would like to have update dns server as an additional network scenario. Currently I could add it to the existing module: 1. tests connectivity 2. re-associate floating ip 3. update dns server In which case, failure to re-associate ip will prevent my test from running, even though these are completely unrelated scenarios, and (IMO) we would like to get feedback on both of them. Another way, is to copy the entire network_basic_ops module, remove re-associate floating ip and add update dns server. For the obvious reasons - this also seems like the wrong way to go. I am looking for an elegant way to share the code of these scenarios. Well, unfortunately, there are no very elegant answers at this time :) The closest thing we have would be to create a fixtures.Fixture that constructed a VM and associated the floating IP address to the instance. You could then have separate tests that for checking connectivity and updating the DNS server for that instance. However, fixtures are for resources that are shared between test methods and are not modified during those test methods. They cannot be modified, because then parallel execution of the test methods may yield non-deterministic results. There would need to be a separate fixture for the instance that would have the floating IP re-associated with it (because re-associating the floating IP by nature is a modification to the resource). Having a separate fixture means essentially doubling the amount of resources used by the test case class in question, which is why we're pushing to just have all of the tests done serially in a single test method, even though that means that a failure to re-associate the floating IP would mean that the update DNS server test would not be executed. Choose your poison. 
Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
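Jay's shared-fixture constraint can be seen in a small stdlib-only sketch (names are invented, and setUpClass merely stands in for a fixtures.Fixture; Tempest's real fixtures work differently):

```python
import unittest


class FakeServer:
    """Stand-in for a booted VM with an associated floating IP (invented)."""

    def __init__(self):
        self.floating_ip = "172.24.4.10"
        self.dns_server = "8.8.8.8"


class SharedServerTests(unittest.TestCase):
    # setUpClass plays the role of a fixtures.Fixture here: the server is
    # built once and shared across test methods, so those methods must
    # treat it as read-only for parallel runs to stay deterministic.
    @classmethod
    def setUpClass(cls):
        cls.server = FakeServer()

    def test_connectivity(self):
        # Read-only check: safe against a shared fixture.
        self.assertTrue(self.server.floating_ip.startswith("172."))

    def test_update_dns_server(self):
        # This MUTATES the shared resource -- the case Jay warns about.
        # Run in parallel with other tests against the same fixture, the
        # outcome becomes non-deterministic, which is why a mutating
        # scenario would need its own dedicated (duplicated) fixture.
        self.server.dns_server = "10.0.0.2"
        self.assertEqual("10.0.0.2", self.server.dns_server)
```

The second test illustrates why "separate fixture per mutating scenario" doubles the resource cost: each mutator needs its own VM.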
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
Good day to all. #3 looks more than acceptable: /datastores/datastore/configuration/parameters. According to the configuration parameters design, a configuration set must be associated with exactly one datastore. Best regards, Denis Makogon. 2014/1/22 Michael Basnight mbasni...@gmail.com On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three)
+ a configuration set without an associated datastore is meaningless
+ a configuration set must be associated with exactly one datastore
+ each datastore must have 0-1 configuration sets
+ all of the above relationships are immediately apparent
- listing all configuration sets becomes more difficult (which I don't think is a valid concern)
+1 to option 3, given what Kaleb and Craig have outlined so far. I don't see the above minus as a valid concern either, Kaleb.
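Kaleb's constraints for option three can be modelled in a few lines (a toy sketch, not actual Trove code; class and method names are invented):

```python
class ConfigurationRegistry:
    """Toy model of option three's constraints (not actual Trove code)."""

    def __init__(self):
        # datastore -> its single configuration set (0-1 per datastore)
        self._by_datastore = {}

    def attach(self, datastore, config_set):
        # Enforce "each datastore must have 0-1 configuration set".
        if datastore in self._by_datastore:
            raise ValueError("%s already has a configuration set" % datastore)
        self._by_datastore[datastore] = config_set

    def parameters_url(self, datastore):
        # The 1-to-0..1 relationship is readable straight off the URL,
        # which is the "immediately apparent" point in Kaleb's list.
        return "/datastores/%s/configuration/parameters" % datastore
```

The "minus" in the list is also visible here: listing all configuration sets means walking every datastore rather than hitting a single top-level collection.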
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Jan 22, 2014, at 1:53 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes. What is commonly called the update story, I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. 
Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw   # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro   # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! 
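Clint's compare-kernels heuristic is small enough to sketch (the field names are invented for illustration, not Nova's actual metadata):

```python
def update_plan(running, desired):
    """Pick a deployment strategy for a compute node.

    `running` and `desired` are dicts describing images, e.g.
    {"image_id": "abc", "kernel": "3.12.0-7"} -- illustrative only.
    """
    if running["image_id"] == desired["image_id"]:
        return "noop"     # already running the desired image
    if running["kernel"] != desired["kernel"]:
        return "rebuild"  # new kernel: a reboot is unavoidable
    return "rsync"        # same kernel: in-place filesystem sync suffices
```

Dan's point about not wanting auto-magic would amount to letting the operator override the "rsync" answer and force a rebuild (or vice versa).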
In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I don't understand the aversion to using existing, well-known tools to handle this? A hybrid model (blending 2 and 3, above) here I think would work best, where TripleO lays down a baseline image and the cloud operator would employ a well-known and supported configuration tool for
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Fox, Kevin M's message of 2014-01-22 12:19:56 -0800: I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. If we could get kexec reliable.. but I have no evidence that it is anything but a complete flake. What it saves you is losing running processes that you don't end up killing, which is expensive on many types of services.. Nova Compute being a notable example.
Re: [openstack-dev] [nova][neutron]About creating vms without ip address
On Thu, Jan 23, 2014 at 12:04 AM, CARVER, PAUL pc2...@att.com wrote: Can you elaborate on what you mean by this? You can turn off Neutron’s dnsmasq on a per-network basis, correct? Do you mean something else by “make its dnsmasq service quiet”? What I meant is for dnsmasq to not send offers to specific VMs so that Fuel's DHCP service will serve them. We shouldn't shut off the network's DHCP entirely though, since we still need the Fuel VM to receive some address for external connectivity. -- Kind regards, Yuriy.
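Per the dnsmasq documentation, individual hosts can be excluded from DHCP service with a per-host `ignore` entry, which is the kind of selective quieting Yuriy describes (the MAC address below is purely illustrative):

```
# dnsmasq.conf fragment: never answer DHCP requests from this host,
# leaving it to be served by Fuel's own DHCP service instead.
dhcp-host=52:54:00:12:34:56,ignore
```

The rest of the network keeps normal DHCP service, so other VMs are unaffected.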
Re: [openstack-dev] [qa][Neutron][Tempest][Network] Break down NetworkBasicOps to smaller test cases
On 01/22/2014 03:19 PM, Jay Pipes wrote: On Tue, 2014-01-21 at 01:15 -0500, Yair Fried wrote: I seem to be unable to convey my point using generalization, so I will give a specific example: I would like to have update dns server as an additional network scenario. Currently I could add it to the existing module: 1. test connectivity 2. re-associate floating ip 3. update dns server In which case, failure to re-associate the ip will prevent my test from running, even though these are completely unrelated scenarios, and (IMO) we would like to get feedback on both of them. Another way is to copy the entire network_basic_ops module, remove re-associate floating ip, and add update dns server. For the obvious reasons, this also seems like the wrong way to go. I am looking for an elegant way to share the code of these scenarios.

Well, unfortunately, there are no very elegant answers at this time :) The closest thing we have would be to create a fixtures.Fixture that constructed a VM and associated the floating IP address with the instance. You could then have separate tests for checking connectivity and for updating the DNS server for that instance. However, fixtures are for resources that are shared between test methods and are not modified during those test methods. They cannot be modified, because then parallel execution of the test methods may yield non-deterministic results. There would need to be a separate fixture for the instance that would have the floating IP re-associated with it (because re-associating the floating IP is by nature a modification to the resource). Having a separate fixture means essentially doubling the amount of resources used by the test case class in question, which is why we're pushing to just have all of the tests done serially in a single test method, even though that means that a failure to re-associate the floating IP would mean that the update DNS server test would not be executed. Choose your poison. 
Best, -jay

Thanks, Jay. So to close this loop, I think Yair started down this road after receiving feedback that this test was getting too much stuff in it. Sounds like you are advocating putting more stuff in it as the least of evils. Which is fine by me because it is a lot easier. -David
[openstack-dev] Changes coming in gate structure
Changes coming in gate structure Unless you've been living under a rock, on the moon, around Saturn, you'll have noticed that the gate has been quite backed up the last 2 weeks. Every time we get towards a milestone this gets measurably worse, and the expectation is that at i3 we're going to see at least 40% more load than we are dealing with now (if history is any indication), which doesn't bode well. It turns out, when you have a huge and rapidly growing Open Source project, you keep finding scaling limits in existing software, your software, and approaches in general. It also turns out that you find out that you need to act defensively on situations that you didn't think you'd have to worry about. Like code reviews with 3-month-old test results being put into the review queue. Or code that *can't* pass (which a look at the logs would show) being reverified in the gate. All of these things compound on the fact that there are real bugs in OpenStack, which end up having a non-linear failure effect. Once you get past a certain point the failure rates multiply to the point where everything stops (which happened Sunday, when we only merged 4 changes in 24 hrs). The history of the gate structure is a long one. It was added in Diablo when there was a project which literally would not run with the other OpenStack components. The idea of gating merge of everything on everything else is to ensure we have some understanding that OpenStack actually works, all together, for some set of configurations. It wasn't until the Folsom cycle that we started running these tests before human review (kind of amazing). The gate is also based on an assumption that most of the bugs we are catching are outside the project, vs. bugs that are already in the project. However, in an asynchronous system, bugs can show up only very occasionally, and get past our best efforts to detect them, then pile up in the code base until we root them out. 
= Towards a Svelter Gate - Leaning on Check = We've got a current plan of attack to try to maintain nearly the same level of integration test guarantees, and hope to make it so on the merge side we're able to get more throughput. This is a set of things that all have to happen at once to not completely blow out the guarantees we've got in the source. Make a clean recent Check prereq for entering gate == A huge compounding problem has been patches that can't pass being promoted to the gate. So we're going to make Zuul able to enforce a recent clean check scorecard before going into the gate. Our working theory of recent is the last 24 hrs. If it doesn't have a recent set of check results on +A, we'll trigger a check rerun, and if clean, it gets sent to the gate. We'll also probably add a sweeper to zuul so it will automatically refresh results on changes that are getting comments on them and are older than some number of days. Svelte Gate == The gate jobs will be trimmed down immensely. Nothing project specific, so pep8 / unit tests all ripped out, no functional test runs. Fewer overall configs. Exactly how minimal we'll figure out as we decide what we can live without. The floor for this would be devstack-tempest-full and grenade. This is basically a sanity check that the combination of patches in flight doesn't ruin the world for everyone. Idle Cloud for Elastic Recheck Bugs === We have actually been using the gate for double duty, both as ensuring integration, but also as a set of clean test results to figure out what bugs are in OpenStack that only show up from time to time. The check queue is way too noisy, as our system actually blocks tons of bad code from getting in. With the Svelte gate, we'll need a set of background nodes to build that dataset. But with elastic search we now have the technology, so this is good. It will let us work these issues in parallel. These issues will still cause people pain in getting clean results in check. 
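The "recent clean check" rule Sean describes is essentially a two-line predicate; a sketch of the policy (illustrative only, not actual Zuul code):

```python
from datetime import datetime, timedelta


def gate_ready(last_check_clean, last_check_time, now,
               max_age=timedelta(hours=24)):
    """Return True if a change may enter the gate directly on +A.

    Policy as described: a clean check result from the last 24 hours is
    required; anything stale or failing triggers a fresh check run first.
    """
    return last_check_clean and (now - last_check_time) <= max_age
```

A change rejected by this predicate would be sent for a check rerun and only enter the gate once that rerun came back clean.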
= Timelines, Dangers, and Opportunities = We need changes soon. Every past experience says milestone 3 is 40% heavier than milestone 2, and nothing indicates that icehouse is going to be any different. So Jim's put getting these required bits into Zuul at the top of his list, and we're hoping we'll have them within a week. With this approach, wedging the gate is highly unlikely. However, as we won't be testing every check test again in gate, it means there is a possibility that a combination of patches might make the check results wedge for everyone (like the pg job getting wedged). So it moves that issue around. Right now it's hard to say if that particular issue will get better or worse. However, the Sherlock rule of gate blocks remains in effect: once you've eliminated the impossible, any
Re: [openstack-dev] [TripleO] our update story: can people live with it?
On Wed, 2014-01-22 at 12:12 -0800, Clint Byrum wrote: Excerpts from Jay Pipes's message of 2014-01-22 10:53:14 -0800: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I do understand that little tweaks are more common than whole software updates. I also think that little tweaks must be tested just like big ones. So I would argue that it is more important to optimize for trusting that what you tested is what is in production, and then to address any issues if that work-flow needs optimization. A system that leaves operators afraid to do a big update because it will trigger the bad path is a system that doesn't handle big updates well. 
Ideally we'd optimize all 3 in all of the obvious ways before determining that the one file update just isn't fast enough. Well said. No disagreement from me. Best, -jay
Re: [openstack-dev] Changes coming in gate structure
On Wed, Jan 22, 2014 at 1:39 PM, Sean Dague s...@dague.net wrote: Changes coming in gate structure Unless you've been living under a rock, on the moon, around Saturn, you'll have noticed that the gate has been quite backed up the last 2 weeks. Every time we get towards a milestone this gets measurably worse, and the expectation is that at i3 we're going to see at least 40% more load than we are dealing with now (if history is any indication), which doesn't bode well. It turns out, when you have a huge and rapidly growing Open Source project, you keep finding scaling limits in existing software, your software, and approaches in general. It also turns out that you find out that you need to act defensively on situations that you didn't think you'd have to worry about. Like code reviews with 3-month-old test results being put into the review queue. Or code that *can't* pass (which a look at the logs would show) being reverified in the gate. All of these things compound on the fact that there are real bugs in OpenStack, which end up having a non-linear failure effect. Once you get past a certain point the failure rates multiply to the point where everything stops (which happened Sunday, when we only merged 4 changes in 24 hrs). The history of the gate structure is a long one. It was added in Diablo when there was a project which literally would not run with the other OpenStack components. The idea of gating merge of everything on everything else is to ensure we have some understanding that OpenStack actually works, all together, for some set of configurations. It wasn't until the Folsom cycle that we started running these tests before human review (kind of amazing). The gate is also based on an assumption that most of the bugs we are catching are outside the project, vs. bugs that are already in the project. 
However, in an asynchronous system, bugs can show up only very occasionally, and get past our best efforts to detect them, then pile up in the code base until we root them out. = Towards a Svelter Gate - Leaning on Check = We've got a current plan of attack to try to maintain nearly the same level of integration test guarantees, and hope to make it so on the merge side we're able to get more throughput. This is a set of things that all have to happen at once to not completely blow out the guarantees we've got in the source. Make a clean recent Check prereq for entering gate == A huge compounding problem has been patches that can't pass being promoted to the gate. So we're going to make Zuul able to enforce a recent clean check scorecard before going into the gate. Our working theory of recent is the last 24 hrs. If it doesn't have a recent set of check results on +A, we'll trigger a check rerun, and if clean, it gets sent to the gate. We'll also probably add a sweeper to zuul so it will automatically refresh results on changes that are getting comments on them and are older than some number of days. Svelte Gate == The gate jobs will be trimmed down immensely. Nothing project specific, so pep8 / unit tests all ripped out, no functional test runs. Fewer overall configs. Exactly how minimal we'll figure out as we decide what we can live without. The floor for this would be devstack-tempest-full and grenade. This is basically a sanity check that the combination of patches in flight doesn't ruin the world for everyone. Idle Cloud for Elastic Recheck Bugs === We have actually been using the gate for double duty, both as ensuring integration, but also as a set of clean test results to figure out what bugs are in OpenStack that only show up from time to time. The check queue is way too noisy, as our system actually blocks tons of bad code from getting in. With the Svelte gate, we'll need a set of background nodes to build that dataset. 
But with elastic search we now have the technology, so this is good. It will let us work these issues in parallel. These issues will still cause people pain in getting clean results in check. = Timelines, Dangers, and Opportunities = We need changes soon. Every past experience says milestone 3 is 40% heavier than milestone 2, and nothing indicates that icehouse is going to be any different. So Jim's put getting these required bits into Zuul at the top of his list, and we're hoping we'll have them within a week. With this approach, wedging the gate is highly unlikely. However, as we won't be testing every check test again in gate, it means there is a possibility that a combination of patches might make the check results wedge for everyone (like the pg job getting wedged). So it moves that issue around. Right now it's hard to say
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Excerpts from Keith Basil's message of 2014-01-22 12:27:50 -0800: On Jan 22, 2014, at 1:53 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote: - Original Message - From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org Sent: Wednesday, January 22, 2014 12:45:45 PM Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800: I've been thinking a bit more about how TripleO updates are developing, specifically with regards to compute nodes. What is commonly called the update story, I think. As I understand it we expect people to actually have to reboot a compute node in the cluster in order to deploy an update. This really worries me because it seems like way overkill for such a simple operation. Let's say all I need to deploy is a simple change to Nova's libvirt driver. And I need to deploy it to *all* my compute instances. Do we really expect people to actually have to reboot every single compute node in their cluster for such a thing? And then do this again and again for each update they deploy? Agreed, if we make everybody reboot to push out a patch to libvirt, we have failed. And thus far, we are failing to do that, but with good reason. Right at this very moment, we are leaning on 'rebuild' in Nova, which reboots the instance. But this is so that we handle the hardest thing well first (rebooting to have a new kernel). For small updates we need to decouple things a bit more. There is a notion of the image ID in Nova, versus the image ID that is actually running. Right now we update it with a nova rebuild command only. But ideally we would give operators a tool to optimize and avoid the reboot when it is appropriate. The heuristic should be as simple as comparing kernels. When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. 
Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well. ++ Once we have determined that a new image does not need a reboot, we can just change the ID in Metadata, and an os-refresh-config script will do something like this:

if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
    download_new_image
    mount_image /tmp/new_image
    mount / -o remount,rw   # Assuming we've achieved ro root
    rsync --one-file-system -a /tmp/new_image/ /
    mount / -o remount,ro   # ditto
fi

No reboot required. This would run early in configure.d, so that any pre-configure.d scripts will have run to quiesce services that can't handle having their binaries removed out from under them (read: non-Unix services). Then configure.d runs as usual, configures things, restarts services, and we are now running the new image. Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova. Right. I understand the whole read-only images thing plays into this too... but I'm wondering if there is a middle ground where things might work better. Perhaps we have a mechanism where we can tar up individual venvs from /opt/stack/ or perhaps also this is an area where real OpenStack packages could shine. It seems like we could certainly come up with some simple mechanisms to deploy these sorts of changes with Heat such that compute host reboot can be avoided for each new deploy. Given the scenario above, that would be a further optimization. I don't think it makes sense to specialize for venvs or openstack services though, so just ensuring the root filesystems match seems like a workable, highly efficient system. Note that we've talked about having highly efficient ways to widely distribute the new images as well. Yes. Optimization! 
In the big scheme of things I could see 3 approaches being useful:
1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD environments, so this level of optimization will be frequently used. And, as I've said before, optimizing for frequently-used scenarios is worth spending the time on. Optimizing for infrequently-occurring things... not so much. :) I don't understand the aversion to using existing, well-known tools to handle this? These tools are of
Re: [openstack-dev] [gantt] How to include nova modules in unit tests
On Tue, Jan 21, 2014 at 7:35 PM, Dugger, Donald D donald.d.dug...@intel.com wrote: Well, the first goal is to get the scheduler code into a separate tree, even though that code is still utilizing common code from nova. Right now just about every scheduler file includes some nova modules. Ultimately, yes, we want to remove the dependency on nova but that is a future effort and would create way too many changes for the immediate future. The nova code you are trying to use isn't a public API and can change at any time. Before considering using gantt we would have to fully remove any nova imports in gantt. When we want to cut the cord from nova it'll be easy, just remove that line from the `test-requirements.txt' file and we'll be forced to replace all of the nova code. I'm not sure it will actually be that easy. -- Don Dugger Censeo Toto nos in Kansa esse decisse. - D. Gale Ph: 303/443-3786 -Original Message- From: Robert Collins [mailto:robe...@robertcollins.net] Sent: Tuesday, January 21, 2014 5:16 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [gantt] How to include nova modules in unit tests On 22 January 2014 11:57, Dugger, Donald D donald.d.dug...@intel.com wrote: I almost have the unit tests for gantt working except for one problem - is there a way to have the test infrastructure allow the gantt tree to import objects from the nova tree? The problem is that we want to break out just the scheduler code into the gantt tree without duplicating all of nova. The current scheduler has many imports of nova objects, which is not a problem except for the unit tests. The unit tests run in an environment that doesn't include the nova tree, so all of those imports wind up failing. The goal though is to have an independent system; perhaps marking all the tests that still depend on tendrils of nova 'skipped' and then working on burning down the skips to 0 is a better approach than making it easy to have such dependencies? 
-Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
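Robert's suggestion of skipping, rather than silently importing, tests that still reach into nova is a standard guarded-import pattern. A sketch (the test class and method names are invented):

```python
import unittest

# Guarded import: in a standalone gantt tree, nova may simply not be
# installed, and that absence should read as a skip, not an error.
try:
    import nova  # noqa: F401 -- presence check only
    HAS_NOVA = True
except ImportError:
    HAS_NOVA = False


class SchedulerFilterTest(unittest.TestCase):
    @unittest.skipUnless(HAS_NOVA, "still depends on nova internals")
    def test_filter_uses_nova_objects(self):
        # Body would exercise nova-dependent scheduler code; the point
        # here is only that the dependency is tracked as an explicit,
        # countable skip that can be burned down to zero over time.
        pass
```

Running the suite without nova installed then reports the remaining nova tendrils as skips, giving a concrete number to burn down.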
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Maybe I misunderstand, but I thought: kexec - lets you boot a new kernel/initrd starting at the point a boot loader would, skipping the BIOS init. All previously running processes are not running in the new boot, just like after a normal reboot. CRIU - lets you snapshot/restart running processes. While you could use both together to upgrade the kernel while leaving all the processes running after the reboot, I don't think that's very well tested at the moment. Checkpointing the system memory is not without cost either; restarting the services may be faster. I think we're pretty far off on a tangent though. My main point was, if you can't selectively restart services as needed, I'm not sure how useful patching the image really is over a full reboot. It should mean the same order of magnitude of service unavailability, I think. Thanks, Kevin From: Clint Byrum [cl...@fewbar.com] Sent: Wednesday, January 22, 2014 12:36 PM To: openstack-dev Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it? Excerpts from Fox, Kevin M's message of 2014-01-22 12:19:56 -0800: I think most of the time taken to reboot is spent in bringing down/up the services though, so I'm not sure what it really buys you if you do it all. It may let you skip the crazy long bootup time on enterprise hardware, but that could be worked around with kexec on the full reboot method too. If we could get kexec reliable.. but I have no evidence that it is anything but a complete flake. What it saves you is losing running processes that you don't end up killing, which is expensive on many types of services.. Nova Compute being a notable example.
Re: [openstack-dev] Changes coming in gate structure
On Wed, 2014-01-22 at 15:39 -0500, Sean Dague wrote: snip == Executive Summary == To summarize, the effects of these changes will be: - 1) Decrease the impact of failures resetting the entire gate queue by doing the heavy testing in the check queue where changes are not dependent on each other. - 2) Run a slimmer set of jobs in the gate queue to maintain sanity, but not block as much on existing bugs in OpenStack. - 3) As a result, this should increase our confidence that changes put into the gate will pass. This will help prevent gate resets, and the disruption they cause by needing to invalidate and restart the whole gate queue. All good things, Sean ++. Might I also suggest one other thing that, IMO, would reduce gate contention? What if we added an option to git review that would inject something into the git commit message that would indicate the patch author felt the patch does not need to have integration testing run against it. Lots of patches make no substantive code changes and just clean up style, comments, or documentation. Having integration tests run for these patches is just noise and provides no value. There should be a way to indicate to Zuul not to run integration testing if some marker is in the commit message. For example, let us imagine that issuing: git review --skip-integration-tests would cause a git commit hook to execute that injected this marker into the commit message: Skip-Integration-Tests in the same way that the Change-Id commit hook injects the Change-Id: Ix line into the commit message. A -core reviewer would see Skip-Integration-Tests in the commit message. If the -core reviewer disagreed with the patch author that the patch did not have substantive code changes and actually wanted integration tests to be run for the patch, they could simply ask the patch author to run a git commit --amend and remove the Skip-Integration-Tests line from the commit message. 
If a -core reviewer did a +1A on a patch that had a Skip-Integration-Tests marker in the commit message, Zuul would simply not execute the integration tests and would only execute things like rebase/merge conflict checks, and if all those basic tests succeeded, merge the patch into the target branch. This should significantly reduce the gate contention IMO, and should not be difficult at all to implement. Best, -jay
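Jay's proposal could be sketched on the Zuul side roughly as follows. This is purely illustrative: Zuul has no such feature today, and both the footer name and the `gate-tempest` job-name prefix used here to identify integration jobs are assumptions taken from the proposal above.

```python
import re

# Hypothetical commit-message footer from the proposal; not a real
# Zuul or git-review feature.
SKIP_MARKER = re.compile(r'^Skip-Integration-Tests\s*$', re.MULTILINE)

def jobs_to_run(commit_message, all_jobs):
    """Return the jobs Zuul would run for a change.

    When the commit message carries the (hypothetical) opt-out footer,
    drop integration jobs and keep only the basic checks (style, unit
    tests, merge checks). Job naming is illustrative.
    """
    if SKIP_MARKER.search(commit_message):
        return [j for j in all_jobs if not j.startswith('gate-tempest')]
    return list(all_jobs)
```

A reviewer who disagrees with the marker would simply ask for it to be removed via `git commit --amend`, after which the full job list runs again.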
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Wed, Jan 22, 2014 at 3:22 AM, Flavio Percoco fla...@redhat.com wrote: On 21/01/14 13:14 -0500, Joe Gordon wrote: On Jan 17, 2014 12:24 AM, Flavio Percoco fla...@redhat.com wrote: On 16/01/14 17:32 -0500, Doug Hellmann wrote: On Thu, Jan 16, 2014 at 3:19 PM, Ben Nemec openst...@nemebean.com wrote: On 2014-01-16 13:48, John Griffith wrote: Hey Everyone, A review came up today that cherry-picked a specific commit to OSLO Incubator, without updating the rest of the files in the module. I rejected that patch, because my philosophy has been that when you update/pull from oslo-incubator it should be done as a full sync of the entire module, not a cherry pick of the bits and pieces that you may or may not be interested in. As it turns out I've received a bit of push back on this, so it seems maybe I'm being unreasonable, or that I'm mistaken in my understanding of the process here. To me it seems like a complete and total waste to have an oslo-incubator and common libs if you're going to turn around and just cherry pick changes, but maybe I'm completely out of line. Thoughts?? I suppose there might be exceptions, but in general I'm with you. For one thing, if someone tries to pull out a specific change in the Oslo code, there's no guarantee that code even works. Depending on how the sync was done it's possible the code they're syncing never passed the Oslo unit tests in the form being synced, and since unit tests aren't synced to the target projects it's conceivable that completely broken code could get through Jenkins. Obviously it's possible to do a successful partial sync, but for the sake of reviewer sanity I'm -1 on partial syncs without a _very_ good reason (like it's blocking the gate and there's some reason the full module can't be synced). I agree. Cherry picking a single (or even partial) commit really should be avoided. 
The update tool does allow syncing just a single module, but that should be used very VERY carefully, especially because some of the changes we're making as we work on graduating some more libraries will include cross-dependent changes between oslo modules. Agreed. Syncing on master should be a complete synchronization from Oslo incubator. IMHO, the only case where cherry-picking from oslo should be allowed is when backporting patches to stable branches. Master branches should try to keep up to date with Oslo and sync everything every time. When we started Oslo incubator, we treated that code as trusted. But since then there have been occasional issues when syncing the code. So Oslo incubator code has lost *my* trust. Therefore I am always hesitant to do a full Oslo sync, because I am not an expert on the Oslo code and I risk breaking something when doing it (and the issue may not appear 100% of the time, either). Syncing code in becomes the first time that code is run against tempest, which scares me. While this might be true in some cases, I think we should address it differently. Just dropping trust in the project won't help much. How else would you address it? I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. But isn't this what other gates are for? I mean, when proposing an oslo sync, each project has its own gate plus integrated tests that do this exact job. Sort of. There are two possible failure modes here: 1) An oslo-incubator sync is attempted by Alice and the patch fails integration tests in the check queue. Alice doesn't know why it failed and has to go and resolve the issue with the oslo folks. Alice now thinks doing oslo-incubator syncs is a hassle, stops doing them, and moves on to something else. 2) An oslo-incubator sync is merged, but introduces a non-deterministic bug. This wasn't caught in oslo-incubator because there are no integration tests there. 
The risk here is that the oslo-incubator code isn't run enough to detect non-deterministic bugs. In fact, we just found one yesterday (https://review.openstack.org/#/c/68275/). Additionally, what about a periodic Jenkins job that does the Oslo syncs and is managed by the Oslo team itself? This would be awesome. It would take the burden of doing the sync off the project maintainers. Before doing this, though, we need to improve the `update` script. Currently, there's no good way to generate useful commit messages out of the sync. Cheers, FF -- @flaper87 Flavio Percoco
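To make that last point concrete, an improved `update` script could draft the sync commit message from the incubator's own history. This is only a sketch: the function names, the `openstack/common/<module>.py` path layout, and the message format are illustrative assumptions, not the actual `update` script.

```python
import subprocess

def incubator_log(oslo_repo, last_synced_sha, modules):
    """One-line git log of incubator commits touching the synced modules."""
    paths = ['openstack/common/%s.py' % m for m in modules]
    return subprocess.check_output(
        ['git', 'log', '--oneline', '%s..HEAD' % last_synced_sha, '--'] + paths,
        cwd=oslo_repo, universal_newlines=True)

def format_sync_message(modules, oneline_log):
    """Turn the raw log into a reviewable sync commit message."""
    body = '\n'.join('  %s' % line for line in oneline_log.splitlines())
    return ('Sync %s from oslo-incubator\n\n'
            'Included incubator commits:\n\n%s\n'
            % (', '.join(modules), body))
```

A message generated this way would let reviewers see exactly which incubator commits a sync pulls in, instead of a bare "sync from oslo" subject line.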
Re: [openstack-dev] [Swift] domain-level quotas
Hi Matthieu, On 22.01.14 20:02, Matthieu Huin wrote: The idea is to have a middleware checking a domain's current usage against a limit set in the configuration before allowing an upload. The domain id can be extracted from the token, then used to query keystone for a list of projects belonging to the domain. Swift would then compute the domain usage in a similar fashion as the way it is currently done for accounts, and proceed from there. The problem might be to compute the current usage of all accounts within a domain. It won't be a problem if you have only a few accounts in a domain, but with tens, hundreds, or even thousands of accounts in a domain there will be a performance impact, because you need to iterate over all accounts (doing a HEAD on every account) and sum up the total usage. I think some performance tests would be helpful (doing a HEAD on all accounts repeatedly, with some PUTs in between) to see if the performance impact is an issue at all (since there will be a lot of caching involved). Christian
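A rough sketch of the accounting Christian describes, with `head_account` standing in for an internal HEAD request per project account (`X-Account-Bytes-Used` is the usage header Swift returns on an account HEAD; everything else here, including the function names, is illustrative):

```python
def domain_bytes_used(project_ids, head_account):
    """Sum the bytes used by every account belonging to a domain.

    head_account(project_id) is a stand-in for issuing a HEAD request
    against that project's Swift account and returning its headers.
    Note this iterates over every account in the domain, which is
    exactly the performance concern raised above.
    """
    total = 0
    for project_id in project_ids:
        headers = head_account(project_id)
        total += int(headers.get('X-Account-Bytes-Used', 0))
    return total

def upload_allowed(project_ids, head_account, domain_quota, incoming_bytes):
    """The quota check a middleware might run before accepting an upload."""
    used = domain_bytes_used(project_ids, head_account)
    return used + incoming_bytes <= domain_quota
```

With thousands of accounts per domain, each upload would pay the cost of that loop unless results are cached aggressively, which is why the performance tests suggested above matter.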
Re: [openstack-dev] [OpenStack-Dev] Cherry picking commit from oslo-incubator
On Wed, Jan 22, 2014 at 5:19 AM, Julien Danjou jul...@danjou.info wrote: On Tue, Jan 21 2014, Joe Gordon wrote: I would like to propose having an integration test job in Oslo incubator that syncs in the code, similar to how we do global requirements. I don't think that would be possible as a voting job, since the point of oslo-incubator is to be able to break the API compatibility. If oslo-incubator can break APIs whenever it wants, how can a downstream project stay in sync with oslo-incubator? -- Julien Danjou ;; Free Software hacker ; independent consultant ;; http://julien.danjou.info
[openstack-dev] [QA] Meeting Thursday January 23rd at 22:00UTC
Just a quick reminder that the weekly OpenStack QA team IRC meeting will be tomorrow, Thursday, January 23rd at 22:00 UTC in the #openstack-meeting channel. The agenda for tomorrow's meeting can be found here: https://wiki.openstack.org/wiki/Meetings/QATeamMeeting Anyone is welcome to add an item to the agenda. Also, I was asked to add the meeting time in other timezones to this weekly reminder email. So tomorrow's meeting will be at: 17:00 EST 07:00 JST 08:30 ACDT 23:00 CET 16:00 CST 14:00 PST -Matt Treinish
Re: [openstack-dev] [gantt] Sync up patches
On 01/21/2014 04:43 PM, Joe Gordon wrote: On Thu, Jan 16, 2014 at 4:42 PM, Dugger, Donald D donald.d.dug...@intel.com wrote: OK, it looks like the consensus is that we don't try to keep the gantt tree in sync with nova; instead we: 1) Get the current gantt tree to pass unit tests 2) Get gantt to pass integration tests (e.g. get it working as the nova scheduler) 3) Modify devstack to optionally use gantt 4) Freeze scheduler changes to nova as we: This should be covered by the standard feature freeze for Icehouse a) Extract all the changes that were needed to get gantt working b) Recreate the gantt tree from the current nova tree c) Apply all the patches from step 4.a 5) Unfreeze scheduler work, but now all work is targeted exclusively to the gantt tree LGTM, although once we have a working gantt for Icehouse I think we should have another round of discussion about deprecating nova-scheduler in favor of gantt. On a high level that is something I think we all support, but the devil is in the details. Right, I don't think it's worth talking about until gantt is demonstrated to be working. Otherwise the deprecation path discussion is a moot point. It's really a switch I'd rather flip at the beginning of a release cycle, anyway. It seems quite likely that any deprecation cycle will start in Juno and not Icehouse at this point. -- Russell Bryant
Re: [openstack-dev] [TripleO] our update story: can people live with it?
Hi On 22 January 2014 21:33, Fox, Kevin M kevin@pnnl.gov wrote: I think we're pretty far off on a tangent though. My main point was, if you can't selectively restart services as needed, I'm not sure how useful patching the image really is over a full reboot. It should take the same order of magnitude of service unavailability, I think. The in-place upgrade (currently known as takeovernode) is not yet well designed, and while there is a script in tripleo-incubator called takeovernode, nobody is likely to resume working on it until a little later in our roadmap. The crude hack we have at the moment does no detection of services that need to be restarted, but that is not intended to suggest that we don't care about such a feature :) I think Clint has covered pretty much all the bases here, but I would reiterate that in no way do we think the kind of upgrade we're working on at the moment (i.e. a nova rebuild driven instance reboot) is the only one that should exist. We know that in-place upgrades need to happen for TripleO's full story to be taken seriously, and we will get to it. If folks have suggestions for behaviours/techniques/tools, those would be great to capture, probably in https://etherpad.openstack.org/p/tripleo-image-updates . http://manpages.ubuntu.com/manpages/oneiric/man1/checkrestart.1.html is one such tool that we turned up in earlier research about how to detect services that need to be restarted after an upgrade. It's not a complete solution on its own, but it gets us some of the way. (Also, just because we favour low-entropy golden images for all software changes doesn't mean that any given user can't choose to roll out an upgrade to some piece(s) of software via any other mechanism they choose, if that is what they feel is right for their operation. A combination of the two strategies is entirely possible.) -- Cheers, Chris
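For reference, the core heuristic behind tools like checkrestart can be sketched in a few lines: a process whose memory maps still reference a deleted shared object is running code that was replaced on disk. The `/proc` scanning below is Linux-specific and illustrative, not the actual checkrestart implementation:

```python
import glob

def needs_restart(map_lines):
    """True if any mapped shared object has been deleted on disk,
    i.e. the process is still running pre-upgrade code."""
    return any(line.rstrip().endswith('(deleted)') and '.so' in line
               for line in map_lines)

def stale_pids():
    """Scan /proc for processes that should be restarted (Linux only)."""
    pids = set()
    for maps_path in glob.glob('/proc/[0-9]*/maps'):
        try:
            with open(maps_path) as maps_file:
                if needs_restart(maps_file):
                    pids.add(int(maps_path.split('/')[2]))
        except IOError:  # process exited or is unreadable; skip it
            continue
    return pids
```

As noted above, this is only part of the story: it catches replaced libraries but not, say, changed configuration files, so it supplements rather than replaces a proper service-restart policy.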
Re: [openstack-dev] [Trove] how to list available configuration parameters for datastores
OK, with overwhelming support for #3: what if we modified #3 slightly? Looking at it again, it seems like we could shorten the path, since /datastores/datastore/configuration doesn't do anything by itself. Instead of #1 /datastores/datastore/configuration/parameters, maybe: #2 /datastores/datastore/parameters #3 /datastores/datastore/configurationparameters On Wed, Jan 22, 2014 at 2:27 PM, Denis Makogon dmako...@mirantis.com wrote: Good day to all. #3 looks more than acceptable. /datastores/datastore/configuration/parameters. According to the configuration parameters design, a configuration set must be associated with exactly one datastore. Best regards, Denis Makogon. 2014/1/22 Michael Basnight mbasni...@gmail.com On Jan 22, 2014, at 10:19 AM, Kaleb Pomeroy wrote: My thoughts so far: /datastores/datastore/configuration/parameters (Option Three) + configuration set without an associated datastore is meaningless + a configuration set must be associated with exactly one datastore + each datastore must have 0-1 configuration sets + All above relationships are immediately apparent - Listing all configuration sets becomes more difficult (which I don't think is a valid concern) +1 to option 3, given what kaleb and craig have outlined so far. I don't see the above minus as a valid concern either, kaleb.
Re: [openstack-dev] Changes coming in gate structure
On 01/22/2014 04:43 PM, Jay Pipes wrote: On Wed, 2014-01-22 at 15:39 -0500, Sean Dague wrote: snip == Executive Summary == To summarize, the effects of these changes will be: - 1) Decrease the impact of failures resetting the entire gate queue by doing the heavy testing in the check queue where changes are not dependent on each other. - 2) Run a slimmer set of jobs in the gate queue to maintain sanity, but not block as much on existing bugs in OpenStack. - 3) As a result, this should increase our confidence that changes put into the gate will pass. This will help prevent gate resets, and the disruption they cause by needing to invalidate and restart the whole gate queue. All good things, Sean ++. Might I also suggest one other thing that, IMO, would reduce gate contention? What if we added an option to git review that would inject something into the git commit message that would indicate the patch author felt the patch does not need to have integration testing run against it. Lots of patches make no substantive code changes and just clean up style, comments, or documentation. Having integration tests run for these patches is just noise and provides no value. There should be a way to indicate to Zuul not to run integration testing if some marker is in the commit message. For example, let us imagine that issuing: git review --skip-integration-tests would cause a git commit hook to execute that injected this marker into the commit message: Skip-Integration-Tests in the same way that the Change-Id commit hook injects the Change-Id: Ix line into the commit message. A -core reviewer would see Skip-Integration-Tests in the commit message. If the -core reviewer disagreed with the patch author that the patch did not have substantive code changes and actually wanted integration tests to be run for the patch, they could simply ask the patch author to run a git commit --amend and remove the Skip-Integration-Tests line from the commit message. 
If a -core reviewer did a +1A on a patch that had a Skip-Integration-Tests marker in the commit message, Zuul would simply not execute the integration tests and would only execute things like rebase/merge conflict checks, and if all those basic tests succeeded, merge the patch into the target branch. This should significantly reduce the gate contention IMO, and should not be difficult at all to implement. Best, -jay To my unlearned eye, this feels like something that would be abused fairly quickly. Then we would have to look at some form of rollback plan. Thanks, Anita.
Re: [openstack-dev] Changes coming in gate structure
Could you consider issuing the check job before forwarding to the gate only if the current patch set is not already (re)based against master? That way, if it is rebased and there was a successful check job, even a days-old one, a new run would not be needed. Perhaps?