Re: [openstack-dev] [nova] Rocky PTG summary - cells
I would also prefer not having to rely on reading all the cell DBs to calculate quotas.

On Thu, Mar 15, 2018 at 3:29 AM, melanie witt wrote:
> I would prefer not to block instance creations because of "down" cells,

++

> so maybe there is some possibility to avoid it if we can get
> "queued_for_delete" and "user_id" columns added to the instance_mappings
> table.

seems reason enough to add them from my perspective.

Regards,
Surya.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Rocky PTG summary - cells
Thanks for the reply, both solutions look reasonable.

On Thu, Mar 15, 2018 at 10:29 AM, melanie witt wrote:
> On Thu, 15 Mar 2018 09:54:59 +0800, Zhenyu Zheng wrote:
>> Thanks for the recap, got one question about the "block creation":
>>
>> * An attempt to create an instance should be blocked if the project
>> has instances in a "down" cell (the instance_mappings table has a
>> "project_id" column) because we cannot count instances in "down"
>> cells for the quota check.
>>
>> Since users are not aware of any cell information, and the cells are
>> mostly randomly selected, there is a high possibility that a project's
>> instances are spread evenly across cells. The proposed behavior could
>> easily leave many users unable to create instances because a single
>> cell is down; isn't that too harsh?
>
> To be honest, I share your concern. I had planned to change quota checks
> to use placement instead of reading cell databases ASAP but hit a snag
> where we won't be able to count instances from placement because we can't
> determine the "type" of an allocation. Allocations can be instances, or
> network-related resources, or volume-related resources, etc. Adding the
> concept of an allocation "type" in placement has been a controversial
> discussion so far.
>
> BUT ... we also said we would add a column like "queued_for_delete" to the
> instance_mappings table. If we do that, we could count instances from the
> instance_mappings table in the API database and count cores/ram from
> placement and no longer rely on reading cell databases for quota checks.
> Although, there is one more wrinkle: instance_mappings has a project_id
> column but does not have a user_id column, so we wouldn't be able to get a
> count by project + user needed for the quota check against user quota. So,
> if people would not be opposed, we could also add a "user_id" column to
> instance_mappings to handle that case.
>
> I would prefer not to block instance creations because of "down" cells, so
> maybe there is some possibility to avoid it if we can get
> "queued_for_delete" and "user_id" columns added to the instance_mappings
> table.
>
> -melanie
Re: [openstack-dev] [nova] Rocky PTG summary - cells
On Thu, 15 Mar 2018 09:54:59 +0800, Zhenyu Zheng wrote:
> Thanks for the recap, got one question about the "block creation":
>
> * An attempt to create an instance should be blocked if the project
> has instances in a "down" cell (the instance_mappings table has a
> "project_id" column) because we cannot count instances in "down"
> cells for the quota check.
>
> Since users are not aware of any cell information, and the cells are
> mostly randomly selected, there is a high possibility that a project's
> instances are spread evenly across cells. The proposed behavior could
> easily leave many users unable to create instances because a single
> cell is down; isn't that too harsh?

To be honest, I share your concern. I had planned to change quota checks
to use placement instead of reading cell databases ASAP but hit a snag
where we won't be able to count instances from placement because we can't
determine the "type" of an allocation. Allocations can be instances, or
network-related resources, or volume-related resources, etc. Adding the
concept of an allocation "type" in placement has been a controversial
discussion so far.

BUT ... we also said we would add a column like "queued_for_delete" to the
instance_mappings table. If we do that, we could count instances from the
instance_mappings table in the API database and count cores/ram from
placement and no longer rely on reading cell databases for quota checks.
Although, there is one more wrinkle: instance_mappings has a project_id
column but does not have a user_id column, so we wouldn't be able to get a
count by project + user needed for the quota check against user quota. So,
if people would not be opposed, we could also add a "user_id" column to
instance_mappings to handle that case.

I would prefer not to block instance creations because of "down" cells, so
maybe there is some possibility to avoid it if we can get
"queued_for_delete" and "user_id" columns added to the instance_mappings
table.

-melanie
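The proposal above (count instances from the instance_mappings table in the API database, gated on a new "queued_for_delete" column and a new "user_id" column) can be sketched as follows. This is a minimal illustration against a hypothetical, simplified schema; the real instance_mappings table has more columns, and this is not nova's actual quota code:

```python
import sqlite3

# Hypothetical, simplified instance_mappings schema. "user_id" and
# "queued_for_delete" are the columns proposed in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instance_mappings (
        instance_uuid TEXT PRIMARY KEY,
        project_id TEXT NOT NULL,
        user_id TEXT NOT NULL,
        queued_for_delete INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO instance_mappings VALUES (?, ?, ?, ?)",
    [
        ("uuid-1", "proj-a", "user-1", 0),
        ("uuid-2", "proj-a", "user-1", 1),  # queued for delete: not counted
        ("uuid-3", "proj-a", "user-2", 0),
    ],
)

def count_instances(project_id, user_id=None):
    """Count live instances from the API DB alone, with no cell DB reads."""
    query = ("SELECT COUNT(*) FROM instance_mappings "
             "WHERE project_id = ? AND queued_for_delete = 0")
    args = [project_id]
    if user_id is not None:
        query += " AND user_id = ?"
        args.append(user_id)
    return conn.execute(query, args).fetchone()[0]

print(count_instances("proj-a"))            # per-project count: 2
print(count_instances("proj-a", "user-1"))  # per-user count: 1
```

With cores/ram counted from placement, a query along these lines would let the quota check succeed even when some cell databases are unreachable.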
Re: [openstack-dev] [nova] Rocky PTG summary - cells
Thanks for the recap, got one question about the "block creation":

> * An attempt to create an instance should be blocked if the project has
> instances in a "down" cell (the instance_mappings table has a "project_id"
> column) because we cannot count instances in "down" cells for the quota
> check.

Since users are not aware of any cell information, and the cells are
mostly randomly selected, there is a high possibility that a project's
instances are spread evenly across cells. The proposed behavior could
easily leave many users unable to create instances because a single cell
is down; isn't that too harsh?

BR,
Kevin Zheng

On Thu, Mar 15, 2018 at 2:26 AM, Chris Dent wrote:
> On Wed, 14 Mar 2018, melanie witt wrote:
>> I’ve created a summary etherpad [0] for the nova cells session from the
>> PTG and included a plain text export of it on this email.
>
> Nice summary. Apparently I wasn't there or paying attention when
> something was decided:
>
>> * An attempt to delete an instance in a "down" cell should result in a
>> 500 or 503 error.
>
> Depending on how we look at it, this doesn't really align with what
> 500 or 503 are supposed to be used for. They are supposed to indicate
> that the web server is broken in some fashion: 500 being an
> unexpected and uncaught exception in the web server, 503 that the
> web server is either overloaded or down for maintenance.
>
> So, you could argue that 409 is the right thing here (as seems to
> always happen when we discuss these things). You send a DELETE to
> kill the instance, but the current state of the instance is "on a
> cell that can't be reached", which is in "conflict" with the state
> required to do a DELETE.
>
> If a 5xx is really necessary, for whatever reason, then 503 is a
> better choice than 500 because it at least signals that the broken
> thing is sort of "over there somewhere" rather than the web server
> having an error (which is what 500 is supposed to mean).
>
> --
> Chris Dent  ٩◔̯◔۶  https://anticdent.org/
> freenode: cdent  tw: @anticdent
Re: [openstack-dev] [nova] Rocky PTG summary - cells
On Wed, 14 Mar 2018, melanie witt wrote:
> I’ve created a summary etherpad [0] for the nova cells session from the
> PTG and included a plain text export of it on this email.

Nice summary. Apparently I wasn't there or paying attention when
something was decided:

> * An attempt to delete an instance in a "down" cell should result in a
> 500 or 503 error.

Depending on how we look at it, this doesn't really align with what
500 or 503 are supposed to be used for. They are supposed to indicate
that the web server is broken in some fashion: 500 being an
unexpected and uncaught exception in the web server, 503 that the
web server is either overloaded or down for maintenance.

So, you could argue that 409 is the right thing here (as seems to
always happen when we discuss these things). You send a DELETE to
kill the instance, but the current state of the instance is "on a
cell that can't be reached", which is in "conflict" with the state
required to do a DELETE.

If a 5xx is really necessary, for whatever reason, then 503 is a
better choice than 500 because it at least signals that the broken
thing is sort of "over there somewhere" rather than the web server
having an error (which is what 500 is supposed to mean).

--
Chris Dent  ٩◔̯◔۶  https://anticdent.org/
freenode: cdent  tw: @anticdent
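The status-code reasoning above can be encoded in a small sketch. This is purely illustrative, not nova's actual API code; the function and field names are hypothetical:

```python
from http import HTTPStatus

def delete_instance(instance, prefer_conflict=True):
    """Hypothetical DELETE handler illustrating the status-code choice.

    `instance` is a plain dict standing in for the real object; the
    "cell_down" key marks an instance whose cell DB is unreachable.
    """
    if instance.get("cell_down"):
        if prefer_conflict:
            # 409: the instance's current state ("its cell is unreachable")
            # conflicts with the state required to perform the DELETE.
            return HTTPStatus.CONFLICT
        # If a 5xx is required anyway, 503 at least signals the broken
        # thing is "over there somewhere", not a bug in the API server
        # itself (which is what 500 implies).
        return HTTPStatus.SERVICE_UNAVAILABLE
    # Normal path: the delete is accepted (sketch only).
    return HTTPStatus.NO_CONTENT

print(delete_instance({"cell_down": True}))  # HTTPStatus.CONFLICT (409)
```

Because `HTTPStatus` members are `IntEnum` values, they compare equal to the bare integers a WSGI layer would emit.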
[openstack-dev] [nova] Rocky PTG summary - cells
Hi everyone,

I’ve created a summary etherpad [0] for the nova cells session from the PTG and included a plain text export of it on this email.

Thanks,
-melanie

[0] https://etherpad.openstack.org/p/nova-ptg-rocky-cells-summary

*Cells: Rocky PTG Summary
https://etherpad.openstack.org/p/nova-ptg-rocky L11

*Key topics
 * How to handle a "down" cell
 * How to handle each cell having a separate ceph cluster
 * How do we plan to progress on removing "upcalls"

*Agreements and decisions
 * In order to list instances even when we can't connect to a cell database, we'll construct something minimal from the API database and we'll add a column to the instance_mappings table such as "queued_for_delete" to determine which are the non-deleted instances and then list them.
   * tssurya will write a spec for the new column.
 * We're not going to pursue the approach of having backup URLs for cell databases to fall back on when a cell is "down".
 * An attempt to delete an instance in a "down" cell should result in a 500 or 503 error.
 * An attempt to create an instance should be blocked if the project has instances in a "down" cell (the instance_mappings table has a "project_id" column) because we cannot count instances in "down" cells for the quota check.
 * At this time, we won't pursue the idea of adding an allocation "type" concept to placement (which could be leveraged for counting cores/ram resource usage for quotas).
 * The topic of each cell having a separate ceph cluster and having each cell cache images in the imagebackend led to the topic of the "cinder imagebackend" again.
   * Implementing a cinder imagebackend in nova would be an enormous undertaking that realistically isn't going to happen.
   * A pragmatic solution was suggested to make boot-from-volume a first class citizen and make automatic boot-from-volume work well, so that we let cinder handle the caching of images in this scenario (and of course handle all of the other use cases for cinder imagebackend). This would eventually lead to the deprecation of the ceph imagebackend. Further discussion is required on this.
 * On removing upcalls, progress in placement will help address the remaining upcalls.
   * dansmith will work on filtering compute hosts using the volume availability zone to address the cinder/cross_az_attach issue. mriedem and bauzas will review.
   * For the xenapi host aggregate upcall, the xenapi subteam will remove it as a patch on top of their live-migration support patch series.
   * For the server group late affinity check up-call for server create and evacuate, the plan is to handle it race-free with placement/scheduler. However, affinity modeling in placement isn't slated for work in Rocky, so the late affinity check upcall will have to be removed in S, at the earliest.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
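The first agreement above ("construct something minimal from the API database" when a cell is unreachable) can be sketched roughly like this. The data and field names are illustrative stand-ins, not the real instance_mappings schema or nova's listing code:

```python
# Illustrative API-DB contents: one mapping row per instance. The
# "queued_for_delete" field is the column proposed at the PTG.
API_DB_MAPPINGS = [
    {"instance_uuid": "uuid-1", "cell_id": 1, "project_id": "proj-a",
     "queued_for_delete": False},
    {"instance_uuid": "uuid-2", "cell_id": 2, "project_id": "proj-a",
     "queued_for_delete": False},
]
DOWN_CELLS = {2}  # cell 2's database cannot be reached

def list_instances(project_id):
    """List non-deleted instances, degrading gracefully for down cells."""
    results = []
    for m in API_DB_MAPPINGS:
        if m["project_id"] != project_id or m["queued_for_delete"]:
            continue  # queued_for_delete rows are treated as gone
        if m["cell_id"] in DOWN_CELLS:
            # Cell DB unreachable: return a partial record built purely
            # from API-DB data, flagged so clients know it is incomplete.
            results.append({"uuid": m["instance_uuid"], "status": "UNKNOWN"})
        else:
            # Normally the full record would be fetched from the cell DB;
            # a fixed status stands in for that lookup here.
            results.append({"uuid": m["instance_uuid"], "status": "ACTIVE"})
    return results

print(list_instances("proj-a"))
```

The point of the sketch is that the listing never blocks on an unreachable cell: every instance still appears, with a marker for the ones whose full state could not be loaded.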