Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation
Li Ma, This is interesting. In general I am in favor of expanding the scope of any read/write separation capabilities that we have. I'm not clear what exactly you are proposing, hopefully you can answer some of my questions inline. The thing I had thought of immediately was detection of whether an operation is read or write and integrating that into oslo.db or sqlalchemy. Mike Bayer has some thoughts on that[2] and there are other approaches around that can be copied/learned from. These sorts of things are clear to me and, while moving towards more transparency for the developer, still require context. Please, share with us more details on your proposal. -Mike [1] http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html [2] http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/ On Thu, Aug 7, 2014 at 10:03 PM, Li Ma skywalker.n...@gmail.com wrote: Getting a massive amount of information from data storage to be displayed is where most of the activity happens in OpenStack. The two activities of reading data and writing (creating, updating and deleting) data are fundamentally different. The optimization for these two opposite database activities can be done by physically separating the databases that service these two different activities. All the writes go to database servers, which then replicate the written data to the database server(s) dedicated to servicing the reads. Currently, AFAIK, many OpenStack deployments in production try to take advantage of a MySQL (including Percona or MariaDB) multi-master Galera cluster. It is possible to design and implement a read/write separation schema for such a DB cluster. I just want to clarify here: are you suggesting that _all_ reads and _all_ writes would hit different databases? It would be interesting to see a relational schema design that would allow that to work. That seems like something that you wouldn't try in a relational database at all. Actually, OpenStack has a method for read scalability via defining master_connection and slave_connection in configuration, but this method lacks flexibility because the choice of master or slave is made in the logical context (code). It's not transparent to the application developer. As a result, it is not widely used across the OpenStack projects. So, I'd like to propose a transparent read/write separation method for oslo.db that every project can take advantage of without any code modification. The problem with making it transparent to the developer is that, well, you can't, unless your application is tolerant of old data in an asynchronous replication world. If you are in a fully synchronous world you could fully separate writes and reads, but what would be the point, since your database performance is now trash anyway. Please note that although Galera is considered a synchronous model, it's not actually all the way there. You can break the certification of course, but there are also things that are done to keep the performance at an acceptable level. Take for example the wsrep_causal_reads configuration parameter[1]. Without this sucker being turned on you can't make read/write separation transparent to the developer. Turning it on causes a significant performance degradation, unfortunately. I feel like this is a problem fundamental to a consistent relational dataset. If you are okay with eventual consistency, you can make things transparent to the developer.
But by its very nature relational datasets are, well, relational; they need all the other pieces and those pieces need to be consistent. I guess what I am saying is that your proposal needs more details. Please respond with specifics and examples to move the discussion forward. Moreover, I'd like to put it in the mailing list in advance to make sure it is acceptable for oslo.db. I'd appreciate any comments. br. Li Ma ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
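[Editor's note: a minimal sketch of the master_connection/slave_connection pattern the thread refers to, assuming two SQLAlchemy engines; get_session and its use_slave flag are illustrative stand-ins, not the oslo.db API.]

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    master_engine = create_engine("mysql://user:pass@master/nova")
    slave_engine = create_engine("mysql://user:pass@slave/nova")

    MasterSession = sessionmaker(bind=master_engine)
    SlaveSession = sessionmaker(bind=slave_engine)

    def get_session(use_slave=False):
        # The caller decides in code whether stale reads are acceptable --
        # this is the "not transparent" part Li Ma objects to.
        return SlaveSession() if use_slave else MasterSession()

    # e.g. a periodic task that tolerates slightly stale data:
    session = get_session(use_slave=True)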
Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation
Lee, No problem about mixing up the Mikes, there's a bunch of us out there :-). What you are describing here is very much like a spec I wrote for Nova[1] a couple of months ago and then never got back to. At the time I considered gearing the feature toward oslo.db and I can't remember exactly why I didn't. I think it probably had more to do with having folks that are familiar with the problem reviewing code in Nova than anything else. Anyway, I'd like to revisit this in Kilo, or if you see a nice way to integrate this into oslo.db I'd love to see your proposal. -Mike [1] https://review.openstack.org/#/c/93466/ On Sun, Aug 10, 2014 at 10:30 PM, Li Ma skywalker.n...@gmail.com wrote: not sure if I said that :). I know extremely little about galera. Hi Mike Bayer, I'm so sorry I mistook you for Mike Wilson in the last post. :-) Also, sorry to Mike Wilson. I'd totally guess that Galera would need to first have SELECTs come from a slave node, then the moment it sees any kind of DML / writing, it transparently switches the rest of the transaction over to a writer node. You are totally right.

    @transaction.writer
    def read_and_write_something(arg1, arg2, ...):
        # ...

    @transaction.reader
    def only_read_something(arg1, arg2, ...):
        # ...

The first approach that I had in mind is the decorator-based method to separate read/write ops, like what you said. To some degree, it is almost the same app-level approach as the master/slave configuration with respect to transparency to developers. However, as I stated before, the current approach is rarely used in OpenStack. A decorator is friendlier than a use_slave flag or something like that. If full transparency cannot be achieved, then at the very least, decorator-based app-level switching is a great improvement compared with the current implementation. OK so Galera would perhaps have some way to make this happen, and that's great. If there is any Galera expert here, please correct me. At least in my experiments, transactions work that way. this (the word "integrate", and what does that mean) is really the only thing making me nervous. Mike, no need to be nervous. What I'd like to do is to add a django-style routing method as a plus in oslo.db, like:

    [database]
    # Original master/slave configuration
    master_connection =
    slave_connection =
    # Only support synchronous replication
    enable_auto_routing = True

    [db_cluster]
    master_connection = ...
    master_connection = ...
    slave_connection = ...
    slave_connection = ...

HOWEVER, I think it needs more investigation, so this is why I'd like to put it on the mailing list at this early stage, to raise some in-depth discussion. I'm not a Galera expert. I really appreciate any challenges here. Thanks, Li Ma - Original Message - From: Mike Bayer mba...@redhat.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Sunday, August 10, 2014, 11:57:47 PM Subject: Re: [openstack-dev] [oslo.db]A proposal for DB read/write separation On Aug 10, 2014, at 11:17 AM, Li Ma skywalker.n...@gmail.com wrote: How about a Galera multi-master cluster? As Mike Bayer said, it is virtually synchronous by default. It is still possible that outdated rows are queried, which makes results unstable. not sure if I said that :). I know extremely little about galera. Let's move forward to synchronous replication, like Galera with causal-reads on. The dominant advantage is that it has consistent relational dataset support. The disadvantages are that it uses optimistic locking and its performance sucks (also said by Mike Bayer :-).
For the optimistic locking problem, I think it can be dealt with by retry-on-deadlock. It's not the topic here. I *really* don't think I said that, because I like optimistic locking, and I've never used Galera ;). Where I am ignorant here is of what exactly occurs if you write some rows within a transaction with Galera, then do some reads in that same transaction. I'd totally guess that Galera would need to first have SELECTs come from a slave node, then the moment it sees any kind of DML / writing, it transparently switches the rest of the transaction over to a writer node. No idea, but it has to be something like that? So, the transparent read/write separation is dependent on such an environment. The SQLAlchemy tutorial provides a code sample for it [1]. Besides, Mike Bayer also provides a blog post for it [2]. So this thing with the "django-style routers", the way that example is, it actually would work poorly with a Session that is not in "autocommit" mode, assuming you're working with regular old databases that are doing some simple behind-the-scenes replication. Because again, if you do a flush, those rows go to the master; if the transaction is still open, then reading from the slaves you won't see the rows you just inserted. So in reality, that example is kind of crappy
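[Editor's note: for reference, a rough sketch of the "django-style router" approach under discussion, following the pattern in Mike Bayer's blog post: override Session.get_bind() so flushes go to the master and plain reads go to a slave. The engine URLs and the round-robin choice are illustrative only.]

    import random
    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session, sessionmaker

    master = create_engine("mysql://user:pass@master/nova")
    slaves = [create_engine("mysql://user:pass@slave1/nova"),
              create_engine("mysql://user:pass@slave2/nova")]

    class RoutingSession(Session):
        def get_bind(self, mapper=None, clause=None):
            if self._flushing:
                # writes (INSERT/UPDATE/DELETE) always go to the master
                return master
            return random.choice(slaves)

    # autocommit mode matters here, as noted above: within a long-lived
    # transaction, reads after a flush would miss the just-written rows.
    SessionFactory = sessionmaker(class_=RoutingSession, autocommit=True)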
Re: [openstack-dev] [Oslo] First steps towards amqp 1.0
This is the first time I've heard of the dispatch router, I'm really excited now that I've looked at it a bit. Thx Gordon and Russell for bringing this up. I'm very familiar with the scaling issues associated with any kind of brokered messaging solution. We grew an OpenStack installation to about 7,000 nodes and started having significant scaling issues with the qpid broker. We've talked about our problems at a couple of summits in a fair amount of detail[1][2]. I won't bother repeating the information in this thread. I really like the idea of separating the logic of routing away from the message emitter. Russell mentioned the 0mq matchmaker; we essentially ditched the qpid broker for direct communication via 0mq and its matchmaker. It still has a lot of problems which dispatch seems to address. For example, in ceilometer we have store-and-forward behavior as a requirement. This kind of communication requires a broker, but 0mq doesn't really officially support one, which means we would probably end up with some broker as part of OpenStack. Matchmaker is also a fairly basic implementation of what is essentially a directory. For any sort of serious production use case you end up sprinkling JSON files all over the place or maintaining a Redis backend. I feel like the matchmaker needs a bunch more work to make modifying the directory simpler for operations. I would rather put that work into a separate project like dispatch than have to maintain what is essentially a one-off in OpenStack's codebase. I wonder how this fits into messaging from a driver perspective in OpenStack, or even how this fits into oslo.messaging? Right now we have topics for binaries (compute, network, consoleauth, etc.), hostname.service_topic for nodes, a fanout queue per node (not sure if kombu also has this) and different exchanges per project. If we can abstract the routing from the emission of the message, all we really care about is emitter, endpoint, messaging pattern (fanout, store and forward, etc.). Also not sure if there's a dispatch analogue in the rabbit world; if not we need to have some mapping of concepts etc. between impls. So many questions, but in general I'm really excited about this and eager to contribute. For sure I will start playing with this in Bluehost's environments that haven't been completely 0mqized. I also have some lingering concerns about qpid in general. Beyond scaling issues I've run into some other terrible bugs that motivated our move away from it. Again, these are mentioned in our presentations at summits and I'd be happy to talk more about them in a separate discussion. I've also been able to talk to some other qpid+openstack users who have seen the same bugs. Another large installation that comes to mind is Qihoo 360 in China. They run a few thousand nodes with qpid for messaging and are familiar with the snags we run into. Gordon, I would really appreciate it if you could watch those two talks and comment. The bugs are probably separate from the dispatch router discussion, but it does dampen my enthusiasm a bit not knowing how to fix issues beyond scale :-(.
-Mike Wilson [1] http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment [2] http://www.openstack.org/summit/openstack-summit-hong-kong-2013/session-videos/presentation/going-brokerless-the-transition-from-qpid-to-0mq On Mon, Dec 9, 2013 at 4:29 PM, Mark McLoughlin mar...@redhat.com wrote: On Mon, 2013-12-09 at 16:05 +0100, Flavio Percoco wrote: Greetings, As $subject mentions, I'd like to start discussing the support for AMQP 1.0[0] in oslo.messaging. We already have rabbit and qpid drivers for earlier (and different!) versions of AMQP, the proposal would be to add an additional driver for a _protocol_ not a particular broker. (Both RabbitMQ and Qpid support AMQP 1.0 now). By targeting a clear mapping onto a protocol, rather than a specific implementation, we would simplify the task in the future for anyone wishing to move to any other system that spoke AMQP 1.0. That would no longer require a new driver, merely different configuration and deployment. That would then allow OpenStack to more easily take advantage of any emerging innovations in this space. Sounds sane to me. To put it another way, assuming all AMQP 1.0 client libraries are equal, all the operator cares about is that we have a driver that connects into whatever AMQP 1.0 messaging topology they want to use. Of course, not all client libraries will be equal, so if we don't offer the choice of library/driver to the operator, then the onus is on us to pick the best client library for this driver. (Enjoying the rest of this thread too, thanks to Gordon for his insights) Mark. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
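[Editor's note: as context for the "abstract the routing from the emission" point, oslo.messaging already separates emitter, endpoint and pattern from the underlying driver, so a transport swap is a configuration change rather than a code change. A hedged sketch; the empty context dict, method names and arguments are made up for illustration.]

    from oslo.config import cfg
    import oslo.messaging as messaging

    transport = messaging.get_transport(cfg.CONF)  # driver chosen by config

    # direct cast to one node's service (hostname.service_topic style)
    target = messaging.Target(topic='compute', server='node-1')
    client = messaging.RPCClient(transport, target)
    client.cast({}, 'pause_instance', instance_uuid='abc123')

    # fanout to every consumer of the topic
    client.prepare(fanout=True).cast({}, 'update_service_capabilities',
                                     capabilities={})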
Re: [openstack-dev] Neutron Distributed Virtual Router
I guess the question that immediately comes to mind is, is there anyone that doesn't want a distributed router? I guess there could be someone out there that hates the idea of traffic flowing in a balanced fashion, but can't they just run a single router then? Does there really need to be some flag to disable/enable this behavior? Maybe I am oversimplifying things... you tell me. -Mike Wilson On Mon, Dec 9, 2013 at 3:01 PM, Vasudevan, Swaminathan (PNB Roseville) swaminathan.vasude...@hp.com wrote: Hi Folks, We are in the process of defining the API for the Neutron Distributed Virtual Router, and we have a question. Just wanted to get the feedback from the community before we implement and post for review. We are planning to use the “distributed” flag for the routers that are supposed to be routing traffic locally (both East West and North South). This “distributed” flag is already there in the “neutronclient” API, but currently only utilized by the “Nicira Plugin”. We would like to go ahead and use the same “distributed” flag and add an extension to the router table to accommodate the “distributed flag”. Please let us know your feedback. Thanks. Swaminathan Vasudevan Systems Software Engineer (TC) HP Networking Hewlett-Packard 8000 Foothills Blvd M/S 5541 Roseville, CA - 95747 tel: 916.785.0937 fax: 916.785.1815 email: swaminathan.vasude...@hp.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
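[Editor's note: purely as illustration of the API being proposed; the "distributed" flag already exists in python-neutronclient for the Nicira plugin, but whether the extension ends up looking exactly like this is what the thread is deciding.]

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://keystone:5000/v2.0')

    # a router that routes traffic locally (both East-West and North-South)
    router = neutron.create_router(
        {'router': {'name': 'dvr-router', 'distributed': True}})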
[openstack-dev] [Neutron] Running multiple neutron-servers
Hi Neutron team, I haven't been involved in neutron meetings for quite some time so I'm not sure where we are on this at this point. It is often recommended in OpenStack guides and other operational materials to run multiple neutron-servers to deal with the API load from Nova. Things like the _heal_instance_info_caches periodic task as well as just normal create requests are pretty heavy. Those issues aside, I think we can all agree that it would be good for the neutron-server to be horizontally scalable. I don't have a handle on all the issues surrounding this. However, I did report a bug a few months ago about concurrency and updates to the IpAvailabilityRanges[1]. There was a fix proposed by Zhang Hua [2] that seems like it needs more discussion. Essentially, Salvatore has concerns about patching up a design flaw, from what I gather. At the same time, we have had this issue since the initial release of neutron (quantum) and it is still a really big deal for deployers. I would like to propose that we pick up the conversation where it left off on the proposed fix and _also_ consider any possible redesign going forward. Could I get some feedback from Salvatore specifically and other members of the team on this? I would also be happy to pitch in towards whatever solution is decided on, provided we can rescue the poor deployers :-). -Mike Wilson [1] https://bugs.launchpad.net/neutron/+bug/1214115 [2] https://review.openstack.org/43275 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
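[Editor's note: for readers unfamiliar with the bug, a sketch (not the proposed fix; IPAvailabilityRange and _carve_ip_from_range are hypothetical stand-ins) of the two usual mitigations for this kind of race: lock the candidate row so concurrent allocations serialize, and retry when the database reports a deadlock.]

    from sqlalchemy.exc import OperationalError

    MAX_RETRIES = 5

    def allocate_ip(session, subnet_id):
        for attempt in range(MAX_RETRIES):
            try:
                with session.begin(subtransactions=True):
                    ip_range = (session.query(IPAvailabilityRange)
                                .filter_by(subnet_id=subnet_id)
                                .with_lockmode('update')  # SELECT ... FOR UPDATE
                                .first())
                    return _carve_ip_from_range(session, ip_range)
            except OperationalError:
                # e.g. MySQL deadlock error 1213; retry the whole allocation
                if attempt == MAX_RETRIES - 1:
                    raise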
Re: [openstack-dev] [nova] Simulating many fake nova compute nodes for scheduler testing
On Mon, Mar 3, 2014 at 3:10 PM, Sergey Skripnick sskripn...@mirantis.com wrote: I can run multiple compute services on the same host without containers. Containers give you nice isolation and another way to try a more realistic scenario, but my initial goal now is to be able to simulate a many-fake-compute-node scenario with as few resources as possible. I believe it is impossible to use threads without changes in the code. Having gone the threads route once myself, I can say from experience that it requires changes to the code. I was able to get threads up and running with a few modifications, but there were other issues that I never fully resolved that make me lean more towards the container model that has been discussed earlier in the thread. Btw, I would suggest having a look at Rally, the OpenStack Benchmarking Service. They have deployment frameworks that use LXC that you might be able to write a thread model for. -Mike -- Regards, Sergey Skripnick ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
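[Editor's note: a heavily hedged sketch of the "many fake computes, few resources" idea: start several nova-compute services in one process, each with its own host name and the fake virt driver. This follows the general shape of nova's service API of the time, not a tested recipe, and the thread-safety issues mentioned above may still bite.]

    from oslo.config import cfg

    from nova import config, service
    from nova.openstack.common import service as os_service

    config.parse_args([], default_config_files=['/etc/nova/nova.conf'])
    cfg.CONF.set_override('compute_driver', 'nova.virt.fake.FakeDriver')

    launcher = os_service.ServiceLauncher()
    for i in range(100):
        # each service registers itself as a distinct compute node
        srv = service.Service.create(binary='nova-compute',
                                     host='fake-compute-%03d' % i)
        launcher.launch_service(srv)
    launcher.wait()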
Re: [openstack-dev] [Neutron][LBaaS] Mini-summit Interest?
Hangouts worked well at the nova mid-cycle meetup. Just make sure you have your network situation sorted out beforehand. Bandwidth and firewalls are what come to mind immediately. -Mike On Tue, Mar 11, 2014 at 9:34 AM, Tom Creighton tom.creigh...@rackspace.com wrote: When the Designate team had their mini-summit, they had an open Google Hangout for remote participants. We could even have an open conference bridge if you are not partial to video conferencing. With the issue of inclusion solved, let's focus on a date that is good for the team! Cheers, Tom Creighton On Mar 10, 2014, at 4:10 PM, Edgar Magana emag...@plumgrid.com wrote: Eugene, I have a few arguments why I believe this is not 100% inclusive: * Is the foundation involved in this process? How? What is the budget? Who is responsible from the foundation side? * If somebody has already made travel arrangements, it won't be possible to make changes at no cost. * Staying extra days in a different city could impact anyone's budget. * As an OpenStack developer, I want to understand why the summit is not enough for deciding the next steps for each project. If that is the case, I would prefer to make changes to the organization of the summit instead of creating mini-summits all around! I could continue but I think these are good enough. I could agree with your point about previous summits being distracting for developers; this is why this time the OpenStack foundation is trying very hard to allocate specific days for the conference and specific days for the summit. The point where I totally agree with you is that we SHOULD NOT have sessions about work that will be done no matter what! Those are just a waste of good time that could be invested in very interesting discussions about topics that are still not clear. I would recommend that you express this opinion to Mark. He is the right guy to decide which sessions will bring interesting discussions and which ones will be just declarations of intent. Thanks, Edgar From: Eugene Nikanorov enikano...@mirantis.com Reply-To: OpenStack List openstack-dev@lists.openstack.org Date: Monday, March 10, 2014 10:32 AM To: OpenStack List openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Mini-summit Interest? Hi Edgar, I'm neutral on the suggestion of a mini-summit at this point. Why do you think it will exclude developers? If we keep it 1-3 days prior to the OS Summit in Atlanta (i.e. in the same city) that would allow anyone who joins the OS Summit to save on extra travelling. The OS Summit itself is too distracting to have really productive discussions, unless you skip the sessions and spend the time discussing. For instance, design sessions are basically only good for declarations of intent, but not for real discussion of a complex topic at a meaningful level of detail. What would be your suggestions to make this more inclusive? I think the time and place are the key here - hence Atlanta and a few days prior to the OS summit. Thanks, Eugene. On Mon, Mar 10, 2014 at 10:59 PM, Edgar Magana emag...@plumgrid.com wrote: Team, I found that having a mini-summit on very short notice means excluding a lot of developers from such an interesting topic for Neutron. The OpenStack summit is the opportunity for all developers to come together and discuss the next steps; there are many developers that CANNOT afford another trip for a special summit. I am personally against that and I do support Mark's proposal of having all the conversation over IRC and the mailing list.
Please, do not start excluding people who won't be able to attend another face-to-face meeting besides the summit. I believe that these are the little things that make an open source community weak if we do not control them. Thanks, Edgar On 3/6/14 9:51 PM, Mark McClain mmccl...@yahoo-inc.com wrote: On Mar 6, 2014, at 4:31 PM, Jay Pipes jaypi...@gmail.com wrote: On Thu, 2014-03-06 at 21:14 +0000, Youcef Laribi wrote: +1 I think if we can have it before the Juno summit, we can take concrete, well-thought-out proposals to the community at the summit. Unless something has changed starting at the Hong Kong design summit (which unfortunately I was not able to attend), the design summits have always been a place to gather to *discuss* and *debate* proposed blueprints and design specs. It has never been about a gathering to rubber-stamp proposals that have already been hashed out in private somewhere else. You are correct that that is the goal of the design summit. While I do think it is wise to discuss the next steps with LBaaS at this point in time, I am not a proponent of in-person mini-design summits. Many contributors to LBaaS are distributed all over the globe, and scheduling a mini summit on short notice will exclude
Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)
Undeleting things is an important use case in my opinion. We do this in our environment on a regular basis. In that light I'm not sure that it would be appropriate just to log the deletion and get rid of the row. I would like to see it go to an archival table where it is easily restored. -Mike On Mon, Mar 10, 2014 at 3:44 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: Sounds like a good idea to me. I've never understood why we treat the DB as a LOG (keeping deleted == 0 records around) when we should just use a LOG (or similar system) to begin with instead. Does anyone use the feature of switching deleted == 1 back to deleted = 0? Has this worked out for u? Seems like some of the feedback on https://etherpad.openstack.org/p/operators-feedback-mar14 also suggests that this has been an operational pain-point for folks ("Tool to delete things properly" suggestions and such...). From: Boris Pavlovic bpavlo...@mirantis.com Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Monday, March 10, 2014 at 1:29 PM To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Victor Sergeyev vserge...@mirantis.com Subject: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step) Hi stackers, (It's a proposal for Juno.)

Intro: Soft deletion means that records are not actually deleted from the DB, they are just marked as deleted. To mark a record as deleted, we put the record's ID value in a special deleted column.

Issue 1: Indexes & queries. We have to add AND deleted == 0 to every query to get non-deleted records. This produces a performance issue, because we have to add one extra column to every index. It also produces extra complexity in DB migrations and query building.

Issue 2: Unique constraints. Why do we store the ID in deleted and not True/False? The reason is that we would like to be able to create real DB unique constraints and avoid race conditions on insert operations. Example: we have a table (id, name, password, deleted) and we would like the name column to hold only unique values. Approach without UC: if count(`select where name = name`) == 0: insert(...) (race condition, because a new record can be added in between). Approach with UC: try: insert(...) except Duplicate: ... So to add a UC we have to put it on (name, deleted) (to be able to do insert/delete/insert with the same name). This also produces performance issues, because we have to use complex unique constraints on 2 or more columns, plus extra code complexity in DB migrations.

Issue 3: Garbage collector. It is really hard to make a garbage collector that has good performance and is generic enough to work in every case for every project. Without a garbage collector, DevOps have to clean up records by hand (risking breaking something). If they don't clean up the DB, they will soon hit performance issues.

To put the most important issues in a nutshell: 1) Extra complexity in each select query and an extra column in each index 2) An extra column in each unique constraint (worse performance) 3) 2 extra columns in each table: (deleted, deleted_at) 4) A common garbage collector is required. To resolve all these issues we should just remove soft deletion. One approach that I see is removing the deleted column from every table step by step, probably with code refactoring.
Actually we have 3 different cases:
1) We don't use soft deleted records:
1.1) Do .delete() instead of .soft_delete()
1.2) Change queries to avoid adding the extra deleted == 0 to each query
1.3) Drop the deleted and deleted_at columns
2) We use soft deleted records for internal stuff, e.g. periodic tasks:
2.1) Refactor the code somehow, e.g. store all data required by the periodic task in a special table that has (id, type, json_data) columns
2.2) On delete, add a record to this table
2.3-2.5) similar to 1.1, 1.2, 1.3
3) We use soft deleted records in the API:
3.1) Deprecate the API call if possible
3.2) Make a proxy call to ceilometer from the API
3.3) On .delete(), store info about the records in ceilometer (or somewhere else)
3.4-3.6) similar to 1.1, 1.2, 1.3
This is not a finished roadmap, just base thoughts to start a constructive discussion in the mailing list, so %stacker% your opinion is very important! Best regards, Boris Pavlovic ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
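[Editor's note: what Boris describes, as a concrete illustrative SQLAlchemy model. This mirrors the oslo SoftDeleteMixin pattern, where deleted is 0 for live rows and holds the row's own id after soft-deletion, so the unique constraint must span (name, deleted) to allow insert/delete/insert of the same name; the Service model itself is a made-up example.]

    from sqlalchemy import Column, Integer, String, UniqueConstraint
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Service(Base):
        __tablename__ = 'services'
        __table_args__ = (UniqueConstraint('name', 'deleted'),)

        id = Column(Integer, primary_key=True)
        name = Column(String(255))
        deleted = Column(Integer, default=0)  # 0 = live; set to id on delete

        def soft_delete(self, session):
            self.deleted = self.id  # keeps (name, deleted) unique for re-insert
            session.add(self)

    # Every read then needs the extra predicate Boris complains about:
    # session.query(Service).filter_by(name='n1', deleted=0)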
Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)
The restore use case is for sure inconsistently implemented and used. I think I agree with Boris that we should treat it as separate and just move on with cleaning up soft delete. I imagine most deployments don't like having most of the rows in their table be useless and making db access slow? That being said, I am a little sad my hacky restore method will need to be reworked :-). -Mike On Thu, Mar 13, 2014 at 1:30 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Tim Bell's message of 2014-03-12 11:02:25 -0700: If you want to archive images per se, on deletion just export it to a 'backup tape' (for example) and store enough of the metadata on that 'tape' to re-insert it if this is really desired, and then delete it from the database (or do the export... asynchronously). The same could be said of VMs, although likely not all resources, aka networks/.../, make sense to do this with. So instead of deleted = 1, wait for cleaner, just save the resource (if possible) + enough metadata on some other system ('backup tape', alternate storage location, hdfs, ceph...) and leave it there unless it's really needed. Making the database (and all associated code) more complex to achieve this same goal seems like a hack that just needs to be addressed with a better way to do archiving. In a cloudy world of course people would be able to recreate everything they need on-demand, so who needs undelete anyway ;-) I have no problem if there was an existing process integrated into all of the OpenStack components which would produce me an archive trail with metadata and a command to recover the object from that data. Currently, my understanding is that there is no such function and thus the proposal to remove the deleted column is premature. That seems like an unreasonable request of low-level tools like Nova. End user applications and infrastructure management should be responsible for these things and will do a much better job of it, as you can work your own business needs for reliability and recovery speed into an application-aware solution. If Nova does it, your cloud just has to provide everybody with the same un-delete, which is probably overkill for _many_ applications. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [db][all] (Proposal) Restorable Delayed deletion of OS Resources
After a read-through it seems pretty good. +1 On Thu, Mar 13, 2014 at 1:42 PM, Boris Pavlovic bpavlo...@mirantis.com wrote: Hi stackers, As a result of the discussion: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step) http://osdir.com/ml/openstack-dev/2014-03/msg00947.html I understood that there should be another proposal. About how we should implement Restorable Delayed Deletion of OpenStack Resources in a common way, without these hacks with soft deletion in the DB. It is actually very simple; take a look at this document: https://docs.google.com/document/d/1WGrIgMtWJqPDyT6PkPeZhNpej2Q9Mwimula8S8lYGV4/edit?usp=sharing Best regards, Boris Pavlovic ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)
that will be in oslo) 2) Split the work of getting rid of soft deletion into steps (that I already mentioned): a) remove soft deletion from places where we are not using it b) replace internal code where we are using soft deletion with that framework c) replace API stuff using ceilometer (for logs) or this framework (for restorable stuff) To put it in a nutshell: Restoring deleted resources / Delayed deletion != Soft deletion. Best regards, Boris Pavlovic On Thu, Mar 13, 2014 at 9:21 PM, Mike Wilson geekinu...@gmail.com wrote: For some guests we use the LVM imagebackend and there are times when the guest is deleted by accident. Humans, being what they are, don't back up their files and don't take care of important data, so it is not uncommon to use lvrestore and undelete an instance so that people can get their data. Of course, this is not always possible if the data has been subsequently overwritten. But it is common enough that I imagine most of our operators are familiar with how to do it. So I guess my saying that we do it on a regular basis is not quite accurate. It would probably be better to say that it is not uncommon to do this, but definitely not a daily task or something of that ilk. I have personally undeleted an instance a few times after accidental deletion also. I can't remember the specifics, but I do remember doing it :-). -Mike On Tue, Mar 11, 2014 at 12:46 PM, Johannes Erdfelt johan...@erdfelt.com wrote: On Tue, Mar 11, 2014, Mike Wilson geekinu...@gmail.com wrote: Undeleting things is an important use case in my opinion. We do this in our environment on a regular basis. In that light I'm not sure that it would be appropriate just to log the deletion and get rid of the row. I would like to see it go to an archival table where it is easily restored. I'm curious, what are you undeleting and why? JE ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo.messaging] [zeromq] nova-rpc-zmq-receiver bottleneck
Hi Yatin, I'm glad you are thinking about the drawbacks the zmq-receiver causes. I want to give you a reason to keep the zmq-receiver, and get your feedback. The way I think about the zmq-receiver is as a tiny little mini-broker that exists separate from any other OpenStack service. As such, its implementation can be augmented to support store-and-forward and possibly other messaging behaviors that are desirable for ceilometer currently and possibly other things in the future. Integrating the receiver into each service is going to remove its independence and black-box nature and give it all the bugs and quirks of any project it gets lumped in with. I would prefer that we continue to improve zmq-receiver to overcome the tough parts. Either that or find a good replacement and use that. An example of a possible replacement might be the qpid dispatch router[1], although this guy explicitly wants to avoid any store-and-forward behaviors. Of course, dispatch router is going to be tied to qpid; I just wanted to give an example of something with similar functionality. -Mike On Thu, Mar 13, 2014 at 11:36 AM, yatin kumbhare yatinkumbh...@gmail.com wrote: Hello Folks, When zeromq is used as the rpc backend, the nova-rpc-zmq-receiver service needs to be run on every node. zmq-receiver receives messages on tcp://*:9501 with socket type PULL and, based on the topic name (which is extracted from the received data), it forwards data to the respective local services over the IPC protocol. Meanwhile, OpenStack services listen/bind on an IPC socket with socket type PULL. I see zmq-receiver as a bottleneck and an overhead in the current design. 1. if this service crashes: communication is lost. 2. overhead of running this extra service on every node, which just forwards messages as-is. I'm looking forward to removing the zmq-receiver service and enabling direct communication (nova-* and cinder-*) across and within nodes. I believe this will make the zmq experience more seamless. The communication will change from IPC to a zmq TCP socket type for each service, like: rpc.cast from scheduler to compute would be direct rpc message passing, with no routing through zmq-receiver. Now, with TCP, all services will bind to a unique port (the port range could be 9501-9510). From nova.conf: rpc_zmq_matchmaker = nova.openstack.common.rpc.matchmaker_ring.MatchMakerRing. I have put arbitrary port numbers after the service names. file:///etc/oslo/matchmaker_ring.json

    {
        "cert:9507": ["controller"],
        "cinder-scheduler:9508": ["controller"],
        "cinder-volume:9509": ["controller"],
        "compute:9501": ["controller", "computenodex"],
        "conductor:9502": ["controller"],
        "consoleauth:9503": ["controller"],
        "network:9504": ["controller", "computenodex"],
        "scheduler:9506": ["controller"],
        "zmq_replies:9510": ["controller", "computenodex"]
    }

Here, the json file keeps track of the port for each service. Looking forward to community feedback on this idea. Regards, Yatin ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] BUG? nova-compute should delete unused instance files on boot
+1 to what Chris suggested. A zombie state that doesn't affect quota, but doesn't create more problems by trying to reuse resources that aren't available. That way we can tell the customer that things are deleted, but we don't need to break our cloud by screwing up future scheduling requests. -Mike On Tue, Oct 8, 2013 at 11:58 AM, Joshua Harlow harlo...@yahoo-inc.com wrote: Sure, basically a way around this is to do migration of the VMs on the host u are doing maintenance on. That's one way y! has its ops folks work around this. Another solution is just don't do local_deletes :-P It sounds like your 'zombie' state would be useful as a way to solve this also. To me though any solution that creates 2 sets of the same resources in your cloud isn't a good way (which afaik is what the current local_delete aims for), as it causes maintenance and operator pain (and needless problems that a person has to go in, figure out and resolve). I'd rather have the delete fail, leave the quota of the user alone, and tell the user that the hypervisor the VM is on is currently under maintenance (ideally the `host-update` resolves this, as long as it's supported on all hypervisor types). At least that gives a sane operational experience and doesn't cause support bugs that are hard to resolve. But maybe this type of action should be more configurable. Allow or disallow local deletes. On 10/7/13 11:50 PM, Chris Friesen chris.frie...@windriver.com wrote: On 10/07/2013 05:30 PM, Joshua Harlow wrote: A scenario that I've seen: Take 'nova-compute' down for a software upgrade, API still accessible since you want to provide API uptime (aka not taking the whole cluster offline). User Y deletes a VM on the hypervisor where nova-compute is currently down, the DB locally deletes, and at this point VM 'A' is still active but nova thinks it's not. Isn't this sort of thing exactly what nova host-update --maintenance enable hostname was intended for? I.e., push all the VMs off that compute node so you can take down the services without causing problems. It's kind of a pain that the host-update stuff is implemented at the hypervisor level though (and isn't available for libvirt); it seems like it could be implemented at a more generic level. (And on that note, why isn't there a host table in the database, since we can have multiple services running on one host and we might want to take them all down?) Alternately, maybe we need to have a 2-stage delete, where the VM gets put into a zombie state in the database and the resources can't be reused until the compute service confirms that the VM has been killed. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
I need to understand better what holistic scheduling means, but I agree with you that this is not exactly what Boris has raised as an issue. I don't have a rock-solid design for what I want to do, but at least the objectives I want to achieve are that spinning up more schedulers improves your response time and ability to schedule, perhaps at the cost of the accuracy of the answer (just good enough) and the need to retry your request against several scheduler threads. I will try to look for more resources to understand holistic scheduling; a quick Google search takes me to a bunch of EE and manufacturing-engineering type papers. I'll do more research on this. However, this does fit under performance for sure; it is not unrelated at all. If there is a chance to incorporate this into a performance session I think this is where it belongs. -Mike Wilson On Mon, Oct 14, 2013 at 9:53 PM, Mike Spreitzer mspre...@us.ibm.com wrote: Yes, Rethinking Scheduler Design http://summit.openstack.org/cfp/details/34 is not the same as the performance issue that Boris raised. I think the former would be a natural consequence of moving to an optimization-based joint decision-making framework, because such a thing necessarily takes a "good enough" attitude. The issue Boris raised is more efficient tracking of the true state of resources, and I am interested in that issue too. A holistic scheduler needs such tracking, in addition to the needs of the individual services. Having multiple consumers makes the issue more interesting :-) Regards, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Does DB schema hygiene warrant long migrations?
So, I observe a consensus here that long migrations suck, +1 to that. I also observe a consensus that we need to get no-downtime schema changes working. It seems super important. Also +1 to that. Getting back to the original review, it got -2'd because Michael would like to make sure that the benefit outweighs the cost of the downtime. I completely agree with that; so far we've heard arguments from both Jay and Boris as to why this is faster/slower, but I think some sort of evidence other than hearsay is needed. Can we get some sort of benchmark result that clearly illustrates the performance consequences of the migration in the long run? -Mike On Thu, Oct 24, 2013 at 4:53 PM, Boris Pavlovic bo...@pavlovic.me wrote: Michael, - pruning isn't done by the system automatically, so we have to assume it never happens We are working on that: https://blueprints.launchpad.net/nova/+spec/db-purge-engine - we need to have a clearer consensus about what we think the maximum size of a nova deployment is. Are we really saying we don't support nova installs with a million instances? If so what is the maximum number of instances we're targeting? Having a top-level size in mind isn't a bad thing, but I don't think we have one at the moment that we all agree on. Until that happens I'm going to continue targeting the largest databases people have told me about (plus a fudge factor). Rally https://wiki.openstack.org/wiki/Rally should help us to determine this. At the moment I can only rely on theoretical knowledge (and that says even 1 million instances won't work in the current nova implementation). Best regards, Boris Pavlovic On Fri, Oct 25, 2013 at 2:35 AM, Michael Still mi...@stillhq.com wrote: On Fri, Oct 25, 2013 at 9:07 AM, Boris Pavlovic bo...@pavlovic.me wrote: Johannes, +1, purging should help here a lot. Sure, but my point is more: - pruning isn't done by the system automatically, so we have to assume it never happens - we need to have a clearer consensus about what we think the maximum size of a nova deployment is. Are we really saying we don't support nova installs with a million instances? If so what is the maximum number of instances we're targeting? Having a top-level size in mind isn't a bad thing, but I don't think we have one at the moment that we all agree on. Until that happens I'm going to continue targeting the largest databases people have told me about (plus a fudge factor). Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] Shared network between specific tenants, but not all tenants?
+1 I also have tenants asking for this :-). I'm interested to see a blueprint. -Mike On Tue, Oct 29, 2013 at 1:24 PM, Jay Pipes jaypi...@gmail.com wrote: On 10/29/2013 02:25 PM, Justin Hammond wrote: We have been considering this and have some notes on our concept, but we haven't made a blueprint for it. I will speak amongst my group and find out what they think of making it more public. OK, cool, glad to know I'm not the only one with tenants asking for this :) Looking forward to a possible blueprint on this. Best, -jay On 10/29/13 12:26 PM, Jay Pipes jaypi...@gmail.com wrote: Hi Neutron devs, Are there any plans to support networks that are shared/routed only between certain tenants (not all tenants)? Thanks, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Does Nova really need an SQL database?
I'm not sure the problem is that we use a general SQL database. The problems as I see them are:
- Multi-master in MySQL sucks. Complicated, problematic and not performant. Also, no great way to do multi-master over higher-latency networks.
- MySQL and Postgres require tuning to scale.
- We tend to write queries badly when using SQLA, i.e. lots of code-level joins and filtering.
- SQLA mapping is pretty slow. See Boris and Alexei's patch to compute_node_get_all for an example of how this can be worked around[1]. Also comstud's work on the mysql backend[2].
- The thread serialization problem in eventlet, also somewhat addressed by the mysql backend.
Some of these problems are addressed very well by some NOSQL DBs; specifically, the multi-master problems just go away for the most part. However our general SQL databases provide some nice things like transactions that would require some more work on our end to do properly. All that being said, I am very interested in what NOSQL DBs can do for us. -Mike Wilson [1] https://review.openstack.org/#/c/43151/ [2] https://blueprints.launchpad.net/nova/+spec/db-mysqldb-impl On Mon, Nov 18, 2013 at 12:35 PM, Mike Spreitzer mspre...@us.ibm.com wrote: There were some concerns expressed at the summit about scheduler scalability in Nova, and a little recollection of Boris' proposal to keep the needed state in memory. I also heard one guy say that he thinks Nova does not really need a general SQL database, that a NOSQL database with a bit of denormalization and/or client-maintained secondary indices could suffice. Has that sort of thing been considered before? What is the community's level of interest in exploring that? Thanks, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
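[Editor's note: to illustrate the "code-level joins and filtering" point with a hedged example; the Instance and InstanceInfoCache models and the info_cache relationship are stand-ins for nova's real models. Filtering and joining in Python pulls whole tables through the slow SQLA mapping layer, while pushing the work into SQL issues one statement.]

    # Anti-pattern: fetch everything, then filter and join in Python.
    instances = session.query(Instance).all()
    active = [i for i in instances if i.vm_state == 'active']
    for inst in active:
        cache = session.query(InstanceInfoCache).filter_by(
            instance_uuid=inst.uuid).first()  # N+1 queries

    # Better: one SQL statement does both the filter and the join.
    from sqlalchemy.orm import joinedload
    active = (session.query(Instance)
              .options(joinedload('info_cache'))
              .filter(Instance.vm_state == 'active')
              .all())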
Re: [openstack-dev] Reg : Security groups implementation using openflows in quantum ovs plugin
Hi Kanthi, Just to reiterate what Kyle said, we do have an internal implementation using flows that looks very similar to security groups. Jun Park was the guy that wrote this and is looking to get it upstreamed. I think he'll be back in the office late next week. I'll point him to this thread when he's back. -Mike On Mon, Nov 18, 2013 at 3:39 PM, Kyle Mestery (kmestery) kmest...@cisco.com wrote: On Nov 18, 2013, at 4:26 PM, Kanthi P pavuluri.kan...@gmail.com wrote: Hi All, We are planning to implement quantum security groups using openflows for ovs plugin instead of iptables which is the case now. Doing so we can avoid the extra linux bridge which is connected between the vnet device and the ovs bridge, which is given as a work around since ovs bridge is not compatible with iptables. We are planning to create a blueprint and work on it. Could you please share your views on this Hi Kanthi: Overall, this idea is interesting and removing those extra bridges would certainly be nice. Some people at Bluehost gave a talk at the Summit [1] in which they explained they have done something similar, you may want to reach out to them since they have code for this internally already. The OVS plugin is in feature freeze during Icehouse, and will be deprecated in favor of ML2 [2] at the end of Icehouse. I would advise you to retarget your work at ML2 when running with the OVS agent instead. The Neutron team will not accept new features into the OVS plugin anymore. Thanks, Kyle [1] http://www.openstack.org/summit/openstack-summit-hong-kong-2013/session-videos/presentation/towards-truly-open-and-commoditized-software-defined-networks-in-openstack [2] https://wiki.openstack.org/wiki/Neutron/ML2 Thanks, Kanthi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
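[Editor's note: a loose sketch of the idea being discussed: express a security-group-style rule as OpenFlow entries on the OVS bridge instead of iptables on the intermediate Linux bridge. This uses neutron's OVSBridge wrapper from that era; the exact kwargs and flow layout are assumptions, and statefulness (what iptables provides via conntrack) is glossed over entirely.]

    from neutron.agent.linux.ovs_lib import OVSBridge

    br = OVSBridge('br-int', root_helper='sudo')

    # Allow the VM attached at ofport 5 to emit only TCP/22 traffic...
    br.add_flow(priority=10, in_port=5, proto='tcp', tp_dst=22,
                actions='normal')
    # ...and drop everything else arriving from that port.
    br.add_flow(priority=1, in_port=5, actions='drop')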
Re: [openstack-dev] Reg : Security groups implementation using openflows in quantum ovs plugin
The current implementation is fairly generic, the plan is to get it into the ML2 plugin. -Mike On Tue, Nov 19, 2013 at 2:31 AM, Kanthi P pavuluri.kan...@gmail.com wrote: Hi All, Thanks for the response! Amir,Mike: Is your implementation being done according to ML2 plugin Regards, Kanthi On Tue, Nov 19, 2013 at 1:43 AM, Mike Wilson geekinu...@gmail.com wrote: Hi Kanthi, Just to reiterate what Kyle said, we do have an internal implementation using flows that looks very similar to security groups. Jun Park was the guy that wrote this and is looking to get it upstreamed. I think he'll be back in the office late next week. I'll point him to this thread when he's back. -Mike On Mon, Nov 18, 2013 at 3:39 PM, Kyle Mestery (kmestery) kmest...@cisco.com wrote: On Nov 18, 2013, at 4:26 PM, Kanthi P pavuluri.kan...@gmail.com wrote: Hi All, We are planning to implement quantum security groups using openflows for ovs plugin instead of iptables which is the case now. Doing so we can avoid the extra linux bridge which is connected between the vnet device and the ovs bridge, which is given as a work around since ovs bridge is not compatible with iptables. We are planning to create a blueprint and work on it. Could you please share your views on this Hi Kanthi: Overall, this idea is interesting and removing those extra bridges would certainly be nice. Some people at Bluehost gave a talk at the Summit [1] in which they explained they have done something similar, you may want to reach out to them since they have code for this internally already. The OVS plugin is in feature freeze during Icehouse, and will be deprecated in favor of ML2 [2] at the end of Icehouse. I would advise you to retarget your work at ML2 when running with the OVS agent instead. The Neutron team will not accept new features into the OVS plugin anymore. Thanks, Kyle [1] http://www.openstack.org/summit/openstack-summit-hong-kong-2013/session-videos/presentation/towards-truly-open-and-commoditized-software-defined-networks-in-openstack [2] https://wiki.openstack.org/wiki/Neutron/ML2 Thanks, Kanthi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Does Nova really need an SQL database?
I've been thinking about this use case for a DHT-like design. I think I want to do what other people have alluded to here and try to intercept problematic requests like this one in some sort of pre-stage before sending them to a ring segment. In this case the pre-stage could decide to send this off to a scheduler that has a more complete view of the world. Alternatively, don't make a single request for 50 instances, just send 50 requests for one? Is that a viable thing to do for this use case? -Mike On Tue, Nov 19, 2013 at 7:03 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: At yahoo at least 50+ simultaneous will be the common case (maybe we are special). Think of what happens on www.yahoo.com, say during the Olympics; news.yahoo.com could need 50+ very very quickly (especially if say a gold medal is won by some famous person). So I wouldn't discount those being the common case (it may not be common for some, but is common for others). In fact any website with spurious/spikey traffic will have the same desire; so it might be a target use-case for website-like companies (or ones that can't predict spikes upfront). Overall though I think what u said about 'don't fill it up' is good general knowledge. Filling up stuff beyond a certain threshold is dangerous in general (one should only push the limits so far before madness). On 11/19/13 4:08 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Chris Friesen's message of 2013-11-19 12:18:16 -0800: On 11/19/2013 01:51 PM, Clint Byrum wrote: Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800: On 11/19/2013 12:35 PM, Clint Byrum wrote: Each scheduler process can own a different set of resources. If they each grab instance requests in a round-robin fashion, then they will fill their resources up in a relatively well-balanced way until one scheduler's resources are exhausted. At that time it should bow out of taking new instances. If it can't fit a request in, it should kick the request out for retry on another scheduler. In this way, they only need to be in sync in that they need a way to agree on who owns which resources. A distributed hash table that gets refreshed whenever schedulers come and go would be fine for that. That has some potential, but at high occupancy you could end up refusing to schedule something because no one scheduler has sufficient resources even if the cluster as a whole does. I'm not sure what you mean here. What resource spans multiple compute hosts? Imagine the cluster is running close to full occupancy, each scheduler has room for 40 more instances. Now I come along and issue a single request to boot 50 instances. The cluster has room for that, but none of the schedulers do. You're assuming that all 50 come in at once. That is only one use case and not at all the most common. This gets worse once you start factoring in things like heat and instance groups that will want to schedule whole sets of resources (instances, IP addresses, network links, cinder volumes, etc.) at once with constraints on where they can be placed relative to each other. Actually that is rather simple. Such requests have to be serialized into a work-flow. So if you say give me 2 instances in 2 different locations then you allocate 1 instance, and then another one with 'not_in_location(1)' as a condition. Actually, you don't want to serialize it, you want to hand the whole set of resource requests and constraints to the scheduler all at once.
If you do them one at a time, then early decisions made with less-than-complete knowledge can result in later scheduling requests failing due to being unable to meet constraints, even if there are actually sufficient resources in the cluster. The VM ensembles document at https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1 has a good example of how one-at-a-time scheduling can cause spurious failures. And if you're handing the whole set of requests to a scheduler all at once, then you want the scheduler to have access to as many resources as possible so that it has the highest likelihood of being able to satisfy the request given the constraints. This use case is real and valid, which is why I think there is room for multiple approaches. For instance, the situation you describe can also be dealt with by just having the cloud stay under-utilized and accepting that when you get over a certain percentage utilized, spurious failures will happen. We have a similar solution in the ext3 filesystem on Linux: don't fill it up, or suffer a huge performance penalty. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
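The partitioned-scheduler idea, and Chris's objection to it, are easy to see in a toy sketch (all names invented here; this is not Nova code): each scheduler owns a disjoint slice of hosts and kicks requests it cannot fit out to a peer for retry.

class PartitionedScheduler(object):
    def __init__(self, name, owned_hosts, peers=None):
        self.name = name
        self.hosts = dict(owned_hosts)  # host -> free instance slots
        self.peers = peers or []        # schedulers to retry against

    def schedule(self, count, hops=0):
        if sum(self.hosts.values()) >= count:
            self._place(count)
            return self.name
        if hops < len(self.peers):      # kick out for retry on a peer
            return self.peers[hops].schedule(count, hops + 1)
        raise RuntimeError("no scheduler alone fits %d instances" % count)

    def _place(self, count):
        # Fill the emptiest hosts first; any placement policy works here.
        for host in sorted(self.hosts, key=self.hosts.get, reverse=True):
            take = min(count, self.hosts[host])
            self.hosts[host] -= take
            count -= take
            if count == 0:
                return

# Chris's failure case: two schedulers with 40 free slots each hold
# 80 slots total, yet a single request for 50 fails at every hop.
s2 = PartitionedScheduler("s2", {"h3": 20, "h4": 20})
s1 = PartitionedScheduler("s1", {"h1": 20, "h2": 20}, peers=[s2])
try:
    s1.schedule(50)
except RuntimeError as exc:
    print(exc)

This is exactly where the pre-stage Mike describes above would step in, diverting the 50-instance request to a scheduler with a fuller view of the world.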
Re: [openstack-dev] [Nova] Does Nova really need an SQL database?
I agree heartily with the availability and resiliency aspect. For me, that is the biggest reason to consider a NOSQL backend. The other potential performance benefits are attractive to me also. -Mike On Wed, Nov 20, 2013 at 9:06 AM, Soren Hansen so...@linux2go.dk wrote: 2013/11/18 Mike Spreitzer mspre...@us.ibm.com: There were some concerns expressed at the summit about scheduler scalability in Nova, and a little recollection of Boris' proposal to keep the needed state in memory. I also heard one guy say that he thinks Nova does not really need a general SQL database, that a NOSQL database with a bit of denormalization and/or client-maintained secondary indices could suffice. I may have said something along those lines. Just to clarify -- since you started this post by talking about scheduler scalability -- the main motivation for using a non-SQL backend isn't scheduler scalability, it's availability and resilience. I just don't accept the failure modes that MySQL (and derivatives such as Galera) impose. Has that sort of thing been considered before? It's been talked about on and off since... well, probably since we started this project. What is the community's level of interest in exploring that? The session on adding a backend using a non-SQL datastore was pretty well attended. -- Soren Hansen | http://linux2go.dk/ Ubuntu Developer | http://www.ubuntu.com/ OpenStack Developer | http://www.openstack.org/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
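For readers wondering what "a bit of denormalization and/or client-maintained secondary indices" might look like over a bare key-value store, here is a hand-rolled sketch (a dict stands in for the datastore; none of this is from the summit session or Soren's work):

class InstanceStore(object):
    def __init__(self, kv):
        self.kv = kv  # any dict-like get/put store

    def put_instance(self, inst):
        # Primary record, keyed by UUID.
        self.kv["instance/%s" % inst["uuid"]] = inst
        # Client-maintained secondary index: host -> set of UUIDs.
        # A SQL database would derive this with WHERE host = ...;
        # here the client must keep it in step with every write.
        idx_key = "host-index/%s" % inst["host"]
        idx = self.kv.get(idx_key, set())
        idx.add(inst["uuid"])
        self.kv[idx_key] = idx

    def instances_by_host(self, host):
        uuids = self.kv.get("host-index/%s" % host, set())
        return [self.kv["instance/%s" % u] for u in uuids]

store = InstanceStore({})
store.put_instance({"uuid": "abc", "host": "compute1", "state": "ACTIVE"})
print(store.instances_by_host("compute1"))

The index update and the record write are two separate operations, so keeping them consistent under failure is precisely the burden that moves from the database into client code in this model.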
Re: [openstack-dev] Reg : Security groups implementation using openflows in quantum ovs plugin
Adding Jun to this thread since gmail is failing him. On Tue, Nov 19, 2013 at 10:44 AM, Amir Sadoughi amir.sadou...@rackspace.com wrote: Yes, my work has been on ML2 with neutron-openvswitch-agent. I'm interested to see what Jun Park has. I might have something ready before he is available again, but would like to collaborate regardless. Amir On Nov 19, 2013, at 3:31 AM, Kanthi P pavuluri.kan...@gmail.com wrote: Hi All, Thanks for the response! Amir, Mike: Is your implementation being done according to the ML2 plugin? Regards, Kanthi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Icehouse mid-cycle meetup
Hotel information has been posted. Look forward to seeing you all in February :-). -Mike On Mon, Nov 25, 2013 at 8:14 AM, Russell Bryant rbry...@redhat.com wrote: Greetings, Other groups have started doing mid-cycle meetups with success. I've received significant interest in having one for Nova. I'm now excited to announce some details. We will be holding a mid-cycle meetup for the compute program from February 10-12, 2014, in Orem, UT. Huge thanks to Bluehost for hosting us! Details are being posted to the event wiki page [1]. If you plan to attend, please register. Hotel recommendations with booking links will be posted soon. Please let me know if you have any questions. Thanks, [1] https://wiki.openstack.org/wiki/Nova/IcehouseCycleMeetup -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] A simple way to improve nova scheduler
Just some added info for that talk: we are using qpid as our messaging backend. I have no data for RabbitMQ, but our schedulers are _always_ behind on processing updates. It may be different with rabbit. -Mike On Tue, Jul 23, 2013 at 1:56 PM, Joe Gordon joe.gord...@gmail.com wrote: On Jul 23, 2013 3:44 PM, Ian Wells ijw.ubu...@cack.org.uk wrote: * periodic updates can overwhelm things. Solution: remove unneeded updates; most scheduling data only changes when an instance goes through a state change. It's not clear that periodic updates do overwhelm things, though. Boris ran the tests. Apparently 10k nodes updating once a minute extend the read query time by ~10% (the main problem being that the read query is abysmal in the first place). I don't know how much of the rest of the infrastructure was involved in his test, though (RabbitMQ, Conductor). A great OpenStack-at-scale talk that covers the scheduler: http://www.bluehost.com/blog/bluehost/bluehost-presents-operational-case-study-at-openstack-summit-2111 There are reasonably solid reasons why we would want an alternative to the DB backend, but I'm not sure the update rate is one of them. If we were going for an alternative, the obvious candidate to my mind would be something like ZooKeeper (particularly since in some setups it's already a channel between the compute hosts and the control server). -- Ian. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
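To flesh out the ZooKeeper suggestion at the end of Ian's mail, here is a minimal sketch using the kazoo client (the paths and payload format are assumptions): compute hosts publish capacity as ephemeral znodes, so liveness is implicit and there are no periodic DB writes to fall behind on.

import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

def publish_host_state(host, free_ram_mb, free_disk_gb):
    # Write only on actual state changes, not on a periodic timer.
    payload = json.dumps({"free_ram_mb": free_ram_mb,
                          "free_disk_gb": free_disk_gb}).encode()
    path = "/compute/%s" % host
    if zk.exists(path):
        zk.set(path, payload)
    else:
        # Ephemeral: the znode disappears if the host's session dies,
        # giving liveness detection for free.
        zk.create(path, payload, ephemeral=True, makepath=True)

def read_cluster_state():
    return {h: json.loads(zk.get("/compute/%s" % h)[0].decode())
            for h in zk.get_children("/compute")}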
Re: [openstack-dev] Python overhead for rootwrap
In my opinion: 1. Stop using rootwrap completely and get strong argument checking support into sudo (regex). 2. Some sort of long-lived rootwrap process, either forked by the service that wants to shell out or a general-purpose rootwrapd type thing. I prefer #1 because it's surprising that sudo doesn't do this type of thing already. It _must_ be something that everyone wants. But #2 may be quicker and easier to implement. My $.02. -Mike Wilson On Thu, Jul 25, 2013 at 2:21 PM, Joe Gordon joe.gord...@gmail.com wrote: Hi All, We have recently hit some performance issues with nova-network. It turns out the root cause of this was we do roughly 20 rootwrapped shell commands, many inside of global locks. (https://bugs.launchpad.net/oslo/+bug/1199433 ) It turns out starting python itself has a fairly significant overhead when compared to the run time of many of the binary commands we execute. For example: $ time python -c "print 'test'" test real 0m0.023s user 0m0.016s sys 0m0.004s $ time ip a ... real 0m0.003s user 0m0.000s sys 0m0.000s While we have removed the extra overhead of using entry points, we are now hitting the overhead of just shelling out to python. While there are many possible ways to reduce this issue, such as reducing the number of rootwrapped calls and making locks finer grained, I think it's worth exploring alternatives to the current rootwrap model. Any ideas? I am sending this email out to get the discussion started. best, Joe Gordon ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
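As a rough illustration of option #2, here is a toy long-lived "rootwrapd" (the socket path, whitelist, and wire format are all invented, and a real implementation would need far stronger argument checking and authentication): start it once as root, and services then pay the Python interpreter startup cost once rather than on every rootwrap call.

import json
import re
import socket
import subprocess

SOCK_PATH = "/var/run/rootwrapd.sock"
WHITELIST = [re.compile(r"^ip( [a-z0-9./:_ -]+)?$")]  # toy filter only

def serve():
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK_PATH)
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        cmd = conn.recv(4096).decode()
        if any(p.match(cmd) for p in WHITELIST):
            res = subprocess.run(cmd.split(), capture_output=True)
            reply = {"rc": res.returncode,
                     "stdout": res.stdout.decode()}
        else:
            reply = {"rc": 1, "stdout": "command rejected"}
        conn.sendall(json.dumps(reply).encode())
        conn.close()

if __name__ == "__main__":
    serve()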
[openstack-dev] Need feedback for approach on bp-db-slave-handle
So back at the Portland summit myself and Jun Park presented about some of our difficulties scaling OpenStack with the Folsom release: http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment . One of the main obstacles we ran into was the amount of chattiness to MySQL. As we were deploying literally hundreds of nodes per day, we weren't able to dig in and weed out unnecessary traffic or delve into any type of optimization approach. Instead we utilized a well-known database scaling paradigm: shoving off reads to replication slaves and only sending reads which are sensitive to replication latency to the write master. I feel like replication, be it in MySQL or Postgres, is a fairly well understood concept and has lots of tools and documentation around it. The only hard part IMO about scaling this way is that you need to audit your queries to understand which could be split out, and you also need to understand the intricacies of your application to know when it is inappropriate to send a heavy query to a read slave. In other words, some queries hurt a lot, but we can't _always_ just send them to read slaves. So rather than talk about it, here's some example code. Please look at the reviews below when you see me doing unfamiliar things with context, slave_connection, etc. Example slaveified _sync_power_states: https://review.openstack.org/#/c/38872 Connection and session code in oslo-incubator: https://review.openstack.org/#/c/29464/ Change to Context: https://review.openstack.org/#/c/30363/ Decorator for sqlalchemy api: https://review.openstack.org/#/c/30370/ In my example my DBA is upset because he's getting this query from every node that we have, every periodic_interval. However, it wouldn't be good for me to simply send every call to nova.db.sqlalchemy.api.instance_get_all_by_host to a read slave. Some parts of the codebase are absolutely not tolerant of data that is possibly a few hundred milliseconds out of sync with the master. So we need a way to say "hit the slave this time, but not other times". That's where the lag-tolerant context comes in. Since context is passed all the way through the stack to the DB layer, we can indicate that we are tolerant of laggy data, and that indication holds even if the call goes over RPC. I'd appreciate any feedback on this approach. I have really only discussed it briefly with Devananda van der Veen and Russell Bryant, but they have been extremely helpful. Hopefully this gets some more eyes on it, so yeah, fire away! -Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
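For anyone who wants the shape of the approach without opening the reviews, here is a boiled-down sketch (stand-in classes instead of real SQLAlchemy sessions; names like lag_tolerant are simplified from the actual changes):

import functools

class Session(object):
    # Stand-in for a session bound to one of the two connections.
    def __init__(self, name):
        self.name = name
    def query(self, what):
        return "%s answered by %s" % (what, self.name)

master_session = Session("master_connection")
slave_session = Session("slave_connection")

class Context(object):
    def __init__(self, lag_tolerant=False):
        self.lag_tolerant = lag_tolerant

def db_read(fn):
    # Route the call to the slave only when the context explicitly
    # declares it can tolerate replication lag.
    @functools.wraps(fn)
    def wrapper(ctx, *args, **kwargs):
        session = slave_session if ctx.lag_tolerant else master_session
        return fn(session, *args, **kwargs)
    return wrapper

@db_read
def instance_get_all_by_host(session, host):
    return session.query("instances on %s" % host)

# The periodic task opts in; everything else stays on the master.
print(instance_get_all_by_host(Context(lag_tolerant=True), "compute1"))
print(instance_get_all_by_host(Context(), "compute1"))

Because the tolerance flag lives on the context rather than at the call site deep in the DB layer, the same DB API function can be lag-tolerant in one caller (the periodic task) and strict in another.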
Re: [openstack-dev] Experiences of using Neutron in large scale
Kumar, How large of a deployment are you considering it for? We've run Neutron in a fairly large environment (10k+ nodes) for a year now and have learned some interesting lessons. We use a modified Openvswitch plugin and as such have no experience with the Nicira plugin. I think the largest single problem that we have as it pertains to scalability is the race conditions in neutron-server. Allocating IPs, networks, ports, etc. tends to exhibit some racy behavior. I feel like many of these issues are being addressed by Neutron developers, but also that Neutron is very viable for large-scale production today. For instance, most of the race conditions that I mention can be averted if you aren't writing to the database concurrently. You could designate ONE neutron-server as the write server and the rest as read servers. It's a little tricky to do because you have to have a router in front of them all or reroute requests, but the API set is not very large, so it is a very doable task. That being said, in our environment we use a single neutron-server with another standing by as backup. It's not as performant as we'd like it to be, but it hasn't stopped us from growing so far. -Mike Wilson P.S. There is a presentation from the Portland summit that myself and Jun Park did. In it we talk about some of the issues around scale, although Neutron (Quantum at the time) is a smaller part of the talk: http://www.openstack.org/summit/portland-2013/session-videos/presentation/using-openstack-in-a-traditional-hosting-environment . On Wed, Oct 2, 2013 at 11:04 AM, Kumar chvs...@gmail.com wrote: Hi, We are considering running OpenStack Neutron in a large-scale deployment. I would like to know community experience and suggestions. To get to know the quality I am going through Neutron bugs (I assume that is the best way to know the quality). Some of them are really concerning, like the bugs below: https://bugs.launchpad.net/neutron/+bug/1211915 https://bugs.launchpad.net/neutron/+bug/1230407 https://bugs.launchpad.net/neutron/+bug/121 The bug 1211915 is raised for simple tempest tests; what about huge deployments? I am told even vendor Neutron plugins have similar issues when we create tens of instances in a single click on Horizon. And people see too many connection timeouts in Quantum service logs with vendor plugins as well. I was told that some were stuck with nova-network as there is no support yet to migrate to Neutron, and so they could not take advantage of new network services. I would like to know community thinking on the same. Please note that I am not concerned with fix availability. Thanks, -Kumar ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
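For what it's worth, the one-writer pattern described above can be sketched as a tiny method-based router (hostnames invented; a production deployment would more likely do this in a load balancer such as HAProxy):

import itertools
import requests

WRITER = "http://neutron-w1:9696"
READERS = itertools.cycle(["http://neutron-r1:9696",
                           "http://neutron-r2:9696"])

def route(method, path, **kwargs):
    # Serialize all mutations on one neutron-server to avoid the
    # concurrent-write race conditions; spread reads across the rest.
    if method in ("POST", "PUT", "DELETE"):
        base = WRITER
    else:
        base = next(READERS)
    return requests.request(method, base + path, **kwargs)

# route("GET", "/v2.0/networks")            -> hits a read server
# route("POST", "/v2.0/ports", json={...})  -> hits the write server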