> All production OpenStack applications today are fully serialized to only
> be able to emit a single query to the database at a time;

True. That's why any deployment configures tons (tens) of workers for any
significant service.
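For illustration, a hypothetical sketch of what that tuning looks like
(the option names below match the nova and neutron settings of this era;
the worker counts are examples only):

    # nova.conf - one worker per core is the usual rule of thumb
    [DEFAULT]
    osapi_compute_workers = 8
    metadata_workers = 8

    # neutron.conf
    [DEFAULT]
    api_workers = 8
    rpc_workers = 4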
> When I talk about moving to threads, this is not a "won't help or hurt"
> kind of issue; at the moment it's a change that will immediately allow
> massive improvement to the performance of all OpenStack applications.

Not sure if it will give much benefit over separate processes. I guess we
don't configure many workers for gate testing (at least, neutron still
doesn't), so there could be an improvement there, but to enable
multithreading we would presumably need to fix the same issues that
prevented us from configuring multiple workers in the gate, plus possibly
more.

> We need to change the DB library or dump eventlet.

I'm +1 for the first option. The other option, multithreading, will most
certainly bring concurrency issues beyond just the database.

Thanks,
Eugene.

On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic <bo...@pavlovic.me> wrote:

> Mike,
>
> Thank you for saying all that you said above.
>
> Best regards,
> Boris Pavlovic
>
> On Tue, May 12, 2015 at 2:35 AM, Clint Byrum <cl...@fewbar.com> wrote:
>
>> Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
>> >
>> > On 5/11/15 5:25 PM, Robert Collins wrote:
>> > >
>> > > Details: Skip over this bit if you know it all already.
>> > >
>> > > The GIL plays a big factor here: if you want to scale the amount
>> > > of CPU available to a Python service, you have two routes:
>> > > A) move work to a different process through some RPC - be that
>> > > DBs using SQL, other services using oslo.messaging or HTTP -
>> > > whatever.
>> > > B) use C extensions to perform work in threads - e.g. openssl
>> > > context processing.
>> > >
>> > > To increase concurrency you can use threads, eventlet, asyncio,
>> > > Twisted, etc. - because within a single process *all* Python
>> > > bytecode execution happens inside the GIL, you get at most one CPU
>> > > for a CPU-bound workload. For an IO-bound workload, you can fit
>> > > more work in by context switching within that one CPU's capacity.
>> > > And - the GIL is a poor scheduler, so at the limit - an IO-bound
>> > > workload where the IO backend has more capacity than we have CPU
>> > > to consume it within our process - you will run into priority
>> > > inversion and other problems. [This varies by Python release too.]
>> > >
>> > >     request_duration = time_in_cpu + time_blocked
>> > >     request_cpu_utilisation = time_in_cpu / request_duration
>> > >     cpu_utilisation = concurrency * request_cpu_utilisation
>> > >
>> > > Assuming that we don't want any one process to spend a lot of time
>> > > at 100% - to avoid such at-the-limit issues - let's pick say 80%
>> > > utilisation, a safety factor of 0.2. If a single request spends
>> > > 50% of its duration waiting on IO and 50% executing bytecode, we
>> > > can run only one such request concurrently without hitting 100%
>> > > utilisation (a second would take us to 2 * 0.5 == 1.0). For a
>> > > request that spends 75% of its duration waiting on IO and 25% on
>> > > CPU, we can run 3 such requests concurrently without exceeding our
>> > > target of 80% utilisation (3 * 0.25 == 0.75).
>> > >
>> > > What we have today in our standard architecture for OpenStack is
>> > > optimised for IO-bound workloads: waiting on the
>> > > network/subprocesses/disk/libvirt etc. Running high numbers of
>> > > eventlet handlers in a single process only works when the majority
>> > > of the work being done by a handler is IO.
>> >
>> > Everything stated here is great, however in our situation there is
>> > one unfortunate fact which renders it completely incorrect at the
>> > moment.
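>> > (To make the utilisation arithmetic above concrete, a minimal
>> > Python sketch using the two examples from that excerpt:)
>> >
>> >     def max_concurrency(request_cpu_utilisation, target=0.8):
>> >         # cpu_utilisation = concurrency * request_cpu_utilisation,
>> >         # so the largest concurrency at or under the target is:
>> >         return int(target / request_cpu_utilisation)
>> >
>> >     print(max_concurrency(0.5))   # 50% CPU, 50% IO -> 1 request
>> >     print(max_concurrency(0.25))  # 25% CPU, 75% IO -> 3 requests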
>> > I'm still puzzled why we are getting into deep think sessions about
>> > the vagaries of the GIL and async when there is essentially a
>> > full-on red-alert performance blocker rendering all of this
>> > discussion useless, so I must again remind us: what we have *today*
>> > in OpenStack is *as completely un-optimized as you can possibly be*.
>> >
>> > The most GIL-heavy nightmare CPU-bound task you can imagine, running
>> > on 25 threads on a ten-year-old Pentium, will run better than the
>> > OpenStack we have today, because we are running a C-based,
>> > non-eventlet-patched DB library within a single OS thread that
>> > happens to use eventlet - and the use of eventlet is totally
>> > pointless, because right now it blocks completely on all database
>> > IO. All production OpenStack applications today are fully serialized
>> > to only be able to emit a single query to the database at a time;
>> > for each message sent, the entire application blocks, an order of
>> > magnitude longer than it would under the GIL: waiting for the
>> > database library to send a message to MySQL, waiting for MySQL to
>> > send a response including the full results, waiting for the database
>> > library to unwrap the response into Python structures, and finally
>> > back in Python space, where we can send another database message and
>> > again block the entire application and all greenlets while this
>> > single message proceeds.
>> >
>> > To share a link I've already shared about a dozen times here, here
>> > are some tests under similar conditions which illustrate what that
>> > concurrency looks like:
>> > http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
>> > MySQLdb takes *20 times longer* to handle the work of 100 sessions
>> > than PyMySQL when it's inappropriately run under gevent, when there
>> > is modestly high concurrency happening. When I talk about moving to
>> > threads, this is not a "won't help or hurt" kind of issue; at the
>> > moment it's a change that will immediately allow massive improvement
>> > to the performance of all OpenStack applications. We need to change
>> > the DB library or dump eventlet.
>> >
>> > As far as whether we should dump eventlet or use a pure-Python DB
>> > library, my contention is that a thread-based + C database library
>> > will outperform an eventlet + Python-based database library.
>> > Additionally, if we make either change, we may very well see all
>> > kinds of new database-concurrency bugs in our apps too, because we
>> > will be talking to the database much more intensively all of a
>> > sudden; it is my opinion that a traditional threading model will be
>> > an easier environment in which to work out the approach to these
>> > issues, and we have to assume "concurrency at any time" in any case,
>> > because we run multiple instances of Nova etc. at the same time. At
>> > the end of the day, we aren't going to see wildly better performance
>> > with one approach over the other in any case, so we should pick the
>> > one that is easier to develop, maintain, and keep stable.
>>
>> Mike, I agree with the entire paragraph above, and I've been surprised
>> to see the way this thread has gone, with so much speculation.
>> Optimization can be such a divisive thing; I think we need to be
>> mindful of that.
>>
>> Anyway, there is an additional thought that might change the decision
>> a bit. There is one "pro" to changing to use PyMySQL vs. changing to
>> use threads, and that is that it isolates the change to only database
>> access.
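>> (A minimal, hypothetical sketch of the effect in question - it
>> assumes a local MySQL with a 'test' schema and the example credentials
>> below, plus eventlet and PyMySQL installed:)
>>
>>     import time
>>     import eventlet
>>     eventlet.monkey_patch()  # green-patch socket: pure-Python IO yields
>>
>>     import pymysql  # pure Python, so its socket calls are now green
>>
>>     def one_query():
>>         conn = pymysql.connect(host="127.0.0.1", user="root",
>>                                password="", database="test")
>>         try:
>>             with conn.cursor() as cur:
>>                 cur.execute("SELECT SLEEP(1)")  # 1s server-side wait
>>                 cur.fetchall()
>>         finally:
>>             conn.close()
>>
>>     start = time.time()
>>     pool = eventlet.GreenPool(size=10)
>>     for _ in range(10):
>>         pool.spawn(one_query)
>>     pool.waitall()
>>     print("elapsed: %.1fs" % (time.time() - start))
>>
>> (With PyMySQL the ten greenlets overlap and this finishes in roughly
>> one second; with MySQLdb the C-level socket calls never yield, the
>> queries serialize, and it takes roughly ten.)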
>> Switching to threading means introducing threads to every piece of
>> code we might touch while multiple threads are active.
>>
>> It really seems worth it to see if the I/O-bound portions of OpenStack
>> become more responsive with PyMySQL before embarking on a change to
>> the concurrency model. If it doesn't help, not much harm done; and if
>> it does, but makes us CPU-bound, well then we have even more of a
>> reason to set out on such a large task.
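>> (For reference, a hypothetical sketch of what that switch looks like,
>> assuming oslo.db/SQLAlchemy-style connection URLs in each service's
>> config - names and credentials are examples only:)
>>
>>     # /etc/nova/nova.conf (same pattern for neutron, cinder, ...)
>>     [database]
>>     # C driver (MySQL-Python); blocks the whole process under eventlet:
>>     # connection = mysql://nova:secret@dbhost/nova
>>     # pure-Python driver; cooperates with eventlet's patched sockets:
>>     connection = mysql+pymysql://nova:secret@dbhost/nova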
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev