Mike,

Thank you for saying all that you said above.
Best regards,
Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum <cl...@fewbar.com> wrote:
> Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
> >
> > On 5/11/15 5:25 PM, Robert Collins wrote:
> > >
> > > Details: Skip over this bit if you know it all already.
> > >
> > > The GIL plays a big factor here: if you want to scale the amount of
> > > CPU available to a Python service, you have two routes:
> > > A) move work to a different process through some RPC - be that DBs
> > > using SQL, other services using oslo.messaging or HTTP - whatever.
> > > B) use C extensions to perform work in threads - e.g. openssl
> > > context processing.
> > >
> > > To increase concurrency you can use threads, eventlet, asyncio,
> > > twisted etc - because within a single process *all* Python bytecode
> > > execution happens inside the GIL lock, so you get at most one CPU
> > > for a CPU-bound workload. For an IO-bound workload, you can fit more
> > > work in by context switching within that one CPU capacity. And - the
> > > GIL is a poor scheduler, so at the limit - an IO-bound workload where
> > > the IO backend has more capacity than we have CPU to consume it
> > > within our process, you will run into priority inversion and other
> > > problems. [This varies by Python release too.]
> > >
> > > request_duration = time_in_cpu + time_blocked
> > > request_cpu_utilisation = time_in_cpu/request_duration
> > > cpu_utilisation = concurrency * request_cpu_utilisation
> > >
> > > Assuming that we don't want any one process to spend a lot of time
> > > at 100% - to avoid such at-the-limit issues, let's pick say 80%
> > > utilisation, or a safety factor of 0.2. If a single request consumes
> > > 50% of its duration waiting on IO, and 50% of its duration executing
> > > bytecode, we can only run one such request concurrently without
> > > hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
> > > 75% of its duration waiting on IO and 25% on CPU, we can run 3 such
> > > requests concurrently without exceeding our target of 80%
> > > utilisation: (3*0.25=0.75).
> > >
> > > What we have today in our standard architecture for OpenStack is
> > > optimised for IO-bound workloads: waiting on the
> > > network/subprocesses/disk/libvirt etc. Running high numbers of
> > > eventlet handlers in a single process only works when the majority
> > > of the work being done by a handler is IO.
> >
> > Everything stated here is great; however, in our situation there is
> > one unfortunate fact which renders it completely incorrect at the
> > moment. I'm still puzzled why we are getting into deep think sessions
> > about the vagaries of the GIL and async when there is essentially a
> > full-on red-alert performance blocker rendering all of this discussion
> > useless, so I must again remind us: what we have *today* in OpenStack
> > is *as completely un-optimized as you can possibly be*.
> >
> > The most GIL-heavy nightmare CPU-bound task you can imagine running on
> > 25 threads on a ten-year-old Pentium will run better than the
> > OpenStack we have today, because we are running a C-based,
> > non-eventlet-patched DB library within a single OS thread that happens
> > to use eventlet, but the use of eventlet is totally pointless because
> > right now it blocks completely on all database IO.
> > All production OpenStack applications today are fully serialized to
> > only be able to emit a single query to the database at a time; for
> > each message sent, the entire application blocks an order of magnitude
> > more than it would under the GIL, waiting for the database library to
> > send a message to MySQL, waiting for MySQL to send a response
> > including the full results, waiting for the library to unwrap the
> > response into Python structures, and finally back to the Python space,
> > where we can send another database message and block the entire
> > application and all greenlets while this single message proceeds.
> >
> > To share a link I've already shared about a dozen times here, here are
> > some tests under similar conditions which illustrate what that
> > concurrency looks like:
> > http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
> > MySQLdb takes *20 times longer* to handle the work of 100 sessions
> > than PyMySQL when it's inappropriately run under gevent, when there is
> > modestly high concurrency happening. When I talk about moving to
> > threads, this is not a "won't help or hurt" kind of issue; at the
> > moment it's a change that will immediately allow massive improvement
> > to the performance of all OpenStack applications. We need to change
> > the DB library or dump eventlet.
> >
> > As far as whether we should dump eventlet or use a pure-Python DB
> > library, my contention is that a thread-based + C database library
> > will outperform an eventlet + Python-based database library.
> > Additionally, if we make either change, we may very well see all kinds
> > of new database-concurrency-related bugs in our apps too, because we
> > will be talking to the database much more intensively all of a sudden;
> > it is my opinion that a traditional threading model will be an easier
> > environment in which to work out the approach to these issues; we have
> > to assume "concurrency at any time" in any case because we run
> > multiple instances of Nova etc. at the same time. At the end of the
> > day, we aren't going to see wildly better performance with one
> > approach over the other in any case, so we should pick the one that is
> > easier to develop, maintain, and keep stable.
>
> Mike, I agree with the entire paragraph above, and I've been surprised
> to see the way this thread has gone with so much speculation.
> Optimization can be such a divisive thing; I think we need to be
> mindful of that.
>
> Anyway, there is an additional thought that might change the decision a
> bit. There is one "pro" to changing to use PyMySQL vs. changing to use
> threads, and that is that it isolates the change to only database
> access. Switching to threading means introducing threads to every piece
> of code we might touch while multiple threads are active.
>
> It really seems worth it to see if the I/O-bound portions of OpenStack
> become more responsive with PyMySQL before embarking on a change to the
> concurrency model. If it doesn't, not much harm done, and if it does,
> but makes us CPU bound, well then we have even more of a reason to set
> out on such a large task.
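
As an aside, the utilisation arithmetic Robert walks through in the quoted
thread is easy to check numerically. The sketch below is purely
illustrative - the function name, target value, and numbers are placeholder
assumptions, not anything from OpenStack code - and simply evaluates
concurrency * request_cpu_utilisation against the 80% target discussed
above:

    # Worked example of the quoted utilisation formulas (illustrative only).
    def max_concurrency(cpu_fraction, target_utilisation=0.8):
        """Requests that fit in one GIL-bound CPU without exceeding the target.

        cpu_fraction = time_in_cpu / request_duration
        cpu_utilisation = concurrency * cpu_fraction
        """
        return int(target_utilisation / cpu_fraction)

    print(max_concurrency(0.5))   # 50% CPU / 50% IO -> 1 (2 * 0.5 would hit 100%)
    print(max_concurrency(0.25))  # 25% CPU / 75% IO -> 3 (3 * 0.25 == 0.75 <= 0.8)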
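
The blocking behaviour Mike describes can also be demonstrated outside of
OpenStack. The sketch below is a rough illustration only: the connection
URL, credentials, pool size, and timing comments are assumptions, not
anything taken from the thread or from OpenStack code. It relies on the
fact that eventlet's monkey-patching only affects pure-Python socket IO, so
a C driver such as MySQLdb blocks every greenlet for the duration of each
query, while PyMySQL yields back to the hub while it waits:

    # Rough sketch: blocking vs. greenlet-friendly MySQL drivers under eventlet.
    # URL, credentials, and database name are hypothetical placeholders.
    import eventlet
    eventlet.monkey_patch()  # patches sockets/threads for pure-Python code only

    from sqlalchemy import create_engine, text

    # C driver: each query blocks the whole process and every greenlet:
    #   engine = create_engine("mysql+mysqldb://user:secret@localhost/demo")
    # Pure-Python driver: greenlets interleave while queries wait on IO:
    engine = create_engine("mysql+pymysql://user:secret@localhost/demo")

    def worker(_):
        with engine.connect() as conn:
            conn.execute(text("SELECT SLEEP(1)"))  # stand-in for a slow query

    pool = eventlet.GreenPool()
    for i in range(10):
        pool.spawn(worker, i)
    pool.waitall()  # roughly ~1s with pymysql vs. ~10s serialized with mysqldb

This is the same effect the diamondtin.com benchmark linked above measures
at larger scale (there under gevent rather than eventlet, but the mechanism
is the same).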
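
Clint's point that the PyMySQL route isolates the change to database access
also shows up in how the driver is selected: for the services it comes down
to the oslo.db connection URL in each service's configuration. The snippet
below is a generic illustration with placeholder host and credentials, not
a recommended production setting:

    [database]
    # Before: the default mysql:// scheme resolves to the C MySQLdb driver,
    # which blocks all greenlets on every query:
    #   connection = mysql://nova:secret@dbhost/nova
    # After: the pure-Python driver that cooperates with eventlet:
    connection = mysql+pymysql://nova:secret@dbhost/nova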