Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Arguably if we're actually seeing performance issues then it's not a distraction but rather a real problem that needs fixing. The important takeaway from the thread, though, is that we aren't anywhere near hitting Python's limits. Our main bottleneck is that we are serializing all DB requests in a DB-heavy codebase.

On Mon, May 11, 2015 at 9:18 PM, Chris Friesen chris.frie...@windriver.com wrote:

On 05/11/2015 08:22 PM, Jay Pipes wrote: c) Many OpenStack services, including Nova, Cinder, and Neutron, when looked at from a thousand-foot level, are little more than glue code that pipes out to a shell to execute system commands (sorry, but it's true).

No apologies necessary. :)

So, bottom line for me: focus on the things that will have the biggest impact on long-term cost reduction of our codebase.

+1

So, to me, the highest priority performance and scale fixes actually have to do with the simplification of our subsystems and architecture, not with whether we use mysql-python, PyMySQL, Python vs. Scala vs. Rust, or any other distractions.

Arguably if we're actually seeing performance issues then it's not a distraction but rather a real problem that needs fixing. But I agree that we shouldn't be trying to optimize the performance of code that isn't causing problems.

Chris

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- Kevin Benton
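To put a rough number on "serializing all DB requests", here is a back-of-the-envelope sketch under an idealised model: every query waits `query_latency_s` seconds on the database, and CPU and GIL time are ignored. The function name and parameters are hypothetical. A driver that blocks the whole process allows only one query in flight no matter how many greenthreads are queued behind it; a driver that yields to the event hub (or real threads) lets up to `in_flight` queries overlap.

```python
def max_queries_per_second(query_latency_s: float, in_flight: int, driver_yields: bool) -> float:
    """Idealised per-process upper bound on DB queries/s (ignores CPU and GIL time)."""
    if query_latency_s <= 0 or in_flight < 1:
        raise ValueError("latency must be positive and in_flight at least 1")
    # A non-yielding driver blocks the whole process, so only one query is ever in flight.
    concurrent_queries = in_flight if driver_yields else 1
    return concurrent_queries / query_latency_s


if __name__ == "__main__":
    # Hypothetical numbers: 5 ms per query, 20 greenthreads wanting the DB.
    print(max_queries_per_second(0.005, 20, driver_yields=False))  # serialized
    print(max_queries_per_second(0.005, 20, driver_yields=True))   # overlapped
```

Under this model the only way to raise the serialized number is to add processes, which matches the "fork lots of workers" workaround discussed further down the thread.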
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/11/15 9:17 PM, Robert Collins wrote:

On 12 May 2015 at 10:44, Mike Bayer mba...@redhat.com wrote:

What we have today in our standard architecture for OpenStack is optimised for IO-bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep-think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*.

Sorry if it seemed like I went on a tangent, but choosing a concurrency model in Python, which a lot of this discussion has been about, is inextricably linked to the workload being tackled. The point of my tl;dr was that using threads, which gets us out of the pit below, is fine for most of our workloads and irrelevant to the actual issues in the other ones. Clearly that didn't come across. Sorry.

Robert, other people noted my fast takeoff as well, so I think I saw "GIL" and lots of thoughtful calculations, and after that my reading comprehension was dulled by the fog of my own angst :). I'll try to slow down more next time.

The most GIL-heavy nightmare CPU-bound task you can imagine running on 25 threads on a ten-year-old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO.

To confirm my understanding: this library releases the GIL, but because we only have one thread, we don't get more work done.
Yes, that sucks. And your tl;dr is that we need to either use an eventlet-ready library or not use eventlet's greenthreads, either of which I support as a short-term rectification.

Yes, the GIL is released within the MySQLdb C routines that are primarily focused on IO here.

Robert's analysis talks about various at-the-limit issues, but I was

They tend to turn up at scale. You get 100 requests a day out of 5 million that are inexplicably slow, and eventually you have enough data around the situation to try an experiment, and lo and behold the problem goes away. They don't contradict the argument you're making, though; this is just the bigger context: when folks go to deploy our (real threads || eventlet-friendly DB library) code, how many processes will they need?

It's been pointed out separately that OpenStack already uses a lot of processes, and even now, with our serialized DB access per process, we still achieve concurrency through this. So by all means, let's keep using processes; that is always a good thing, although it does present the challenge that we have a lot of DB connections open as a result (because we use pooling).

FWIW, I think moving to an eventlet-friendly library should be the first step, because it can be done much more rapidly and with arguably less risk.

Yes, I'm not really sure why we aren't just changing mysql+mysqldb:// to mysql+pymysql:// in our config files right now, because this would also solve the Py3K issue for the time being.
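The change Mike describes touches only the driver part of the SQLAlchemy URL (`dialect+driver://...`). A minimal sketch of that rewrite, assuming connection strings of the usual `mysql[+driver]://user:password@host/db` form; the helper name is hypothetical and not part of oslo.db or SQLAlchemy. A bare `mysql://` URL already selects the MySQLdb driver in SQLAlchemy, so it is rewritten as well.

```python
def switch_mysql_driver(url: str, driver: str = "pymysql") -> str:
    """Return a copy of a SQLAlchemy MySQL URL that uses `driver` as the DBAPI."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {url!r}")
    dialect = scheme.split("+", 1)[0]  # "mysql+mysqldb" -> "mysql"
    if dialect != "mysql":
        raise ValueError(f"expected a mysql URL, got dialect {dialect!r}")
    # Everything after "://" (credentials, host, database, query string) is kept as-is.
    return f"{dialect}+{driver}://{rest}"


if __name__ == "__main__":
    # Hypothetical credentials and host, e.g. the value of a [database] connection option.
    print(switch_mysql_driver("mysql+mysqldb://nova:secret@dbhost/nova?charset=utf8"))
    # mysql+pymysql://nova:secret@dbhost/nova?charset=utf8
```

Nothing else in the URL changes, which is why this was seen as a low-risk first step compared with changing the threading model.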
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
- Original Message - From: Robert Collins robe...@robertcollins.net To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Tuesday, May 12, 2015 3:06:21 AM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 12 May 2015 at 10:12, Attila Fazekas afaze...@redhat.com wrote:

If you can illustrate a test script that demonstrates the actual failing of OS threads that does not occur with greenlets here, that would make it immediately apparent what it is you're getting at here.

http://www.fpaste.org/220824/raw/ I just put together a hello world C example and a hello world threading example, and replaced the print with sleep(3). When I use sleep(3) from Python, the 5-thread program runs in ~3 seconds; when I use sleep(3) from native code, it runs in ~15 seconds. So yes, it is very likely a GIL-wait-related issue when the native code is not assisting.

Your test code isn't releasing the GIL here, and I'd expect C DB drivers to be releasing the GIL: you've illustrated how a C extension can hold the GIL, but not whether that's happening.

Yes. And you are right: the C driver wrapper releases the GIL at every important MySQL C driver call (Py_BEGIN_ALLOW_THREADS). Good to know :)

Do you need a DB example, using the MySQL C driver and waiting in an actual I/O primitive?

Waiting in an I/O primitive is fine as long as the GIL has been released.

http://www.fpaste.org/221101/ Actually, the eventlet version of the play/test code produces the mentioned error: 'Lock wait timeout exceeded; try restarting transaction'. I have not seen this issue with regular Python threads. The driver does not cooperate with the event hub :(

PS: 'Deadlock found when trying to get lock; try restarting transaction' would be a different situation, and it is not related to the eventlet issue.
-Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
- Original Message - From: John Garbutt j...@johngarbutt.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: Dan Smith d...@danplanet.com Sent: Saturday, May 9, 2015 12:45:26 PM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:

On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in Nova, not blocking the process while waiting for a reply from MySQL is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model?

Yes, but at the same time, we should change our threading model at the level of where APIs are accessed to refer to a database, at the very least using a threadpool behind eventlet. CRUD-oriented database access is faster using traditional threads, even in Python, than using an eventlet-like system or using explicit async. The tests at http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ show this. With traditional threads, we can stay on the C-based MySQL APIs and take full advantage of their speed.

Sorry to go back in time; I wanted to go back to an important point.
It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure Python lib and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool
* go with a pure C protocol lib which explicitly uses `python patch-able` I/O functions (maybe others too, like threading, mutex, sleep ...)
* go with a pure C protocol lib where the Python part explicitly calls `decode` and `encode`, and the C part just does CPU-intensive operations and never calls I/O primitives

We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve

It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we surely get py3.4 + py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future

I am leaning towards us moving to making DB calls with a thread pool and some fast C-based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here?

Using the Python socket from C code: https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a MySQL driver just as a protocol parser, leaving you free to use your favorite event-based I/O strategy (direct epoll usage) even without eventlet (or similar). The issue with ultramysql is that it does not implement the `standard` Python DB API, so you would need to add an extra wrapper to SQLAlchemy.
Thanks, John
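The thread-pool option in John's list (a C-based driver with blocking calls dispatched to a pool of OS threads) can be sketched with the standard library alone. This is a minimal illustration, not OpenStack code: `concurrent.futures.ThreadPoolExecutor` stands in for eventlet's thread pool, the C-based `sqlite3` module stands in for a C MySQL driver, and `run_query` and `dispatch` are hypothetical names. Each worker thread opens its own connection rather than sharing one across threads.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def _connection() -> sqlite3.Connection:
    # One connection per worker thread; sqlite3 by default refuses cross-thread use.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
    return conn


def run_query(sql, params=()):
    """Blocking call into a C-backed DB-API driver; always runs on a pool thread."""
    return _connection().execute(sql, params).fetchall()


def dispatch(queries, max_workers=4):
    """Submit (sql, params) pairs to the pool and return results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_query, sql, params) for sql, params in queries]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(dispatch([("SELECT ?", (i,)) for i in range(3)]))  # [[(0,)], [(1,)], [(2,)]]
```

The caller only ever waits on futures, so while one query blocks inside the C driver (with the GIL released) other pool threads can issue theirs.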
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/11/15 9:58 AM, Attila Fazekas wrote:

- Original Message - From: John Garbutt j...@johngarbutt.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: Dan Smith d...@danplanet.com Sent: Saturday, May 9, 2015 12:45:26 PM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:

On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in Nova, not blocking the process while waiting for a reply from MySQL is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model?

Yes, but at the same time, we should change our threading model at the level of where APIs are accessed to refer to a database, at the very least using a threadpool behind eventlet. CRUD-oriented database access is faster using traditional threads, even in Python, than using an eventlet-like system or using explicit async. The tests at http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ show this. With traditional threads, we can stay on the C-based MySQL APIs and take full advantage of their speed.

Sorry to go back in time; I wanted to go back to an important point.
It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure Python lib and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool
* go with a pure C protocol lib which explicitly uses `python patch-able` I/O functions (maybe others too, like threading, mutex, sleep ...)
* go with a pure C protocol lib where the Python part explicitly calls `decode` and `encode`, and the C part just does CPU-intensive operations and never calls I/O primitives

We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve

It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we surely get py3.4 + py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future

I am leaning towards us moving to making DB calls with a thread pool and some fast C-based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here?

Using the Python socket from C code: https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a MySQL driver just as a protocol parser, leaving you free to use your favorite event-based I/O strategy (direct epoll usage) even without eventlet (or similar). The issue with ultramysql is that it does not implement the `standard` Python DB API, so you would need to add an extra wrapper to SQLAlchemy.

This driver appears to have seen its last commit about a year ago, and it doesn't even implement the standard DBAPI (which is already a red flag). There is apparently a separately released (!) DBAPI-compat wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no releases in two years. If this wrapper is indeed compatible with MySQLdb then it would run in SQLAlchemy without changes (though I'd be extremely surprised if it passes our test suite).

How would using these obscure libraries be preferable to running Nova API functions within the thread-pooling facilities already included with eventlet? Keep in mind that I've now done the work [1] to show that there is no performance gain to be had for all the trouble we go through to use eventlet/gevent/asyncio with local database connections.

[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/11/15 2:02 PM, Attila Fazekas wrote:

Not just with local database connections; the 10G network itself is also fast. It is possible that you spend more time in the kernel-side TCP/IP stack (and the context switches), not in physical I/O wait, than in the actual work on the DB side. (Check netperf TCP_RR.)

The scary part of a blocking I/O call is when you have two Python threads (or green threads), one of them is holding a DB lock, and the other is waiting for the same lock in a native blocking I/O syscall.

That's a database deadlock, and whether you use eventlet, threads, asyncio or even just two transactions in a single-threaded script, that can happen regardless. If your two eventlet non-blocking greenlets are waiting forever for a deadlock, you're just as deadlocked as if you had OS threads.

If you do a read(2) in native code, Python itself might not be able to preempt it. Your transaction might end with `DB Lock wait timeout` after 30 seconds of doing nothing, instead of switching to another Python thread, which would be able to release the lock.

Here's the "you're losing me" part, because Python threads are OS threads, so Python isn't directly involved in trying to preempt anything, unless you're referring to the effect of the GIL locking up the program. However, it's pretty easy to make two threads in Python hit a database and deadlock against each other, and the rest of the program's threads continue to run just fine; in a DB deadlock situation you are blocked on IO, and IO releases the GIL.

If you can illustrate a test script that demonstrates the actual failing of OS threads that does not occur with greenlets here, that would make it immediately apparent what it is you're getting at here.
[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
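Mike's point, that with real OS threads a thread blocked on a lock does not stall the thread holding it, can be checked without a database. In this sketch (a stand-in, not a MySQL test), a `threading.Lock` plays the DB row lock and `acquire(timeout=...)` plays MySQL's lock wait timeout; `simulate` and `LOCK_WAIT_TIMEOUT` are hypothetical names. Because the waiting thread blocks in C with the GIL released, thread A keeps running, "commits", and B gets the lock well within the timeout.

```python
import threading
import time

LOCK_WAIT_TIMEOUT = 2.0  # stand-in for MySQL's innodb_lock_wait_timeout, in seconds


def simulate(hold_seconds=0.2):
    row_lock = threading.Lock()  # stand-in for a DB-side row lock
    a_has_lock = threading.Event()
    results = {}

    def transaction_a():
        with row_lock:                # 1. A takes the lock
            a_has_lock.set()
            time.sleep(hold_seconds)  # A keeps working; sleeping releases the GIL
        results["a"] = "committed"    # leaving the block released the lock ("commit")

    def transaction_b():
        a_has_lock.wait()
        start = time.perf_counter()
        # 2. B blocks on the same lock; the wait happens in C with the GIL released
        acquired = row_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
        results["b_waited"] = time.perf_counter() - start
        if acquired:
            row_lock.release()
            results["b"] = "acquired"
        else:
            results["b"] = "lock wait timeout"

    threads = [threading.Thread(target=transaction_a), threading.Thread(target=transaction_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

What the sketch cannot show is the eventlet case Attila reports, where the waiting call blocks the single OS thread that the lock holder also needs.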
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
- Original Message - From: Mike Bayer mba...@redhat.com To: openstack-dev@lists.openstack.org Sent: Monday, May 11, 2015 9:07:13 PM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 5/11/15 2:02 PM, Attila Fazekas wrote:

Not just with local database connections; the 10G network itself is also fast. It is possible that you spend more time in the kernel-side TCP/IP stack (and the context switches), not in physical I/O wait, than in the actual work on the DB side. (Check netperf TCP_RR.)

The scary part of a blocking I/O call is when you have two Python threads (or green threads), one of them is holding a DB lock, and the other is waiting for the same lock in a native blocking I/O syscall.

That's a database deadlock, and whether you use eventlet, threads, asyncio or even just two transactions in a single-threaded script, that can happen regardless. If your two eventlet non-blocking greenlets are waiting forever for a deadlock, you're just as deadlocked as if you had OS threads.

If you do a read(2) in native code, Python itself might not be able to preempt it. Your transaction might end with `DB Lock wait timeout` after 30 seconds of doing nothing, instead of switching to another Python thread, which would be able to release the lock.

Here's the "you're losing me" part, because Python threads are OS threads, so Python isn't directly involved in trying to preempt anything, unless you're referring to the effect of the GIL locking up the program. However, it's pretty easy to make two threads in Python hit a database and deadlock against each other, and the rest of the program's threads continue to run just fine; in a DB deadlock situation you are blocked on IO, and IO releases the GIL.

If you can illustrate a test script that demonstrates the actual failing of OS threads that does not occur with greenlets here, that would make it immediately apparent what it is you're getting at here.
http://www.fpaste.org/220824/raw/ I just put together a hello world C example and a hello world threading example, and replaced the print with sleep(3). When I use sleep(3) from Python, the 5-thread program runs in ~3 seconds; when I use sleep(3) from native code, it runs in ~15 seconds. So yes, it is very likely a GIL-wait-related issue when the native code is not assisting.

Do you need a DB example, using the MySQL C driver and waiting in an actual I/O primitive?

The greenthreads will not help here. If I imported Python's time.sleep from the C code, it might help. Using a pure-Python driver helps to avoid this kind of issue, but in that case you have the `CPython is slow` issue.

[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
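Attila's paste is not reproduced here, but the effect he and Robert are discussing can be sketched with `ctypes`, assuming a POSIX system whose libc exports `usleep`. Functions called through `ctypes.CDLL` run with the GIL released, while the same function called through `ctypes.PyDLL` keeps the GIL held for the whole call. Five threads blocking in C therefore overlap in the first case and run strictly one after another in the second. `run_threads` and the 0.2 s duration are illustrative choices, not taken from the thread.

```python
import ctypes
import threading
import time

# dlopen(NULL): the running process, whose symbols include libc (POSIX-only assumption).
LIBC_RELEASES_GIL = ctypes.CDLL(None)  # CDLL drops the GIL around each foreign call
LIBC_HOLDS_GIL = ctypes.PyDLL(None)    # PyDLL keeps the GIL held during the call

for _lib in (LIBC_RELEASES_GIL, LIBC_HOLDS_GIL):
    _lib.usleep.argtypes = [ctypes.c_uint]
    _lib.usleep.restype = ctypes.c_int


def run_threads(lib, n_threads=5, micros=200_000):
    """Start n_threads threads that each block in C for `micros` microseconds; return wall time."""
    threads = [threading.Thread(target=lib.usleep, args=(micros,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"GIL released: {run_threads(LIBC_RELEASES_GIL):.2f}s")  # sleeps overlap
    print(f"GIL held:     {run_threads(LIBC_HOLDS_GIL):.2f}s")     # sleeps serialize
```

This matches Robert's reading: a C call that holds the GIL serializes threads, but a driver that wraps its blocking calls in `Py_BEGIN_ALLOW_THREADS` behaves like the `CDLL` case under real threads. Under eventlet there is still only one OS thread, so even a GIL-releasing driver blocks every greenthread.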
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
All production OpenStack applications today are fully serialized to only be able to emit a single query to the database at a time;

True. That's why any deployment configures tons (tens) of workers of any significant service.

When I talk about moving to threads, this is not a "won't help or hurt" kind of issue; at the moment it's a change that will immediately allow massive improvement to the performance of all OpenStack applications.

Not sure if it will give much benefit over separate processes. I guess we don't configure many workers for gate testing (at least, Neutron still doesn't), so there could be an improvement, but I guess to enable multithreading we would need to fix the same issues that prevented us from configuring multiple workers in the gate, plus possibly more.

We need to change the DB library or drop eventlet. I'm +1 for the first option. The other option, multithreading, will most certainly bring concurrency issues other than database ones.

Thanks, Eugene.

On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic bo...@pavlovic.me wrote: Mike, Thank you for saying all that you said above. Best regards, Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700: On 5/11/15 5:25 PM, Robert Collins wrote:

Details: Skip over this bit if you know it all already. The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes:
A) move work to a different process through some RPC, be that DBs using SQL, other services using oslo.messaging or HTTP, whatever.
B) use C extensions to perform work in threads, e.g. openssl context processing.

To increase concurrency you can use threads, eventlet, asyncio, twisted etc., because within a single process *all* Python bytecode execution happens inside the GIL, so you get at most one CPU for a CPU-bound workload.
For an IO-bound workload, you can fit more work in by context switching within that one CPU's capacity. And the GIL is a poor scheduler, so at the limit (an IO-bound workload where the IO backend has more capacity than we have CPU to consume it within our process) you will run into priority inversion and other problems. [This varies by Python release too.]

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu / request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at 100% (to avoid such at-the-limit issues), let's pick, say, 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation (3*0.25 = 0.75).

What we have today in our standard architecture for OpenStack is optimised for IO-bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep-think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*.
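Robert's three formulas can be turned into a small calculator. A sketch, assuming his 80% target utilisation as the default; the function name is hypothetical. It reproduces his two worked cases: a request split 50/50 between CPU and IO allows one concurrent request, and a request spending 25% of its time on CPU allows three.

```python
import math


def max_concurrency(time_in_cpu: float, time_blocked: float, target_utilisation: float = 0.8) -> int:
    """Largest concurrency whose cpu_utilisation stays at or below target_utilisation."""
    request_duration = time_in_cpu + time_blocked
    request_cpu_utilisation = time_in_cpu / request_duration
    # cpu_utilisation = concurrency * request_cpu_utilisation <= target_utilisation
    return math.floor(target_utilisation / request_cpu_utilisation)


if __name__ == "__main__":
    print(max_concurrency(50, 50))  # 1: two such requests would need 2 * 0.5 = 1.0 CPU
    print(max_concurrency(25, 75))  # 3: 3 * 0.25 = 0.75 <= 0.8
```

The same arithmetic shows why Mike objects to applying it today: with a driver that blocks the whole process, `time_blocked` is not available to other requests at all, so the effective concurrency is 1 regardless of the ratio.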
The most GIL-heavy nightmare CPU-bound task you can imagine running on 25 threads on a ten-year-old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO.

All production OpenStack applications today are fully serialized to only be able to emit a single query to the database at a time; for each message sent, the entire application blocks an order of magnitude more than it would under the GIL: waiting for the database library to send a message to MySQL, waiting for MySQL to send a response including the full results, waiting for the database to unwrap the response into Python structures, and finally back to the Python space, where we can send another database message and block the entire application and all greenlets while this single message proceeds. To share a link I've already shared
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Given Python’s inherent inability to scale (GIL) relative to other languages/platforms, have there been any serious discussions on allowing other, more scalable languages into the OpenStack ecosystem when concurrency/scalability is paramount?

Regards. -- Deklan Dieterly Hewlett-Packard Company Sr. Systems Software Engineer HP Cloud

From: Eugene Nikanorov enikano...@mirantis.com Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Monday, May 11, 2015 at 6:30 PM To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

All production OpenStack applications today are fully serialized to only be able to emit a single query to the database at a time;

True. That's why any deployment configures tons (tens) of workers of any significant service.

When I talk about moving to threads, this is not a "won't help or hurt" kind of issue; at the moment it's a change that will immediately allow massive improvement to the performance of all OpenStack applications.

Not sure if it will give much benefit over separate processes. I guess we don't configure many workers for gate testing (at least, Neutron still doesn't), so there could be an improvement, but I guess to enable multithreading we would need to fix the same issues that prevented us from configuring multiple workers in the gate, plus possibly more.

We need to change the DB library or drop eventlet. I'm +1 for the first option. The other option, multithreading, will most certainly bring concurrency issues other than database ones.

Thanks, Eugene.

On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic bo...@pavlovic.me wrote: Mike, Thank you for saying all that you said above.
Best regards, Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700: On 5/11/15 5:25 PM, Robert Collins wrote:

Details: Skip over this bit if you know it all already. The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes:
A) move work to a different process through some RPC, be that DBs using SQL, other services using oslo.messaging or HTTP, whatever.
B) use C extensions to perform work in threads, e.g. openssl context processing.

To increase concurrency you can use threads, eventlet, asyncio, twisted etc., because within a single process *all* Python bytecode execution happens inside the GIL, so you get at most one CPU for a CPU-bound workload. For an IO-bound workload, you can fit more work in by context switching within that one CPU's capacity. And the GIL is a poor scheduler, so at the limit (an IO-bound workload where the IO backend has more capacity than we have CPU to consume it within our process) you will run into priority inversion and other problems. [This varies by Python release too.]

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu / request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at 100% (to avoid such at-the-limit issues), let's pick, say, 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation (3*0.25 = 0.75).
What we have today in our standard architecture for OpenStack is optimised for IO-bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep-think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*. The most GIL-heavy nightmare CPU-bound task you can imagine running on 25 threads on a ten-year-old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 12 May 2015 at 10:12, Attila Fazekas afaze...@redhat.com wrote:

If you can illustrate a test script that demonstrates the actual failing of OS threads that does not occur with greenlets here, that would make it immediately apparent what it is you're getting at here.

http://www.fpaste.org/220824/raw/ I just put together a hello world C example and a hello world threading example, and replaced the print with sleep(3). When I use sleep(3) from Python, the 5-thread program runs in ~3 seconds; when I use sleep(3) from native code, it runs in ~15 seconds. So yes, it is very likely a GIL-wait-related issue when the native code does not cooperate by releasing the GIL.

Your test code isn't releasing the GIL here, and I'd expect C DB drivers to be releasing the GIL: you've illustrated how a C extension can hold the GIL, but not whether that's happening.

Do you need a DB example, using the MySQL C driver and waiting in an actual I/O primitive?

Waiting in an I/O primitive is fine as long as the GIL has been released.

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
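Robert's distinction (does the native call release the GIL or not?) can be shown without a database. The sketch below is not Attila's fpaste script; it is an illustration that assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `usleep` (Linux/macOS) and a standard CPython build with a GIL. It calls the same C function from several threads twice: once through `ctypes.CDLL`, which releases the GIL around foreign calls, and once through `ctypes.PyDLL`, which keeps it held. The thread count and sleep length are arbitrary.

```python
import ctypes
import threading
import time

# Assumption: POSIX platform where dlopen(NULL) exposes libc's usleep().
LIBC_RELEASES_GIL = ctypes.CDLL(None)  # CDLL releases the GIL during each foreign call
LIBC_HOLDS_GIL = ctypes.PyDLL(None)    # PyDLL keeps the GIL held during each foreign call


def run_native_sleeps(lib, n_threads=4, micros=200_000):
    """Start n_threads threads that each block in native usleep(); return wall time."""
    threads = [threading.Thread(target=lib.usleep, args=(micros,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    # GIL released: the sleeps overlap (roughly 0.2 s in total).
    # GIL held: the sleeps run one after another (roughly n_threads * 0.2 s).
    print("GIL released: %.2fs" % run_native_sleeps(LIBC_RELEASES_GIL))
    print("GIL held:     %.2fs" % run_native_sleeps(LIBC_HOLDS_GIL))
```

The serialized case is what Attila's C example measured; the overlapping case is what a well-behaved C DB driver waiting in an I/O primitive should look like.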
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On Tue, 12 May 2015 at 05:08 Mike Bayer mba...@redhat.com wrote: On 5/11/15 2:02 PM, Attila Fazekas wrote:

The scary part of a blocking I/O call is when you have two Python threads (or green threads), one of them holding a DB lock while the other waits for the same lock in a native blocking I/O syscall.

That's a database deadlock, and whether you use eventlet, threads, asyncio or even just two transactions in a single-threaded script, it can happen regardless. If your two eventlet non-blocking greenlets are waiting forever for a deadlock, you're just as deadlocked as if you had OS threads.

Not true (if I understand the situation Attila is referring to). If you do a read(2) in native code, Python itself might not be able to preempt it. Your transaction might end with `DB Lock wait timeout` after 30 seconds of doing nothing, instead of another Python thread being scheduled, which would be able to release the lock.

Here's the "you're losing me" part, because Python threads are OS threads, so Python isn't directly involved in trying to preempt anything, unless you're referring to the effect of the GIL locking up the program. However, it's pretty easy to make two threads in Python hit a database and deadlock against each other, and the rest of the program's threads continue to run just fine; in a DB deadlock situation you are blocked on IO, and IO releases the GIL. If you can illustrate a test script that demonstrates the actual failing of OS threads that does not occur with greenlets here, that would make it immediately apparent what it is you're getting at here.

1. Thread A does something that takes a lock on the DB side
2. Thread B does something that blocks waiting for that same DB lock
3. Depends on the threading model - see below

In a true preemptive threading system (eg: regular Python threads), (3) is:

3. Eventually A finishes its transaction/whatever, commits and releases the DB lock
4. B then takes the lock and proceeds
5. Profit

However, in a system where B's DB client can't be preempted (eg: eventlet or asyncio calling into a C-based mysql library, and A and B are running on the same underlying kernel thread), (3) is:

3. B will never be preempted, A will never be rescheduled, and thus A will never complete whatever it was doing.
4. Deadlock (in mysql-python's case, until a deadlock timer raises an exception and kills B 30s later)
5. Sadness. More specifically, we add a @retry to paper over the particular observed occurrence and then repeat this discussion on os-dev when the topic comes up again 6 months later.

Note that this is not the usual database transaction deadlock caused by A and B each taking a lock and then trying to take the other's lock - this is a deadlock purely in the client-side code, caused entirely by the lack of preemption during an otherwise safe series of DB operations.

See my oslo.db unittest in Ib35c95defea8ace5b456af28801659f2ba67eb96 that reproduces the above with eventlet and allows you to test the behaviour of various DB drivers.

(zzzeek: I know you've already seen all of the above in previous discussions, so sorry for repeating).

- Gus
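Gus's two schedules can be reproduced with the standard library alone. This is a minimal sketch, not his oslo.db unittest: `asyncio` stands in for eventlet as the cooperative scheduler, and a `threading.Lock` acquired with a timeout is a hypothetical stand-in for both the DB-side lock and a C driver waiting without yielding. The timeout plays the role of MySQL's lock wait timeout, scaled down from 30 s to 0.5 s.

```python
import asyncio
import threading
import time

LOCK_WAIT_TIMEOUT = 0.5  # hypothetical, scaled-down stand-in for the 30 s lock wait timeout


def cooperative_run():
    """A and B share one OS thread (an asyncio loop); return True if B got the lock."""
    db_lock = threading.Lock()  # stand-in for a lock held on the DB side

    async def a():
        db_lock.acquire()               # 1. A takes the lock
        try:
            await asyncio.sleep(0.1)    # A yields; it must be rescheduled to commit
        finally:
            db_lock.release()           # commit + release, once A runs again

    async def b():
        await asyncio.sleep(0.01)       # make sure A goes first
        # 2. B waits in a call that never yields to the loop (like a C driver in
        #    read(2)), so A cannot be rescheduled while B is waiting.
        got_it = db_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
        if got_it:
            db_lock.release()
        return got_it

    async def main():
        _, b_got_lock = await asyncio.gather(a(), b())
        return b_got_lock

    return asyncio.run(main())


def preemptive_run():
    """A and B are real OS threads; return True if B got the lock."""
    db_lock = threading.Lock()
    a_holds_lock = threading.Event()
    result = {}

    def a():
        with db_lock:                   # 1. A takes the lock
            a_holds_lock.set()
            time.sleep(0.1)             # 3. A keeps running in its own thread, then releases

    def b():
        a_holds_lock.wait()
        got_it = db_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)  # 2. B waits; 4. B proceeds
        if got_it:
            db_lock.release()
        result["b"] = got_it

    threads = [threading.Thread(target=a), threading.Thread(target=b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["b"]


if __name__ == "__main__":
    print("cooperative:", cooperative_run())  # False: B times out, client-side deadlock
    print("preemptive: ", preemptive_run())   # True: B gets the lock after A releases it
```

Only the scheduling model differs between the two functions; the "database" work and the lock are identical, which is the point Gus makes about this being a client-side deadlock.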
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 10 May 2015 at 03:26, John Garbutt j...@johngarbutt.com wrote: On 9 May 2015 at 15:02, Mike Bayer mba...@redhat.com wrote: On 5/9/15 6:45 AM, John Garbutt wrote:

I am leaning towards us moving to making DB calls with a thread pool and some fast C based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here? Thanks, John

So 'best' performance, and the number of processes we have are all tied together.

tl;dr: the number of Python processes required to handle a concurrency of N requests for a service is given by N * avg_request_cpu_use / ((1 - safety_factor) * (avg_request_cpu_use + avg_request_time_blocking)).

When requests are CPU bound, you need one process per concurrent request. When requests are IO bound, you can multiplex requests into a process, until the sum of the CPU work per second exceeds your utilisation target (which I like to keep down around 0.8, i.e. a safety factor of 0.2, to leave leeway for bursts).

Threads don't help this at all. They don't hinder it either (broadly speaking - Mike has very specific performance metrics that show the overheads within the system of different multiplexing approaches). Threads are useful for dealing with things that expect threads, like most DB libraries. Using a thread pool is fine, but don't expect it to alter the fundamentals around how many processes we need.

Details: Skip over this bit if you know it all already.

The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes:

A) move work to a different process through some RPC - be that DBs using SQL, other services using oslo.messaging or HTTP - whatever.

B) use C extensions to perform work in threads - e.g. openssl context processing.

To increase concurrency you can use threads, eventlet, asyncio, twisted etc - because within a single process *all* Python bytecode execution happens inside the GIL lock, so you get at most one CPU for a CPU bound workload. For an IO bound workload, you can fit more work in by context switching within that one CPU capacity. And - the GIL is a poor scheduler, so at the limit - an IO bound workload where the IO backend has more capacity than we have CPU to consume it within our process, you will run into priority inversion and other problems. [This varies by Python release too].

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu/request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at 100% - to avoid such at-the-limit issues, let's pick say 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO, and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation (3*0.25=0.75).

What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

For some of our servers, e.g. nova-compute, where we're spending a lot of time waiting on the DB (via the conductor), or libvirt, or VMware callouts etc - this makes a lot of sense. In fact it's nearly ideal: we're going to spend stuff all time executing bytecode, and the majority of time waiting. For other servers, e.g. heat-engine or murano, where we are doing complex processing of the state that was stored in the persistent store backing the system, that ratio is going to change dramatically. And for some, like nova-conductor, the better and faster we make the DB layer, the less time we spend blocked, and the *less* concurrency we can support in a single process. (But hopefully the less concurrency that is needed, for a given workload.)

So - a thread pool doesn't help with the number of

I'd like to do that but I want the whole OpenStack DB API layer in the thread pool, not just the low level DBAPI (Python driver) calls. There's no need for eventlet-style concurrency, and even less for async-style concurrency, in transactionally-oriented code.

Sorry, not sure I get which DB API is which. I was thinking we could dispatch all calls to this API into a thread pool: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py

That would work I think.

I guess an alternative is to add this in the objects layer, on top of the rpc dispatch: https://github.com/openstack/nova/blob/master/nova/objects/base.py#L188 But that somehow feels like a layer violation, maybe it's not.

No opinion here, sorry :)

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
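The sizing arithmetic in the Details section can be turned into a small calculator. This is a sketch derived from the three definitions given there (request_duration, request_cpu_utilisation, cpu_utilisation), capping each process at 1 - safety_factor of a CPU. The function names are hypothetical, and, following the tl;dr, it assumes every process serves at least one request even when requests are fully CPU bound.

```python
import math


def request_cpu_utilisation(time_in_cpu, time_blocked):
    """Fraction of a request's duration spent executing bytecode."""
    request_duration = time_in_cpu + time_blocked
    return time_in_cpu / request_duration


def max_concurrency_per_process(time_in_cpu, time_blocked, safety_factor=0.2):
    """Largest concurrency that keeps cpu_utilisation <= 1 - safety_factor.

    Never below 1: a fully CPU-bound request still needs a process of its own.
    """
    r = request_cpu_utilisation(time_in_cpu, time_blocked)
    return max(1, math.floor((1 - safety_factor) / r))


def processes_needed(n_requests, time_in_cpu, time_blocked, safety_factor=0.2):
    """Processes required to serve n_requests concurrently."""
    per_process = max_concurrency_per_process(time_in_cpu, time_blocked, safety_factor)
    return math.ceil(n_requests / per_process)


if __name__ == "__main__":
    print(max_concurrency_per_process(50, 50))  # 1: 50% CPU per request
    print(max_concurrency_per_process(25, 75))  # 3: 3 * 0.25 = 0.75 CPU
    print(processes_needed(100, 25, 75))        # 34: ceil(100 / 3)
```

The first two results match the worked 50%/50% and 25%/75% examples; the third shows how N concurrent requests translate into a process count.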
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 05/11/2015 08:22 PM, Jay Pipes wrote:

c) Many OpenStack services, including Nova, Cinder, and Neutron, when looked at from a thousand-foot level, are little more than glue code that pipes out to a shell to execute system commands (sorry, but it's true).

No apologies necessary. :)

So, bottom line for me: focus on the things that will have the biggest impact to long-term cost reduction of our codebase.

+1

So, to me, the highest priority performance and scale fixes actually have to do with the simplification of our subsystems and architecture, not with whether we use mysql-python, PyMySQL, Python vs. Scala vs. Rust, or any other distractions.

Arguably if we're actually seeing performance issues then it's not a distraction but rather a real problem that needs fixing. But I agree that we shouldn't be trying to optimize the performance of code that isn't causing problems.

Chris
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700: On 5/11/15 5:25 PM, Robert Collins wrote:

Details: Skip over this bit if you know it all already.

The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes:

A) move work to a different process through some RPC - be that DBs using SQL, other services using oslo.messaging or HTTP - whatever.

B) use C extensions to perform work in threads - e.g. openssl context processing.

To increase concurrency you can use threads, eventlet, asyncio, twisted etc - because within a single process *all* Python bytecode execution happens inside the GIL lock, so you get at most one CPU for a CPU bound workload. For an IO bound workload, you can fit more work in by context switching within that one CPU capacity. And - the GIL is a poor scheduler, so at the limit - an IO bound workload where the IO backend has more capacity than we have CPU to consume it within our process, you will run into priority inversion and other problems. [This varies by Python release too].

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu/request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at 100% - to avoid such at-the-limit issues, let's pick say 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO, and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation (3*0.25=0.75).

What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*.

The most GIL-heavy nightmare CPU bound task you can imagine running on 25 threads on a ten year old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO. All production OpenStack applications today are fully serialized to only be able to emit a single query to the database at a time; for each message sent, the entire application blocks an order of magnitude more than it would under the GIL, waiting for the database library to send a message to MySQL, waiting for MySQL to send a response including the full results, waiting for the database library to unwrap the response into Python structures, and finally back to the Python space, where we can send another database message and block the entire application and all greenlets while this single message proceeds.

To share a link I've already shared about a dozen times here, here are some tests under similar conditions which illustrate what that concurrency looks like: http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/. MySQLdb takes *20 times longer* to handle the work of 100 sessions than PyMySQL when it's inappropriately run under gevent, when there is modestly high concurrency happening. When I talk about moving to threads, this is not a "won't help or hurt" kind of issue; at the moment it's a change that will immediately allow massive improvement to the performance of all OpenStack applications.

We need to change the DB library or dump eventlet.

As far as whether we should dump eventlet or use a pure-Python DB library, my contention is that a thread-based + C database library will outperform an eventlet + Python-based database library. Additionally, if we make either change, we may very well see all kinds of new database-concurrency related bugs in our apps too, because we will be talking to the database much more intensively all of a sudden; it is my opinion that a traditional threading model will be an easier environment to handle working out the approach to these issues; we have to assume concurrency at any time
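Mike's serialization point (a blocking driver call under a cooperative scheduler stalls every other session) can be sketched with the standard library. This is an illustration under stated assumptions, not the benchmark he links: `asyncio` stands in for eventlet/gevent, `time.sleep` stands in for a C driver call that blocks without yielding to the loop, and the 0.1 s query latency and 5 sessions are made-up numbers. Moving the same call onto worker threads with `loop.run_in_executor` lets the waits overlap.

```python
import asyncio
import time

QUERY_SECONDS = 0.1  # hypothetical per-query latency


def blocking_query(session_id):
    # Stand-in for a C driver call: it blocks the calling OS thread and never
    # yields to the event loop (time.sleep releases the GIL, but not the loop).
    time.sleep(QUERY_SECONDS)
    return session_id


async def all_sessions_on_loop(n_sessions):
    async def session(i):
        return blocking_query(i)  # runs on the loop thread: sessions serialize
    return await asyncio.gather(*(session(i) for i in range(n_sessions)))


async def sessions_on_threads(n_sessions):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool; the loop stays free.
    return await asyncio.gather(
        *(loop.run_in_executor(None, blocking_query, i) for i in range(n_sessions)))


def wall_time(make_coro, n_sessions=5):
    start = time.monotonic()
    asyncio.run(make_coro(n_sessions))
    return time.monotonic() - start


if __name__ == "__main__":
    print("on the loop: %.2fs" % wall_time(all_sessions_on_loop))  # about 5 * 0.1 s
    print("on threads:  %.2fs" % wall_time(sessions_on_threads))   # about 0.1 s
```

The gap widens linearly with the number of sessions, which is consistent with the kind of slowdown the linked gevent comparison reports, though this toy does not reproduce those numbers.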
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Mike, Thank you for saying all that you said above.

Best regards, Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700: On 5/11/15 5:25 PM, Robert Collins wrote:

Details: Skip over this bit if you know it all already.

The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes:

A) move work to a different process through some RPC - be that DBs using SQL, other services using oslo.messaging or HTTP - whatever.

B) use C extensions to perform work in threads - e.g. openssl context processing.

To increase concurrency you can use threads, eventlet, asyncio, twisted etc - because within a single process *all* Python bytecode execution happens inside the GIL lock, so you get at most one CPU for a CPU bound workload. For an IO bound workload, you can fit more work in by context switching within that one CPU capacity. And - the GIL is a poor scheduler, so at the limit - an IO bound workload where the IO backend has more capacity than we have CPU to consume it within our process, you will run into priority inversion and other problems. [This varies by Python release too].

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu/request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at 100% - to avoid such at-the-limit issues, let's pick say 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO, and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation (3*0.25=0.75).

What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*.

The most GIL-heavy nightmare CPU bound task you can imagine running on 25 threads on a ten year old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO. All production OpenStack applications today are fully serialized to only be able to emit a single query to the database at a time; for each message sent, the entire application blocks an order of magnitude more than it would under the GIL, waiting for the database library to send a message to MySQL, waiting for MySQL to send a response including the full results, waiting for the database library to unwrap the response into Python structures, and finally back to the Python space, where we can send another database message and block the entire application and all greenlets while this single message proceeds.

To share a link I've already shared about a dozen times here, here are some tests under similar conditions which illustrate what that concurrency looks like: http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/. MySQLdb takes *20 times longer* to handle the work of 100 sessions than PyMySQL when it's inappropriately run under gevent, when there is modestly high concurrency happening. When I talk about moving to threads, this is not a "won't help or hurt" kind of issue; at the moment it's a change that will immediately allow massive improvement to the performance of all OpenStack applications.

We need to change the DB library or dump eventlet.

As far as whether we should dump eventlet or use a pure-Python DB library, my contention is that a thread-based + C database library will outperform an eventlet + Python-based database library. Additionally, if we make either change, we may very well see all kinds of new database-concurrency related bugs in our apps too, because we will be talking to the database much more intensively
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 12 May 2015 at 11:35, Clint Byrum cl...@fewbar.com wrote: Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:

Anyway, there is an additional thought that might change the decision a bit. There is one pro to changing to use pymysql vs. changing to use threads, and that is that it isolates the change to only database access. Switching to threading means introducing threads to every piece of code we might touch while multiple threads are active.

I agree. It really seems worth it to see if I/O bound portions of OpenStack become more responsive with pymysql before embarking on a change to the concurrency model. If it doesn't, not much harm done, and if it does, but makes us CPU bound, well then we have even more of a reason to set out on such a large task.

And yes.

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 12 May 2015 at 10:44, Mike Bayer mba...@redhat.com wrote:

What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO.

Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in OpenStack is *as completely un-optimized as you can possibly be*.

Sorry if it seems like I went on a tangent, but choosing a concurrency model in Python, which a lot of this discussion has been about, is inextricably linked to the workload being tackled. The point of my tl;dr was that using threads - which gets us out of the pit below - is fine for most of our workloads and irrelevant to the actual issues in the other ones. Clearly that didn't come across. - Sorry.

The most GIL-heavy nightmare CPU bound task you can imagine running on 25 threads on a ten year old Pentium will run better than the OpenStack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO.

To confirm my understanding: this library releases the GIL, but because we only have one thread, we don't get more work done. Yes, that sucks.

And your tl;dr is that we need to either use an eventlet-ready library or not use eventlet's greenthreads, either of which I support as a short-term rectification. ...
talking to the database much more intensively all of a sudden; it is my opinion that a traditional threading model will be an easier environment to handle working out the approach to these issues; we have to assume concurrency at any time in any case because we run multiple instances of Nova etc. at the same time. At the end of the day, we aren't going to see wildly better performance with one approach over the other in any case, so we should pick the one that is easier to develop, maintain, and keep stable.

I agree. I'd actually be quite interested in exploring a CSP model for even clearer code and diagnosis of issues, but simple sequential code within threads would be a win itself.

Robert's analysis talks about various "at the limit" issues, but I was

They tend to turn up at scale. You get 100 requests a day out of 5 million that are inexplicably slow, and eventually you have enough data around the situation to try an experiment, and lo and behold the problem goes away. They don't disagree with the argument you're making though - this is just the bigger context: when folk go to deploy our (real threads || eventlet-friendly DB library) code, how many processes will they need?

FWIW, I think moving to an eventlet-friendly library should be the first step because it can be done much more rapidly and with arguably less risk. I don't think the discussion ends there though :)

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 12 May 2015 at 13:02, Dieterly, Deklan deklan.diete...@hp.com wrote:

Given Python's inherent inability to scale (GIL) relative to other languages/platforms, have there been any serious discussions on allowing other more scalable languages into the OpenStack ecosystem when concurrency/scalability is paramount?

The GIL is a particular part of the Python scaling story, but don't let it scare you: http://en.wikipedia.org/wiki/Global_Interpreter_Lock - Ruby MRI also has a GIL equivalent. Last I heard golang still defaults GOMAXPROCS to 1 and often performs less efficiently when it is greater than 1 (that is, individual requests become slower but more requests can get CPU at once). In Rust, threads are quite interesting, though there's an arena per thread and you need to hand ownership around (http://doc.rust-lang.org/1.0.0-alpha/book/tasks.html).

We do allow other languages in - see the Swift golang stuff happening right now - but short of C-layer languages (which Rust arguably is), scaling CPU bound workloads is always tricky in one way or another; we just get to pick what bit will be tricky for us. Jython can free-thread, for instance - the GIL is a CPython constraint, not Python per se.

-Rob

-- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
- Original Message - From: Mike Bayer mba...@redhat.com To: openstack-dev@lists.openstack.org Sent: Monday, May 11, 2015 4:44:58 PM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 5/11/15 9:58 AM, Attila Fazekas wrote: - Original Message - From: John Garbutt j...@johngarbutt.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: Dan Smith d...@danplanet.com Sent: Saturday, May 9, 2015 12:45:26 PM Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote: On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in nova, not blocking the process while waiting for a reply from mysql is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model?

Yes, but at the same time, we should change our threading model at the level of where APIs are accessed to refer to a database, at the very least using a threadpool behind eventlet. CRUD-oriented database access is faster using traditional threads, even in Python, than using an eventlet-like system or using explicit async. The tests at http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ show this. With traditional threads, we can stay on the C-based MySQL APIs and take full advantage of their speed.

Sorry to go back in time, I wanted to go back to an important point. It seems we have three possible approaches:

* C lib and eventlet, blocks whole process
* pure python lib, and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool
* go with a pure C protocol lib which explicitly uses `python patch-able` I/O functions (maybe others too, e.g. threading, mutex, sleep...)
* go with a pure C protocol lib where the Python part explicitly calls `decode` and `encode`, the C part just does CPU-intensive operations, and it never calls I/O primitives.

We have a few problems:

* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve

It sounds like:

* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we sure get py3.4 + py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future

I am leaning towards us moving to making DB calls with a thread pool and some fast C based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here?

Using the Python socket from C code: https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a mysql driver just as a protocol parser, and you are free to use your favorite event-based I/O strategy (direct epoll usage) even without eventlet (or similar). The issue with ultramysql is that it does not implement the `standard` Python DB API, so you would need to add an extra wrapper to SQLAlchemy.

This driver appears to have seen its last commit about a year ago, and it doesn't even implement the standard DBAPI (which is already a red flag). There is apparently a separately released (!) DBAPI-compat wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no releases in two years. If this wrapper is indeed compatible with MySQLdb then it would run in SQLAlchemy without changes (though I'd be extremely surprised if it passes our test suite). How would using these obscure libraries be any preferable to running Nova API functions within the thread-pooling facilities already included with eventlet? Keeping in mind that I've now done the work [1] to show that there is no performance gain to be had for all the trouble we go through to use eventlet/gevent/asyncio with local database connections.

Not just with local database connections; the 10G network itself is also fast. It is possible you spend more time even in the kernel-side TCP/IP stack (and the context switch...) (not in physical I/O wait) than in the actual
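The thread-pool idea raised in this thread (dispatching whole DB API functions, not just raw driver calls, to worker threads) might look roughly like the sketch below. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` rather than eventlet; in an eventlet-based service the facility Mike refers to would be `eventlet.tpool.execute`. The decorator name, pool size and `instance_get` function are hypothetical, not Nova's actual code.

```python
import concurrent.futures
import functools
import threading

# Hypothetical pool size; a real deployment would tie it to the DB connection pool.
_DB_API_POOL = concurrent.futures.ThreadPoolExecutor(
    max_workers=4, thread_name_prefix="db-api")


def run_in_db_api_pool(func):
    """Run an entire DB API function on a worker thread and wait for its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        future = _DB_API_POOL.submit(func, *args, **kwargs)
        return future.result()  # re-raises any exception raised in the worker
    return wrapper


@run_in_db_api_pool
def instance_get(instance_id):
    # Hypothetical stand-in for a nova/db/sqlalchemy/api.py function: a real one
    # would open a session, query, and commit inside this worker thread.
    return {"id": instance_id, "worker": threading.current_thread().name}


if __name__ == "__main__":
    print(instance_get(42))  # {'id': 42, 'worker': 'db-api_0'}
```

The caller simply blocks on `future.result()`, so this only adds concurrency when the callers are themselves OS threads. In a monkey-patched eventlet process the executor's worker threads would likely become green threads too, which is the gap `eventlet.tpool` is meant to fill.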
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/11/15 5:25 PM, Robert Collins wrote: Details: Skip over this bit if you know it all already. The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes: A) move work to a different process through some RPC - be that DB's using SQL, other services using oslo.messaging or HTTP - whatever. B) use C extensions to perform work in threads - e.g. openssl context processing. To increase concurrency you can use threads, eventlet, asyncio, twisted etc - because within a single process *all* Python bytecode execution happens inside the GIL lock, so you get at most one CPU for a CPU bound workload. For an IO bound workload, you can fit more work in by context switching within that one CPU capacity. And - the GIL is a poor scheduler, so at the limit - an IO bound workload where the IO backend has more capacity than we have CPU to consume it within our process, you will run into priority inversion and other problems. [This varies by Python release too]. request_duration = time_in_cpu + time_blocked request_cpu_utilisation = time_in_cpu/request_duration cpu_utilisation = concurrency * request_cpu_utilisation Assuming that we don't want any one process to spend a lot of time at 100% - to avoid such at-the-limit issues, lets pick say 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO, and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisations. (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation: (3*0.25=0.75). What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. 
Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO. Everything stated here is great; however, in our situation there is one unfortunate fact which renders it completely incorrect at the moment. I'm still puzzled why we are getting into deep think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in Openstack is *as completely un-optimized as you can possibly be*. The most GIL-heavy nightmare CPU bound task you can imagine running on 25 threads on a ten year old Pentium will run better than the Openstack we have today, because we are running a C-based, non-eventlet-patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO. All production Openstack applications today are fully serialized to only be able to emit a single query to the database at a time. For each message sent, the entire application blocks an order of magnitude more than it would under the GIL. It waits for the database library to send a message to MySQL, then for MySQL to send a response including the full results, then for the database to unwrap the response into Python structures, and finally for control to return to the Python space. Only then can we send another database message, which again blocks the entire application and all greenlets while that single message proceeds. To share a link I've already shared about a dozen times here, here are some tests under similar conditions which illustrate what that concurrency looks like: http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/.
MySQLdb takes *20 times longer* to handle the work of 100 sessions than PyMySQL when it's inappropriately run under gevent, when there is modestly high concurrency happening. When I talk about moving to threads, this is not a won't-help-or-hurt kind of issue; at the moment it's a change that will immediately allow a massive improvement to the performance of all Openstack applications. We need to change the DB library or dump eventlet. As for whether we should dump eventlet or use a pure-Python DB library, my contention is that a thread-based + C database library will outperform an eventlet + Python-based database library. Additionally, if we make either change, we may very well see all kinds of new database-concurrency related bugs in our apps too, because we will suddenly be talking to the database much more intensively. It is my opinion that a traditional threading model will be an easier environment in which to work out the approach to these issues. We have to assume concurrency at any time in any case, because we run multiple instances of Nova etc. at the same time. At the end of the day, we aren't going to see wildly better performance with one approach over the
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 05/11/2015 09:02 PM, Dieterly, Deklan wrote: Given Python’s inherent inability to scale (GIL) relative to other languages/platforms, have there been any serious discussions on allowing other more scalable languages into the OpenStack ecosystem when concurrency/scalability is paramount? Robert has already responded to your lack of specificity in the above statement. I'd like to add a couple of things:
a) The architecture of our OpenStack projects -- including the systems we use as backing data stores and the coarse-grained locking techniques we have used to date -- is a bigger problem than the language most OpenStack components are written in.
b) The speed of development and the familiarity with the Python language of the folks involved in our CI, testing, and infra/build platforms, and the inherent economies of scale we get from that, represent a far greater long-term cost reduction than trying to rewrite existing systems in faster or more scalable platforms. Developer and operator time costs way more than the tiny amount of additional cost that comes from buying a few more, faster processors to put controller services on.
c) Many OpenStack services, including Nova, Cinder, and Neutron, when looked at from a thousand-foot level, are little more than glue code that pipes out to a shell to execute system commands (sorry, but it's true). If you look at the time spent in the database and message queue it's really very little compared to the time spent on a compute node spawning an image. The DB and message queue are, IME, not where scaling problems occur. Instead, they occur in things like Nova pulling images from Glance unnecessarily (an architectural problem, not a concurrency problem) or the implementation of iptables saves when lots of security groups on a single compute node would cause excessive rebuilds of the routing tables.
Now, do I support the hummingbird Golang object server effort in Swift? Absolutely, I do. Because it 100% makes sense there.
That part of the Swift code base is where concurrency and performance matter big time. Would implementing all of nova-compute in Golang result in huge performance gains? No, not at all. It just doesn't make much sense, since much of nova-compute's time is spent shelled out in execution or waiting on locks (an implementation thing that has little to nothing to do with the language used). So, bottom line for me: focus on the things that will have the biggest impact to long-term cost reduction of our codebase. Very little of that cost (in most of our OpenStack projects) has to do with concurrency issues. A *lot* of that long-term cost has to do with the unnecessary complexity of our architecture and subsystems, because it's high-priced humans that need to get paid to maintain such complexity. So, to me, the highest priority performance and scale fixes actually have to do with the simplification of our subsystems and architecture, not with whether we use mysql-python, PyMySQL, Python vs. Scala vs. Rust, or any other distractions. Best, -jay
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote: On 4/30/15 11:16 AM, Dan Smith wrote: There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in nova, not blocking the process while waiting for a reply from mysql is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model? Yes, but at the same time, we should change our threading model at the level of where APIs are accessed to refer to a database, at the very least using a threadpool behind eventlet. CRUD-oriented database access is faster using traditional threads, even in Python, than using an eventlet-like system or using explicit async. The tests at http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ show this. With traditional threads, we can stay on the C-based MySQL APIs and take full advantage of their speed. Sorry to go back in time; I wanted to go back to an important point. It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure python lib, and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool
We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve
It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we sure get py3.4 + py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future
I am leaning towards us moving to making DB calls with a thread pool and some fast C based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here? Thanks, John
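A rough illustration of why the third option (a C lib with calls dispatched via a thread pool) helps when the driver blocks is sketched below. It uses time.sleep as a stand-in for a blocking driver call: like a C extension that releases the GIL while waiting on the server, time.sleep lets other threads run while it blocks. The names fake_query, run_serial and run_pooled and the delays are hypothetical. This is not a benchmark of any real driver.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(n, delay=0.1):
    """Hypothetical stand-in for a blocking DB call; sleeping releases the GIL."""
    time.sleep(delay)
    return n


def run_serial(count, delay=0.1):
    # One call at a time, as when a blocking driver stalls the only OS thread.
    start = time.monotonic()
    results = [fake_query(i, delay) for i in range(count)]
    return results, time.monotonic() - start


def run_pooled(count, delay=0.1, workers=10):
    # Calls dispatched to a thread pool can wait on I/O concurrently.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda i: fake_query(i, delay), range(count)))
    return results, time.monotonic() - start


if __name__ == "__main__":
    for name, runner in (("serial", run_serial), ("pooled", run_pooled)):
        results, elapsed = runner(10)
        print("%s: %d calls in %.2fs" % (name, len(results), elapsed))
```

With ten workers, the pooled run overlaps the waits, so it should take roughly one delay instead of ten. This mirrors the argument that the C driver keeps its speed while the pool supplies the concurrency.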
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 9 May 2015 at 15:02, Mike Bayer mba...@redhat.com wrote: On 5/9/15 6:45 AM, John Garbutt wrote: I am leaning towards us moving to making DB calls with a thread pool and some fast C based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here? Thanks, John I'd like to do that but I want the whole Openstack DB API layer in the thread pool, not just the low level DBAPI (Python driver) calls. There's no need for eventlet-style concurrency or even less for async-style concurrency in transactionally-oriented code. Sorry, not sure I get which DB API is which. I was thinking we could dispatch all calls to this API into a thread pool: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py I guess an alternative is to add this in the objects layer, on top of the rpc dispatch: https://github.com/openstack/nova/blob/master/nova/objects/base.py#L188 But that somehow feels like a layer violation; maybe it's not. Is that similar to what you were thinking? Thanks, John
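John's first suggestion is to dispatch every call into nova/db/sqlalchemy/api.py to a thread pool. One way to picture this is a decorator applied at that API layer. Each whole transactional function, not just the individual driver calls, then runs in one worker thread, which is what Mike asks for. This is a sketch under assumptions: run_in_db_pool, _DB_POOL and instance_get are hypothetical names. Also, calling future.result() from a greenthread would block the whole eventlet hub, so this stdlib version only illustrates the structure. Eventlet's own facility for the greenthread case is eventlet.tpool.execute.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level pool shared by all DB API functions.
_DB_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="db-api")


def run_in_db_pool(func):
    """Run the whole decorated function in a pool thread and wait for its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        future = _DB_POOL.submit(func, *args, **kwargs)
        # Blocks the calling thread until done; exceptions are re-raised here.
        return future.result()
    return wrapper


@run_in_db_pool
def instance_get(context, instance_id):
    # Hypothetical API function: in real code the session/transaction would be
    # opened and closed inside this function, i.e. entirely inside one pool thread.
    return {"id": instance_id, "thread": threading.current_thread().name}


if __name__ == "__main__":
    print(instance_get(None, 42))
```

Wrapping at this level keeps a transaction's connection on a single thread for its whole lifetime. That is one reason to put the pool around the API functions rather than around each low-level DBAPI call.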
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/9/15 6:45 AM, John Garbutt wrote: I am leaning towards us moving to making DB calls with a thread pool and some fast C based library, so we get the 'best' performance. Is that a crazy thing to be thinking? What am I missing here? Thanks, John I'd like to do that but I want the whole Openstack DB API layer in the thread pool, not just the low level DBAPI (Python driver) calls. There's no need for eventlet-style concurrency or even less for async-style concurrency in transactionally-oriented code.
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Has anybody considered the native Python connector for MySQL that supports Python 3? Here are the Ubuntu packages.
$ apt-get show python-mysql.connector
E: Invalid operation show
rbradfor@rubble:~$ apt-cache show python-mysql.connector
Package: python-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 386
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Replaces: mysql-utilities (<< 1.3.5-2)
Depends: python:any (>= 2.7.5-5~), python:any (<< 2.8)
Breaks: mysql-utilities (<< 1.3.5-2)
Filename: pool/universe/m/mysql-connector-python/python-mysql.connector_1.1.6-1_all.deb
Size: 67196
MD5sum: 22b2cb35cf8b14ac0bf4493b0d676adb
SHA1: de626403e1b14f617e9acb0a6934f044fae061c7
SHA256: 99e34f67d085c28b49eb8145c281deaa6d2b2a48d741e6831e149510087aab94
Description-en: pure Python implementation of MySQL Client/Server protocol
 MySQL driver written in Python which does not depend on MySQL C client libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol completely in Python. This means you don't have to compile anything or MySQL (client library) doesn't even have to be installed on the machine.
Description-md5: bb7e2eba7769d706d44e0ef91171b4ed
Homepage: http://dev.mysql.com/doc/connector-python/en/index.html
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
$ apt-cache show python3-mysql.connector
Package: python3-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 385
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Depends: python3:any (>= 3.3.2-2~)
Filename: pool/universe/m/mysql-connector-python/python3-mysql.connector_1.1.6-1_all.deb
Size: 64870
MD5sum: 461208ed1b89d516d6f6ce43c003a173
SHA1: bd439c4057824178490b402ad6c84067e1e2884e
SHA256: 487af52b98bc5f048faf4dc73420eff20b75a150e1f92c82de2ecdd4671659ae
Description-en: pure Python implementation of MySQL Client/Server protocol (Python3)
 MySQL driver written in Python which does not depend on MySQL C client libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol completely in Python. This means you don't have to compile anything or MySQL (client library) doesn't even have to be installed on the machine.
 .
 This package contains the Python 3 version of mysql.connector.
Description-md5: 4bca3815f5856ddf4a629b418ec76c8f
Homepage: http://dev.mysql.com/doc/connector-python/en/index.html
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu
Ronald Bradford
Web Site: http://ronaldbradford.com
LinkedIn: http://www.linkedin.com/in/ronaldbradford
Twitter: @RonaldBradford http://twitter.com/ronaldbradford
Skype: RonaldBradford
GTalk: Ronald.Bradford
On Thu, May 7, 2015 at 9:39 PM, Mike Bayer mba...@redhat.com wrote: On 5/7/15 5:32 PM, Thomas Goirand wrote: If there are really fixes and features we need in Py2K then of course we have to either convince MySQLdb to merge them or switch to mysqlclient.
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
I guess I may have spoken too soon. https://wiki.openstack.org/wiki/PyMySQL_evaluation states Oracle refuses to publish MySQL-connector-Python on Pypi, which is critical to the Openstack infrastructure. I am unclear when this statement was made and who is involved in this discussion. As I have contacts in the MySQL engineering and Oracle Corporation product development teams I will endeavor to seek a more current and definitive response and statement. Regards Ronald
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Excerpts from Ronald Bradford's message of 2015-05-08 10:41:30 -0400: I guess I may have spoken too soon. https://wiki.openstack.org/wiki/PyMySQL_evaluation states Oracle refuses to publish MySQL-connector-Python on Pypi, which is critical to the Openstack infrastructure. I am unclear when this statement was made and who is involved in this discussion. As I have contacts in the MySQL engineering and Oracle Corporation product development teams I will endeavor to seek a more current and definitive response and statement. We install all of our library dependencies via pip (for unit, functional, and integration tests). New versions of pip require special handling to install packages not hosted on PyPI, and that special handling must be performed in every place where we have a dependency on the package, which places an extra burden on us that we would prefer to avoid. Doug
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
If that could get published, please do make it happen! As far as who tried to contact Oracle and never got a response, I am not sure about that question (or answer). But if we can get that to happen it would be great for the whole Python community (IMHO). -Josh
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/8/15 10:41 AM, Ronald Bradford wrote: I guess I may have spoken too soon. https://wiki.openstack.org/wiki/PyMySQL_evaluation states Oracle refuses to publish MySQL-connector-Python on Pypi, which is critical to the Openstack infrastructure. I am unclear when this statement was made and who is involved in this discussion. As I have contacts in the MySQL engineering and Oracle Corporation product development teams I will endeavor to seek a more current and definitive response and statement. I made that statement. I and others have been in contact for many months with Andrew Rist as well as Geert Vanderkelen regarding this issue without any result. We all preferred mysql-connector originally. However, time has dragged on, and I've sent a few messages to Andrew and others saying that Openstack is essentially going to give up on their driver, to no result. Meanwhile we've all gotten more involved with PyMySQL, and it has come out as the better driver overall. PyMySQL is written by the same author as the mysqlclient driver that it looks like we are all switching to regardless (Django has already recommended this to their userbase). PyMySQL also has very straightforward source code, performs better in tests, and doesn't have weird decisions like deciding to make a huge backwards-incompatible change to return bytearrays and not bytes in Py3K raw mode (http://dev.mysql.com/doc/relnotes/connector-python/en/news-2-0-0.html). PyMySQL also is easily accessible as a project with very fast support via GitHub; several of us have been able to improve PyMySQL via pull requests quickly and without issue, and the maintainer even made me a member of the project so I could commit fixes directly if I wanted. I don't know that Oracle's ownership of MySQL-connector would be comfortable with these things, and the only way to get support is through Oracle's large and cumbersome bug tracker.
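All of the drivers compared here sit behind SQLAlchemy. At the configuration level, switching between them is a matter of the dialect+driver prefix in the connection URL, which is the string OpenStack services put in their [database] connection option. This is a minimal sketch, assuming a hypothetical mysql_url helper. The prefixes mysql+mysqldb, mysql+pymysql and mysql+mysqlconnector are SQLAlchemy's MySQL dialect names. Because mysql-python and mysqlclient both install the MySQLdb module, they share one prefix. Behavioural differences between drivers (such as whether they pass a given test suite) are not captured by a URL change.

```python
from urllib.parse import quote_plus

# SQLAlchemy "dialect+driver" prefixes for the MySQL DBAPI drivers discussed here.
# mysql-python and mysqlclient both install the MySQLdb module, so they share one.
MYSQL_DRIVERS = {
    "mysql-python": "mysql+mysqldb",
    "mysqlclient": "mysql+mysqldb",
    "PyMySQL": "mysql+pymysql",
    "mysql-connector-python": "mysql+mysqlconnector",
}


def mysql_url(package, user, password, host, database):
    """Build a SQLAlchemy connection URL for the given driver package (hypothetical helper)."""
    try:
        prefix = MYSQL_DRIVERS[package]
    except KeyError:
        raise ValueError("unknown MySQL driver package: %r" % package) from None
    # Escape the password so characters like '@' or '/' don't break URL parsing.
    return "%s://%s:%s@%s/%s" % (prefix, user, quote_plus(password), host, database)


if __name__ == "__main__":
    print(mysql_url("PyMySQL", "nova", "s3cr@t", "db.example", "nova"))
    # mysql+pymysql://nova:s3cr%40t@db.example/nova
```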
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/7/15 5:32 PM, Thomas Goirand wrote: If there are really fixes and features we need in Py2K then of course we have to either convince MySQLdb to merge them or switch to mysqlclient. Given no reply in 6 months, I think that's enough to say it: mysql-python is a dangerous package with a non-responsive upstream. That's always bad, and IMO, enough to try to get rid of it. If you think switching to PyMySQL is effortless, and the best way forward, then let's do that ASAP! haha - I'd rather drop eventlet and go with mysqlclient :) As far as this thread goes, where this has been heading is that Django has already been recommending mysqlclient, and it's become apparent just what a barrage of emails and messages have been sent Andy Dustman's way, with no response. I agree this is troubling behavior, and I've alerted people internally at RH that we need to start thinking about this package switch. My original issue was that for Fedora etc., changing it in this way is challenging, and from my discussions with packaging people, this is actually correct - this isn't an easy way to do it for them and there have been many emails as a result. My other issue is the SQLAlchemy testing issue. I'd essentially have to just stop testing mysql-python and switch to mysqlclient entirely. That means I need to revise all my docs and get all my users to switch also when the SQLAlchemy MySQLdb dialect eventually diverges from mysql-python 1.2.5, so the whole thing is in a not-minor-enough way my problem as well. With a simple module name change for mysqlclient, there would be no problem. But there you go - assuming continued crickets from AD, and seeing that people continue to find it important to appease projects like Trac that IMO quite amateurishly hardcode import MySQLdb, I don't see much other option.
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 05/05/2015 09:56 PM, Mike Bayer wrote: Having two packages that both install into the same name is the least ideal arrangement From your point of view, and for testing against both, certainly. But for a distribution, avoiding having 2 packages clashing with each other and deciding on only a single implementation of the same API is so much better in many ways. This avoids duplicated work and duplicated security support, and above all, it makes it possible for all reverse dependencies to just use the new implementation without doing anything. and I don't see why we have to settle for a mediocre outcome like that. What we want is MySQL-Python to be maintained, we have a maintainer, we have the code, we have everything we need, except a password. We should at least make an attempt at that outcome. A fork is often the worst thing that can happen to a project. See the examples of libav vs ffmpeg, libreoffice vs openoffice, or mysql vs mariadb. In the end, end users and developers all suffer. The only thing we can do is pick the implementation which we believe is best for us. And in this case, it looks like mysqlclient has Python 3 support, which we want as a feature. If you believe you can make it so that either: #1 mysql-python can get Python 3 support. #2 both forks are re-merged, and maintained as one. then that's the best possible outcome (especially #2). Whatever happens, talking to both upstreams seems a very good idea to me. However, it may not be possible to revert what has happened (or is about to happen) in Debian, as this is the decision of the package maintainer. I don't think it would be a good idea to go up to the Debian technical committee if the maintainer of the python-mysqldb package doesn't do something we like. The only other option we'd have would be to re-introduce mysql-python as a separate package, but the Debian FTP masters may oppose it and reject it, unless we have a very good reason to do so (and at this point, I don't know if we do...). 
Hoping the above helps with Debian insights, Cheers, Thomas Goirand (zigo)
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 05/05/2015 08:41 PM, Mike Bayer wrote: On 5/4/15 6:48 PM, Thomas Goirand wrote: I don't see what it would break. If I do: Package: python-mysqlclient Breaks: python-mysqldb Replaces: python-mysqldb Provides: python-mysqldb everything is fine, and python-mysqlclient becomes another implementation of the same thing. Then I believe it'd be a good idea to simply remove python-mysqldb from Debian, since it's not maintained upstream anymore. It is also imprudent to switch production openstack applications to a driver that is new and untested (even though it is a port), nor is it necessary. Supporting Python 3 is necessary, as we are going to remove Python 2 from Debian starting with Buster. I don't know Debian but the approach would be that something like the mysqlclient-py3k package applies to Python 3 only. There should be no reason Openstack applications are hardcoded to one database driver. If they share the same import MySQLdb, and if they are API compatible, how is this a problem? how do you know they are API compatible? According to Victor, that's what the author of the fork says. That's also what he wants, as per issue 44 which you raised (the mysqlclient upstream wants it to be a drop-in replacement for mysqldb, to help distributions switch to Python 3). This is in fact exactly where this approach can become a huge problem. No MySQL drivers I've ever used are fully API compatible with any of the other ones. *all* of them have subtle and not-so-subtle differences in behavior. That mysqlclient is now a fork means it will begin to diverge, and as issues come up whose resolution requires even more subtle or not-so-subtle changes in behavior, these differences will only continue to grow. I agree. Which is exactly why we don't want one package for Py2, and the other one for Py3. From a SQLAlchemy perspective this would be much easier to maintain as a new sub-dialect. Best for SQLA and everyone else would be a re-merge as a single project. 
Either that, or mysqlclient completely takes over mysql-python on PyPI, just like you suggested in GitHub issue 44. I'd love to see one or the other happen. The latter could be decided by a PyPI administrator, given the fact that the mysql-python maintainer is unresponsive. Have you tried to approach someone with such rights at PyPI? Though if it doesn't happen, as you wrote, it's going to be hell for you to test against both implementations. Maybe then the only choice you have is to decide to use only one of them (and mysqlclient seems the best of both). By the way, I found methane very reasonable in his replies. The approach should be simply that in Python 3, the mysqlclient library is installed instead of mysql-python. So, in Python 3, we'd have some bugfixes, and not in Python 2? This seems a very weird approach to me, which *will* lead to lots of issues. I've asked three times now to please show the bugfixes that are needed. You yourself wrote that there were some bugfixes and subtle differences, didn't you? Show me the issues that aren't being fixed, and then I will be convinced and begin the process of pushing here at Red Hat to make the same packaging changes such that our customers will no longer be able to use the original MySQLdb. We're talking about an instant, systemwide replacement of one MySQLdb implementation for another and I just think that is high risk. IMO, since that's a fork, the risk isn't greater than just upgrading from one version to the next for any given package. Switching to mysqlclient is basically almost free (by that, I mean effortless), if I understand what Victor wrote. The same thing can't be said of removing Eventlet or switching to pymysql, even though both may be needed. So why add the latter as a blocker for the former? Well, switching to pymysql *is* just as effortless IMHO, and in fact *more* effortless because it can be done impacting only individual applications at a time, rather than forcing it on everything at once. 
SQLAlchemy has a dialect for PyMySQL already which is well maintained and well tested. We change the database URL in projects to include mysql+pymysql, update requirements.txt, distros add their packages like they have to anyway, and we're done. Really? If it's that simple, then please start doing this, and let's happily switch to PyMySQL for Liberty. But again, I really want to see what the critical issues in MySQLdb are that are holding us back. The main motivation is the lack of support for Python 3. If there are really fixes and features we need in Py2K then of course we have to either convince MySQLdb to merge them or switch to mysqlclient. Given no reply in 6 months, I think that's enough to say it: mysql-python is a dangerous package with a non-responsive upstream. That's always bad, and IMO, enough to try to get rid of it. If you think switching to PyMySQL is effortless, and the best way forward, then
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 04/30/2015 05:00 PM, Victor Stinner wrote: Hi, I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). In fact, when looking at the python-mysqldb package description in Debian, I can see: Mysqlclient is an interface to the popular MySQL database server for Python. . This is a fork of MySQLdb. It add Python 3.3 support and merges some pull requests. Then I saw this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096 The package is currently only in Debian experimental, but I am betting that soon, the new python-mysqldb package will be uploaded to Sid, and it's very likely that Ubuntu will follow (and sync the package from Debian). As a consequence, I think it'd be much better for OpenStack to follow that and use the same thing as distributions. Of course I don't know what Fedora will do, but maybe they will follow the trend... Also, I've been using that fork without realizing it, and as far as I can tell, OpenStack continues to work... Cheers, Thomas Goirand (zigo)
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/5/15 1:11 PM, Thomas Goirand wrote: On 04/30/2015 05:00 PM, Victor Stinner wrote: Hi, I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). In fact, when looking at the python-mysqldb package description in Debian, I can see: Mysqlclient is an interface to the popular MySQL database server for Python. . This is a fork of MySQLdb. It add Python 3.3 support and merges some pull requests. Then I saw this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096 Wow, the thread decides to go forward with the move based on incorrect information. MySQL-Python's last release was on Jan 2, 2014, *not* in 2010. They are looking at the entirely wrong repository. Andy Dustman is a real person who is easily locatable on many services including Twitter, LinkedIn, GitHub, etc. Any chance that anyone has tried to get a comment from him on this, given that with the Django recommendation and the distro package moves, his package is about to be more or less wiped out of most major distributions? It would just be good style IMHO.
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/5/15 3:07 PM, Mike Bayer wrote: On 5/5/15 1:11 PM, Thomas Goirand wrote: On 04/30/2015 05:00 PM, Victor Stinner wrote: Hi, I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). In fact, when looking at the python-mysqldb package description in Debian, I can see: Mysqlclient is an interface to the popular MySQL database server for Python. . This is a fork of MySQLdb. It add Python 3.3 support and merges some pull requests. Then I saw this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096 Wow, the thread decides to go forward with the move based on incorrect information. MySQL-Python's last release was on Jan 2, 2014, *not* in 2010. They are looking at the entirely wrong repository. Andy Dustman is a real person who is easily locatable on many services including Twitter, LinkedIn, GitHub, etc. Any chance that anyone has tried to get a comment from him on this, given that with the Django recommendation and the distro package moves, his package is about to be more or less wiped out of most major distributions? It would just be good style IMHO. There's also a great thread from Naoki on the Django list, where at least we can get a view of his plans for the project: https://groups.google.com/forum/#!msg/django-developers/n-TI8mBcegE/hlNLYncAFFkJ e.g. he isn't going to go for new features or anything like that, just ongoing compatibility. That's a good thing. But what we really want here is for Naoki to be able to release new MySQL-Python versions. I'd like to see if we can get hold of Andy Dustman and get his feelings on that. Right now, I cannot test SQLAlchemy against both MySQL-Python and mysqlclient conveniently. I need to make two different virtual environments and run the whole test suite separately. 
My test suite is able to run the tests against multiple backends in one Python process, and with this packaging/import arrangement that's not possible. Having two packages that both install into the same name is the least ideal arrangement and I don't see why we have to settle for a mediocre outcome like that. What we want is MySQL-Python to be maintained, we have a maintainer, we have the code, we have everything we need, except a password. We should at least make an attempt at that outcome.
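To make the "same name" problem concrete: because both distributions install the same top-level `MySQLdb` module, one environment can hold only one of them, and the import alone cannot tell you which one you got. A minimal sketch, assuming Python 3.8+ (`importlib.metadata`); the helper `mysqldb_provider` is hypothetical and only inspects installed distribution metadata.

```python
import importlib.util
from importlib import metadata

# PyPI distribution names that, per this thread, both provide "import MySQLdb".
CANDIDATES = ("mysqlclient", "MySQL-python")


def mysqldb_provider():
    """Return the distribution that supplies the importable MySQLdb module.

    Returns None if MySQLdb is not importable, or "unknown" if it is
    importable but neither candidate distribution is registered.
    """
    if importlib.util.find_spec("MySQLdb") is None:
        return None
    for dist in CANDIDATES:
        try:
            metadata.version(dist)  # raises if this distribution is absent
        except metadata.PackageNotFoundError:
            continue
        return dist
    return "unknown"


if __name__ == "__main__":
    print(mysqldb_provider())
```

Testing against both drivers therefore means one virtualenv per driver, which is the overhead described above.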
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 5/4/15 6:48 PM, Thomas Goirand wrote: I don't see what it would break. If I do: Package: python-mysqlclient Breaks: python-mysqldb Replaces: python-mysqldb Provides: python-mysqldb everything is fine, and python-mysqlclient becomes another implementation of the same thing. Then I believe it'd be a good idea to simply remove python-mysqldb from Debian, since it's not maintained upstream anymore. It is also imprudent to switch production openstack applications to a driver that is new and untested (even though it is a port), nor is it necessary. Supporting Python 3 is necessary, as we are going to remove Python 2 from Debian starting with Buster. I don't know Debian but the approach would be that something like the mysqlclient-py3k package applies to Python 3 only. There should be no reason Openstack applications are hardcoded to one database driver. If they share the same import MySQLdb, and if they are API compatible, how is this a problem? how do you know they are API compatible? This is in fact exactly where this approach can become a huge problem. No MySQL drivers I've ever used are fully API compatible with any of the other ones. *all* of them have subtle and not-so-subtle differences in behavior. That mysqlclient is now a fork means it will begin to diverge, and as issues come up whose resolution requires even more subtle or not-so-subtle changes in behavior, these differences will only continue to grow. From a SQLAlchemy perspective this would be much easier to maintain as a new sub-dialect. I've proposed that they change their name: https://github.com/PyMySQL/mysqlclient-python/issues/44 . However, the maintainers are not going for it, so I guess that isn't going to happen. The approach should be simply that in Python 3, the mysqlclient library is installed instead of mysql-python. So, in Python 3, we'd have some bugfixes, and not in Python 2? This seems a very weird approach to me, which *will* lead to lots of issues. 
I've asked three times now to please show the bugfixes that are needed. Show me the issues that aren't being fixed, and then I will be convinced and begin the process of pushing here at Red Hat to make the same packaging changes such that our customers will no longer be able to use the original MySQLdb. We're talking about an instant, systemwide replacement of one MySQLdb implementation for another and I just think that is high risk. B. use pymysql. All other performance arguments are moot right now as we are in the basement. Eventlet has to die, we all know it. Not only for performance reasons. But this is completely orthogonal to the discussion we're having about having Python 3 support. Please don't stand in the way of doing it, just because we have other (unrelated) issues with Eventlet + MySQL. Switching to mysqlclient is basically almost free (by that, I mean effortless), if I understand what Victor wrote. The same thing can't be said of removing Eventlet or switching to pymysql, even though both may be needed. So why add the latter as a blocker for the former? Well, switching to pymysql *is* just as effortless IMHO, and in fact *more* effortless because it can be done impacting only individual applications at a time, rather than forcing it on everything at once. SQLAlchemy has a dialect for PyMySQL already which is well maintained and well tested. We change the database URL in projects to include mysql+pymysql, update requirements.txt, distros add their packages like they have to anyway, and we're done. From my view, if we're going to switch DBAPIs then PyMySQL would be it - if we're going for bug fixes in the DBAPI, the lack of eventlet support is the *biggest* bug. But again, I really want to see what the critical issues in MySQLdb are that are holding us back. If there are really fixes and features we need in Py2K then of course we have to either convince MySQLdb to merge them or switch to mysqlclient.
At the moment, though, I need to see the evidence to really buy this argument.
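The PyMySQL switch described above comes down to changing the driver part of the SQLAlchemy URL, plus requirements.txt. A minimal sketch of that rewrite, assuming the connection URL is available as a plain string (for example, read from a service's configuration file). `use_pymysql` is a hypothetical helper, not part of SQLAlchemy or any OpenStack library.

```python
from urllib.parse import urlsplit, urlunsplit


def use_pymysql(url):
    """Rewrite a MySQL SQLAlchemy URL so that it selects the PyMySQL driver.

    A bare "mysql://" URL (or "mysql+mysqldb://") uses SQLAlchemy's
    MySQLdb-based driver; "mysql+pymysql://" selects PyMySQL. Credentials,
    host, database name and query string are left unchanged.
    """
    parts = urlsplit(url)
    backend = parts.scheme.split("+", 1)[0]  # "mysql" from "mysql+driver"
    if backend != "mysql":
        raise ValueError("not a MySQL URL: %r" % url)
    return urlunsplit(parts._replace(scheme="mysql+pymysql"))


if __name__ == "__main__":
    print(use_pymysql("mysql://nova:secret@db.example.com/nova?charset=utf8"))
```

Because only the URL prefix changes, each project can switch on its own schedule, which is the "one application at a time" point made above.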
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
Hi, Mike Bayer wrote: It is not feasible to use MySQLclient in Python 2 because it uses the same module name as Python-MySQL, and would wreak havoc with distro packaging and many other things. IMO mysqlclient is just the new upstream for MySQL-Python, since MySQL-Python is no longer maintained. Why wouldn't Linux distributions package mysqlclient if it provides Python 3 support, contains bugfixes and has more features? It's quite common to have two packages in conflict because they provide the same function, same library, same program, etc. I would even suggest that packagers use mysqlclient as the new source without modifying their package. It is also imprudent to switch production openstack applications to a driver that is new and untested (even though it is a port), nor is it necessary. Why do you consider mysqlclient untested, or less tested than mysql-python? Which kind of regression do you expect in mysqlclient? Like mysql-python, the mysqlclient GitHub project is connected to Travis: https://travis-ci.org/PyMySQL/mysqlclient-python (tests pass). I trust an actively developed project more. There should be no reason Openstack applications are hardcoded to one database driver. The approach should be simply that in Python 3, the mysqlclient library is installed instead of mysql-python. Technically, it's now possible to have different dependencies on Python 2 and Python 3. But in practice, there are some annoying corner cases. It's more convenient to have the same dependencies on Python 2 and Python 3. Using mysqlclient on both Python 2 and Python 3 would avoid having bugs specific to Python 2 (bugs already fixed in mysqlclient) and new features only available on Python 3. Victor
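The split that Victor calls technically possible but inconvenient (MySQL-python on Python 2, mysqlclient on Python 3) is a per-interpreter choice. In a requirements file it would normally be written with PEP 508 environment markers, e.g. `MySQL-python; python_version < "3.0"` and `mysqlclient; python_version >= "3.0"`. The function below is only a hypothetical sketch of the same decision in Python.

```python
import sys


def mysql_driver_requirement(version_info=sys.version_info):
    """Return the PyPI distribution to depend on for a given interpreter.

    Mirrors the split proposed in the thread: keep MySQL-python on
    Python 2 and use mysqlclient on Python 3. Both provide the MySQLdb
    module, so application imports stay the same on either interpreter.
    """
    return "MySQL-python" if version_info[0] == 2 else "mysqlclient"


if __name__ == "__main__":
    print(mysql_driver_requirement())
```

The drawback Victor points out shows up even in this small sketch: two branches mean two driver code bases, each with its own set of bugs.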
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). I just proposed a change to add mysqlclient dependency to global requirements: https://review.openstack.org/#/c/179745/ Victor
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 04/30/2015 07:48 PM, Mike Bayer wrote: On 4/30/15 11:00 AM, Victor Stinner wrote: Hi, I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). It is not feasible to use MySQLclient in Python 2 because it uses the same module name as Python-MySQL, and would wreak havoc with distro packaging and many other things. I don't see what it would break. If I do: Package: python-mysqlclient Breaks: python-mysqldb Replaces: python-mysqldb Provides: python-mysqldb everything is fine, and python-mysqlclient becomes another implementation of the same thing. Then I believe it'd be a good idea to simply remove python-mysqldb from Debian, since it's not maintained upstream anymore. It is also imprudent to switch production openstack applications to a driver that is new and untested (even though it is a port), nor is it necessary. Supporting Python 3 is necessary, as we are going to remove Python 2 from Debian starting with Buster. There should be no reason Openstack applications are hardcoded to one database driver. If they share the same import MySQLdb, and if they are API compatible, how is this a problem? The approach should be simply that in Python 3, the mysqlclient library is installed instead of mysql-python. So, in Python 3, we'd have some bugfixes, and not in Python 2? This seems a very weird approach to me, which *will* lead to lots of issues. MySQLclient installs under the same name, so in this case there isn't even any change to the SQLAlchemy URL required. Nor should there be in anything else, if they are completely API compatible. PyMySQL is monkeypatchable, so as long as we are using eventlet, it is *insane* that we are using MySQL-Python at all, because it is actively making openstack applications perform much much more poorly than if we just removed eventlet. 
So as long as eventlet is running, PyMySQL wins the performance argument hands down (as described at the link http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/ which is in the third paragraph of that wiki page). And it's Py3k compatible. Ok, so you are for switching to pymysql. Good. But is this realistic? Are you going to provide all the patches yourself, for absolutely every OpenStack project that uses python-mysqldb? 1. keep Mysql-python on Py2K, use mysqlclient on py3k, changing the implementation of the MySQLdb module on Py2K, server-wide, would be very disruptive I'm sorry to say it this way, because I respect you a lot and you did a lot of very good things. But Mike, this is a very silly idea. We are already having difficulties pushing support for Py3, and in some cases, it's hard to deal with the differences. Now, you want to add even more sources of problems, with bugs specific to the Py2 or Py3 implementation? Why should we make our life even more miserable? I completely fail to understand what we would try to achieve by doing this. 2. if we actually care about performance, we either A. dump eventlet or B. use pymysql. All other performance arguments are moot right now as we are in the basement. Eventlet has to die, we all know it. Not only for performance reasons. But this is completely orthogonal to the discussion we're having about having Python 3 support. Please don't stand in the way of doing it, just because we have other (unrelated) issues with Eventlet + MySQL. Switching to mysqlclient is basically almost free (by that, I mean effortless), if I understand what Victor wrote. The same thing can't be said of removing Eventlet or switching to pymysql, even though both may be needed. So why add the latter as a blocker for the former? 
Cheers, Thomas Goirand (zigo)
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 4/30/15 11:16 AM, Dan Smith wrote: There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in nova, not blocking the process while waiting for a reply from mysql is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model? yes, but at the same time, we should change our threading model at the level of where APIs are accessed to refer to a database, at the very least using a threadpool behind eventlet. CRUD-oriented database access is faster using traditional threads, even in Python, than using an eventlet-like system or using explicit async. The tests at http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ show this. With traditional threads, we can stay on the C-based MySQL APIs and take full advantage of their speed.
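The threadpool suggestion works because, as noted elsewhere in the thread, a C driver releases the GIL while it waits on the server, so real OS threads can overlap those waits. Below is a minimal, stdlib-only sketch of that effect, assuming a fixed per-query latency. `fake_query` is a hypothetical stand-in that calls `time.sleep` (which also releases the GIL) instead of doing a real MySQL round trip. `ThreadPoolExecutor` stands in for whatever pool a service would actually use; eventlet ships its own as `eventlet.tpool`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUERY_LATENCY = 0.1  # seconds; assumed stand-in for one MySQL round trip


def fake_query(n):
    """Hypothetical blocking DB call; time.sleep releases the GIL, as a
    C driver does while waiting for the server's reply."""
    time.sleep(QUERY_LATENCY)
    return n


def run_serial(count):
    # One OS thread issuing blocking calls: each query waits for the
    # previous one, like an unpatched C driver under eventlet.
    start = time.monotonic()
    results = [fake_query(i) for i in range(count)]
    return results, time.monotonic() - start


def run_pooled(count, workers=8):
    # A pool of real OS threads: the blocking waits overlap.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_query, range(count)))
    return results, time.monotonic() - start


if __name__ == "__main__":
    for runner in (run_serial, run_pooled):
        _, elapsed = runner(8)
        print("%s: %.2fs" % (runner.__name__, elapsed))
```

With eight simulated queries, the serial run should take about eight latencies and the pooled run about one. Exact timings will vary by machine.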
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
On 4/30/15 11:00 AM, Victor Stinner wrote: Hi, I propose to replace mysql-python with mysqlclient in OpenStack applications to get Python 3 support, bug fixes and some new features (support MariaDB's libmysqlclient.so, support microsecond in TIME column). It is not feasible to use MySQLclient in Python 2 because it uses the same module name as Python-MySQL, and would wreak havoc with distro packaging and many other things. It is also imprudent to switch production openstack applications to a driver that is new and untested (even though it is a port), nor is it necessary. There should be no reason Openstack applications are hardcoded to one database driver. The approach should be simply that in Python 3, the mysqlclient library is installed instead of mysql-python. MySQLclient installs under the same name, so in this case there isn't even any change to the SQLAlchemy URL required. The MySQL database is popular, but the Python driver mysql-python doesn't look to be maintained anymore. The latest commit was done in January 2014, before the release of MySQL-python 1.2.5: https://github.com/farcepest/MySQLdb1/commits/master One major issue is that mysql-python doesn't support Python 3. It blocks porting most OpenStack applications to Python 3. There are now 32 open issues and 25 pending pull requests. I also sent an email to Andy Dustman (aka farcepest) last week, but I didn't get any reply yet. There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation PyMySQL is monkeypatchable, so as long as we are using eventlet, it is *insane* that we are using MySQL-Python at all, because it is actively making openstack applications perform much much more poorly than if we just removed eventlet. 
So as long as eventlet is running, PyMySQL wins the performance argument hands down (as described at the link http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/ which is in the third paragraph of that wiki page). And it's Py3k compatible. The performance results in that wiki page are also out of date. Naoki INADA has merged several performance improvements since then. My ultimate setup would still use mysql-python Py2K / MySQLclient Py3K, and Openstack applications would again use traditional threads for database APIs. But that is two changes. so to sum up: 1. keep Mysql-python on Py2K, use mysqlclient on py3k, changing the implementation of the MySQLdb module on Py2K, server-wide, would be very disruptive 2. if we actually care about performance, we either A. dump eventlet or B. use pymysql. All other performance arguments are moot right now as we are in the basement.
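Here is why "monkeypatchable" matters. A pure-Python driver such as PyMySQL does its network I/O through Python's `socket` module, and eventlet's monkey patching replaces that module's functions with cooperative (green) versions, so the driver picks up the patch. A C driver talks to libmysqlclient directly and never sees it. The sketch below demonstrates the mechanism with `unittest.mock` instead of eventlet. `pure_python_connect` and `green_create_connection` are hypothetical. The assumption that a driver looks up socket functions at call time is illustrative, not a description of PyMySQL's exact code.

```python
import socket
from unittest import mock


def pure_python_connect(host, port):
    """Hypothetical stand-in for a pure-Python driver's connect step.

    It looks up socket.create_connection when it runs, so whatever is
    bound to that name at call time is what gets used.
    """
    return socket.create_connection((host, port), timeout=5)


calls = []


def green_create_connection(address, timeout=None, *args, **kwargs):
    # Stand-in for a cooperative replacement such as the one eventlet's
    # monkey patching installs; here it only records the call.
    calls.append(address)
    return "green-socket"


# Patch the module attribute, as monkey patching does, then call the
# "driver": it transparently uses the replacement without any change.
with mock.patch.object(socket, "create_connection", green_create_connection):
    result = pure_python_connect("db.example.com", 3306)
```

No real connection is made. After the `with` block, the original `socket.create_connection` is restored.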
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
If we're going to move, shouldn't we be looking at something that supports our threading model? I would prefer to make baby steps, and first fix the Python 3 compatibility. Enhancing concurrency/parallelism is a much more complex project than just replacing a single line in dependencies ;-) See my email, I mentioned a workaround for mysqlclient and a spec discussing a more general solution for concurrency. Victor
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL has worse performance: https://wiki.openstack.org/wiki/PyMySQL_evaluation My major concern with not moving to something different (i.e. not based on the C library) is the threading problem. Especially as we move in the direction of cellsv2 in nova, not blocking the process while waiting for a reply from mysql is going to be critical. Further, I think that we're likely to get back a lot of performance from a supports-eventlet database connection because of the parallelism that conductor currently can only provide in exchange for the footprint of forking into lots of workers. If we're going to move, shouldn't we be looking at something that supports our threading model? --Dan