Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-12 Thread Kevin Benton
Arguably if we're actually seeing performance issues then it's not a
distraction but rather a real problem that needs fixing.

The important take-away from the thread is that we aren't anywhere near
hitting python limits though. Our main bottleneck is due to the fact that
we are serializing all DB requests in a DB-heavy codebase.

On Mon, May 11, 2015 at 9:18 PM, Chris Friesen chris.frie...@windriver.com
wrote:

 On 05/11/2015 08:22 PM, Jay Pipes wrote:

 c) Many OpenStack services, including Nova, Cinder, and Neutron, when looked at
 from a thousand-foot level, are little more than glue code that pipes out to a
 shell to execute system commands (sorry, but it's true).


 No apologies necessary. :)

  So, bottom line for me: focus on the things that will have the biggest
 impact to long-term cost reduction of our codebase.


 +1

  So, to me, the highest priority performance and scale fixes actually have
 to do with the simplification of our subsystems and architecture, not with
 whether we use mysql-python, PyMySQL, Python vs. Scala vs. Rust, or any other
 distractions.


 Arguably if we're actually seeing performance issues then it's not a
 distraction but rather a real problem that needs fixing.

 But I agree that we shouldn't be trying to optimize the performance of
 code that isn't causing problems.

 Chris


 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




-- 
Kevin Benton


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-12 Thread Mike Bayer



On 5/11/15 9:17 PM, Robert Collins wrote:

On 12 May 2015 at 10:44, Mike Bayer mba...@redhat.com wrote:


What we have today in our standard architecture for OpenStack is
optimised for IO bound workloads: waiting on the
network/subprocesses/disk/libvirt etc. Running high numbers of
eventlet handlers in a single process only works when the majority of
the work being done by a handler is IO.


Everything stated here is great, however in our situation there is one
unfortunate fact which renders it completely incorrect at the moment.   I'm
still puzzled why we are getting into deep think sessions about the vagaries
of the GIL and async when there is essentially a full-on red-alert
performance blocker rendering all of this discussion useless, so I must
again remind us: what we have *today* in Openstack is *as completely
un-optimized as you can possibly be*.

Sorry if it seems like I went on a tangent, but choosing a concurrency
model in Python, which a lot of this discussion has been about, is
inextricably linked to the workload being tackled. The point of my
tl;dr was that using threads - which gets us out of the pit below - is
fine for most of our workloads and irrelevant to the actual issues in
the other ones. Clearly that didn't come across. Sorry.

Robert -

Other people noted my fast takeoff as well, so I think I saw "GIL" and
lots of thoughtful calculations, and after that my reading comprehension
was dulled by the fog of my own angst :). I'll try to slow down more
next time.




The most GIL-heavy nightmare CPU bound task you can imagine running on 25
threads on a ten year old Pentium will run better than the Openstack we have
today, because we are running a C-based, non-eventlet patched DB library
within a single OS thread that happens to use eventlet, but the use of
eventlet is totally pointless because right now it blocks completely on all
database IO.

To confirm my understanding: this library releases the GIL, but
because we only have one thread, we don't get more work done.

Yes, that sucks. And your tl;dr is that we need to either use an
eventlet-ready library or not use eventlet's greenthreads, either of
which I support as a short term rectification.

Yes, the GIL is released within the MySQLdb C routines that are
primarily focused on IO here.






Robert's analysis talks about various at the limit issues,  but I was

They tend to turn up at scale. You get 100 requests a day out of 5
million that are inexplicably slow, and eventually you have enough
data around the situation to try an experiment, and lo and behold the
problem goes away. They don't disagree with the argument you're making
though - this is just the bigger context, when folk go to deploy our
(real threads || eventlet friendly DB library) code, how many
processes will they need?
It's been pointed out separately that Openstack already uses a lot of
processes, and even now, with our serialized DB access per process, we
still achieve concurrency through this. So by all means, let's keep
using processes; that is always a good thing, although it does present
the challenge that we have a lot of DB connections open as a result
(because we use pooling).
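To put a rough number on "a lot of DB connections": every process owns its own pool, so the per-process ceiling multiplies with every forked worker. The sketch below assumes SQLAlchemy's `QueuePool` defaults (`pool_size=5`, `max_overflow=10`, i.e. up to 15 connections per process); real services may configure different pool sizes, and the worker and host counts used here are hypothetical.

```python
def max_db_connections(processes: int, pool_size: int = 5, max_overflow: int = 10) -> int:
    """Worst-case number of open DB connections for a set of worker processes.

    Each process has its own connection pool, so the per-process ceiling
    (pool_size + max_overflow) is multiplied by the number of processes.
    The defaults mirror SQLAlchemy's QueuePool defaults (an assumption about
    how a given service is configured).
    """
    if processes < 0 or pool_size < 0 or max_overflow < 0:
        raise ValueError("counts must be non-negative")
    return processes * (pool_size + max_overflow)


# Hypothetical deployment: 3 controller hosts, each running 16 worker processes.
per_host = max_db_connections(processes=16)
print(per_host, 3 * per_host)
```

The point is only the multiplication: cutting the number of processes (for example by using threads inside each one) is what shrinks the total, not the pool settings alone.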





FWIW, I think moving to an eventlet-friendly library should be the
first step because it can be done much more rapidly and with arguably
less risk.

Yes, I'm not really sure why we aren't just changing mysql+mysqldb://
to mysql+pymysql:// in our config files right now, since this
would also solve the Py3K issue for the time being.
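The proposed switch only touches the `dialect+driver` prefix of the SQLAlchemy database URL; credentials, host, database name and query options stay the same. A minimal stdlib sketch of that rewrite follows. The helper name and the example URL are hypothetical; in practice one would simply edit the connection string in the service's config file.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_mysql_driver(url: str, driver: str = "pymysql") -> str:
    """Return `url` with its MySQL DBAPI driver replaced (e.g. mysqldb -> pymysql)."""
    parts = urlsplit(url)
    dialect = parts.scheme.split("+", 1)[0]  # "mysql+mysqldb" -> "mysql"
    if dialect != "mysql":
        raise ValueError(f"not a MySQL URL: {url!r}")
    # Only the scheme changes; netloc, path and query are carried over untouched.
    return urlunsplit(parts._replace(scheme=f"{dialect}+{driver}"))


old = "mysql+mysqldb://nova:secret@db.example.org/nova?charset=utf8"
print(swap_mysql_driver(old))
```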






Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-12 Thread Attila Fazekas




- Original Message -
 From: Robert Collins robe...@robertcollins.net
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Tuesday, May 12, 2015 3:06:21 AM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 On 12 May 2015 at 10:12, Attila Fazekas afaze...@redhat.com wrote:
 
 
 
 
 
  If you can illustrate a test script that demonstrates the actual failing
  of OS threads that does not occur with greenlets here, that would make it
  immediately apparent what it is you're getting at here.
 
 
  http://www.fpaste.org/220824/raw/
 
  I just put together a hello world C example and a hello world threading
  example, and replaced the print with sleep(3).
 
  When I use the sleep(3) from Python, the 5-thread program runs in ~3
  seconds; when I use the sleep(3) from native code, it runs in ~15 seconds.
 
  So yes, it is very likely a GIL lock wait related issue
  when the native code is not assisting.
 
 Your test code isn't releasing the GIL here, and I'd expect C DB
 drivers to be releasing the GIL: you've illustrated how a C extension
 can hold the GIL, but not whether that's happening.

Yes.

And you are right: the C driver wrapper releases the GIL
(Py_BEGIN_ALLOW_THREADS) around every important MySQL C driver call.

Good to know :)


 
  Do you need a DB example, using the MySQL C driver
  and waiting in an actual I/O primitive?
 
 waiting in an I/O primitive is fine as long as the GIL has been released.

http://www.fpaste.org/221101/

Actually, the eventlet version of the play/test code
does produce the mentioned error:
'Lock wait timeout exceeded; try restarting transaction'.

I have not seen the above issue with regular Python threads.

The driver does not cooperate with the event hub :(


PS.:
The 'Deadlock found when trying to get lock; try restarting transaction'
error would be a different situation, and it is not related to the eventlet issue.
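The "Lock wait timeout exceeded" symptom seen with the eventlet version can be mimicked without a database. The sketch below is an analogy, not eventlet itself: asyncio stands in for the eventlet hub, and a `threading.Lock` stands in for a MySQL row lock. One task takes the "row lock" and yields to the loop before "committing"; a second task waits for the same lock. If the waiter blocks the loop's thread (like a C driver that never yields to the hub), the holder never runs again and the waiter times out; if the wait is pushed to a worker thread, the holder gets to release and the waiter succeeds. Function names and timings are illustrative assumptions.

```python
import asyncio
import functools
import threading


async def holder(row_lock: threading.Lock, hold: float) -> None:
    """Take the "row lock", then yield to the loop before "committing"."""
    row_lock.acquire()
    try:
        await asyncio.sleep(hold)  # e.g. waiting on some other cooperative call
    finally:
        row_lock.release()


async def simulate_lock_wait(cooperative: bool, timeout: float = 0.3) -> bool:
    row_lock = threading.Lock()
    task = asyncio.create_task(holder(row_lock, hold=0.05))
    await asyncio.sleep(0)  # let the holder grab the lock first
    if cooperative:
        # Wait in a worker thread; the event loop stays free to run the holder.
        loop = asyncio.get_running_loop()
        got = await loop.run_in_executor(
            None, functools.partial(row_lock.acquire, timeout=timeout))
    else:
        # Wait on the loop's own thread, like a non-cooperative C driver:
        # the whole loop stalls, so the holder can never release the lock.
        got = row_lock.acquire(timeout=timeout)
    await task
    if got:
        row_lock.release()  # a plain Lock may be released from any thread
    return got


print(asyncio.run(simulate_lock_wait(cooperative=False)))  # False: "lock wait timeout"
print(asyncio.run(simulate_lock_wait(cooperative=True)))   # True
```

With a real database the waiting happens inside the server, but the shape is the same: nothing in the process can run the transaction that holds the lock until the blocked call returns.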

 
 -Rob
 
 
 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: John Garbutt j...@johngarbutt.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Cc: Dan Smith d...@danplanet.com
 Sent: Saturday, May 9, 2015 12:45:26 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:
  On 4/30/15 11:16 AM, Dan Smith wrote:
  There is an open discussion to replace mysql-python with PyMySQL, but
  PyMySQL has worse performance:
 
  https://wiki.openstack.org/wiki/PyMySQL_evaluation
 
  My major concern with not moving to something different (i.e. not based
  on the C library) is the threading problem. Especially as we move in the
  direction of cellsv2 in nova, not blocking the process while waiting for
  a reply from mysql is going to be critical. Further, I think that we're
  likely to get back a lot of performance from a supports-eventlet
  database connection because of the parallelism that conductor currently
  can only provide in exchange for the footprint of forking into lots of
  workers.
 
  If we're going to move, shouldn't we be looking at something that
  supports our threading model?
 
  yes, but at the same time, we should change our threading model at the
  level of where APIs are accessed to refer to a database, at the very least
  using a threadpool behind eventlet.   CRUD-oriented database access is faster
  using traditional threads, even in Python, than using an eventlet-like system
  or using explicit async.  The tests at
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
  show this. With traditional threads, we can stay on the C-based MySQL
  APIs and take full advantage of their speed.
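The idea of putting a thread pool behind the event loop can be seen in a small experiment. This sketch uses asyncio as a stand-in for eventlet (eventlet has its own pool, `eventlet.tpool`, which is not used here) and `time.sleep` as a stand-in for a driver call that blocks but releases the GIL, as the MySQLdb C routines do during I/O. The query count and duration are hypothetical. Five "queries" issued straight from the event loop run one after another; the same five dispatched through the loop's default thread pool overlap.

```python
import asyncio
import time

QUERY_SECONDS = 0.2  # hypothetical round-trip time of one DB query
N_QUERIES = 5


def blocking_query() -> None:
    """Stand-in for a C driver call: blocks the calling thread, releases the GIL."""
    time.sleep(QUERY_SECONDS)


async def query_on_loop() -> None:
    blocking_query()  # runs on the event loop thread and stalls every other task


async def query_in_pool() -> None:
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_query)  # default ThreadPoolExecutor


async def timed(make_query) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(make_query() for _ in range(N_QUERIES)))
    return time.perf_counter() - start


serialized = asyncio.run(timed(query_on_loop))  # roughly N_QUERIES * QUERY_SECONDS
pooled = asyncio.run(timed(query_in_pool))      # roughly QUERY_SECONDS
print(f"on the loop: {serialized:.2f}s, via thread pool: {pooled:.2f}s")
```

The driver itself is unchanged between the two runs; only where the blocking call executes differs, which is why this approach lets services keep a fast C library.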
 
 Sorry to go back in time, I wanted to go back to an important point.
 
 It seems we have three possible approaches:
 * C lib and eventlet, blocks whole process
 * pure python lib, and eventlet, eventlet does its thing
 * go for a C lib and dispatch calls via thread pool

* go with a pure C protocol lib which explicitly uses `python patch-able`
  I/O functions (maybe others too, like threading, mutex, sleep...)

* go with a pure C protocol lib where the Python part explicitly calls
  `decode` and `encode`, and the C part only does CPU-intensive operations
  and never calls I/O primitives.

 We have a few problems:
 * performance sucks, we have to fork lots of nova-conductors and api nodes
 * need to support python2.7 and 3.4, but it's not currently possible
 with the lib we use?
 * want to pick a lib that we can fix when there are issues, and work to improve
 
 It sounds like:
 * currently do the first one, it sucks, forking nova-conductor helps
 * seems we are thinking the second one might work, we sure get py3.4 +
 py2.7 support
 * the last will mean more work, but it's likely to be more performant
 * worried we are picking an unsupported lib with little future
 
 I am leaning towards us moving to making DB calls with a thread pool
 and some fast C based library, so we get the 'best' performance.
 
 Is that a crazy thing to be thinking? What am I missing here?

Using the Python socket from C code:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a MySQL driver purely as a protocol parser,
leaving you free to use your favorite event-based I/O strategy (direct epoll
usage), even without eventlet (or similar).

The issue with ultramysql is that it does not implement
the `standard` Python DB API, so you would need to add an extra wrapper for
SQLAlchemy.
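The "protocol parser only" option can be illustrated with the outermost layer of the MySQL client/server protocol, where every packet starts with a 3-byte little-endian payload length followed by a 1-byte sequence id. The framer below does no I/O at all: the caller feeds it bytes from whatever source it likes (a blocking socket, an epoll loop, eventlet, asyncio) and gets complete packets back. This is only a sketch of the idea, with hypothetical class and function names; it ignores payloads of 2**24 - 1 bytes or more (which the protocol splits across several packets), compression, and everything above the framing layer.

```python
from __future__ import annotations


class MySQLPacketFramer:
    """Sans-I/O splitter for MySQL wire-protocol packets (framing layer only)."""

    HEADER_LEN = 4  # 3-byte little-endian payload length + 1-byte sequence id

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[tuple[int, bytes]]:
        """Append received bytes; return every complete (sequence_id, payload)."""
        self._buffer += data
        packets = []
        while len(self._buffer) >= self.HEADER_LEN:
            length = int.from_bytes(self._buffer[0:3], "little")
            end = self.HEADER_LEN + length
            if len(self._buffer) < end:
                break  # need more bytes; the caller decides how to read them
            packets.append((self._buffer[3], bytes(self._buffer[self.HEADER_LEN:end])))
            del self._buffer[:end]
        return packets


def encode_packet(sequence_id: int, payload: bytes) -> bytes:
    """Build one packet (this sketch only handles payloads below 2**24 - 1 bytes)."""
    if len(payload) >= 0xFFFFFF:
        raise ValueError("payload too large for a single packet in this sketch")
    return len(payload).to_bytes(3, "little") + bytes([sequence_id]) + payload


stream = encode_packet(0, b"first") + encode_packet(1, b"second")
framer = MySQLPacketFramer()
print(framer.feed(stream[:6]))  # header seen, payload incomplete
print(framer.feed(stream[6:]))
```

Because the parser never touches a socket, the question of whether it cooperates with an event hub simply does not arise; only the (cheap) read loop around it has to.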

 
 Thanks,
 John
 


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Mike Bayer



On 5/11/15 9:58 AM, Attila Fazekas wrote:




- Original Message -

From: John Garbutt j...@johngarbutt.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Cc: Dan Smith d...@danplanet.com
Sent: Saturday, May 9, 2015 12:45:26 PM
Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:

On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but
PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.

If we're going to move, shouldn't we be looking at something that
supports our threading model?

yes, but at the same time, we should change our threading model at the
level of where APIs are accessed to refer to a database, at the very least
using a threadpool behind eventlet.   CRUD-oriented database access is faster
using traditional threads, even in Python, than using an eventlet-like system
or using explicit async.  The tests at
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
show this. With traditional threads, we can stay on the C-based MySQL
APIs and take full advantage of their speed.

Sorry to go back in time, I wanted to go back to an important point.

It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure python lib, and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool

* go with a pure C protocol lib which explicitly uses `python patch-able`
   I/O functions (maybe others too, like threading, mutex, sleep...)

* go with a pure C protocol lib where the Python part explicitly calls
   `decode` and `encode`, and the C part only does CPU-intensive operations
   and never calls I/O primitives.


We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible
with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve

It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we sure get py3.4 +
py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future

I am leaning towards us moving to making DB calls with a thread pool
and some fast C based library, so we get the 'best' performance.

Is that a crazy thing to be thinking? What am I missing here?

Using the Python socket from C code:
https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100

It is also possible to implement a MySQL driver purely as a protocol parser,
leaving you free to use your favorite event-based I/O strategy (direct epoll
usage), even without eventlet (or similar).

The issue with ultramysql is that it does not implement
the `standard` Python DB API, so you would need to add an extra wrapper for
SQLAlchemy.


This driver appears to have seen its last commit about a year ago, and it
doesn't even implement the standard DBAPI (which is already a red
flag).   There is apparently a separately released (!) DBAPI-compat
wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no
releases in two years. If this wrapper is indeed compatible with
MySQLdb then it would run in SQLAlchemy without changes (though I'd be
extremely surprised if it passes our test suite).


How would using these obscure libraries be any preferable to running
Nova API functions within the thread-pooling facilities already included
with eventlet? Keep in mind that I've now done the work [1]
to show that there is no performance gain to be had for all the trouble
we go through to use eventlet/gevent/asyncio with local database
connections.


[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/









Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Mike Bayer



On 5/11/15 2:02 PM, Attila Fazekas wrote:


Not just with local database connections,
the 10G network itself is also fast. It is possible that you spend more time
in the kernel-side TCP/IP stack and context switches (not in physical I/O
wait) than in the actual work on the DB side. (Check netperf TCP_RR.)

The scary part of a blocking I/O call is when you have two
Python threads (or green threads), one of them holding a DB lock while the other
is waiting for the same lock in a native blocking I/O syscall.

That's a database deadlock, and whether you use eventlet, threads,
asyncio or even just two transactions in a single-threaded script, that
can happen regardless.  If your two eventlet non-blocking greenlets
are waiting forever for a deadlock,  you're just as deadlocked as if you
have OS threads.




If you do a read(2) in native code, Python itself might not be able to
preempt it. Your transaction might end with a `DB Lock wait timeout`
after 30 seconds of doing nothing, instead of switching to the other Python
thread, which would be able to release the lock.



Here's the "you're losing me" part, because Python threads are OS
threads, so Python isn't directly involved in trying to preempt anything,
unless you're referring to the effect of the GIL locking up the
program.   However, it's pretty easy to make two threads in Python hit a
database and deadlock against each other, and the rest of the
program's threads continue to run just fine; in a DB deadlock situation
you are blocked on IO, and IO releases the GIL.
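The claim that a thread blocked on I/O does not stop the rest of the program can be checked directly. In this sketch (names are illustrative), a `socket.socketpair()` stands in for a DB connection whose server has not answered yet: one thread sits in `recv()`, which CPython performs with the GIL released, while the main thread keeps executing Python bytecode.

```python
import socket
import threading


def wait_for_reply(conn: socket.socket, replies: list) -> None:
    # Blocks in the recv() system call; CPython drops the GIL while waiting.
    replies.append(conn.recv(16))


client, server = socket.socketpair()
replies: list = []
reader = threading.Thread(target=wait_for_reply, args=(client, replies))
reader.start()

busy_total = sum(range(1_000_000))  # the main thread keeps running Python code
reader_still_waiting = reader.is_alive()  # nothing sent yet, so it must still block

server.sendall(b"row")  # the "database" finally answers
reader.join()
client.close()
server.close()
print(busy_total, reader_still_waiting, replies)
```

A real DB lock wait behaves like the unanswered `recv()` here: only the waiting thread is stuck, which is the behaviour a non-cooperating driver loses under a single-threaded event hub.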


If you can illustrate a test script that demonstrates the actual failing
of OS threads that does not occur with greenlets here, that would make it
immediately apparent what it is you're getting at here.







[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/









Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, May 11, 2015 9:07:13 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 
 
 On 5/11/15 2:02 PM, Attila Fazekas wrote:
 
  Not just with local database connections,
  the 10G network itself is also fast. It is possible that you spend more
  time in the kernel-side TCP/IP stack and context switches (not in physical
  I/O wait) than in the actual work on the DB side. (Check netperf TCP_RR.)
 
  The scary part of a blocking I/O call is when you have two
  Python threads (or green threads), one of them holding a DB lock while the
  other is waiting for the same lock in a native blocking I/O syscall.
 
 That's a database deadlock, and whether you use eventlet, threads,
 asyncio or even just two transactions in a single-threaded script, that
 can happen regardless.  If your two eventlet non-blocking greenlets
 are waiting forever for a deadlock,  you're just as deadlocked as if you
 have OS threads.
 
 
  If you do a read(2) in native code, Python itself might not be able to
  preempt it. Your transaction might end with a `DB Lock wait timeout`
  after 30 seconds of doing nothing, instead of switching to the other
  Python thread, which would be able to release the lock.
 
 
 Here's the "you're losing me" part, because Python threads are OS
 threads, so Python isn't directly involved in trying to preempt anything,
 unless you're referring to the effect of the GIL locking up the
 program.   However, it's pretty easy to make two threads in Python hit a
 database and deadlock against each other, and the rest of the
 program's threads continue to run just fine; in a DB deadlock situation
 you are blocked on IO, and IO releases the GIL.
 
 If you can illustrate a test script that demonstrates the actual failing
 of OS threads that does not occur with greenlets here, that would make it
 immediately apparent what it is you're getting at here.


http://www.fpaste.org/220824/raw/

I just put together a hello world C example and a hello world threading example,
and replaced the print with sleep(3).

When I use the sleep(3) from Python, the 5-thread program runs in ~3 seconds;
when I use the sleep(3) from native code, it runs in ~15 seconds.

So yes, it is very likely a GIL lock wait related issue
when the native code is not assisting.
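The fpaste script itself is not reproduced here, but the same effect can be re-created from Python with `ctypes`, which makes the GIL behaviour explicit: functions loaded through `ctypes.CDLL` are called with the GIL released (much like a C extension wrapping a call in `Py_BEGIN_ALLOW_THREADS`), while `ctypes.PyDLL` keeps the GIL held for the whole call. The sketch assumes a POSIX system with libc's `usleep` and a standard (GIL) CPython build; the thread count and sleep length are arbitrary.

```python
import ctypes
import ctypes.util
import threading
import time

LIBC = ctypes.util.find_library("c")  # may be None; dlopen(NULL) still exposes usleep
N_THREADS = 5
SLEEP_US = 200_000  # 0.2 s of "native" blocking per thread


def run_threads(native_sleep) -> float:
    """Start N_THREADS threads that each call a native sleep; return wall time."""
    threads = [threading.Thread(target=native_sleep, args=(SLEEP_US,))
               for _ in range(N_THREADS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# CDLL releases the GIL around the call, so the sleeps overlap (~0.2 s total).
gil_released = run_threads(ctypes.CDLL(LIBC).usleep)
# PyDLL keeps the GIL held, so the sleeps run one after another (~1.0 s total).
gil_held = run_threads(ctypes.PyDLL(LIBC).usleep)
print(f"GIL released: {gil_released:.2f}s, GIL held: {gil_held:.2f}s")
```

This reproduces the ~3 s vs ~15 s contrast at a smaller scale, and also shows why it depends on whether the native code releases the GIL rather than on threads as such.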
 
Do you need a DB example, using the MySQL C driver
and waiting in an actual I/O primitive?

The greenthreads will not help here.

If I imported Python's time.sleep from the C code, it might help.

Using a pure Python driver helps to avoid this kind of issue,
but in that case you have the `CPython is slow` issue.

 
 
  [1]
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
 
 
 
 
 
 
 


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Eugene Nikanorov
 All production Openstack applications today are fully serialized to only
be able to emit a single query to the database at a time;
True. That's why any deployment configures tons (tens) of workers for every
significant service.

  When I talk about moving to threads, this is not a "won't help or hurt"
kind of issue, at the moment it's a change that will immediately allow
massive improvement to the performance of all Openstack applications
instantly.
Not sure if it will give much benefit over separate processes.
I guess we don't configure many workers for gate testing (at least, neutron
still doesn't do it), so there could be an improvement, but I guess to
enable multithreading we would need to fix the same issues that prevented
us from configuring multiple workers in the gate, plus possibly more.

 We need to change the DB library or dump eventlet.
I'm +1 for the 1st option.

The other option, multithreading, will most certainly bring concurrency
issues beyond the database.

Thanks,
Eugene.


On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic bo...@pavlovic.me wrote:

 Mike,

 Thank you for saying all that you said above.

 Best regards,
 Boris Pavlovic

 On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
 
  On 5/11/15 5:25 PM, Robert Collins wrote:
  
   Details: Skip over this bit if you know it all already.
  
   The GIL plays a big factor here: if you want to scale the amount of
   CPU available to a Python service, you have two routes:
   A) move work to a different process through some RPC - be that DB's
   using SQL, other services using oslo.messaging or HTTP - whatever.
   B) use C extensions to perform work in threads - e.g. openssl context
   processing.
  
   To increase concurrency you can use threads, eventlet, asyncio,
   twisted etc - because within a single process *all* Python bytecode
   execution happens inside the GIL lock, so you get at most one CPU for
   a CPU bound workload. For an IO bound workload, you can fit more work
   in by context switching within that one CPU capacity. And - the GIL is
   a poor scheduler, so at the limit - an IO bound workload where the IO
   backend has more capacity than we have CPU to consume it within our
   process, you will run into priority inversion and other problems.
   [This varies by Python release too].
  
   request_duration = time_in_cpu + time_blocked
   request_cpu_utilisation = time_in_cpu/request_duration
   cpu_utilisation = concurrency * request_cpu_utilisation
  
   Assuming that we don't want any one process to spend a lot of time at
   100% - to avoid such at-the-limit issues, let's pick, say, 80%
   utilisation, or a safety factor of 0.2. If a single request consumes
   50% of its duration waiting on IO, and 50% of its duration executing
   bytecode, we can only run one such request concurrently without
   hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
   75% of its duration waiting on IO and 25% on CPU, we can run 3 such
   requests concurrently without exceeding our target of 80% utilisation
   (3*0.25=0.75).
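The back-of-the-envelope model above translates directly into a few lines of Python, which makes it easy to plug in other CPU/IO splits. The function names are illustrative; the formulas and the 80% target are the ones given in the text.

```python
import math


def request_cpu_utilisation(time_in_cpu: float, time_blocked: float) -> float:
    request_duration = time_in_cpu + time_blocked
    return time_in_cpu / request_duration


def cpu_utilisation(concurrency: int, request_util: float) -> float:
    return concurrency * request_util


def max_concurrency(time_in_cpu: float, time_blocked: float, target: float = 0.8) -> int:
    """Most concurrent requests that keep one process at or below `target` CPU."""
    util = request_cpu_utilisation(time_in_cpu, time_blocked)
    # The small epsilon keeps float rounding from dropping an exact fit.
    return math.floor(target / util + 1e-9)


print(max_concurrency(50, 50))  # 50% CPU per request -> 1
print(max_concurrency(25, 75))  # 25% CPU per request -> 3
print(cpu_utilisation(3, request_cpu_utilisation(25, 75)))  # 0.75
```

Only the ratio of CPU time to blocked time matters, so the inputs can be in any consistent unit (milliseconds, percentages, and so on).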
  
   What we have today in our standard architecture for OpenStack is
   optimised for IO bound workloads: waiting on the
   network/subprocesses/disk/libvirt etc. Running high numbers of
   eventlet handlers in a single process only works when the majority of
   the work being done by a handler is IO.
 
  Everything stated here is great, however in our situation there is one
  unfortunate fact which renders it completely incorrect at the moment.
  I'm still puzzled why we are getting into deep think sessions about the
  vagaries of the GIL and async when there is essentially a full-on
  red-alert performance blocker rendering all of this discussion useless,
  so I must again remind us: what we have *today* in Openstack is *as
  completely un-optimized as you can possibly be*.
 
  The most GIL-heavy nightmare CPU bound task you can imagine running on
  25 threads on a ten year old Pentium will run better than the Openstack
  we have today, because we are running a C-based, non-eventlet patched DB
  library within a single OS thread that happens to use eventlet, but the
  use of eventlet is totally pointless because right now it blocks
  completely on all database IO.   All production Openstack applications
  today are fully serialized to only be able to emit a single query to the
  database at a time; for each message sent, the entire application blocks
  an order of magnitude more than it would under the GIL waiting for the
  database library to send a message to MySQL, waiting for MySQL to send a
  response including the full results, waiting for the database to unwrap
  the response into Python structures, and finally back to the Python
  space, where we can send another database message and block the entire
  application and all greenlets while this single message proceeds.
 
  To share a link I've already shared 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Dieterly, Deklan
Given Python’s inherent inability to scale (GIL) relative to other 
languages/platforms, have there been any serious discussions on allowing other 
more scalable languages into the OpenStack ecosystem when 
concurrency/scalability is paramount?

Regards.
--
Deklan Dieterly
Hewlett-Packard Company
Sr. Systems Software Engineer
HP Cloud


From: Eugene Nikanorov enikano...@mirantis.com
Reply-To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Date: Monday, May 11, 2015 at 6:30 PM
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

 All production Openstack applications today are fully serialized to only be 
 able to emit a single query to the database at a time;
True. That's why any deployment configures tons (tens) of workers of any 
significant service.

  When I talk about moving to threads, this is not a won't help or hurt kind 
 of issue, at the moment it's a change that will immediately allow massive 
 improvement to the performance of all Openstack applications instantly.
Not sure If it will give much benefit over separate processes.
I guess we don't configure many worker for gate testing (at least, neutron 
still doesn't do it), so there could be an improvement, but I guess to enable 
multithreading we would need to fix the same issues that prevented us from 
configuring multiple workers in the gate, plus possibly more.

 We need to change the DB library or dump eventlet.
I'm +1 for the 1st option.

Other option, which is multithreading will most certainly bring concurrency 
issues other than database.

Thanks,
Eugene.


On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic 
bo...@pavlovic.memailto:bo...@pavlovic.me wrote:
Mike,

Thank you for saying all that you said above.

Best regards,
Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum 
cl...@fewbar.commailto:cl...@fewbar.com wrote:
Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:

 On 5/11/15 5:25 PM, Robert Collins wrote:
 
  Details: Skip over this bit if you know it all already.
 
  The GIL plays a big factor here: if you want to scale the amount of
  CPU available to a Python service, you have two routes:
  A) move work to a different process through some RPC - be that DB's
  using SQL, other services using oslo.messaging or HTTP - whatever.
  B) use C extensions to perform work in threads - e.g. openssl context
  processing.
 
  To increase concurrency you can use threads, eventlet, asyncio,
  twisted etc - because within a single process *all* Python bytecode
  execution happens inside the GIL lock, so you get at most one CPU for
  a CPU bound workload. For an IO bound workload, you can fit more work
  in by context switching within that one CPU capacity. And - the GIL is
  a poor scheduler, so at the limit - an IO bound workload where the IO
  backend has more capacity than we have CPU to consume it within our
  process, you will run into priority inversion and other problems.
  [This varies by Python release too].
 
  request_duration = time_in_cpu + time_blocked
  request_cpu_utilisation = time_in_cpu/request_duration
  cpu_utilisation = concurrency * request_cpu_utilisation
 
  Assuming that we don't want any one process to spend a lot of time at
  100% - to avoid such at-the-limit issues, let's pick, say, 80%
  utilisation, or a safety factor of 0.2. If a single request consumes
  50% of its duration waiting on IO, and 50% of its duration executing
  bytecode, we can only run one such request concurrently without
  hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
  75% of its duration waiting on IO and 25% on CPU, we can run 3 such
  requests concurrently without exceeding our target of 80% utilisation:
  (3*0.25=0.75).
 
  What we have today in our standard architecture for OpenStack is
  optimised for IO bound workloads: waiting on the
  network/subprocesses/disk/libvirt etc. Running high numbers of
  eventlet handlers in a single process only works when the majority of
  the work being done by a handler is IO.

 Everything stated here is great, however in our situation there is one
 unfortunate fact which renders it completely incorrect at the moment.
 I'm still puzzled why we are getting into deep think sessions about the
 vagaries of the GIL and async when there is essentially a full-on
 red-alert performance blocker rendering all of this discussion useless,
 so I must again remind us: what we have *today* in OpenStack is *as
 completely un-optimized as you can possibly be*.

 The most GIL-heavy nightmare CPU bound task you can imagine running on
 25 threads on a ten year old Pentium will run better than the OpenStack
 we have today, because we are running a C-based, non-eventlet patched DB
 library within a single

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Robert Collins
On 12 May 2015 at 10:12, Attila Fazekas afaze...@redhat.com wrote:





 If you can illustrate a test script that demonstrates the actual failing
 of OS threads that does not occur with greenlets here, that would make it
 immediately apparent what it is you're getting at here.


 http://www.fpaste.org/220824/raw/

 I just put together a hello world C example and a hello world threading example,
 and replaced the print with sleep(3).

 When I use the sleep(3) from python, the 5 thread program runs in ~3 seconds;
 when I use the sleep(3) from native code, it runs ~15 sec.

 So yes, it is very likely a GIL lock wait related issue,
 when the native code is not assisting.

Your test code isn't releasing the GIL here, and I'd expect C DB
drivers to be releasing the GIL: you've illustrated how a C extension
can hold the GIL, but not whether that's happening.

 Do you need a DB example, by using the mysql C driver,
 and waiting in an actual I/O primitive ?

waiting in an I/O primitive is fine as long as the GIL has been released.
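A small Python sketch of the distinction Rob draws, assuming a POSIX system where `ctypes.CDLL(None)` exposes libc's `usleep()` and a standard (GIL-enabled) CPython build. `ctypes.CDLL` releases the GIL around each foreign call, while `ctypes.PyDLL` keeps it held, which stands in for a C extension that blocks without releasing the GIL. This illustrates GIL release in general; it does not test any particular MySQL driver, and `timed_native_sleeps` is a hypothetical helper.

```python
import ctypes
import threading
import time

# POSIX-only assumption: dlopen(NULL) gives access to libc's usleep().
_libc_releasing = ctypes.CDLL(None)  # CDLL drops the GIL during each call
_libc_holding = ctypes.PyDLL(None)   # PyDLL keeps the GIL held during each call


def timed_native_sleeps(lib, n_threads=5, micros=200_000):
    """Run usleep(micros) in n_threads threads and return the wall time."""
    threads = [threading.Thread(target=lib.usleep, args=(micros,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


held = timed_native_sleeps(_libc_holding)         # sleeps are serialized
released = timed_native_sleeps(_libc_releasing)   # sleeps overlap
print(f"GIL held: {held:.2f}s, GIL released: {released:.2f}s")
```

On such a system the held-GIL run should take roughly the sum of the sleeps (about 1 s) and the released run roughly one sleep (about 0.2 s), the same shape as the ~15 sec vs ~3 second contrast above.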

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Angus Lees
On Tue, 12 May 2015 at 05:08 Mike Bayer mba...@redhat.com wrote:

 On 5/11/15 2:02 PM, Attila Fazekas wrote:
  The scary part of a blocking I/O call is when you have two
  python threads (or green threads) and one of them is holding a DB lock while
  the other is waiting for the same lock in a native blocking I/O syscall.
 that's a database deadlock and whether you use eventlet, threads,
 asyncio or even just two transactions in a single-threaded script, that
 can happen regardless. If your two eventlet non-blocking greenlets
 are waiting forever for a deadlock, you're just as deadlocked as if you
 have OS threads.


Not true (if I understand the situation Attila is referring to).

 If you do a read(2) in native code, the python itself might not be able
 to preempt it.
  Your transaction might be finished with `DB Lock wait timeout`,
  with 30 sec of doing nothing, instead of scheduling another
  python thread, which would be able to release the lock.


 Here's the "you're losing me" part, because Python threads are OS
 threads, so Python isn't directly involved trying to preempt anything,
 unless you're referring to the effect of the GIL locking up the
 program.   However, it's pretty easy to make two threads in Python hit a
 database and do a deadlock against each other, and the rest of the
 program's threads continue to run just fine; in a DB deadlock situation
 you are blocked on IO and IO releases the GIL.

 If you can illustrate a test script that demonstrates the actual failing
 of OS threads that does not occur with greenlets here, that would make it
 immediately apparent what it is you're getting at here.


1. Thread A does something that takes a lock on the DB side
2. Thread B does something that blocks waiting for that same DB lock
3. Depends on the threading model - see below

In a true preemptive threading system (eg: regular python threads), (3)
is:

3. Eventually A finishes its transaction/whatever, commits and releases
the DB lock
4. B then takes the lock and proceeds
5. Profit

However, in a system where B's DB client can't be preempted (eg: eventlet
or asyncio calling into a C-based mysql library, and A and B are running on
the same underlying kernel thread), (3) is:

3. B will never be preempted, A will never be rescheduled, and thus A will
never complete whatever it was doing.
4. Deadlock (in mysql-python's case, until a deadlock timer raises an
exception and kills B 30s later)
5. Sadness.  More specifically, we add a @retry to paper over the
particular observed occurrence and then repeat this discussion on os-dev
when the topic comes up again 6 months later.

Note that this is not the usual database transaction deadlock caused by A
and B each taking a lock and then trying to take the other's lock - this is
a deadlock purely in the client-side code caused entirely by the lack of
preemption during an otherwise safe series of DB operations.
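The client-side deadlock above can be modelled without a database at all. In this minimal sketch (all names hypothetical), a `threading.Lock` stands in for the DB-side lock, generators stand in for green threads, and a round-robin loop on a single OS thread stands in for the eventlet hub. `task_b_blocking` waits inside a call that never yields back to the scheduler, like a C driver eventlet cannot patch; `task_b_cooperative` yields while waiting, like a monkey-patched pure-Python driver. It is only a model; the oslo.db test referenced below is the real reproduction.

```python
import threading
from collections import deque


def task_a(lock, events):
    lock.acquire()               # 1. A takes the (simulated) DB-side lock
    events.append("A locked")
    yield                        # A is switched out mid-transaction
    lock.release()               # A commits, if it is ever rescheduled
    events.append("A released")


def task_b_blocking(lock, events, timeout=0.2):
    # 2. B blocks without yielding to the scheduler (unpatchable C driver).
    # The timeout plays the role of the server's lock wait timeout.
    acquired = lock.acquire(timeout=timeout)
    events.append("B acquired" if acquired else "B timed out")
    if acquired:
        lock.release()
    yield


def task_b_cooperative(lock, events):
    # B polls and yields while waiting (patched, green-aware driver).
    while not lock.acquire(blocking=False):
        yield
    events.append("B acquired")
    lock.release()


def run(*tasks):
    """Round-robin cooperative scheduler on a single OS thread."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        queue.append(task)


def scenario(make_b):
    lock, events = threading.Lock(), []
    run(task_a(lock, events), make_b(lock, events))
    return events


print(scenario(task_b_blocking))     # A only commits after B gave up
print(scenario(task_b_cooperative))  # A commits, then B proceeds
```

In the blocking case A gets to commit only after B has already timed out, which is the "deadlock until a timer fires" outcome in step 4 of the non-preemptive sequence.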

See my oslo.db unittest in Ib35c95defea8ace5b456af28801659f2ba67eb96 that
reproduces the above with eventlet and allows you to test the behaviour of
various DB drivers.

(zzzeek: I know you've already seen all of the above in previous
discussions, so sorry for repeating).

 - Gus
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Robert Collins
On 10 May 2015 at 03:26, John Garbutt j...@johngarbutt.com wrote:
 On 9 May 2015 at 15:02, Mike Bayer mba...@redhat.com wrote:
 On 5/9/15 6:45 AM, John Garbutt wrote:

 I am leaning towards us moving to making DB calls with a thread pool and
 some fast C based library, so we get the 'best' performance. Is that a crazy
 thing to be thinking? What am I missing here? Thanks, John

So 'best' performance, and the number of processes we have are all
tied together.

tl;dr: the number of Python processes required to handle a concurrency
of N requests for a service is given by
N * avg_request_cpu_use/(avg_request_cpu_use+avg_request_time_blocking)
/ (1-safety_factor)
When requests are CPU bound, you need one process per concurrent request.
When requests are IO bound, you can multiplex requests into a process,
until the sum of the CPU work per second exceeds your target utilisation
(which I like to keep down around 0.8, i.e. a safety factor of 0.2, to
leave leeway for bursts).
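A minimal Python sketch of the tl;dr estimate. It divides by the target utilisation `1 - safety_factor`, so that a larger safety margin calls for more processes, which is consistent with the 80% worked example in the details below. `processes_needed` is a hypothetical name, and the per-request times are assumed to be averages in any consistent unit.

```python
def processes_needed(concurrency, cpu_time, blocked_time, safety_factor=0.2):
    """Rough number of Python processes needed for `concurrency` requests.

    Because of the GIL, each process gets at most one CPU of bytecode
    execution, so each is kept at or below (1 - safety_factor) of a CPU.
    """
    cpu_fraction = cpu_time / (cpu_time + blocked_time)
    target_utilisation = 1 - safety_factor
    return concurrency * cpu_fraction / target_utilisation


# 100 concurrent requests, each spending 25% of its time on CPU.
print(processes_needed(100, cpu_time=0.25, blocked_time=0.75))
```

That works out to about 31.25, so 32 processes after rounding up. The estimate only makes sense for IO-bound requests: a fully CPU-bound request cannot be split across processes, so there the answer is simply one process per concurrent request.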

Threads don't help this at all. They don't hinder it either (broadly
speaking - Mike has very specific performance metrics that show the
overheads within the system of different multiplexing approaches).
Threads are useful for dealing with things that expect threads, like
most DB libraries. Using a thread pool is fine, but don't expect it to
alter the fundamentals around how many processes we need.

Details: Skip over this bit if you know it all already.

The GIL plays a big factor here: if you want to scale the amount of
CPU available to a Python service, you have two routes:
A) move work to a different process through some RPC - be that DBs
using SQL, other services using oslo.messaging or HTTP - whatever.
B) use C extensions to perform work in threads - e.g. openssl context
processing.

To increase concurrency you can use threads, eventlet, asyncio,
twisted etc - because within a single process *all* Python bytecode
execution happens inside the GIL lock, so you get at most one CPU for
a CPU bound workload. For an IO bound workload, you can fit more work
in by context switching within that one CPU capacity. And - the GIL is
a poor scheduler, so at the limit - an IO bound workload where the IO
backend has more capacity than we have CPU to consume it within our
process, you will run into priority inversion and other problems.
[This varies by Python release too].

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu/request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at
100% - to avoid such at-the-limit issues, let's pick, say, 80%
utilisation, or a safety factor of 0.2. If a single request consumes
50% of its duration waiting on IO, and 50% of its duration executing
bytecode, we can only run one such request concurrently without
hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
75% of its duration waiting on IO and 25% on CPU, we can run 3 such
requests concurrently without exceeding our target of 80% utilisation:
(3*0.25=0.75).
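The three equations and the 80% worked example translate directly into a short Python function. This is a sketch of that arithmetic only, reusing the variable names above and assuming the safety factor is applied as `target_utilisation = 1 - safety_factor`; `max_concurrency_per_process` is a hypothetical name.

```python
import math


def max_concurrency_per_process(time_in_cpu, time_blocked, target_utilisation=0.8):
    """Largest concurrency whose cpu_utilisation stays within the target."""
    request_duration = time_in_cpu + time_blocked
    request_cpu_utilisation = time_in_cpu / request_duration
    # cpu_utilisation = concurrency * request_cpu_utilisation <= target_utilisation
    concurrency = math.floor(target_utilisation / request_cpu_utilisation)
    # A process always runs at least one request, even if that request
    # alone pushes it past the target.
    return max(1, concurrency)


print(max_concurrency_per_process(0.5, 0.5))    # 1: the 50/50 case
print(max_concurrency_per_process(0.25, 0.75))  # 3: the 25/75 case
```

Cutting `time_blocked` from 0.75 to 0.25 (a faster DB layer) drops the second result from 3 to 1, which is the nova-conductor effect described below.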

What we have today in our standard architecture for OpenStack is
optimised for IO bound workloads: waiting on the
network/subprocesses/disk/libvirt etc. Running high numbers of
eventlet handlers in a single process only works when the majority of
the work being done by a handler is IO.

For some of our servers, e.g. nova-compute, where we're spending a lot
of time waiting on the DB (via the conductor), or libvirt, or VMware
callouts etc - this makes a lot of sense. In fact it's nearly ideal:
we're going to spend almost no time executing bytecode, and the
majority of time waiting.

For other servers, e.g. heat-engine or murano, where we are doing
complex processing of the state that was stored in the persistent
store backing the system, that ratio is going to change dramatically.

And for some, like nova-conductor, the better and faster we make the
DB layer, the less time we spend blocked, and the *less* concurrency
we can support in a single process. (But hopefully the less
concurrency that is needed, for a given workload).

So - a thread pool doesn't help with the number of

 I'd like to do that but I want the whole OpenStack DB API layer in the
 thread pool, not just the low level DBAPI (Python driver) calls.   There's
 no need for eventlet-style concurrency or even less for async-style
 concurrency in transactionally-oriented code.

 Sorry, not sure I get which DB API is which.

 I was thinking we could dispatch all calls to this API into a thread pool:
 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py

That would work I think.
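A minimal standard-library sketch of dispatching DB API calls into a thread pool. `in_db_pool` and `instance_get` are hypothetical stand-ins, not Nova's actual code, and the pool size is arbitrary. One caveat: under eventlet with monkey-patching, a stdlib pool's threads may themselves become green threads, which would defeat the purpose; eventlet provides `eventlet.tpool.execute()` to run a call in a real OS thread, so an eventlet-based service would more likely use that.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

# One shared pool of OS threads for all DB API calls (size is arbitrary).
_db_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="db")


def in_db_pool(fn):
    """Run the decorated DB API function in the pool and wait for its result."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return _db_pool.submit(fn, *args, **kwargs).result()
    return wrapper


@in_db_pool
def instance_get(instance_id):
    # Hypothetical DB API function; a real one would open a session and query.
    return {"id": instance_id, "thread": threading.current_thread().name}


print(instance_get(42))
```

Because the wrapper preserves the call signature, callers of the DB API would not need to change, and exceptions raised in the worker propagate back through `result()`.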

 I guess an alternative is to add this in the objects layer, on top of
 the rpc dispatch:
 https://github.com/openstack/nova/blob/master/nova/objects/base.py#L188
 But that somehow feels like a layer violation, maybe it's not.

No opinion here, sorry :)

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Chris Friesen

On 05/11/2015 08:22 PM, Jay Pipes wrote:

c) Many OpenStack services, including Nova, Cinder, and Neutron, when looked at
from a thousand-foot level, are little more than glue code that pipes out to a
shell to execute system commands (sorry, but it's true).


No apologies necessary. :)


So, bottom line for me: focus on the things that will have the biggest impact to
long-term cost reduction of our codebase.


+1


So, to me, the highest priority performance and scale fixes actually have to do
with the simplification of our subsystems and architecture, not with whether we
use mysql-python, PyMySQL, Python vs. Scala vs. Rust, or any other distractions.


Arguably if we're actually seeing performance issues then it's not a distraction 
but rather a real problem that needs fixing.


But I agree that we shouldn't be trying to optimize the performance of code that
isn't causing problems.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Clint Byrum
Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
 
 On 5/11/15 5:25 PM, Robert Collins wrote:
 
  Details: Skip over this bit if you know it all already.
 
  The GIL plays a big factor here: if you want to scale the amount of
  CPU available to a Python service, you have two routes:
  A) move work to a different process through some RPC - be that DBs
  using SQL, other services using oslo.messaging or HTTP - whatever.
  B) use C extensions to perform work in threads - e.g. openssl context
  processing.
 
  To increase concurrency you can use threads, eventlet, asyncio,
  twisted etc - because within a single process *all* Python bytecode
  execution happens inside the GIL lock, so you get at most one CPU for
  a CPU bound workload. For an IO bound workload, you can fit more work
  in by context switching within that one CPU capacity. And - the GIL is
  a poor scheduler, so at the limit - an IO bound workload where the IO
  backend has more capacity than we have CPU to consume it within our
  process, you will run into priority inversion and other problems.
  [This varies by Python release too].
 
  request_duration = time_in_cpu + time_blocked
  request_cpu_utilisation = time_in_cpu/request_duration
  cpu_utilisation = concurrency * request_cpu_utilisation
 
  Assuming that we don't want any one process to spend a lot of time at
  100% - to avoid such at-the-limit issues, let's pick, say, 80%
  utilisation, or a safety factor of 0.2. If a single request consumes
  50% of its duration waiting on IO, and 50% of its duration executing
  bytecode, we can only run one such request concurrently without
  hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
  75% of its duration waiting on IO and 25% on CPU, we can run 3 such
  requests concurrently without exceeding our target of 80% utilisation:
  (3*0.25=0.75).
 
  What we have today in our standard architecture for OpenStack is
  optimised for IO bound workloads: waiting on the
  network/subprocesses/disk/libvirt etc. Running high numbers of
  eventlet handlers in a single process only works when the majority of
  the work being done by a handler is IO.
 
 Everything stated here is great, however in our situation there is one 
 unfortunate fact which renders it completely incorrect at the moment.   
 I'm still puzzled why we are getting into deep think sessions about the 
 vagaries of the GIL and async when there is essentially a full-on 
 red-alert performance blocker rendering all of this discussion useless, 
 so I must again remind us: what we have *today* in OpenStack is *as
 completely un-optimized as you can possibly be*.
 
 The most GIL-heavy nightmare CPU bound task you can imagine running on 
 25 threads on a ten year old Pentium will run better than the OpenStack
 we have today, because we are running a C-based, non-eventlet patched DB 
 library within a single OS thread that happens to use eventlet, but the 
 use of eventlet is totally pointless because right now it blocks 
 completely on all database IO.   All production OpenStack applications
 today are fully serialized to only be able to emit a single query to the 
 database at a time; for each message sent, the entire application blocks 
 an order of magnitude more than it would under the GIL waiting for the 
 database library to send a message to MySQL, waiting for MySQL to send a 
 response including the full results, waiting for the database to unwrap 
 the response into Python structures, and finally back to the Python 
 space, where we can send another database message and block the entire 
 application and all greenlets while this single message proceeds.
 
 To share a link I've already shared about a dozen times here, here's 
 some tests under similar conditions which illustrate what that 
 concurrency looks like: 
 http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/.
  
 MySQLdb takes *20 times longer* to handle the work of 100 sessions than 
 PyMySQL when it's inappropriately run under gevent, when there is 
 modestly high concurrency happening.   When I talk about moving to 
 threads, this is not a "won't help or hurt" kind of issue; at the moment
 it's a change that will immediately allow massive improvement to the
 performance of all OpenStack applications instantly. We need to change
 the DB library or dump eventlet.
 
 As far as if we should dump eventlet or use a pure-Python DB library, my 
 contention is that a thread based + C database library will outperform 
 an eventlet + Python-based database library. Additionally, if we make 
 either change, when we do so we may very well see all kinds of new 
 database-concurrency related bugs in our apps too, because we will be 
 talking to the database much more intensively all of a sudden; it is my
 opinion that a traditional threading model will be an easier environment 
 to handle working out the approach to these issues; we have to assume 
 concurrency at any time 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Boris Pavlovic
Mike,

Thank you for saying all that you said above.

Best regards,
Boris Pavlovic

On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:
 
  On 5/11/15 5:25 PM, Robert Collins wrote:
  
   Details: Skip over this bit if you know it all already.
  
   The GIL plays a big factor here: if you want to scale the amount of
   CPU available to a Python service, you have two routes:
   A) move work to a different process through some RPC - be that DBs
   using SQL, other services using oslo.messaging or HTTP - whatever.
   B) use C extensions to perform work in threads - e.g. openssl context
   processing.
  
   To increase concurrency you can use threads, eventlet, asyncio,
   twisted etc - because within a single process *all* Python bytecode
   execution happens inside the GIL lock, so you get at most one CPU for
   a CPU bound workload. For an IO bound workload, you can fit more work
   in by context switching within that one CPU capacity. And - the GIL is
   a poor scheduler, so at the limit - an IO bound workload where the IO
   backend has more capacity than we have CPU to consume it within our
   process, you will run into priority inversion and other problems.
   [This varies by Python release too].
  
   request_duration = time_in_cpu + time_blocked
   request_cpu_utilisation = time_in_cpu/request_duration
   cpu_utilisation = concurrency * request_cpu_utilisation
  
   Assuming that we don't want any one process to spend a lot of time at
   100% - to avoid such at-the-limit issues, let's pick, say, 80%
   utilisation, or a safety factor of 0.2. If a single request consumes
   50% of its duration waiting on IO, and 50% of its duration executing
   bytecode, we can only run one such request concurrently without
   hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
   75% of its duration waiting on IO and 25% on CPU, we can run 3 such
   requests concurrently without exceeding our target of 80% utilisation:
   (3*0.25=0.75).
  
   What we have today in our standard architecture for OpenStack is
   optimised for IO bound workloads: waiting on the
   network/subprocesses/disk/libvirt etc. Running high numbers of
   eventlet handlers in a single process only works when the majority of
   the work being done by a handler is IO.
 
  Everything stated here is great, however in our situation there is one
  unfortunate fact which renders it completely incorrect at the moment.
  I'm still puzzled why we are getting into deep think sessions about the
  vagaries of the GIL and async when there is essentially a full-on
  red-alert performance blocker rendering all of this discussion useless,
  so I must again remind us: what we have *today* in OpenStack is *as
  completely un-optimized as you can possibly be*.
 
  The most GIL-heavy nightmare CPU bound task you can imagine running on
  25 threads on a ten year old Pentium will run better than the OpenStack
  we have today, because we are running a C-based, non-eventlet patched DB
  library within a single OS thread that happens to use eventlet, but the
  use of eventlet is totally pointless because right now it blocks
  completely on all database IO.   All production OpenStack applications
  today are fully serialized to only be able to emit a single query to the
  database at a time; for each message sent, the entire application blocks
  an order of magnitude more than it would under the GIL waiting for the
  database library to send a message to MySQL, waiting for MySQL to send a
  response including the full results, waiting for the database to unwrap
  the response into Python structures, and finally back to the Python
  space, where we can send another database message and block the entire
  application and all greenlets while this single message proceeds.
 
  To share a link I've already shared about a dozen times here, here's
  some tests under similar conditions which illustrate what that
  concurrency looks like:
  http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/.
  MySQLdb takes *20 times longer* to handle the work of 100 sessions than
  PyMySQL when it's inappropriately run under gevent, when there is
  modestly high concurrency happening.   When I talk about moving to
  threads, this is not a "won't help or hurt" kind of issue; at the moment
  it's a change that will immediately allow massive improvement to the
  performance of all OpenStack applications instantly. We need to change
  the DB library or dump eventlet.
 
  As far as if we should dump eventlet or use a pure-Python DB library, my
  contention is that a thread based + C database library will outperform
  an eventlet + Python-based database library. Additionally, if we make
  either change, when we do so we may very well see all kinds of new
  database-concurrency related bugs in our apps too, because we will be
  talking to the database much more intensively 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Robert Collins
On 12 May 2015 at 11:35, Clint Byrum cl...@fewbar.com wrote:
 Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700:

 Anyway, there is additional thought that might change the decision
 a bit. There is one pro to changing to use pymysql vs. changing to
 use threads, and that is that it isolates the change to only database
 access. Switching to threading means introducing threads to every piece
 of code we might touch while multiple threads are active.

I agree.

 It really seems worth it to see if I/O bound portions of OpenStack
 become more responsive with pymysql before embarking on a change to the
 concurrency model. If it doesn't, not much harm done, and if it does,
 but makes us CPU bound, well then we have even more of a reason to set
 out on such a large task.

And yes.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Robert Collins
On 12 May 2015 at 10:44, Mike Bayer mba...@redhat.com wrote:

 What we have today in our standard architecture for OpenStack is
 optimised for IO bound workloads: waiting on the
 network/subprocesses/disk/libvirt etc. Running high numbers of
 eventlet handlers in a single process only works when the majority of
 the work being done by a handler is IO.


 Everything stated here is great, however in our situation there is one
 unfortunate fact which renders it completely incorrect at the moment.   I'm
 still puzzled why we are getting into deep think sessions about the vagaries
 of the GIL and async when there is essentially a full-on red-alert
 performance blocker rendering all of this discussion useless, so I must
 again remind us: what we have *today* in OpenStack is *as completely
 un-optimized as you can possibly be*.

Sorry if it seems like I went off on a tangent, but choosing a concurrency
model in Python, which a lot of this discussion has been about, is
inextricably linked to the workload being tackled. The point of my
tl;dr was that using threads - which gets us out of the pit below - is
fine for most of our workloads and irrelevant to the actual issues in
the other ones. Clearly that didn't come across - sorry.

 The most GIL-heavy nightmare CPU bound task you can imagine running on 25
 threads on a ten year old Pentium will run better than the OpenStack we have
 today, because we are running a C-based, non-eventlet patched DB library
 within a single OS thread that happens to use eventlet, but the use of
 eventlet is totally pointless because right now it blocks completely on all
 database IO.

To confirm my understanding: this library releases the GIL, but
because we only have one thread, we don't get more work done.

Yes, that sucks. And your tl;dr is that we need to either use an
eventlet ready library or not use eventlet's greenthreads, either of
which I support as a short term rectification.
...
 talking to the database much more intensively all of a sudden; it is my
 opinion that a traditional threading model will be an easier environment to
 handle working out the approach to these issues; we have to assume
 concurrency at any time in any case because we run multiple instances of
 Nova etc. at the same time.  At the end of the day, we aren't going to see
 wildly better performance with one approach over the other in any case, so
 we should pick the one that is easier to develop, maintain, and keep stable.

I agree. I'd actually be quite interested in exploring a CSP model for
even clearer code and diagnosis of issues, but simple sequential code
within threads would be a win itself.

 Robert's analysis talks about various at-the-limit issues, but I was

They tend to turn up at scale. You get 100 requests a day out of 5
million that are inexplicably slow, and eventually you have enough
data around the situation to try an experiment, and lo and behold the
problem goes away. They don't disagree with the argument you're making
though - this is just the bigger context, when folk go to deploy our
(real threads || eventlet friendly DB library) code, how many
processes will they need?

FWIW, I think moving to an eventlet friendly library should be the
first step because it can be done much more rapidly and with arguably
less risk.

I don't think the discussion ends there, though :)

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Robert Collins
On 12 May 2015 at 13:02, Dieterly, Deklan deklan.diete...@hp.com wrote:
 Given Python’s inherent inability to scale (GIL) relative to other 
 languages/platforms, have there been any serious discussions on allowing 
 other more scalable languages into the OpenStack ecosystem when 
 concurrency/scalability is paramount?

The GIL is a particular part of the Python scaling story, but don't
let it scare you: http://en.wikipedia.org/wiki/Global_Interpreter_Lock
- Ruby MRI also has a GIL equivalent. Last I heard, golang still
defaults GOMAXPROCS to 1 and often performs less efficiently when it
is > 1 (that is, individual requests become slower but more requests
can get CPU at once). In Rust, threads are quite interesting, though
there's an arena per thread and you need to hand ownership around
(http://doc.rust-lang.org/1.0.0-alpha/book/tasks.html).

We do allow other languages in - see the Swift golang stuff happening
right now, but short of C-layer languages (which Rust arguably is),
scaling CPU-bound workloads is always tricky in one way or another; we
just get to pick which bit will be tricky for us.

Jython can free-thread, for instance - the GIL is a CPython constraint,
not Python per se.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Attila Fazekas




- Original Message -
 From: Mike Bayer mba...@redhat.com
 To: openstack-dev@lists.openstack.org
 Sent: Monday, May 11, 2015 4:44:58 PM
 Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
 
 
 On 5/11/15 9:58 AM, Attila Fazekas wrote:
 
 
 
  - Original Message -
  From: John Garbutt j...@johngarbutt.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Cc: Dan Smith d...@danplanet.com
  Sent: Saturday, May 9, 2015 12:45:26 PM
  Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
 
  On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:
  On 4/30/15 11:16 AM, Dan Smith wrote:
  There is an open discussion to replace mysql-python with PyMySQL, but
  PyMySQL has worse performance:
 
  https://wiki.openstack.org/wiki/PyMySQL_evaluation
  My major concern with not moving to something different (i.e. not based
  on the C library) is the threading problem. Especially as we move in the
  direction of cellsv2 in nova, not blocking the process while waiting for
  a reply from mysql is going to be critical. Further, I think that we're
  likely to get back a lot of performance from a supports-eventlet
  database connection because of the parallelism that conductor currently
  can only provide in exchange for the footprint of forking into lots of
  workers.
 
  If we're going to move, shouldn't we be looking at something that
  supports our threading model?
  yes, but at the same time, we should change our threading model at the
  level of where APIs are accessed to refer to a database, at the very
  least using a threadpool behind eventlet. CRUD-oriented database access
  is faster using traditional threads, even in Python, than using an
  eventlet-like system or using explicit async. The tests at
  http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
  show this. With traditional threads, we can stay on the C-based MySQL
  APIs and take full advantage of their speed.
  Sorry to go back in time, I wanted to go back to an important point.
 
  It seems we have three possible approaches:
  * C lib and eventlet, blocks whole process
  * pure python lib, and eventlet, eventlet does its thing
  * go for a C lib and dispatch calls via thread pool
  * go with a pure C protocol lib which explicitly uses `python patch-able`
  I/O functions (maybe others too, e.g. threading, mutex, sleep...)
  * go with a pure C protocol lib where the python part explicitly calls
  `decode` and `encode`; the C part just does CPU-intensive operations
  and never calls I/O primitives.
 
  We have a few problems:
  * performance sucks, we have to fork lots of nova-conductors and api nodes
  * need to support python2.7 and 3.4, but it's not currently possible
  with the lib we use?
  * want to pick a lib that we can fix when there are issues, and work to
  improve

  It sounds like:
  * currently do the first one, it sucks, forking nova-conductor helps
  * seems we are thinking the second one might work, we'd surely get py3.4 +
  py2.7 support
  * the last will mean more work, but it's likely to be more performant
  * worried we are picking an unsupported lib with little future
 
  I am leaning towards us moving to making DB calls with a thread pool
  and some fast C based library, so we get the 'best' performance.
 
  Is that a crazy thing to be thinking? What am I missing here?
  Using the python socket from C code:
  https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
 
  It is also possible to implement a MySQL driver purely as a protocol parser,
  leaving you free to use your favorite event-based I/O strategy (e.g. direct
  epoll usage), even without eventlet (or similar).

  The issue with ultramysql is that it does not implement
  the `standard` Python DB API, so you would need to add an extra wrapper for
  SQLAlchemy.
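The protocol-parser idea above can be sketched as a "sans-I/O" framer: it turns bytes into MySQL packets and never touches a socket, so the caller can drive it from blocking sockets, epoll, eventlet or asyncio alike. This is a minimal illustration in Python rather than C, and `PacketFramer` is a hypothetical name, not part of ultramysql or any other driver. It assumes only the MySQL client/server packet header (3-byte little-endian payload length plus a 1-byte sequence id) and ignores payloads split across 16 MiB continuation packets, authentication and result-set decoding.

```python
from typing import List, Tuple


class PacketFramer:
    """Hypothetical sans-I/O framer for MySQL client/server protocol packets.

    Each packet is a 3-byte little-endian payload length, a 1-byte sequence
    id, then the payload. The framer never reads from a socket: the caller
    obtains bytes with whatever I/O strategy it likes and feeds them in.
    Continuation packets (payload length 0xFFFFFF) are NOT reassembled here.
    """

    HEADER_SIZE = 4

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[Tuple[int, bytes]]:
        """Append raw bytes; return every complete (sequence_id, payload)."""
        self._buffer.extend(data)
        packets = []
        while len(self._buffer) >= self.HEADER_SIZE:
            length = int.from_bytes(self._buffer[0:3], "little")
            seq_id = self._buffer[3]
            end = self.HEADER_SIZE + length
            if len(self._buffer) < end:
                break  # payload not fully received yet; wait for more bytes
            packets.append((seq_id, bytes(self._buffer[self.HEADER_SIZE:end])))
            del self._buffer[:end]
        return packets


framer = PacketFramer()
# Two packets: payload b"hello" with sequence id 0, then b"ok" with id 1.
wire = bytes([5, 0, 0, 0]) + b"hello" + bytes([2, 0, 0, 1]) + b"ok"
print(framer.feed(wire[:7]))  # [] (first packet still incomplete)
print(framer.feed(wire[7:]))  # [(0, b'hello'), (1, b'ok')]
```

Because all CPU work happens in `feed()`, the same parser could sit behind an eventlet-friendly socket or a thread, which is the flexibility the paragraph describes.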
 
 This driver appears to have seen its last commit about a year ago, and it
 doesn't even implement the standard DBAPI (which is already a red
 flag).   There is apparently a separately released (!) DBAPI-compat
 wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no
 releases in two years. If this wrapper is indeed compatible with
 MySQLdb then it would run in SQLAlchemy without changes (though I'd be
 extremely surprised if it passes our test suite).
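For context on what "implements the standard DBAPI" means: PEP 249 requires a driver module to expose `connect()` plus the module globals `apilevel`, `threadsafety` and `paramstyle`, and SQLAlchemy's dialects are written against that interface. The hypothetical helper below checks those names, using the standard library's `sqlite3` only as a convenient stand-in driver. The `threadsafety` value is also worth checking before moving DB calls into a thread pool: level 1 means threads may share the module but not connections.

```python
import sqlite3
from types import ModuleType

# PEP 249 meanings of the module-level "threadsafety" value.
THREADSAFETY_LEVELS = {
    0: "threads may not share the module",
    1: "threads may share the module, but not connections",
    2: "threads may share the module and connections",
    3: "threads may share the module, connections and cursors",
}


def describe_dbapi(module: ModuleType) -> dict:
    """Hypothetical check for the module-level names PEP 249 requires."""
    required = ("apilevel", "threadsafety", "paramstyle", "connect")
    missing = [name for name in required if not hasattr(module, name)]
    if missing:
        raise TypeError(f"{module.__name__} is not DB-API 2.0 compliant; missing {missing}")
    return {
        "apilevel": module.apilevel,
        "paramstyle": module.paramstyle,
        "threadsafety": THREADSAFETY_LEVELS.get(module.threadsafety, "unknown"),
    }


print(describe_dbapi(sqlite3))
```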
 
 How would using these obscure libraries be any preferable to running
 Nova API functions within the thread-pooling facilities already included
 with eventlet? Keep in mind that I've now done the work [1]
 to show that there is no performance gain to be had for all the trouble
 we go through to use eventlet/gevent/asyncio with local database
 connections.

Not just with local database connections;
a 10G network itself is also fast. It is possible that you spend more time
in the kernel-side TCP/IP stack and on context switches (not in physical
I/O wait) than in the actual

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Mike Bayer



On 5/11/15 5:25 PM, Robert Collins wrote:


Details: Skip over this bit if you know it all already.

The GIL plays a big factor here: if you want to scale the amount of
CPU available to a Python service, you have two routes:
A) move work to a different process through some RPC - be that DBs
using SQL, other services using oslo.messaging or HTTP - whatever.
B) use C extensions to perform work in threads - e.g. openssl context
processing.

To increase concurrency you can use threads, eventlet, asyncio,
twisted etc - because within a single process *all* Python bytecode
execution happens inside the GIL lock, so you get at most one CPU for
a CPU bound workload. For an IO bound workload, you can fit more work
in by context switching within that one CPU capacity. And - the GIL is
a poor scheduler, so at the limit - an IO bound workload where the IO
backend has more capacity than we have CPU to consume it within our
process, you will run into priority inversion and other problems.
[This varies by Python release too].

request_duration = time_in_cpu + time_blocked
request_cpu_utilisation = time_in_cpu/request_duration
cpu_utilisation = concurrency * request_cpu_utilisation

Assuming that we don't want any one process to spend a lot of time at
100% - to avoid such at-the-limit issues, let's pick, say, 80%
utilisation, or a safety factor of 0.2. If a single request spends
50% of its duration waiting on IO and 50% of its duration executing
bytecode, we can only run one such request concurrently without
hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
75% of its duration waiting on IO and 25% on CPU, we can run 3 such
requests concurrently without exceeding our target of 80% utilisation
(3*0.25 = 0.75).
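Robert's arithmetic can be checked with a few lines of Python. The sketch assumes, as in the text, that per-process CPU utilisation is concurrency * request_cpu_utilisation and that the target is 80% (a 0.2 safety factor); `max_concurrency` is a hypothetical helper name.

```python
import math


def max_concurrency(cpu_fraction: float, target_utilisation: float = 0.8) -> int:
    """Largest number of concurrent requests whose combined CPU use stays
    at or below target_utilisation of one core.

    cpu_fraction is request_cpu_utilisation = time_in_cpu / request_duration.
    """
    if not 0 < cpu_fraction <= 1:
        raise ValueError("cpu_fraction must be in (0, 1]")
    # Small epsilon so exact ratios such as 0.6 / 0.2 are not floored to 2
    # by floating-point rounding.
    return math.floor(target_utilisation / cpu_fraction + 1e-9)


for cpu in (0.5, 0.25):
    n = max_concurrency(cpu)
    print(f"{cpu:.0%} CPU per request: {n} concurrent, {n * cpu:.0%} utilisation")
```

Both cases reproduce the numbers above: one concurrent request at 50% CPU, three at 25% CPU.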

What we have today in our standard architecture for OpenStack is
optimised for IO bound workloads: waiting on the
network/subprocesses/disk/libvirt etc. Running high numbers of
eventlet handlers in a single process only works when the majority of
the work being done by a handler is IO.


Everything stated here is great; however, in our situation there is one
unfortunate fact which renders it completely incorrect at the moment.
I'm still puzzled why we are getting into deep-think sessions about the
vagaries of the GIL and async when there is essentially a full-on
red-alert performance blocker rendering all of this discussion useless,
so I must again remind us: what we have *today* in OpenStack is *as
completely un-optimized as you can possibly be*.


The most GIL-heavy nightmare CPU-bound task you can imagine, running on
25 threads on a ten-year-old Pentium, will run better than the OpenStack
we have today, because we are running a C-based, non-eventlet-patched DB
library within a single OS thread that happens to use eventlet, but the
use of eventlet is totally pointless because right now it blocks
completely on all database IO.   All production OpenStack applications
today are fully serialized to only be able to emit a single query to the
database at a time; for each message sent, the entire application blocks
an order of magnitude more than it would under the GIL, waiting for the
database library to send a message to MySQL, waiting for MySQL to send a
response including the full results, waiting for the database library to
unwrap the response into Python structures, and finally back to the Python
space, where we can send another database message and block the entire
application and all greenlets while this single message proceeds.
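The serialization described above can be reproduced in miniature with the standard library. In this sketch `asyncio` stands in for eventlet and `time.sleep()` stands in for a C driver call that the event loop cannot preempt; `blocking_query` is a hypothetical placeholder, not MySQLdb, and the timings say nothing about real database performance. Called directly from coroutines, the blocking calls run one after another; dispatched to a thread pool, they overlap.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(delay: float) -> str:
    """Hypothetical stand-in for a C driver call the event loop cannot interrupt.

    time.sleep() blocks the calling OS thread (while releasing the GIL), much
    as a C client library blocks while waiting on its socket.
    """
    time.sleep(delay)
    return "row"


async def run_on_loop(n: int, delay: float) -> float:
    """Call the blocking function directly from n coroutines."""
    start = time.perf_counter()

    async def handler() -> str:
        return blocking_query(delay)  # never yields: stalls every other coroutine

    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def run_in_pool(n: int, delay: float) -> float:
    """Dispatch each blocking call to its own OS thread from a pool."""
    start = time.perf_counter()
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=n) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_query, delay) for _ in range(n))
        )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Expected: the first is close to 10 * 0.05s, the second close to 0.05s.
    print(f"on loop: {asyncio.run(run_on_loop(10, 0.05)):.2f}s")
    print(f"in pool: {asyncio.run(run_in_pool(10, 0.05)):.2f}s")
```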


To share a link I've already shared about a dozen times here, here are
some tests under similar conditions which illustrate what that
concurrency looks like:
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/.
MySQLdb takes *20 times longer* to handle the work of 100 sessions than
PyMySQL when it's inappropriately run under gevent, when there is
modestly high concurrency happening.   When I talk about moving to
threads, this is not a "won't help or hurt" kind of issue; at the moment
it's a change that will immediately allow a massive improvement to the
performance of all OpenStack applications.  We need to change
the DB library or dump eventlet.


As for whether we should dump eventlet or use a pure-Python DB library, my
contention is that a thread-based + C database library will outperform
an eventlet + Python-based database library. Additionally, if we make
either change, we may very well see all kinds of new
database-concurrency-related bugs in our apps too, because we will be
talking to the database much more intensively all of a sudden; it is my
opinion that a traditional threading model will be an easier environment
in which to work out the approach to these issues; we have to assume
concurrency at any time in any case because we run multiple instances
of Nova etc. at the same time.  At the end of the day, we aren't going
to see wildly better performance with one approach over the

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-11 Thread Jay Pipes

On 05/11/2015 09:02 PM, Dieterly, Deklan wrote:

Given Python’s inherent inability to scale (GIL) relative to other
languages/platforms, have there been any serious discussions on
allowing other more scalable languages into the OpenStack ecosystem
when concurrency/scalability is paramount?


Robert has already responded to your lack of specificity in the above 
statement. I'd like to add a couple things:


a) The architecture of our OpenStack projects -- including the systems 
we use as backing data stores and the coarse-grained locking techniques 
we have used to date -- is a bigger problem than the language most 
OpenStack components are written in.


b) The speed of development and the familiarity with the Python language 
of the folks involved in our CI, testing, and infra/build platforms and 
the inherent economies of scale we get from that represent a far greater 
long-term cost reduction than trying to rewrite existing systems in 
faster or more scalable platforms. Developer and operator time costs
way more than the tiny amount of additional cost that comes from buying
a few more, faster processors to put controller services on.


c) Many OpenStack services, including Nova, Cinder, and Neutron, when 
looked at from a thousand-foot level, are little more than glue code 
that pipes out to a shell to execute system commands (sorry, but it's 
true). If you look at the time spent in the database and message queue 
it's really very little compared to the time spent on a compute node 
spawning an image. The DB and message queue are, IME, not where scaling 
problems occur. Instead, they occur in things like Nova pulling images 
from Glance unnecessarily (an architectural problem, not a concurrency 
problem) or the implementation of iptables saves when lots of security 
groups on a single compute node would cause excessive rebuilds of the 
routing tables.


Now, do I support the hummingbird Golang object server effort in Swift?

Absolutely, I do.

Because it 100% makes sense there. That part of the Swift code base is 
where concurrency and performance matters big time. Would implementing 
all of nova-compute in Golang result in huge performance gains? No, not 
at all. It just doesn't make much sense, since much of nova-compute's
time is spent shelled out in execution or waiting on locks (an
implementation thing that has little to nothing to do with the language 
used).


So, bottom line for me: focus on the things that will have the biggest 
impact to long-term cost reduction of our codebase.


Very little of that cost (in most of our OpenStack projects) has to do 
with concurrency issues. A *lot* of that long-term cost has to do with 
the unnecessary complexity of our architecture and subsystems, because 
it's high-priced humans that need to get paid to maintain such complexity.


So, to me, the highest priority performance and scale fixes actually 
have to do with the simplification of our subsystems and architecture, 
not with whether we use mysql-python, PyMySQL, Python vs. Scala vs. 
Rust, or any other distractions.


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-09 Thread John Garbutt
On 30 April 2015 at 18:54, Mike Bayer mba...@redhat.com wrote:
 On 4/30/15 11:16 AM, Dan Smith wrote:
 There is an open discussion to replace mysql-python with PyMySQL, but
 PyMySQL has worse performance:

 https://wiki.openstack.org/wiki/PyMySQL_evaluation

 My major concern with not moving to something different (i.e. not based
 on the C library) is the threading problem. Especially as we move in the
 direction of cellsv2 in nova, not blocking the process while waiting for
 a reply from mysql is going to be critical. Further, I think that we're
 likely to get back a lot of performance from a supports-eventlet
 database connection because of the parallelism that conductor currently
 can only provide in exchange for the footprint of forking into lots of
 workers.

 If we're going to move, shouldn't we be looking at something that
 supports our threading model?

 yes, but at the same time, we should change our threading model at the level
 of where APIs are accessed to refer to a database, at the very least using a
 threadpool behind eventlet.   CRUD-oriented database access is faster using
 traditional threads, even in Python, than using an eventlet-like system or
 using explicit async.  The tests at
 http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
 show this. With traditional threads, we can stay on the C-based MySQL
 APIs and take full advantage of their speed.

Sorry to go back in time, I wanted to go back to an important point.

It seems we have three possible approaches:
* C lib and eventlet, blocks whole process
* pure python lib, and eventlet, eventlet does its thing
* go for a C lib and dispatch calls via thread pool

We have a few problems:
* performance sucks, we have to fork lots of nova-conductors and api nodes
* need to support python2.7 and 3.4, but it's not currently possible
with the lib we use?
* want to pick a lib that we can fix when there are issues, and work to improve

It sounds like:
* currently do the first one, it sucks, forking nova-conductor helps
* seems we are thinking the second one might work, we'd surely get py3.4 +
py2.7 support
* the last will mean more work, but it's likely to be more performant
* worried we are picking an unsupported lib with little future

I am leaning towards us moving to making DB calls with a thread pool
and some fast C based library, so we get the 'best' performance.

Is that a crazy thing to be thinking? What am I missing here?

Thanks,
John



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-09 Thread John Garbutt
On 9 May 2015 at 15:02, Mike Bayer mba...@redhat.com wrote:
 On 5/9/15 6:45 AM, John Garbutt wrote:

 I am leaning towards us moving to making DB calls with a thread pool and
 some fast C based library, so we get the 'best' performance. Is that a crazy
 thing to be thinking? What am I missing here? Thanks, John

 I'd like to do that but I want the whole Openstack DB API layer in the
 thread pool, not just the low level DBAPI (Python driver) calls.   There's
 no need for eventlet-style concurrency, and even less for async-style
 concurrency, in transactionally-oriented code.

Sorry, not sure I get which DB API is which.

I was thinking we could dispatch all calls to this API into a thread pool:
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py

I guess an alternative is to add this in the objects layer, on top of
the rpc dispatch:
https://github.com/openstack/nova/blob/master/nova/objects/base.py#L188
But that somehow feels like a layer violation; maybe it's not.

Is that similar to what you were thinking?

Thanks,
John



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-09 Thread Mike Bayer



On 5/9/15 6:45 AM, John Garbutt wrote:
I am leaning towards us moving to making DB calls with a thread pool 
and some fast C based library, so we get the 'best' performance. Is 
that a crazy thing to be thinking? What am I missing here? Thanks, John 


I'd like to do that but I want the whole OpenStack DB API layer in the
thread pool, not just the low-level DBAPI (Python driver) calls.
There's no need for eventlet-style concurrency, and even less for
async-style concurrency, in transactionally-oriented code.
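A minimal sketch of dispatching at this level: each DB API function (for example the functions in the nova/db/sqlalchemy/api.py module John links) runs start to finish, transaction included, on one pooled OS thread, rather than wrapping individual DBAPI driver calls. `in_db_thread`, `instance_get` and the pool size of 8 are hypothetical. The caller here waits with a plain blocking `Future.result()`; inside an eventlet process that wait would itself block the hub, so the eventlet-native equivalent would be `eventlet.tpool.execute(func, *args, **kwargs)`.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical pool size; in practice it would likely be matched to the DB
# connection pool so worker threads do not queue up waiting for connections.
_DB_POOL = ThreadPoolExecutor(max_workers=8, thread_name_prefix="db-api")


def in_db_thread(func: F) -> F:
    """Run the whole decorated API function, including all of its queries
    and its commit, on one pooled OS thread, and wait for the result."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return _DB_POOL.submit(func, *args, **kwargs).result()

    return wrapper  # type: ignore[return-value]


@in_db_thread
def instance_get(instance_id: int) -> dict:
    # Hypothetical DB API function: a real one would open a session, run its
    # queries and commit, all within this single worker thread.
    return {"id": instance_id, "thread": threading.current_thread().name}


print(instance_get(42))
```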







Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-08 Thread Ronald Bradford
Has anybody considered the native Python connector for MySQL, which supports
Python 3?

Here are the Ubuntu Packages.


$ apt-cache show python-mysql.connector
Package: python-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 386
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Replaces: mysql-utilities (<< 1.3.5-2)
Depends: python:any (>= 2.7.5-5~), python:any (<< 2.8)
Breaks: mysql-utilities (<< 1.3.5-2)
Filename: pool/universe/m/mysql-connector-python/python-mysql.connector_1.1.6-1_all.deb
Size: 67196
MD5sum: 22b2cb35cf8b14ac0bf4493b0d676adb
SHA1: de626403e1b14f617e9acb0a6934f044fae061c7
SHA256: 99e34f67d085c28b49eb8145c281deaa6d2b2a48d741e6831e149510087aab94
Description-en: pure Python implementation of MySQL Client/Server protocol
 MySQL driver written in Python which does not depend on MySQL C client
 libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol
 completely in Python. This means you don't have to compile anything or MySQL
 (client library) doesn't even have to be installed on the machine.
Description-md5: bb7e2eba7769d706d44e0ef91171b4ed
Homepage: http://dev.mysql.com/doc/connector-python/en/index.html
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu

$ apt-cache show python3-mysql.connector
Package: python3-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 385
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Depends: python3:any (>= 3.3.2-2~)
Filename: pool/universe/m/mysql-connector-python/python3-mysql.connector_1.1.6-1_all.deb
Size: 64870
MD5sum: 461208ed1b89d516d6f6ce43c003a173
SHA1: bd439c4057824178490b402ad6c84067e1e2884e
SHA256: 487af52b98bc5f048faf4dc73420eff20b75a150e1f92c82de2ecdd4671659ae
Description-en: pure Python implementation of MySQL Client/Server protocol (Python3)
 MySQL driver written in Python which does not depend on MySQL C client
 libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol
 completely in Python. This means you don't have to compile anything or MySQL
 (client library) doesn't even have to be installed on the machine.
 .
 This package contains the Python 3 version of mysql.connector.
Description-md5: 4bca3815f5856ddf4a629b418ec76c8f
Homepage: http://dev.mysql.com/doc/connector-python/en/index.html
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu


Ronald Bradford

Web Site: http://ronaldbradford.com
LinkedIn: http://www.linkedin.com/in/ronaldbradford
Twitter: @RonaldBradford http://twitter.com/ronaldbradford
Skype: RonaldBradford
GTalk:  Ronald.Bradford



On Thu, May 7, 2015 at 9:39 PM, Mike Bayer mba...@redhat.com wrote:



 On 5/7/15 5:32 PM, Thomas Goirand wrote:

 If there are really fixes and features we
 need in Py2K then of course we have to either convince MySQLdb to merge
 them or switch to mysqlclient.


 Given no reply in 6 months, I think that's enough to say it:
 mysql-python is a dangerous package with a non-responsive upstream. That's
 always bad, and IMO, enough to try to get rid of it. If you think switching
 to PyMySQL is effortless, and the best way forward, then let's do that ASAP!


 haha - I'd rather drop eventlet and use mysqlclient :)

 as far as this thread goes, where this has been heading is that django has
 already been recommending mysqlclient and it's become apparent just what a
 barrage of emails and messages have been sent Andy Dustman's way, with no
 response. I agree this is troubling behavior, and I've alerted people at
 RH internally that we need to start thinking about this package switch. My
 original issue was that for Fedora etc., changing it in this way is
 challenging, and from my discussions with packaging people, this is
 actually correct - this isn't an easy way to do it for them and there have
 been many emails as a result.  My other issue is the SQLAlchemy testing
 issue - I'd essentially have to just stop testing mysql-python and switch
 to mysqlclient entirely, which means I need to revise all my docs and get
 all my users to switch also when the SQLAlchemy MySQLdb dialect eventually
 diverges from mysql-python 1.2.5, hence the whole thing is in a
 not-minor-enough way my problem as well. With a simple module name change
 for mysqlclient, there'd be no problem.   But there you go - assuming
 continued crickets from AD, and seeing that people continue to find it
 important to appease projects like Trac that IMO quite amateurishly
 hardcode "import MySQLdb", I don't see much other option.
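Both mysql-python and mysqlclient install under the module name `MySQLdb`, which is why a module name change would sidestep the packaging problem. Application code that wants to avoid hardcoding one driver can instead try candidates in order; the sketch below is a minimal, hypothetical helper (`first_importable` and the preference order are assumptions). For code that insists on `import MySQLdb`, PyMySQL also provides `pymysql.install_as_MySQLdb()`.

```python
import importlib
from types import ModuleType
from typing import Iterable, Tuple


def first_importable(candidates: Iterable[str]) -> Tuple[str, ModuleType]:
    """Import and return the first module name in candidates that is installed."""
    failures = []
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"{name} ({exc})")
    raise ImportError("none of the candidate drivers is installed: " + ", ".join(failures))


# Hypothetical preference order: the C-based MySQLdb module (provided by
# either mysql-python or mysqlclient, which share this name), then the
# pure-Python PyMySQL.
MYSQL_DRIVERS = ("MySQLdb", "pymysql")

if __name__ == "__main__":
    try:
        name, driver = first_importable(MYSQL_DRIVERS)
        print(f"using {name} (DB-API level {driver.apilevel})")
    except ImportError as exc:
        print(exc)
```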


 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-08 Thread Ronald Bradford
I guess I may have spoken too soon.
https://wiki.openstack.org/wiki/PyMySQL_evaluation states "Oracle refuses
to publish MySQL-connector-Python on Pypi, which is critical to the
Openstack infrastructure."

I am unclear when this statement was made and who is involved in this
discussion.  As I have contacts in the MySQL engineering and Oracle
Corporation product development teams I will endeavor to seek a more
current and definitive response and statement.

Regards

Ronald




Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-08 Thread Doug Hellmann
Excerpts from Ronald Bradford's message of 2015-05-08 10:41:30 -0400:
 I guess I may have spoken too soon.
 https://wiki.openstack.org/wiki/PyMySQL_evaluation states "Oracle refuses
 to publish MySQL-connector-Python on Pypi, which is critical to the
 Openstack infrastructure."
 
 I am unclear when this statement was made and who is involved in this
 discussion.  As I have contacts in the MySQL engineering and Oracle
 Corporation product development teams I will endeavor to seek a more
 current and definitive response and statement.

We install all of our library dependencies via pip (for unit,
functional, and integration tests). New versions of pip require special
handling to install packages not hosted on PyPI, and that special
handling must be performed in every place where we have a dependency on
the package, which places an extra burden on us that we would prefer to
avoid.

Doug



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-08 Thread Joshua Harlow

If that could get published, please do make it happen!

As for who tried to contact Oracle and never got a response, I am
not sure about that question (or answer). But if we can get that to
happen it would be great for the whole Python community (IMHO).


-Josh

Ronald Bradford wrote:

I guess I may have spoken too soon.
https://wiki.openstack.org/wiki/PyMySQL_evaluation states "Oracle
refuses to publish MySQL-connector-Python on Pypi, which is critical to
the Openstack infrastructure."

I am unclear when this statement was made and who is involved in this
discussion.  As I have contacts in the MySQL engineering and Oracle
Corporation product development teams I will endeavor to seek a more
current and definitive response and statement.

Regards

Ronald




Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-08 Thread Mike Bayer



On 5/8/15 10:41 AM, Ronald Bradford wrote:
I guess I may have spoken too soon.
https://wiki.openstack.org/wiki/PyMySQL_evaluation states: "Oracle
refuses to publish MySQL-connector-Python on Pypi, which is critical
to the Openstack infrastructure."


I am unclear when this statement was made and who is involved in this 
discussion.  As I have contacts in the MySQL engineering and Oracle 
Corporation product development teams I will endeavor to seek a more 
current and definitive response and statement.


I made that statement. I and others have been in contact for many
months with Andrew Rist as well as Geert Vanderkelen regarding this
issue, without any result. We all preferred mysql-connector originally,
but as time has dragged on, and after I've sent a few messages to Andrew
and others saying that OpenStack is essentially going to give up on
their driver (to no result), we've all gotten more involved with
PyMySQL, and it has come out as the better driver overall. PyMySQL is
written by the same author as the mysqlclient driver that it looks like
we are all switching to regardless (Django has already recommended this
to their userbase).


PyMySQL also has very straightforward source code, performs better in 
tests, and doesn't have weird decisions like deciding to make a huge 
backwards-incompatible change to return bytearrays and not bytes in Py3K 
raw mode 
(http://dev.mysql.com/doc/relnotes/connector-python/en/news-2-0-0.html).
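To make the bytearray point concrete: a bytearray is mutable and therefore
unhashable, so any code that uses raw column values as dict keys or set
members breaks when a driver starts returning bytearray instead of bytes.
A minimal sketch of a shim an application could apply to such rows
(normalize_raw_row is a hypothetical name, not part of any driver's API):

```python
def normalize_raw_row(row):
    """Hypothetical shim: convert bytearray column values to immutable bytes.

    All other values (ints, None, bytes, str) pass through unchanged.
    """
    return tuple(bytes(value) if isinstance(value, bytearray) else value
                 for value in row)


raw_row = (1, bytearray(b"abc"), b"xyz", None)

# A bytearray cannot be used as a dict key; this is the kind of code that breaks.
try:
    {raw_row[1]: "cached"}
except TypeError as exc:
    print("bytearray key rejected:", exc)

fixed = normalize_raw_row(raw_row)
print(fixed)                 # (1, b'abc', b'xyz', None)
print({fixed[1]: "cached"})  # {b'abc': 'cached'}
```

Converting at the application boundary keeps the rest of the code
independent of which driver version produced the row.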


PyMySQL is also easily accessible as a project, with very fast support
via GitHub; several of us have been able to improve PyMySQL via pull
requests quickly and without issue, and the maintainer even made me a
member of the project so I could commit fixes directly if I wanted to.
I don't know that Oracle's ownership of MySQL-connector would be
comfortable with these things, and the only way to get support is
through Oracle's large and cumbersome bug tracker.







Regards

Ronald



On Fri, May 8, 2015 at 10:33 AM, Ronald Bradford
m...@ronaldbradford.com wrote:


Has anybody considered the native Python connector for MySQL that
supports Python 3?

Here are the Ubuntu packages.


$ apt-get show python-mysql.connector
E: Invalid operation show
rbradfor@rubble:~$ apt-cache show python-mysql.connector
Package: python-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 386
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Replaces: mysql-utilities (<< 1.3.5-2)
Depends: python:any (>= 2.7.5-5~), python:any (<< 2.8)
Breaks: mysql-utilities (<< 1.3.5-2)
Filename: pool/universe/m/mysql-connector-python/python-mysql.connector_1.1.6-1_all.deb
Size: 67196
MD5sum: 22b2cb35cf8b14ac0bf4493b0d676adb
SHA1: de626403e1b14f617e9acb0a6934f044fae061c7
SHA256: 99e34f67d085c28b49eb8145c281deaa6d2b2a48d741e6831e149510087aab94
Description-en: pure Python implementation of MySQL Client/Server protocol
 MySQL driver written in Python which does not depend on MySQL C client
 libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol
 completely in Python. This means you don't have to compile anything or MySQL
 (client library) doesn't even have to be installed on the machine.
Description-md5: bb7e2eba7769d706d44e0ef91171b4ed
Homepage: http://dev.mysql.com/doc/connector-python/en/index.html
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Origin: Ubuntu

$ apt-cache show python3-mysql.connector
Package: python3-mysql.connector
Priority: optional
Section: universe/python
Installed-Size: 385
Maintainer: Ubuntu Developers ubuntu-devel-disc...@lists.ubuntu.com
Original-Maintainer: Sandro Tosi mo...@debian.org
Architecture: all
Source: mysql-connector-python
Version: 1.1.6-1
Depends: python3:any (>= 3.3.2-2~)
Filename: pool/universe/m/mysql-connector-python/python3-mysql.connector_1.1.6-1_all.deb
Size: 64870
MD5sum: 461208ed1b89d516d6f6ce43c003a173
SHA1: bd439c4057824178490b402ad6c84067e1e2884e
SHA256: 487af52b98bc5f048faf4dc73420eff20b75a150e1f92c82de2ecdd4671659ae
Description-en: pure Python implementation of MySQL Client/Server protocol (Python3)
 MySQL driver written in Python which does not depend on MySQL C client
 libraries and implements the DB API v2.0 specification (PEP-249).
 .
 MySQL Connector/Python is implementing the MySQL Client/Server protocol
 completely in Python. This means you don't have to compile

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-07 Thread Mike Bayer



On 5/7/15 5:32 PM, Thomas Goirand wrote:

If there are really fixes and features we need in Py2K then of course
we have to either convince MySQLdb to merge them or switch to
mysqlclient.


Given that there has been no reply in 6 months, I think that's enough to
say it: mysql-python is a dangerous package with a non-responsive
upstream. That's always bad, and IMO, enough to try to get rid of it. If
you think switching to PyMySQL is effortless, and the best way forward,
then let's do that ASAP!


haha - I'd rather have: drop eventlet + mysqlclient :)

As far as this thread goes, where this has been heading is that Django
has already been recommending mysqlclient, and it's become apparent just
what a barrage of emails and messages have been sent Andy Dustman's way,
with no response. I agree this is troubling behavior, and I've alerted
people internally at RH that we need to start thinking about this
package switch. My original issue was that for Fedora etc., changing it
in this way is challenging, and from my discussions with packaging
people, this is actually correct - there isn't an easy way to do it for
them, and there have been many emails as a result. My other issue is the
SQLAlchemy testing issue - I'd essentially have to just stop testing
mysql-python and switch to mysqlclient entirely, which means I need to
revise all my docs and get all my users to switch also when the
SQLAlchemy MySQLdb dialect eventually diverges from mysql-python 1.2.5;
hence the whole thing is, in a not-minor-enough way, my problem as
well. With a simple module name change for mysqlclient, there would be
no problem. But there you go - assuming continued crickets from AD, and
seeing that people continue to find it important to appease projects
like Trac that, IMO quite amateurishly, hardcode "import MySQLdb", I
don't see much other option.
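Code that hardcodes "import MySQLdb" can only ever see whichever package
currently owns that module name. A rough sketch of one common workaround,
assuming PyMySQL's install_as_MySQLdb() helper (which registers PyMySQL
under the MySQLdb name); load_mysqldb_compatible is a hypothetical helper
name, not something any of these projects ship:

```python
import importlib


def load_mysqldb_compatible():
    """Return a module usable as MySQLdb, or None if no MySQL driver is found.

    Hypothetical helper: prefers whichever package owns "MySQLdb"
    (mysql-python or mysqlclient), then falls back to PyMySQL.
    """
    try:
        return importlib.import_module("MySQLdb")
    except ImportError:
        pass
    try:
        pymysql = importlib.import_module("pymysql")
    except ImportError:
        return None
    # Registers PyMySQL in sys.modules under the name "MySQLdb", so later
    # "import MySQLdb" statements (e.g. in legacy code) get PyMySQL.
    pymysql.install_as_MySQLdb()
    return importlib.import_module("MySQLdb")


driver = load_mysqldb_compatible()
print(getattr(driver, "__name__", None))
```

This only papers over the import name; as noted elsewhere in the thread,
the drivers still differ in behavior.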


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-07 Thread Thomas Goirand



On 05/05/2015 09:56 PM, Mike Bayer wrote:

Having two packages that both install into the same name is the least
ideal arrangement


From your point of view, and for testing against both, certainly. But
for a distribution, avoiding two packages that clash with each other and
deciding on a single implementation of the same API is so much better in
many ways. It avoids duplicated work and duplicated security support,
and above all, it makes it possible for all reverse dependencies to just
use the new implementation without doing anything.



and I don't see why we have to settle for a mediocre
outcome like that.  What we want is MySQL-Python to be maintained, we
have a maintainer, we have the code, we have everything we need, except
a password. We should at least make an attempt at that outcome.


A fork is often the worst thing that can happen to a project. See the
examples of libav vs ffmpeg, libreoffice vs openoffice, or mysql vs
mariadb. In the end, end users and developers all suffer. The only thing
we can do is pick the implementation which we believe is best for us.
And in this case, it looks like mysqlclient has Python 3 support, which
we want as a feature.


If you believe you can make it so that either:
#1 mysql-python can get Python 3 support.
#2 both forks are re-merged, and maintained as one.

then that's the best possible outcome (especially #2).

Whatever happens, talking to both upstreams seems a very good idea to me.

However, it may not be possible to revert what has happened (or is about
to happen) in Debian, as this is the decision of the package maintainer.
I don't think it would be a good idea to go to the Debian technical
committee if the maintainer of the python-mysqldb package doesn't do
what we'd like. The only other option we'd have would be to re-introduce
mysql-python as a separate package, but the Debian FTP masters may
oppose it and reject it, unless we have a very good reason to do so
(and at this point, I don't know if we do...).


Hoping the above helps with Debian insights,
Cheers,

Thomas Goirand (zigo)



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-07 Thread Thomas Goirand



On 05/05/2015 08:41 PM, Mike Bayer wrote:



On 5/4/15 6:48 PM, Thomas Goirand wrote:

I don't see what it would break. If I do:

Package: python-mysqlclient
Breaks: python-mysqldb
Replaces: python-mysqldb
Provides: python-mysqldb

everything is fine, and python-mysqlclient becomes another
implementation of the same thing. Then I believe it'd be a good idea
to simply remove python-mysqldb from Debian, since it's not maintained
upstream anymore.


It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.


Supporting Python 3 is necessary, as we are going to remove Python 2
from Debian starting with Buster.

I don't know debian but the approach would be that something like the
mysqlclient-py3k package applies to Python 3 only.






There should be no
reason Openstack applications are hardcoded to one database driver.


If they share the same import mysqldb, and if they are API
compatible, how is this a problem?

how do you know they are API compatible?


According to Victor, that's what the author of the fork says. That's
also what he wants, as per issue 44, which you raised (the mysqlclient
upstream wants it to be a drop-in replacement for MySQLdb, to help
distributions switch to Python 3 more easily).



This is in fact exactly where
this approach can become a huge problem.   No MySQL drivers I've ever
used are fully API compatible with any of the other ones. *all* of them
have subtle and not-so-subtle differences in behavior.  That mysqlclient
is now a fork means it will begin to diverge, and as issues come up to
which their resolution requires even more subtle or not-so-subtle
changes in behavior, these differences will only continue to grow.


I agree. Which is exactly why we don't want one package for Py2, and the 
other one for Py3.



From a SQLAlchemy perspective this would be much easier to maintain as
a new sub-dialect.


Best for SQLA and everyone else would be a re-merge as a single project.
Either that, or mysqlclient simply takes over the mysql-python name on
PyPI, just like you suggested in GitHub issue 44. I'd love to see one or
the other happen. The latter could be decided by a PyPI administrator,
given that the mysql-python maintainer is unresponsive. Have you tried
to approach someone with such rights at PyPI?


If it doesn't happen, though, as you wrote, it's going to be hell for
you to test against both implementations. Maybe then the only choice
you have is to use only one of them (and mysqlclient seems the better
of the two).


By the way, I found methane very reasonable in his replies.


The
approach should be simply that in Python 3, the mysqlclient library is
installed instead of mysql-python.


So, in Python 3, we'd have some bugfixes, and not in Python 2? This
seems a very weird approach to me, which *will* lead to lots of issues.

I've asked three times now to please show the bugfixes that are
needed.


You yourself wrote that there were some bugfixes and subtle differences,
didn't you?



Show me the issues that aren't being fixed, and then I will
be convinced and begin the process of pushing here at Red Hat to make
the same packaging changes such that our customers will no longer be
able to use the original MySQLdb. We're talking about an instant,
systemwide replacement of one MySQLdb implementation for another and I
just think that is high risk.


IMO, since it's a fork, the risk isn't greater than just upgrading
from one version to the next for any given package.



Switching to mysqlclient is basically almost free (by that, I mean
effortless), if I understand what Victor wrote. The same thing can't
be said of removing Eventlet or switching to pymysql, even though both
may be needed. So why make the latter a blocker for the former?

Well, switching to pymysql *is* just as effortless IMHO, and in fact
*more* effortless because it can be done impacting only individual
applications at a time, rather than forcing it on everything at once.
SQLAlchemy has a dialect for PyMySQL already which is well maintained
and well tested.  We change the database URL in projects to include
mysql+pymysql, update requirements.txt, distros add their packages
like they have to anyway, and we're done.


Really? If it's that simple, then please start doing this, and let's
happily switch to PyMySQL for Liberty.



But again, I really want to see what the critical issues in MySQLdb are
that are holding us back.


The main motivation is the lack of support for Python 3.


If there are really fixes and features we
need in Py2K then of course we have to either convince MySQLdb to merge
them or switch to mysqlclient.


Given that there has been no reply in 6 months, I think that's enough to
say it: mysql-python is a dangerous package with a non-responsive
upstream. That's always bad, and IMO, enough to try to get rid of it. If
you think switching to PyMySQL is effortless, and the best way forward,
then 

Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-05 Thread Thomas Goirand



On 04/30/2015 05:00 PM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack applications to 
get Python 3 support, bug fixes and some new features (support MariaDB's 
libmysqlclient.so, support microsecond in TIME column).


In fact, when looking at the python-mysqldb package description in 
Debian, I can see:


 Mysqlclient is an interface to the popular MySQL database server for
 Python.
 .
 This is a fork of MySQLdb. It add Python 3.3 support and merges some
 pull requests.

Then I saw this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096

The package is currently only in Debian experimental, but I am betting 
that soon, the new python-mysqldb package will be uploaded to Sid, and 
it's very likely that Ubuntu will follow (and sync the package from Debian).


As a consequence, I think it'd be much better for OpenStack to follow
that and use the same thing as the distributions. Of course I don't know
what Fedora will do, but maybe they will follow the trend...


Also, I've been using that fork without realizing it, and as much as I 
can tell, OpenStack continues to work...


Cheers,

Thomas Goirand (zigo)



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-05 Thread Mike Bayer



On 5/5/15 1:11 PM, Thomas Goirand wrote:



On 04/30/2015 05:00 PM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack 
applications to get Python 3 support, bug fixes and some new features 
(support MariaDB's libmysqlclient.so, support microsecond in TIME 
column).


In fact, when looking at the python-mysqldb package description in 
Debian, I can see:


 Mysqlclient is an interface to the popular MySQL database server for
 Python.
 .
 This is a fork of MySQLdb. It add Python 3.3 support and merges some
 pull requests.

Then I saw this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096


Wow, the thread decides to go forward with the move based on incorrect
information. MySQL-Python's last release was on Jan 2, 2014, *not* in
2010. They are looking at the entirely wrong repository.


Andy Dustman is a real person who is easily locatable on many services,
including Twitter, LinkedIn, GitHub, etc. Any chance that anyone has
tried to get a comment from him on this, given that with the Django
recommendation and the distro package moves, his package is about to be
more or less wiped out of most major distributions? It would just be
good style, IMHO.






Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-05 Thread Mike Bayer



On 5/5/15 3:07 PM, Mike Bayer wrote:



On 5/5/15 1:11 PM, Thomas Goirand wrote:



On 04/30/2015 05:00 PM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack 
applications to get Python 3 support, bug fixes and some new 
features (support MariaDB's libmysqlclient.so, support microsecond 
in TIME column).


In fact, when looking at the python-mysqldb package description in 
Debian, I can see:


 Mysqlclient is an interface to the popular MySQL database server for
 Python.
 .
 This is a fork of MySQLdb. It add Python 3.3 support and merges some
 pull requests.

Then I saw this:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768096


Wow, the thread decides to go forward with the move based on incorrect 
information.  MySQL-Python's last release was on Jan 2, 2014, *not* in 
2010.  They are looking at the entirely wrong repository.


Andy Dustman is a real person who is easily locatable on many services,
including Twitter, LinkedIn, GitHub, etc. Any chance that anyone has
tried to get a comment from him on this, given that with the Django
recommendation and the distro package moves, his package is about to be
more or less wiped out of most major distributions? It would just be
good style, IMHO.
There's also a great thread from Naoki on the Django list, where at 
least we can get a view of his plans for the project: 
https://groups.google.com/forum/#!msg/django-developers/n-TI8mBcegE/hlNLYncAFFkJ 
e.g. he isn't going to go for new features or anything like that, just 
ongoing compatibility.  That's a good thing.


But what we really want here is for Naoki to be able to release new 
MySQL-Python versions.   I'd like to see if we can get a hold of Andy 
Dustman and get his feelings on that.


Right now, I cannot test SQLAlchemy against both MySQL-Python and 
mysqlclient conveniently.   I need to make two different virtual 
environments and run the whole test suite separately. My test suite 
is able to run the tests against multiple backends in one Python process 
and with this packaging/import arrangement that's not possible.


Having two packages that both install into the same name is the least 
ideal arrangement and I don't see why we have to settle for a mediocre 
outcome like that.  What we want is MySQL-Python to be maintained, we 
have a maintainer, we have the code, we have everything we need, except 
a password.   We should at least make an attempt at that outcome.

















Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-05 Thread Mike Bayer



On 5/4/15 6:48 PM, Thomas Goirand wrote:

I don't see what it would break. If I do:

Package: python-mysqlclient
Breaks: python-mysqldb
Replaces: python-mysqldb
Provides: python-mysqldb

everything is fine, and python-mysqlclient becomes another 
implementation of the same thing. Then I believe it'd be a good idea 
to simply remove python-mysqldb from Debian, since it's not maintained 
upstream anymore.



It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.


Supporting Python 3 is necessary, as we are going to remove Python 2 
from Debian starting with Buster.
I don't know debian but the approach would be that something like the 
mysqlclient-py3k package applies to Python 3 only.







There should be no
reason Openstack applications are hardcoded to one database driver.


If they share the same import mysqldb, and if they are API 
compatible, how is this a problem?
how do you know they are API compatible?   This is in fact exactly where 
this approach can become a huge problem.   No MySQL drivers I've ever 
used are fully API compatible with any of the other ones. *all* of them 
have subtle and not-so-subtle differences in behavior.  That mysqlclient 
is now a fork means it will begin to diverge, and as issues come up to 
which their resolution requires even more subtle or not-so-subtle 
changes in behavior, these differences will only continue to grow.
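One cheap, and admittedly shallow, first check when comparing drivers is
the module-level metadata that PEP 249 requires every DB-API module to
expose: apilevel, threadsafety and paramstyle. The sketch below uses the
standard library's sqlite3 purely as a stand-in so it runs anywhere; the
helper names are hypothetical. Matching values say nothing about the
subtle behavioral differences described above, so this is a way to find
obvious mismatches, not proof of compatibility.

```python
import sqlite3

# Module-level globals that PEP 249 requires every DB-API 2.0 driver to define.
DBAPI_GLOBALS = ("apilevel", "threadsafety", "paramstyle")


def dbapi_profile(module):
    """Collect the PEP 249 module globals (None if a driver omits one)."""
    return {name: getattr(module, name, None) for name in DBAPI_GLOBALS}


def profile_differences(module_a, module_b):
    """Return {global: (value_a, value_b)} for each global that differs."""
    a, b = dbapi_profile(module_a), dbapi_profile(module_b)
    return {name: (a[name], b[name])
            for name in DBAPI_GLOBALS if a[name] != b[name]}


print(dbapi_profile(sqlite3)["paramstyle"])   # qmark
print(profile_differences(sqlite3, sqlite3))  # {}
```

With two MySQL drivers importable under different names, the same call
would be profile_differences(driver_a, driver_b).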


From a SQLAlchemy perspective this would be much easier to maintain as 
a new sub-dialect.  I've proposed that they change their name: 
https://github.com/PyMySQL/mysqlclient-python/issues/44 .  However, the 
maintainers are not going for it, so I guess that isn't going to happen.








The
approach should be simply that in Python 3, the mysqlclient library is
installed instead of mysql-python.


So, in Python 3, we'd have some bugfixes, and not in Python 2? This 
seems a very weird approach to me, which *will* lead to lots of issues.
I've asked three times now to please show the bugfixes that are 
needed. Show me the issues that aren't being fixed, and then I will 
be convinced and begin the process of pushing here at Red Hat to make 
the same packaging changes such that our customers will no longer be 
able to use the original MySQLdb. We're talking about an instant, 
systemwide replacement of one MySQLdb implementation for another and I 
just think that is high risk.




B. use pymysql. All other performance arguments are moot right now as
we are in the basement.


Eventlet has to die, we all know it, and not only for performance
reasons. But this is completely orthogonal to the discussion we're
having about Python 3 support. Please don't stand in the way of it just
because we have other (unrelated) issues with Eventlet + MySQL.


Switching to mysqlclient is basically almost free (by that, I mean 
effortless), if I understand what Victor wrote. The same thing can't 
be said of removing Eventlet or switching to pymysql, even though both 
may be needed. So why make the latter a blocker for the former?

Well, switching to pymysql *is* just as effortless IMHO, and in fact 
*more* effortless because it can be done impacting only individual 
applications at a time, rather than forcing it on everything at once. 
SQLAlchemy has a dialect for PyMySQL already which is well maintained 
and well tested. We change the database URL in projects to include 
mysql+pymysql, update requirements.txt, distros add their packages 
like they have to anyway, and we're done. From my view, if we're going 
to switch DBAPIs then PyMySQL would be it - if we're going for bug 
fixes in the DBAPI, the "doesn't support eventlet" issue is the 
*biggest* bug.
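The per-project switch described above mostly amounts to changing the
driver part of the SQLAlchemy URL: mysql:// with no driver selects
SQLAlchemy's default MySQLdb driver, while mysql+pymysql:// selects
PyMySQL. A minimal, standard-library-only sketch of that rewrite;
use_pymysql is a hypothetical helper, and the credentials and host are
made up:

```python
from urllib.parse import urlsplit, urlunsplit


def use_pymysql(db_url):
    """Point a SQLAlchemy MySQL URL at the PyMySQL driver; leave others alone."""
    parts = urlsplit(db_url)
    # The scheme is "<dialect>" or "<dialect>+<driver>", e.g. "mysql+mysqldb".
    dialect = parts.scheme.split("+", 1)[0]
    if dialect != "mysql":
        return db_url
    return urlunsplit(parts._replace(scheme="mysql+pymysql"))


print(use_pymysql("mysql://nova:secret@db.example.org/nova?charset=utf8"))
# mysql+pymysql://nova:secret@db.example.org/nova?charset=utf8
```

The other steps from the message (updating requirements.txt and distro
packages) are outside what this sketch covers.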


But again, I really want to see what the critical issues in MySQLdb are 
that are holding us back.   If there are really fixes and features we 
need in Py2K then of course we have to either convince MySQLdb to merge 
them or switch to mysqlclient.   At the moment though I need to see the 
evidence for me to really buy this argument.









Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-04 Thread Victor Stinner
Hi,

Mike Bayer wrote:
 It is not feasible to use MySQLclient in Python 2 because it uses the
 same module name as Python-MySQL, and would wreak havoc with distro
 packaging and many other things.

IMO mysqlclient is just the new upstream for MySQL-Python, since
MySQL-Python is no longer maintained.

Why would Linux distributions not package mysqlclient if it provides
Python 3 support, contains bugfixes and has more features?

It's quite common to have two packages in conflict because they provide
the same function, same library, same program, etc.

I would even suggest that packagers use mysqlclient as the new source
without modifying their package.


 It is also imprudent to switch
 production openstack applications to a driver that is new and untested
 (even though it is a port), nor is it necessary.

Why do you consider mysqlclient untested, or less tested than
mysql-python? What kind of regression do you expect in mysqlclient?

Like mysql-python, the mysqlclient GitHub project is connected to Travis CI:
https://travis-ci.org/PyMySQL/mysqlclient-python
(tests pass)

I trust a project which is actively developed more.


 There should be no
 reason Openstack applications are hardcoded to one database driver.
 The approach should be simply that in Python 3, the mysqlclient library
 is installed instead of mysql-python.

Technically, it's now possible to have different dependencies on Python 2
and Python 3. But in practice, there are some annoying corner cases. It's
more convenient to have the same dependencies on Python 2 and Python 3.

Using mysqlclient on both Python 2 and Python 3 would avoid having bugs
specific to Python 2 (bugs already fixed in mysqlclient) and new features
only available on Python 3.

Victor



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-04 Thread Victor Stinner
 I propose to replace mysql-python with mysqlclient in OpenStack applications
 to get Python 3 support, bug fixes and some new features (support MariaDB's
 libmysqlclient.so, support microsecond in TIME column).

I just proposed a change to add mysqlclient dependency to global requirements:

   https://review.openstack.org/#/c/179745/

Victor



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-05-04 Thread Thomas Goirand



On 04/30/2015 07:48 PM, Mike Bayer wrote:



On 4/30/15 11:00 AM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack
applications to get Python 3 support, bug fixes and some new features
(support MariaDB's libmysqlclient.so, support microsecond in TIME
column).


It is not feasible to use MySQLclient in Python 2 because it uses the
same module name as Python-MySQL, and would wreak havoc with distro
packaging and many other things.


I don't see what it would break. If I do:

Package: python-mysqlclient
Breaks: python-mysqldb
Replaces: python-mysqldb
Provides: python-mysqldb

everything is fine, and python-mysqlclient becomes another 
implementation of the same thing. Then I believe it'd be a good idea to 
simply remove python-mysqldb from Debian, since it's not maintained 
upstream anymore.



It is also imprudent to switch
production openstack applications to a driver that is new and untested
(even though it is a port), nor is it necessary.


Supporting Python 3 is necessary, as we are going to remove Python 2 
from Debian starting with Buster.



There should be no
reason Openstack applications are hardcoded to one database driver.


If they share the same import mysqldb, and if they are API compatible, 
how is this a problem?



The
approach should be simply that in Python 3, the mysqlclient library is
installed instead of mysql-python.


So, in Python 3, we'd have some bugfixes, and not in Python 2? This 
seems a very weird approach to me, which *will* lead to lots of issues.



MySQLclient installs under the same
name, so in this case there isn't even any change to the SQLAlchemy URL
required.


Nor should there be in anything else, if they are completely API compatible.


PyMySQL is monkeypatchable, so as long as we are using eventlet, it is
*insane* that we are using MySQL-Python at all, because it is actively
making openstack applications perform much much more poorly than if we
just removed eventlet.  So as long as eventlet is running, PyMySQL
wins the performance argument hands down (as described at the link
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
which is in the third paragraph of that wiki page).  And it's Py3k
compatible.
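Why monkeypatching reaches a pure-Python driver but not MySQL-Python: a
pure-Python driver opens its connection through Python's socket module,
so replacing functions in that module (which is what eventlet's monkey
patching does, far more thoroughly) changes the driver's behavior, while
a C extension talking to libmysqlclient never goes through that Python
code. The toy sketch below only demonstrates the mechanism with
socket.create_connection; the driver function, the fake "green"
connector and the host name are all hypothetical, and no network
connection is made.

```python
import socket
from contextlib import contextmanager


@contextmanager
def patched_create_connection(replacement):
    """Temporarily swap socket.create_connection, restoring it afterwards."""
    original = socket.create_connection
    socket.create_connection = replacement
    try:
        yield
    finally:
        socket.create_connection = original


def toy_driver_connect(host, port):
    # Hypothetical pure-Python driver: it looks up socket.create_connection
    # at call time, so a patched version is picked up automatically.
    return socket.create_connection((host, port), timeout=1)


calls = []


def fake_green_connect(address, timeout=None, *args, **kwargs):
    # Stand-in for a cooperative ("green") connector; records the call and
    # returns a placeholder instead of opening a real socket.
    calls.append(address)
    return "green-socket"


with patched_create_connection(fake_green_connect):
    result = toy_driver_connect("db.example.org", 3306)

print(result, calls)  # green-socket [('db.example.org', 3306)]
```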


Ok, so you are for switching to pymysql. Good. But is this realistic?
Are you going to provide all the patches yourself for absolutely all
OpenStack projects that are using python-mysqldb?



1. keep Mysql-python on Py2K, use mysqlclient on py3k, changing the
implementation of the MySQLdb module on Py2K, server-wide, would be
very disruptive


I'm sorry to say it this way, because I respect you a lot and you have
done a lot of very good things. But Mike, this is a very silly idea. We
are already having difficulties pushing support for Py3, and in some
cases, it's hard to deal with the differences. Now you want to add even
more sources of problems, with bugs specific to the Py2 or Py3
implementation? Why should we make our lives even more miserable? I
completely fail to understand what we would try to achieve by doing this.



2. if we actually care about performance, we either A. dump eventlet or
B. use pymysql. All other performance arguments are moot right now as
we are in the basement.


Eventlet has to die, we all know it, and not only for performance
reasons. But this is completely orthogonal to the discussion we're
having about Python 3 support. Please don't stand in the way of it just
because we have other (unrelated) issues with Eventlet + MySQL.


Switching to mysqlclient is basically almost free (by that, I mean 
effortless), if I understand what Victor wrote. The same thing can't be 
said of removing Eventlet or switching to pymysql, even though both 
may be needed. So why make the latter a blocker for the former?


Cheers,

Thomas Goirand (zigo)



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-04-30 Thread Mike Bayer



On 4/30/15 11:16 AM, Dan Smith wrote:

There is an open discussion to replace mysql-python with PyMySQL, but
PyMySQL has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.

If we're going to move, shouldn't we be looking at something that
supports our threading model?

Yes, but at the same time, we should change our threading model at the
level where APIs access the database, at the very least by using a
thread pool behind eventlet. CRUD-oriented database access is faster
with traditional threads, even in Python, than with an eventlet-like
system or explicit async. The tests at
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
show this. With traditional threads, we can stay on the C-based MySQL
APIs and take full advantage of their speed.
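A minimal sketch of the "thread pool behind eventlet" idea, assuming
Python 3: blocking calls into a C-based driver are handed to real OS
threads, so the caller is not stalled while the C library waits on the
database. It uses the stdlib concurrent.futures.ThreadPoolExecutor so it
runs without eventlet installed (eventlet's own eventlet.tpool module
serves a similar purpose inside an eventlet process). sqlite3 stands in
for a blocking C-based MySQL driver; run_query, db_call and the pool
size are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size; a real service would make this configurable.
DB_POOL = ThreadPoolExecutor(max_workers=4)


def run_query(sql, params=()):
    """Blocking DB-API call; sqlite3 stands in for a C-based MySQL driver."""
    # One connection per call: DB-API connections are generally not safe
    # to share between threads.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


def db_call(sql, params=()):
    """Run the blocking query in a worker OS thread and return a Future.

    The calling code keeps running while the C library blocks inside
    the worker thread.
    """
    return DB_POOL.submit(run_query, sql, params)


if __name__ == "__main__":
    futures = [db_call("SELECT ? * ?", (n, n)) for n in range(5)]
    print([f.result()[0][0] for f in futures])  # [0, 1, 4, 9, 16]
```

The design keeps the fast C driver and moves only the waiting onto
threads, which is the trade-off argued for above.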






Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-04-30 Thread Mike Bayer



On 4/30/15 11:00 AM, Victor Stinner wrote:

Hi,

I propose to replace mysql-python with mysqlclient in OpenStack applications to
get Python 3 support, bug fixes and some new features (support for MariaDB's
libmysqlclient.so, and for microseconds in TIME columns).


It is not feasible to use mysqlclient on Python 2 because it uses the
same module name as MySQL-python, and would wreak havoc with distro
packaging and many other things. It is also imprudent to switch
production OpenStack applications to a driver that is new and untested
(even though it is a port), and it isn't necessary. There is no reason
OpenStack applications should be hardcoded to one database driver.
The approach should simply be that on Python 3, the mysqlclient library
is installed instead of mysql-python. mysqlclient installs under the
same module name (MySQLdb), so in this case no change to the SQLAlchemy
URL is even required.
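The "mysqlclient only on Py3K" rule can be sketched as a selection on
the Python major version. In a pip requirements file the same split is
usually written with PEP 508 environment markers (for example
mysqlclient; python_version >= '3.0'). The helper mysqldb_requirement
below is hypothetical and only illustrates the rule; because both
packages install the MySQLdb module, the SQLAlchemy dialect string stays
the same.

```python
import sys

# Both MySQL-python and mysqlclient provide "import MySQLdb", so
# SQLAlchemy's dialect+driver name is identical for either package.
SQLALCHEMY_DIALECT = "mysql+mysqldb"


def mysqldb_requirement(version_info=sys.version_info):
    """Hypothetical helper: which package should provide MySQLdb.

    MySQL-python only supports Python 2; mysqlclient is a fork that also
    supports Python 3 and installs the same MySQLdb module.
    """
    return "MySQL-python" if version_info[0] < 3 else "mysqlclient"


if __name__ == "__main__":
    print(mysqldb_requirement((2, 7, 10)), SQLALCHEMY_DIALECT)
    print(mysqldb_requirement((3, 4, 3)), SQLALCHEMY_DIALECT)
```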




The MySQL database is popular, but the Python driver mysql-python no longer
seems to be maintained. The latest commit was made in January 2014, before
the release of MySQL-python 1.2.5:

https://github.com/farcepest/MySQLdb1/commits/master

One major issue is that mysql-python doesn't support Python 3, which blocks
porting most OpenStack applications to Python 3. There are now 32 open issues
and 25 pending pull requests. I also sent an email to Andy Dustman (aka
farcepest) last week, but I haven't received a reply yet.


There is an open discussion to replace mysql-python with PyMySQL, but PyMySQL 
has worse performance:

https://wiki.openstack.org/wiki/PyMySQL_evaluation

PyMySQL is monkeypatchable, so as long as we are using eventlet, it is
*insane* that we are using MySQL-python at all: it actively makes
OpenStack applications perform much, much worse than they would if we
just removed eventlet. So as long as eventlet is running, PyMySQL wins
the performance argument hands down (as described at
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/
which is linked in the third paragraph of that wiki page). And it's
Py3k compatible.
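A sketch of keeping the driver configurable instead of hardcoded,
following the reasoning above: under eventlet monkeypatching, a
pure-Python driver such as PyMySQL yields to other green threads while
waiting on its socket, whereas MySQLdb blocks inside C and stalls the
whole process. choose_mysql_driver and build_mysql_url are hypothetical
helpers; mysql+pymysql and mysql+mysqldb are SQLAlchemy's dialect+driver
names for the two libraries.

```python
from urllib.parse import quote_plus


def choose_mysql_driver(eventlet_monkeypatched):
    """Hypothetical helper: pick a driver for the current threading model."""
    # Pure-Python PyMySQL cooperates with monkeypatched sockets; the
    # C-based MySQLdb does not, so prefer it only with real threads.
    return "pymysql" if eventlet_monkeypatched else "mysqldb"


def build_mysql_url(driver, user, password, host, database):
    """Build a SQLAlchemy URL: mysql+<driver>://user:password@host/db."""
    # Quote the password so characters like '@' or '/' don't break the URL.
    return "mysql+%s://%s:%s@%s/%s" % (
        driver, user, quote_plus(password), host, database)


if __name__ == "__main__":
    driver = choose_mysql_driver(eventlet_monkeypatched=True)
    print(build_mysql_url(driver, "nova", "secret", "db.example", "nova"))
```

Because only the URL changes, the same application code can run on
either driver.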


The performance results in that wiki page are also out of date. Naoki 
INADA has merged several performance improvements since then.


My ultimate setup would still use MySQL-python on Py2K and mysqlclient
on Py3K, and OpenStack applications would go back to using traditional
threads for database APIs. But that is two separate changes.



So, to sum up:


1. Keep MySQL-python on Py2K and use mysqlclient on Py3K; changing the
implementation of the MySQLdb module on Py2K, server-wide, would be
very disruptive.


2. If we actually care about performance, we either A. dump eventlet or
B. use PyMySQL. All other performance arguments are moot right now, as
we are in the basement.






Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-04-30 Thread Victor Stinner
 If we're going to move, shouldn't we be looking at something that
 supports our threading model?

I would prefer to take baby steps and fix Python 3 compatibility first.

Enhancing concurrency/parallelism is a much more complex project than just
replacing a single line in the dependencies ;-)

See my email: I mentioned a workaround for mysqlclient and a spec discussing a
more general solution for concurrency.

Victor



Re: [openstack-dev] [all] Replace mysql-python with mysqlclient

2015-04-30 Thread Dan Smith
 There is an open discussion to replace mysql-python with PyMySQL, but
 PyMySQL has worse performance:
 
 https://wiki.openstack.org/wiki/PyMySQL_evaluation

My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move in the
direction of cellsv2 in nova, not blocking the process while waiting for
a reply from mysql is going to be critical. Further, I think that we're
likely to get back a lot of performance from a supports-eventlet
database connection because of the parallelism that conductor currently
can only provide in exchange for the footprint of forking into lots of
workers.

If we're going to move, shouldn't we be looking at something that
supports our threading model?

--Dan
