Hello, Answears below
--- Best regards Sławek Kapłoński sla...@kaplonski.pl Dnia poniedziałek, 22 września 2014 13:41:51 Jay Pipes pisze: > Hi Peter, Sławek, answers inline... > > On 09/22/2014 08:12 AM, Peter Boros wrote: > > Hi, > > > > StaleDataError is not given by MySQL, but rather SQLAlchemy. After a > > quick look, it seems like SQLAlchemy gets this, if the update updated > > different number of rows then it expected. I am not sure what is the > > expectation based on, perhaps soembody can chime in and we can put > > this together. What is the transaction isolation level you are running > > on? > > The transaction isolation level is REPEATABLE_READ, unless Sławek has > changed the defaults (unlikely). For sure I didn't change it > > > For the timeout setting in neutron: that's a good way to approach it > > too, you can even be more agressive and set it to a few seconds. In > > MySQL making connections is very cheap (at least compared to other > > databases), an idle timeout of a few seconds for a connection is > > typical. > > > > On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <sla...@kaplonski.pl> wrote: > >> Hello, > >> > >> Thanks for Your explanations. I thought so and now I decrease > >> "idle_connection_timeout" in neutron and nova. Now when master server is > >> back to cluster than in less than one minute all conections are again > >> made to this master node becuase old connections which was made to > >> backup node are closed. So for now it looks almost perfect but when I > >> now testing cluster (with master node active and all connections > >> established to this node) in neutron I still sometimes see errors like: > >> StaleDataError: UPDATE statement on table 'ports' expected to update 1 > >> row(s); 0 were matched. > >> > >> and also today I found errors like: > >> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] > >> ROLLBACK 2014-09-22 11:38:05.784 11474 ERROR > >> neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped. > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session Traceback (most recent > >> call > >> last): > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist- > >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in > >> _wrap > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session return f(*args, > >> **kwargs) 2014-09-22 11:38:05.784 11474 TRACE > > From looking up the code, it looks like you are using Havana [1]. The > code in the master branch of Neutron now uses oslo.db, not > neutron.openstack.common.db, so this issue may have been resolved in > later versions of Neutron. Yes, I'm using havana and I have now no possibility to upgrade it fast to icehouse (about master branch I even don't want to think :)). Do You want to tell me that this problem will be existing in havana and this can't be fixed in that release? > > [1] > https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/co > mmon/db/sqlalchemy/session.py#L524 > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist- > >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in > >> flush 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session return super(Session, > >> self).flush(*args, **kwargs) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, > >> in > >> flush > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session self._flush(objects) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, > >> in > >> _flush > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session > >> transaction.rollback(_capture_exception=True) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line > >> 58, in __exit__ > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session > >> compat.reraise(exc_type, > >> exc_value, exc_tb) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, > >> in > >> _flush > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session > >> flush_context.execute() > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line > >> 372, in execute > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session rec.execute(self) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line > >> 525, in execute > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session uow > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line > >> 64, in save_obj > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session table, insert) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line > >> 541, in _emit_insert_statements > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session execute(statement, > >> multiparams) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, > >> in > >> execute > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session params) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, > >> in > >> _execute_clauseelement > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session compiled_sql, > >> distilled_params > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, > >> in > >> _execute_context > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session context) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, > >> in > >> _handle_dbapi_exception > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session > >> util.reraise(*exc_info) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, > >> in > >> _execute_context > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session context) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line > >> 60, in do_executemany > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session rowcount = > >> cursor.executemany(statement, parameters) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session File > >> "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in > >> executemany > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session r = r + > >> self.execute(query, a) > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported > >> operand type(s) for +: 'int' and 'NoneType' > > Hmm, this is just bad coding in the MySQLdb driver, frankly. It is > assuming a call to Cursor.execute() will return an integer, but it can > return None in some circumstances. See code here: > > http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa9 > 8b43f3632c/tree/MySQLdb/MySQLdb/cursors.py > > Note that MySQLdb1 (the above, which is what is packaged in your Linux > distro I believe) is old and buggy. The maintainer has moved on to > MySQLdb2, which has a different call interface in this part of the code. Ok, I checked that code and I found that in fact sometimes method "execute" returns None and when it is called from executemany than there is problem. Simple change in executemany method to: if not m: r = 0 for a in args: iter_r = self.execute(query, a) if iter_r: r = r + iter_r return r for me now in that tests which I made but I don't know what be result of that change in longer time period and why execute method returns that None value :/ You wrote that MySQLdb2 has different call interface in this part of code - should change to MySQLdb2 fix that problem or not? If yes can You explain me how I can change it? Should I install other package just or maybe I should set in somewhere in config? (I'm using ubuntu 12.04) > > >> 2014-09-22 11:38:05.784 11474 TRACE > >> neutron.openstack.common.db.sqlalchemy.session > >> > >> And I have to investigate why it is happening becuase whith single mysql > >> server I have no such errors :/ > > Not sure, frankly. The code is executing many INSERT or UPDATE > statements in a single block. The MySQL connection is clearly getting > borked on one of those attempts and results in the traceback you see above. I'm not sure in 100% but on much bigger cluster with one database server there was no such problem even once so I supposed that it is related somehow to galera which I now want to use. > > best, > -jay > > >> --- > >> Best regards > >> Sławek Kapłoński > >> sla...@kaplonski.pl > >> > >> Dnia poniedziałek, 22 września 2014 11:18:27 Peter Boros pisze: > >>> Hi, > >>> > >>> Let me answer this and one of your previous questions in one because > >>> they are related. > >>> > >>> Earlier you wrote: > >>>> I made such modifications today in my infra and generally it looks > >>>> better now. I don't see deadlocks. But I have one more problem with > >>>> that: generally it works fine when main node is active but in > >>>> situation when this node is down, haproxy connect to one of backup > >>>> nodes. Still all is ok but problem is when main node is up again - all > >>>> new connections are made to main node but active connections which was > >>>> made to backup node still are active and neutron (or nova) are using > >>>> connections to two servers and then there are problems with deadlock > >>>> again. > >>>> Do You know how to prevent such situation? > >>> > >>> This is because of how haproxy works. Haproxy's load balancing is TCP > >>> level, once the TCP connection is established, haproxy has nothing to > >>> do with it. If the MySQL application (neutron in this case), uses > >>> persistent connections, at the time of failing over, haproxy doesn't > >>> make an extra decision upon failover, because a connection is already > >>> established. This can be mitigated by using haproxy 1.5 and defining > >>> the backend with on-marked-down shutdown-sessions, this will kill the > >>> connections at the TCP level on the formerly active node. Or in case > >>> of graceful failover, include killing connections in the failover > >>> script on the formerly active node. The application in this case will > >>> get error 1 or 2 you described. > >>> > >>> From your description error 1 and 2 are related to killing > >>> > >>> connections. Case 1 (MySQL server has gone away) happens when the > >>> connection was killed (but not at the MySQL protocol level) while it > >>> was idle, and the application is attempting to re-use it. In this case > >>> the correct behaviour would be re-establishing the connection. Error 2 > >>> is the same thing, but while the connection was actually doing > >>> something, reconnecting and retrying is the correct behaviour. These > >>> errors are not avoidable, if the node dies non-gracefully. A server > >>> can for example lose power while doing the transaction, in this case > >>> the transaction will be aborted, and the application will get one of > >>> the errors described above. The application has to know that the data > >>> is not written, since it didn't do commit or database didn't > >>> acknowledge the commit. > >>> > >>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <sla...@kaplonski.pl> > >> > >> wrote: > >>>> Hello, > >>>> > >>>> New questions below > >>>> > >>>> --- > >>>> Best regards > >>>> Sławek Kapłoński > >>>> sla...@kaplonski.pl > >>>> > >>>> Dnia czwartek, 18 września 2014 09:45:21 Clint Byrum pisze: > >>>>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700: > >>>>>> Hello, > >>>>>> > >>>>>> Is anyone here using openstack with mysql galera and haproxy? Have > >>>>>> You > >>>>>> got > >>>>>> any problems with that? > >>>>>> I was today installed such ha infra for database (two mysql servers > >>>>>> in > >>>>>> galera cluster and haproxy on controller and neutron node, this > >>>>>> haproxy > >>>>>> is connecting to one of galera servers with round robin algorithm). > >>>>>> Generally all is working fine but I have few problems: > >>>>>> 1. I have a lot of messages like: > >>>>>> WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql > >>>>>> server > >>>>>> has gone away: (2006, 'MySQL server has gone away') > >>>>>> 2. I have (most on neutron) many errors like: > >>>>>> OperationalError: (OperationalError) (2013, 'Lost connection to MySQL > >>>>>> server during query') 'UPDATE ml2_port_bindings SET vif_type=%s, > >>>>>> driver=%s, segment=%s WHERE ml2_port_bindings.port_id = > >>>>> > >>>>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need > >>>>> to be just a little longer than MySQL's connection timeouts. > >>>> > >>>> After I made ACTIVE/PASSIVE cluster and change sql_idle_timeout in > >>>> neutron > >>>> and nova problem 1 looks that is solver. Unfortunatelly I found that > >>>> when > >>>> I'm deleting port from neutron I still sometimes have got errors like > >>>> in > >>>> 2. I don't check exactly nova logs yet so I'm not sure is it only in > >>>> neutron or in both. > >>>> Do You maybe know why it happens in neutron? It not happend when I have > >>>> single mysql node without haproxy and galera so I suppose that haproxy > >>>> or > >>>> galera is responsible for that problem :/ > >>>> > >>>>>> 3. Also errors: > >>>>>> StaleDataError: UPDATE statement on table 'ports' expected to update > >>>>>> 1 > >>>>>> row(s); 0 were matched. > >>>>>> 4. and errors: > >>>>>> DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to > >>>>>> get > >>>>>> lock; try restarting transaction') 'UPDATE ipavailabilityranges SET > >>>>>> first_ip=%s WHERE ipavailabilityranges.allocation_pool_id = > >>>>> > >>>>> 3 and 4 are a known issue. Our code doesn't always retry transactions, > >>>>> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't > >>>>> work. > >>>>> > >>>>> You can use ACTIVE/PASSIVE, and even do vertical partitioning where > >>>>> one of the servers is ACTIVE for Nova, but another one is ACTIVE for > >>>>> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work > >>>>> hasn't > >>>>> been done to make the concurrent transactions work properly. > >>>>> > >>>>> _______________________________________________ > >>>>> OpenStack-operators mailing list > >>>>> OpenStack-operators@lists.openstack.org > >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operator > >>>>> s > >>>> > >>>> _______________________________________________ > >>>> OpenStack-operators mailing list > >>>> OpenStack-operators@lists.openstack.org > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > >> > >> _______________________________________________ > >> OpenStack-operators mailing list > >> OpenStack-operators@lists.openstack.org > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators