Re: [Openstack] [grizzly]Problems of qpid as rpcbackend
Hi all, I think this is a bug in qpid as the RPC backend. The other services (nova-compute, cinder-scheduler, etc.) run as eventlet threads and are stopped with the thread's kill() method. The final rpc.cleanup() step then does nothing, because the consumer connection ran in the thread that was just killed. I think that is harmless: all of the queues are auto-delete, so they are removed once every receiver has disappeared. cinder-volume, however, runs as a separate process, so stopping the service has to close the connection, and connection.close() closes every receiver (consumer) on the connection's session. Closing a receiver sends MessageCancel and QueueDelete to the broker (qpidd), so all of the cinder-volume queues get removed. I think that is the reason for the behaviour that confused me, but I don't know how to solve it.

2013/5/28 minmin ren rmm0...@gmail.com

I think I found some problems with qpid as the RPC backend, but I'm not sure about it. Could anyone try to reproduce it in your environment? This is the OpenStack Grizzly version; the config file needs debug=True.

1. service openstack-cinder-scheduler stop (nova-compute, nova-scheduler, etc.)
2. vi /var/log/cinder/scheduler.log — you will find output like the following.
I deployed two machines (node1 and dev202).

2013-05-27 06:02:46 CRITICAL [cinder] need more than 0 values to unpack
Traceback (most recent call last):
  File "/usr/bin/cinder-scheduler", line 50, in <module>
    service.wait()
  File "/usr/lib/python2.6/site-packages/cinder/service.py", line 613, in wait
    rpc.cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/__init__.py", line 240, in cleanup
    return _get_impl().cleanup()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 649, in cleanup
    return rpc_amqp.cleanup(Connection.pool)
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 671, in cleanup
    connection_pool.empty()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/amqp.py", line 80, in empty
    self.get().close()
  File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 386, in close
    self.connection.close()
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 316, in close
    ssn.close(timeout=timeout)
  File "<string>", line 6, in close
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 749, in close
    if not self._ewait(lambda: self.closed, timeout=timeout):
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 208, in _ewait
    result = self._wait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 193, in _wait
    return self._waiter.wait(predicate, timeout=timeout)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 57, in wait
    self.condition.wait(3)
  File "/usr/lib/python2.6/site-packages/qpid/concurrency.py", line 96, in wait
    sw.wait(timeout)
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 53, in wait
    ready, _, _ = select([self], [], [], timeout)
ValueError: need more than 0 values to unpack

I put the problem with multiple cinder-volumes on Launchpad: https://answers.launchpad.net/cinder/+question/229456

I hit that problem only with cinder-volume; the other services never show it. But I then found that the other services' logs print the CRITICAL info above, with the error at self.connection.close(). When I deleted self.connection.close() (which of course should not really be removed) and watched the qpid queue information, the multi-cinder-volume problem that confused me disappeared. As a result, I think the problem I found may be a bug.

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
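The queue-lifecycle reasoning in the message above (auto-delete removal when the last receiver detaches, versus an explicit QueueDelete sent when a receiver is closed) can be sketched with a toy broker model. This is an illustration only, not the real qpidd or the qpid.messaging API; ToyBroker and its methods are invented for the sketch.

```python
# Toy model of the two queue-removal paths described above (NOT real qpid code).
# - cancel(): a receiver detaches; an auto-delete queue is removed only when
#   its last receiver is gone.
# - queue_delete(): an explicit QueueDelete removes the queue immediately,
#   even if receivers on other nodes are still attached.

class ToyBroker:
    def __init__(self):
        self.queues = {}  # queue name -> set of attached receiver ids

    def attach(self, queue, receiver_id):
        self.queues.setdefault(queue, set()).add(receiver_id)

    def cancel(self, queue, receiver_id):
        receivers = self.queues.get(queue)
        if receivers is not None:
            receivers.discard(receiver_id)
            if not receivers:            # last receiver gone -> auto-delete
                del self.queues[queue]

    def queue_delete(self, queue):
        self.queues.pop(queue, None)     # removed regardless of other receivers


broker = ToyBroker()
# Two cinder-volume hosts share the round-robin "cinder-volume" queue.
broker.attach("cinder-volume", "node1")
broker.attach("cinder-volume", "node2")

# Thread-kill path: node1's receiver simply disappears (cancel only);
# the queue survives because node2 is still consuming.
broker.cancel("cinder-volume", "node1")
assert "cinder-volume" in broker.queues

# Process-stop path: receiver close also sends QueueDelete, which removes
# the shared queue out from under node2.
broker.queue_delete("cinder-volume")
assert "cinder-volume" not in broker.queues
```

Under this model, the thread-killed services leave the shared queues intact, while cinder-volume's explicit close tears them down for every host — matching the behaviour reported in the thread.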
Re: [Openstack] [grizzly]Problems of qpid as rpcbackend
I am not familiar with impl_qpid.py, but I am familiar with amqp.py and have had problems around rpc_amqp.cleanup() and the Pool.empty() method it calls. It was a totally different problem, but I decided to take a look at yours. I noticed that the only other place in impl_qpid.py where a connection.close() is done, it is surrounded by this code:

    # Close the session if necessary
    if self.connection.opened():
        try:
            self.connection.close()
        except qpid_exceptions.ConnectionError:
            pass

I suggest you wrap the close at line 386 of impl_qpid.py with the same code, and your problem should be fixed. Here is the line identified from your call stack:

    File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/impl_qpid.py", line 386, in close
        self.connection.close()

If that works, open a bug report.

Good catch,
Ray
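Ray's guarded close can be exercised without a running broker by stubbing out the connection. FakeQpidConnection and safe_close below are invented stand-ins that assume the same opened()/close() interface as qpid.messaging.Connection; they are not the real impl_qpid.py code.

```python
# Sketch of the suggested guard around connection.close(), using stand-in
# classes so the control flow can be run without a qpid broker.

class ConnectionError(Exception):
    """Stand-in for qpid_exceptions.ConnectionError."""


class FakeQpidConnection:
    def __init__(self, opened=True, close_raises=False):
        self._opened = opened
        self._close_raises = close_raises

    def opened(self):
        return self._opened

    def close(self):
        if self._close_raises:
            raise ConnectionError("transport already torn down")
        self._opened = False


def safe_close(connection):
    # Close the connection only if necessary, swallowing errors from a
    # transport that has already gone away underneath us.
    if connection.opened():
        try:
            connection.close()
        except ConnectionError:
            pass


# A close() that blows up mid-shutdown no longer propagates:
safe_close(FakeQpidConnection(close_raises=True))
# An already-closed connection is skipped entirely:
safe_close(FakeQpidConnection(opened=False))
```

One caveat: the ValueError in the original traceback escapes from qpid's select() wait, not as a ConnectionError, so in practice the guard may need to catch more than Ray's snippet does.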
Re: [Openstack] [grizzly]Problems of qpid as rpcbackend
Hi Ray, thanks for your reply. Adding the try/except at line 386 only fixes the exception raised when stopping cinder-scheduler (or nova-compute and other services with a similar implementation). The other problem remains: all of the cinder-volume queues are removed when one of several cinder-volume services stops.

I used the pdb module to trace how the two services stop (cinder-scheduler and cinder-volume); the two implementations differ.

cinder-scheduler catches the stop signal and calls _launcher.stop() (cinder/service.py line 612). _launcher.stop() kills every service thread running service.start and service.wait. After the threads are killed, I found that connection.session.receivers is [], which means all consumers were released. I'm not sure whether the connection was closed or not; I found that the kill() method of the Service class is never called.

cinder-volume launches two processes: the service runs in the child process (service.py line 227) and the parent process watches the child's status. When the parent process catches the stop signal, it forwards it to the child process; the child catches the signal and calls service.stop (service.py line 239). Tracing this path with pdb, I found that connection.session.receivers is not []: it holds three receivers (cinder-volume, cinder-volume.node1, cinder-volume_fanout). qpid then removes the session's receivers, sending MessageCancel and QueueDelete to qpidd. I think the QueueDelete tells qpidd to delete all of the cinder-volume queues.

2013/5/30 Ray Pekowski pekow...@gmail.com

I am not familiar with impl_qpid.py, but am familiar with amqp.py and have had problems around rpc_amqp.cleanup() and the Pool.empty() method it calls. It was a totally different problem, but I decided to take a look at your problem.