Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
In this case, the exception is just hard to debug, and not something we expected in our design. The thorough solution would be to debug further to find the root cause and deal with it as I suggested in the code review.

However, if debugging is time-consuming, we can put the patch in with logging that records the circumstances in which the 'unexpected' exception occurs, so that we can add new patches to deal with it once it becomes expected. After all, this is an iterative process. As the software evolves, a lot of the unexpected becomes expected.

-----Original Message-----
From: Raymond Pekowski (Code Review) [mailto:rev...@openstack.org]
Sent: Tuesday, June 25, 2013 9:52 AM
Cc: Dirk Mueller; Andrea Rosa; Ben Nemec; Chris Behrens; Eric Windisch; Russell Bryant; Qing He
Subject: Change in openstack/oslo-incubator[master]: Make AMQP based RPC consumer threads more robust

Raymond Pekowski has posted comments on this change.

Change subject: Make AMQP based RPC consumer threads more robust

Patch Set 12:

By definition, unexpected exceptions are unexpected, so there is nothing yet to do root cause analysis on. We could argue as to what the appropriate action to take is, but that has already been discussed on the mailing list started by this thread:

http://lists.openstack.org/pipermail/openstack-dev/2013-June/010040.html

Please comment on that thread since you don't agree with the consensus that was reached.
--
To view, visit https://review.openstack.org/32235
To unsubscribe, visit https://review.openstack.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d6ec8a5e3a310314da201656ee862bb40b41616
Gerrit-PatchSet: 12
Gerrit-Project: openstack/oslo-incubator
Gerrit-Branch: master
Gerrit-Owner: Raymond Pekowski raymond_pekow...@dell.com
Gerrit-Reviewer: Andrea Rosa andrea.r...@hp.com
Gerrit-Reviewer: Ben Nemec openst...@nemebean.com
Gerrit-Reviewer: Chris Behrens cbehr...@codestud.com
Gerrit-Reviewer: Dirk Mueller d...@dmllr.de
Gerrit-Reviewer: Eric Windisch e...@cloudscaling.com
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Qing He qing...@radisys.com
Gerrit-Reviewer: Raymond Pekowski raymond_pekow...@dell.com
Gerrit-Reviewer: Russell Bryant rbry...@redhat.com

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
On 06/25/2013 03:15 PM, Ray Pekowski wrote:
> On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
>> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs to know about it and look into it to see whether it is benign or fatal. If it is masked, the system may degrade over time, unnoticed, until it is unusable.
>
> The approach implemented in the patch is to log the exception and retry once per second. An alternative would be a log and a sys.exit() to kill the entire process. Be aware that the code affected by this patch is RPC-created, dispatcher-like threads. Let's have a vote on which option is preferable.

I like how it's implemented, *not* killing the process ...

--
Russell Bryant
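For readers following the thread, the two options Ray describes (log and retry once per second, or log and exit) can be sketched roughly as below. This is only an illustration, not the code from the oslo-incubator patch: `consume_forever`, `consume_once`, `retry_delay`, `exit_on_error`, `should_stop` and `sleep` are hypothetical names chosen for the example.

```python
import logging
import sys
import time

LOG = logging.getLogger(__name__)


def consume_forever(consume_once, retry_delay=1.0, exit_on_error=False,
                    should_stop=lambda: False, sleep=time.sleep):
    """Call consume_once() in a loop and survive unexpected exceptions.

    All parameter names are hypothetical; this is not the real
    consume_in_thread() implementation.
    """
    while not should_stop():
        try:
            consume_once()
        except Exception:
            # Record the full traceback so an operator can investigate.
            LOG.exception("Unexpected exception in consumer thread")
            if exit_on_error:
                # Alternative from the thread: log, then exit.
                # Note: sys.exit() raises SystemExit, which in a non-main
                # thread ends only that thread, not the whole process.
                sys.exit(1)
            # Option implemented in the patch: wait, then retry.
            sleep(retry_delay)
```

With the default arguments a consumer that keeps failing is retried about once per second and every failure is logged with its traceback. Passing `exit_on_error=True` gives the alternative behaviour, with the caveat noted in the comment about how `sys.exit()` behaves outside the main thread.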
Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
Agree! Let someone know and keep going unless someone wants to interrupt it or do something. (Is there already a mechanism to do this?)

-----Original Message-----
From: Russell Bryant [mailto:rbry...@redhat.com]
Sent: Tuesday, June 25, 2013 12:21 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

On 06/25/2013 03:15 PM, Ray Pekowski wrote:
> On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
>> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs to know about it and look into it to see whether it is benign or fatal. If it is masked, the system may degrade over time, unnoticed, until it is unusable.
>
> The approach implemented in the patch is to log the exception and retry once per second. An alternative would be a log and a sys.exit() to kill the entire process. Be aware that the code affected by this patch is RPC-created, dispatcher-like threads. Let's have a vote on which option is preferable.

I like how it's implemented, *not* killing the process ...

--
Russell Bryant
Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
Does the log alert the operator? Something like an SNMP trap?

From: Ray Pekowski [mailto:pekow...@gmail.com]
Sent: Tuesday, June 25, 2013 12:16 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs to know about it and look into it to see whether it is benign or fatal. If it is masked, the system may degrade over time, unnoticed, until it is unusable.

The approach implemented in the patch is to log the exception and retry once per second. An alternative would be a log and a sys.exit() to kill the entire process. Be aware that the code affected by this patch is RPC-created, dispatcher-like threads. Let's have a vote on which option is preferable.
Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
On 06/25/2013 04:08 PM, Qing He wrote:
> Does the log alert the operator? Something like an SNMP trap?

You can turn on a mode where it will emit a notification, and notifications can be published via AMQP.

--
Russell Bryant
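As a rough picture of the idea in Russell's reply (error-level log records also producing a notification), a logging handler along these lines would do it. This is a sketch using only the Python standard library, not the OpenStack implementation: `NotifyOnErrorHandler`, the `notify` callable and the message fields are all assumptions made for the example, and publishing over AMQP is left to whatever `notify` does.

```python
import logging


class NotifyOnErrorHandler(logging.Handler):
    """Forward ERROR-and-above log records to a notifier callable.

    Hypothetical helper: `notify` is any callable that accepts a dict,
    for example a function that publishes the dict on a message bus.
    """

    def __init__(self, notify):
        # Only records at ERROR level or higher reach emit().
        super().__init__(level=logging.ERROR)
        self._notify = notify

    def emit(self, record):
        try:
            self._notify({
                "event_type": "error_notification",  # assumed field name
                "priority": record.levelname,
                "payload": {
                    "logger": record.name,
                    "message": record.getMessage(),
                },
            })
        except Exception:
            # Never let a notifier failure break the logging call itself.
            self.handleError(record)
```

Attaching the handler with `logging.getLogger().addHandler(NotifyOnErrorHandler(send))`, where `send` is a hypothetical publishing function, would mean the operator is told about an unexpected exception as it is logged rather than having to search the log afterwards.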
Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?
To clarify: the operator should not have to go through a long log to find the issue. Instead, he/she needs to be notified that something severe/unexpected just happened and that he/she needs to check it out.

From: Qing He
Sent: Tuesday, June 25, 2013 1:09 PM
To: 'OpenStack Development Mailing List'
Subject: RE: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

Does the log alert the operator? Something like an SNMP trap?

From: Ray Pekowski [mailto:pekow...@gmail.com]
Sent: Tuesday, June 25, 2013 12:16 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs to know about it and look into it to see whether it is benign or fatal. If it is masked, the system may degrade over time, unnoticed, until it is unusable.

The approach implemented in the patch is to log the exception and retry once per second. An alternative would be a log and a sys.exit() to kill the entire process. Be aware that the code affected by this patch is RPC-created, dispatcher-like threads. Let's have a vote on which option is preferable.