Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Qing He
In this case, the exception is just hard to debug, and not something we 
expected in our design. The thorough solution would be to debug further to see 
the root cause, and deal with it as I suggested in the code review. 

However, if it is time-consuming to debug, we can put the patch in with 
logging to record the circumstances in which the 'unexpected' exception occurs, 
so that we can add new patches to deal with it once it becomes expected. After 
all, this is an iterative process. As software evolves, a lot of the unexpected 
becomes expected. 

-----Original Message-----
From: Raymond Pekowski (Code Review) [mailto:rev...@openstack.org] 
Sent: Tuesday, June 25, 2013 9:52 AM
Cc: Dirk Mueller; Andrea Rosa; Ben Nemec; Chris Behrens; Eric Windisch; Russell 
Bryant; Qing He
Subject: Change in openstack/oslo-incubator[master]: Make AMQP based RPC 
consumer threads more robust

Raymond Pekowski has posted comments on this change.

Change subject: Make AMQP based RPC consumer threads more robust 
..


Patch Set 12:

By definition, unexpected exceptions are unexpected, so there is nothing yet to 
do root cause analysis on.  We could argue as to what the appropriate action to 
take is, but that has already been discussed on the mailing list, in the thread 
that starts here: 
http://lists.openstack.org/pipermail/openstack-dev/2013-June/010040.html
Please comment on that thread since you don't agree with the consensus that was 
reached.

--
To view, visit https://review.openstack.org/32235
To unsubscribe, visit https://review.openstack.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d6ec8a5e3a310314da201656ee862bb40b41616
Gerrit-PatchSet: 12
Gerrit-Project: openstack/oslo-incubator
Gerrit-Branch: master
Gerrit-Owner: Raymond Pekowski raymond_pekow...@dell.com
Gerrit-Reviewer: Andrea Rosa andrea.r...@hp.com
Gerrit-Reviewer: Ben Nemec openst...@nemebean.com
Gerrit-Reviewer: Chris Behrens cbehr...@codestud.com
Gerrit-Reviewer: Dirk Mueller d...@dmllr.de
Gerrit-Reviewer: Eric Windisch e...@cloudscaling.com
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Qing He qing...@radisys.com
Gerrit-Reviewer: Raymond Pekowski raymond_pekow...@dell.com
Gerrit-Reviewer: Russell Bryant rbry...@redhat.com

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Russell Bryant
On 06/25/2013 03:15 PM, Ray Pekowski wrote:
> 
> On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
>
>> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs
>> to know about it and look into it to see whether it is something benign or
>> fatal. If it is masked, the system may degrade over time, unnoticed, until
>> it is unusable.
> 
> The approach implemented in the patch is to log the exception and retry
> at a rate of once per second.  An alternative would be to log and call
> sys.exit() to kill the entire process.  Be aware that the code affected
> by this patch is RPC-created, dispatcher-like threads.  Let's have a vote
> on which option is preferable.

I like how it's implemented, *not* killing the process ...

-- 
Russell Bryant
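
For readers of the archive, the two options under discussion can be sketched 
roughly as follows. This is only an illustration of the log-and-retry idea, not 
the code from the review: consume_forever, consume_once, should_stop and the 
injectable sleep parameter are all hypothetical names chosen for this sketch.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def consume_forever(consume_once, should_stop, retry_delay=1.0,
                    sleep=time.sleep):
    """Call consume_once() until should_stop() returns True.

    Hypothetical helper, not the oslo-incubator implementation.
    Any unexpected exception is logged with its traceback and the loop
    retries after retry_delay seconds instead of letting the thread die.
    Returns the number of failures seen, which a caller could export
    as a metric.
    """
    failures = 0
    while not should_stop():
        try:
            consume_once()
        except Exception:
            # SystemExit and KeyboardInterrupt are not subclasses of
            # Exception, so they still propagate.
            failures += 1
            LOG.exception("Unexpected exception in consumer thread; "
                          "retrying in %.1f s", retry_delay)
            # The alternative raised in the thread would log and then
            # call sys.exit() here instead of sleeping.
            sleep(retry_delay)
    return failures
```

The fixed one-second delay keeps a persistently failing consumer from spinning 
and flooding the log, while still recovering on its own if the fault is 
transient.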



Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Qing He
Agreed! Let someone know and keep going unless someone wants to interrupt it or 
do something. (Does a mechanism to do this already exist?)

-----Original Message-----
From: Russell Bryant [mailto:rbry...@redhat.com] 
Sent: Tuesday, June 25, 2013 12:21 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault 
tolerant?

On 06/25/2013 03:15 PM, Ray Pekowski wrote:
> 
> On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:
>
>> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs
>> to know about it and look into it to see whether it is something benign or
>> fatal. If it is masked, the system may degrade over time, unnoticed, until
>> it is unusable.
> 
> The approach implemented in the patch is to log the exception and retry
> at a rate of once per second.  An alternative would be to log and call
> sys.exit() to kill the entire process.  Be aware that the code affected
> by this patch is RPC-created, dispatcher-like threads.  Let's have a vote
> on which option is preferable.

I like how it's implemented, *not* killing the process ...

--
Russell Bryant



Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Qing He
Does the log alert the operator? Something like an SNMP trap?

From: Ray Pekowski [mailto:pekow...@gmail.com]
Sent: Tuesday, June 25, 2013 12:16 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault 
tolerant?


On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:

> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs
> to know about it and look into it to see whether it is something benign or
> fatal. If it is masked, the system may degrade over time, unnoticed, until
> it is unusable.

The approach implemented in the patch is to log the exception and retry at a 
rate of once per second.  An alternative would be to log and call sys.exit() to 
kill the entire process.  Be aware that the code affected by this patch is 
RPC-created, dispatcher-like threads.  Let's have a vote on which option is 
preferable.


Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Russell Bryant
On 06/25/2013 04:08 PM, Qing He wrote:
> Does the log alert the operator? Something like an SNMP trap?

You can turn on a mode where it will emit a notification, and
notifications can be published via AMQP.

-- 
Russell Bryant
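
For readers of the archive, the general idea of turning error-level log 
records into notifications can be sketched with the standard library alone. 
This is not the oslo-incubator mechanism referred to above: 
NotifyOnErrorHandler and its notify callable are hypothetical, and notify 
stands in for whatever actually publishes the message (for example, onto an 
AMQP topic).

```python
import logging


class NotifyOnErrorHandler(logging.Handler):
    """Forward records at ERROR or above to a notify callable.

    Hypothetical sketch: notify is any callable that accepts a dict
    and publishes it somewhere an operator will see it.
    """

    def __init__(self, notify, level=logging.ERROR):
        super().__init__(level=level)
        self._notify = notify

    def emit(self, record):
        try:
            self._notify({
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })
        except Exception:
            # A broken notifier must not take the logging caller down.
            self.handleError(record)
```

Attaching such a handler to the logger used by the consumer threads would mean 
the operator is alerted by the notification rather than having to read through 
the log.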



Re: [openstack-dev] Should RPC consume_in_thread() be more fault tolerant?

2013-06-25 Thread Qing He
To clarify: the operator should not have to go through a long log to find the 
issue. Instead, he/she needs to be notified that something severe or unexpected 
just happened and that he/she needs to check it out.

From: Qing He
Sent: Tuesday, June 25, 2013 1:09 PM
To: 'OpenStack Development Mailing List'
Subject: RE: [openstack-dev] Should RPC consume_in_thread() be more fault 
tolerant?

Does the log alert the operator? Something like an SNMP trap?

From: Ray Pekowski [mailto:pekow...@gmail.com]
Sent: Tuesday, June 25, 2013 12:16 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] Should RPC consume_in_thread() be more fault 
tolerant?


On Jun 25, 2013 1:09 PM, Qing He qing...@radisys.com wrote:

> Basically, when the 'unexpected' happens, someone (e.g., an operator) needs
> to know about it and look into it to see whether it is something benign or
> fatal. If it is masked, the system may degrade over time, unnoticed, until
> it is unusable.

The approach implemented in the patch is to log the exception and retry at a 
rate of once per second.  An alternative would be to log and call sys.exit() to 
kill the entire process.  Be aware that the code affected by this patch is 
RPC-created, dispatcher-like threads.  Let's have a vote on which option is 
preferable.