Hello everyone!

I'm trying to understand an issue we're experiencing on CouchDB 2.1.0
running on Ubuntu 14.04. The cluster is currently replicating from another
source cluster, and we have seen that one node freezes from time to time,
requiring a restart before it responds again.

Before becoming unresponsive, the node logs a lot of {error,
sel_conn_closed} errors. See an example trace below.

[error] 2017-10-01T05:25:23.921126Z couchdb@couchdb-1 <0.13489.0> --------
gen_server <0.13489.0> terminated with reason:
{checkpoint_commit_failure,<<"Failure on target commit:
    {'EXIT',{http_request_failed,\"POST\",
        \"http://127.0.0.1:5984/mydb/_ensure_full_commit\",
        {error,sel_conn_closed}}}">>}
  last msg: {'EXIT',<0.10626.0>,{checkpoint_commit_failure,<<"Failure on target commit:
    {'EXIT',{http_request_failed,\"POST\",
        \"http://127.0.0.1:5984/mydb/_ensure_full_commit\",
        {error,sel_conn_closed}}}">>}}
  state: {state,<0.10626.0>,<0.13490.0>,20,
    {httpdb,"https://source_ip/mydb/",nil,
      [{"Accept","application/json"},{"Authorization","Basic ..."},{"User-Agent","CouchDB-Replicator/2.1.0"}],
      30000,
      [{is_ssl,true},{socket_options,[{keepalive,true},{nodelay,false}]},{ssl_options,[{depth,3},{verify,verify_none}]}],
      10,250,<0.11931.0>,20,nil,undefined},
    {httpdb,"http://127.0.0.1:5984/mydb/",nil,
      [{"Accept","application/json"},{"Authorization","Basic ..."},{"User-Agent","CouchDB-Replicator/2.1.0"}],
      30000,
      [{socket_options,[{keepalive,true},{nodelay,false}]}],
      10,250,<0.11995.0>,20,nil,undefined},
    [],<0.25756.4748>,nil,{<0.13490.0>,#Ref<0.0.724041731.98305>},
    [{docs_read,1},{missing_checked,1},{missing_found,1}],nil,nil,
    {batch,[<<"{\"_id\":\"df84bfda818ea150b249da89e8d79a38\",\"_rev\":\"1-ebb0119fbdcad604ad372fa6e05d06a2\",...\":{\"start\":1,\"ids\":[\"ebb0119fbdcad604ad372fa6e05d06a2\"]}}">>],605}}

This particular node is responsible for a replication that produces quite a
few {mp_parser_died,noproc} errors, which AFAIK is a known bug (
https://github.com/apache/couchdb/issues/745), but I don't know whether the
two are related.
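
To see whether the two error types spike together, something like the
following could be run against the node's log. Again just a sketch: the log
path is a placeholder and depends on how CouchDB was installed.

#!/usr/bin/env python3
# Sketch: count sel_conn_closed and mp_parser_died errors per minute to see
# whether they correlate in time. Placeholder: log file path.
import re
from collections import Counter

counts = Counter()
stamp = re.compile(r"^\[error\] (\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")  # bucket by minute

with open("/opt/couchdb/var/log/couch.log") as log:
    for line in log:
        m = stamp.match(line)
        if not m:
            continue
        minute = m.group(1)
        if "sel_conn_closed" in line:
            counts[(minute, "sel_conn_closed")] += 1
        if "mp_parser_died" in line:
            counts[(minute, "mp_parser_died")] += 1

for key in sorted(counts):
    print(key, counts[key])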

When that happens, simply restarting the node brings it back up and running
properly.
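
Until we understand the root cause, a crude external check along these
lines could at least flag the freeze as soon as it happens. Sketch only: it
does not restart anything, that part stays manual.

#!/usr/bin/env python3
# Sketch: flag the node as frozen when it stops answering plain HTTP.
# Assumption: the node listens on 127.0.0.1:5984 and GET / normally answers fast.
import time
import urllib.error
import urllib.request

NODE = "http://127.0.0.1:5984/"

while True:
    try:
        urllib.request.urlopen(NODE, timeout=5)
        status = "ok"
    except (urllib.error.URLError, OSError):
        status = "UNRESPONSIVE - needs restart"
    print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()), status)
    time.sleep(30)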

Any help would be really appreciated.

Regards
-- 
Carlos Alonso
Data Engineer
Madrid, Spain

[email protected]

-- 
Este mensaje y cualquier archivo adjunto va dirigido exclusivamente a su 
destinatario, pudiendo contener información confidencial sometida a secreto 
profesional. No está permitida su reproducción o distribución sin la 
autorización expresa de Cabify. Si usted no es el destinatario final por 
favor elimínelo e infórmenos por esta vía. 

This message and any attached file are intended exclusively for the 
addressee, and it may be confidential. You are not allowed to copy or 
disclose it without Cabify's prior written authorization. If you are not 
the intended recipient please delete it from your system and notify us by 
e-mail.

Reply via email to