Hello there,

I realized that when a replication, continuous or transient, is run but the 
target host is unreachable, the replication job is deleted. Here are a few 
example logs:


[error] 2023-10-18T12:43:37.394891Z couchdb@127.0.0.1 <0.582.0> -------- 
couch_replicator_scheduler : Transient job 
{"0f63c93e6e24efacede944ce1ed14795","+continuous"} failed, removing. Error: 
<<"{checkpoint_commit_failure,<<\"instance_start_time on source and target 
database has changed since last checkpoint.\">>}">>

[error] 2023-10-18T12:43:38.679316Z couchdb@127.0.0.1 <0.582.0> -------- 
couch_replicator_scheduler : Transient job 
{"ecc1efdf8f86c4ef626f0ba36766ec56","+continuous"} failed, removing. Error: 
<<"{checkpoint_commit_failure,<<\"instance_start_time on source and target 
database has changed since last checkpoint.\">>}">>

[error] 2023-10-18T12:43:42.230056Z couchdb@127.0.0.1 <0.582.0> -------- 
couch_replicator_scheduler : Transient job 
{"a3ede72be05b5aaf0da538843928491a","+continuous"} failed, removing. Error: 
<<"{checkpoint_commit_failure,<<\"instance_start_time on source and target 
database has changed since last checkpoint.\">>}">>

[error] 2023-10-18T12:44:37.178885Z couchdb@127.0.0.1 <0.582.0> -------- 
couch_replicator_scheduler : Transient job 
{"76e126167983ab9e8003853ad5cbcfaa",[]} failed, removing. Error: 
<<"{http_request_failed,\"GET\",\n                     
\"http://some.anonymized.host:5984/db/\",\n                     
{error,{error,{conn_failed,{error,econnrefused}}}}}">>

[error] 2023-10-18T12:44:43.901342Z couchdb@127.0.0.1 <0.582.0> -------- 
couch_replicator_scheduler : Transient job 
{"0f63c93e6e24efacede944ce1ed14795","+continuous"} failed, removing. Error: 
<<"{checkpoint_commit_failure,<<\"Failure on target commit: {'EXIT',\\n    
{http_request_failed,\\\"POST\\\",\\n        
\\\"http://some.anonymized.host:5984/db/_ensure_full_commit\\\",\\n        
{error,{error,{conn_failed,{error,econnrefused}}}}}}\">>}">>



The problem is that in my use case it is expected for these hosts to be 
unreachable. I want CouchDB to consider this a transient error and continue, 
and a human will tell CouchDB when a replication job should actually be 
removed. Today I need some application code to recreate those replication 
jobs, but I'd rather not.
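
To make the workaround concrete, here is a minimal sketch of what such 
recreation code could look like. It is an illustration, not the exact code in 
use. It assumes the node listens on http://127.0.0.1:5984, that admin 
authentication is handled elsewhere, and that each job returned by 
GET /_scheduler/jobs carries "source" and "target" URL fields. The helper 
names (normalize, missing_replications, recreate) are hypothetical.

```python
import json
import urllib.request

# Assumption: local node; admin credentials are omitted from this sketch.
COUCH_URL = "http://127.0.0.1:5984"


def normalize(url):
    """Strip trailing slashes so URLs compare consistently."""
    return url.rstrip("/")


def missing_replications(wanted, jobs):
    """Return the (source, target) pairs in `wanted` that have no scheduler job."""
    running = {(normalize(j["source"]), normalize(j["target"])) for j in jobs}
    return [
        (s, t) for (s, t) in wanted
        if (normalize(s), normalize(t)) not in running
    ]


def recreate(wanted):
    """Re-post every wanted replication that is absent from the scheduler."""
    with urllib.request.urlopen(COUCH_URL + "/_scheduler/jobs") as resp:
        jobs = json.load(resp)["jobs"]
    for source, target in missing_replications(wanted, jobs):
        body = json.dumps(
            {"source": source, "target": target, "continuous": True}
        )
        req = urllib.request.Request(
            COUCH_URL + "/_replicate",
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req).close()
```

Something like recreate([(source_url, target_url), ...]) then has to run 
periodically, which is the extra moving part I would like to avoid.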

Is there a way to have those replications persist?

--
Matthieu Rakotojaona
Research Engineer, Inria <https://www.inria.fr/>
STACK team <https://stack-research-group.gitlabpages.inria.fr/web/>
