We are using CouchDB 1.6.1 on CentOS Linux release 7.0.1406. CouchDB was installed
using `yum`.
We tried to run a data conversion on some 100 databases. Most databases have fewer
than 1500 documents (around 1 MB), except for 3 which have around 200,000
documents (around 250 MB). The conversion ran fine on a few databases, then we
started seeing `Error: connect ECONNREFUSED 127.0.0.1:5984` errors.
Conversion steps:
1. Replicate `database_1` to `database_1_backup`.
2. Delete `database_1`.
3. Recreate `database_1`.
4. Read documents from `database_1_backup` into memory.
5. Write to `database_1` using bulkDocs.
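To make the sequence concrete, here is a minimal sketch of the steps above expressed against CouchDB's plain HTTP API in Python. It is an illustration only, not our actual conversion code: `convert` is a hypothetical placeholder for the real transformation, `create_target` assumes the backup database does not exist yet, and stripping `_rev` before the bulk write is an assumption about how the documents are re-inserted into the freshly created database.

```python
import json
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"  # address taken from the ECONNREFUSED error


def convert(doc):
    """Hypothetical placeholder for the real per-document conversion."""
    return doc


def strip_rev(doc):
    """Drop _rev so the document can be inserted into a freshly created database."""
    return {k: v for k, v in doc.items() if k != "_rev"}


def plan_backup_and_reset(db):
    """Return (method, path, body) for the replicate, delete and recreate steps."""
    backup = db + "_backup"
    return [
        # create_target is an assumption: the backup database may already exist.
        ("POST", "/_replicate", {"source": db, "target": backup, "create_target": True}),
        ("DELETE", "/" + db, None),
        ("PUT", "/" + db, None),
    ]


def plan_bulk_write(db, docs):
    """Return (method, path, body) for writing the converted documents in one request."""
    body = {"docs": [strip_rev(convert(d)) for d in docs]}
    return ("POST", "/" + db + "/_bulk_docs", body)


def send(method, path, body=None, base=COUCH_URL):
    """Send one JSON request to CouchDB and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        base + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_conversion(db):
    """Run all steps for one database, reading every backup document into memory."""
    for method, path, body in plan_backup_and_reset(db):
        send(method, path, body)
    rows = send("GET", "/" + db + "_backup/_all_docs?include_docs=true")["rows"]
    docs = [row["doc"] for row in rows]
    send(*plan_bulk_write(db, docs))
```

Calling `run_conversion("database_1")` would issue the same five requests per database that show up in the log below (`POST /_replicate`, `DELETE /database_1/`, and so on). Note that the delete and recreate happen back to back, with no pause in between.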
Crash log:
[Wed, 13 Apr 2016 21:05:06 GMT] [info] [<0.2715.524>] starting new replication
`27dd24d1bd28e13225559e3e0a6c275a` at <0.5681.524> (`database_1` ->
`database_1_backup`)
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.5681.524>] recording a checkpoint
for `database_1` -> `database_1_backup` at source update_seq 2209
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.2715.524>] <ip.address> - - POST
/_replicate 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.31752.523>] <ip.address> - - GET
/database_1_backup/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.10914.524>] <ip.address> - - GET
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.2623.524>] <ip.address> - - GET
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.7567.524>] <ip.address> - - DELETE
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [error] [<0.137.0>] ** Generic server
couch_index_server terminating
** Last message in was {'$gen_cast',{reset_indexes,<<"database_1">>}}
** When Server state == {st,"/var/lib/couchdb"}
** Reason for termination ==
** {{badmatch,{error,eacces}},
[{couch_file,nuke_dir,2,[{file,"couch_file.erl"},{line,237}]},
{couch_file,'-nuke_dir/2-fun-0-',3,[{file,"couch_file.erl"},{line,228}]},
{lists,foreach,2,[{file,"lists.erl"},{line,1323}]},
{couch_file,nuke_dir,2,[{file,"couch_file.erl"},{line,236}]},
{couch_index_server,handle_cast,2,
[{file,"src/couch_index_server.erl"},{line,117}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
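For reference, `eacces` in the trace above is the Erlang name for the POSIX "permission denied" error, raised inside `couch_file:nuke_dir` while `couch_index_server` was handling `reset_indexes` for `database_1`. Below is a minimal sketch of how the directory tree under `/var/lib/couchdb` could be checked for ownership or permission problems. It only looks at the owner and the owner's write/search bits; it ignores group permissions, ACLs and SELinux. The function name `unwritable_dirs` is hypothetical, and the assumption that CouchDB runs as a user named `couchdb` should be verified on the actual system.

```python
import os
import stat


def unwritable_dirs(root, expected_uid):
    """Return (path, owner_uid, mode) for directories under root that
    expected_uid does not own, or that lack owner write/search permission.

    Deleting a file requires write and search (execute) permission on the
    directory that contains it, so those are the bits checked here.
    """
    problems = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        st = os.stat(dirpath)
        owner_ok = st.st_uid == expected_uid
        writable = bool(st.st_mode & stat.S_IWUSR) and bool(st.st_mode & stat.S_IXUSR)
        if not (owner_ok and writable):
            problems.append((dirpath, st.st_uid, stat.filemode(st.st_mode)))
    return problems


# Example usage (assumes the service user is called "couchdb"):
#   import pwd
#   uid = pwd.getpwnam("couchdb").pw_uid
#   for path, owner, mode in unwritable_dirs("/var/lib/couchdb", uid):
#       print(path, owner, mode)
```

An empty result would only rule out the simple owner-permission case; it would not rule out other causes of `eacces`.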
Our second attempt to re-run the conversion crashed CouchDB completely.
[Wed, 13 Apr 2016 22:17:19 GMT] [info] [<0.19197.0>] starting new replication
`6fe446668153db8635e9f49ddd8895f2` at <0.20012.0> (`database_2` -> `database_2`)
[Wed, 13 Apr 2016 22:17:19 GMT] [info] [<0.20012.0>] recording a checkpoint for
`database_2` -> `database_2` at source update_seq 1631
[Wed, 13 Apr 2016 22:17:19 GMT] [error] [<0.20012.0>] Replication
`6fe446668153db8635e9f49ddd8895f2` (`database_2` -> `database_2`) failed:
{checkpoint_commit_failure,<<"Error updating the target checkpoint document:
conflict">>}
[Wed, 13 Apr 2016 22:17:19 GMT] [error] [<0.20012.0>] ** Generic server
<0.20012.0> terminating
** Last message in was {'EXIT',<0.20027.0>,normal}
** When Server state == {rep_state,
{rep,
{"6fe446668153db8635e9f49ddd8895f2",[]},
<<"database_2">>,<<"database_2">>,
[{checkpoint_interval,5000},
{connection_timeout,30000},
{http_connections,20},
{retries,10},
{socket_options,[{keepalive,true},{nodelay,false}]},
{use_checkpoints,true},
{worker_batch_size,500},
{worker_processes,4}],
erl_crash.dump - https://paste.ee/r/EWRYV
SELinux is not an issue here, at least not this time.
Any help debugging this crash log would be greatly appreciated.
Thanks,
Sajin Shrestha