Hi everyone,
I moved some data from an Amazon EC2 small instance to a large one and in
the process upgraded from CouchDB 1.1.0 to CouchDB 1.1.1. I also went
with
Erlang R14B04 instead of R14B03 (Hurray for commando updates!) and now my
CouchDB instance seems to sometimes die when under heavy load. By
"dying" I
mean that the beam process seems to stay in memory, but the HTTP server
is
gone and no requests get served. For now I "fix" this by stopping and
restarting the process.
Here are some details on my setup. The server is running a 64-bit Ubuntu
Server (Oneiric) Amazon EC2 image on a large instance with 2 CPU cores
and
7.5 GB RAM. I build both Erlang and CouchDB from source. I collect log
entries and bulk insert them in batches of up to 200 documents. I also
run
couchdb-lucene on the same host and I *think* most of the crashes happen
when couchdb-lucene is running a tough query and is hogging the CPU or
the
HDD. I have some largish db's (~50 million documents, ~25 GB in disk
space). I plan on splitting my dbs into smaller ones. I hope that gets me
more responsive file access and faster full text index searches. I think
my
lucene indexes may be getting too large for that machine's memory and it
can't serve them too well. I frequently get "OS process has timed out"
errors when trying to query those indexes. Anyway, that shouldn't be
crashing the core couchdb process, right?
I am pasting my idea of what the relevant portion of the couchdb log file
is below, hoping somebody could decipher something out of it. Am I
correct
in thinking that the "** Reason for termination == ** {timeout," part
means
the process is crashing since writing to or reading from a file timed
out?
Any help is greatly appreciated.
Best,
Hristo
===============
[Thu, 08 Dec 2011 20:17:16 GMT] [error] [<0.78.0>]
{error_report,<0.31.0>,
{<0.78.0>,supervisor_report,
[{supervisor,{local,couch_**server_sup}},
{errorContext,child_**terminated},
{reason,shutdown},
{offender,
[{pid,<0.86.0>},
{name,couch_secondary_**services},
{mfargs,
{couch_server_sup,start_**secondary_services,
[]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]}]}}
[Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>] ** Generic server
<0.407.0> terminating
** Last message in was delayed_commit
** When Server state == {db,<0.406.0>,<0.407.0>,nil,<<**
"1323371423957954">>,
<0.404.0>,<0.408.0>,
{db_header,5,204982,0,
{199491055,{204980,0}},
{199498140,204980},
{111685732,[]},
0,nil,nil,1000},
204982,
{btree,<0.404.0>,
{199513565,{205011,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.404.0>,
{199518784,205011},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.404.0>,
{111685732,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
205013,
<<"database1">>,
"/data/couchdb/data/database1.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
#Ref<0.0.30.131014>,1000,
[before_header,after_header,**on_file_open],
false}
** Reason for termination ==
** {timeout,
{gen_server,call,
[<0.406.0>,
{db_updated,
{db,<0.406.0>,<0.407.0>,nil,<<**"1323371423957954">>,<0.404.0>**,
<0.408.0>,
{db_header,5,205013,0,
{199513565,{205011,0}},
{199518784,205011},
{111685732,[]},
0,nil,nil,1000},
205013,
{btree,<0.404.0>,
{199513565,{205011,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.404.0>,
{199518784,205011},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.404.0>,
{111685732,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
205013,
<<"database1">>,
"/data/couchdb/data/database1.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,**on_file_open],
false}}]}}
[Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>]
{error_report,<0.31.0>,
{<0.407.0>,crash_report,
[[{initial_call,{couch_db_**updater,init,['Argument__1']}}**,
{pid,<0.407.0>},
{registered_name,[]},
{error_info,
{exit,
{timeout,
{gen_server,call,
[<0.406.0>,
{db_updated,
{db,<0.406.0>,<0.407.0>,nil,
<<"1323371423957954">>,<0.404.**
0>,<0.408.0>,
{db_header,5,205013,0,
{199513565,{205011,0}},
{199518784,205011},
{111685732,[]},
0,nil,nil,1000},
205013,
{btree,<0.404.0>,
{199513565,{205011,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.404.0>,
{199518784,205011},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.404.0>,
{111685732,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
205013,
<<"database1">>,
"/data/couchdb/data/database1.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,**
on_file_open],
false}}]}},
[{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}**},
{ancestors,[<0.406.0>,<0.403.**0>]},
{messages,[{'EXIT',<0.406.0>,**shutdown}]},
{links,[]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,28657},
{stack_size,24},
{reductions,4487709}],
[]]}}
[Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>] ** Generic server
<0.178.0> terminating
** Last message in was {update_docs,<0.2027.0>,
[[{doc,<<"**55e776b94547442ab17b82bd1a0598**
43">>,
{1,
[<<102,77,172,235,192,72,84,**
223,58,68,105,
199,153,147,196,81>>]},
{[{<<"host">>,<<"Host1">>},
{<<"time">>,1323375464000},
{<<"text">>,
<<"Some text">>},
{<<"level">>,0},
{<<"source">>,<<"source1">>},
{<<"type">>,<<"Entry1">>}]},
[],false,[]}],
...
[[A BUNCH OF DOCS HERE]]
...
{[{<<"host">>,<<"Host1">>},
{<<"time">>,1323375467000},
{<<"text">>,
<<"Some text">>},
{<<"level">>,0},
{<<"source">>,<<"source1">>},
{<<"type">>,<<"Entry1">>}]},
[],false,[]}]],
[],false,false}
** When Server state == {db,<0.177.0>,<0.178.0>,nil,<<**
"1323371411352029">>,
<0.175.0>,<0.179.0>,
{db_header,5,13636863,0,
{6776455960,{13636861,0}},
{6776479023,13636861},
{1039786,[]},
0,nil,nil,1000},
13636863,
{btree,<0.175.0>,
{6776455960,{13636861,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.175.0>,
{6776479023,13636861},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.175.0>,
{1039786,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
13636863,
<<"database2">>,
"/data/couchdb/data/database2.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
nil,1000,
[before_header,after_header,**on_file_open],
false}
** Reason for termination ==
** {timeout,
{gen_server,call,
[<0.177.0>,
{db_updated,
{db,<0.177.0>,<0.178.0>,nil,<<**"1323371411352029">>,<0.175.0>**,
<0.179.0>,
{db_header,5,13636863,0,
{6776455960,{13636861,0}},
{6776479023,13636861},
{1039786,[]},
0,nil,nil,1000},
13636863,
{btree,<0.175.0>,
{6776557909,{13637061,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.175.0>,
{6776580448,13637061},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.175.0>,
{1039786,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
13637063,
<<"database2">>,
"/data/couchdb/data/database2.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
#Ref<0.0.30.133811>,1000,
[before_header,after_header,**on_file_open],
false}}]}}
[Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>]
{error_report,<0.31.0>,
{<0.178.0>,crash_report,
[[{initial_call,{couch_db_**updater,init,['Argument__1']}}**,
{pid,<0.178.0>},
{registered_name,[]},
{error_info,
{exit,
{timeout,
{gen_server,call,
[<0.177.0>,
{db_updated,
{db,<0.177.0>,<0.178.0>,nil,
<<"1323371411352029">>,<0.175.**
0>,<0.179.0>,
{db_header,5,13636863,0,
{6776455960,{13636861,0}},
{6776479023,13636861},
{1039786,[]},
0,nil,nil,1000},
13636863,
{btree,<0.175.0>,
{6776557909,{13637061,0}},
#Fun<couch_db_updater.10.**19222179>,
#Fun<couch_db_updater.11.**21515767>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.12.**93888648>},
{btree,<0.175.0>,
{6776580448,13637061},
#Fun<couch_db_updater.13.**40165027>,
#Fun<couch_db_updater.14.**82810239>,
#Fun<couch_btree.5.112258129>,
#Fun<couch_db_updater.15.104121193>},
{btree,<0.175.0>,
{1039786,[]},
#Fun<couch_btree.0.23070627>,
#Fun<couch_btree.1.117278773>,
#Fun<couch_btree.2.112258129>,**nil},
13637063,
<<"database2">>,
"/data/couchdb/data/database2.**couch",
[],[],nil,
{user_ctx,null,[],undefined},
#Ref<0.0.30.133811>,1000,
[before_header,after_header,**
on_file_open],
false}}]}},
[{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}**},
{ancestors,[<0.177.0>,<0.174.**0>]},
{messages,
[{'EXIT',<0.177.0>,shutdown},**
delayed_commit]},
{links,[]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,121393},
{stack_size,24},
{reductions,83311172}],
[]]}}