I just hit a mysterious crash that I can't figure out. Briefly, I am using a CouchDB database to hold the processing state for some R jobs I am running on multiple machines, so that different processes are sure to work on different data.
The database is pretty simple, with one view (to show files that are in process and not yet complete). As of this morning, I was running CouchDB 1.0.2 on a 64-bit Gentoo machine, using the standard ebuild. In trying to fix this crash, I have since upgraded to the latest 1.1.0, again using the standard Gentoo ebuild.
The only thing I did prior to the crash was change the configuration setting for "timeout". I was trying to debug occasional non-responses from the CouchDB server, and increased the "timeout" parameter in the config file from an hour to a day. The next thing I knew, my R jobs all started dying, unable to access the tracking database. I don't think the config setting is related, but perhaps it is? Perhaps CouchDB restarted while writing or something?
Hoping that this was an issue with 1.0.2, I upgraded. After upgrading to 1.1.0, I still get the error, although the dump is different (a shorter dump of 'badmatch' values). The error is in my database /vdsdata%2fd12%2f2007. The log dump of the error is copied at the end of this email. In it, I first hit /vdsdata%2fd12%2f2008 without any errors, then crash on 2007.
I can probably recreate the database by inspecting my R job logs, but I usually expect to "relax" and not worry about my data when using CouchDB!
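For anyone who wants to repeat the two requests, here is a minimal sketch in Python. It assumes the server address shown in the log (127.0.0.1:5984); the helper names `db_url` and `probe` are made up for illustration. The one detail worth showing is that the "/" characters in the database name have to be percent-encoded in the URL.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

BASE = "http://127.0.0.1:5984"  # address taken from the log; adjust as needed


def db_url(name, base=BASE):
    """Build the URL for a database, percent-encoding '/' in its name."""
    # safe="" forces "/" to be encoded; quote() emits uppercase %2F,
    # which is equivalent to the lowercase %2f that curl sent.
    return f"{base}/{quote(name, safe='')}"


def probe(name):
    """GET the database info document; return (http_status, parsed_json_body)."""
    try:
        with urllib.request.urlopen(db_url(name), timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # CouchDB returns a JSON error body on failure (a 500 in my case).
        return err.code, json.loads(err.read().decode("utf-8"))


# Usage (needs a running server):
#   probe("vdsdata/d12/2008")   # worked for me
#   probe("vdsdata/d12/2007")   # this is the request that triggers the crash
```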
The crash log follows:

[Mon, 13 Jun 2011 22:29:35 GMT] [info] [<0.32.0>] Apache CouchDB has started on http://127.0.0.1:5984/
[Mon, 13 Jun 2011 22:29:55 GMT] [debug] [<0.279.0>] 'GET' / {1,1} from "127.0.0.1" Headers: [{'Accept',"*/*"}, {'Host',"127.0.0.1:5984"}, {'User-Agent',"curl/7.20.0 (x86_64-pc-linux-gnu) libcurl/7.20.0 OpenSSL/1.0.0d zlib/1.2.5"}]
[Mon, 13 Jun 2011 22:29:55 GMT] [debug] [<0.279.0>] OAuth Params: []
[Mon, 13 Jun 2011 22:29:55 GMT] [info] [<0.279.0>] 127.0.0.1 - - 'GET' / 200
[Mon, 13 Jun 2011 22:30:02 GMT] [debug] [<0.350.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1" Headers: [{'Accept',"*/*"}, {'Host',"127.0.0.1:5984"}, {'User-Agent',"curl/7.20.0 (x86_64-pc-linux-gnu) libcurl/7.20.0 OpenSSL/1.0.0d zlib/1.2.5"}]
[Mon, 13 Jun 2011 22:30:02 GMT] [debug] [<0.350.0>] OAuth Params: []
[Mon, 13 Jun 2011 22:30:02 GMT] [info] [<0.350.0>] 127.0.0.1 - - 'GET' /_all_dbs 200
[Mon, 13 Jun 2011 22:30:35 GMT] [debug] [<0.644.0>] 'GET' /vdsdata%2fd12%2f2008 {1,1} from "127.0.0.1" Headers: [{'Accept',"*/*"}, {'Host',"127.0.0.1:5984"}, {'User-Agent',"curl/7.20.0 (x86_64-pc-linux-gnu) libcurl/7.20.0 OpenSSL/1.0.0d zlib/1.2.5"}]
[Mon, 13 Jun 2011 22:30:35 GMT] [debug] [<0.644.0>] OAuth Params: []
[Mon, 13 Jun 2011 22:30:35 GMT] [info] [<0.644.0>] 127.0.0.1 - - 'GET' /vdsdata%2fd12%2f2008 200
[Mon, 13 Jun 2011 22:30:38 GMT] [debug] [<0.683.0>] 'GET' /vdsdata%2fd12%2f2007 {1,1} from "127.0.0.1" Headers: [{'Accept',"*/*"}, {'Host',"127.0.0.1:5984"}, {'User-Agent',"curl/7.20.0 (x86_64-pc-linux-gnu) libcurl/7.20.0 OpenSSL/1.0.0d zlib/1.2.5"}]
[Mon, 13 Jun 2011 22:30:38 GMT] [debug] [<0.683.0>] OAuth Params: []
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.702.0>] ** Generic server <0.702.0> terminating ** Last message in was {pread_iolist,968204} ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2114>,17}}, 0,974937} ** Reason for termination == ** {{badmatch,{ok,<<0,0,84,10,47,184,104,176,71,1,83,162,179,58,204,120,204, 243,234,131,104,11,100,0,9,100,98,95,104,101,97,100,101, 114,97,5,97,237,97,0,104,2,98,0,14,198,12,104,2,97,84,97, 1,104,2,98,0,14,203,144,97,85,100,0,3,110,105,108,97,0, 100,0,3,110,105,108,100,0,3,110,105,108,98,0,0,3,232>>}}, [{couch_file,read_raw_iolist_int,3}, {couch_file,maybe_read_more_iolist,4}, {couch_file,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.702.0>] {error_report,<0.31.0>, {<0.702.0>,crash_report, [[{initial_call,{couch_file,init,['Argument__1']}}, {pid,<0.702.0>}, {registered_name,[]}, {error_info, {exit, {{badmatch, {ok, <<0,0,84,10,47,184,104,176,71,1,83,162,179,58, 204,120,204,243,234,131,104,11,100,0,9,100,98, 95,104,101,97,100,101,114,97,5,97,237,97,0, 104,2,98,0,14,198,12,104,2,97,84,97,1,104,2, 98,0,14,203,144,97,85,100,0,3,110,105,108,97, 0,100,0,3,110,105,108,100,0,3,110,105,108,98, 0,0,3,232>>}}, [{couch_file,read_raw_iolist_int,3}, {couch_file,maybe_read_more_iolist,4}, {couch_file,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[<0.701.0>]}, {messages,[]}, {links,[<0.701.0>,<0.706.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,233}, {stack_size,24}, {reductions,1572}], [{neighbour, [{pid,<0.704.0>}, {registered_name,[]}, {initial_call,{couch_db,init,['Argument__1']}}, {current_function,{proc_lib,sync_wait,2}}, {ancestors,[<0.701.0>]}, {messages,[]}, {links,[<0.701.0>,<0.705.0>]}, {dictionary,[]}, {trap_exit,false}, {status,waiting}, {heap_size,233}, {stack_size,16}, {reductions,43}]}, {neighbour, [{pid,<0.706.0>}, {registered_name,[]}, {initial_call, {couch_ref_counter,init,['Argument__1']}}, {current_function,{gen_server,loop,6}}, {ancestors,[<0.705.0>,<0.704.0>,<0.701.0>]}, {messages,[]}, {links,[<0.702.0>]}, {dictionary,[]}, {trap_exit,false}, {status,waiting}, {heap_size,233}, {stack_size,9}, {reductions,47}]}, {neighbour, [{pid,<0.701.0>}, {registered_name,[]}, {initial_call,{erlang,apply,2}}, {current_function,{proc_lib,sync_wait,2}}, {ancestors,[]}, {messages,[]}, {links,[<0.702.0>,<0.704.0>,<0.84.0>]}, {dictionary,[]}, {trap_exit,false}, {status,waiting}, {heap_size,233}, {stack_size,9}, {reductions,50}]}]]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.705.0>] {error_report,<0.31.0>, {<0.705.0>,crash_report, [[{initial_call,{couch_db_updater,init,['Argument__1']}}, {pid,<0.705.0>}, {registered_name,[]}, {error_info, {exit, {{{badmatch, {ok, <<0,0,84,10,47,184,104,176,71,1,83,162,179,58, 204,120,204,243,234,131,104,11,100,0,9,100, 98,95,104,101,97,100,101,114,97,5,97,237,97, 0,104,2,98,0,14,198,12,104,2,97,84,97,1,104, 2,98,0,14,203,144,97,85,100,0,3,110,105,108, 97,0,100,0,3,110,105,108,100,0,3,110,105,108, 98,0,0,3,232>>}}, [{couch_file,read_raw_iolist_int,3}, {couch_file,maybe_read_more_iolist,4}, {couch_file,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}, {gen_server,call, [<0.702.0>,{pread_iolist,968204},infinity]}}, [{gen_server,init_it,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[<0.704.0>,<0.701.0>]}, {messages,[]}, {links,[<0.704.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,610}, {stack_size,24}, {reductions,563}], []]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.84.0>] Unexpected message, restarting couch_server: {'EXIT', <0.701.0>, {{badmatch, {ok, <<0,0,84, 10,47, 184,104, 176,71, 1,83, 162,179, 58,204, 120,204, 243,234, 131,104, 11,100, 0,9,100, 98,95, 104,101, 97,100, 101,114, 97,5,97, 237,97, 0,104,2, 98,0,14, 198,12, 104,2, 97,84, 97,1, 104,2, 98,0,14, 203,144, 97,85, 100,0,3, 110,105, 108,97, 0,100,0, 3,110, 105,108, 100,0,3, 110,105, 108,98, 0,0,3, 232>>}}, [{couch_file, read_raw_iolist_int, 3}, {couch_file, maybe_read_more_iolist, 4}, {couch_file, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.84.0>] ** Generic server couch_server terminating ** Last message in was {'EXIT',<0.701.0>, {{badmatch, {ok,<<0,0,84,10,47,184,104,176,71,1,83,162, 179,58,204,120,204,243,234,131,104,11, 100,0,9,100,98,95,104,101,97,100,101, 114,97,5,97,237,97,0,104,2,98,0,14,198, 12,104,2,97,84,97,1,104,2,98,0,14,203, 144,97,85,100,0,3,110,105,108,97,0,100, 0,3,110,105,108,100,0,3,110,105,108,98, 0,0,3,232>>}}, [{couch_file,read_raw_iolist_int,3}, {couch_file,maybe_read_more_iolist,4}, {couch_file,handle_call,3}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]}} ** When Server state == {server,"/var/lib/couchdb", {re_pattern,0,0, <<69,82,67,80,124,0,0,0,16,0,0,0,1,0,0,0,0,0, 0,0,0,0,0,0,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,93,0,72,25,77,0,0,0,0,0,0, 0,0,0,0,0,0,254,255,255,7,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,77,0,0,0,0,16,171,255,3,0,0,0, 128,254,255,255,7,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,69,26,84,0,72,0>>}, 100,2,"Mon, 13 Jun 2011 22:29:35 GMT"} ** Reason for termination == ** kill
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.84.0>] {error_report,<0.31.0>, {<0.84.0>,crash_report, [[{initial_call,{couch_server,init,['Argument__1']}}, {pid,<0.84.0>}, {registered_name,couch_server}, {error_info, {exit,kill, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors, [couch_primary_services,couch_server_sup, <0.32.0>]}, {messages,[]}, {links,[<0.91.0>,<0.662.0>,<0.101.0>,<0.79.0>]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,610}, {stack_size,24}, {reductions,4012}], []]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.79.0>] {error_report,<0.31.0>, {<0.79.0>,supervisor_report, [{supervisor,{local,couch_primary_services}}, {errorContext,child_terminated}, {reason,kill}, {offender, [{pid,<0.84.0>}, {name,couch_server}, {mfargs,{couch_server,sup_start_link,[]}}, {restart_type,permanent}, {shutdown,1000}, {child_type,worker}]}]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.683.0>] Uncaught error in HTTP request: {exit, {kill, {gen_server,call, [couch_server, {open, <<"vdsdata/d12/2007">>, [{user_ctx, {user_ctx,null,[], undefined}}]}, infinity]}}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.101.0>] ** Generic server <0.101.0> terminating ** Last message in was {'EXIT',<0.84.0>,kill} ** When Server state == {db,<0.101.0>,<0.102.0>,nil,<<"1308004175221958">>, <0.100.0>,<0.103.0>, {db_header,5,1,0, {45554,{1,0}}, {45654,1}, nil,0,nil,nil,1000}, 1, {btree,<0.100.0>, {45554,{1,0}}, #Fun<couch_db_updater.10.19222179>, #Fun<couch_db_updater.11.21515767>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.12.93888648>}, {btree,<0.100.0>, {45654,1}, #Fun<couch_db_updater.13.40165027>, #Fun<couch_db_updater.14.82810239>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.15.104121193>}, {btree,<0.100.0>,nil, #Fun<couch_btree.0.83553141>, #Fun<couch_btree.1.30790806>, #Fun<couch_btree.2.124754102>,nil}, 1,<<"_replicator">>, "/var/lib/couchdb/_replicator.couch", [#Fun<couch_doc.7.1569589>], [],nil, {user_ctx,null,[],undefined}, nil,1000, [before_header,after_header,on_file_open], true} ** Reason for termination == ** kill
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.101.0>] {error_report,<0.31.0>, {<0.101.0>,crash_report, [[{initial_call,{couch_db,init,['Argument__1']}}, {pid,<0.101.0>}, {registered_name,[]}, {error_info, {exit,kill, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[<0.99.0>]}, {messages,[]}, {links,[]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,1597}, {stack_size,24}, {reductions,283}], []]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.662.0>] ** Generic server <0.662.0> terminating ** Last message in was {'EXIT',<0.84.0>,kill} ** When Server state == {db,<0.662.0>,<0.663.0>,nil,<<"1308004235273664">>, <0.660.0>,<0.664.0>, {db_header,5,0,0,nil,nil,nil,0,nil,nil,1000}, 0, {btree,<0.660.0>,nil, #Fun<couch_db_updater.10.19222179>, #Fun<couch_db_updater.11.21515767>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.12.93888648>}, {btree,<0.660.0>,nil, #Fun<couch_db_updater.13.40165027>, #Fun<couch_db_updater.14.82810239>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.15.104121193>}, {btree,<0.660.0>,nil,#Fun<couch_btree.0.83553141>, #Fun<couch_btree.1.30790806>, #Fun<couch_btree.2.124754102>,nil}, 0,<<"vdsdata/d12/2008">>, "/var/lib/couchdb/vdsdata/d12/2008.couch",[],[], nil, {user_ctx,null,[],undefined}, nil,1000, [before_header,after_header,on_file_open], false} ** Reason for termination == ** kill
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.662.0>] {error_report,<0.31.0>, {<0.662.0>,crash_report, [[{initial_call,{couch_db,init,['Argument__1']}}, {pid,<0.662.0>}, {registered_name,[]}, {error_info, {exit,kill, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[<0.659.0>]}, {messages,[]}, {links,[]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,610}, {stack_size,24}, {reductions,209}], []]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.91.0>] ** Generic server <0.91.0> terminating ** Last message in was {'EXIT',<0.84.0>,kill} ** When Server state == {db,<0.91.0>,<0.92.0>,nil,<<"1308004175208004">>, <0.90.0>,<0.94.0>, {db_header,5,1,0, {15894,{1,0}}, {15988,1}, nil,0,nil,nil,1000}, 1, {btree,<0.90.0>, {15894,{1,0}}, #Fun<couch_db_updater.10.19222179>, #Fun<couch_db_updater.11.21515767>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.12.93888648>}, {btree,<0.90.0>, {15988,1}, #Fun<couch_db_updater.13.40165027>, #Fun<couch_db_updater.14.82810239>, #Fun<couch_btree.5.124754102>, #Fun<couch_db_updater.15.104121193>}, {btree,<0.90.0>,nil,#Fun<couch_btree.0.83553141>, #Fun<couch_btree.1.30790806>, #Fun<couch_btree.2.124754102>,nil}, 1,<<"_users">>,"/var/lib/couchdb/_users.couch", [#Fun<couch_doc.7.1569589>], [],nil, {user_ctx,null,[],undefined}, nil,1000, [before_header,after_header,on_file_open], true} ** Reason for termination == ** kill
[Mon, 13 Jun 2011 22:30:38 GMT] [error] [<0.91.0>] {error_report,<0.31.0>, {<0.91.0>,crash_report, [[{initial_call,{couch_db,init,['Argument__1']}}, {pid,<0.91.0>}, {registered_name,[]}, {error_info, {exit,kill, [{gen_server,terminate,6}, {proc_lib,init_p_do_apply,3}]}}, {ancestors,[<0.89.0>]}, {messages,[]}, {links,[]}, {dictionary,[]}, {trap_exit,true}, {status,running}, {heap_size,987}, {stack_size,24}, {reductions,268}], []]}}
[Mon, 13 Jun 2011 22:30:38 GMT] [info] [<0.683.0>] Stacktrace: [{io_lib_pretty,cind_tag_tuple,7}, {io_lib_pretty,while_fail,3}, {io_lib_pretty,print,6}, {io_lib_format,build,3}, {io_lib_format,build,3}, {io_lib_format,build,3}, {io_lib_format,build,3}, {io_lib_format,build,3}]
[Mon, 13 Jun 2011 22:30:38 GMT] [info] [<0.683.0>] 127.0.0.1 - - 'GET' /vdsdata%2fd12%2f2007 500
[Mon, 13 Jun 2011 22:30:38 GMT] [debug] [<0.683.0>] httpd 500 error response: {"error":"kill","reason":"{gen_server,call,\n [couch_server,\n {open,<<\"vdsdata/d12/2007\">>,\n [{user_ctx,{user_ctx,null,[],undefined}}]},\n infinity]}"}

Thanks,
James
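A note on reading the dump: the 'badmatch' binary that repeats through the log looks like Erlang external term format (the 131 byte is that format's version tag). The sketch below is only a reading aid under that assumption. It handles just the four tags that occur in these bytes, and the name `decode_term` is my own, not anything from CouchDB.

```python
import struct

# Bytes copied from the first badmatch dump in the log above.
DUMP = bytes([
    0, 0, 84, 10, 47, 184, 104, 176, 71, 1, 83, 162, 179, 58, 204, 120, 204,
    243, 234, 131, 104, 11, 100, 0, 9, 100, 98, 95, 104, 101, 97, 100, 101,
    114, 97, 5, 97, 237, 97, 0, 104, 2, 98, 0, 14, 198, 12, 104, 2, 97, 84, 97,
    1, 104, 2, 98, 0, 14, 203, 144, 97, 85, 100, 0, 3, 110, 105, 108, 97, 0,
    100, 0, 3, 110, 105, 108, 100, 0, 3, 110, 105, 108, 98, 0, 0, 3, 232,
])


def decode_term(buf, pos):
    """Decode one external-format term at pos; return (value, next_pos)."""
    tag = buf[pos]
    if tag == 97:   # SMALL_INTEGER_EXT: one unsigned byte
        return buf[pos + 1], pos + 2
    if tag == 98:   # INTEGER_EXT: signed 32-bit big-endian
        return struct.unpack_from(">i", buf, pos + 1)[0], pos + 5
    if tag == 100:  # ATOM_EXT: 16-bit length, then the atom's characters
        size = struct.unpack_from(">H", buf, pos + 1)[0]
        return buf[pos + 3:pos + 3 + size].decode("latin-1"), pos + 3 + size
    if tag == 104:  # SMALL_TUPLE_EXT: one-byte arity, then the elements
        arity = buf[pos + 1]
        pos += 2
        items = []
        for _ in range(arity):
            value, pos = decode_term(buf, pos)
            items.append(value)
        return tuple(items), pos
    raise ValueError(f"tag {tag} is not handled by this sketch")


start = DUMP.index(131)                    # version byte of the term format
header, end = decode_term(DUMP, start + 1)
print(header)
```

Decoded this way, the bytes read as a db_header tuple of the same shape as the headers printed for _users and _replicator in the log, and its fifth field contains 968204, the same offset as in the failing {pread_iolist,968204} call. I don't know whether that points at the cause.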
