Hi,

Actually, we are not sure whether we are hitting the timeout, since there have been no log entries at all since we increased os_process_timeout. It may be that some of the view code is not well guarded (you can see the code in the server state part of the log). In any case, I think an error should be logged if the JS code causes an indexer failure. Compacting views and databases seems to work fine with 1.2.0. We have no access to curl in the production environment, so debugging is a bit difficult.
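
As an illustration of what "well guarded" could mean here, the sketch below reworks the transactionByRayId map function from the log so that it skips documents whose cId does not coerce to a finite number. The local emit() stub and the sample documents are assumptions added only to make the snippet runnable outside CouchDB; whether unguarded input is actually what stops the indexer is not known.

```javascript
// Stand-in for CouchDB's emit(); inside the view server emit() is provided.
var rows = [];
function emit(key, value) {
  rows.push([key, value]);
}

// Guarded variant of the transactionByRayId map function (illustrative only).
function transactionByRayId(doc) {
  if (doc.objectType !== "ProtocolTransaction" || !doc.cId) {
    return;
  }
  // Number() applies the same numeric coercion as -(-doc.cId).
  var cId = Number(doc.cId);
  if (!isFinite(cId)) {
    return; // skip ids that would otherwise be emitted as NaN
  }
  emit([cId, doc.startTimestamp], null);
}

// Hypothetical documents: one with a numeric-looking id, one without.
transactionByRayId({ objectType: "ProtocolTransaction", cId: "42", startTimestamp: 1338486137000 });
transactionByRayId({ objectType: "ProtocolTransaction", cId: "not-a-number", startTimestamp: 1338486137000 });
console.log(JSON.stringify(rows));
```

The other three views in the design document only test that a field is present, so the same pattern would apply to them only if the field types vary between documents.
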
In the testing environment we haven't seen this issue, but the data is not identical and the number of documents is much smaller (about 1/100). View indexing is periodically re-triggered by our Java application, so the views will eventually finish after multiple resumes.

Regards,
Sami

On Jun 5, 2012, at 2:20 PM, Robert Newson wrote:

> You can increase that timeout with;
>
> curl -XPUT localhost:5984/_config/couchdb/os_process_timeout -d '"60000"'
>
> B.
>
> On 5 June 2012 12:18, Robert Newson <[email protected]> wrote:
>> Sounds like https://issues.apache.org/jira/browse/COUCHDB-994
>>
>> B.
>>
>> On 5 June 2012 11:07, Sami Sierla <[email protected]> wrote:
>>> Dave,
>>>
>>> Thank You for quick reply. The issues appear in a production environment to
>>> which I don't have access to modify configuration or design documents. Log
>>> level at the moment is "error"
>>>
>>> Below is a lengthy log dump we got when the os_process_timeout was 5000,
>>> after increasing timeout to 30000 there has been no log entries at all when
>>> indexing stops.
>>> >>> ----- >>> >>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15656.0>] OS Process Error >>> <0.15657.0> :: {os_process_error, "OS process timed out."} >>> [Thu, 31 May 2012 17:42:17 GMT] [error] [emulator] Error in process >>> <0.15656.0> with exit value: {{nocatch,{os_process_error,"OS process timed >>> out."}},[{couch_os_process,prompt,2},{couch_query_servers,map_doc_raw,2},{couch_view_updater,'-do_maps/3-fun-0-',3},{couch_view_updater,do_maps,3}]} >>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] ** Generic server >>> <0.15648.0> terminating >>> ** Last message in was {'EXIT',<0.15653.0>, >>> {{nocatch, >>> {os_process_error,"OS process timed out."}}, >>> [{couch_os_process,prompt,2}, >>> {couch_query_servers,map_doc_raw,2}, >>> {couch_view_updater,'-do_maps/3-fun-0-',3}, >>> {couch_view_updater,do_maps,3}]}} >>> ** When Server state == {group_state,undefined,<<"mutka_replicated">>, >>> {"/data/mutka/couchdb-index",<<"mutka_replicated">>, >>> {group, >>> >>> <<223,185,95,248,235,18,77,64,18,164,253,96,95,237, >>> 204,20>>, >>> nil,<<"_design/transactionA-1.2.0">>, >>> <<"javascript">>,[], >>> [{view,0,0,0, >>> [<<"transactionByPaymentInstrument">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.paymentInstrumentId) { >>> emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>, >>> nil,[],[]}, >>> {view,1,0,0, >>> [<<"transactionByTerminal">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.paymentTerminalId) { >>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>, >>> nil,[],[]}, >>> {view,2,0,0, >>> [<<"transactionBySession">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.protocolSessionId) { >>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>, >>> nil,[],[]}, >>> {view,3,0,0, >>> [<<"transactionByRayId">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.cId) { >>> 
emit([-(-doc.cId),doc.startTimestamp], null); } }">>, >>> nil,[],[]}], >>> {[]}, >>> nil,0,0,nil,nil}}, >>> {group, >>> <<223,185,95,248,235,18,77,64,18,164,253,96,95,237, >>> 204,20>>, >>> <0.15650.0>,<<"_design/transactionA-1.2.0">>, >>> <<"javascript">>,[], >>> [{view,0,236439939,0, >>> [<<"transactionByPaymentInstrument">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.paymentInstrumentId) { >>> emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>, >>> {btree,<0.15650.0>, >>> {47573274456,{8694059,[]},257926106}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}, >>> {view,1,236439939,0, >>> [<<"transactionByTerminal">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.paymentTerminalId) { >>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>, >>> {btree,<0.15650.0>, >>> {47574093427,{33638477,[]},942288018}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}, >>> {view,2,236439939,0, >>> [<<"transactionBySession">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.protocolSessionId) { >>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>, >>> {btree,<0.15650.0>, >>> {47574114746,{9241366,[]},131141244}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}, >>> {view,1,236439939,0, >>> [<<"transactionByTerminal">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.paymentTerminalId) { >>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>, >>> {btree,<0.15650.0>, >>> {47574093427,{33638477,[]},942288018}, >>> #Fun<couch_btree.3.71804109>, >>> 
#Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}, >>> {view,2,236439939,0, >>> [<<"transactionBySession">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.protocolSessionId) { >>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>, >>> {btree,<0.15650.0>, >>> {47574114746,{9241366,[]},131141244}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}, >>> {view,3,236433956,0, >>> [<<"transactionByRayId">>], >>> <<"function(doc) { if (doc.objectType == >>> \"ProtocolTransaction\" && doc.cId) { >>> emit([-(-doc.cId),doc.startTimestamp], null); } }">>, >>> {btree,<0.15650.0>, >>> {47559121340,{2250018,[]},76590679}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_view.less_json_ids.2>, >>> #Fun<couch_view_group.10.26766604>,snappy}, >>> [],[]}], >>> {[]}, >>> {btree,<0.15650.0>, >>> {47572622835,[],1061098089}, >>> #Fun<couch_btree.3.71804109>, >>> #Fun<couch_btree.4.115144917>, >>> #Fun<couch_btree.5.93788370>,nil,snappy}, >>> 236439939,0,nil,nil}, >>> <0.15653.0>,nil,false, >>> [{{<0.15441.0>,#Ref<0.0.0.182446>},409571621}], >>> <0.15652.0>,false} >>> ** Reason for termination == >>> ** {os_process_error,"OS process timed out."} >>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] >>> {error_report,<0.31.0>, >>> {<0.15648.0>,crash_report, >>> [[{initial_call, >>> {couch_view_group,init,['Argument__1']}}, >>> {pid,<0.15648.0>}, >>> {registered_name,[]}, >>> {error_info, >>> {exit, >>> {os_process_error,"OS process timed out."}, >>> [{gen_server,terminate,6}, >>> {proc_lib,init_p_do_apply,3}]}}, >>> {ancestors,[<0.15647.0>]}, >>> {messages,[]}, >>> {links,[<0.15650.0>,<0.123.0>]}, >>> {dictionary,[]}, >>> {trap_exit,true}, >>> {status,running}, >>> {heap_size,2584}, >>> {stack_size,24}, >>> 
{reductions,18059924}],
>>> []]}}
>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15441.0>] Uncaught server
>>> error: {os_process_error, <<"OS process timed out.">>}
>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] ** Generic server
>>> <0.15650.0> terminating
>>> ** Last message in was {'EXIT',<0.15648.0>, {os_process_error,"OS process
>>> timed out."}}
>>> ** When Server state ==
>>> {file,{file_descriptor,prim_file,{#Port<0.2119>,19}}, 47574426987}
>>> ** Reason for termination ==
>>> ** {os_process_error,"OS process timed out."}
>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>]
>>> {error_report,<0.31.0>,
>>> {<0.15650.0>,crash_report,
>>> [[{initial_call,{couch_file,init,['Argument__1']}},
>>> {pid,<0.15650.0>},
>>> {registered_name,[]},
>>> {error_info,
>>> {exit,
>>> {os_process_error,"OS process timed out."},
>>> [{gen_server,terminate,6},
>>> {proc_lib,init_p_do_apply,3}]}},
>>> {ancestors,[<0.15648.0>,<0.15647.0>]},
>>> {messages,[{'EXIT',<0.15652.0>,shutdown}]},
>>> {links,[]},
>>> {dictionary,[]},
>>> {trap_exit,true},
>>> {status,running},
>>> {heap_size,2584},
>>> {stack_size,24},
>>> {reductions,27732395236}],
>>> []]}}
>>>
>>>
>>> -Sami
>>> On Jun 5, 2012, at 12:23 PM, Dave Cottlehuber wrote:
>>>
>>>> On 5 June 2012 11:13, Sami Sierla <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> We have a rather large database (about 90 million documents /200GB)
>>>>> running on CouchDB (1.0.3) and we're now updating it to version 1.2.0 due
>>>>> to view compaction problems (large view group compactions never finished).
>>>>>
>>>>> At the moment we are rebuilding (JavaScript) views with 1.2.0 but during
>>>>> this we have stumbled upon to new problem : indexer processes suddenly
>>>>> just disappear. Initially we got "OS Process Timeout" -errors to log but
>>>>> after adjusting os_process_timeout to 30secs indexing still prematurely
>>>>> stops but without any log entry.
>>>>>
>>>>> Any ideas what might cause this behavior?
>>>>>
>>>>> CouchDB is running on RHEL 5.8 and is statically linked with SpiderMonkey
>>>>> 1.8.5
>>>>>
>>>>>
>>>>> Regards,
>>>>> Sami Sierla / Poplatek Oy / Finland
>>>>
>>>> Sami,
>>>>
>>>> Have you anything useful in the couch.log file? Are you able to run
>>>> the view generation in debug mode (might not be possible due to disk
>>>> space constraints & performance impact).
>>>>
>>>> Also, if you query the view with ?limit=1&descending=true you'll get
>>>> the last doc that couch successfully processed (I think). Is there
>>>> anything special about that or the subsequent documents? If you
>>>> process the view & those docs manually into node or js.exe directly
>>>> [1] does that work?
>>>>
>>>> There's quite a few changes in 1.0.3 -> 1.2.0 including better
>>>> detection of ill-formed docs amongst others, more info will help
>>>> narrow this down.
>>>>
>>>> A+
>>>> Dave
>>>>
>>>> [1]:
>>>> http://wiki.apache.org/couchdb/Troubleshooting#Map.2BAC8-Reduce_debugging
>>>
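
Dave's suggestion above of feeding the views and the suspect documents to node directly could be sketched roughly as follows. The two map function sources are copied from the design document in the log; the runViews helper, the local emit() stand-in, and the sample document are hypothetical and exist only so the snippet runs on its own. It is not how the CouchDB view server itself is implemented.

```javascript
// Map function sources copied from the design document shown in the log.
var views = {
  transactionByPaymentInstrument:
    'function(doc) { if (doc.objectType == "ProtocolTransaction" && doc.paymentInstrumentId) { emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }',
  transactionByRayId:
    'function(doc) { if (doc.objectType == "ProtocolTransaction" && doc.cId) { emit([-(-doc.cId),doc.startTimestamp], null); } }'
};

// Hypothetical helper: runs every map function against one document and
// collects both the emitted rows and any exception a function throws.
function runViews(doc) {
  var result = { rows: [], errors: [] };
  Object.keys(views).forEach(function (name) {
    // Local stand-in for the emit() that CouchDB normally provides.
    function emit(key, value) {
      result.rows.push({ view: name, key: key, value: value });
    }
    try {
      // Compile the source with emit in scope, then call it on the document.
      var map = new Function("emit", "return (" + views[name] + ");")(emit);
      map(doc);
    } catch (e) {
      result.errors.push({ view: name, message: String(e) });
    }
  });
  return result;
}

// Hypothetical sample; replace with the document(s) near where indexing stops.
var sample = {
  objectType: "ProtocolTransaction",
  paymentInstrumentId: "pi-1",
  cId: "17",
  startTimestamp: 1338486137000
};
console.log(JSON.stringify(runViews(sample)));
```

Running the same helper over the documents that follow the last indexed one would show whether any of them makes a map function throw, or emit an unexpected key.
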
