Here's snippet from active task log at the time of crash: (logged every ten
minutes, timestamps are GMT+3)
20:20:02
[{"pid":"<0.15659.0>","changes_done":44507095,"database":"mutka_replicated","design_document":"_design/transactionA-1.2.0","progress":50,"started_on":1338419589,"total_changes":88351302,"type":"indexer","updated_on":1338484791}]
20:30:01
[{"pid":"<0.15659.0>","changes_done":44521837,"database":"mutka_replicated","design_document":"_design/transactionA-1.2.0","progress":50,"started_on":1338419589,"total_changes":88351302,"type":"indexer","updated_on":1338485387}]
20:40:01
[{"pid":"<0.15659.0>","changes_done":44530262,"database":"mutka_replicated","design_document":"_design/transactionA-1.2.0","progress":50,"started_on":1338419589,"total_changes":88351302,"type":"indexer","updated_on":1338485978}]
20:50:01 []
Another thing is that indexing seems to progress very slowly, about 10000 docs
/ 10 mins, can't quite confirm if this has changed after 1.2.0 update. There is
pretty much zero other disk activity on the server.
-Sami
On Jun 5, 2012, at 2:41 PM, Robert Newson wrote:
> Hi Sami,
>
> Can you observe the output of /_active_tasks during the view build and
> capture that for us? Particularly close to the end of indexing.
>
> You are aware that a view build, by design, only catches up to the
> database update sequence at the time you queried and then stops? I
> just want to be sure you're not expecting subsequent changes to also
> be indexed (until you query again) and that we have the same idea
> about when the view build should be stopping.
>
> B.
>
> On 5 June 2012 12:34, Sami Sierla <[email protected]> wrote:
>> Hi,
>>
>> Actually we are not sure if we are hitting the timeout or not since there
>> are no log entries anymore after increasing os_process_timeout. It might be
>> that some view code is not well guarded though (you can actually see the
>> code from server state part of log) - in any case I think there should be
>> some error log if the JS code causes indexer failure. Compacting views/dbs
>> seem to be fine with 1.2.0. We have no access to curl on production
>> environment so debugging is a bit difficult.
>>
>> In the testing environment we haven't seen this issue but the data is not
>> identical and the number of documents is much smaller (about 1/100).
>>
>> View indexing is periodically re-triggered by our java application so the
>> views will get eventually finished after multiple resumes.
>>
>> Regards,
>> Sami
>>
>> On Jun 5, 2012, at 2:20 PM, Robert Newson wrote:
>>
>>> You can increase that timeout with;
>>>
>>> curl -XPUT localhost:5984/_config/couchdb/os_process_timeout -d '"60000"'
>>>
>>> B.
>>>
>>> On 5 June 2012 12:18, Robert Newson <[email protected]> wrote:
>>>> Sounds like https://issues.apache.org/jira/browse/COUCHDB-994
>>>>
>>>> B.
>>>>
>>>> On 5 June 2012 11:07, Sami Sierla <[email protected]> wrote:
>>>>> Dave,
>>>>>
>>>>> Thank You for quick reply. The issues appear in a production environment
>>>>> to which I don't have access to modify configuration or design documents.
>>>>> Log level at the moment is "error"
>>>>>
>>>>> Below is a lengthy log dump we got when the os_process_timeout was 5000,
>>>>> after increasing timeout to 30000 there has been no log entries at all
>>>>> when indexing stops.
>>>>>
>>>>> -----
>>>>>
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15656.0>] OS Process Error
>>>>> <0.15657.0> :: {os_process_error, "OS process timed out."}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [emulator] Error in process
>>>>> <0.15656.0> with exit value: {{nocatch,{os_process_error,"OS process
>>>>> timed
>>>>> out."}},[{couch_os_process,prompt,2},{couch_query_servers,map_doc_raw,2},{couch_view_updater,'-do_maps/3-fun-0-',3},{couch_view_updater,do_maps,3}]}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>] ** Generic server
>>>>> <0.15648.0> terminating
>>>>> ** Last message in was {'EXIT',<0.15653.0>,
>>>>> {{nocatch,
>>>>> {os_process_error,"OS process timed
>>>>> out."}},
>>>>> [{couch_os_process,prompt,2},
>>>>> {couch_query_servers,map_doc_raw,2},
>>>>> {couch_view_updater,'-do_maps/3-fun-0-',3},
>>>>> {couch_view_updater,do_maps,3}]}}
>>>>> ** When Server state == {group_state,undefined,<<"mutka_replicated">>,
>>>>>
>>>>> {"/data/mutka/couchdb-index",<<"mutka_replicated">>,
>>>>> {group,
>>>>>
>>>>> <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,
>>>>> 204,20>>,
>>>>> nil,<<"_design/transactionA-1.2.0">>,
>>>>> <<"javascript">>,[],
>>>>> [{view,0,0,0,
>>>>> [<<"transactionByPaymentInstrument">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.paymentInstrumentId) {
>>>>> emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>,
>>>>> nil,[],[]},
>>>>> {view,1,0,0,
>>>>> [<<"transactionByTerminal">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.paymentTerminalId) {
>>>>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>,
>>>>> nil,[],[]},
>>>>> {view,2,0,0,
>>>>> [<<"transactionBySession">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.protocolSessionId) {
>>>>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>,
>>>>> nil,[],[]},
>>>>> {view,3,0,0,
>>>>> [<<"transactionByRayId">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.cId) {
>>>>> emit([-(-doc.cId),doc.startTimestamp], null); } }">>,
>>>>> nil,[],[]}],
>>>>> {[]},
>>>>> nil,0,0,nil,nil}},
>>>>> {group,
>>>>>
>>>>> <<223,185,95,248,235,18,77,64,18,164,253,96,95,237,
>>>>> 204,20>>,
>>>>> <0.15650.0>,<<"_design/transactionA-1.2.0">>,
>>>>> <<"javascript">>,[],
>>>>> [{view,0,236439939,0,
>>>>> [<<"transactionByPaymentInstrument">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.paymentInstrumentId) {
>>>>> emit([doc.paymentInstrumentId,doc.startTimestamp], null); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47573274456,{8694059,[]},257926106},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]},
>>>>> {view,1,236439939,0,
>>>>> [<<"transactionByTerminal">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.paymentTerminalId) {
>>>>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47574093427,{33638477,[]},942288018},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]},
>>>>> {view,2,236439939,0,
>>>>> [<<"transactionBySession">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.protocolSessionId) {
>>>>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47574114746,{9241366,[]},131141244},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]},
>>>>> {view,1,236439939,0,
>>>>> [<<"transactionByTerminal">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.paymentTerminalId) {
>>>>> emit([doc.paymentTerminalId,doc.startTimestamp], null); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47574093427,{33638477,[]},942288018},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]},
>>>>> {view,2,236439939,0,
>>>>> [<<"transactionBySession">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.protocolSessionId) {
>>>>> emit(doc.protocolSessionId,doc.protocolTransactionId); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47574114746,{9241366,[]},131141244},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]},
>>>>> {view,3,236433956,0,
>>>>> [<<"transactionByRayId">>],
>>>>> <<"function(doc) { if (doc.objectType ==
>>>>> \"ProtocolTransaction\" && doc.cId) {
>>>>> emit([-(-doc.cId),doc.startTimestamp], null); } }">>,
>>>>> {btree,<0.15650.0>,
>>>>> {47559121340,{2250018,[]},76590679},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_view.less_json_ids.2>,
>>>>> #Fun<couch_view_group.10.26766604>,snappy},
>>>>> [],[]}],
>>>>> {[]},
>>>>> {btree,<0.15650.0>,
>>>>> {47572622835,[],1061098089},
>>>>> #Fun<couch_btree.3.71804109>,
>>>>> #Fun<couch_btree.4.115144917>,
>>>>> #Fun<couch_btree.5.93788370>,nil,snappy},
>>>>> 236439939,0,nil,nil},
>>>>> <0.15653.0>,nil,false,
>>>>> [{{<0.15441.0>,#Ref<0.0.0.182446>},409571621}],
>>>>> <0.15652.0>,false}
>>>>> ** Reason for termination ==
>>>>> ** {os_process_error,"OS process timed out."}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15648.0>]
>>>>> {error_report,<0.31.0>,
>>>>> {<0.15648.0>,crash_report,
>>>>> [[{initial_call,
>>>>> {couch_view_group,init,['Argument__1']}},
>>>>> {pid,<0.15648.0>},
>>>>> {registered_name,[]},
>>>>> {error_info,
>>>>> {exit,
>>>>> {os_process_error,"OS process timed out."},
>>>>> [{gen_server,terminate,6},
>>>>> {proc_lib,init_p_do_apply,3}]}},
>>>>> {ancestors,[<0.15647.0>]},
>>>>> {messages,[]},
>>>>> {links,[<0.15650.0>,<0.123.0>]},
>>>>> {dictionary,[]},
>>>>> {trap_exit,true},
>>>>> {status,running},
>>>>> {heap_size,2584},
>>>>> {stack_size,24},
>>>>> {reductions,18059924}],
>>>>> []]}}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15441.0>] Uncaught server
>>>>> error: {os_process_error, <<"OS process timed out.">>}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>] ** Generic server
>>>>> <0.15650.0> terminating
>>>>> ** Last message in was {'EXIT',<0.15648.0>, {os_process_error,"OS process
>>>>> timed out."}}
>>>>> ** When Server state ==
>>>>> {file,{file_descriptor,prim_file,{#Port<0.2119>,19}}, 47574426987}
>>>>> ** Reason for termination ==
>>>>> ** {os_process_error,"OS process timed out."}
>>>>> [Thu, 31 May 2012 17:42:17 GMT] [error] [<0.15650.0>]
>>>>> {error_report,<0.31.0>,
>>>>> {<0.15650.0>,crash_report,
>>>>> [[{initial_call,{couch_file,init,['Argument__1']}},
>>>>> {pid,<0.15650.0>},
>>>>> {registered_name,[]},
>>>>> {error_info,
>>>>> {exit,
>>>>> {os_process_error,"OS process timed out."},
>>>>> [{gen_server,terminate,6},
>>>>> {proc_lib,init_p_do_apply,3}]}},
>>>>> {ancestors,[<0.15648.0>,<0.15647.0>]},
>>>>> {messages,[{'EXIT',<0.15652.0>,shutdown}]},
>>>>> {links,[]},
>>>>> {dictionary,[]},
>>>>> {trap_exit,true},
>>>>> {status,running},
>>>>> {heap_size,2584},
>>>>> {stack_size,24},
>>>>> {reductions,27732395236}],
>>>>> []]}}
>>>>>
>>>>>
>>>>> -Sami
>>>>> On Jun 5, 2012, at 12:23 PM, Dave Cottlehuber wrote:
>>>>>
>>>>>> On 5 June 2012 11:13, Sami Sierla <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a rather large database (about 90 million documents /200GB)
>>>>>>> running on CouchDB (1.0.3) and we're now updating it to version 1.2.0
>>>>>>> due to view compaction problems (large view group compactions never
>>>>>>> finished).
>>>>>>>
>>>>>>> At the moment we are rebuilding (JavaScript) views with 1.2.0 but
>>>>>>> during this we have stumbled upon to new problem : indexer processes
>>>>>>> suddenly just disappear. Initially we got "OS Process Timeout" -errors
>>>>>>> to log but after adjusting os_process_timeout to 30secs indexing still
>>>>>>> prematurely stops but without any log entry.
>>>>>>>
>>>>>>> Any ideas what might cause this behavior?
>>>>>>>
>>>>>>> CouchDB is running on RHEL 5.8 and is statically linked with
>>>>>>> SpiderMonkey 1.8.5
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sami Sierla / Poplatek Oy / Finland
>>>>>>
>>>>>> Sami,
>>>>>>
>>>>>> Have you anything useful in the couch.log file? Are you able to run
>>>>>> the view generation in debug mode (might not be possible due to disk
>>>>>> space constraints & performance impact).
>>>>>>
>>>>>> Also, if you query the view with ?limit=1&descending=true you'll get
>>>>>> the last doc that couch successfully processed (I think). Is there
>>>>>> anything special about that or the subsequent documents? If you
>>>>>> process the view & those docs manually into node or js.exe directly
>>>>>> [1] does that work?
>>>>>>
>>>>>> There's quite a few changes in 1.0.3 -> 1.2.0 including better
>>>>>> detection of ill-formed docs amongst others, more info will help
>>>>>> narrow this down.
>>>>>>
>>>>>> A+
>>>>>> Dave
>>>>>>
>>>>>> [1]:
>>>>>> http://wiki.apache.org/couchdb/Troubleshooting#Map.2BAC8-Reduce_debugging
>>>>>
>>