Hi Hristo,

The problem is before what you presented from your log. For example, could you copy-paste information about process <0.86.0>? From the look of the log, it seems something (I couldn't find what from the part of the log you presented, but it seems that process <0.86.0> could provide some hints about that something) is crashing the generic server (without stopping the application) repeatedly and that's the reason for which you cannot access your documents. That explains why you are able to "fix" the problem by restarting CouchDB and also your message "OS process has timed out" (which is the effect, and not the cause of that problem). What is crashing your server, no idea, but the log should be able to provide the necessary information.

It may be also a good idea to check the space on your harddisk or the maximum file size allowed by your OS. But firstly, I would check what information that process <0.86.0> could provide.

CGS




On 12/08/2011 10:41 PM, Hristo Deshev wrote:
Hi everyone,

I moved some data from an Amazon EC2 small instance to a large one and in
the process upgraded from CouchDB 1.1.0 to CouchDB 1.1.1. I also went with
Erlang R14B04 instead of R14B03 (Hurray for commando updates!) and now my
CouchDB instance seems to sometimes die when under heavy load. By "dying" I
mean that the beam process seems to stay in memory, but the HTTP server is
gone and no requests get served. For now I "fix" this by stopping and
restarting the process.

Here are some details on my setup. The server is running a 64-bit Ubuntu
Server (Oneiric) Amazon EC2 image on a large instance with 2 CPU cores and
7.5 GB RAM. I build both Erlang and CouchDB from source. I collect log
entries and bulk insert them in batches of up to 200 documents. I also run
couchdb-lucene on the same host and I *think* most of the crashes happen
when couchdb-lucene is running a tough query and is hogging the CPU or the
HDD. I have some largish db's (~50 million documents, ~25 GB in disk
space). I plan on splitting my dbs into smaller ones. I hope that gets me
more responsive file access and faster full text index searches. I think my
lucene indexes may be getting too large for that machine's memory and it
can't serve them too well. I frequently get "OS process has timed out"
errors when trying to query those indexes. Anyway, that shouldn't be
crashing the core couchdb process, right?

I am pasting my idea of what the relevant portion of the couchdb log file
is below, hoping somebody could decipher something out of it. Am I correct
in thinking that the "** Reason for termination == ** {timeout," part means
the process is crashing since writing to or reading from a file timed out?
Any help is greatly appreciated.

Best,
Hristo

===============

[Thu, 08 Dec 2011 20:17:16 GMT] [error] [<0.78.0>] {error_report,<0.31.0>,
                        {<0.78.0>,supervisor_report,
                         [{supervisor,{local,couch_server_sup}},
                          {errorContext,child_terminated},
                          {reason,shutdown},
                          {offender,
                              [{pid,<0.86.0>},
                               {name,couch_secondary_services},
                               {mfargs,

{couch_server_sup,start_secondary_services,
                                       []}},
                               {restart_type,permanent},
                               {shutdown,infinity},
                               {child_type,supervisor}]}]}}
[Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>] ** Generic server
<0.407.0>  terminating
** Last message in was delayed_commit
** When Server state == {db,<0.406.0>,<0.407.0>,nil,<<"1323371423957954">>,
                             <0.404.0>,<0.408.0>,
                             {db_header,5,204982,0,
                                 {199491055,{204980,0}},
                                 {199498140,204980},
                                 {111685732,[]},
                                 0,nil,nil,1000},
                             204982,
                             {btree,<0.404.0>,
                                 {199513565,{205011,0}},
                                 #Fun<couch_db_updater.10.19222179>,
                                 #Fun<couch_db_updater.11.21515767>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.12.93888648>},
                             {btree,<0.404.0>,
                                 {199518784,205011},
                                 #Fun<couch_db_updater.13.40165027>,
                                 #Fun<couch_db_updater.14.82810239>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.15.104121193>},
                             {btree,<0.404.0>,
                                 {111685732,[]},
                                 #Fun<couch_btree.0.23070627>,
                                 #Fun<couch_btree.1.117278773>,
                                 #Fun<couch_btree.2.112258129>,nil},
                             205013,
                             <<"database1">>,
                             "/data/couchdb/data/database1.couch",
                             [],[],nil,
                             {user_ctx,null,[],undefined},
                             #Ref<0.0.30.131014>,1000,
                             [before_header,after_header,on_file_open],
                             false}
** Reason for termination ==
** {timeout,
        {gen_server,call,
            [<0.406.0>,
             {db_updated,

{db,<0.406.0>,<0.407.0>,nil,<<"1323371423957954">>,<0.404.0>,
                     <0.408.0>,
                     {db_header,5,205013,0,
                         {199513565,{205011,0}},
                         {199518784,205011},
                         {111685732,[]},
                         0,nil,nil,1000},
                     205013,
                     {btree,<0.404.0>,
                         {199513565,{205011,0}},
                         #Fun<couch_db_updater.10.19222179>,
                         #Fun<couch_db_updater.11.21515767>,
                         #Fun<couch_btree.5.112258129>,
                         #Fun<couch_db_updater.12.93888648>},
                     {btree,<0.404.0>,
                         {199518784,205011},
                         #Fun<couch_db_updater.13.40165027>,
                         #Fun<couch_db_updater.14.82810239>,
                         #Fun<couch_btree.5.112258129>,
                         #Fun<couch_db_updater.15.104121193>},
                     {btree,<0.404.0>,
                         {111685732,[]},
                         #Fun<couch_btree.0.23070627>,
                         #Fun<couch_btree.1.117278773>,
                         #Fun<couch_btree.2.112258129>,nil},
                     205013,
                     <<"database1">>,
                     "/data/couchdb/data/database1.couch",
                     [],[],nil,
                     {user_ctx,null,[],undefined},
                     nil,1000,
                     [before_header,after_header,on_file_open],
                     false}}]}}

[Thu, 08 Dec 2011 20:17:21 GMT] [error] [<0.407.0>] {error_report,<0.31.0>,
                      {<0.407.0>,crash_report,

[[{initial_call,{couch_db_updater,init,['Argument__1']}},
                         {pid,<0.407.0>},
                         {registered_name,[]},
                         {error_info,
                          {exit,
                           {timeout,
                            {gen_server,call,
                             [<0.406.0>,
                              {db_updated,
                               {db,<0.406.0>,<0.407.0>,nil,
                                <<"1323371423957954">>,<0.404.0>,<0.408.0>,
                                {db_header,5,205013,0,
                                 {199513565,{205011,0}},
                                 {199518784,205011},
                                 {111685732,[]},
                                 0,nil,nil,1000},
                                205013,
                                {btree,<0.404.0>,
                                 {199513565,{205011,0}},
                                 #Fun<couch_db_updater.10.19222179>,
                                 #Fun<couch_db_updater.11.21515767>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.12.93888648>},
                                {btree,<0.404.0>,
                                 {199518784,205011},
                                 #Fun<couch_db_updater.13.40165027>,
                                 #Fun<couch_db_updater.14.82810239>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.15.104121193>},
                                {btree,<0.404.0>,
                                 {111685732,[]},
                                 #Fun<couch_btree.0.23070627>,
                                 #Fun<couch_btree.1.117278773>,
                                 #Fun<couch_btree.2.112258129>,nil},
                                205013,
                                <<"database1">>,
                                "/data/couchdb/data/database1.couch",
                                [],[],nil,
                                {user_ctx,null,[],undefined},
                                nil,1000,
                                [before_header,after_header,on_file_open],
                                false}}]}},
                           [{gen_server,terminate,6},
                            {proc_lib,init_p_do_apply,3}]}},
                         {ancestors,[<0.406.0>,<0.403.0>]},
                         {messages,[{'EXIT',<0.406.0>,shutdown}]},
                         {links,[]},
                         {dictionary,[]},
                         {trap_exit,true},
                         {status,running},
                         {heap_size,28657},
                         {stack_size,24},
                         {reductions,4487709}],
                        []]}}
[Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>] ** Generic server
<0.178.0>  terminating
** Last message in was {update_docs,<0.2027.0>,
                            [[{doc,<<"55e776b94547442ab17b82bd1a059843">>,
                                  {1,
                                   [<<102,77,172,235,192,72,84,223,58,68,105,
                                      199,153,147,196,81>>]},
                                  {[{<<"host">>,<<"Host1">>},
                                    {<<"time">>,1323375464000},
                                    {<<"text">>,
                                     <<"Some text">>},
                                    {<<"level">>,0},
                                    {<<"source">>,<<"source1">>},
                                    {<<"type">>,<<"Entry1">>}]},
                                  [],false,[]}],

...
[[A BUNCH OF DOCS HERE]]
...


                                  {[{<<"host">>,<<"Host1">>},
                                    {<<"time">>,1323375467000},
                                    {<<"text">>,
                                     <<"Some text">>},
                                    {<<"level">>,0},
                                    {<<"source">>,<<"source1">>},
                                    {<<"type">>,<<"Entry1">>}]},
                                  [],false,[]}]],
                            [],false,false}
** When Server state == {db,<0.177.0>,<0.178.0>,nil,<<"1323371411352029">>,
                             <0.175.0>,<0.179.0>,
                             {db_header,5,13636863,0,
                                 {6776455960,{13636861,0}},
                                 {6776479023,13636861},
                                 {1039786,[]},
                                 0,nil,nil,1000},
                             13636863,
                             {btree,<0.175.0>,
                                 {6776455960,{13636861,0}},
                                 #Fun<couch_db_updater.10.19222179>,
                                 #Fun<couch_db_updater.11.21515767>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.12.93888648>},
                             {btree,<0.175.0>,
                                 {6776479023,13636861},
                                 #Fun<couch_db_updater.13.40165027>,
                                 #Fun<couch_db_updater.14.82810239>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.15.104121193>},
                             {btree,<0.175.0>,
                                 {1039786,[]},
                                 #Fun<couch_btree.0.23070627>,
                                 #Fun<couch_btree.1.117278773>,
                                 #Fun<couch_btree.2.112258129>,nil},
                             13636863,
                             <<"database2">>,
                             "/data/couchdb/data/database2.couch",
                             [],[],nil,
                             {user_ctx,null,[],undefined},
                             nil,1000,
                             [before_header,after_header,on_file_open],
                             false}
** Reason for termination ==
** {timeout,
        {gen_server,call,
            [<0.177.0>,
             {db_updated,

{db,<0.177.0>,<0.178.0>,nil,<<"1323371411352029">>,<0.175.0>,
                     <0.179.0>,
                     {db_header,5,13636863,0,
                         {6776455960,{13636861,0}},
                         {6776479023,13636861},
                         {1039786,[]},
                         0,nil,nil,1000},
                     13636863,
                     {btree,<0.175.0>,
                         {6776557909,{13637061,0}},
                         #Fun<couch_db_updater.10.19222179>,
                         #Fun<couch_db_updater.11.21515767>,
                         #Fun<couch_btree.5.112258129>,
                         #Fun<couch_db_updater.12.93888648>},
                     {btree,<0.175.0>,
                         {6776580448,13637061},
                         #Fun<couch_db_updater.13.40165027>,
                         #Fun<couch_db_updater.14.82810239>,
                         #Fun<couch_btree.5.112258129>,
                         #Fun<couch_db_updater.15.104121193>},
                     {btree,<0.175.0>,
                         {1039786,[]},
                         #Fun<couch_btree.0.23070627>,
                         #Fun<couch_btree.1.117278773>,
                         #Fun<couch_btree.2.112258129>,nil},
                     13637063,
                     <<"database2">>,
                     "/data/couchdb/data/database2.couch",
                     [],[],nil,
                     {user_ctx,null,[],undefined},
                     #Ref<0.0.30.133811>,1000,
                     [before_header,after_header,on_file_open],
                     false}}]}}

[Thu, 08 Dec 2011 20:17:22 GMT] [error] [<0.178.0>] {error_report,<0.31.0>,
                      {<0.178.0>,crash_report,

[[{initial_call,{couch_db_updater,init,['Argument__1']}},
                         {pid,<0.178.0>},
                         {registered_name,[]},
                         {error_info,
                          {exit,
                           {timeout,
                            {gen_server,call,
                             [<0.177.0>,
                              {db_updated,
                               {db,<0.177.0>,<0.178.0>,nil,
                                <<"1323371411352029">>,<0.175.0>,<0.179.0>,
                                {db_header,5,13636863,0,
                                 {6776455960,{13636861,0}},
                                 {6776479023,13636861},
                                 {1039786,[]},
                                 0,nil,nil,1000},
                                13636863,
                                {btree,<0.175.0>,
                                 {6776557909,{13637061,0}},
                                 #Fun<couch_db_updater.10.19222179>,
                                 #Fun<couch_db_updater.11.21515767>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.12.93888648>},
                                {btree,<0.175.0>,
                                 {6776580448,13637061},
                                 #Fun<couch_db_updater.13.40165027>,
                                 #Fun<couch_db_updater.14.82810239>,
                                 #Fun<couch_btree.5.112258129>,
                                 #Fun<couch_db_updater.15.104121193>},
                                {btree,<0.175.0>,
                                 {1039786,[]},
                                 #Fun<couch_btree.0.23070627>,
                                 #Fun<couch_btree.1.117278773>,
                                 #Fun<couch_btree.2.112258129>,nil},
                                13637063,
                                <<"database2">>,
                                "/data/couchdb/data/database2.couch",
                                [],[],nil,
                                {user_ctx,null,[],undefined},
                                #Ref<0.0.30.133811>,1000,
                                [before_header,after_header,on_file_open],
                                false}}]}},
                           [{gen_server,terminate,6},
                            {proc_lib,init_p_do_apply,3}]}},
                         {ancestors,[<0.177.0>,<0.174.0>]},
                         {messages,
                          [{'EXIT',<0.177.0>,shutdown},delayed_commit]},
                         {links,[]},
                         {dictionary,[]},
                         {trap_exit,true},
                         {status,running},
                         {heap_size,121393},
                         {stack_size,24},
                         {reductions,83311172}],
                        []]}}


Reply via email to