Tayven,

Thanks for the info.
How much RAM is in this node? Do you know approximately how much RAM the beam.smp process is consuming when the oom-killer takes action? Have you changed any settings in default.ini/local.ini?

-Joan

----- Original Message -----
> From: "Tayven Bigelow" <[email protected]>
> To: "Jan Lehnardt" <[email protected]>, [email protected]
> Cc: "Nick Becker" <[email protected]>
> Sent: Tuesday, January 31, 2017 12:49:11 PM
> Subject: Re: Crashing due to memory use
>
> Hey Jan!
>
> You'd be correct about the multiple postings; we weren't sure they were being posted.
>
> We currently run this in production on Cloudant and were hoping to have a backup utilizing the new CouchDB 2.0. We are able to replicate consistently.
>
> The memory leak happens when we kick off a new view. beam.smp is killed by the kernel's OOM killer.
>
> Checking /var/log/syslog shows:
> Jan 31 18:32:44 couchdb7 kernel: [594086.565577] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
> Jan 31 18:32:44 couchdb7 kernel: [594086.565622] Killed process 23773 (memsup) total-vm:4228kB, anon-rss:12kB, file-rss:0kB
> Jan 31 18:32:44 couchdb7 kernel: [594086.569327] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
> Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB
> Jan 31 18:32:56 couchdb7 monit[9113]: 'couchdb' process is not running
>
> The couchdb.log file at the time of the crash contains:
>
> 1981936-[debug] 2017-01-31T17:16:35.355774Z [email protected] <0.9036.262> -------- OS Process #Port<0.63437> Input :: ["map_doc",{"_id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","_rev":"5-b90c6c87a0a48e647528a1b3c5bfe12b","MetaData":{"PollId":"147402","CarrierId":"25504","UserPollStateId":"3362564708"},"UserId":"1002449829201","CreateDate":"2015-11-23T06:42:40.0285675Z","LastModifiedDate":"2015-11-23T06:43:07.5474967Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CallbackUri":"http://de-geopoll-1:8645/billingcallback","CallbackSent":true,"Activities":[{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0297329Z","State":"PROCESSING"},{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0307329Z","State":"SUCCESS"}],"Currency":"US_Dollar_USD","ConsumerIdentifier":"250025308","ToBeBilledIdentifier":"255763398389","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":0.11,"BillProcessingState":"SUCCESS","BillingProvider":"TRANSFERTO","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CreatedDate":"2015-11-23T06:42:40.0285675Z","ModifiedDate":"2015-11-23T06:43:07.5474967Z","Type":"Bill"}]
> 1981937-[debug] 2017-01-31T17:16:35.355856Z [email protected] <0.11910.262> -------- OS Process #Port<0.63508> Output :: [[[["GeoPoll","8921801"],null]],[[["77802","PRETUPS"],null]],[[["77802","PRETUPS","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","SUCCESS","2014","03","05"],null],[["77802","ALL","SUCCESS","2014","03","05"],null],[["77802","PRETUPS","ALL","2014","03","05"],null],[["ALL","ALL","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","ALL","2014","03","05"],null],[["77802","ALL","ALL","2014","03","05"],null],[["ALL","ALL","ALL","2014","03","05"],null]],[[["77802","2014","3","05"],null]],[["254788760292",null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS"],null]],[["254788760292",null]],[["1000374925501",null]],[[[2014,3,5,"PRETUPS","SUCCESS"],null]]]
> 1981938-[debug] 2017-01-31T17:16:35.356012Z [email protected] <0.9036.262> -------- OS Process #Port<0.63437> Output :: [[[["147402","TRANSFERTO","SUCCESS"],null]],[[["TRANSFERTO","SUCCESS","2015-11-23T06:43:07.5474967Z"],null]],[[["TRANSFERTO","SUCCESS","0001-01-01T00:00:00"],null]]]
> 1981939-[debug] 2017-01-31T17:16:35.356108Z [email protected] <0.11910.262> -------- OS Process #Port<0.63508> Input :: ["map_doc",{"_id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","_rev":"3-832e63f45b45d5e3008b7e7bbe2b7392","MetaData":{"PollId":"77802","CarrierId":"25402","UserPollStateId":"3256532401","CarrierName":"Airtel-Kenya","Pretups.Version":"5.1","Pretups.Uri":"https://41.223.56.108:8093/pretups/C2SReceiver","Auth.Login":"pretups","Auth.Password":"0971500a350af5c3d1c0b12221a0558c","Auth.GatewayCode":"EXTGW","Auth.GatewayType":"EXTGW","Auth.ServicePort":"190","Auth.SourceType":"EXT","Cmd.ExtNwCode":"KE","Cmd.Msisdn":"732810086","Cmd.Pin":"2549","Cmd.Login":"","Cmd.Password":"","Cmd.ExtCode":"2468","CountryCode":"254","MobilePhoneLength":"9","TestMobileNumber":"254733621719","Currency":"KES"},"UserId":"1000277123401","CreateDate":"2014-03-05T13:45:49.6889321Z","LastModifiedDate":"2014-03-05T13:46:14.8050931Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CallbackUri":"http://uk-app-3:8645/billingcallback","Activities":[{"CreateDate":"2014-03-05T13:46:14.2902898Z","State":"PROCESSING"},{"MetaData":{"Type":"EXRCTRFRESP","Txnid":"R140305.1648.210003","Txnstatus":"200","Date":"05/03/2014 16:48:40","Extrefnum":"","Data":null},"CreateDate":"2014-03-05T13:46:14.2912898Z","State":"SUCCESS"}],"Currency":"Kenyan_Shilling_KES","ConsumerIdentifier":"8963201","ToBeBilledIdentifier":"254735960469","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":43.0,"BillProcessingState":"SUCCESS","BillingProvider":"PRETUPS","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CreatedDate":"2014-03-05T13:45:49.6889321Z","ModifiedDate":"2014-03-05T13:46:14.8050931Z","Type":"Bill"}]
> 1981940:[debug] 2017-01-31T17:32:57.300061Z [email protected] <0.111.0> -------- Supervisor couch_log_sup started couch_log_monitor:start_link() at pid <0.114.0>
> 1981941:[debug] 2017-01-31T17:32:57.301585Z [email protected] <0.111.0> -------- Supervisor couch_log_sup started config_listener_mon:start_link(couch_log_sup, nil) at pid <0.115.0>
> 1981942:[info] 2017-01-31T17:32:57.301605Z [email protected] <0.7.0> -------- Application couch_log started on node '[email protected]'
> 1981943:[debug] 2017-01-31T17:32:57.302447Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_sample_slide_sup:start_link() at pid <0.120.0>
> 1981944:[debug] 2017-01-31T17:32:57.303229Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_meter_timer_server:start_link() at pid <0.121.0>
> 1981945:[debug] 2017-01-31T17:32:57.303979Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_metrics_histogram_ets:start_link() at pid <0.122.0>
> 1981946:[info] 2017-01-31T17:32:57.304074Z [email protected] <0.7.0> -------- Application folsom started on node '[email protected]'
> 1981947:[debug] 2017-01-31T17:32:57.325716Z [email protected] <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_aggregator:start_link() at pid <0.127.0>
> 1981948:[debug] 2017-01-31T17:32:57.326519Z [email protected] <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_process_tracker:start_link() at pid <0.177.0>
> 1981949:[info] 2017-01-31T17:32:57.326595Z [email protected] <0.7.0> -------- Application couch_stats started on node '[email protected]'
> 1981950:[info] 2017-01-31T17:32:57.326673Z [email protected] <0.7.0> -------- Application khash started on node '[email protected]'
> 1981951:[debug] 2017-01-31T17:32:57.330327Z [email protected] <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_server:start_link() at pid <0.183.0>
> 1981952:[debug] 2017-01-31T17:32:57.331211Z [email protected] <0.185.0> -------- Supervisor couch_event_os_sup started config_listener_mon:start_link(couch_event_os_sup, nil) at pid <0.186.0>
> 1981953:[debug] 2017-01-31T17:32:57.331268Z [email protected] <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_os_sup:start_link() at pid <0.185.0>
> 1981954:[info] 2017-01-31T17:32:57.331367Z [email protected] <0.7.0> -------- Application couch_event started on node '[email protected]'
> 1981955:[debug] 2017-01-31T17:32:57.334167Z [email protected] <0.190.0> -------- Supervisor ibrowse_sup started ibrowse:start_link() at pid <0.191.0>
> 1981956:[info] 2017-01-31T17:32:57.334239Z [email protected] <0.7.0> -------- Application ibrowse started on node '[email protected]'
> 1981957:[debug] 2017-01-31T17:32:57.335727Z [email protected] <0.196.0> -------- Supervisor ioq_sup started config_listener_mon:start_link(ioq_sup, nil) at pid <0.197.0>
> 1981958:[debug] 2017-01-31T17:32:57.336685Z [email protected] <0.196.0> -------- Supervisor ioq_sup started ioq:start_link() at pid <0.198.0>
> 1981959:[info] 2017-01-31T17:32:57.336756Z [email protected] <0.7.0> -------- Application ioq started on node '[email protected]'
> 1981960:[info] 2017-01-31T17:32:57.336829Z [email protected] <0.7.0> -------- Application mochiweb started on node '[email protected]'
> 1981961:[info] 2017-01-31T17:32:57.336899Z [email protected] <0.7.0> -------- Application oauth started on node '[email protected]'
> 1981962:[info] 2017-01-31T17:32:57.340965Z [email protected] <0.204.0> -------- Apache CouchDB 2.0.0 is starting.
>
> For the large database, it would happen when we kicked off 1 out of the 39 views on the database; however, on the smaller database I would have to kick off all 5 views within the database.
> The large database has 9 design documents; the smaller database has only 1.
> The views are all JS.
> Other than Fail2Ban, UFW, Logwatch, LogRotate, Monit and Zabbix-Agent, there is nothing else running on the server, except when we build it with Dreyfus and Clouseau.
>
> Example of one of the larger design documents:
> {
>   "_id": "_design/bills",
>   "_rev": "4-b0ed6cf8f871391add5004f7e67bc3a8",
>   "language": "javascript",
>   "auto_update": true,
>   "views": {
>     "by_bill_date_and_bill_provider": {
>       "map": "function(doc) {\n if (doc._id.indexOf(\"bill-\") === 0){\n var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);\n var year = date.getFullYear();\n var month = (date.getMonth() + 1);\n var day = date.getDate();\n emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);\n }\n}",
>       "reduce": "_count"
>     },
>     "by_poll_id_and_bill_date": {
>       "map": "function(doc) {\n if ((doc._id.indexOf(\"bill-\") === 0) && doc.MetaData.PollId){\n var date = new Date(doc.CreateDate);\n var year = date.getFullYear().toString();\n var month = (date.getMonth() + 1).toString();\n var day = date.getDate().toString();\n if (day.length == 1){\n day = \"0\" + day;\n }\n\n emit([doc.MetaData.PollId, year, month, day], null);\n }\n}",
>       "reduce": "_count"
>     },
>   }
> }
>
> Example of a doc within the larger database:
> {
>   "_id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "_rev": "5-b40e00a54059c6c79004c0afd584fc60",
>   "MetaData": {
>     "PollId": "1844608",
>     "CarrierId": "2701",
>     "UserPollStateId": "12614468108"
>   },
>   "UserId": "1002196088104",
>   "CreateDate": "2017-01-31T07:20:58",
>   "LastModifiedDate": "2017-01-31T07:21:14.2473555Z",
>   "SystemSource": "GeoPoll",
>   "AttemptCount": 1,
>   "BillingIdentifier": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "CallbackUri": "http://XXXXXXXXXXX:8645/billingcallback",
>   "CallbackSent": true,
>   "Activities": [
>     {
>       "MetaData": {},
>       "CreateDate": "2017-01-31T07:21:11.182049Z",
>       "State": "PROCESSING"
>     },
>     {
>       "MetaData": {
>         "VoucherPin": "",
>         "OrderRef": "113234210",
>         "TicketNumber": "",
>         "BoxNumber": "",
>         "BatchNumber": "",
>         "ProcessingTime": "3064.3064"
>       },
>       "CreateDate": "2017-01-31T07:21:11.1820491Z",
>       "State": "SUCCESS"
>     }
>   ],
>   "Currency": "South_African_Rand_ZAR",
>   "ConsumerIdentifier": "XXXXXXXXXXXX",
>   "ToBeBilledIdentifier": "XXXXXXXXXXXX",
>   "BillType": "Carrier",
>   "BillProcessingStateAsString": "SUCCESS",
>   "Value": 2,
>   "BillProcessingState": "SUCCESS",
>   "BillingProvider": "VODACOMSA",
>   "NextProcessingTime": "0001-01-01T00:00:00",
>   "NextProcessingTimeAsLong": 0,
>   "FinalProcessingTime": 0,
>   "LastSubmittedDate": "0001-01-01T00:00:00",
>   "Id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
>   "CreatedDate": "2017-01-31T07:20:58",
>   "ModifiedDate": "2017-01-31T07:21:14.2473555Z",
>   "Type": "Bill"
> }
>
> Docs usually go through 4-5 updates before they are finalized.
> The larger database holds 16,201,998 docs totaling 23 GB, with no attachments.
>
> There is no other traffic besides a single user (me); that includes replication.
> No other patterns stand out (to me, at least). The memory usage grows and grows until it eventually consumes the swap space and runs into an OOM kill.
>
> The other 11 nodes are affected as well.
>
> Thanks for your assistance!!
>
> -Tayven
>
> ________________________________
> From: Jan Lehnardt <[email protected]>
> Sent: Tuesday, January 31, 2017 4:38 AM
> To: [email protected]
> Cc: Tayven Bigelow; Nick Becker
> Subject: Re: Crashing due to memory use
>
> Heya Nick and Tayven,
>
> I assume you posted multiple times because your mails didn’t show up immediately due to mailing list moderation.
>
> You are correct that the database size and hardware configuration should not cause any issues.
>
> Can you explain the scenario a little better?
>
> Is the memory leak happening when building your views for the first time?
>
> Does beam.smp terminate on its own or is it an OOM kill from the kernel?
>
> How many views do you have?
>
> How many design docs?
>
> JS views or Erlang views?
>
> Is there anything else running on these nodes?
>
> Can you share your view code?
>
> Can you share your couch.log?
>
> Can you explain your document structure (total bytes, number of fields, attachments, etc.)?
>
> Can you describe your traffic pattern?
>
> Can you describe any other pattern that leads up to the memory leak?
>
> Does this happen on all nodes? If not, is there anything special about the affected nodes?
>
> (Shameless plug: if you require professional assistance, my email footer has contact information.)
>
> > On 31 Jan 2017, at 00:15, Tayven Bigelow <[email protected]> wrote:
> >
> > Hey Guys!
> >
> > We've been using a 12-server CouchDB 2.0 cluster for a while now and have noticed a memory leak that causes beam.smp to crash while populating views.
> >
> > The q/r/w/n is set up as:
> >
> > [cluster]
> > q=12
> > r=2
> > w=2
> > n=3
> >
> > As far as I know, the server should be able to handle the load, as it has 64 GB RAM with a Core i7 6700. We are running Ubuntu 16.04.1.
> >
> > The database is 16.5 GB in size.
> >
> > I've also attempted to run 2.0 with Dreyfus and Clouseau and ran into the same issue with a database size of 7.8 MB.
> >
> > I've noticed that with previous releases some people have run into similar memory issues with beam.smp, and increasing the open file limit was part of the resolution. We've increased the nofile limit for the couchdb user to 4096 (as found here: https://wiki.apache.org/couchdb/Performance) with no luck.
> >
> > Nothing out of the ordinary shows up in the logs. The only way to catch it is by watching memory use.
> >
> > I'm wondering if there's a configuration/setting somewhere that I am missing that could be causing this issue.
> >
> > Thanks!
> >
> > Tayven
> >
> > All information in this message is confidential and may be legally privileged. If you are not the intended recipient, notify the sender immediately and destroy this email.
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
> Email: [email protected]
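The kernel's "Killed process" lines in the syslog excerpt from Tayven's January 31 message already carry part of the answer to the question of how much RAM beam.smp was using; the sizes there are printed in kB (the kernel means KiB). Below is a minimal JavaScript sketch that pulls those fields out and converts them to GiB. It assumes Node.js 12 or later (for String.prototype.matchAll) and the exact "total-vm:NNNkB, anon-rss:NNNkB, file-rss:NNNkB" layout shown in that excerpt; parseOomKill and kbToGiB are hypothetical helper names, not part of CouchDB or any existing tool.

```javascript
// Hypothetical helper: parse a Linux OOM-killer "Killed process" line.
// Assumes the layout seen in the syslog excerpt, e.g.
// "Killed process 23731 (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB".
// Returns null when the line is not a "Killed process" line.
function parseOomKill(line) {
  const proc = line.match(/Killed process (\d+) \(([^)]+)\)/);
  if (proc === null) {
    return null;
  }
  const sizesKb = {};
  // Collect every "name:NNNkB" field, e.g. anon-rss:64708732kB.
  for (const m of line.matchAll(/([\w-]+):(\d+)kB/g)) {
    sizesKb[m[1]] = Number(m[2]);
  }
  return { pid: Number(proc[1]), name: proc[2], sizesKb };
}

// The kernel's "kB" in these lines means KiB, so 1 GiB = 1024 * 1024 kB.
function kbToGiB(kb) {
  return kb / (1024 * 1024);
}

const line =
  "Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 " +
  "(beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB";
const info = parseOomKill(line);
console.log(info.name, "anon-rss GiB:", kbToGiB(info.sizesKb["anon-rss"]).toFixed(1));
// prints: beam.smp anon-rss GiB: 61.7
```

Read this way, an anon-rss of roughly 61.7 GiB suggests beam.smp was holding close to all of the node's 64 GB of RAM when it was killed; total-vm (about 120.7 GiB) is virtual address space and is not a measure of resident memory.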
