Hey Jan!

You're correct about the multiple postings; we weren't sure they were being posted.

We currently run this in production on Cloudant and were hoping to set up a backup using the new CouchDB 2.0. We are able to replicate consistently. The memory leak happens when we kick off a new view. beam.smp is killed by the kernel's OOM killer. /var/log/syslog shows:

Jan 31 18:32:44 couchdb7 kernel: [594086.565577] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
Jan 31 18:32:44 couchdb7 kernel: [594086.565622] Killed process 23773 (memsup) total-vm:4228kB, anon-rss:12kB, file-rss:0kB
Jan 31 18:32:44 couchdb7 kernel: [594086.569327] Out of memory: Kill process 23731 (beam.smp) score 961 or sacrifice child
Jan 31 18:32:44 couchdb7 kernel: [594086.569392] Killed process 23731 (beam.smp) total-vm:126594220kB, anon-rss:64708732kB, file-rss:0kB
Jan 31 18:32:56 couchdb7 monit[9113]: 'couchdb' process is not running

The couchdb.log file at the time of the crash contains:

1981936-[debug] 2017-01-31T17:16:35.355774Z [email protected] <0.9036.262> -------- OS Process #Port<0.63437> Input ::
["map_doc",{"_id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","_rev":"5-b90c6c87a0a48e647528a1b3c5bfe12b","MetaData":{"PollId":"147402","CarrierId":"25504","UserPollStateId":"3362564708"},"UserId":"1002449829201","CreateDate":"2015-11-23T06:42:40.0285675Z","LastModifiedDate":"2015-11-23T06:43:07.5474967Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CallbackUri":"http://de-geopoll-1:8645/billingcallback","CallbackSent":true,"Activities":[{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0297329Z","State":"PROCESSING"},{"MetaData":{},"CreateDate":"2015-11-23T06:42:59.0307329Z","State":"SUCCESS"}],"Currency":"US_Dollar_USD","ConsumerIdentifier":"250025308","ToBeBilledIdentifier":"255763398389","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":0.11,"BillProcessingState":"SUCCESS","BillingProvider":"TRANSFERTO","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-4690221d-fc07-4278-abdf-cabf1018ecb6","CreatedDate":"2015-11-23T06:42:40.0285675Z","ModifiedDate":"2015-11-23T06:43:07.5474967Z","Type":"Bill"}]
1981937-[debug] 2017-01-31T17:16:35.355856Z [email protected] <0.11910.262> -------- OS Process #Port<0.63508> Output ::
[[[["GeoPoll","8921801"],null]],[[["77802","PRETUPS"],null]],[[["77802","PRETUPS","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","SUCCESS","2014","03","05"],null],[["77802","ALL","SUCCESS","2014","03","05"],null],[["77802","PRETUPS","ALL","2014","03","05"],null],[["ALL","ALL","SUCCESS","2014","03","05"],null],[["ALL","PRETUPS","ALL","2014","03","05"],null],[["77802","ALL","ALL","2014","03","05"],null],[["ALL","ALL","ALL","2014","03","05"],null]],[[["77802","2014","3","05"],null]],[["254788760292",null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],43]],[[["PRETUPS","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","SUCCESS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","25402","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS","2014-03-05T12:48:59.5664722Z"],null]],[[["PRETUPS"],null]],[["254788760292",null]],[["1000374925501",null]],[[[2014,3,5,"PRETUPS","SUCCESS"],null]]]
1981938-[debug] 2017-01-31T17:16:35.356012Z [email protected] <0.9036.262> -------- OS Process #Port<0.63437> Output ::
[[[["147402","TRANSFERTO","SUCCESS"],null]],[[["TRANSFERTO","SUCCESS","2015-11-23T06:43:07.5474967Z"],null]],[[["TRANSFERTO","SUCCESS","0001-01-01T00:00:00"],null]]]
1981939-[debug] 2017-01-31T17:16:35.356108Z [email protected] <0.11910.262> -------- OS Process #Port<0.63508> Input ::
["map_doc",{"_id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","_rev":"3-832e63f45b45d5e3008b7e7bbe2b7392","MetaData":{"PollId":"77802","CarrierId":"25402","UserPollStateId":"3256532401","CarrierName":"Airtel-Kenya","Pretups.Version":"5.1","Pretups.Uri":"https://41.223.56.108:8093/pretups/C2SReceiver","Auth.Login":"pretups","Auth.Password":"0971500a350af5c3d1c0b12221a0558c","Auth.GatewayCode":"EXTGW","Auth.GatewayType":"EXTGW","Auth.ServicePort":"190","Auth.SourceType":"EXT","Cmd.ExtNwCode":"KE","Cmd.Msisdn":"732810086","Cmd.Pin":"2549","Cmd.Login":"","Cmd.Password":"","Cmd.ExtCode":"2468","CountryCode":"254","MobilePhoneLength":"9","TestMobileNumber":"254733621719","Currency":"KES"},"UserId":"1000277123401","CreateDate":"2014-03-05T13:45:49.6889321Z","LastModifiedDate":"2014-03-05T13:46:14.8050931Z","SystemSource":"GeoPoll","AttemptCount":1,"BillingIdentifier":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CallbackUri":"http://uk-app-3:8645/billingcallback","Activities":[{"CreateDate":"2014-03-05T13:46:14.2902898Z","State":"PROCESSING"},{"MetaData":{"Type":"EXRCTRFRESP","Txnid":"R140305.1648.210003","Txnstatus":"200","Date":"05/03/2014 16:48:40","Extrefnum":"","Data":null},"CreateDate":"2014-03-05T13:46:14.2912898Z","State":"SUCCESS"}],"Currency":"Kenyan_Shilling_KES","ConsumerIdentifier":"8963201","ToBeBilledIdentifier":"254735960469","BillType":"Carrier","BillProcessingStateAsString":"SUCCESS","Value":43.0,"BillProcessingState":"SUCCESS","BillingProvider":"PRETUPS","NextProcessingTime":"0001-01-01T00:00:00","NextProcessingTimeAsLong":0,"Id":"bill-197d71d3-3091-47ef-9efe-b154161fcbfb","CreatedDate":"2014-03-05T13:45:49.6889321Z","ModifiedDate":"2014-03-05T13:46:14.8050931Z","Type":"Bill"}]
1981940:[debug] 2017-01-31T17:32:57.300061Z [email protected] <0.111.0> -------- Supervisor couch_log_sup started couch_log_monitor:start_link() at pid <0.114.0>
1981941:[debug] 2017-01-31T17:32:57.301585Z [email protected] <0.111.0> -------- Supervisor couch_log_sup started config_listener_mon:start_link(couch_log_sup, nil) at pid <0.115.0>
1981942:[info] 2017-01-31T17:32:57.301605Z [email protected] <0.7.0> -------- Application couch_log started on node '[email protected]'
1981943:[debug] 2017-01-31T17:32:57.302447Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_sample_slide_sup:start_link() at pid <0.120.0>
1981944:[debug] 2017-01-31T17:32:57.303229Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_meter_timer_server:start_link() at pid <0.121.0>
1981945:[debug] 2017-01-31T17:32:57.303979Z [email protected] <0.119.0> -------- Supervisor folsom_sup started folsom_metrics_histogram_ets:start_link() at pid <0.122.0>
1981946:[info] 2017-01-31T17:32:57.304074Z [email protected] <0.7.0> -------- Application folsom started on node '[email protected]'
1981947:[debug] 2017-01-31T17:32:57.325716Z [email protected] <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_aggregator:start_link() at pid <0.127.0>
1981948:[debug] 2017-01-31T17:32:57.326519Z [email protected] <0.126.0> -------- Supervisor couch_stats_sup started couch_stats_process_tracker:start_link() at pid <0.177.0>
1981949:[info] 2017-01-31T17:32:57.326595Z [email protected] <0.7.0> -------- Application couch_stats started on node '[email protected]'
1981950:[info] 2017-01-31T17:32:57.326673Z [email protected] <0.7.0> -------- Application khash started on node '[email protected]'
1981951:[debug] 2017-01-31T17:32:57.330327Z [email protected] <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_server:start_link() at pid <0.183.0>
1981952:[debug] 2017-01-31T17:32:57.331211Z [email protected] <0.185.0> -------- Supervisor couch_event_os_sup started config_listener_mon:start_link(couch_event_os_sup, nil) at pid <0.186.0>
1981953:[debug] 2017-01-31T17:32:57.331268Z [email protected] <0.182.0> -------- Supervisor couch_event_sup2 started couch_event_os_sup:start_link() at pid <0.185.0>
1981954:[info] 2017-01-31T17:32:57.331367Z [email protected] <0.7.0> -------- Application couch_event started on node '[email protected]'
1981955:[debug] 2017-01-31T17:32:57.334167Z [email protected] <0.190.0> -------- Supervisor ibrowse_sup started ibrowse:start_link() at pid <0.191.0>
1981956:[info] 2017-01-31T17:32:57.334239Z [email protected] <0.7.0> -------- Application ibrowse started on node '[email protected]'
1981957:[debug] 2017-01-31T17:32:57.335727Z [email protected] <0.196.0> -------- Supervisor ioq_sup started config_listener_mon:start_link(ioq_sup, nil) at pid <0.197.0>
1981958:[debug] 2017-01-31T17:32:57.336685Z [email protected] <0.196.0> -------- Supervisor ioq_sup started ioq:start_link() at pid <0.198.0>
1981959:[info] 2017-01-31T17:32:57.336756Z [email protected] <0.7.0> -------- Application ioq started on node '[email protected]'
1981960:[info] 2017-01-31T17:32:57.336829Z [email protected] <0.7.0> -------- Application mochiweb started on node '[email protected]'
1981961:[info] 2017-01-31T17:32:57.336899Z [email protected] <0.7.0> -------- Application oauth started on node '[email protected]'
1981962:[info] 2017-01-31T17:32:57.340965Z [email protected] <0.204.0> -------- Apache CouchDB 2.0.0 is starting.

For the large database, it happened when we kicked off 1 of its 39 views; on the smaller database, I had to kick off all 5 of its views. The large database has 9 design documents, while the smaller one has only 1. The views are all JS.

Other than Fail2Ban, UFW, Logwatch, LogRotate, Monit and Zabbix-Agent, nothing else is running on the server (except when we build it with Dreyfus and Clouseau).
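
Because all of the views are JavaScript, one way to check whether a single map function misbehaves is to run it outside CouchDB against a sample document and inspect what it emits. The sketch below is a minimal, hypothetical Node.js harness, not a CouchDB tool: `runMap` and the trimmed `sampleDoc` are illustrative assumptions, while the two map sources are copied from the `by_bill_date_and_bill_provider` and `by_poll_id_and_bill_date` views of the `_design/bills` design document in this message.

```javascript
// Hypothetical test harness (not part of CouchDB): run a design-document
// map function under Node.js with a stub emit() that records every row.
function runMap(mapSource, doc) {
  const rows = [];
  // Compile the source string so that `emit` is in scope as a free name,
  // similar to how couchjs makes it available to view code.
  const mapFn = new Function("emit", "return (" + mapSource + ");")(
    function (key, value) {
      rows.push([key, value]);
    }
  );
  mapFn(doc);
  return rows;
}

// Map sources copied from the _design/bills views, with the JSON string
// escaping removed.
const byBillDateAndProvider = `function(doc) {
  if (doc._id.indexOf("bill-") === 0){
    var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);
    var year = date.getFullYear();
    var month = (date.getMonth() + 1);
    var day = date.getDate();
    emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);
  }
}`;

const byPollIdAndBillDate = `function(doc) {
  if ((doc._id.indexOf("bill-") === 0) && doc.MetaData.PollId){
    var date = new Date(doc.CreateDate);
    var year = date.getFullYear().toString();
    var month = (date.getMonth() + 1).toString();
    var day = date.getDate().toString();
    if (day.length == 1){
      day = "0" + day;
    }
    emit([doc.MetaData.PollId, year, month, day], null);
  }
}`;

// Assumed, trimmed copy of the example bill document: only the fields
// these two views read.
const sampleDoc = {
  _id: "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  MetaData: { PollId: "1844608", CarrierId: "2701" },
  CreateDate: "2017-01-31T07:20:58",
  CreatedDate: "2017-01-31T07:20:58",
  BillingProvider: "VODACOMSA",
  BillProcessingState: "SUCCESS",
};

// Print how many rows each view emits for the sample doc, and the rows.
for (const [name, src] of [
  ["by_bill_date_and_bill_provider", byBillDateAndProvider],
  ["by_poll_id_and_bill_date", byPollIdAndBillDate],
]) {
  const rows = runMap(src, sampleDoc);
  console.log(name, rows.length, JSON.stringify(rows));
}
```

This only exercises the JavaScript logic under Node.js. couchjs in CouchDB 2.0 runs on SpiderMonkey, so details such as `Date` parsing may differ, and the harness says nothing about memory growth inside couchjs or beam.smp.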

Example of one of the larger design documents:

{
  "_id": "_design/bills",
  "_rev": "4-b0ed6cf8f871391add5004f7e67bc3a8",
  "language": "javascript",
  "auto_update": true,
  "views": {
    "by_bill_date_and_bill_provider": {
      "map": "function(doc) {\n if (doc._id.indexOf(\"bill-\") === 0){\n var date = new Date(doc.CreatedDate?doc.CreatedDate:doc.CreateDate);\n var year = date.getFullYear();\n var month = (date.getMonth() + 1);\n var day = date.getDate();\n emit([year, month, day, doc.BillingProvider, doc.BillProcessingState], null);\n }\n}",
      "reduce": "_count"
    },
    "by_poll_id_and_bill_date": {
      "map": "function(doc) {\n if ((doc._id.indexOf(\"bill-\") === 0) && doc.MetaData.PollId){\n var date = new Date(doc.CreateDate);\n var year = date.getFullYear().toString();\n var month = (date.getMonth() + 1).toString();\n var day = date.getDate().toString();\n if (day.length == 1){\n day = \"0\" + day;\n }\n\n emit([doc.MetaData.PollId, year, month, day], null);\n }\n}",
      "reduce": "_count"
    }
  }
}

Example of a doc within the larger database:

{
  "_id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "_rev": "5-b40e00a54059c6c79004c0afd584fc60",
  "MetaData": {
    "PollId": "1844608",
    "CarrierId": "2701",
    "UserPollStateId": "12614468108"
  },
  "UserId": "1002196088104",
  "CreateDate": "2017-01-31T07:20:58",
  "LastModifiedDate": "2017-01-31T07:21:14.2473555Z",
  "SystemSource": "GeoPoll",
  "AttemptCount": 1,
  "BillingIdentifier": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "CallbackUri": "http://XXXXXXXXXXX:8645/billingcallback",
  "CallbackSent": true,
  "Activities": [
    {
      "MetaData": {},
      "CreateDate": "2017-01-31T07:21:11.182049Z",
      "State": "PROCESSING"
    },
    {
      "MetaData": {
        "VoucherPin": "",
        "OrderRef": "113234210",
        "TicketNumber": "",
        "BoxNumber": "",
        "BatchNumber": "",
        "ProcessingTime": "3064.3064"
      },
      "CreateDate": "2017-01-31T07:21:11.1820491Z",
      "State": "SUCCESS"
    }
  ],
  "Currency": "South_African_Rand_ZAR",
  "ConsumerIdentifier": "XXXXXXXXXXXX",
  "ToBeBilledIdentifier": "XXXXXXXXXXXX",
  "BillType": "Carrier",
  "BillProcessingStateAsString": "SUCCESS",
  "Value": 2,
  "BillProcessingState": "SUCCESS",
  "BillingProvider": "VODACOMSA",
  "NextProcessingTime": "0001-01-01T00:00:00",
  "NextProcessingTimeAsLong": 0,
  "FinalProcessingTime": 0,
  "LastSubmittedDate": "0001-01-01T00:00:00",
  "Id": "bill-e2a5a7d1-3d9f-4f9b-b526-13b80b9e6947",
  "CreatedDate": "2017-01-31T07:20:58",
  "ModifiedDate": "2017-01-31T07:21:14.2473555Z",
  "Type": "Bill"
}

Docs usually go through 4-5 updates before they are finalized. The larger database holds 16,201,998 docs totaling 23 GB, with no attachments. There is no traffic other than a single user (me), and that includes replication. No other patterns stand out (to me at least). Memory usage keeps growing until it eventually consumes the swap space and triggers an OOM kill. The other 11 nodes are affected as well.

Thanks for your assistance!!
-Tayven

________________________________
From: Jan Lehnardt <[email protected]>
Sent: Tuesday, January 31, 2017 4:38 AM
To: [email protected]
Cc: Tayven Bigelow; Nick Becker
Subject: Re: Crashing due to memory use

Heya Nick and Tayven,

I assume you posted multiple times because your mails didn’t show up immediately due to mailing list moderation.

You are correct that the database size and hardware configuration should not cause any issues.

Can you explain the scenario a little better?

Is the memory leak happening when building your views for the first time?

Does beam.smp terminate on its own or is it an OOM kill from the kernel?

How many views do you have? How many design docs? JS views or Erlang views?

Is there anything else running on these nodes?

Can you share your view code?

Can you share your couch.log?

Can you explain your document structure (total bytes, number of fields, attachments etc.)?

Can you describe your traffic pattern?

Can you describe any other pattern that leads up to the memory leak?

Does this happen on all nodes? If not, is there anything special about the affected nodes?

(Shameless plug: if you require professional assistance, my email footer has contact information.)

> On 31 Jan 2017, at 00:15, Tayven Bigelow <[email protected]> wrote:
>
> Hey Guys!
>
> We've been using a CouchDB 2.0 12-server cluster for a while now and have
> noticed a memory leak that causes beam.smp to crash while populating views.
>
> The q/r/w/n is set up as:
>
> [cluster]
> q=12
> r=2
> w=2
> n=3
>
> As far as I know, the server should be able to handle the load, as it has
> 64GB RAM with a Core i7 6700. We are running Ubuntu 16.04.1.
>
> The database is 16.5 GB in size.
>
> I've also attempted to run 2.0 with Dreyfus and Clouseau and ran into the
> same issue with a database size of 7.8MB.
>
> I've noticed that in previous releases some people have run into similar
> memory issues with beam.smp, and increasing the open file limit was part of
> the resolution. We've increased the nofile limit for the couchdb user to
> 4096 (as found here: https://wiki.apache.org/couchdb/Performance ) with no
> luck.
>
> Nothing out of the ordinary is thrown in the logs. The only way to catch it
> is by watching memory use.
>
> I'm wondering if there's a configuration/setting somewhere that I am missing
> that could be causing this issue.
>
> Thanks!
>
> Tayven
>
> All information in this message is confidential and may be legally
> privileged. If you are not the intended recipient, notify the sender
> immediately and destroy this email.

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
