I figured that, but when you asked for the versions, it made me question myself!
So any ideas where I go from here?

Mike

-----Original Message-----
From: Robert Newson [mailto:[email protected]]
Sent: 16 April 2012 15:35
To: [email protected]
Subject: Re: BigCouch - Replication failing with Cannot Allocate memory

Bigcouch is built as an erlang release, so it includes all the bits of erlang needed to run. As part of the packaging, I also packaged spidermonkey, which should have been pulled in automatically.

B.

On 16 April 2012 15:32, Mike Kimber <[email protected]> wrote:
> I used the instructions on http://bigcouch.cloudant.com/use for RHEL/centos, so used yum to install, which installed bigcouch-0.4.0-1.
>
> I did not install Erlang and spidermonkey, as the above seemed to do it for me (I hope, or I'm going to look very stupid and it would be a miracle it's running at all!)
>
> Mike
>
> -----Original Message-----
> From: Robert Newson [mailto:[email protected]]
> Sent: 14 April 2012 14:35
> To: [email protected]
> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>
> Mike,
>
> Thanks for the logs, they do look clean, as you said.
>
> It was remiss of me not to ask for version numbers. Can you tell me which bigcouch version, erlang version, and spidermonkey version you have here?
>
> B.
>
> On 13 April 2012 21:18, Mike Kimber <[email protected]> wrote:
>> A clean log file (i.e. stop bigcouch, delete log file, restart bigcouch, run replication, wait for failure, stop bigcouch) from the node that failed this time around can be found at:
>>
>> http://pastebin.com/embed_js.php?i=s52rYwwy
>>
>> Mike
>>
>> -----Original Message-----
>> From: Robert Newson [mailto:[email protected]]
>> Sent: 13 April 2012 19:28
>> To: [email protected]
>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>
>> Mike,
>>
>> Do you have couch.logs from around that time?
>>
>> B.
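As an aside for readers following along: the version question above can usually be answered from a node's root endpoint, since the welcome JSON reports the CouchDB API version and (on BigCouch builds of this era) the BigCouch release. A minimal sketch parsing an illustrative response; the `bigcouch` field and the sample values are assumptions, not output captured from Mike's cluster:

```python
import json

# Illustrative welcome document, as returned by `curl http://localhost:5984/`
# on a BigCouch node. The "bigcouch" key is an assumption based on
# BigCouch 0.4-era behaviour.
welcome = '{"couchdb": "Welcome", "version": "1.1.1", "bigcouch": "0.4.0"}'

info = json.loads(welcome)
print("API version:", info["version"])
print("BigCouch release:", info.get("bigcouch", "unknown"))
```

The Erlang and SpiderMonkey versions bundled with a packaged release would need to be read from the package metadata (e.g. `rpm -q` on RHEL/CentOS) rather than over HTTP.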
>>
>> On 13 April 2012 17:54, Mike Kimber <[email protected]> wrote:
>>> Sorry, forgot to say that I have already upped it to N=3 and still get the same issue.
>>>
>>> I ran it again with the 6GB of RAM on each of the servers and ran vmstat, and got the following:
>>>
>>>  r  b swpd    free   buff  cache si so bi bo   in  cs us sy  id wa st
>>>  3  0    0 2067468  31816 302204  0  0  0  5 1820 360 63 32   5  0  0
>>>  2  0    0 2457728  31816 302212  0  0  0  2 2188 322 70 25   4  0  0
>>>  2  0    0 1936092  31816 302212  0  0  0  0 3020 200 73 24   3  0  0
>>>  2  0    0  687428  31816 302212  0  0  0  1 1958 368 56 42   2  0  0
>>>  2  0    0 2128192  31824 302212  0  0  0  2 2779 243 64 29   7  0  0
>>>  1  0    0 1829848  31824 302216  0  0  0  0 1734 280 68 29   3  0  0
>>>  1  0    0 1200300  31832 302216  0  0  0  8 1841 231 43 13  44  0  0
>>>  2  0    0 1638752  31840 302208  0  0  0  5 2625 350 71 20   8  0  0
>>>  3  0    0 1670856  31848 302216  0  0  0  3 2150 492 40 21  39  0  0
>>>  2  0    0 1020848  31848 302216  0  0  0  0 2307 644 67 22  11  0  0
>>>  1  0    0  271640  31848 302216  0  0  0  6 1995 280 54 42   4  0  0
>>>  1  0    0  455408  31848 302216  0  0  0  1 1879 238 64 33   3  0  0
>>>  2  0    0 1240616  25584 193044  0  0  0  2 2408 232 59 34   8  0  0
>>>  2  0    0  611280  25592 193036  0  0  0  3 2286 246 72 25   2  0  0
>>>  2  0    0  679548  25592 193044  0  0  0  2 3038 175 78 21   2  0  0
>>>  2  0    0  786360  25600 193044  0  0  0  3 1679 269 74 23   3  0  0
>>>  2  0    0  568632  25600 193044  0  0  0  0 2796 274 74 24   2  0  0
>>> eheap_alloc: Cannot allocate 1824525600 bytes of memory (of type "heap").
>>>  0  0    0 5749480  25600 193044  0  0  0  0 1389 160 33 15  52  0  0
>>>  0  0    0 5749956  25608 193044  0  0  0 10 1007  82  0  0 100  0  0
>>>  0  0    0 5749988  25616 193036  0  0  0  3 1016  85  0  0 100  0  0
>>>  0  0    0 5750020  25616 193044  0  0  0  0  998  79  0  0 100  0  0
>>>  0  0    0 5750168  25620 193040  0  0  0  1 1007  87  0  0 100  0  0
>>>  0  0    0 5750308  25620 193044  0  0  0  0 1008  82  0  0 100  0  0
>>>
>>> I really need to work out what each process is doing with respect to memory at the time of failure.
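For anyone post-processing a capture like the one above: the columns follow vmstat's default layout (r b swpd free buff cache si so bi bo in cs us sy id wa st), so a few lines of scripting can flag the samples where free memory collapses. A minimal sketch, using three rows from the capture above; the 600 MB threshold is arbitrary, for illustration only:

```python
# Flag vmstat samples where free memory (4th default column, in KB)
# drops below a threshold. Sample rows are from the capture above.
sample = """\
2 0 0 1020848 31848 302216 0 0 0 0 2307 644 67 22 11 0 0
1 0 0 271640 31848 302216 0 0 0 6 1995 280 54 42 4 0 0
1 0 0 455408 31848 302216 0 0 0 1 1879 238 64 33 3 0 0
"""

low_samples = []
for line in sample.splitlines():
    fields = line.split()
    free_kb = int(fields[3])  # 'free' is the fourth default vmstat column
    if free_kb < 600_000:     # arbitrary illustrative threshold
        low_samples.append(free_kb)

print(low_samples)
```

Note that vmstat only shows system-wide pressure; attributing it to a process needs something like `top -b` logged on every node, or periodic sampling of `/proc/<pid>/status` (VmRSS) for the beam.smp process.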
>>> I had top running, but not on the node that failed this time, sod's law :-)
>>>
>>> Mike
>>>
>>> -----Original Message-----
>>> From: Robert Newson [mailto:[email protected]]
>>> Sent: 13 April 2012 17:31
>>> To: [email protected]
>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>
>>> I should note that bigcouch is tested much more often with N=3. Perhaps there's something about N=1 that exacerbates the issue. For a test, could you try with N=3?
>>>
>>> B.
>>>
>>> On 13 April 2012 16:24, Mike Kimber <[email protected]> wrote:
>>>> "1. Try to replicate the database in another CouchDB."
>>>>
>>>> I have done this to a couchdb 1.2 database successfully. FYI, the source DB is a couchdb 1.1.1.
>>>>
>>>> I haven't done the other tests, but have tested replicating from the couchdb 1.2 database to the bigcouch install and got the same issue.
>>>>
>>>> Mike
>>>>
>>>> -----Original Message-----
>>>> From: CGS [mailto:[email protected]]
>>>> Sent: 13 April 2012 15:01
>>>> To: [email protected]
>>>> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>
>>>> If you say so, Robert, I won't argue with you on that. I meant no offense, so, please, accept my apologies if I crossed the line. It's all yours from now on.
>>>>
>>>> Mike, please, ignore my suggestion. Sorry for interfering.
>>>>
>>>> Good luck!
>>>>
>>>> CGS
>>>>
>>>> On Fri, Apr 13, 2012 at 3:19 PM, Robert Newson <[email protected]> wrote:
>>>>
>>>>> I think you should point out that "My idea behind these tests is that it may be that your database may be corrupted (or seen as corrupted by BigCouch at the second test) and what you get is just garbage at a certain document." is based on no evidence. Nor, if it were true, would it necessarily explain the observed behavior.
>>>>>
>>>>> It would be useful if we could all stick to asserting only things we know to be true or have reasonable grounds to believe are true. Unfounded speculation, though offered sincerely, is not helpful on a mailing list intended to provide assistance.
>>>>>
>>>>> Thanks,
>>>>> B.
>>>>>
>>>>> On 13 April 2012 13:55, CGS <[email protected]> wrote:
>>>>> > Hi Mike,
>>>>> >
>>>>> > I haven't used BigCouch until now, which is why I haven't said anything so far. Still, giving some thought to what may be occurring there, I propose a few tests if you have time:
>>>>> > 1. Try to replicate the database in another CouchDB.
>>>>> > 2. If 1 passes, try to replicate to only one node at a time.
>>>>> > 3. If 2 passes, increase the pool of nodes by 1 and repeat the replication (for sure it will fail with all 3 nodes at a time).
>>>>> >
>>>>> > My idea behind these tests is that it may be that your database may be corrupted (or seen as corrupted by BigCouch at the second test) and what you get is just garbage at a certain document. That's why I proposed the first test. The second test is to see if any of the nodes has a problem in configuration (or if there is any incompatibility between your CouchDB and BigCouch in manipulating your docs). Finally, the third test is to see if server/node resources limit the number of replications (and at how many it starts to fail).
>>>>> >
>>>>> > Can you also check the size of the shards at tests 2 and 3?
>>>>> >
>>>>> > If you consider these tests irrelevant, please ignore my suggestion.
>>>>> >
>>>>> > CGS
>>>>> >
>>>>> > On Fri, Apr 13, 2012 at 1:27 PM, Mike Kimber <[email protected]> wrote:
>>>>> >
>>>>> >> I upped the memory to 6GB on each of the nodes and got exactly the same issue in the same time frame, i.e.
>>>>> >> the increased RAM did not seem to buy me any additional time.
>>>>> >>
>>>>> >> Mike
>>>>> >>
>>>>> >> -----Original Message-----
>>>>> >> From: Robert Newson [mailto:[email protected]]
>>>>> >> Sent: 12 April 2012 19:34
>>>>> >> To: [email protected]
>>>>> >> Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >>
>>>>> >> 2GB total RAM does sound tight. I can only compare to high-volume production clusters, which have much more RAM than this. Given that beam.smp wanted 1.4 GB and you have 2 GB total, do you know where the rest went? To couchjs processes, by chance? If so, you can reduce the maximum size of that pool in config; I think the default is 50.
>>>>> >>
>>>>> >> On 12 April 2012 18:32, Mike Kimber <[email protected]> wrote:
>>>>> >> > Ok, I have 3 nodes all load balanced with HAproxy:
>>>>> >> >
>>>>> >> > Centos 5.8 (Virtualised)
>>>>> >> > 2 Cores
>>>>> >> > 2GB RAM
>>>>> >> >
>>>>> >> > I'm trying to replicate about 75K documents which total 6GB when compacted (on Couchdb 1.2, which has compression turned on). I'm told they are fairly large documents.
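For reference, the couchjs pool cap Robert mentions lives in the server's ini config. In CouchDB's query-server settings the key is `os_process_limit`; whether BigCouch 0.4 reads the same section is an assumption here, so verify against the shipped default.ini. A hedged local.ini sketch:

```ini
; Hedged sketch only: cap the number of couchjs view-server processes.
; Section and key follow CouchDB's query-server config; confirm against
; the BigCouch 0.4 default.ini before relying on it.
[query_server_config]
os_process_limit = 10
```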
>>>>> >> >
>>>>> >> > When it goes pear shaped, vmstat shows a lot of memory being used:
>>>>> >> >
>>>>> >> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>>>> >> >  r  b   swpd   free buff cache    si    so    bi    bo   in   cs us sy id wa st
>>>>> >> >  1  2 570576   8808  140  7208  2998  2249  3154  2249 1234  569  1  6  2 91  0
>>>>> >> >  0  2 569656   9156  156  7504  2330  1899  2405  1904 1246  595  1  5  9 85  0
>>>>> >> >  1  1 575412   9516  236 14928  1549  2261  3242  2261 1237  593  1  7  1 91  0
>>>>> >> >  0  2 607092  13220  168  8156  3772  9012  3871  9017 1284  714  1 10  4 85  0
>>>>> >> >  1  0 444336 857004  220 10212  5781     0  6202     0 1574 1010 13  7 33 47  0
>>>>> >> >  1  0 442176 870684  428 11052  2049     0  2208   140 2561 1541 17  8 49 26  0
>>>>> >> >  0  0 442176 813140  460 11968   170     0   348     0 2672 1565 25  9 61  4  0
>>>>> >> >  0  1 442176 744972  484 12224  5440     0  5493     7 2432  900  8  4 49 40  0
>>>>> >> >  0  1 442176 714048  484 12296  4547     0  4547     0 1799  827  4  2 50 44  0
>>>>> >> >  0  1 442176 686304  496 12688  5128     0  5222     0 1696  999  9  2 50 40  0
>>>>> >> >  0  3 444000   8712  444 12876   299   368   331   380 1294  188 22 20 36 23  0
>>>>> >> >  0  3 469340  10040  116  7336    29  5087    74  5090 1232  268  3 22  0 75  0
>>>>> >> >  1  2 584356  10220  124  6744 11367 28722 11370 28722 1643 1300  5 19 17 59  0
>>>>> >> >  0  1 624908  10640  132  7036  6518 12879  6590 12884 1296  717  3 10 29 58  0
>>>>> >> >  0  2 652556  10948  252 14776  3799  9494  5459  9494 1294  646  2  9 32 57  0
>>>>> >> >  0  2 677784  10648  244 14528  3819  8196  3819  8201 1274  588  2  7 30 61  0
>>>>> >> >  0  2 688460   9512  212  8224  3013  4522  3125  4522 1379  519  2  7  6 84  0
>>>>> >> >  0  3 699164   9888  208  8468  2192  4014  2228  4014 1302  495  1  6 11 83  0
>>>>> >> >  2  0 713104   9004  144  9192  2606  4490  2848  4490 1350  487  1  8 16 75  0
>>>>> >> >
>>>>> >> > It only ever takes out one node at a time, and the other nodes seem to be doing very little while the one node is running out of memory.
>>>>> >> >
>>>>> >> > If I kick it off again, it processes some more, then spikes the memory and fails.
>>>>> >> >
>>>>> >> > Thanks
>>>>> >> >
>>>>> >> > Mike
>>>>> >> >
>>>>> >> > PS: hope you enjoyed your CouchDB get-together!
>>>>> >> >
>>>>> >> > -----Original Message-----
>>>>> >> > From: Robert Newson [mailto:[email protected]]
>>>>> >> > Sent: 12 April 2012 17:28
>>>>> >> > To: [email protected]
>>>>> >> > Subject: Re: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >> >
>>>>> >> > What kind of load were you putting the machine under?
>>>>> >> >
>>>>> >> > On 12 April 2012 17:24, Robert Newson <[email protected]> wrote:
>>>>> >> >> Could you show your vm.args file?
>>>>> >> >>
>>>>> >> >> On 12 April 2012 17:23, Robert Newson <[email protected]> wrote:
>>>>> >> >>> Unfortunately your request for help coincided with the two-day CouchDB Summit. #cloudant and the Issues tab on cloudant/bigcouch are other ways to get bigcouch support, but we happily answer queries here too, when not at the Model UN of CouchDB. :D
>>>>> >> >>>
>>>>> >> >>> B.
>>>>> >> >>>
>>>>> >> >>> On 12 April 2012 17:10, Mike Kimber <[email protected]> wrote:
>>>>> >> >>>> Looks like this isn't the right place based on the responses so far. Shame, I hoped this was going to help solve our index/view rebuild times etc.
>>>>> >> >>>>
>>>>> >> >>>> Mike
>>>>> >> >>>>
>>>>> >> >>>> -----Original Message-----
>>>>> >> >>>> From: Mike Kimber [mailto:[email protected]]
>>>>> >> >>>> Sent: 10 April 2012 09:20
>>>>> >> >>>> To: [email protected]
>>>>> >> >>>> Subject: BigCouch - Replication failing with Cannot Allocate memory
>>>>> >> >>>>
>>>>> >> >>>> I'm not sure if this is the correct place to raise an issue I am having with replicating a standalone couchdb 1.1.1 to a 3-node BigCouch cluster. If this is not the correct place, please point me in the right direction; if it is, does anyone have any ideas why I keep getting the following error message when I kick off a replication:
>>>>> >> >>>>
>>>>> >> >>>> eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "heap").
>>>>> >> >>>>
>>>>> >> >>>> My set-up is:
>>>>> >> >>>>
>>>>> >> >>>> Standalone couchdb 1.1.1 running on Centos 5.7
>>>>> >> >>>>
>>>>> >> >>>> 3-node BigCouch cluster running on Centos 5.8 with the following local.ini overrides, pulling from the standalone couchdb (78K documents)
>>>>> >> >>>>
>>>>> >> >>>> [httpd]
>>>>> >> >>>> bind_address = XXX.XX.X.XX
>>>>> >> >>>>
>>>>> >> >>>> [cluster]
>>>>> >> >>>> ; number of shards for a new database
>>>>> >> >>>> q = 9
>>>>> >> >>>> ; number of copies of each shard
>>>>> >> >>>> n = 1
>>>>> >> >>>>
>>>>> >> >>>> [couchdb]
>>>>> >> >>>> database_dir = /other/bigcouch/database
>>>>> >> >>>> view_index_dir = /other/bigcouch/view
>>>>> >> >>>>
>>>>> >> >>>> The error is always generated on the third node in the cluster, and the server basically maxes out on memory beforehand. The other nodes seem to be doing very little, but are getting data, i.e. the shard sizes are growing. I've put the copies per shard down to 1, as currently I'm not interested in resilience.
>>>>> >> >>>>
>>>>> >> >>>> Any help would be greatly appreciated.
>>>>> >> >>>>
>>>>> >> >>>> Mike
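A closing back-of-envelope on the settings quoted in the thread (q = 9, n = 1, 3 nodes, roughly 75K documents and 6 GB compacted): the per-shard figures come out modest, which is worth comparing against the 1.4-1.8 GB single allocations reported in the eheap_alloc errors. A sketch of the arithmetic:

```python
# Cluster figures as reported in the thread: q shards, n copies,
# spread over 3 nodes; ~75K docs totalling ~6 GB compacted.
q, n, nodes = 9, 1, 3
docs, total_gb = 75_000, 6.0

shards_total = q * n                     # 9 shard files in total
shards_per_node = shards_total // nodes  # shards hosted on each node
docs_per_shard = docs / shards_total
gb_per_shard = total_gb / shards_total

print(shards_per_node)         # 3
print(round(docs_per_shard))   # 8333
print(round(gb_per_shard, 2))  # 0.67
```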
