Hi Brian,

Here is the output of ulimit -a on my system:
r...@ip-10-250-55-239:/tmp/collectl-3.3.6# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

It doesn't look to me like I am limited. But if someone who knows these configuration items better sees something that might be relevant, please chime in (the max locked memory and stack size limits, perhaps?).

Regarding the process size, Paul and I worked offline to talk through lots of specifics. We ran the little high-water monitoring script I posted about earlier, and it did not show any significant growth of the beam process. So the good news is that there did not appear to be an obvious leak; but ultimately we never did get to an understanding of what was causing the sudden VM death.

Since I was never able to get through a full indexing run, in the name of expediency I am going to take a different approach. I'll be starting my migration over again to build a new CouchDB database. This time I will be using bulk inserts and sequential IDs from the get-go (last time I started using them about halfway through). In addition, I am going to bulk insert 1,000 docs at a time, and every million docs (I have 28 million) I will request the view to force an indexing pass, as well as run periodic database and view compaction. Hopefully this will build up the database and views concurrently and avoid the problems we were having building an index from scratch against a 58GB database, which I was never able to complete without VM death. Unfortunately we didn't get to the bottom of this to understand the root cause.
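For anyone curious, the bulk-insert/checkpoint loop I'm describing looks roughly like this. This is only a sketch: the endpoints follow CouchDB's HTTP API (_bulk_docs, _compact, _compact/&lt;ddoc&gt;), but the constants, URL, and helper names are my own placeholders, not the actual script.

```ruby
require 'net/http'
require 'json'
require 'uri'

BATCH_SIZE = 1_000       # docs per _bulk_docs request
VIEW_EVERY = 1_000_000   # force a view indexing pass every million docs

# Sequential, zero-padded IDs keep b-tree inserts append-mostly
# instead of scattering writes across the tree.
def seq_id(n)
  format('%012d', n)
end

# Slice an array of doc hashes into _bulk_docs-sized batches.
def batches(docs, size = BATCH_SIZE)
  docs.each_slice(size).to_a
end

def post_json(url, payload)
  Net::HTTP.post(URI(url), JSON.generate(payload),
                 'Content-Type' => 'application/json')
end

# One bulk insert: POST /db/_bulk_docs with {"docs": [...]}
def bulk_insert(db_url, docs)
  post_json("#{db_url}/_bulk_docs", 'docs' => docs)
end

# Querying the view forces the indexer to catch up; then kick off
# database compaction and view (design doc) compaction.
def checkpoint(db_url, ddoc, view)
  Net::HTTP.get(URI("#{db_url}/_design/#{ddoc}/_view/#{view}?limit=0"))
  post_json("#{db_url}/_compact", {})
  post_json("#{db_url}/_compact/#{ddoc}", {})
end
```

A driver loop would read rows from MySQL, assign each doc an _id via seq_id(n), accumulate BATCH_SIZE docs before calling bulk_insert, and call checkpoint every VIEW_EVERY docs.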
Some additional steps I will take which will hopefully help:

- I will install CouchDB itself on the 'local' ephemeral disk instead of on the EBS volume.
- I will create a new EBS volume dedicated to CouchDB, which will hopefully provide more IO throughput: I can read from MySQL on one EBS volume and write the CouchDB database/views to another.
- I will write zeros across the entirety of the EBS volume before first use. This apparently alleviates a 5x write penalty on the *first* write to any block on an EBS volume.

I'll post an update once I am running and if all is looking well. I can also potentially share my Ruby migration script if anyone is interested.

I very much appreciate the tremendous amount of help that this list has provided, and I especially want to thank Paul Davis for his interest in getting to the heart of this and the several hours he spent screen sharing with me to help co-diagnose the root cause.

Cheers,
Glenn

On Wed, Oct 7, 2009 at 12:39 AM, Brian Candler <[email protected]> wrote:
> On Tue, Oct 06, 2009 at 01:36:13PM -0700, Glenn Rempe wrote:
>> No. I have not noticed any correlation with the time. Sometimes I
>> have seen it run during the day and die as well. I've seen it die
>> lots of times... ;-) It seems like it is always dying though
>> somewhere between 2 and 6 million records processed out of 28 mm
>> (which might support the theory of memory starvation of some kind if
>> it is holding some of those records in memory unintentionally, even
>> though top reports nothing more than 4GB out of 15GB being used).
>
> You might have a per-process memory limit of some sort: either ulimit (see
> "ulimit -a"); or a hard-coded limitation in your O/S which limits a single
> process to 4GB, for example; or conceivably the erlang VM could have a 4GB
> limit.
>
> [I do vaguely remember something about people saying you should build erlang
> in 32-bit mode even under a 64-bit OS, but I could well have that wrong]
>
> Either way, if your process memory usage is continually growing and also
> approaching 4GB, I would be concerned. I don't see any reason why building a
> view should take an increasing amount of memory. It sounds like a leak.
>
> Regards,
>
> Brian.

--
Glenn Rempe

email   : [email protected]
voice   : (415) 894-5366 or (415)-89G-LENN
twitter : @grempe
contact : http://www.rempe.us/contact.html
pgp     : http://www.rempe.us/gnupg.txt
