Oh jeeze. I'm so sorry Joan! I saw the J in the name and just ran with it.

Some additional info I've gotten since yesterday. Following Joan's 
recommendation to swap ERL_MAX_PORTS for +Q, I decided to look into other 
Erlang performance gains we might have been missing. I've since increased the 
asynchronous I/O threads from 16 to 600. With this running we can process one 
view without it crashing; however, when I start running more than one view, 
memory use climbs and the node eventually crashes.


> This is unusual. And you don't see similar high memory use for couchjs 
> processes?

Correct. Looking at the tree info via htop, the beam.smp processes are the only 
ones using a high amount of memory.
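
To quantify that, here's a rough ps/awk sketch (assuming GNU/Linux ps output) 
that totals resident memory per command name, so beam.smp can be compared 
against couchjs directly:

```shell
# Sum resident set size (RSS, reported by ps in KB) per command name,
# then print the top consumers in MB. Field layout assumes Linux ps.
ps -eo rss=,comm= |
  awk '{ mem[$2] += $1 }
       END { for (c in mem) printf "%.1f MB\t%s\n", mem[c] / 1024, c }' |
  sort -rn | head
```

Adjust the -eo fields if your ps differs.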


> Using netstat, how many active connections do you have open on a server when 
> beam.smp is eating lots of RAM? On which ports? A summary report would be 
> useful.

These were taken while part of the cluster was crashing and the current server 
was using all of its RAM and 50% of its swap.
netstat -s results in:
https://gist.github.com/anonymous/e240084267242763caf85ea37704cfe7

netstat -apA inet | grep beam.smp results in:
https://gist.github.com/anonymous/86e8a4a3943d81ed74f7a6124ce79288
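
In case a quick summary helps, here's a sketch of how I'd tally established 
connections per local port (field positions assume netstat -tan style output; 
the sample lines below are just illustrative):

```shell
# Count ESTABLISHED connections per local port. In practice, replace the
# printf of sample lines with real output, e.g.: netstat -tan | tail -n +3
printf '%s\n' \
  'tcp 0 0 10.0.0.1:5984 10.0.0.2:41000 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:5984 10.0.0.3:41001 ESTABLISHED' \
  'tcp 0 0 10.0.0.1:4369 10.0.0.3:41002 ESTABLISHED' |
  awk '$6 == "ESTABLISHED" { split($4, a, ":"); n[a[2]]++ }
       END { for (p in n) print p, n[p] }' | sort
# prints: 4369 1
#         5984 2
```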

> How many databases do you actually have on the machine? You mention two 
> databases, but it's unclear to me how many are actually resident. This 
> includes any backup copies of your database you may have in place visible to 
> CouchDB.
At the moment, including _global_changes, admin, _users, _replicator, and 
_metadata, we have 11 databases. We hope to replicate our current Cloudant 
setup, which would have 185 databases.

> Is this a view you've ever run in production on Cloudant, or something new 
> you're trying only on your local instance? Is this view perhaps using the 
> experimental nodejs view server?
This view is something we've run for a while. It began on a locally hosted 
CouchDB 1.6.1 instance before being run on Cloudant. It's not using the Node.js 
view server; we only included that to try out Clouseau/Dreyfus full-text 
search, and at the moment it's not being used.

> Are you launching couchdb with any special flags being passed to the Erlang 
> erl process besides ERL_MAX_PORTS?
Prior to this response, the only change from stock 2.0.0 was forcing the port 
range. I've since increased the asynchronous I/O threads to 600.
The full arguments currently passed via vm.args are:

-name <nodename>
-setcookie <cookie>
-kernel error_logger silent
-sasl sasl_error_logger false
-kernel inet_dist_listen_max 9500
-kernel inet_dist_listen_min 9000
+K true
+Bd
-noinput
+A 600
+Q 4096

> monitor output of erlang:memory().
-- Note that this is after changing to +Q and increasing +A --
https://gist.github.com/anonymous/720f3e1aa8f444fafe586111bfa81cc4

> Output of ets:i().

At the start (no high memory use):
https://gist.github.com/anonymous/20e7fa4387d6d116f2cafc378cf9c9b0

During high memory use:
https://gist.github.com/anonymous/5aa63ec31261402f5750d681d842229b


> and, if you add Fred's recon to your install

> I recommend reducing or disabling swap for best performance.

> Another option is to edit the oom adjuster for beam.smp versus other 
> processes, such as couchjs. This is done through the tunable 
> /proc/<pid>/oom_score_adj, setting the value strongly negative for beam.smp 
> (range is -1000 to +1000) and setting it mildly higher for couchjs.


I'll work on adding these in :)
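
For the oom_score_adj tuning, I'm thinking something along these lines (the 
-900/200 values are just a first guess within the -1000..+1000 range you 
mentioned; needs root, and has to be re-applied whenever the processes 
restart):

```shell
# Bias the kernel OOM killer away from beam.smp and toward couchjs.
# Re-run after restarts (e.g. from a cron job or a startup wrapper).
for pid in $(pgrep -x beam.smp); do
  echo -900 > "/proc/$pid/oom_score_adj"
done
for pid in $(pgrep -x couchjs); do
  echo 200 > "/proc/$pid/oom_score_adj"
done
```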


We're currently working on building a DB with non-sensitive material in case 
you would like to try it yourself.


Thanks for the recommended reading material as well. If it isn't obvious, I 
haven't had much experience with Erlang.


-Tayven


________________________________
From: Joan Touzet <[email protected]>
Sent: Tuesday, January 31, 2017 2:00:43 PM
To: [email protected]
Cc: Nick Becker; Tayven Bigelow
Subject: Re: Crashing due to memory use

Hi Tayven,

> Jan,

Joan, actually. Jan is also on this thread. :)

A few things stand out here. I'm going to heavily trim your
emails for clarity.

----- Original Message -----

> At the time of crash the kernel is reporting that beam.smp is
> consuming 62G of memory + 32G of swap.

This is unusual. And you don't see similar high memory use for
couchjs processes?

I recommend reducing or disabling swap for best performance.

Another option is to edit the oom adjuster for beam.smp versus
other processes, such as couchjs. This is done through the
tunable /proc/<pid>/oom_score_adj, setting the value strongly
negative for beam.smp (range is -1000 to +1000) and setting it
mildly higher for couchjs. Documentation for this is at

https://www.kernel.org/doc/Documentation/filesystems/proc.txt

in section 3.1.

> In Local.ini the changes from the base file are:
[snip]

>  max_connections = 1024
Presuming this is in your [httpd] section, it won't have much
effect, since this only affects the old interface (running on
port 5986).

Using netstat, how many active connections do you have open on a
server when beam.smp is eating lots of RAM? On which ports? A
summary report would be useful.

>  max_dbs_open = 500

How many databases do you actually have on the machine? You
mention two databases, but it's unclear to me how many are actually
resident. This includes any backup copies of your database you may
have in place visible to CouchDB.

>  nodejs = /usr/local/bin/node /home/couchdb/couchdb/share/server/main.js

Apache CouchDB considers this view server experimental. You run it
at your own risk. Though, if this was at fault, I'd expect to see
nodejs processes consuming more RAM and CPU resources than beam.smp
itself. Also, you'd have to be declaring your view's language as
nodejs instead of javascript, which you're not doing per your sample
design document.

> The memory leak happens when we kick off a new view.

Is this a view you've ever run in production on Cloudant, or
something new you're trying only on your local instance? Is this
view perhaps using the experimental nodejs view server?

-----

Are you launching couchdb with any special flags being passed to
the Erlang erl process besides ERL_MAX_PORTS?

Note that in recent versions of Erlang, ERL_MAX_PORTS has been
replaced by the +Q flag. ERL_MAX_PORTS has no effect on these
newer versions. Check the documentation for your specific version
of Erlang.

Recommendation:

If you're going to be running a big cluster on your own, read Fred
Hebert's great free book Stuff Goes Bad: Erlang in Anger.

  http://www.erlang-in-anger.com/

and pay special attention to chapters 4, 5 & 7. Specifically, if
you can get on the node during periods of high memory usage with a
remsh:

$ erl -setcookie <cookie> -name tayven@localhost \
  -remsh couchdb@localhost -hidden

and at least monitor the output of:

  1> erlang:memory().
  2> ets:i().

and, if you add Fred's recon to your install,

  3> recon:proc_count(memory, 3).
  4> recon:proc_count(binary_memory, 3).

we'll know more.

We don't have a smoking gun yet, but hopefully with more data, we
can help you narrow in on one.

-Joan