Re: relayd memory usage when loading large URL lists

2015-03-04 Stuart Henderson
On 2015-03-01, Felipe Scarel fbsca...@gmail.com wrote:
 Now loading the phishing/domains URL list, which has about ~63k
 entries. relayd's parent process balloons to over 2GB memory usage
 (I'm assuming it's reading the URL lists and building a data structure
 for the relays),

Yes, it's building a red-black tree structure during startup. 

 So that's about ~520 MB of memory per relay process, out of 3 total.

This is probably shared (fork does copy-on-write, so forked processes can
just use the original memory unless they make changes to it). Try adjusting
the prefork number and check the free memory with top(1) rather than the
per-process memory with ps(1).
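
A minimal sketch of the point about ps(1), assuming a POSIX shell and awk: the hypothetical helper below adds up the RSS column (in KB) of the relay processes from `ps aux` output. Pages that the relays share copy-on-write are counted once per process in that sum, so it is only an upper bound; the free-memory figure in top(1) is the better measure.

```sh
#!/bin/sh
# Hypothetical helper (not part of relayd): sum the RSS column
# (field 6 of `ps aux`, in KB) for every "relayd: relay" process
# read from standard input. Shared copy-on-write pages are counted
# once per process, so the result is an upper bound.
sum_relay_rss() {
    awk '/relayd: relay/ { total += $6 } END { print total + 0 }'
}

# Example, run on the proxy host:
#   ps aux | sum_relay_rss
```

Fed the three relay lines from the ~118k-URL run (976768 + 976768 + 976760 KB), it prints 2930296, roughly 2.8 GB, which overstates real usage to the extent those pages are shared. Comparing that figure with top(1) while varying the prefork setting should indicate how much memory is really per-process.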



Re: relayd memory usage when loading large URL lists

2015-03-04 Felipe Scarel
On Wed, Mar 4, 2015 at 6:29 AM, Stuart Henderson s...@spacehopper.org wrote:
 On 2015-03-01, Felipe Scarel fbsca...@gmail.com wrote:
 Now loading the phishing/domains URL list, which has about ~63k
 entries. relayd's parent process balloons to over 2GB memory usage
 (I'm assuming it's reading the URL lists and building a data structure
 for the relays),

 Yes, it's building a red-black tree structure during startup.


Nice to know.

 So that's about ~520 MB of memory per relay process, out of 3 total.

 This is probably shared (fork does copy-on-write, so forked processes can
 just use the original memory unless they make changes to it). Try adjusting
 the prefork number and check the free memory with top(1) rather than the
 per-process memory with ps(1).


Alright, I'll do that. In other news, Reyk replied to me via Twitter
saying that relayd is not optimized for large blacklists yet. I'll
keep using the current version for the time being, as ~100k URLs are
enough for my current needs.

Thanks for your help!



Re: relayd memory usage when loading large URL lists

2015-03-02 Felipe Scarel
On Sun, Mar 1, 2015 at 4:45 PM, Felipe Scarel fbsca...@gmail.com wrote:

relayd memory usage when loading large URL lists

2015-03-01 Felipe Scarel
Hello all,

I'm implementing a simple SSL forward proxy using relayd.
Configuration went fine, as did testing. However, there seems to be
an issue with memory consumption.

To illustrate the issue, here is an excerpt of /etc/relayd.conf:

http protocol httpsfilter {
  tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
  return error

  match header set Keep-Alive value $TIMEOUT
  match header set Connection value close

  pass quick url file /etc/relayd.d/custom_whitelist
  block url file /etc/relayd.d/custom_blacklist
  include /etc/relayd.d/auto_blacklist

  ssl ca key  /etc/ssl/private/ca.key password password
  ssl ca cert /etc/ssl/ca.crt
}
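
The post does not show what /etc/relayd.d/auto_blacklist contains. Assuming it is simply one `block url file` rule per downloaded squidGuard list, a sketch like the following could generate it; the function name and the list directory are hypothetical.

```sh
#!/bin/sh
# Hypothetical generator for an include file such as
# /etc/relayd.d/auto_blacklist. Assumption: the include holds one
# "block url file" rule per squidGuard list, e.g. phishing/domains.
# Usage: make_auto_blacklist LISTDIR LIST...
make_auto_blacklist() {
    listdir=$1
    shift
    for list in "$@"; do
        # Quote the path so relayd.conf parses it as a single string.
        printf 'block url file "%s/%s"\n' "$listdir" "$list"
    done
}

# Example (paths are assumptions):
#   make_auto_blacklist /etc/relayd.d/blacklists phishing/domains adult/urls \
#       > /etc/relayd.d/auto_blacklist
#   relayd -n    # configtest mode: check the configuration only
```

Since every list loaded this way adds to each relay's memory, generating the include from an explicit list of categories makes it easy to load only what is needed.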

So basically it checks against a custom whitelist, then a custom
blacklist, and finally an auto blacklist (which is the main source
of the problem). A few URLs in the custom black/white lists pose no
problem, but when I load a somewhat bigger URL list downloaded from
the internet (I'm using
ftp://ftp.ut-capitole.fr/pub/reseau/cache/squidguard_contrib/blacklists.tar.gz),
I run into memory problems.
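
The rule order described above can be modelled with a small shell function. This is a simplified sketch, not relayd's implementation: it assumes one host per line in each list, exact matching, the usual relayd semantics that `quick` stops evaluation and otherwise the last matching rule wins, and a default action of pass. The function name is hypothetical.

```sh
#!/bin/sh
# Simplified model of the rule order in the httpsfilter excerpt:
#   pass quick url file <whitelist>
#   block url file <blacklist>
#   block url file <auto lists...>   (assumed contents of the include)
# Assumptions: one host per line, exact matching, default action pass.
# relayd's real URL matching is more elaborate than this.
# Usage: filter_decision HOST WHITELIST BLACKLIST [AUTO_LIST...]
filter_decision() {
    host=$1
    whitelist=$2
    blacklist=$3
    shift 3
    # "pass quick": a whitelist hit ends evaluation immediately.
    if grep -Fxq -- "$host" "$whitelist"; then
        echo pass
        return 0
    fi
    # Without "quick" the last matching rule wins; every remaining
    # rule is a block, so any hit means the request is blocked.
    verdict=pass
    for list in "$blacklist" "$@"; do
        if grep -Fxq -- "$host" "$list"; then
            verdict=block
        fi
    done
    echo "$verdict"
}
```

A host that appears in both the whitelist and an auto list is passed, because the `quick` rule is evaluated first.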

For example, here is relayd's memory usage when only the custom
white/black lists are loaded (2 URLs total, no big deal):

# ps aux | grep relayd
USER       PID %CPU %MEM    VSZ    RSS TT  STAT  STARTED    TIME COMMAND
_relayd  17238  0.0  0.1   1528   3208 ??  I      3:27PM 0:00.01 relayd: relay (relayd)
_relayd  14280  0.0  0.1   1524   3176 ??  I      3:27PM 0:00.02 relayd: relay (relayd)
_relayd  30448  0.0  0.1   1396   2812 ??  I      3:27PM 0:00.01 relayd: ca (relayd)
_relayd  10020  0.0  0.1   1376   2768 ??  I      3:27PM 0:00.01 relayd: ca (relayd)
_relayd  25775  0.0  0.1   1400   2852 ??  I      3:27PM 0:00.01 relayd: ca (relayd)
root       346  0.0  0.1   1912   3672 ??  Is     3:27PM 0:00.02 relayd: parent (relayd)
_relayd  15883  0.0  0.1   1440   2828 ??  I      3:27PM 0:00.01 relayd: pfe (relayd)
_relayd  32000  0.0  0.1   1220   2560 ??  I      3:27PM 0:00.01 relayd: hce (relayd)
_relayd   2677  0.0  0.1   1516   3188 ??  I      3:27PM 0:00.01 relayd: relay (relayd)

Now I load the phishing/domains URL list, which has about 63k
entries. relayd's parent process balloons to over 2GB of memory usage
(I'm assuming it's reading the URL lists and building a data structure
for the relays), and after that the relays stabilize at the
following memory usage:

# ps aux | grep relayd
USER       PID %CPU %MEM    VSZ    RSS TT  STAT  STARTED    TIME COMMAND
_relayd  12982  0.0 12.9 516728 526288 ??  S      3:31PM 0:03.44 relayd: relay (relayd)
_relayd   1206  0.0  0.1   1368   2836 ??  I      3:31PM 0:00.01 relayd: ca (relayd)
root     25673  0.0  2.7 155616 111228 ??  Is     3:31PM 0:16.35 relayd: parent (relayd)
_relayd  15513  0.0  0.1   1416   2832 ??  S      3:31PM 0:00.01 relayd: pfe (relayd)
_relayd  15643  0.0  0.1   1200   2560 ??  I      3:31PM 0:00.01 relayd: hce (relayd)
_relayd  25822  0.0 12.9 516716 526296 ??  S      3:31PM 0:03.37 relayd: relay (relayd)
_relayd  17950  0.0  0.1   1380   2824 ??  I      3:31PM 0:00.01 relayd: ca (relayd)
_relayd   9068  0.0  0.1   1360   2784 ??  I      3:31PM 0:00.01 relayd: ca (relayd)
_relayd  19666  0.0 12.9 516712 526292 ??  S      3:31PM 0:03.46 relayd: relay (relayd)

So that's about 520 MB of memory per relay process, with 3 relay
processes in total. Next I load another URL list alongside the
previous one: adult/urls, which contains roughly 55k entries. Together
with the previous list, that makes about 118k URLs for relayd to
process. The parent process takes a couple of minutes to process
everything, going over 4GB VSZ and 2.2GB RSS. Once it has finished,
here's what ps shows:

# ps aux | grep relayd
USER       PID %CPU %MEM    VSZ    RSS TT  STAT  STARTED    TIME COMMAND
_relayd   6332  0.0  0.1   1428   2228 ??  I      3:35PM 0:00.01 relayd: ca (relayd)
_relayd   8736  0.0 23.9 967808 976768 ??  I      3:35PM 0:06.81 relayd: relay (relayd)
_relayd  22890  0.0 23.9 967812 976768 ??  I      3:35PM 0:06.77 relayd: relay (relayd)
_relayd   5871  0.0 23.9 967804 976760 ??  I      3:35PM 0:06.33 relayd: relay (relayd)
_relayd   8199  0.0  0.1   1440   2256 ??  I      3:35PM 0:00.01 relayd: ca (relayd)
root      5571  0.0  5.3 315032 214796 ??  Is     3:35PM 1:28.45 relayd: parent (relayd)
_relayd  30781  0.0  0.1   1488   2136 ??  S      3:35PM 0:00.01 relayd: pfe (relayd)
_relayd   1502  0.0  0.0   1272   2040 ??  I      3:35PM 0:00.01 relayd: hce (relayd)
_relayd  29135  0.0  0.1   1432   2236 ??  I      3:35PM 0:00.01 relayd: ca (relayd)

Nearly 1GB of RAM per relay process, and ~214 MB for the parent
process. The server I'm working with has 4GB of RAM, so it can't go
much further. If I attempt to load the biggest URL list in the set,
adult/domains (slightly above 1 million entries), the server hangs
after a while and requires a hard reset.
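
A rough worked estimate, using only the ps figures quoted above: subtracting a baseline relay RSS of 3208 KB (custom lists only) from the loaded RSS and dividing by the approximate entry count gives about 8.3 KB per URL for both runs, so growth looks roughly linear. Projecting that to ~1 million entries is speculative; the worst case below also assumes nothing is shared between the three relay processes. The helper names are hypothetical, and the entry counts are the approximate ones from the post.

```sh
#!/bin/sh
# Back-of-the-envelope relay memory estimates from the ps figures.

# Per-entry cost in KB: (loaded RSS - baseline RSS) / entries.
per_entry_kb() {
    awk -v loaded="$1" -v base="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", (loaded - base) / n }'
}

# Linear projection in MB (1 MB = 1024 KB) for a number of entries
# and relay processes, assuming no pages are shared between them.
project_mb() {
    awk -v kb="$1" -v n="$2" -v procs="$3" \
        'BEGIN { printf "%.0f\n", kb * n * procs / 1024 }'
}

per_entry_kb 526288 3208 63000     # ~63k entries: prints 8.30
per_entry_kb 976768 3208 118000    # ~118k entries: prints 8.25
project_mb 8.25 1000000 1          # one relay, ~1M entries: prints 8057
project_mb 8.25 1000000 3          # three relays, no sharing: prints 24170
```

Even the single-process projection, roughly 8 GB, is about twice the RAM in this server, which is consistent with the hang when loading adult/domains.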

Is there any configuration parameter I'm missing here? I've reviewed
the