Re: relayd memory usage when loading large URL lists
On 2015-03-01, Felipe Scarel fbsca...@gmail.com wrote:
> Now loading the phishing/domains URL list, which has about 63k
> entries. relayd's parent process balloons to over 2GB memory usage
> (I'm assuming it's reading the URL lists and building a data
> structure for the relays),

Yes, it's building a red-black tree structure during startup.

> So that's about 520 MB of memory per relay process, out of 3 total.

This is probably shared (fork does copy-on-write, so forked processes can just use the original memory unless they make changes to it). Try adjusting the prefork number and check the free memory with top(1) rather than the per-process memory with ps(1).
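To illustrate the copy-on-write point, here is a minimal sketch in Python (not relayd code; the function name and the 8 MiB payload are made up for the example). The parent builds a buffer once, forks, and the child reads it without the data ever being passed or copied explicitly. On systems with copy-on-write fork, pages that neither process writes to stay shared, which is why adding up per-process RSS from ps(1) can overstate real memory use.

```python
import os


def child_reads_parent_data(data: bytes) -> int:
    """Fork a child that reads `data` (built by the parent) and reports a checksum."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only reads the inherited buffer, never writes to it.
        os.close(read_fd)
        checksum = sum(data[::4096])  # sample one byte every 4 KiB
        os.write(write_fd, str(checksum).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: collect the child's answer and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        result = int(reader.read())
    os.waitpid(pid, 0)
    return result


# Hypothetical stand-in for a large structure built before forking.
payload = bytes([7]) * (8 * 1024 * 1024)
expected = sum(payload[::4096])
print(child_reads_parent_data(payload) == expected)
```

This only shows that the child sees the parent's data after fork; it does not measure how many pages remain shared. That depends on the kernel and on whether either process later writes to those pages.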
Re: relayd memory usage when loading large URL lists
On Wed, Mar 4, 2015 at 6:29 AM, Stuart Henderson s...@spacehopper.org wrote:
> On 2015-03-01, Felipe Scarel fbsca...@gmail.com wrote:
>> Now loading the phishing/domains URL list, which has about 63k
>> entries. relayd's parent process balloons to over 2GB memory usage
>> (I'm assuming it's reading the URL lists and building a data
>> structure for the relays),
>
> Yes, it's building a red-black tree structure during startup.

Nice to know.

>> So that's about 520 MB of memory per relay process, out of 3 total.
>
> This is probably shared (fork does copy-on-write, so forked processes
> can just use the original memory unless they make changes to it).
> Try adjusting the prefork number and check the free memory with
> top(1) rather than the per-process memory with ps(1).

Alright, I'll do that.

In other news, Reyk replied to me via Twitter saying that relayd is not optimized for large blacklists yet. I'll keep using the current version for the time being, as ~100k URLs are sufficient for my current needs.

Thanks for your help!
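As a rough cross-check, the ps figures from the original message can be turned into a per-entry memory estimate. This is a back-of-the-envelope sketch, not a measurement: the entry counts are the approximate ones given in the thread, the baseline is the relay RSS from the run with only the two custom URLs loaded, and linear scaling with list size is an assumption.

```python
# RSS values in KB, taken from the ps output in the original message.
BASELINE_RSS_KB = 3188  # relay process with only the two custom URLs loaded


def kb_per_entry(rss_kb: int, entries: int) -> float:
    """Memory attributable to each URL entry, assuming linear growth."""
    return (rss_kb - BASELINE_RSS_KB) / entries


def projected_rss_kb(entries: int, per_entry_kb: float) -> float:
    """Extrapolated relay RSS for a list of the given size (assumption: linear)."""
    return BASELINE_RSS_KB + entries * per_entry_kb


# Approximate entry counts as stated in the thread.
per_entry_63k = kb_per_entry(526288, 63_000)
per_entry_118k = kb_per_entry(976768, 118_000)

# adult/domains is described as slightly above 1 million entries.
projected = projected_rss_kb(1_000_000, per_entry_118k)

print(f"{per_entry_63k:.2f} KB/entry at ~63k entries")
print(f"{per_entry_118k:.2f} KB/entry at ~118k entries")
print(f"{projected / (1024 * 1024):.1f} GB projected for ~1M entries")
```

Under these assumptions both runs come out at roughly 8 KB per entry, so a list of about a million entries would need on the order of 8 GB, well beyond the 4GB in this server, whether or not the relays share pages.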
Re: relayd memory usage when loading large URL lists
On Sun, Mar 1, 2015 at 4:45 PM, Felipe Scarel fbsca...@gmail.com wrote:
> [...]
relayd memory usage when loading large URL lists
Hello all,

I'm implementing a simple SSL forward proxy using relayd. Configuration has been fine, as was testing. There seems to be one issue with memory consumption, however.

To better illustrate my issue, here follows an excerpt of /etc/relayd.conf:

    http protocol httpsfilter {
        tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
        return error
        match header set Keep-Alive value $TIMEOUT
        match header set Connection value close
        pass quick url file /etc/relayd.d/custom_whitelist
        block url file /etc/relayd.d/custom_blacklist
        include /etc/relayd.d/auto_blacklist
        ssl ca key /etc/ssl/private/ca.key password password
        ssl ca cert /etc/ssl/ca.crt
    }

So basically it checks against a custom whitelist, then a custom blacklist, and finally an auto blacklist (which is the main source of the problem). Using a few URLs with both custom black/white lists poses no issue, but when attempting to load a somewhat bigger URL list downloaded from the internet (I'm using ftp://ftp.ut-capitole.fr/pub/reseau/cache/squidguard_contrib/blacklists.tar.gz) I run into memory problems.

For example, here is relayd's memory usage when only the custom white/black lists are loaded (2 URLs total, no big deal):

    # ps aux | grep relayd
    USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND
    _relayd 17238  0.0  0.1   1528   3208 ??  I     3:27PM 0:00.01 relayd: relay (relayd)
    _relayd 14280  0.0  0.1   1524   3176 ??  I     3:27PM 0:00.02 relayd: relay (relayd)
    _relayd 30448  0.0  0.1   1396   2812 ??  I     3:27PM 0:00.01 relayd: ca (relayd)
    _relayd 10020  0.0  0.1   1376   2768 ??  I     3:27PM 0:00.01 relayd: ca (relayd)
    _relayd 25775  0.0  0.1   1400   2852 ??  I     3:27PM 0:00.01 relayd: ca (relayd)
    root      346  0.0  0.1   1912   3672 ??  Is    3:27PM 0:00.02 relayd: parent (relayd)
    _relayd 15883  0.0  0.1   1440   2828 ??  I     3:27PM 0:00.01 relayd: pfe (relayd)
    _relayd 32000  0.0  0.1   1220   2560 ??  I     3:27PM 0:00.01 relayd: hce (relayd)
    _relayd  2677  0.0  0.1   1516   3188 ??  I     3:27PM 0:00.01 relayd: relay (relayd)

Now loading the phishing/domains URL list, which has about 63k entries. relayd's parent process balloons to over 2GB memory usage (I'm assuming it's reading the URL lists and building a data structure for the relays), and after that the relays stabilize with the following memory usage:

    # ps aux | grep relayd
    USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND
    _relayd 12982  0.0 12.9 516728 526288 ??  S     3:31PM 0:03.44 relayd: relay (relayd)
    _relayd  1206  0.0  0.1   1368   2836 ??  I     3:31PM 0:00.01 relayd: ca (relayd)
    root    25673  0.0  2.7 155616 111228 ??  Is    3:31PM 0:16.35 relayd: parent (relayd)
    _relayd 15513  0.0  0.1   1416   2832 ??  S     3:31PM 0:00.01 relayd: pfe (relayd)
    _relayd 15643  0.0  0.1   1200   2560 ??  I     3:31PM 0:00.01 relayd: hce (relayd)
    _relayd 25822  0.0 12.9 516716 526296 ??  S     3:31PM 0:03.37 relayd: relay (relayd)
    _relayd 17950  0.0  0.1   1380   2824 ??  I     3:31PM 0:00.01 relayd: ca (relayd)
    _relayd  9068  0.0  0.1   1360   2784 ??  I     3:31PM 0:00.01 relayd: ca (relayd)
    _relayd 19666  0.0 12.9 516712 526292 ??  S     3:31PM 0:03.46 relayd: relay (relayd)

So that's about 520 MB of memory per relay process, out of 3 total.

Next I load another URL list alongside the previous one, the adult/urls list, which contains roughly 55k entries. Added to the previous list, that makes roughly 118k URLs for relayd to process. The parent process takes a couple of minutes to process everything, going over 4GB VSZ and 2.2GB RSS. After all's said and done, here's what's shown by ps:

    # ps aux | grep relayd
    USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND
    _relayd  6332  0.0  0.1   1428   2228 ??  I     3:35PM 0:00.01 relayd: ca (relayd)
    _relayd  8736  0.0 23.9 967808 976768 ??  I     3:35PM 0:06.81 relayd: relay (relayd)
    _relayd 22890  0.0 23.9 967812 976768 ??  I     3:35PM 0:06.77 relayd: relay (relayd)
    _relayd  5871  0.0 23.9 967804 976760 ??  I     3:35PM 0:06.33 relayd: relay (relayd)
    _relayd  8199  0.0  0.1   1440   2256 ??  I     3:35PM 0:00.01 relayd: ca (relayd)
    root     5571  0.0  5.3 315032 214796 ??  Is    3:35PM 1:28.45 relayd: parent (relayd)
    _relayd 30781  0.0  0.1   1488   2136 ??  S     3:35PM 0:00.01 relayd: pfe (relayd)
    _relayd  1502  0.0  0.0   1272   2040 ??  I     3:35PM 0:00.01 relayd: hce (relayd)
    _relayd 29135  0.0  0.1   1432   2236 ??  I     3:35PM 0:00.01 relayd: ca (relayd)

Nearly 1GB of RAM per relay process, and about 214 MB for the parent process. The server I'm working with has 4GB of RAM, so it can't go much further. If I attempt to load the biggest URL list from the set, adult/domains (slightly above 1 million entries), the server hangs after a while and requires a hard reset.

Is there any configuration parameter I'm missing here? I've reviewed the