Re: smtpd processes congregating at the pub
Noel Jones put forth on 1/29/2010 8:44 AM:
> On 1/29/2010 1:37 AM, Stan Hoeppner wrote:
>>> Local shows very speedy delivery.  Is this "long" smtpd process
>>> lifespan normal for 2.5.5 or did I do something screwy/wrong in my
>>> config?
>>>
>>> relay=local, delay=2.2, delays=2.2/0/0/0.01, dsn=2.0.0, status=sent
>>> relay=local, delay=0.32, delays=0.29/0.02/0/0, dsn=2.0.0, status=sent
>>> relay=local, delay=0.77, delays=0.75/0.03/0/0, dsn=2.0.0, status=sent
>>> relay=local, delay=0.26, delays=0.25/0/0/0.01, dsn=2.0.0, status=sent
>>> relay=local, delay=0.64, delays=0.62/0.03/0/0, dsn=2.0.0, status=sent
>>> relay=local, delay=0.26, delays=0.25/0/0/0, dsn=2.0.0, status=sent
>
> Nitpick: you talk about smtpd, then show log snips from smtp.  But no
> matter, they both honor max_idle and will behave in a similar manner.

Maybe I could have worded that more clearly, Noel.  Those snippets are
from postfix/local, not smtp; smtp doesn't normally relay to local,
afaik. ;)

I included the snippets to show that inbound delivery is very fast.  Not
yet understanding smtpd's behavior wrt max_idle, I assumed that fast
delivery would mean smtpd exiting quickly.  smtpd doesn't log delays
afaict, so I included the local information instead.  My apologies for
the confusion.

--
Stan
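[Editorial aside: for readers unfamiliar with the log fields quoted above, since Postfix 2.3 the delays=a/b/c/d value splits the total delay into time before active-queue entry, time in the queue manager, connection setup time, and message transmission time.  A minimal, illustrative parser (the sample line is taken from the snippets above; the function name is our own):]

```python
import re

def parse_delays(logline):
    """Extract total delay and the four delay stages from a Postfix log entry.

    Stages (Postfix 2.3+): a = before active-queue entry, b = in the
    queue manager, c = connection setup, d = message transmission.
    """
    m = re.search(
        r"delay=([\d.]+), delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)",
        logline)
    if not m:
        return None
    total, a, b, c, d = (float(x) for x in m.groups())
    return {"total": total, "before_qmgr": a, "in_qmgr": b,
            "conn_setup": c, "transmission": d}

line = "relay=local, delay=0.32, delays=0.29/0.02/0/0, dsn=2.0.0, status=sent"
print(parse_delays(line))
```

In the quoted snippets nearly all of the delay sits in the first stage (time before the message reached the active queue), which is consistent with very fast local delivery.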
Re: smtpd processes congregating at the pub
Wietse Venema put forth on 1/31/2010 7:34 PM:
> Stan Hoeppner:
>>> Better: apply the long-term solution, in the form of the patch below.
>>> This undoes the max_idle override (a workaround that I introduced
>>> with Postfix 2.3).  I already introduced the better solution with
>>> Postfix 2.4 while solving a different problem.
>>
>> I'm not sure if I fully understand this.  I'm using 2.5.5, so shouldn't
>> I already have the 2.4 solution mentioned above?  I must not be reading
>> this correctly.
>
> The patch undoes the Postfix 2.3 change that is responsible for
> the shorter-than-expected proxymap lifetimes that you observed
> on low-traffic systems.
>
> With that change backed out, the reduced ipc_idle change from
> Postfix 2.4 will finally get a chance to fix the excessive lifetime
> of proxymap and trivial-rewrite processes on high-traffic systems.

So, if I understand correctly: the changes made in 2.3 and 2.4 were meant
to get more desirable behavior from proxymap and trivial-rewrite on
high-traffic systems, and this caused this (very minor) problem on
low-traffic systems?  The patch resolves the low-traffic issue,
essentially reverting to the code used before the 2.3 changes?  And have
these changes, through 2.7, given the desired behavior on high-traffic
systems, or not?

Your statement "will finally get a chance to..." is future tense.  Does
this mean the desired behavior for high-traffic systems has not been seen
to date?  I apologize if this seems a stupid question; the future tense in
your statement confuses me.  If that _is_ what you mean, does this mean I
have inadvertently played a tiny role in helping you identify a
long-standing issue? ;)

--
Stan
Re: smtpd processes congregating at the pub
Stan Hoeppner:
> > Better: apply the long-term solution, in the form of the patch below.
> > This undoes the max_idle override (a workaround that I introduced
> > with Postfix 2.3).  I already introduced the better solution with
> > Postfix 2.4 while solving a different problem.
>
> I'm not sure if I fully understand this.  I'm using 2.5.5, so shouldn't I
> already have the 2.4 solution mentioned above?  I must not be reading
> this correctly.

The patch undoes the Postfix 2.3 change that is responsible for
the shorter-than-expected proxymap lifetimes that you observed
on low-traffic systems.

With that change backed out, the reduced ipc_idle change from
Postfix 2.4 will finally get a chance to fix the excessive lifetime
of proxymap and trivial-rewrite processes on high-traffic systems.

	Wietse
Re: smtpd processes congregating at the pub
Wietse Venema put forth on 1/31/2010 10:38 AM:
> Stan Hoeppner:
>> This is making good progress.  Seeing the smtpd's memory footprint
>> drop so dramatically is fantastic.  However, I'm still curious as
>> to why proxymap doesn't appear to be honoring $max_idle or $max_use.
>> Maybe my understanding of $max_use is not correct?  It's currently
>> set to 100, the default.  Watching top while sending a test message
>> through, I see proxymap launch but then exit within 5 seconds,
>> while smtpd honors max_idle.  Is there some other setting I need
>> to change to keep proxymap around longer?
>
> Short answer (workaround for low-traffic sites): set ipc_idle=$max_idle
> to approximate the expected behavior.  This keeps the smtpd-to-proxymap
> connection open for as long as smtpd runs.  Then, proxymap won't
> terminate before its clients terminate.

Wietse, thank you for the very thorough and thoughtful response.  For a
few reasons, including the fact that I don't trust myself working with
source in this case, and that I'd rather not throw monkey wrenches into
my distro's package management, I'm going to go with the short-answer
workaround above.  All factors taken into account, I think it best fits
my needs, skills, and usage profile.

> Better: apply the long-term solution, in the form of the patch below.
> This undoes the max_idle override (a workaround that I introduced
> with Postfix 2.3).  I already introduced the better solution with
> Postfix 2.4 while solving a different problem.

I'm not sure if I fully understand this.  I'm using 2.5.5, so shouldn't I
already have the 2.4 solution mentioned above?  I must not be reading
this correctly.

> Long answer: in ancient times, all Postfix daemons except qmgr
> implemented the well-known max_idle=100s and max_use=100, as well
> as the lesser-known ipc_idle=100s (see "short answer" for the effect
> of that parameter).
>
> While this worked fine for single-client servers such as smtpd, it
> was not so great for multi-client servers such as proxymap or
> trivial-rewrite.  This problem was known, and the idea was that it
> would be solved over time.
>
> Theoretically, smtpd could run for up to $max_idle * $max_use = 3
> hours, while proxymap and trivial-rewrite could run for up to
> $max_idle * $max_use * $max_use = 12 days on low-traffic systems
> (one SMTP client every 100s, or a little under 900 SMTP clients a
> day), and it would run forever on systems with a steady mail flow.
>
> This was a problem.  The point of max_use is to limit the impact of
> bugs such as memory or file handle leaks, by retiring a process
> after doing a limited amount of work.  I can test Postfix itself
> with tools such as Purify and Valgrind, but I can't do those tests
> with every version of everyone's system libraries.

This is a very smart design philosophy.  Just one more reason I feel
privileged to use Postfix.

> If a proxymap or trivial-rewrite server can run for 11 days even
> on systems with a minuscule load, then max_use isn't working as
> intended.
>
> The main cause is that the proxymap etc. clients reuse a connection
> to improve efficiency.  Therefore, the proxymap etc. server politely
> waits until all its clients have disconnected before checking the
> max_use counter.  While this politeness thing can't be changed
> easily, it is relatively easy to play with the proxymap etc. server's
> max_idle value, and with the smtpd etc. ipc_ttl value.
>
> Postfix 2.3 reduced the proxymap etc. max_idle to a fixed 1s value
> to make those processes go away sooner when idle.  I think that
> this was a mistake, because it makes processes terminate too soon,
> and thereby worsens the low-traffic behavior.  Instead, we should
> speed up the proxymap etc. server's max_use counter.
>
> Postfix 2.4 reduced ipc_ttl to 5s.  This was done for a different
> purpose: to allow proxymap etc. clients to switch to the least-loaded
> proxymap etc. server.  But, I think that this was also the right way
> to deal with long-lived proxymap etc. processes, because it speeds
> up the proxymap etc. max_use counter.

Absolutely fascinating background information, Wietse.  Thank you for
sharing this.  It's always nice to learn how and why things work "under
the hood"; things that often can't easily be found in any official
documentation.

--
Stan
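[Editorial aside: the short-answer workaround quoted above is a one-line main.cf change, no patching required.  A sketch, assuming the stock 100s default for max_idle:]

```
# main.cf -- low-traffic workaround quoted above: keep the
# smtpd-to-proxymap connection open for as long as smtpd itself runs,
# so proxymap does not exit before its clients do.
ipc_idle = $max_idle
```

After the change, run "postfix reload" so running daemons pick up the new value.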
Re: smtpd processes congregating at the pub
Stan Hoeppner:
> This is making good progress.  Seeing the smtpd's memory footprint
> drop so dramatically is fantastic.  However, I'm still curious as
> to why proxymap doesn't appear to be honoring $max_idle or $max_use.
> Maybe my understanding of $max_use is not correct?  It's currently
> set to 100, the default.  Watching top while sending a test message
> through, I see proxymap launch but then exit within 5 seconds,
> while smtpd honors max_idle.  Is there some other setting I need
> to change to keep proxymap around longer?

Short answer (workaround for low-traffic sites): set ipc_idle=$max_idle
to approximate the expected behavior.  This keeps the smtpd-to-proxymap
connection open for as long as smtpd runs.  Then, proxymap won't
terminate before its clients terminate.

Better: apply the long-term solution, in the form of the patch below.
This undoes the max_idle override (a workaround that I introduced
with Postfix 2.3).  I already introduced the better solution with
Postfix 2.4 while solving a different problem.

Long answer: in ancient times, all Postfix daemons except qmgr
implemented the well-known max_idle=100s and max_use=100, as well
as the lesser-known ipc_idle=100s (see "short answer" for the effect
of that parameter).

While this worked fine for single-client servers such as smtpd, it
was not so great for multi-client servers such as proxymap or
trivial-rewrite.  This problem was known, and the idea was that it
would be solved over time.

Theoretically, smtpd could run for up to $max_idle * $max_use = 3
hours, while proxymap and trivial-rewrite could run for up to
$max_idle * $max_use * $max_use = 12 days on low-traffic systems
(one SMTP client every 100s, or a little under 900 SMTP clients a
day), and it would run forever on systems with a steady mail flow.

This was a problem.  The point of max_use is to limit the impact of
bugs such as memory or file handle leaks, by retiring a process
after doing a limited amount of work.  I can test Postfix itself
with tools such as Purify and Valgrind, but I can't do those tests
with every version of everyone's system libraries.

If a proxymap or trivial-rewrite server can run for 11 days even
on systems with a minuscule load, then max_use isn't working as
intended.

The main cause is that the proxymap etc. clients reuse a connection
to improve efficiency.  Therefore, the proxymap etc. server politely
waits until all its clients have disconnected before checking the
max_use counter.  While this politeness thing can't be changed
easily, it is relatively easy to play with the proxymap etc. server's
max_idle value, and with the smtpd etc. ipc_ttl value.

Postfix 2.3 reduced the proxymap etc. max_idle to a fixed 1s value
to make those processes go away sooner when idle.  I think that
this was a mistake, because it makes processes terminate too soon,
and thereby worsens the low-traffic behavior.  Instead, we should
speed up the proxymap etc. server's max_use counter.

Postfix 2.4 reduced ipc_ttl to 5s.  This was done for a different
purpose: to allow proxymap etc. clients to switch to the least-loaded
proxymap etc. server.  But, I think that this was also the right way
to deal with long-lived proxymap etc. processes, because it speeds
up the proxymap etc. max_use counter.

The patch below keeps the reduced ipc_ttl from Postfix 2.4, and
removes the max_idle overrides from Postfix 2.3.

	Wietse

*** ./src/proxymap/proxymap.c-	Thu Jan 10 09:03:55 2008
--- ./src/proxymap/proxymap.c	Sun Jan 31 10:52:50 2010
***************
*** 594,605 ****
      myfree(saved_filter);

      /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
-
-     /*
       * Never, ever, get killed by a master signal, as that could corrupt a
       * persistent database when we're in the middle of an update.
       */
--- 594,599 ----

*** ./src/trivial-rewrite/trivial-rewrite.c-	Wed Dec  9 18:39:51 2009
--- ./src/trivial-rewrite/trivial-rewrite.c	Sun Jan 31 10:53:01 2010
***************
*** 565,576 ****
      if (resolve_verify.transport_info)
  	transport_post_init(resolve_verify.transport_info);

      check_table_stats(0, (char *) 0);
-
-     /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
  }

      MAIL_VERSION_STAMP_DECLARE;
--- 565,570 ----
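[Editorial aside: the lifetime arithmetic in the long answer above is easy to verify.  A quick back-of-the-envelope check in plain Python, using the stock max_idle=100s and max_use=100 defaults:]

```python
max_idle = 100  # seconds, stock Postfix default
max_use = 100   # stock Postfix default

# Single-client server (smtpd): worst case, one full idle timeout
# elapses before each of its max_use uses.
smtpd_max = max_idle * max_use
print(f"smtpd: up to {smtpd_max} s = {smtpd_max / 3600:.1f} hours")

# Multi-client server (proxymap, trivial-rewrite) on a low-traffic
# system: it politely waits for its clients, and each of its max_use
# clients may itself live up to max_idle * max_use seconds.
proxymap_max = max_idle * max_use * max_use
print(f"proxymap: up to {proxymap_max} s = {proxymax := proxymap_max / 86400:.1f} days"
      if False else
      f"proxymap: up to {proxymap_max} s = {proxymap_max / 86400:.1f} days")
```

This reproduces the "about 3 hours" (10,000 s = 2.8 h) and "about 12 days" (1,000,000 s = 11.6 days) figures in the message above.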
Re: smtpd processes congregating at the pub
Stan Hoeppner put forth on 1/31/2010 12:04 AM:
> Sorry for top posting.  Forgot to add something earlier: Proxymap seems
> to be exiting on my system immediately after servicing requests.  It
> does not seem to be obeying $max_use or $max_idle, which are both set
> to 100.  It did this even before I added cidr lists to proxymap a few
> hours ago.  Before that, afaik, it was only being called for local
> alias verification, and it exited immediately in that case as well.

Making a little more progress on this, slowly.  I'd forgotten that I have
a rather large regexp table containing 1626 expressions.  I added it to
proxymap, and this dropped the size of my smtpd processes dramatically,
by about a factor of 5.  Apparently, even though this regexp table has
only 1626 lines, it requires far more memory than my big 'countries' cidr
table, which has 11148 lines.

  PID   USER     PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
  14411 postfix  20   0  20276  16m   1480  S     8   4.3  0:00.51  proxymap
  14410 postfix  20   0  6704   3368  2208  S     0   0.9  0:00.04  smtpd

This is making good progress.  Seeing the smtpd's memory footprint drop
so dramatically is fantastic.  However, I'm still curious as to why
proxymap doesn't appear to be honoring $max_idle or $max_use.  Maybe my
understanding of $max_use is not correct?  It's currently set to 100, the
default.  Watching top while sending a test message through, I see
proxymap launch but then exit within 5 seconds, while smtpd honors
max_idle.  Is there some other setting I need to change to keep proxymap
around longer?

--
Stan
Re: smtpd processes congregating at the pub
Sorry for top posting.  Forgot to add something earlier: Proxymap seems to
be exiting on my system immediately after servicing requests.  It does not
seem to be obeying $max_use or $max_idle, which are both set to 100.  It
did this even before I added cidr lists to proxymap a few hours ago.
Before that, afaik, it was only being called for local alias verification,
and it exited immediately in that case as well.

--
Stan

Stan Hoeppner put forth on 1/30/2010 11:13 PM:
> Wietse Venema put forth on 1/30/2010 7:14 PM:
>> Stan Hoeppner:
>>> AFAIK I don't use Berkeley DB tables, only hash (small, few) and cidr
>>> (very large, a handful).
>>
>> hash (and btree) == Berkeley DB.
>
> Ahh, good to know.  I'd thought only btree used Berkeley DB and that
> hash tables used something else.
>
>> If you have big CIDR tables, you can save lots of memory by using
>> proxy:cidr: instead of cidr: (and running "postfix reload").
>> Effectively, this turns all that private memory into something that
>> can be shared via the proxy: protocol.
>
> I implemented proxymap but it doesn't appear to have changed the memory
> footprint of smtpd much at all, if any.  I reloaded once, and restarted
> once just in case.
>
>   PID   USER     PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
>   4554  postfix  20   0  20828  17m   2268  S     0   4.5  0:00.46  smtpd
>   4560  postfix  20   0  20036  16m   2268  S     0   4.3  0:00.47  smtpd
>   4555  postfix  20   0  6812   3056  1416  S     0   0.8  0:00.10  proxymap
>
>> The current CIDR implementation is optimized to make it easy to
>> verify for correctness, and is optimized for speed when used with
>> limited lists of netblocks (mynetworks, unassigned address blocks,
>> reserved address blocks, etc.).
>
> Understood.
>
>> If you want to list large portions of Internet address space such
>> as entire countries the current implementation starts burning CPU
>> time (it examines all CIDR patterns in order; with a bit of extra
>> up-front work during initialization, address lookups could skip
>> over a lot of patterns, but the implementation would of course be
>> harder to verify for correctness), and it wastes 24 bytes per CIDR
>> rule when Postfix is compiled with IPv6 support (this roughly
>> doubles the amount of memory that is used by CIDR tables).
>
> I don't really notice much CPU burn on any postfix processes with these
> largish CIDRs, never have.  I've got 12,212 CIDRs in 3 files, 11,148 of
> them in just the "countries" file alone.  After implementing proxymap,
> I'm not seeing much reduction in smtpd RES size, maybe 1MB if that.  SHR
> is almost identical to before.  If it's not the big tables bloating
> smtpd, I wonder what is?  Or have I not implemented proxymap correctly?
> Following are the relevant parts of my postconf -n and main.cf.
>
> alias_maps = hash:/etc/aliases
> append_dot_mydomain = no
> biff = no
> config_directory = /etc/postfix
> disable_vrfy_command = yes
> header_checks = pcre:/etc/postfix/header_checks
> inet_interfaces = all
> message_size_limit = 1024
> mime_header_checks = pcre:/etc/postfix/mime_header_checks
> mydestination = hardwarefreak.com
> myhostname = greer.hardwarefreak.com
> mynetworks = 192.168.100.0/24
> myorigin = hardwarefreak.com
> parent_domain_matches_subdomains = debug_peer_list smtpd_access_maps
> proxy_interfaces = 65.41.216.221
> proxy_read_maps = $local_recipient_maps $mydestination $virtual_alias_maps
>     $virtual_alias_domains $virtual_mailbox_maps $virtual_mailbox_domains
>     $relay_recipient_maps $relay_domains $canonical_maps
>     $sender_canonical_maps $recipient_canonical_maps $relocated_maps
>     $transport_maps $mynetworks $sender_bcc_maps $recipient_bcc_maps
>     $smtp_generic_maps $lmtp_generic_maps proxy:${cidr}/countries
>     proxy:${cidr}/spammer proxy:${cidr}/misc-spam-srcs
> readme_directory = /usr/share/doc/postfix
> recipient_bcc_maps = hash:/etc/postfix/recipient_bcc
> relay_domains =
> smtpd_banner = $myhostname ESMTP Postfix
> smtpd_helo_required = yes
> smtpd_recipient_restrictions = permit_mynetworks
>     reject_unauth_destination
>     check_recipient_access hash:/etc/postfix/whitelist
>     check_sender_access hash:/etc/postfix/whitelist
>     check_client_access hash:/etc/postfix/whitelist
>     check_client_access hash:/etc/postfix/blacklist
>     check_client_access regexp:/etc/postfix/fqrdns.regexp
>     check_client_access pcre:/etc/postfix/ptr-tld.pcre
>     check_client_access proxy:${cidr}/countries
>     check_client_access proxy:${cidr}/spammer
>     check_client_access proxy:${cidr}/misc-spam-srcs
>     reject_unknown_reverse_client_hostname
>     reject_non_fqdn_sender
>     reject_non_fqdn_helo_hostname
>     reject_invalid_helo_hostname
>     reject_unknown_helo_hostname
>     reject_unlisted_recipient
>     reject_rbl_client zen.spamhaus.org
>     check_policy_service inet:127.0.0.1:6
> strict_rfc821_envelopes = yes
> virtual_alias_maps = hash:/etc/postfix/virtual
>
> /etc/postfix/main.cf snippet
>
> cidr=cidr:/etc/postfix/cidr_files
>
> proxy_read
Re: smtpd processes congregating at the pub
Wietse Venema put forth on 1/30/2010 7:14 PM:
> Stan Hoeppner:
>> AFAIK I don't use Berkeley DB tables, only hash (small, few) and cidr
>> (very large, a handful).
>
> hash (and btree) == Berkeley DB.

Ahh, good to know.  I'd thought only btree used Berkeley DB and that hash
tables used something else.

> If you have big CIDR tables, you can save lots of memory by using
> proxy:cidr: instead of cidr: (and running "postfix reload").
> Effectively, this turns all that private memory into something that
> can be shared via the proxy: protocol.

I implemented proxymap but it doesn't appear to have changed the memory
footprint of smtpd much at all, if any.  I reloaded once, and restarted
once just in case.

  PID   USER     PR  NI  VIRT   RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
  4554  postfix  20   0  20828  17m   2268  S     0   4.5  0:00.46  smtpd
  4560  postfix  20   0  20036  16m   2268  S     0   4.3  0:00.47  smtpd
  4555  postfix  20   0  6812   3056  1416  S     0   0.8  0:00.10  proxymap

> The current CIDR implementation is optimized to make it easy to
> verify for correctness, and is optimized for speed when used with
> limited lists of netblocks (mynetworks, unassigned address blocks,
> reserved address blocks, etc.).

Understood.

> If you want to list large portions of Internet address space such
> as entire countries the current implementation starts burning CPU
> time (it examines all CIDR patterns in order; with a bit of extra
> up-front work during initialization, address lookups could skip
> over a lot of patterns, but the implementation would of course be
> harder to verify for correctness), and it wastes 24 bytes per CIDR
> rule when Postfix is compiled with IPv6 support (this roughly
> doubles the amount of memory that is used by CIDR tables).

I don't really notice much CPU burn on any postfix processes with these
largish CIDRs, never have.  I've got 12,212 CIDRs in 3 files, 11,148 of
them in just the "countries" file alone.  After implementing proxymap,
I'm not seeing much reduction in smtpd RES size, maybe 1MB if that.  SHR
is almost identical to before.  If it's not the big tables bloating
smtpd, I wonder what is?  Or have I not implemented proxymap correctly?
Following are the relevant parts of my postconf -n and main.cf.

alias_maps = hash:/etc/aliases
append_dot_mydomain = no
biff = no
config_directory = /etc/postfix
disable_vrfy_command = yes
header_checks = pcre:/etc/postfix/header_checks
inet_interfaces = all
message_size_limit = 1024
mime_header_checks = pcre:/etc/postfix/mime_header_checks
mydestination = hardwarefreak.com
myhostname = greer.hardwarefreak.com
mynetworks = 192.168.100.0/24
myorigin = hardwarefreak.com
parent_domain_matches_subdomains = debug_peer_list smtpd_access_maps
proxy_interfaces = 65.41.216.221
proxy_read_maps = $local_recipient_maps $mydestination $virtual_alias_maps
    $virtual_alias_domains $virtual_mailbox_maps $virtual_mailbox_domains
    $relay_recipient_maps $relay_domains $canonical_maps
    $sender_canonical_maps $recipient_canonical_maps $relocated_maps
    $transport_maps $mynetworks $sender_bcc_maps $recipient_bcc_maps
    $smtp_generic_maps $lmtp_generic_maps proxy:${cidr}/countries
    proxy:${cidr}/spammer proxy:${cidr}/misc-spam-srcs
readme_directory = /usr/share/doc/postfix
recipient_bcc_maps = hash:/etc/postfix/recipient_bcc
relay_domains =
smtpd_banner = $myhostname ESMTP Postfix
smtpd_helo_required = yes
smtpd_recipient_restrictions = permit_mynetworks
    reject_unauth_destination
    check_recipient_access hash:/etc/postfix/whitelist
    check_sender_access hash:/etc/postfix/whitelist
    check_client_access hash:/etc/postfix/whitelist
    check_client_access hash:/etc/postfix/blacklist
    check_client_access regexp:/etc/postfix/fqrdns.regexp
    check_client_access pcre:/etc/postfix/ptr-tld.pcre
    check_client_access proxy:${cidr}/countries
    check_client_access proxy:${cidr}/spammer
    check_client_access proxy:${cidr}/misc-spam-srcs
    reject_unknown_reverse_client_hostname
    reject_non_fqdn_sender
    reject_non_fqdn_helo_hostname
    reject_invalid_helo_hostname
    reject_unknown_helo_hostname
    reject_unlisted_recipient
    reject_rbl_client zen.spamhaus.org
    check_policy_service inet:127.0.0.1:6
strict_rfc821_envelopes = yes
virtual_alias_maps = hash:/etc/postfix/virtual

/etc/postfix/main.cf snippet

cidr=cidr:/etc/postfix/cidr_files

proxy_read_maps = $local_recipient_maps $mydestination $virtual_alias_maps
    $virtual_alias_domains $virtual_mailbox_maps $virtual_mailbox_domains
    $relay_recipient_maps $relay_domains $canonical_maps
    $sender_canonical_maps $recipient_canonical_maps $relocated_maps
    $transport_maps $mynetworks $sender_bcc_maps $recipient_bcc_maps
    $smtp_generic_maps $lmtp_generic_maps proxy:${cidr}/countries
    proxy:${cidr}/spammer proxy:${cidr}/misc-spam-srcs

check_client_access proxy:${cidr}/countries
check_client_access proxy:${cidr}/spammer
check_client_access proxy:${cidr}/misc-spam-srcs

--
Stan
Re: smtpd processes congregating at the pub
Stan Hoeppner:
> AFAIK I don't use Berkeley DB tables, only hash (small, few) and cidr
> (very large, a handful).

hash (and btree) == Berkeley DB.

If you have big CIDR tables, you can save lots of memory by using
proxy:cidr: instead of cidr: (and running "postfix reload").
Effectively, this turns all that private memory into something that
can be shared via the proxy: protocol.

The current CIDR implementation is optimized to make it easy to
verify for correctness, and is optimized for speed when used with
limited lists of netblocks (mynetworks, unassigned address blocks,
reserved address blocks, etc.).

If you want to list large portions of Internet address space such
as entire countries, the current implementation starts burning CPU
time (it examines all CIDR patterns in order; with a bit of extra
up-front work during initialization, address lookups could skip
over a lot of patterns, but the implementation would of course be
harder to verify for correctness), and it wastes 24 bytes per CIDR
rule when Postfix is compiled with IPv6 support (this roughly
doubles the amount of memory that is used by CIDR tables).

	Wietse
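[Editorial aside: to illustrate why very large access lists are expensive under the first-match scheme described above, here is a sketch of a linear CIDR scan in Python.  This mimics the behavior in spirit only; it is not Postfix's actual code, and the rules below are made up:]

```python
import ipaddress

# Hypothetical rules in cidr: table style: (pattern, action), in order.
rules = [
    (ipaddress.ip_network("192.168.100.0/24"), "OK"),
    (ipaddress.ip_network("10.0.0.0/8"), "REJECT"),
    (ipaddress.ip_network("203.0.113.0/24"), "REJECT"),
]

def cidr_lookup(addr, rules):
    """First-match linear scan, as a cidr: table does conceptually.

    With ~11,000 rules, every client address that matches nothing (or
    matches only a late rule) walks the entire list -- hence the CPU
    cost described for country-sized tables.
    """
    ip = ipaddress.ip_address(addr)
    for net, action in rules:
        if ip in net:
            return action
    return None  # no match: all rules were examined

print(cidr_lookup("192.168.100.7", rules))  # OK (first rule)
print(cidr_lookup("198.51.100.1", rules))   # None (scanned all rules)
```

The up-front indexing Wietse mentions (e.g. grouping rules by leading octets) would let a lookup skip most patterns, at the cost of a harder-to-verify implementation.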
Re: smtpd processes congregating at the pub
Wietse Venema put forth on 1/30/2010 9:03 AM:
> Allow me to present a tutorial on Postfix and operating system basics.

Thank you Wietse.  I'm always eager to learn. :)

> Postfix reuses processes for the same reasons that Apache does;
> however, Apache always runs a fixed minimum amount of daemons,
> whereas Postfix will dynamically shrink to zero smtpd processes
> over time.

Possibly not the best reference example, as I switched to Lighty mainly
due to the Apache behavior you describe, but also due to Apache resource
hogging in general.  But I understand your point: it's better to keep one
or two processes resident to service the next inbound requests than to
constantly tear down and rebuild processes, which causes significant
overhead and performance issues on busy systems.

> Therefore, people who believe that Postfix processes should not be
> running in the absence of client requests, should also terminate
> their Apache processes until a connection arrives.  No-one does that.

Wouldn't that really depend on the purpose of the server?  How about a
web admin daemon running on a small network device?  I almost do this
with Lighty currently: I have a single daemon instance that handles all
requests, max processes=1.  It's a very lightly loaded server, and a
single instance is more than enough.  In fact, given the load, I might
look into running Lighty from inetd, if possible, as I do Samba.

> If people believe that each smtpd process uses 15MB of RAM, and
> that two smtpd processes use 30MB of RAM, then that would have been
> correct had Postfix been running on MS-DOS.
>
> First, the physical memory footprint of a process (called resident
> memory size) is smaller than the virtual memory footprint (which
> comprises all addressable memory including the executable, libraries,
> data, heap and stack).  With FreeBSD 8.0 I see an smtpd VSZ/RSS of
> 6.9MB/4.8MB; with Fedora Core 11, 4.2MB/1.8MB; and with FreeBSD
> 4.1 it's 1.8MB/1.4MB.  Ten years of system library bloat.

Debian 5.0.3, kernel 2.6.31:

  PID   USER     PR  NI  VIRT   RES  SHR   S  %CPU  %MEM  TIME+    COMMAND
  29242 postfix  20   0  22408  18m  2268  S     0   4.9  0:00.58  smtpd
  29251 postfix  20   0  17264  13m  2208  S     0   3.6  0:00.48  smtpd

> Second, when multiple processes execute the same executable file
> and libraries, those processes will share a single memory copy of
> the code and constants of that executable file and libraries.
> Therefore, a large portion of their resident memory sizes will
> actually map onto the same physical memory pages.  15+15 != 30.

I was of the understanding that top's SHR column described memory
shareable with other processes.  In the real example above from earlier
today, it would seem that my two smtpd processes can only share ~2.2MB
of code, data structures, etc.

man top:

   SHR -- Shared Mem size (kb)
       The amount of shared memory used by a task.  It simply reflects
       memory that could be potentially shared with other processes.

Am I missing something, or reading my top output incorrectly?

> Third, some code uses mmap() to allocate memory that is mapped from
> a file.  This adds to the virtual memory footprint of each process,
> but of course only the pages that are actually accessed will add
> to the resident memory size.  In the case of Postfix, this mechanism
> is used by Berkeley DB to allocate a 16MB shared-memory read buffer.

Is this 16MB buffer also used for hash and/or cidr tables, and is it
shareable?  AFAIK I don't use Berkeley DB tables, only hash (small, few)
and cidr (very large, a handful).

> There are some other tricks that allow for further savings (such
> as copy-on-write, which allows sharing of a memory page until a
> process attempts to write to it) but in the case of Postfix, those
> savings will be modest.

I must be screwing something up somewhere then.  According to my top
output, I'm only sharing ~2.2MB between smtpd processes, yet I've seen
them occupy anywhere from 11-18MB RES.  If the top output is correct,
there is a huge amount of additional sharing that "should" be occurring,
no?

Debian runs Postfix in a chroot by default, and I know very little about
chroot environments.  Could this have something to do with the tiny
amount of shared memory between the smtpds?

Thanks for taking interest in this, Wietse.  I'm sure I've probably done
something screwy that is easily fixable, and will get that shared memory
count up where it should be.

--
Stan
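[Editorial aside: on Linux, top's SHR number can be cross-checked against /proc/PID/smaps, which breaks the resident set into shared and private pages per mapping.  A sketch that totals those fields from smaps-format text; the sample data below is made up for illustration, not taken from the systems in this thread:]

```python
def smaps_totals(smaps_text):
    """Sum the Shared_* and Private_* kB fields from /proc/<pid>/smaps text."""
    totals = {"shared": 0, "private": 0}
    for line in smaps_text.splitlines():
        field = line.split(":")[0]
        if field in ("Shared_Clean", "Shared_Dirty"):
            totals["shared"] += int(line.split()[1])
        elif field in ("Private_Clean", "Private_Dirty"):
            totals["private"] += int(line.split()[1])
    return totals

# Made-up sample in smaps format; on a real system you would read
# open(f"/proc/{pid}/smaps") for an smtpd PID instead.
sample = """\
Shared_Clean:       2100 kB
Shared_Dirty:        108 kB
Private_Clean:        64 kB
Private_Dirty:     14200 kB
"""
print(smaps_totals(sample))  # {'shared': 2208, 'private': 14264}
```

A breakdown like this shows whether a large RES is genuinely private (e.g. per-process table data read into the heap) rather than shareable library pages.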
Re: smtpd processes congregating at the pub
Stan Hoeppner:
> Wietse Venema put forth on 1/29/2010 6:15 AM:
> > Stan Hoeppner:
> >> Based on purely visual non-scientific observation (top), it seems my
> >> smtpd processes on my MX hang around much longer in (Debian) 2.5.5
> >> than they did in (Debian) 2.3.8.  In 2.3.8 Master seemed to build
> >> them and tear them down very
> >
> > Perhaps Debian changed this:
> > http://www.postfix.org/postconf.5.html#max_idle
> >
> > The Postfix default is 100s.
>
> Yes, I confirmed this on my system.
>
> > I don't really see why anyone would shorten this - that's a waste
> > of CPU cycles.  In particular, stopping Postfix daemons after 10s

Allow me to present a tutorial on Postfix and operating system basics.

Postfix reuses processes for the same reasons that Apache does;
however, Apache always runs a fixed minimum amount of daemons,
whereas Postfix will dynamically shrink to zero smtpd processes
over time.

Therefore, people who believe that Postfix processes should not be
running in the absence of client requests, should also terminate
their Apache processes until a connection arrives.  No-one does that.

If people believe that each smtpd process uses 15MB of RAM, and
that two smtpd processes use 30MB of RAM, then that would have been
correct had Postfix been running on MS-DOS.

First, the physical memory footprint of a process (called resident
memory size) is smaller than the virtual memory footprint (which
comprises all addressable memory including the executable, libraries,
data, heap and stack).  With FreeBSD 8.0 I see an smtpd VSZ/RSS of
6.9MB/4.8MB; with Fedora Core 11, 4.2MB/1.8MB; and with FreeBSD
4.1 it's 1.8MB/1.4MB.  Ten years of system library bloat.

Second, when multiple processes execute the same executable file
and libraries, those processes will share a single memory copy of
the code and constants of that executable file and libraries.
Therefore, a large portion of their resident memory sizes will
actually map onto the same physical memory pages.  15+15 != 30.

Third, some code uses mmap() to allocate memory that is mapped from
a file.  This adds to the virtual memory footprint of each process,
but of course only the pages that are actually accessed will add
to the resident memory size.  In the case of Postfix, this mechanism
is used by Berkeley DB to allocate a 16MB shared-memory read buffer.

There are some other tricks that allow for further savings (such
as copy-on-write, which allows sharing of a memory page until a
process attempts to write to it) but in the case of Postfix, those
savings will be modest.

	Wietse
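[Editorial aside: the mmap() mechanism described above is easy to see in miniature.  The toy below maps a small file read-only, the same kernel facility Berkeley DB uses (at much larger scale) for its shared read buffer; it is a demonstration of the system call, not Postfix or Berkeley DB code:]

```python
import mmap
import os
import tempfile

# Create a small file to stand in for a database's read buffer backing file.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.close(fd)

# Map it read-only.  The mapping counts toward each process's virtual
# size immediately, but pages are faulted in (and counted as resident)
# only when actually touched; file-backed read-only pages are shared
# between all processes mapping the same file.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    size, head = len(mm), mm[:4]
    print(size, head)  # 4096 b'xxxx'
    mm.close()

os.unlink(path)
```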
Re: smtpd processes congregating at the pub
Wietse Venema put forth on 1/29/2010 6:15 AM:
> Stan Hoeppner:
>> Based on purely visual non-scientific observation (top), it seems my smtpd
>> processes on my MX hang around much longer in (Debian) 2.5.5 than they did in
>> (Debian) 2.3.8. In 2.3.8 Master seemed to build them and tear them down very
>
> Perhaps Debian changed this:
> http://www.postfix.org/postconf.5.html#max_idle
>
> The Postfix default is 100s.

Yes, I confirmed this on my system.

> I don't really see why anyone would shorten this - that's a waste
> of CPU cycles. In particular, stopping Postfix daemons after 10s
> means that people don't have a clue about what they are doing.
> The fact that it's now increased to 30s confirms my suspicion.

Think of a lightly loaded (smtp connects/min) vanity domain server that functions as a Postfix MX with local delivery, plus a Dovecot IMAP server, Lighty+Roundcube, a Samba server, and a DNS resolver serving local requests and one remote workstation. The system is also used interactively (via SSH/bash) for a number of things, including an occasional kernel compile. The machine has only 384MB of RAM.

My smtp load is low enough that an smtpd process or two hanging around for 100 seconds just wastes 13-18MB of memory per smtpd for 80-90 of those 100 seconds. This system regularly goes 5 minutes or more between smtp connects. Sometimes two come in simultaneously, and I end up with two smtpd processes hanging around for 100 seconds, eating over 30MB of RAM with no benefit. Thus, for me, it makes more sense to have the smtpd's exit as soon as possible, freeing memory that can be better used for something else. Yes, I guess I'm a maniac. ;)

In this scenario, with very infrequent smtpd reuse, do you still think I should let them idle for 100 seconds, or at all? From my perspective, that 18-30MB+ can often be better utilized during that time.

--
Stan
Re: smtpd processes congregating at the pub
On 1/29/2010 1:37 AM, Stan Hoeppner wrote:
> Stan Hoeppner put forth on 1/29/2010 12:27 AM:
>> Based on purely visual non-scientific observation (top), it seems my smtpd
>> processes on my MX hang around much longer in (Debian) 2.5.5 than they did in
>> (Debian) 2.3.8. In 2.3.8 Master seemed to build them and tear them down very
>> quickly after the transaction was complete. An smtpd process' lifespan was
>> usually 10 seconds or less on my 2.3.8. In 2.5.5 smtpd's seem to hang around
>> for up to 30 secs to a minute.
>>
>> Local shows very speedy delivery. Is this "long" smtpd process lifespan normal
>> for 2.5.5 or did I do something screwy/wrong in my config?
>>
>> relay=local, delay=2.2, delays=2.2/0/0/0.01, dsn=2.0.0, status=sent
>> relay=local, delay=0.32, delays=0.29/0.02/0/0, dsn=2.0.0, status=sent
>> relay=local, delay=0.77, delays=0.75/0.03/0/0, dsn=2.0.0, status=sent
>> relay=local, delay=0.26, delays=0.25/0/0/0.01, dsn=2.0.0, status=sent
>> relay=local, delay=0.64, delays=0.62/0.03/0/0, dsn=2.0.0, status=sent
>> relay=local, delay=0.26, delays=0.25/0/0/0, dsn=2.0.0, status=sent
>
> I think I found it:
>
> max_idle = x
>
> The default is 100 on my system. I changed it to 10 and that seems to have had
> an effect. Did this setting exist in 2.3.8? I didn't see a version note next to
> max_idle in my 2.5.5 man smtpd. If so, was the default something insanely low
> like 1, or 0? Like I said, smtpd's seemed to come and go in a hurry on 2.3.8.

Nitpick: you talk about smtpd, then show log snips from smtp. But no matter, they both honor max_idle and will behave in a similar manner.

The max_idle default has been 100s pretty much forever. The idea is that an idle Postfix process will be reused to do more work rather than starting a new process every time. This makes Postfix *far* more efficient than one process per job. Although the 100s default is somewhat arbitrary, I have trouble imagining a situation where a shorter max_idle makes sense.

On a very lightly loaded system where processes are seldom reused, a shorter max_idle might not hurt anything, but it won't help anything either.

-- Noel Jones
Re: smtpd processes congregating at the pub
Stan Hoeppner:
> Based on purely visual non-scientific observation (top), it seems my smtpd
> processes on my MX hang around much longer in (Debian) 2.5.5 than they did in
> (Debian) 2.3.8. In 2.3.8 Master seemed to build them and tear them down very

Perhaps Debian changed this:
http://www.postfix.org/postconf.5.html#max_idle

The Postfix default is 100s.

I don't really see why anyone would shorten this - that's a waste of CPU cycles. In particular, stopping Postfix daemons after 10s means that people don't have a clue about what they are doing. The fact that it's now increased to 30s confirms my suspicion.

Technical correctness: the Postfix master does not terminate processes. Processes terminate voluntarily.

	Wietse
Re: smtpd processes congregating at the pub
Stan Hoeppner put forth on 1/29/2010 12:27 AM:
> Based on purely visual non-scientific observation (top), it seems my smtpd
> processes on my MX hang around much longer in (Debian) 2.5.5 than they did in
> (Debian) 2.3.8. In 2.3.8 Master seemed to build them and tear them down very
> quickly after the transaction was complete. An smtpd process' lifespan was
> usually 10 seconds or less on my 2.3.8. In 2.5.5 smtpd's seem to hang around
> for up to 30 secs to a minute.
>
> Local shows very speedy delivery. Is this "long" smtpd process lifespan normal
> for 2.5.5 or did I do something screwy/wrong in my config?
>
> relay=local, delay=2.2, delays=2.2/0/0/0.01, dsn=2.0.0, status=sent
> relay=local, delay=0.32, delays=0.29/0.02/0/0, dsn=2.0.0, status=sent
> relay=local, delay=0.77, delays=0.75/0.03/0/0, dsn=2.0.0, status=sent
> relay=local, delay=0.26, delays=0.25/0/0/0.01, dsn=2.0.0, status=sent
> relay=local, delay=0.64, delays=0.62/0.03/0/0, dsn=2.0.0, status=sent
> relay=local, delay=0.26, delays=0.25/0/0/0, dsn=2.0.0, status=sent

I think I found it:

max_idle = x

The default is 100 on my system. I changed it to 10 and that seems to have had an effect. Did this setting exist in 2.3.8? I didn't see a version note next to max_idle in my 2.5.5 man smtpd. If so, was the default something insanely low like 1, or 0? Like I said, smtpd's seemed to come and go in a hurry on 2.3.8.

--
Stan
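For anyone decoding the delays= fields in the log snippets above: per the Postfix logging documentation, delays=a/b/c/d breaks down as a = time before the queue manager (including message receipt), b = time in the queue manager, c = connection setup time, d = message transmission time. A quick way to pull that field out of a log line (a sketch using sed):

```shell
# Extract the delays=a/b/c/d field from a Postfix delivery log line.
# a = time before queue manager, b = time in queue manager,
# c = connection setup, d = message transmission (per the Postfix docs).
echo 'relay=local, delay=2.2, delays=2.2/0/0/0.01, dsn=2.0.0, status=sent' |
  sed -n 's/.*delays=\([^,]*\).*/\1/p'
```

For the snippets in this thread, nearly all of the total delay sits in the first component (before the queue manager), which is consistent with fast local delivery.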