Hello,

I'm one of the maintainers of a multi-node spam checking service. We were recently hit by a DDoS attack. We received hundreds of emails per second, all addressed to randomst...@single.client.com. Unfortunately client.com had an "unknown receiver tarpit" feature enabled, and we had (and must have) the "reject_unverified_recipient" option enabled on our side. This resulted in hundreds of verify probes per second, but the client replied to fewer than one per second. The result was a HUGE mail queue of verify probes plus a couple of real emails. Basically we and all of our clients were DDoS'ed, as our Postfix installation was spending 99% of its time handling those queued verify probes.
There are a lot of different concurrency limits in Postfix, but none for verify. I quickly came up with the attached patch, which stopped this DDoS attack for us. It's not complete and it's quite dirty, but I'm sending it here for comments before I clean it up.

Basic idea of the patch:

Chunk #1: Function to increase/decrease/get the current concurrency value per receiver domain. I'm re-using verify_map for this value, stored as key = "@@domain.com", value = "0". I know that the value "0" will be purged by verify_cache_validator(), but that's not a problem.

Chunk #2: Decrease the concurrency count when a probe finishes.

Chunk #3: Check whether the concurrency count has reached the limit and DEFER if so. The current limit is hardcoded to 18, but $default_destination_concurrency_limit should be a good default value.

Chunk #4: Increase the concurrency count before sending a probe.

Is this the correct way to solve this kind of DDoS? Should I clean up the patch and add a new verify_concurrency_limit config option? Any comments?

While debugging my patch I noticed that Postfix doesn't strictly honor the verify timeout. If the previous verify result has already timed out, but the cache cleanup timeout (12h) has not yet expired, Postfix will use the previous answer PLUS it sends a new refresh probe (it doesn't wait for the answer). Shouldn't it just ignore the old value? My patch doesn't take this into account, so it might result in 18 new verify probes plus an unknown number of refresh probes. I would rather just ignore the old value.
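To make the refresh question concrete, here is a minimal stand-alone sketch of the behavior as I understand it. It is not Postfix code: the enum, the function name classify_entry() and the two threshold parameters are made up for illustration, and the real checks in verify.c are written differently.

```c
/* Hypothetical illustration only; none of these names exist in Postfix. */

enum cache_action {
    USE_CACHED,                 /* entry is fresh: answer from cache, no probe */
    USE_CACHED_AND_REFRESH,     /* entry is stale: answer from cache and send
                                 * a refresh probe without waiting for it */
    IGNORE_AND_PROBE            /* entry is too old: treat address as unknown */
};

/*
 * Decide what to do with a cached verify result.
 *   now, updated   - timestamps in seconds
 *   refresh_after  - age after which the result counts as timed out
 *   expire_after   - age after which the result is dropped entirely
 */
static enum cache_action classify_entry(long now, long updated,
                                        long refresh_after,
                                        long expire_after)
{
    long age = now - updated;

    if (age > expire_after)
        return IGNORE_AND_PROBE;
    if (age > refresh_after)
        return USE_CACHED_AND_REFRESH;
    return USE_CACHED;
}
```

The middle case is the one I am asking about. In this sketch, "just ignore the old value" corresponds to passing the same number for refresh_after and expire_after, which makes the middle case unreachable.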
diff -ur postfix-2.10.2.orig/src/verify/verify.c postfix-2.10.2/src/verify/verify.c
--- postfix-2.10.2.orig/src/verify/verify.c	2014-03-11 10:23:51.653142262 +0200
+++ postfix-2.10.2/src/verify/verify.c	2014-03-12 11:14:55.938779885 +0200
@@ -338,6 +338,45 @@
     return (0);
 }
 
+/* concurrency - keep track of currently running probes per domain */
+
+static signed int concurrency(VSTRING *email, signed int modify)
+{
+    VSTRING *domain;
+    const char *raw_data, *delim;
+    signed int count;
+
+    /* Convert email to "@@domain.tld" */
+    delim = vstring_memchr(email, '@');
+    if (delim == NULL)
+        return 0;
+    domain = vstring_alloc(40);
+    vstring_sprintf(domain, "@%s", delim);
+    msg_warn(">>>> domain %s, modify %d", STR(domain), modify);
+
+    /* Lookup current value */
+    raw_data = dict_cache_lookup(verify_map, STR(domain));
+    if (raw_data == NULL)
+        count = 0;
+    else
+        count = atoi(raw_data);
+
+    /* Set new value */
+    count = count + modify;
+    if (count < 0) {
+        msg_warn(">>>> negative %s = %d", STR(domain), count);
+        count = 0;
+    } else if (modify != 0) {
+        VSTRING *data = vstring_alloc(10);
+        vstring_sprintf(data, "%d", count);
+        dict_cache_update(verify_map, STR(domain), STR(data));
+        msg_warn(">>>> update %s = %s", STR(domain), STR(data));
+        vstring_free(data);
+    }
+    vstring_free(domain);
+    return count;
+}
+
 /* verify_update_service - update address service */
 
 static void verify_update_service(VSTREAM *client_stream)
@@ -372,6 +411,7 @@
      * the address will be re-probed upon the next query. As long as
      * some probes succeed the address will remain cached as OK.
      */
+    concurrency(addr, -1);
     if (addr_status == DEL_RCPT_STAT_OK
         || (raw_data = dict_cache_lookup(verify_map, STR(addr))) == 0
         || STATUS_FROM_RAW_ENTRY(raw_data) != DEL_RCPT_STAT_OK) {
@@ -456,12 +496,23 @@
             || (now - probed > PROBE_TTL /* safe to probe */
                 && (POSITIVE_ENTRY_EXPIRED(addr_status, updated)
                     || NEGATIVE_ENTRY_EXPIRED(addr_status, updated)))) {
+
+            if (concurrency(addr, 0) >= 18) {
+                addr_status = DEL_RCPT_STAT_DEFER;
+                probed = 0;
+                updated = now;
+                text = "Concurrency limit exceeded";
+                msg_warn(">>>> %s", text);
+            } else {
+
             addr_status = DEL_RCPT_STAT_TODO;
             probed = 0;
             updated = 0;
             text = "Address verification in progress";
             if (raw_data != 0 && var_verify_neg_cache == 0)
                 dict_cache_delete(verify_map, STR(addr));
+
+            }
         }
         if (msg_verbose)
             msg_info("GOT %s status=%d probed=%ld updated=%ld text=%s",
@@ -495,6 +546,7 @@
         if (now - probed > PROBE_TTL
             && (POSITIVE_REFRESH_NEEDED(addr_status, updated)
                 || NEGATIVE_REFRESH_NEEDED(addr_status, updated))) {
+            concurrency(addr, +1);
             if (msg_verbose)
                 msg_info("PROBE %s status=%d probed=%ld updated=%ld",
                          STR(addr), addr_status, now, updated);
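For discussion, here is the per-domain counter idea reduced to a stand-alone sketch that compiles without Postfix. It is only an illustration under some assumptions: it keeps the counters in a small in-memory table instead of verify_map, it combines the limit check and the increment into one call (the patch does them in separate chunks), and the names probe_begin(), probe_end(), slot_for() and the table size are hypothetical. PROBE_LIMIT is the same hardcoded 18 as in the patch.

```c
#include <string.h>

#define MAX_DOMAINS 64          /* hypothetical table size */
#define DOMAIN_LEN  256
#define PROBE_LIMIT 18          /* same hardcoded limit as in the patch */

struct domain_slot {
    char name[DOMAIN_LEN];      /* empty string marks a free slot */
    int  active;                /* probes currently in flight */
};

static struct domain_slot slots[MAX_DOMAINS];

/* Find the slot for the domain part of addr, claiming a free one if needed. */
static struct domain_slot *slot_for(const char *addr)
{
    const char *at = strrchr(addr, '@');
    struct domain_slot *free_slot = NULL;
    int i;

    if (at == NULL || at[1] == '\0')
        return NULL;                    /* no domain part */
    for (i = 0; i < MAX_DOMAINS; i++) {
        if (slots[i].name[0] == '\0') {
            if (free_slot == NULL)
                free_slot = &slots[i];
        } else if (strcmp(slots[i].name, at + 1) == 0) {
            return &slots[i];           /* exact, case-sensitive match */
        }
    }
    if (free_slot != NULL) {
        strncpy(free_slot->name, at + 1, DOMAIN_LEN - 1);
        free_slot->name[DOMAIN_LEN - 1] = '\0';
    }
    return free_slot;                   /* NULL when the table is full */
}

/* Return 1 if a probe may start (and count it), 0 if the caller should defer. */
static int probe_begin(const char *addr)
{
    struct domain_slot *slot = slot_for(addr);

    if (slot == NULL)
        return 1;                       /* untracked: fail open */
    if (slot->active >= PROBE_LIMIT)
        return 0;
    slot->active++;
    return 1;
}

/* Call when a probe result arrives; never lets the count go negative. */
static void probe_end(const char *addr)
{
    struct domain_slot *slot = slot_for(addr);

    if (slot == NULL)
        return;
    if (slot->active > 0)
        slot->active--;
    if (slot->active == 0)
        slot->name[0] = '\0';           /* release the slot for reuse */
}
```

With this shape, the query path would call probe_begin() and answer DEFER when it returns 0, and the update path would call probe_end() for every finished probe. The sketch compares domains case-sensitively and fails open when the table is full; a cleaned-up patch would need to decide both points deliberately.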