Hello,

I'm one of the maintainers of a multi-node spam-checking service. We were
recently hit by a DDoS attack. We received hundreds of emails per second,
all addressed to randomst...@single.client.com. Unfortunately, client.com
had an "unknown receiver tarpit" feature enabled, and we had (and must
have) the "reject_unverified_recipient" option enabled on our side. This
resulted in hundreds of verify probes per second, while the client
replied to less than one per second. The result was a HUGE mail queue of
verify probes plus a couple of real emails. Basically, we and all of our
clients were DDoS'ed, because our Postfix installation spent 99% of its
time handling those queued verify probes.

There are a lot of different concurrency limits in Postfix, but none for
verify probes. I quickly came up with the attached patch, which stopped
this DDoS attack for us. It's not complete and it's quite dirty, but I'm
sending it here for comments before I clean it up.

Basic idea of the patch:

Chunk #1: A function to increase/decrease/get the current concurrency
value per recipient domain. I'm re-using verify_map for this value,
stored as key = "@@domain.com", value = the count as a decimal string
(e.g. "0"). I know that a value of "0" will be purged by
verify_cache_validator(), but that's not a problem.

Chunk #2: Decrease the concurrency counter when a probe finishes.

Chunk #3: Check whether the concurrency counter has reached the limit
and DEFER if so. The limit is currently hardcoded to 18, but
$default_destination_concurrency_limit should be a good default value.

Chunk #4: Increase the concurrency counter before sending a probe.
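
To make the bookkeeping in chunks #1 to #4 easier to discuss, here is a
standalone sketch of the same idea outside of Postfix. It is not the
patch itself: the in-memory table, MAX_DOMAINS, and the names
domain_key(), probe_concurrency(), probe_start() and probe_finish() are
made up for illustration, whereas the patch keeps the counter in
verify_map. The sketch also checks and increments in one place, which
the patch does in two separate places.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DOMAINS 64          /* hypothetical table size for this sketch */
#define PROBE_LIMIT 18          /* same hardcoded limit as in the patch */

struct probe_counter {
    char    key[256];           /* "@@domain.tld" */
    int     count;              /* probes currently in flight */
};

static struct probe_counter counters[MAX_DOMAINS];
static int n_counters;

/* Build the "@@domain.tld" key; return 0 if the address has no '@'. */
static int domain_key(const char *address, char *key, size_t size)
{
    const char *at = strchr(address, '@');

    assert(size > 0);
    if (at == NULL)
        return 0;
    snprintf(key, size, "@%s", at);     /* "@" + "@domain.tld" */
    return 1;
}

/*
 * Add delta (+1, -1 or 0) to the counter of the address's domain and
 * return the new value. The counter never drops below zero.
 */
static int probe_concurrency(const char *address, int delta)
{
    char    key[256];
    int     i;

    if (!domain_key(address, key, sizeof(key)))
        return 0;
    for (i = 0; i < n_counters; i++)
        if (strcmp(counters[i].key, key) == 0)
            break;
    if (i == n_counters) {
        if (n_counters == MAX_DOMAINS)
            return 0;                   /* table full: domain is not tracked */
        strcpy(counters[i].key, key);
        counters[i].count = 0;
        n_counters++;
    }
    counters[i].count += delta;
    if (counters[i].count < 0)
        counters[i].count = 0;
    return counters[i].count;
}

/* Return 1 and count the probe if below the limit, 0 if it should be deferred. */
static int probe_start(const char *address)
{
    if (probe_concurrency(address, 0) >= PROBE_LIMIT)
        return 0;
    probe_concurrency(address, +1);
    return 1;
}

/* Release the slot taken by probe_start() once the probe result is in. */
static void probe_finish(const char *address)
{
    probe_concurrency(address, -1);
}
```

A caller would invoke probe_start() before queueing a probe, answer
DEFER when it returns 0, and call probe_finish() when the probe result
arrives. Addresses without an '@' are never limited in this sketch.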

Is this the correct way to solve this kind of DDoS? Should I clean up
the patch and add a new verify_concurrency_limit config option? Any
comments?


While debugging my patch I noticed that Postfix doesn't strictly honor
the verify timeout. If the previous verify result has already timed out,
but the cache cleanup timeout (12h) has not yet expired, Postfix will use
the previous answer PLUS send a new refresh probe (without waiting for
the answer). Shouldn't it just ignore the old value? My patch doesn't
take this into account, and it might result in 18 new verify probes + an
unknown number of refresh probes. I would rather just ignore the old
value.





diff -ur postfix-2.10.2.orig/src/verify/verify.c postfix-2.10.2/src/verify/verify.c
--- postfix-2.10.2.orig/src/verify/verify.c     2014-03-11 10:23:51.653142262 +0200
+++ postfix-2.10.2/src/verify/verify.c  2014-03-12 11:14:55.938779885 +0200
@@ -338,6 +338,45 @@
     return (0);
 }
 
+/* concurrency - keep track of currently running probes per domain */
+
+static signed int concurrency(VSTRING *email, signed int modify)
+{
+       VSTRING *domain;
+       const char *raw_data, *delim;
+       signed int count;
+
+       /* Convert email to "@@domain.tld" */
+       delim = vstring_memchr(email, '@');
+       if (delim == NULL)
+               return 0;
+       domain = vstring_alloc(40);
+       vstring_sprintf(domain, "@%s", delim);
+       msg_warn(">>>> domain %s, modify %d", STR(domain), modify);
+
+       /* Lookup current value */
+       raw_data = dict_cache_lookup(verify_map, STR(domain));
+       if (raw_data == NULL)
+               count = 0;
+       else
+               count = atoi(raw_data);
+
+       /* Set new value */
+       count = count + modify;
+       if (count < 0) {
+               msg_warn(">>>> negative %s = %d", STR(domain), count);
+               count = 0;
+       } else if (modify != 0) {
+               VSTRING *data = vstring_alloc(10);
+               vstring_sprintf(data, "%d", count);
+               dict_cache_update(verify_map, STR(domain), STR(data));
+               msg_warn(">>>> update %s = %s", STR(domain), STR(data));
+               vstring_free(data);
+       }
+       vstring_free(domain);
+       return count;
+}
+
 /* verify_update_service - update address service */
 
 static void verify_update_service(VSTREAM *client_stream)
@@ -372,6 +411,7 @@
             * the address will be re-probed upon the next query. As long as
             * some probes succeed the address will remain cached as OK.
             */
+           concurrency(addr, -1);
            if (addr_status == DEL_RCPT_STAT_OK
                || (raw_data = dict_cache_lookup(verify_map, STR(addr))) == 0
                || STATUS_FROM_RAW_ENTRY(raw_data) != DEL_RCPT_STAT_OK) {
@@ -456,12 +496,23 @@
            || (now - probed > PROBE_TTL        /* safe to probe */
                && (POSITIVE_ENTRY_EXPIRED(addr_status, updated)
                    || NEGATIVE_ENTRY_EXPIRED(addr_status, updated)))) {
+
+           if (concurrency(addr, 0) >= 18) {
+               addr_status = DEL_RCPT_STAT_DEFER;
+               probed = 0;
+               updated = now;
+               text = "Concurrency limit exceeded";
+               msg_warn(">>>> %s", text);
+           } else {
+               
            addr_status = DEL_RCPT_STAT_TODO;
            probed = 0;
            updated = 0;
            text = "Address verification in progress";
            if (raw_data != 0 && var_verify_neg_cache == 0)
                dict_cache_delete(verify_map, STR(addr));
+
+           }
        }
        if (msg_verbose)
            msg_info("GOT %s status=%d probed=%ld updated=%ld text=%s",
@@ -495,6 +546,7 @@
        if (now - probed > PROBE_TTL
            && (POSITIVE_REFRESH_NEEDED(addr_status, updated)
                || NEGATIVE_REFRESH_NEEDED(addr_status, updated))) {
+           concurrency(addr, +1);
            if (msg_verbose)
                msg_info("PROBE %s status=%d probed=%ld updated=%ld",
                         STR(addr), addr_status, now, updated);
