Re: Posfix: deliver to spam folder analog of reject_rbl_client

2010-10-29 Thread Noel Butler
On Thu, 2010-10-28 at 07:52 -0400, John Peach wrote:



  Right, so, how is THAT a false positive, it is a justifiable listing
  if they became part of the problem.
  
 I never said it was a false positive. Just that it's a waste of time
 trying to get delisted; we gave up with that years ago.
 


Really? No one I've met who actually requested delisting was ignored or
refused. Sure, a few years ago it was taking a few weeks to get out of
it, but it got there in the end without pestering.

A lot of people are just spreading FUD because they read the
requirements (which weren't requirements) and never bothered to ask,
fearing they'd have to pay. IIRC, the payment was to a charity, not to
SORBS. It was probably a bad idea; sometimes the scare tactic can
backfire. I believe the web page is updated (or will be soon) to reflect
this.

IMHO, it comes down to laziness of those admins not chasing it up. What
do you do when you get listed in SpamCop: wait till your listing expires,
get relisted, and wait until it expires again (with a longer listing
period)?

SORBS only gave me one headache many, many years ago. It took no time to
get it resolved, and they have been ultra reliable here. They are
heavily used by Australian and New Zealand ISPs, though probably of
little consequence to Americans.



Re: Posfix: deliver to spam folder analog of reject_rbl_client

2010-10-29 Thread Noel Butler
On Thu, 2010-10-28 at 09:40 -0400, Wietse Venema wrote:


   .
 
 This illustrates what you get when blocking all mail from an ISP
 just because some customer sent some email that hit some spamtrap.
 


We do it here; I've done it for 5 years or so, with few problems at all
given the remoteness of the addresses. The only way for one of them to be
tried is by a spambot, therefore that IP = miscreant, and frankly I don't
want miscreants knocking our way, and not many miscreants use their ISP's
mail servers.



 Such an approach makes sense only if receiving one spam message is
 a bigger problem than losing a larger amount of legitimate email.
 


But how do you know it's only one? I'm sure if the IP in question was
only used by you then there is no doubt it would be a wrongful listing.
However, majordomo is not the smartest kid on the block, and hasn't been
for a decade, if ever; someone with a grudge could easily fire away at
it, causing it to end up where it is now.

Luckily the vast majority of IPs hitting spamtraps are end users/bots,
so it would normally be quite rare to have a real mail server listed.
However, Gmail pushes this envelope to the extreme; not so bad recently,
but in the past it has had a reputation almost as bad as the mid to late
nineties AOL.




Re: Need help with SALS and TLS

2010-10-29 Thread Noel Jones

On 10/28/2010 6:26 PM, Kory Hamzeh wrote:

3. I have TLS working with name/pass auth, on port 587 if the client
UNCHECKS Use SSL. For some reason that I don't understand, if the client
has Use SSL enabled, it disconnects the TCP connection as soon as a SSL


In the context of most mail clients, SSL refers to 
(deprecated) wrappermode TLS, typically on port 465.



My main question at this point: is my SASL and TLS setup secure (encrypted)
with my current configuration below?




Oct 27 16:22:30 ns postfix/smtpd[15850]: Anonymous TLS connection
established from 108.sub-97-48-178.myvzw.com[97.48.178.108]: TLSv1 with
cipher DHE-RSA-AES256-SHA (256/256 bits)


The above line shows a TLS session correctly established (this 
line is also logged at smtpd_tls_loglevel = 1).  This 
connection is secure.  Typically one would use -o 
smtpd_tls_security_level=encrypt on the submission port 587 
in master.cf to require a secure connection on that port.


I've found it also generally useful to go ahead and enable 
smtps wrappermode SSL on port 465 for folks who mistakenly 
configure their client that way, or for folks with antique 
software that doesn't properly support STARTTLS.


STARTTLS and wrappermode are equally secure and I think the 
goal is to cause your customers/clients/coworkers no more 
grief than necessary.



Failed log entry, same as before but SSL enabled on the phone (client):



The phone connects to the port, but the phone is expecting a 
TLS handshake rather than an SMTP conversation, so the session 
is never established.
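
The difference between the two client modes can be sketched from the
client side. This is only an illustration using Python's standard
smtplib, not anything from the poster's phone; the function names and the
"Use SSL" mapping are assumptions made for the example.

```python
import smtplib
import ssl


def tls_style(use_ssl_checked):
    """Map a mail client's "Use SSL" checkbox to the TLS style it attempts.

    Assumption: the client behaves like most mail clients, where "SSL"
    means wrappermode and unchecked means STARTTLS.
    """
    return "wrappermode" if use_ssl_checked else "starttls"


def open_client(host, port, style, timeout=30):
    """Open an SMTP client connection using the given TLS style."""
    context = ssl.create_default_context()
    if style == "wrappermode":
        # The TLS handshake starts right after the TCP connect (usually
        # port 465). Against a STARTTLS-only port the server sends an SMTP
        # greeting instead, so the handshake never completes.
        return smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    # Plain SMTP conversation first, then upgrade in-band with STARTTLS
    # (usually port 587).
    client = smtplib.SMTP(host, port, timeout=timeout)
    client.starttls(context=context)
    return client
```

With a hypothetical host, open_client("mail.example.com", 587,
tls_style(False)) would follow the STARTTLS path, while tls_style(True)
against the same port would reproduce the mismatch described above.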



  -- Noel Jones


THREAD KILLED (Posfix: deliver to spam bla blah drivel)

2010-10-29 Thread Wietse Venema
Wietse:
 [About blocking all mail from an ISP because some customer sent spam]
 Such an approach makes sense only if receiving one spam message is
 a bigger problem than losing a larger amount of legitimate email.

Noel Butler:
 But how do you know its only  one  I'm sure if this IP in question was

Your response shows that you have no clue about the ham:spam ratio
(I do have a clue: I've been a customer for 14 years and can't remember
when I last received spam that originates from their network).

This is a technical mailing list about Postfix. There is no room
here for contributions without quantitative technical content.

From now on there is a taboo on SORBS, just like SPF.  Trespassers
will be shot.

Wietse


Cron Mail deliver process dying from postfix sendmail.

2010-10-29 Thread Jerrale G

Reporting-MTA: dns; mail.sheltoncomputers.com
X-SC-Mail-Server-Queue-ID: B84AF1B60004
X-SC-Mail-Server-Sender: rfc822; syste...@sheltoncomputers.com
Arrival-Date: Fri, 29 Oct 2010 03:37:02 -0400 (EDT)

Final-Recipient: rfc822; ad...@sheltoncomputers.com
Original-Recipient: rfc822; root
Action: failed
Status: 5.3.0
Diagnostic-Code: x-unix; internal software error


Part 1.2
Subject:
Cron r...@server1 /etc/weeklybackup
From:
syste...@sheltoncomputers.com (Cron Daemon)
Date:
Fri, 29 Oct 2010 03:37:02 -0400 (EDT)
To:
syste...@sheltoncomputers.com
Return-Path:
syste...@sheltoncomputers.com
Received:
by mail.sheltoncomputers.com (SC Mail Server, from userid 0) id 
B84AF1B60004; Fri, 29 Oct 2010 04:55:50 -0400 (EDT)

Content-Type:
text/plain; charset=ISO-8859-1
Auto-Submitted:
auto-generated
X-Cron-Env:
SHELL=/bin/sh
X-Cron-Env:
HOME=/root
X-Cron-Env:
PATH=/usr/bin:/bin
X-Cron-Env:
LOGNAME=root
X-Cron-Env:
USER=root
Message-ID:
20101029085550.b84af1b60...@mail.sheltoncomputers.com


Why would this be happening? The deliver process dies, and this didn't 
start happening until I added -e to the Postfix transport, which someone 
here volunteered as a great idea. There is an alias in Postfix for this 
admin address as a catch-all box for creating tickets to the right 
departments for abuse@, postmaster@ and ad...@. ALL other mail delivers 
perfectly, but I don't know what's wrong with crontab on this mail server.


Thanks


Jerrale G.
SC Senior Admin


Re: Cron Mail deliver process dying from postfix sendmail.

2010-10-29 Thread Wietse Venema
Jerrale G:
 Final-Recipient: rfc822; ad...@sheltoncomputers.com
 Original-Recipient: rfc822; root
 Action: failed
 Status: 5.3.0
 Diagnostic-Code: x-unix; internal software error

Translation: some command was running as a POSTFIX CHILD process,
and that command terminated with status 70.

This exit status is defined in /usr/include/sysexits.h as:

#define EX_SOFTWARE 70  /* internal software error */

Hence, the Diagnostic-Code: x-unix; internal software error
in the notification quoted above.
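
As a quick way to look up such a status without opening sysexits.h, here
is a small sketch that relies on the EX_* constants Python's os module
exposes on Unix systems. The helper name exit_name is made up for this
example.

```python
import os

# Reverse map from numeric exit status to its sysexits.h name, built from
# the EX_* constants that the os module provides on Unix (not on Windows).
SYSEXITS = {
    getattr(os, name): name
    for name in dir(os)
    if name.startswith("EX_")
}


def exit_name(status):
    """Return the sysexits.h name for an exit status, or None if unknown."""
    return SYSEXITS.get(status)
```

Calling exit_name(70) on a typical Linux system gives the EX_SOFTWARE
name quoted above.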

 the deliver process dies and this didn't 
 start happening until I added -e, as someone here volunteered a great 
 idea, to the transport of postfix.

Before making changes, be sure to understand their consequences.

Wietse


Re: Cron Mail deliver process dying from postfix sendmail.

2010-10-29 Thread Ravindra Gupta // Viva
Dear Team,

I am using Postfix to send mail, but sometimes Postfix works very
slowly. Please advise how to improve the performance of Postfix so that
it sends mail faster.









-- 

In case of any further queries, please feel free to mail me or contact me on
the numbers provided below.

Thanks & Regards,
Ravindra Gupta
Asst Manager - Tech Support

Viva Infomedia Pvt. Ltd.
242, Oshiwara Industrial Centre,
New Link Road, Opp. Oshiwara Bus Depot,
Goregaon West, Mumbai 400104.
Direct: +91.22.40310353
Board: +91.22.40310310
Email: ravin...@vivaconnect.in

Viva Infomedia: Awarded as Best SME (E-Commerce) at CNBC Emerging India
Awards 2009


Re: Cron Mail deliver process dying from postfix sendmail.

2010-10-29 Thread Wietse Venema
Ravindra Gupta // Viva:
 Dear Team,
 
 I am using postfix to sending the mail,but some time postfix is working
 very slow.Please give the advice how to improve profermance of postfix to
 send the fast mail.

Please DO NOT ask a NEW question by replying to OLD MAIL.


Multi recipient mail and deferring messages

2010-10-29 Thread Rich Bishop
The first check in our smtpd_recipient_restrictions defers mail for overquota
users:

smtpd_recipient_restrictions =
  check_recipient_access hash:/etc/postfix/overquota,
  check_recipient_access hash:/etc/postfix/legacy-domains,
  .
  .
  .


The overquota map file just defers messages:

x...@domain.edu 460 Mailbox Overquota
y...@domain.edu 460 Mailbox Overquota


This appears to be working fine, but I heard from a hotmail user today that a
multi-recipient message to our domain is being deferred for all
recipients. In the postfix logs I see hotmail connecting to us, attempting two
addresses that are overquota and then hanging up. My impression was that it
should send to all valid recipients and only defer for those that we return a
4xx.


Are we incorrectly configured here?

Thanks,

Rich


Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Christian Rohmann
Hey postfix-users,

we are currently analyzing very strange postfix behavior which I can
only describe as lockup or freeze.
Honestly we reached our abilities to gather more info and to find the
root cause of this issue.

You are my last hope obi wan ... eh Wietse 


--- Setup / Configuration ---
We are running three VMware ESX 4.1 guests (multiple VMware hosts) with
Debian Lenny amd64.
Each machine has 8 virtual CPUs and 4GB of RAM. We are using the Debian
lenny postfix package (2.5.5) and now moved up to 2.7.1 which we
backported (problems remained unchanged).
Each instance of postfix receives about 200-300 mails/minute. The mail
addresses of the incoming emails are checked using a relay_recipient_map
(proxymap + mysql), are rewritten (domain is changed), and then relayed
using a transport map towards a multi-A record containing two IPs.


--- Lockup Issues ---
The setup and configuration work like a charm for hours at a time, and
all of a sudden it stops working, leading to two issues (not at the same
time):

1) The first issue was that smtp suddenly stopped delivering email to that
multi-A record. We noticed a few thousand emails in the active queue (I
guess all emails were in the active queue by that time). We rule out
problems with the destination servers, since the remaining postfix
instances still delivered mail during that time. Even the last
submission of email from the now locked-up postfix finished without
issue. There are just no more attempts to reach the destination. Postfix
also stops delivering to both destination IPs at the same time. There
was no more logging, except from anvil giving us some statistics about
connection rates.

2) The second issue, occurring more often, is that smtpd stopped
responding or doing anything, actually. Sometimes the number of processes
went to the maximum (500); on other occasions it stayed at around 30-50,
but the symptoms were still the same.

Looking at netstat, all TCP connections to the smtpd processes went away
after some time. We believe that clients trying to deliver email to us
disconnected because they reached their timeout.

Here is a snippet of log from before and after the issue:
--- connect before and after lockup ---
Oct 29 12:57:35 mailserver postfix/smtpd[12457]: connect from
newsletter.xxx.de[xxx.162.xxx.28]
Oct 29 13:29:02 mailserver postfix/smtpd[12457]: 826C611FA8:
client=newsletter.xxx.de[xxx.162.xxx.28]
Oct 29 13:29:03 mailserver postfix/smtpd[12457]: lost connection after
RCPT from newsletter.xxx.de[xxx.162.xxx.28]
Oct 29 13:29:03 mailserver postfix/smtpd[12457]: disconnect from
newsletter.xxx.de[xxx.162.xxx.28]
--- / connect before and after lockup ---
There is not a single line of log from the smtpd processes while in
the locked-up state. At 12:57:35 there is a last connect from a client;
nothing more is logged until we run postfix reload (! a reload is
sufficient) about half an hour later. Then the last few lines show that
the smtpd process is alive again, talking to the now already
disconnected client.

Also, the master process is still there but not accepting any more
connections until we issue postfix reload. Here is an strace of the
master process:
--- strace on master-pid while locked up ---
10648 alarm(333)= 307
10648 epoll_wait(13,
--- / strace on master-pid while locked up ---


Another strange thing is that some of the last log entries contain
UNKNOWN instead of the IP:
--- smtpd connects towards UNKNOWN ---
Oct 29 13:05:02 mailserver postfix/smtpd[14196]: connect from
unknown[unknown]
Oct 29 13:05:02 mailserver postfix/smtpd[14196]: NOQUEUE: reject:
CONNECT from unknown[unknown]: 550 5.7.1 Client host rejected: cannot
find your reverse hostname, [unknown]; proto=SMTP
Oct 29 13:05:02 mailserver postfix/smtpd[14196]: too many errors after
CONNECT from unknown[unknown]
Oct 29 13:05:02 mailserver postfix/smtpd[14196]: disconnect from
unknown[unknown]
--- / smtpd connects towards UNKNOWN ---

Anvil also gives statistics about this obscure client-IP UNKNOWN:
--- anvil statistics for UNKNOWN ---
Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max
connection rate 92/60s for (smtp:unknown) at Oct 29 13:05:02
Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max
connection count 1 for (smtp:unknown) at Oct 29 13:05:02
Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max cache
size 413 at Oct 29 13:05:02
--- / anvil statistics for UNKNOWN ---

During the lockup we still see the *:25 listening socket of the master
process. When we tried to connect ourselves, we just waited forever for a
connection or even an SMTP prompt.

New incoming connections are in state SYN_RECV:
--- netstat while locked up ---
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 x.x.x.143:25            xx.0.xx.217:58556       SYN_RECV
tcp        0      0 x.x.x.143:25            xx.115.xx.166:46355     SYN_RECV
tcp        0      0 x.x.x.143:25            xx.63.xx.197:37216      SYN_RECV
.
.
.
---/ 

Re: Multi recipient mail and deferring messages

2010-10-29 Thread Wietse Venema
Rich Bishop:
 This appears to be working fine, but I heard from a hotmail user today that a
 multi-recipient message to our domain is being deferred for all
 recipients. In the postfix logs I see hotmail connecting to us, attempting two
 addresses that are overquota and then hanging up.

That's hotmail trying to be clever.

Wietse


Re: Multi recipient mail and deferring messages

2010-10-29 Thread Noel Jones

On 10/29/2010 10:01 AM, Rich Bishop wrote:

The first check in our smtpd_recipient_restrictions defers mail for overquota
users:

smtpd_recipient_restrictions =
   check_recipient_access hash:/etc/postfix/overquota,
   check_recipient_access hash:/etc/postfix/legacy-domains,


Careful there; careless entries in access tables before 
reject_unauth_destination can make you an open relay.

http://www.postfix.org/SMTPD_ACCESS_README.html#danger



   .
   .
   .


The overquota map file just defers messages:

x...@domain.edu 460 Mailbox Overquota
y...@domain.edu 460 Mailbox Overquota


This appears to be working fine, but I heard from a hotmail user today that a
multi-recipient message to our domain is being deferred for all
recipients. In the postfix logs I see hotmail connecting to us, attempting two
addresses that are overquota and then hanging up. My impression was that it
should send to all valid recipients and only defer for those that we return a
4xx.


The remote client should continue trying all recipients for 
the message.
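
To make the expected client behavior concrete, here is a minimal sketch
of how a sending client would sort per-recipient RCPT TO replies. It does
not describe hotmail's actual logic; the function name and the sample
addresses are invented for the example.

```python
def split_by_reply(rcpt_replies):
    """Sort (recipient, reply code) pairs by SMTP reply class.

    2xx: accepted, deliver now.
    4xx: deferred, retry this recipient later.
    5xx (and anything else): rejected, bounce this recipient.
    """
    accepted, deferred, rejected = [], [], []
    for rcpt, code in rcpt_replies:
        if 200 <= code <= 299:
            accepted.append(rcpt)
        elif 400 <= code <= 499:
            deferred.append(rcpt)
        else:
            rejected.append(rcpt)
    return accepted, deferred, rejected


# Hypothetical session: two over-quota recipients and one valid one.
replies = [
    ("a@example.edu", 460),
    ("b@example.edu", 460),
    ("c@example.edu", 250),
]
accepted, deferred, rejected = split_by_reply(replies)
```

A well-behaved client would send DATA for the accepted list and queue
only the deferred recipients for a later retry, rather than giving up on
the whole message after the first 4xx replies.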



Are we incorrectly configured here?


Did you adjust the value of smtpd_hard_error_limit?  That will 
cause postfix to disconnect early.  If you lowered it (to 2 
maybe?) you might need to bump it back up a bit.

http://www.postfix.org/postconf.5.html#smtpd_hard_error_limit

Even if postfix disconnects for too many errors, the client 
should reconnect (eventually) to try additional recipients, 
but they control when the retry happens.


Or it could just be a hotmail thing.


  -- Noel Jones


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread lst_hoe02

Quoting Christian Rohmann crohm...@netcologne.de:


Hey postfix-users,

we are currently analyzing very strange postfix behavior which I can
only describe as lockup or freeze.
Honestly we reached our abilities to gather more info and to find the
root cause of this issue.

You are my last hope obi wan ... eh Wietse 


--- Setup / Configuration ---
We are running three VMware ESX 4.1 guests (multiple VMware hosts) with
Debian Lenny amd64.
Each machine has 8 virtual CPUs and 4GB of RAM. We are using the Debian
lenny postfix package (2.5.5) and now moved up to 2.7.1 which we
backported (problems remained unchanged).
Each instance of postfix receives about 200-300 mails/minute. The mail
addresses of the incoming emails are checked using a relay_recipient_map
(proxymap + mysql), are rewritten (domain is changed), and then relayed
using a transport map towards a multi-A record containing two IPs.



Maybe another instance of this problem?

http://tech.groups.yahoo.com/group/postfix-users/message/269786

Regards

Andreas






Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Wietse Venema
I will assume that this is a bug in OS software or in emulated hardware.

Does this server run in a virtual machine?

What is the output from grep fatal on today's and yesterday's maillog file?

What is the output from grep watchdog on all your maillog files?

Wietse


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Christian Rohmann
Hey Wietse,

thanks for the quick reply. Sorry for the delay, was a few GBs of logs
to grep through ;-)

On 10/29/2010 05:49 PM, Wietse Venema wrote:
 I will assume that this is a bug in OS software or in emulated hardware.
Possible, but we are not really having a special setup ... just VMware +
Debian amd64 which hundreds of other shops run I suppose.

 Does this server run in a virtual machine?
Yeah, the Debian Lenny (amd64) runs on VMware ESX 4.1 hosts. The guests
themselves are VMware HW revision 7.

 What is the output from grep fatal on today's and yesterday's maillog file?
None, not a single line.

 What is the output from grep watchdog on all your maillog files?
Same as above - nothing.


I guess that rules out this issue here?

On 10/29/2010 05:43 PM, lst_ho...@kwsoft.de wrote:
 Maybe another instance of this problem?
 http://tech.groups.yahoo.com/group/postfix-users/message/269786


Even though at some point postfix stopped at EPOLL_WAIT...



Christian


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Wietse Venema
Christian Rohmann:
  Does this server run in a virtual machine?
 Yeah, the Debian Lenny (amd64) runs on VMware ESX 4.1 hosts. The guests
 itself are Vmware HW revision 7.

VMware has an entire KB article on problems with delivering timer
interrupts to guest machines, and the hoops that they are jumping
through to avoid poor performance. See
http://tech.groups.yahoo.com/group/postfix-users/message/269786

  What is the output from grep fatal on today's and yesterday's maillog 
  file?
 None, not a single line.
 
  What is the output from grep watchdog on all your maillog files?
 Same as above - nothing.
 
 I guess that rules out this issue here?

No, it confirms my suspicion that either a) you run Postfix < 2.4
and do postfix stop or reload frequently, or b) your virtual
timers are broken, or c) you used grep on compressed files instead
of using zgrep or bzgrep.
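
For point c), one way to avoid missing rotated logs is to search plain
and compressed files with the same helper. This is only a sketch using
Python's standard gzip and bz2 modules; the function names and the
suffix-based detection are assumptions, and zgrep or bzgrep do the same
job from the shell.

```python
import bz2
import gzip


def open_log(path):
    """Open a log file as text, decompressing it based on its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def grep_logs(paths, needle):
    """Return (path, line) pairs for every line that contains needle."""
    hits = []
    for path in paths:
        with open_log(path) as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Running grep_logs over all current and rotated maillog files with
"watchdog" and then "fatal" covers both questions asked earlier in the
thread.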

All Postfix daemons including the master have an alarm(3) timer
that aborts the process when it becomes stuck.  

Normally all processes reset their alarm timer frequently; when
they become stuck, they stop resetting their alarm timer. When the
timer goes off, it logs a watchdog error and kills the process.

 On 10/29/2010 05:43 PM, lst_ho...@kwsoft.de wrote:
  Maybe another instance of this problem?
  http://tech.groups.yahoo.com/group/postfix-users/message/269786
 
 Even though at some point postfix stopped at EPOLL_WAIT...

That does not look like the problem with postfix stop or reload
with Postfix < 2.4 which sometimes triggers a deadlock in syslog().

So we still have the possibility that your timer support is broken
such that even the per-process alarm timer is no longer working.

Postfix relies heavily on timer support to enforce sanity.

Specifically, Postfix relies on short-term timers (implemented with
poll and epoll on Linux) to enforce time limits on read/write
operations, and relies on long-term alarm timers to kill off a
process that hangs because some short-timer failed to go off.

If both layers of safety fail due to broken (virtual) timer support,
then it is not possible to run Postfix reliably.
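
The two layers can be illustrated with a small Unix-only sketch. This is
not Postfix source code: it uses select() instead of epoll for brevity,
the names Watchdog and wait_readable are invented, and it must run in the
main thread because it installs a signal handler.

```python
import select
import signal


class Watchdog(Exception):
    """Raised when the long-term alarm timer fires."""


def _on_alarm(signum, frame):
    raise Watchdog("watchdog timer expired")


def wait_readable(sock, poll_timeout, watchdog_timeout):
    """Wait for sock to become readable, guarded by two timers.

    Layer 1: select() gives up after poll_timeout seconds.
    Layer 2: SIGALRM fires after watchdog_timeout seconds and aborts the
    wait even if layer 1 never returns.
    Returns True if sock is readable, False if the short timer expired.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, watchdog_timeout)
    try:
        readable, _, _ = select.select([sock], [], [], poll_timeout)
        return bool(readable)
    finally:
        # Disarm the watchdog and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

If the short timer and the alarm both depend on the same broken timer
source, neither layer fires, which is the failure mode suspected here.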

Wietse


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Wietse Venema
  On 10/29/2010 05:43 PM, lst_ho...@kwsoft.de wrote:
   Maybe another instance of this problem?
   http://tech.groups.yahoo.com/group/postfix-users/message/269786
  
  Even though at some point postfix stopped at EPOLL_WAIT...

The main loop in the master is as follows:

forever {
    set an alarm for 1000s
    do an EPOLL_WAIT for up to 500s and handle any child process
        events, or short-term timer requests that are implemented
        around the EPOLL_WAIT timer.
    respond to sighup (the sighup flag is set by a signal handler)
    respond to sigchld (the sigchld flag is set by a signal handler)
}

It would be worthwhile to see what strace reports when you leave
it running. If strace reports nothing in 500s then EPOLL_WAIT is
not working. If strace reports nothing after 1000s then the alarm
timer is also not working.

Wietse


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Christian Rohmann
Hey again,

On 10/29/2010 07:23 PM, Wietse Venema wrote:
 The main loop in the master is as follows:
 
 forever {
 set an alarm for 1000s
 do an EPOLL_WAIT for up to 500s and handle any child process
   events, or short-term timer requests that are implemented
   around the EPOLL_WAIT timer.
 respond to sighup (the sighup flag is set by a signal handler)
 respond to sigchld (the sigchld flag is set by a signal handler)
 }

Just now one machine had the issue again. I checked and saw that we
were down to just two smtpd processes, and even though master was still
bound to port 25, no new connections were accepted. I tried telnet to it,
but the connection was not accepted and ran into a timeout.

How does the timer issue relate to the master process not accepting
any more TCP/IP connections on port 25?


 It would be worthwhile to see what strace reports when you leave
 it running. If strace reports nothing in 500s then EPOLL_WAIT is
 not working. If strace reports nothing after 1000s then the alarm
 timer is also not working.

I'll try to gather you some strace data. I guess the strace should be of
the master? Could you give me a hint on what options you might want?


On 10/29/2010 07:04 PM, Wietse Venema wrote:
 VMware has an entire KB article on problems with delivering timer
 interrupts to guest machines, and the hoops that they are jumping
 through to avoid poor performance. See
 http://tech.groups.yahoo.com/group/postfix-users/message/269786

Thanks for the hint, I already printed that article to read over the
weekend.





Thanks for your help,


Christian











Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Jeroen Geilman

On 10/29/2010 05:35 PM, Christian Rohmann wrote:

Hey postfix-users,

we are currently analyzing very strange postfix behavior which I can
only describe as lockup or freeze.
Honestly we reached our abilities to gather more info and to find the
root cause of this issue.

You are my last hope obi wan ... eh Wietse 


--- Setup / Configuration ---
We are running three VMware ESX 4.1 guests (multiple VMware hosts) with
Debian Lenny amd64.
Each machine has 8 virtual CPUs and 4GB of RAM.


Wow. Seriously ? For *email* ?


  We are using the Debian
lenny postfix package (2.5.5) and now moved up to 2.7.1 which we
backported (problems remained unchanged).
Each instance of postfix receives about 200-300 mails/minute.


That's... frankly, that's nothing.
We do that volume on a single-vcpu with 512MB.

At 5% CPU utilization.

Which makes me suspect you have other issues.
Is your networking sane, stable, and provably working 100%?
Which VMware NIC are you using?
I have seen incompatibilities in Linux with anything but the standard 
Intel E1000 emulator, so I suggest you use only that.



The setup and configuration work like a charm for hours at a time, and
all of a sudden it stops working, leading to two issues (not at the same
time):

1) The first issue was that smtp suddenly stopped delivering email to that
multi-A record. We noticed a few thousand emails in the active queue (I
guess all emails were in the active queue by that time). We rule out
problems with the destination servers, since the remaining postfix
instances still delivered mail during that time. Even the last
submission of email from the now locked-up postfix finished without
issue. There are just no more attempts to reach the destination. Postfix
also stops delivering to both destination IPs at the same time. There
was no more logging, except from anvil giving us some statistics about
connection rates.

2) The second issue, occurring more often, is that smtpd stopped
responding or doing anything, actually. Sometimes the number of processes
went to the maximum (500); on other occasions it stayed at around 30-50,
but the symptoms were still the same.
   


Again looks like a networking issue to me - however, the issue may well 
be *caused* by a timer malfunction.


Timing is rather important for dependable TCP, after all.


Looking at netstat, all TCP connections to the smtpd processes went away
after some time. We believe that clients trying to deliver email to us
disconnected because they reached their timeout.
   


That means the symptom definitely surfaces in master(8), as that accepts 
all connections.




Re: postfix clustering

2010-10-29 Thread Peter
Hi Stan,


 
 I think Victor meant not a Postfix issue.  If you want to build a mail
 store cluster over a WAN link, start your reading here:

 http://www.drbd.org
 http://sourceware.org/cluster/gfs/

 The combination of these will allow you to accomplish your cluster goal.
 Depending on the aggregate write bandwidth of your MTAs and delete b/w
 of your POPD, you may need a site-to-site link of anywhere from 10Mb/s
 to 100Mb/s, or maybe even more.  If your two servers are located in
 buildings on the same campus and connected via 100/1000Mb/s ethernet
 then this solution will work very well.  If your two servers are located
 in two internet colocation facilities and your b/w is limited to 10Mb/s
 or less, RTTs are unstable, etc, then this solution may not work well
 for you.  Mirroring a disk over a network requires a stable quality
 network.

I agree with your point.
The above solution should work well if the active/active servers
are located in the same location.

For machines in different data centers, there is no guarantee of speed.

Also, running the server in a different data center is a fail-over
protection solution.

Using rsync is one way to synchronize the storage.
However, multiple MX records only work well if they point to servers
in the same data center sharing the same storage for IMAP.

Thus, a valid solution is to change the IP address of the IMAP server
when failover is required, but the DNS propagation might take up
to three days. Is there a better alternative?

I guess it is something beyond what Postfix can handle. How do Postfix
users handle such an issue?

Thanks.

Peter







Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Wietse Venema
Christian Rohmann:
  It would be worthwhile to see what strace reports when you leave
  it running. If strace reports nothing in 500s then EPOLL_WAIT is
  not working. If strace reports nothing after 1000s then the alarm
  timer is also not working.
 
 I'll try to gather you some strace data. I guess the strace should be of
 the master? Could you give me a hint on what options you might want?

I was replying to your text that strace reported the master was in
EPOLL_WAIT.  I suggest that you find out what strace reports when
you leave it running for a longer time.

Wietse


Re: postfix doubling emails and spam!

2010-10-29 Thread Al Zick

Hi,

On Oct 27, 2010, at 11:50 PM, Noel Jones wrote:


On 10/27/2010 7:02 PM, Al Zick wrote:

Is there a replacement for procmail? I know it seemed to take
longer and did raise cpu usage, but when I first installed it
with bogofilter, it almost eliminated spam getting into my inbox.


depends on why you're using procmail...  If you need a way to  
interface spam/virus filtering, amavisd-new + spamassassin + clamav  
+ sanesecurity clam signatures are a popular and effective  
combination, although SpamAssassin can use quite a bit of resources.


Currently, I just use procmail to interface with the spam filters. I
would really like to put a bunch of rules into procmail too, for
example: if it sees the word viagra anywhere in the email, it is
spam, and there is no reason to go any further with it.


Right now, I am concerned that I would need a quad core, quad  
processor system that was dedicated to just running spamassassin, so  
I am looking at other solutions.




problems lately have been with email. I feel like I need to
get postfix to stop using so much cpu.


Show some evidence. Postfix shouldn't use very much CPU.



per second hitting the mail server just to be temporarily
bounced by the graylisting when in the end they get bounced
anyway. Even after they are bounced, they just keep coming
anyway.



Most greylist services use DEFER_IF_PERMIT so that mail that can be  
permanently rejected is not deferred to retry.
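
To illustrate the DEFER_IF_PERMIT point, here is a minimal Python sketch of the decision a greylisting policy service might make. It assumes the Postfix policy delegation format (name=value lines ended by an empty line, answered with an action= line); the function names, the 300 second delay, and the in-memory dictionary are hypothetical and not taken from any particular greylisting package.

```python
MIN_DELAY = 300  # seconds a new triplet must wait; an assumed value


def parse_request(text):
    """Parse one policy request: name=value lines ended by an empty line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def greylist_action(attrs, first_seen, now):
    """Return the reply for this request, recording new triplets in first_seen."""
    triplet = (
        attrs.get("client_address"),
        attrs.get("sender"),
        attrs.get("recipient"),
    )
    # Remember when this triplet was first seen; keep the old time if known.
    started = first_seen.setdefault(triplet, now)
    if now - started < MIN_DELAY:
        # Deferred only if the remaining restrictions would permit the mail,
        # so a later reject_* restriction still rejects it permanently.
        return "action=DEFER_IF_PERMIT Greylisted, please retry later\n\n"
    # DUNNO expresses no opinion and lets the other restrictions decide.
    return "action=DUNNO\n\n"
```

Because the service answers DEFER_IF_PERMIT rather than a plain defer, mail that a later restriction rejects outright gets a permanent error and is not invited to retry.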


I think that I need to accept and delete email that is being sent to
maybe the top few email addresses that don't exist and never have
existed. They add the most lines to the log. When I was just
accepting them and deleting them, the log was very quiet.


If your forwarded mail is what's attempting repeated delivery  
despite being rejected, you'll need to whitelist those servers and  
eat the mail.  Otherwise, firewall clients who refuse to go away.


I will definitely be whitelisting all the servers that forward email  
to me. I will also be whitelisting all my friend's mail servers. This  
will probably help with a lot of the bounce rebouncing.





Identify the problem, then address it







Sounds as if you've foolishly set soft_bounce = yes


# postconf -d | grep soft_bounce
soft_bounce = no



man postconf to see what -d does and why the above information  
is useless.


But no matter; soft_bounce doesn't appear in your postconf -n  
listing, so that's not it.


Is there anything else that could cause a soft_bounce?



[postconf output]

bounce_queue_lifetime = 2d
default_destination_concurrency_limit = 5
default_process_limit = 15
maximal_backoff_time = 4h
maximal_queue_lifetime = 3d
minimal_backoff_time = 2h
qmgr_message_active_limit = 50
qmgr_message_recipient_limit = 50
queue_run_delay = 30m


Your settings resemble what someone with an underpowered server  
with a bad backscatter problem might try.  If that's not your  
situation, use the defaults.  If that *is* your situation, address  
the source of the problem rather than putting postfix colored band- 
aids on it.


What exactly is a backscatter problem?

If I do have a backscatter problem, what should the settings be?

Mucking around with the above settings is a good way to cripple  
postfix performance.  Tread carefully here.


With a process limit of 15, any server less than 10 years old  
should hardly get above idle.  The default has been 100 for years;  
most servers can easily support several times that.


This install of postfix is from a few years ago and it was not up to
date then (it is what installed with the OS and I never updated it).
A friend of mine recompiled the OS for better optimization. I think it
was already pretty old when I installed it. Really, I was supposed to
upgrade Postfix through the packaging system because there were some
known problems with what came with the OS, but I never did. I had a
friend of mine look at it because it would not receive or send emails
to the outside world, and I am not really sure what he did anymore. I
think he added one line to master.cf and I think he had me make other
changes to master.cf (although, he may have made them). I do remember
that the server would basically not work at all and I think the
process limit was set to something lower and I raised it to 15. This
server runs a lot of other things, like 2 web servers, named, squid,
and a whole lot of custom written software, and it pretty much does
everything that both of my other dedicated servers do, so that may be
why it was set so low.


Could this be one of the reasons I see so many bounces in the log?  
Would this act like a soft bounce? Besides the process limit what  
else should be raised?






smtpd_recipient_restrictions = permit_mynetworks,
reject_unauth_destination, reject_invalid_hostname,
reject_unauth_pipelining, reject_non_fqdn_sender,
reject_unknown_sender_domain, reject_non_fqdn_recipient,
reject_unknown_recipient_domain, reject_rbl_client

Re: postfix doubling emails and spam!

2010-10-29 Thread Jeroen Geilman

On 10/29/2010 09:39 PM, Al Zick wrote:


Currently, I just use procmail to interface with the spam filters. 


Procmail is expensive to run.
If you use amavisd-new with SA, it will control those processes outside 
of mailbox delivery.


I would really like to put a bunch of rules into procmail too, for
example: if it sees the word viagra anywhere in the email, it is spam,
and there is no reason to go any further with it.


That would be trivial with a body_check (although they are generally slow).

I'm also quite positive that spamassassin can do ANY kind of full-text 
scan, on any conditions.




Is there anything else that could cause a soft_bounce?



Don't accept mail you cannot deliver.
That's Rule #1 of spam prevention.


What exactly is a backscatter problem?


Ehm. Backscatter is accepting mail from forged senders that bounces. You 
send the bounce back to the forged address.




If I do have a backscatter problem, what should the settings be?


Don't accept mail you cannot deliver.
Run strict sender verification if you want to avoid backscatter.

I have several websites that I own that are in the top 1,000,000 sites
based on traffic according to Alexa, and although this server only
hosts the email for like 30-some domains, I seem to get more than my
fair share of spam. Right now, it is still manageable, but soon I will
need a very high end dedicated mail server if I don't change
something. Personally, I feel my config is wrong and that is why I am
asking some questions.




You are not using any HELO restrictions. That is generally not a good 
idea, as my HELO checks catch more spam than all other restrictions 
combined.
Also, system performance (or the lack thereof) is greatly influenced by 
the ordering of your spam checking - do the most expensive tests last, 
and as little as possible.


I use sane HELO and sender/recipient checks, and a single RBL - zen.
Anything that passes that far goes to amavisd-new with SA and clamav.
SA finds maybe one message in 20 or 30 to be spam.
I usually don't worry about it after that, but you can run the 
daily-updated rules-du-jour ruleset in SA.


I was also looking at something else and it looks like Postfix was 
built without pcre. Will I be able to use header checks without this?


You can still use regexp if that is compiled in, but the man page says 
it is slower than pcre.



--
J.



Re: Multi recipient mail and deferring messages

2010-10-29 Thread Victor Duchovni
On Fri, Oct 29, 2010 at 11:01:51AM -0400, Rich Bishop wrote:

 The overquota map file just defers messages:
 
 x...@domain.edu 460 Mailbox Overquota
 y...@domain.edu 460 Mailbox Overquota

What is this 460 code? Don't invent non-standard SMTP reply codes.

The correct response is:

452 4.2.2 Mailbox full

Hotmail may behave better if you give them a more reasonable response
code.

-- 
Viktor.
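
As a rough illustration of why 460 is a problem, the following Python sketch checks a reply line against the reply-code grammar in RFC 5321 (first digit 2-5, second digit 0-5, third digit 0-9) and the class.subject.detail shape of enhanced status codes from RFC 3463. The function name and messages are hypothetical; this is a sanity check under those assumptions, not a full SMTP parser.

```python
import re

# Reply-code grammar as given in RFC 5321: digits 2-5, then 0-5, then 0-9.
REPLY_CODE = re.compile(r"[2-5][0-5][0-9]")
# Enhanced status code shape from RFC 3463: class 2, 4 or 5, then subject.detail.
ENHANCED = re.compile(r"[245]\.\d{1,3}\.\d{1,3}")


def check_reply(line):
    """Return a list of problems found in a reply such as '452 4.2.2 Mailbox full'."""
    problems = []
    parts = line.split()
    if not parts or not REPLY_CODE.fullmatch(parts[0]):
        problems.append("reply code does not match the RFC 5321 grammar")
        return problems
    # Only inspect the second token if it looks like an enhanced status code.
    if len(parts) > 1 and re.fullmatch(r"\d\.\d+\.\d+", parts[1]):
        if not ENHANCED.fullmatch(parts[1]):
            problems.append("malformed enhanced status code")
        elif parts[1][0] != parts[0][0]:
            problems.append("enhanced status class differs from reply code class")
    return problems
```

With this check, "452 4.2.2 Mailbox full" passes, while "460 Mailbox Overquota" is flagged because the second digit 6 falls outside the grammar.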


Re: Postfix locking up, not accepting connections / smtp not sending emails out

2010-10-29 Thread Wietse Venema
Christian Rohmann:
 Hey again,
 
 On 10/29/2010 07:23 PM, Wietse Venema wrote:
  The main loop in the master is as follows:
  
  forever {
  set an alarm for 1000s
  do an EPOLL_WAIT for up to 500s and handle any child process
  events, or short-term timer requests that are implemented
  around the EPOLL_WAIT timer.
  respond to sighup (the sighup flag is set by a signal handler)
  respond to sigchld (the sigchld flag is set by a signal handler)
  }
 
 Just now one machine had the issue again. I checked and saw that we
 were down to just two smtpd processes and even though master was still
 bound to port 25 no new connections were accepted. I did telnet to it,
 but the connection was not accepted and ran into timeout.

This means that the smtpd processes are hanging, the master is
hanging, or both. 

At this point I will not speculate further until you report the
result of following the instructions in
http://www.postfix.org/DEBUG_README.html#logging

If I don't see a credible report about warnings etc.  in Postfix
logfiles, then that means that you are flying blind, and that needs
to be addressed first.

The following is for background information only.

The master daemon watches the SMTP port only when all existing
smtpd processes have reported that they are busy (i.e. talking to
an SMTP client).  Otherwise, some idle smtpd process will watch
the port.

When all smtpd processes have reported that they are busy, the
master starts a new smtpd process in response to a new connection,
provided that the per-service process limit is not reached (otherwise
the master logs a warning that all ports are busy).

In your case, the two smtpd processes got stuck before sending the
"I am busy" message to the master daemon, so the master daemon
still believes that the two processes are idle. I don't know if
this has anything to do with broken virtual timers.

Regardless of why a process hangs, if it hangs then you should see
watchdog errors in the Postfix logs. If you don't see those then
either your virtual timer is busted, or your logging is busted.

Wietse
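
The loop shape described in this message (a watchdog alarm around a bounded event wait, with signal handlers that only set flags) can be sketched in Python as follows. This is an illustration of the pattern only, not Postfix's actual C code; the names run_once, flags, and the handler functions are hypothetical, and the selectors module stands in for a direct EPOLL_WAIT call.

```python
import selectors
import signal

WATCHDOG_SECONDS = 1000  # alarm interval from the pseudocode
WAIT_SECONDS = 500       # longest time to block waiting for events

# Flags set by signal handlers and consumed by the loop.
flags = {"sighup": False, "sigchld": False}


def on_sighup(signum, frame):
    flags["sighup"] = True   # only set a flag; the loop does the real work


def on_sigchld(signum, frame):
    flags["sigchld"] = True


def run_once(sel, wait=WAIT_SECONDS, watchdog=WATCHDOG_SECONDS):
    """Run one loop iteration and return the list of actions taken."""
    actions = []
    signal.alarm(watchdog)                 # re-arm the watchdog timer
    for key, _ in sel.select(timeout=wait):
        key.data(key.fileobj)              # call the handler registered for this fd
        actions.append("event")
    if flags["sighup"]:
        flags["sighup"] = False
        actions.append("reload")
    if flags["sigchld"]:
        flags["sigchld"] = False
        actions.append("reap")
    return actions
```

A caller would install the handlers with signal.signal(signal.SIGHUP, on_sighup) and signal.signal(signal.SIGCHLD, on_sigchld), register its sockets on a selectors.DefaultSelector, and call run_once in a loop. If neither the event wait nor the alarm ever returns, the process looks exactly like the hang described above, which is why a silent strace is informative.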


Re: postfix doubling emails and spam!

2010-10-29 Thread mouss

Le 29/10/2010 21:39, Al Zick a écrit :
Currently, I just use procmail to interface with the spam filters. I
would really like to put a bunch of rules into procmail too, for
example: if it sees the word viagra anywhere in the email, it is spam,
and there is no reason to go any further with it.


if it's that simple, then you can do it in postfix (body checks). but
such simple rules will generate FPs. for example, your own mail contains
that word. and if you push the game, you'll encounter sussex, charles
dickens, socialist, via granada, ... etc. besides, spammers have long
known how to evade such rules. you've probably seen \/1...@_gggr@ and
the like.
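
The false-positive problem is easy to reproduce. The Python sketch below compares a blunt substring rule with a whole-word rule; the blocklist and function names are hypothetical and chosen only to mirror the examples in this message.

```python
import re

BLOCKLIST = ["viagra", "cialis", "sex"]  # hypothetical keyword list


def naive_hit(text):
    """Substring match after dropping everything except letters."""
    squeezed = re.sub(r"[^a-z]", "", text.lower())
    return [word for word in BLOCKLIST if word in squeezed]


def word_hit(text):
    """Match only when a blocklisted keyword appears as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [word for word in BLOCKLIST if word in words]
```

The substring rule flags "via granada" and "Sussex socialist", both innocent. The whole-word rule avoids those, but a trivially obfuscated spelling such as "v1agra" slips past it, so neither rule is reliable on its own.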





Right now, I am concerned that I would need a quad core, quad 
processor system that was dedicated to just running spamassassin, so I 
am looking at other solutions.


did you measure? many commercial companies sell spam filters based on
spamassassin without much cpu/ram. it's not about spamassassin, perl, ...
etc. it's about what you check. if you look for millions of strings in
mail, then you'll have problems, even if you code that in assembly (or
even if you create a processor that only does that!).


I think that I need to accept and delete email that is being sent to
maybe the top few email addresses that don't exist and never have
existed. They add the most lines to the log. When I was just accepting
them and deleting them, the log was very quiet.


don't. if some log lines annoy you, use a script to ignore them. don't 
accept and delete mail. what if I mistype your address and write to 
a...@family... ? ('k' is near 'l' on my keyboard).


I will definitely be whitelisting all the servers that forward email 
to me. I will also be whitelisting all my friend's mail servers. This 
will probably help with a lot of the bounce rebouncing.


sure, but unfortunately that's work that never gets finished.


What exactly is a backscatter problem?


it's when a server accepts mail during the smtp transaction, then a 
bounce is caused later. your logs should tell. (example reasons are: 
incorrect address validation. quota. ... etc).


now, I'd bet that your problem is the exit status of procmail. when it 
fails because of a temporary error, it should not tell postfix that this 
is a permanent failure.
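
To make the exit-status point concrete: delivery commands run by Postfix are expected to follow the sysexits.h conventions, where 75 (EX_TEMPFAIL) asks for a later retry and other non-zero codes are treated as permanent failures that produce a bounce. The Python sketch below shows a hypothetical wrapper that keeps the two cases apart; TemporaryError, exit_status_for, and the deliver callback are illustrative names, not part of procmail or Postfix.

```python
EX_TEMPFAIL = 75     # sysexits.h: temporary failure, the mail should be retried
EX_UNAVAILABLE = 69  # sysexits.h: service unavailable, treated as permanent


class TemporaryError(Exception):
    """Hypothetical marker for failures that may succeed on retry."""


def exit_status_for(deliver, message):
    """Run deliver(message) and map the outcome to a sysexits-style status."""
    try:
        deliver(message)
    except TemporaryError:
        # A lock timeout or a full disk should defer the mail, not bounce it.
        return EX_TEMPFAIL
    except Exception:
        return EX_UNAVAILABLE
    return 0
```

A filter script would end with sys.exit(exit_status_for(deliver, message)). If temporary problems are reported with a permanent status, every transient hiccup turns into a bounce.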




If I do have a backscatter problem, what should the settings be?


hard to tell without knowing what causes the problem. if you don't need 
procmail, remove it and see. (I fail to see why you would need procmail 
to interface with a spam filter).




This install of postfix is from a few years ago and it was not up to
date then (it is what installed with the OS and I never updated it). A
friend of mine recompiled the OS for better optimization. I think it was
already pretty old when I installed it. Really, I was supposed to
upgrade Postfix through the packaging system because there were some
known problems with what came with the OS, but I never did. I had a
friend of mine look at it because it would not receive or send emails
to the outside world, and I am not really sure what he did anymore. I
think he added one line to master.cf and I think he had me make other
changes to master.cf (although, he may have made them). I do remember
that the server would basically not work at all and I think the
process limit was set to something lower and I raised it to 15.


which process limit are you talking about? postfix runs many kinds of processes.

This server runs a lot of other things, like 2 web servers, named, 
squid, and a whole lot of custom written software, and it pretty much 
does everything that both of my other dedicated servers do, so that 
may be why it was set so low.


again, this means nothing. a single script can ruin your server (try a 
while (true); do fork; done). don't count the number of applications, 
servers, processes. Instead, measure the load of your system.



[snip]
I spend a lot of time trying to deal with spam. What I have found is 
that I need to update my spam filtering often, but still I seem to 
need to totally revamp the way that I am dealing with spam. I can't 
seem to get away with a lot of false positives, yet I don't want to 
deliver the amount of spam that I have been.


I have several websites that I own that are in the top 1,000,000 sites
based on traffic according to Alexa, and although this server only
hosts the email for like 30-some domains, I seem to get more than my
fair share of spam.


don't believe that. what is your spam ratio? (I mean spam/(spam+ham)). 
if it's less than 90%, then feel lucky...
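
For reference, the ratio mentioned above is simple to compute once the spam and ham counts are known, for example from a day of filter logs. The function name and the sample counts below are hypothetical.

```python
def spam_ratio(spam, ham):
    """Return spam / (spam + ham) as a fraction between 0 and 1."""
    total = spam + ham
    if total == 0:
        raise ValueError("no messages counted")
    return spam / total


# Example: 900 spam and 100 legitimate messages give a ratio of 0.9 (90%).
ratio = spam_ratio(900, 100)
```

Measuring this over a representative period gives a baseline to compare against before any settings are changed.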



Right now, it is still manageable, but soon I will need a very high 
end dedicated mail server, if I don't change something. Personally, I 
feel my config is wrong and that is why I am asking some questions.


the question is: what kind of server do you have and how much mail do 
you receive?




I was also looking at something else and it looks like Postfix was 
built without pcre. Will I be able to 

RE: Postfix 2nd instance

2010-10-29 Thread motty.cruz

I wish to manually deliver the email to the 2nd instance of Postfix instead of
going through the amavisd-release function, for no reason other than that I
do not want to edit the Amavisd-new configuration.

Can you tell me the command to accomplish the above request? 

Thanks, 
-Motty

-----Original Message-----
From: owner-postfix-us...@postfix.org
[mailto:owner-postfix-us...@postfix.org] On Behalf Of Noel Jones
Sent: Thursday, October 21, 2010 2:44 PM
To: postfix-users@postfix.org
Subject: Re: Postfix 2nd instance

On 10/21/2010 4:31 PM, motty.cruz wrote:
 Hello,
 I have two instances of Postfix running on FreeBSD 8.1. The first instance
 of Postfix receives email from the outside world and delivers it to
 Amavisd-new. After scanning the email, Amavisd-new hands it back to the 2nd
 instance of Postfix. Amavisd-new by default bans *.exe files. I have a
 genuine email stuck in the banned folder with an .exe attachment. I want to
 release that email but I don't want it to go through the scanning
 process again. Can you help me release that email to the 2nd instance of
 Postfix?

 I appreciate your help!

 Thanks,
 -Motty


Use the amavisd-release function to release the message from quarantine.
Amavisd-new will then release it to the second postfix instance.

For more info, see the amavisd-new docs or ask on the amavis-users list.

   -- Noel Jones



Re: Postfix 2nd instance

2010-10-29 Thread Jerry
On Fri, 29 Oct 2010 15:15:02 -0700
motty.cruz motty.c...@gmail.com articulated:

 I wish to manually deliver the email to the 2nd instance of Postfix instead
 of going through the amavisd-release function, for no reason other than
 that I do not want to edit the Amavisd-new configuration.
 
 Can you tell me the command to accomplish the above request?

Please don't Top Post.

Start here: http://www.postfix.org/MULTI_INSTANCE_README.html

-- 
Jerry ✌
postfix-u...@seibercom.net
_
TO REPORT A PROBLEM see http://www.postfix.org/DEBUG_README.html#mail
TO (UN)SUBSCRIBE see http://www.postfix.org/lists.html



Re: THREAD KILLED (Posfix: deliver to spam bla blah drivel)

2010-10-29 Thread Noel Butler


"I know all, you know nothing", then kill the thread so people can't
show you might be wrong or defend themselves. Oh my, how nice. Now I
recall why I probably left this list last time! I hope you get that new
job in the censorship office, you've got the right credentials.

And for the record, I don't give a rat's ass about your opinion. If some turd
is trying to pollute my networks, I don't want them here, period!

In use of ANY anti-spam method there will ALWAYS be collateral damage,
regardless of what method is used. That's why god invented whitelists.


Now I shall be silent

On Fri, 2010-10-29 at 08:42 -0400, Wietse Venema wrote:

 Wietse:
  [About blocking all mail from an ISP because some customer sent spam]
  Such an approach makes sense only if receiving one spam message is
  a bigger problem than losing a larger amount of legitimate email.
 
 Noel Butler:
  But how do you know its only  one  I'm sure if this IP in question was
 
 Your response shows that you have no clue about the ham:spam ratio
 (I do have a clue: I've been a customer for 14 years and can't remember
 when I last received spam that originates from their network).
 
 This is a technical mailing list about Postfix. There is no room
 here for contributions without quantitative technical content.
 
 From now on there is a taboo on SORBS, just like SPF.  Trespassers
 will be shot.
 
   Wietse




Re: THREAD KILLED (Posfix: deliver to spam bla blah drivel)

2010-10-29 Thread Noel Butler
On Sat, 2010-10-30 at 09:26 +1000, Noel Butler wrote:

 
 
 "I know all, you know nothing", then kill the thread so people can't
 show you might be wrong or defend themselves. Oh my, how nice. Now I
 recall why I probably left this list last time! I hope you get that
 new job in the censorship office, you've got the right credentials.
 
 And for the record, I don't give a rat's ass about your opinion. If some
 turd is trying to pollute my networks, I don't want them here, period!
 
 In use of ANY anti-spam method there will ALWAYS be collateral damage,
 regardless of what method is used. That's why god invented whitelists.
 
 
 Now I shall be silent


Damn, hit enter too soon. I shall unsub from this list to save you the
trouble; I've learnt nothing really since I've been here anyway :)
And my only real question about postfix, and the probable bug given its
unnecessary double SQL queries in many cases, went unanswered and
ignored.

have fun kiddies



Re: postfix clustering

2010-10-29 Thread Stan Hoeppner
Peter put forth on 10/29/2010 1:55 PM:

 guess it is something beyond postfix to handle. not sure how postfix users 
 will handle such an issue?

Attempting to architect your remote site cluster or failover solution
via back-n-forth to the Postfix mail list is not the proper way to go
about this.  We can give you pointers, but we can't architect the
solution for you.  There are too many variables and too much complexity
involved, and the solution will be specific to your individual
situation, of which providing us all the necessary details would be
nearly impossible in this format.

It seems you are in over your head and aren't grasping some of the basic
principles of Disaster Avoidance and Recovery 101.  For instance, cheap
and reliable are usually mutually exclusive when it comes to ISPs/Colos.
 If your remote link is not reliable, then rsync will be no more
reliable than DRBD, true?

It's probably time to bring in a paid consultant to help you architect
this.  If you can't afford one, then you really can't afford a good off
site resiliency solution.  If this is the case, then you should
seriously consider outsourced email hosting which will give you all of
what you want, albeit with possibly a little less control.

-- 
Stan