Hi there,
I had a big giant email planned here, but as I was writing it I narrowed
down the scope of the problem we're having to a recursive stat call (I
think) in vdelivermail.c.
First, some background on the setup:
I'm in the process of migrating a 12 GB, ~5000-user
sendmail/aliases/virtualuser system to a qmail/vpopmail one, using MySQL as
the backend, and a single problem is holding me up.
We've got a cluster of 3 delivery machines, with a /vpopmail partition
shared over NFS. The NFS server is also the MySQL DB server that hosts the
vpopmail backend. /vpopmail is a 3Ware RAID 10 running ReiserFS. (We've
tried both the default mount options and noatime/notail.)
All the 800 or so virtual domains are empty (save for the postmaster
account) and filled with .qmail-vuser files that forward to
[EMAIL PROTECTED]. When a vpopmail user is made at one of those
domains, delivery happens instantaneously. Delivering to any vpopmail user
at the default domain results in vdelivermail hanging for 2-10 minutes
before finally delivering the message.
vuserinfo -d [EMAIL PROTECTED] works fine, which led me to
believe it was not a MySQL table problem (we're not using many_domains).
The vdelivermail hang occurs whether delivering directly ON the NFS server or
delivering on one of the cluster servers (though the length of the delay
varies unpredictably), which leads me to think that it's not an NFS
problem. Standard NFS reads/writes are fine.
Additionally, copying files into and out of users' Maildirs manually works
fine, and squirrelmail and courier-imap are handling the situation fine as
well.
Attempted delivery to nonexistent addresses gives a failure message
immediately.
Manual testing was done with a line like below, to verify it wasn't
anything else in qmail:
    cat /vpopmail/testing/samplemail.txt | env EXT=cleaver \
        HOST=defaultdomain.com vdelivermail '' bounce-no-mailbox
Okay, as I was writing the above message, I decided to strace the running
vdelivermail process and discovered that vdelivermail was looping here:
stat64(/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418383.M015727P2293.haku.defaultdomain.com,
{st_mode=S_IFREG|0644, st_size=11180, ...}) = 0
stat64(/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418397.M208677P5866.haku.defaultdomain.com,
{st_mode=S_IFREG|0644, st_size=2123, ...}) = 0
stat64(/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418401.M185492P7109.haku.defaultdomain.com,
{st_m
[later]
stat64(/etc/vpopmail/domains/defaultdomain.com/E/gary/Maildir//new/1078419549.M564758P6609.haku.defaultdomain.com,
{st_mode=S_IFREG|0644, st_size=2744, ...}) = 0
stat64(/etc/vpopmail/domains/defaultdomain.com/E/gary/Maildir//new/1078419549.M438602P6573.haku.defaultdomain.com,
{st_mode=S
It appears to be calling stat on every single message of every user underneath
the default domain's directory(!). Given that there is about 12 GB of mail
being transferred over in the test systems (before we go live), that
would explain the long delay. Since the results get cached by NFS or the local
disk array, the time the stat calls take varies.
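To make the cost concrete, here is a minimal sketch of the kind of loop that
would produce a trace like the one above. It is NOT the code in
vdelivermail.c; walk_tree and struct tree_usage are hypothetical names made up
for illustration. It assumes a plain recursive walk that calls lstat() once
per directory entry, so the number of stat calls grows with the total number
of messages under the starting directory rather than with the size of one
user's Maildir.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() etc. under strict -std= modes */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical tally for illustration; not a vpopmail structure. */
struct tree_usage {
    long files;        /* regular files seen */
    long long bytes;   /* sum of st_size over those files */
    long stat_calls;   /* how many lstat() calls the walk cost */
};

/*
 * Recursively walk 'path', calling lstat() on every entry.
 * Returns 0 on success, -1 if any directory or entry could not be read.
 */
int walk_tree(const char *path, struct tree_usage *u)
{
    assert(path != NULL && u != NULL);

    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    int rc = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char child[4096];
        int n = snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        if (n < 0 || (size_t)n >= sizeof child) {
            rc = -1;            /* path too long for this sketch */
            continue;
        }

        struct stat st;
        u->stat_calls++;        /* one stat per entry, as seen in the trace */
        if (lstat(child, &st) != 0) {
            rc = -1;
            continue;
        }

        if (S_ISDIR(st.st_mode)) {
            if (walk_tree(child, u) != 0)
                rc = -1;
        } else if (S_ISREG(st.st_mode)) {
            u->files++;
            u->bytes += (long long)st.st_size;
        }
    }

    closedir(d);
    return rc;
}
```

Pointing a walk like this at a single user's Maildir touches only that user's
messages; pointing it at the domain directory touches every message of every
user, which would match the behaviour in the trace.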
Any ideas on why it might be doing this? I'm looking over count_dir in
vdelivermail.c right now and not seeing it. =(
Sincerely,
Japheth J.C. Cleaver