Problem:  vpopmail authentications failing randomly

grep vchkpw /var/log/maillog | grep -v success | grep dlb
May 24 11:45:03 mail01 vpopmail[40833]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 11:50:03 mail01 vpopmail[41401]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 11:55:04 mail01 vpopmail[42117]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 12:00:04 mail01 vpopmail[42735]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 12:50:06 mail01 vpopmail[51623]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 12:55:07 mail01 vpopmail[52208]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 13:00:06 mail01 vpopmail[52799]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 13:20:16 mail01 vpopmail[55953]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:
May 24 13:48:23 mail01 vpopmail[13650]: vchkpw-pop3: vpopmail user
not found [EMAIL PROTECTED]:

These are happening for various accounts, on a seemingly random
basis. Out about 4,000 authentications today, it failed 100 times.
qmail-pop3d is run as follows:

exec softlimit -m 4096000 tcpserver -H -R -c50 0 pop3 qmail-popup vchkpw qmail-pop3d Maildir 2>&1

System Specs:
Pentium III 1.0GHz - 1.0GB RAM
36GB ATA disk
FreeBSD 4.10-stable
MySQL 4.0.24 with linuxthreads, running locally
Vpopmail 5.4.2-5.4.10
~400 users.

Previous versions of MySQL were not compiled with linuxthreads but
this problem existed then as well.  The my.cnf file is based on my-
medium.cnf, with two tweaks appropriate for this system (skip-innodb,
bin-log disabled).

I have used vpopmail versions 5.4.2, 5.4.8 and 5.4.10 and all exhibit
this problem.  I also see this error in the maillogs on occasion:

delivery 14578: failure: vmysql:_sql_error
delivery 14641: failure: vmysql:_sql_error

There are no errors reported in MySQL's .err log. There are no other
related errors reported in the system logs. The MySQL load is quite
light, with less than one query executed every few seconds.  The
system load is also fairly light, mostly hovering between 0.4 and 0.5.

All the symptoms are indicative of vpopmail having an issue with
MySQL, so I set up a little perl script to test for me while I tried
to replicate the problem.

my $email = $1 || '[EMAIL PROTECTED]';
my $limit = $2 || 1000;
print "checking for $email $limit times  (each . = success) ";
for ( my $i = 0; $i < $limit; $i++){
         my $dir =`~vpopmail/bin/vuserinfo -d $email`;
         chomp $dir;
         -d $dir ? print "." : print "$i fail\n";
         sleep 1;

So, I run this script in one terminal window while trying to trigger
the problem in another. As near as I can tell, the problem is always
during times when the system is busy. I cannot replicate this problem
on any of my own servers.  To help narrow down the problem, I put the
system under sustained heavy load  (make buildworld).  As expected, I
get frequent authentication and test (~45%) failures.

So, there is certainly a problem with vpopmail and it's MySQL
interaction. Again, this is with versions including 5.4.2, 5.4.8, and
5.4.10.  I've seen random cases of this but I've never had an
instance where it was repeatable.

In such a case, shouldn't the vpopmail programs be returning a error
indicative of the problem instead of a "user not found" error?   A
"user not found" error is not even close to accurately describing
what the problem is.

During this heavy load testing, I wanted to see if the issue was
MySQL or vpopmail.  To test this I ran another MySQL client and see
if it too has problems interacting with the MySQL server. This is
another C program, also compiled to access the vpopmail database and
do a query. I also ran this program simultaneously with the test
script and the "make buildworld".  While the vpopmail test script was
failing every other authentication, the other program succeeded,
every single time, never failing once.

So, I turned to vmysql.c and noticed a timeout setting there that
affects the mysql connection timeout. I bumped it up from 2 to 5, and
it has reduced the failure frequency but it's still happening fairly
regularly. There are some slow queries in MySQL, but only 15 from the
last day, so that doesn't even closely correspond with 300 "no found"

So, anyone got ideas on how to debug this issue further?


Show me a piano falling down a mineshaft and I'll show you A-flat minor.

Reply via email to