Re: [qmailadmin] Major bug in qmailadmin when using MySQL replication.

2002-10-22 Thread Matt Simerson
Apologies, I meant to click SAVE instead of send as I left work 
yesterday. Anyway, appended to this email is The Rest of The Story.

On Monday, October 21, 2002, at 06:24  PM, Matt Simerson wrote:

OK, this will be fun to explain.

I have a large vpopmail install that I'm running this on so the system 
is quite busy.  I recently converted this system from .cdb to MySQL 
authentication. In so doing, we've happened upon a few anomalies, the 
strangest one being that qmailadmin functions only worked 
intermittently.

We use MySQL replication and have a MySQL master running on a quad 
Xeon that is more than adequate hardware wise. We replicate to each 
mail server via MySQL replication. Vpopmail is then configured to 
write updates to the master and read from the slave on localhost. This 
works quite well for vpopmail.

So, upon noticing that qmailadmin wasn't working reliably, we started 
poking around and saw that every time we made a change in qmailadmin, 
we'd get two updates written to the MySQL master (as observed in 
binlog). The first would have all the values we wanted, the second 
would end up resetting those values to exactly what they were before 
the attempted change was made.

So, a little visit to the qmailadmin source code revealed the culprit 
in user.c.

If you read through user.c, you'll find a subroutine modusergo. In 
there, is the guts of what happens when you click on modify user and 
then submit.  Reading through that, it appears that if we get a new 
password, we make a call to vpasswd. This function, I would guess is 
against libvpopmail.a and connects to the MySQL update server and 
updates the password, in addition to running crypt() against the 
supplied password. At this point, all is well.

However, in the very next function, the problem becomes manifest. We 
check for the value of gecos and if it exists (which it always does 
because it's value is in the HTML form) then it first makes a call to 
vauth_getpw. Testing has revealed that as expected, it does the 
database read from the MySQL read server on localhost.  Unfortunately, 
it's making the query a tiny fraction of a second after writing the new 
password into MySQL update server. I'm sure Bill and Ken both know the 
problem now.

If you weren't using replication, this wouldn't be a problem (and isn't 
on my test servers) but what ends up happening is one SQL query arrives 
at the update server and sets the values that you changed 
appropriately. The next query arrives right behind it and resets the 
values to exactly what they were before the first query updated them.

So, the bandaid I currently applied is putting a 5 second sleep in 
there so that the first change gets a chance to replicate before we run 
vauth_getpw. However, this is a distinctly BAD way to fix it. There's 
no way to guarantee that a change made to the MySQL master has 
propagated to the slave. In 5 seconds it's a 99% probability but I'd 
argue that isn't good enough.

I can only think of one guaranteed way to fix it. That is to make sure 
that when qmailadmin does it's vauth_getpw call, it reads from the 
MySQL master. This is the only guaranteed way to assure that a 
successful database update written a fraction of a second before gets 
read back.

This could be a even bigger problem for people that don't manage their 
MySQL replication well. Imagine having a farm of replicated servers 
where one is out of sync and the admins don't notice. A (l)user hits 
the server that's database isn't replicated and see's their old gecos 
field that they updated last week. The update it again and it resets 
their password back to their old one which they had updated the week 
before. That's not good manners.

So, anyone else have suggestions on the best way to fix this?

Matt




Re: [qmailadmin] Major bug in qmailadmin when using MySQL replication.

2002-10-22 Thread Bill Shupp
On Tuesday, October 22, 2002, at 07:49  AM, Matt Simerson wrote:


Apologies, I meant to click SAVE instead of send as I left work 
yesterday. Anyway, appended to this email is The Rest of The Story.

On Monday, October 21, 2002, at 06:24  PM, Matt Simerson wrote:

OK, this will be fun to explain.

I have a large vpopmail install that I'm running this on so the 
system is quite busy.  I recently converted this system from .cdb to 
MySQL authentication. In so doing, we've happened upon a few 
anomalies, the strangest one being that qmailadmin functions only 
worked intermittently.

We use MySQL replication and have a MySQL master running on a quad 
Xeon that is more than adequate hardware wise. We replicate to each 
mail server via MySQL replication. Vpopmail is then configured to 
write updates to the master and read from the slave on localhost. 
This works quite well for vpopmail.

So, upon noticing that qmailadmin wasn't working reliably, we started 
poking around and saw that every time we made a change in qmailadmin, 
we'd get two updates written to the MySQL master (as observed in 
binlog). The first would have all the values we wanted, the second 
would end up resetting those values to exactly what they were before 
the attempted change was made.

So, a little visit to the qmailadmin source code revealed the culprit 
in user.c.

If you read through user.c, you'll find a subroutine modusergo. In 
there, is the guts of what happens when you click on modify user and 
then submit.  Reading through that, it appears that if we get a new 
password, we make a call to vpasswd. This function, I would guess is 
against libvpopmail.a and connects to the MySQL update server and 
updates the password, in addition to running crypt() against the 
supplied password. At this point, all is well.

However, in the very next function, the problem becomes manifest. We 
check for the value of gecos and if it exists (which it always does 
because it's value is in the HTML form) then it first makes a call to 
vauth_getpw. Testing has revealed that as expected, it does the 
database read from the MySQL read server on localhost.  Unfortunately, 
it's making the query a tiny fraction of a second after writing the 
new password into MySQL update server. I'm sure Bill and Ken both know 
the problem now.

If you weren't using replication, this wouldn't be a problem (and 
isn't on my test servers) but what ends up happening is one SQL query 
arrives at the update server and sets the values that you changed 
appropriately. The next query arrives right behind it and resets the 
values to exactly what they were before the first query updated them.

So, the bandaid I currently applied is putting a 5 second sleep in 
there so that the first change gets a chance to replicate before we 
run vauth_getpw. However, this is a distinctly BAD way to fix it. 
There's no way to guarantee that a change made to the MySQL master has 
propagated to the slave. In 5 seconds it's a 99% probability but I'd 
argue that isn't good enough.

I can only think of one guaranteed way to fix it. That is to make sure 
that when qmailadmin does it's vauth_getpw call, it reads from the 
MySQL master. This is the only guaranteed way to assure that a 
successful database update written a fraction of a second before gets 
read back.

This could be a even bigger problem for people that don't manage their 
MySQL replication well. Imagine having a farm of replicated servers 
where one is out of sync and the admins don't notice. A (l)user hits 
the server that's database isn't replicated and see's their old gecos 
field that they updated last week. The update it again and it resets 
their password back to their old one which they had updated the week 
before. That's not good manners.

So, anyone else have suggestions on the best way to fix this?

After only getting the first half of the story yesterday, I duplicated 
your process to see if I could figure it out.  The strange thing to me 
was that there are I think 3 select queries and 2 update queries that 
occur before modusergo is finished.  This seems excessive.  There 
really should just be one select, and one update.  All the error 
checking and modifications to the vpw structure should happen in 
qmailadmin, with a single update at the end.  I think that refining 
this will be the best solution to the problem.  However, the following 
select during moduser (called by the result page after modifying the 
user) might still get old data.  I'm not sure how long replication 
takes.  Did you experiment with sleeps less than 5 seconds?  I vaguely 
recall someone else having to put a sleep in either qmailadmin or 
vqadmin for this same reason.  Maybe another solution would be to 
refresh back to the main menu, or user list after modifying.  That 
would force a delay before moduser() does the next select for the 
result page.

I'll look at it today and see if I can improve it.

Regards,

Bill Shupp




Re: [qmailadmin] Major bug in qmailadmin when using MySQL replication.

2002-10-22 Thread Tom Collins
On Tuesday, October 22, 2002, at 07:49 AM, Matt Simerson wrote:

If you read through user.c, you'll find a subroutine modusergo. In 
there, is the guts of what happens when you click on modify user and 
then submit.  Reading through that, it appears that if we get a new 
password, we make a call to vpasswd. This function, I would guess is 
against libvpopmail.a and connects to the MySQL update server and 
updates the password, in addition to running crypt() against the 
supplied password. At this point, all is well.

However, in the very next function, the problem becomes manifest. We 
check for the value of gecos and if it exists (which it always does 
because it's value is in the HTML form) then it first makes a call to 
vauth_getpw. Testing has revealed that as expected, it does the 
database read from the MySQL read server on localhost.  Unfortunately, 
it's making the query a tiny fraction of a second after writing the 
new password into MySQL update server. I'm sure Bill and Ken both know 
the problem now.

What if you just moved the vpasswd call (to change the password) AFTER 
the other changes you make via vauth_getpw?  You still have the problem 
of a user not seeing their changes propagate immediately (if they go to 
edit their page again, it might still show the old values).

I can only think of one guaranteed way to fix it. That is to make sure 
that when qmailadmin does it's vauth_getpw call, it reads from the 
MySQL master. This is the only guaranteed way to assure that a 
successful database update written a fraction of a second before gets 
read back.

I don't know enough about the vpopmail structures, but you would 
probably need to add a flag to vauth_getpw indicating whether it was 
allowed to read from the local (replicated) server, or if it should 
read from the master.

Qmailadmin should make all of its calls from the master, so it's always 
reporting the actual/current settings.

--
Tom Collins
[EMAIL PROTECTED]




Re: [qmailadmin] Major bug in qmailadmin when using MySQL replication.

2002-10-22 Thread Bill Shupp
Sorry, didn't mean to take this off-list...

On Tuesday, October 22, 2002, at 09:08  AM, Matt Simerson wrote:


On Tuesday, October 22, 2002, at 11:27  AM, Bill Shupp wrote:


After only getting the first half of the story yesterday, I 
duplicated your process to see if I could figure it out.  The strange 
thing to me was that there are I think 3 select queries and 2 update 
queries that occur before modusergo is finished.  This seems 
excessive. There really should just be one select, and one update.

I tend to agree.


All the error checking and modifications to the vpw structure should 
happen in qmailadmin, with a single update at the end.

The only problem with that is the vpopmail call vauth_setpw function 
doesn't handle crypting of the password. That needs to be done via a 
vpopmail call in case the sysadmin has set up crypt to use something 
other than the systems default crypt().  Of course, that could be as 
easy as adding a vcrypt routine to vpopmail if it doesn't already 
exist. Then you'd have the crypted password.

I think that refining this will be the best solution to the problem.  
However, the following select during moduser (called by the result 
page after modifying the user) might still get old data.  I'm not 
sure how long replication takes.

In theory, it's instant but on really busy systems, it can take up to 
a second or two. If replication breaks for any reason, it could be 
minutes or hours. You can't count on

 Did you experiment with sleeps less than 5 seconds?


No, specifically because I wrote a script that allows me to watch the 
replication among all four of my slaves at the same time. It often 
takes a few seconds for the slave to catch up. I picked 5 because it 
*should* succeed nearly every time.

I vaguely recall someone else having to put a sleep in either 
qmailadmin or vqadmin for this same reason.

But a sleep is a bandaid, it's not a solution.


Maybe another solution would be to refresh back to the main menu, or 
user list after modifying.  That would force a delay before moduser() 
does the next select for the result page.

I think the ideal solution would be, as you stated, a one select, and 
one update function. At the very beginning of the modusergo 
subroutine, do your vauth_getpw so that you have all the values. Then, 
if there is a password update, then run the vcrypt function which will 
return the crypted password. Populate your vpw hash with the crypted 
password, any other changes as already configured and then at the end 
of the routine, update the master server.

My only beef with that is I that I think it should do the vauth_getpw 
from the MySQL master. That way you'll never end up overwriting newer 
data on the master from an out of date slave. That functionality will 
only affect people using MySQL replication and I think it's a 
safeguard that should be in place.

I think that having qmailadmin always read from the master is the best 
solution.  Unfortunately, it would require updating the vpopmail API to 
add an argument specifying whether to use a read server or not.  Unless 
someone knows a way around it.

Short of that, one solution would be to merge the call to vpasswd into 
vauth_setpw, but as you pointed out above, the mkpasswd call from 
vpasswd, along with some other checking from that function, would need 
to added to modusergo.  And to make sure that the next read doesn't get 
old information, you could go back to the main menu, or show_users 
instead of moduser.  I could easily implement this solution this 
morning, if that's what people want to do.

Thoughts?  What do you think Ken?

Cheers,

Bill Shupp




Re: [qmailadmin] Major bug in qmailadmin when using MySQL replication.

2002-10-22 Thread Bill Shupp
On Tuesday, October 22, 2002, at 09:30  AM, Bill Shupp wrote:


I think that having qmailadmin always read from the master is the best 
solution.  Unfortunately, it would require updating the vpopmail API 
to add an argument specifying whether to use a read server or not.  
Unless someone knows a way around it.

Short of that, one solution would be to merge the call to vpasswd into 
vauth_setpw, but as you pointed out above, the mkpasswd call from 
vpasswd, along with some other checking from that function, would need 
to added to modusergo.  And to make sure that the next read doesn't 
get old information, you could go back to the main menu, or show_users 
instead of moduser.  I could easily implement this solution this 
morning, if that's what people want to do.

Ok, for fun I went ahead and implemented this.  The patch below is 
against vanilla 1.0.6, but it works for me.  It consolidates the 
updates into one call by integrating much of vpasswd right into 
modusergo.  There are still 3 select statements though, 2 before 
modusergo is called, and one during modusergo.

Also, at the end of the function, it calls show_users() instead of 
moduser(), which should help allow replication to catch up before the 
next read for that user.

Cheers,

Bill



*** user.c.orig Tue Oct 22 10:01:27 2002
--- user.c  Tue Oct 22 10:28:05 2002
***
*** 753,758 
--- 753,764 
   static char NewBuf[156];
   int count;
   FILE *fs;
+ #ifdef SQWEBMAIL_PASS
+  uid_t uid;
+  gid_t gid;
+ #endif
+
+   vpw = vauth_getpw(ActionUser, Domain);

if (!( AdminType==DOMAIN_ADMIN ||
   (AdminType==USER_ADMIN  strcmp(ActionUser,Username)==0))){
***
*** 768,788 
vclose();
exit(0);
  }
! ret_code = vpasswd( ActionUser, Domain, Password1, USE_POP);
! if ( ret_code != VA_SUCCESS ) {
sprintf(StatusMessage, %s, get_html_text(140));
  } else {
!   sprintf(StatusMessage,%s %s%s %s,
! get_html_text(139), ActionUser, Domain, verror(ret_code) );
  }
}

GetValue(TmpCGI,Gecos, gecos=, MAX_BUFF);
if ( strlen( Gecos ) != 0 ) {
- vpw = vauth_getpw(ActionUser, Domain);
  vpw-pw_gecos = Gecos;
- vauth_setpw(vpw, Domain);
}

/* get the value of the cforward radio button */
GetValue(TmpCGI,box, cforward=, MAX_BUFF);
--- 774,813 
vclose();
exit(0);
  }
! if (strlen(Password1)  MAX_PW_CLEAR_PASSWD) {
!   sprintf(StatusMessage,%s %s%s %s,
! get_html_text(139), ActionUser, Domain, VA_PASSWD_TOO_LONG 
);
!   moduser();
!   vclose();
!   exit(0);
! } else if (vpw-pw_gid  NO_PASSWD_CHNG) {
sprintf(StatusMessage, %s, get_html_text(140));
+   moduser();
+   vclose();
+   exit(0);
  } else {
! sprintf(StatusMessage,%s %s%s,
! get_html_text(139), ActionUser, Domain);
  }
+
+ mkpasswd3(Password1,Crypted, MAX_BUFF);
+ vpw-pw_passwd = Crypted;
+
+ #ifdef CLEAR_PASS
+   vpw-pw_clear_passwd = Password1;
+ #endif
+ #ifdef SQWEBMAIL_PASS
+   vget_assign(Domain, NULL, 0, uid, gid );
+   vsqwebmail_pass( vpw-pw_dir, Crypted, uid, gid);
+ #endif
+
}

GetValue(TmpCGI,Gecos, gecos=, MAX_BUFF);
if ( strlen( Gecos ) != 0 ) {
  vpw-pw_gecos = Gecos;
}
+   vauth_setpw(vpw, Domain);

/* get the value of the cforward radio button */
GetValue(TmpCGI,box, cforward=, MAX_BUFF);
***
*** 899,903 
}

call_hooks(HOOK_MODUSER);
!   moduser();
  }
--- 924,928 
}

call_hooks(HOOK_MODUSER);
!   show_users(Username, Domain, Mytime);
  }