Re: cyrus replication validation

2007-07-16 Thread Wesley Craig
If I understand this patch correctly, it doesn't solve the larger
problem that I'm interested in: is the data on my replica the same as
the data on my primary, or more to the point, are the two data sets
converging?  This patch *would* allow me to validate, more or less,
that the cyrus.* meta files are the same on both, and it takes care of
the sync_server stale staging file issue -- whether the cyrus.* meta
files match the other files.  But I'm really interested in something
that can run out of band from csync, imap, etc., that examines files on
the primary and replica to determine what the variance is.  I think
make_md5 is pretty ideal for what I'm after, as a source of data.
We're working on scripts that compare the data files.
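
As a rough illustration of what such an out-of-band comparison could look like, here is a minimal Python sketch. It does not use make_md5 or its output format; it simply walks a spool directory, hashes every message file, and compares the two resulting manifests. The function names and the manifest layout are hypothetical, and skipping cyrus.* files is an assumption that mirrors the find/awk filter used elsewhere in this thread.

```python
import hashlib
import os


def build_manifest(spool_root):
    """Map each message file path (relative to spool_root) to its MD5 hex digest.

    Files named cyrus.* are skipped, so only message files are compared.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        for name in filenames:
            if name.startswith("cyrus."):
                continue
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large messages are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, spool_root)] = digest.hexdigest()
    return manifest


def compare_manifests(primary, replica):
    """Return (missing_on_replica, extra_on_replica, content_mismatch) as sorted lists."""
    missing = sorted(set(primary) - set(replica))
    extra = sorted(set(replica) - set(primary))
    mismatch = sorted(
        path for path in set(primary) & set(replica) if primary[path] != replica[path]
    )
    return missing, extra, mismatch
```

Each host would run build_manifest() against its own spool and ship the result somewhere central for compare_manifests(). Running it repeatedly and watching whether the three lists shrink is one way to judge whether the two data sets are converging.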

:wes

On 06 Apr 2007, at 22:31, Rob Mueller wrote:
 The provided Cyrus tool make_md5 is for validating replication.  It
 would, for instance, have found the recently discussed bug in   
 sync_server that caused random files to be overwritten in the  
 event  that sync_server reused a stale staging file.  It would  
 probably be  cool if there were documentation somewhere that  
 advised people on how  to run it and how to use it to validate  
 replication.

 We have a patch that helps with this as well; see MD5 UUIDs here:

 http://cyrus.brong.fastmail.fm/

 Basically it does two things:
 1. You can make the UUIDs of all messages the first 11 bytes of the  
 MD5 of the message
 2. You can fetch a computed MD5 of any message on disk via IMAP

 Using the second, you can do complete validation via IMAP, just  
 iterate through all folders and all messages, get the computed MD5  
 and compare on both sides.

 The UUID bit is just designed to help replication when messages are  
 moved between folders, rather than having to resend the entire  
 message on a move, it can just link them from one folder to the  
 other at the replication end.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: cyrus replication validation

2007-07-16 Thread Rob Mueller
Hi

 If I understand this patch correctly, it doesn't solve the larger  problem 
 that I'm interested in: is the data on my replica the same as  the data on 
 my primary, or more to the point, are the two data sets  converging?  ... 
 But I'm really interested  in something that can run out of band from 
 csync, imap, etc, that  examines files on the primary and replica to know 
 what the variance

As mentioned, there are two parts to the patch. The UUID part helps with
replication, but there's also this bit:

 2. You can fetch a computed MD5 of any message on disk via IMAP

 Using the second, you can do complete validation via IMAP, just  iterate 
 through all folders and all messages, get the computed MD5  and compare 
 on both sides.

We wanted the same thing you did, some way to guarantee that the message 
data on both sides was exactly the same. One way of doing that was to use 
something that runs under the covers to check the messages on disk, which is 
fine. The other was to basically add something to the IMAP protocol which 
lets us do the same thing via IMAP.

We went with the second, because we already had code that, given a username,
would check their master server and replica server to see that:
1. The folder list matched
2. For each folder, message count + unread count + uidvalidity + uidnext
matched (i.e. the STATUS results)
3. For each folder, the UID listing matched
4. For each folder, the flags on each message (by UID) matched

These were all easy to get via IMAP on both sides and compare. However, they
were all meta-data related, and didn't help check that the actual email
spool data on disk was correct, which is why the above patch adds two FETCH
items to the IMAP protocol:

FILE.MD5 and FILE.SIZE

With these, we can now compare each file on each side of the master/replica
set to see that they match. This means we can now check pretty much all meta
data + spool data on both sides for consistency, all via IMAP connections,
without having to do any more peeking under the hood. Of course, actually
having the patch in there is pretty heavily peeking under the hood, but it
was easier for us to do that because we already had a script which did
steps 1-4, so adding a hack to the IMAP protocol was easier for us than
creating a whole new system. Whether this is easier or harder at your site
is up to you.
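
To make the shape of such a check concrete, here is a minimal Python sketch of the comparison logic. It is not our actual script. parse_status() and folder_status() cover step 2 using the standard IMAP STATUS command via imaplib; diff_folders() is a generic comparison over hypothetical {folder: {key: value}} snapshots, where the key can be a STATUS item name, or a UID whose value is the flags or the FILE.MD5/FILE.SIZE pair. How the patched FETCH items are fetched and parsed is left out, since the exact response format is not shown in this thread.

```python
import re

STATUS_ITEMS = "(MESSAGES UNSEEN UIDVALIDITY UIDNEXT)"
_STATUS_RE = re.compile(rb"\(([^)]*)\)\s*$")


def parse_status(line):
    """Parse a STATUS payload such as
    b'"INBOX" (MESSAGES 3 UNSEEN 1 UIDVALIDITY 7 UIDNEXT 4)' into a dict of ints."""
    match = _STATUS_RE.search(line)
    if match is None:
        raise ValueError("unrecognised STATUS response: %r" % (line,))
    tokens = match.group(1).split()
    return {
        tokens[i].decode("ascii"): int(tokens[i + 1])
        for i in range(0, len(tokens), 2)
    }


def folder_status(conn, folder):
    """Fetch the STATUS counters for one folder over an imaplib.IMAP4 connection.

    The folder name is passed through unchanged, so the caller is assumed
    to have quoted it already if it contains spaces.
    """
    typ, data = conn.status(folder, STATUS_ITEMS)
    if typ != "OK":
        raise RuntimeError("STATUS failed for %s: %r" % (folder, data))
    return parse_status(data[0])


def diff_folders(master, replica):
    """Compare two {folder: {key: value}} snapshots; return a list of differences."""
    problems = []
    for folder in sorted(set(master) | set(replica)):
        if folder not in replica:
            problems.append("%s: missing on replica" % folder)
        elif folder not in master:
            problems.append("%s: only on replica" % folder)
        else:
            m, r = master[folder], replica[folder]
            for key in sorted(set(m) | set(r), key=str):
                if m.get(key) != r.get(key):
                    problems.append(
                        "%s: %s differs (%r vs %r)" % (folder, key, m.get(key), r.get(key))
                    )
    return problems
```

The same diff_folders() call can be reused for each step: once with the STATUS dicts, once with per-UID flags, and once with per-UID MD5 and size values, so an empty result across all passes means both sides agree on everything that was checked.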

Rob




Re: cyrus replication validation

2007-04-09 Thread Dmitriy Kirhlarov
On Fri, Apr 06, 2007 at 05:52:28PM -0400, John Capo wrote:

  On both servers:
  find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server1.lst
  find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server2.lst
  
  and
  diff -u server1.lst server2.lst
  
 
 Quick mailboxes.db check.
 
 ctl_mboxlist -d | md5   on server1
 ctl_mboxlist -d | md5   on server2
 
 Both hashes should be identical.  Or diff the ctl_mboxlist -d
 outputs.

Please correct me if I'm wrong: this only checks the mailbox lists, not
the number of messages.

WBR.
Dmitriy



Re: cyrus replication validation

2007-04-09 Thread John Capo
Quoting Dmitriy Kirhlarov ([EMAIL PROTECTED]):
 On Fri, Apr 06, 2007 at 05:52:28PM -0400, John Capo wrote:
 
   On both servers:
   find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server1.lst
   find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server2.lst
   
   and
   diff -u server1.lst server2.lst
   
  
  Quick mailboxes.db check.
  
  ctl_mboxlist -d | md5   on server1
  ctl_mboxlist -d | md5   on server2
  
  Both hashes should be identical.  Or diff the ctl_mboxlist -d
  outputs.
 
 Please correct me if I'm wrong: this only checks the mailbox lists, not
 the number of messages.

Correct.



 
 WBR.
 Dmitriy
 


Re: cyrus replication validation

2007-04-06 Thread Dmitriy Kirhlarov
On Thu, Apr 05, 2007 at 12:10:14PM -0400, Ilya Vishnyakov wrote:

 Hello Cyrus Gurus!
 I was wondering if there is any specific way to check if the
 replication was done properly? I set up cyrus replication between two
 servers (documentation I used:
 http://cyrusimap.web.cmu.edu/imapd/install-replication.html). However,
 before switching our production servers we would like to make sure
 that replication was done properly. We checked if the directories are

On both servers:
find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server1.lst
find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server2.lst

and
diff -u server1.lst server2.lst

WBR.
Dmitriy



Re: cyrus replication validation

2007-04-06 Thread Ilya Vishnyakov
Hmm, it shows equal sizes for both files.
Thank you.

Dmitriy Kirhlarov wrote:
 On Thu, Apr 05, 2007 at 12:10:14PM -0400, Ilya Vishnyakov wrote:

 Hello Cyrus Gurus!
 I was wondering if there is any specific way to check if the
 replication was done properly? I set up cyrus replication between two
 servers (documentation I used:
 http://cyrusimap.web.cmu.edu/imapd/install-replication.html). However,
 before switching our production servers we would like to make sure
 that replication was done properly. We checked if the directories are

 On both servers:
 find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server1.lst
 find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server2.lst

 and
 diff -u server1.lst server2.lst

 WBR.
 Dmitriy
 


Re: cyrus replication validation

2007-04-06 Thread John Capo
Quoting Dmitriy Kirhlarov ([EMAIL PROTECTED]):
 On Thu, Apr 05, 2007 at 12:10:14PM -0400, Ilya Vishnyakov wrote:
 
  Hello Cyrus Gurus!
  I was wondering if there is any specific way to check if the
  replication was done properly? I set up cyrus replication between two
  servers (documentation I used:
  http://cyrusimap.web.cmu.edu/imapd/install-replication.html). However,
  before switching our production servers we would like to make sure
  that replication was done properly. We checked if the directories are
 
 On both servers:
 find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server1.lst
 find imap/ -type f | awk '!/(cache|index|header)/ {print}' | sort > server2.lst
 
 and
 diff -u server1.lst server2.lst
 

Quick mailboxes.db check.

ctl_mboxlist -d | md5   on server1
ctl_mboxlist -d | md5   on server2

Both hashes should be identical.  Or diff the ctl_mboxlist -d
outputs.
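
If the hashes differ, it helps to see which entries are responsible. A minimal Python sketch, assuming the text of `ctl_mboxlist -d` has already been captured on each server and copied to one place; the function names are hypothetical and the dump lines are treated as opaque strings, with no assumption about their fields:

```python
import hashlib


def _entries(dump_text):
    """Split a dump into its non-blank lines."""
    return [line for line in dump_text.splitlines() if line.strip()]


def dump_digest(dump_text):
    """MD5 hex digest of the dump with its lines sorted.

    Sorting first means a difference in line order alone cannot cause a
    mismatch. The value is not the same as running md5 on the raw output.
    """
    joined = "\n".join(sorted(_entries(dump_text)))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def dump_differences(primary_text, replica_text):
    """Return (only_on_primary, only_on_replica) as sorted lists of dump lines."""
    primary = set(_entries(primary_text))
    replica = set(_entries(replica_text))
    return sorted(primary - replica), sorted(replica - primary)
```

Two dumps with equal dump_digest() values have the same set of entries; otherwise dump_differences() lists the lines to look at.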

You should check the subscriptions on the replica too.  I don't
know of a simple way for you to verify the subscriptions other than
software that fetches and compares each user's subscriptions.
Subscription replication is the only replication problem I am seeing
these days and I haven't had time to look into it.

Well, that's not completely true.  I have seen some cases where the
bits controlling the POP3 UIDL format will differ on the replicas.
If all mailboxes were created fairly recently, for some value of
recent, or you have no POP3 users, you should not have a problem.

I have mailboxes that were originally created with early 1.X and
lots of POP3 users.  The UIDL format has changed over the years and
we have yet another UIDL format that attempts to get around the
Outlook problem.  The jury is still out on that.  The UIDL format
differences are only a problem if mail is left on the server.

John Capo




Re: cyrus replication validation

2007-04-06 Thread Wesley Craig

On 06 Apr 2007, at 17:52, John Capo wrote:

Quick mailboxes.db check.

ctl_mboxlist -d | md5   on server1
ctl_mboxlist -d | md5   on server2

Both hashes should be identical.  Or diff the ctl_mboxlist -d
outputs.


The provided Cyrus tool make_md5 is for validating replication.  It  
would, for instance, have found the recently discussed bug in  
sync_server that caused random files to be overwritten in the event  
that sync_server reused a stale staging file.  It would probably be  
cool if there were documentation somewhere that advised people on how  
to run it and how to use it to validate replication.


:wes



Re: cyrus replication validation

2007-04-06 Thread Rob Mueller



The provided Cyrus tool make_md5 is for validating replication.  It
would, for instance, have found the recently discussed bug in  sync_server 
that caused random files to be overwritten in the event  that sync_server 
reused a stale staging file.  It would probably be  cool if there were 
documentation somewhere that advised people on how  to run it and how to 
use it to validate replication.


We have a patch that helps with this as well; see MD5 UUIDs here:

http://cyrus.brong.fastmail.fm/

Basically it does two things:
1. You can make the UUIDs of all messages the first 11 bytes of the MD5 of 
the message

2. You can fetch a computed MD5 of any message on disk via IMAP

Using the second, you can do complete validation via IMAP, just iterate 
through all folders and all messages, get the computed MD5 and compare on 
both sides.


The UUID bit is just designed to help replication when messages are moved 
between folders, rather than having to resend the entire message on a move, 
it can just link them from one folder to the other at the replication end.


Rob

