Re: MESSAGE quota resource implemention

2011-09-01 Thread Rob Mueller



Actually, really I'd like to create a new UNIQUEID - and store
all the files in paths based on uniqueid rather than on folder
name.  This would not only mean renames could be fast and
atomic, but that delayed delete would be fast.  The downside is
a more opaque filesystem layout.  Oh, another upside - file path
limitations don't exist so much any more.


While UNIQUEID is nice, the opaqueness is annoying. Personally, I liked the 
idea that we talked about a while back which I think was:


$spooldir/a/user/aardvark/user.aardvark/
$spooldir/a/user/aardvark/user.aardvark.drafts/
$spooldir/a/user/aardvark/user.aardvark.trash/
$spooldir/a/user/aardvark/user.aardvark.foo/
$spooldir/a/user/aardvark/user.aardvark.foo.bar/
$spooldir/a/user/aardvark/user.aardvark.abc.xyz/

So you end up with every folder for a user in one dir. This solves the 
current messy handling of sub-dirs (eg currently you have to create the 
intermediate dir /abc/ even though there's no entry in mailboxes.db for it), 
and makes renaming any folder very cheap still (because you can do it with a 
single dir rename, rather than having to move each message file), but you 
don't go completely opaque.


Delete handling is still easier, just rename to DELETED.$oldname.$UNIQUEID 
or something like that, because it's cheap to rename anyway.


Of course it means re-organising folder layout for every installation out 
there, but maybe we need to bump major versions anyway, cyrus 3.0 here we 
come :)


Rob



Re: Zero byte appends

2010-07-17 Thread Rob Mueller



Some applications (e.g. Kolab) store other objects than Mail (e.g.
Addressbooks, Calendar, Tasks or Notes) in IMAP folders. They might
create empty objects for some reason.


Well that's pretty evil.  The objects should still have a header,
right?  Or do they just store blobs?


http://www.faqs.org/rfcs/rfc2822.html

So the body is optional (3.5), but in the headers (3.6), there has to be a 
Date and From header


  The only required header fields are the origination date field and
  the originator address field(s).  All other header fields are
  syntactically optional.  More information is contained in the table
  following this definition.

So really a 0 byte message is not a valid message.

Rob



Re: about the replication

2008-01-24 Thread Rob Mueller



I run quota on master and replica every once in a while and compare.

That's an easy way to find and fix the problem cases, regardless of 
whether the master or replica was at fault. There are typically only about 
half a dozen cases each time, so I don't think that it's a huge problem.


I agree, it's not a huge problem, and there are other ways to fix it.

However coming back to my main point, shouldn't sync_client -u be guaranteed 
to make a replica be in sync with the master for a given user for 
everything; folders, subscriptions, flags, seen state, sieve script, quotas, 
etc. Currently it seems to do everything but quotas, which is annoyingly 
inconsistent!


Rob



Re: about the replication

2008-01-23 Thread Rob Mueller



I ran my script last night on one of the back-ends.  This one has 20928
mailboxes.  And 1067 are not in sync.  From those, 897 users didn't have 
all their mailboxes replicated to the replica.  The others had the same 
folder count, but one or more messages were not the same (or weren't 
there).


We also have a script which does this for all users every week. It checks 
for each user that the following match:


1. Folder list
2. Folder subscriptions
3. Quota (total + used)
4. Sieve scripts
5. For every folder, status output for (messages uidnext unseen recent 
uidvalidity)

6. For every message in every folder, flags + message sizes + GUID

All of those are reasonably quick since they access only meta-data. We also 
check on a regular basis for a random sampling of messages that rfc822.sha1 
on both sides match.


The script also waits for a few seconds and retries if there's any problems 
(replication is non-synchronous, so there may be delays in differences 
between both sides).


If there's a mismatch problem, the script can be set to:

1) Run a sync_client -u
2) Run a reconstruct on both sides
3) Completely delete the replica mailbox and run a sync_client -u to build a 
complete new copy from scratch


One annoying thing we've found recently is that sync_client -u doesn't fix 
quota problems, it seems the protocol doesn't include an absolute set 
replica quota to this action. We'd been meaning to look into that but 
hadn't got to it.


David: Don't suppose you could check that out? It would be nice if 
sync_client -u could really fix every user problem, it's very close to that 
right now, quota mismatches seems to be about the only thing it can't fix.


FYI, unfortunately at the moment this script is fairly closely tied to our 
system (it uses a bunch of other modules to work out master/replica servers 
for a user). It probably would be nice to factor that stuff out so others 
could use it...


Rob



Re: statuscache

2008-01-17 Thread Rob Mueller



- statuscache v.2 stores statuscache_data as a binary blob, which is
platform dependent (byte-order  word size).  This make statuscache.db 
non-portable.  I believe that this might be the first cyrusdb that would 
be non-portable.  Does anybody care?


The only tricky part is going to be upgrading from v.2 to v.3.  If its not 
designed to be persistent and/or migrated between platforms, I'm not sure 
I care, since your design is definitely quicker.


Basically our startup scripts completely delete the statuscache.db file on 
startup, so migrating isn't an issue for us at all.


So if you want to make it platform independent for others in the future, I'm 
fine with that.



BTW, is statuscache:log necessary, or is it just for debugging?


It's not necessary, it was just helpful for debugging rather than having to 
recompile everything. From memory, clients make SO many status calls, that 
it can completely flood the log if you leave the syslog calls in.


Rob



Re: statuscache

2008-01-17 Thread Rob Mueller



Why do you delete the cache?  I doesn't appear from the code that the
cache entries would be invalid if the mailbox and seen state aren't 
touched.


Historical. We only use berkeley db for statuscache and duplicate delivery, 
and because of problems in the past with berkeley db's corrupting 
themselves, we found it easier to just delete them on startup. It avoids the 
case of cyrus failing to start if one of the bdb's is corrupted


I'm changing the statuscache option to a switch and leaving the logging in 
as LOG_DEBUG


My only concern with this is that it can cause a LOT of log messages. I've 
never been entirely sure how well syslog handles the case of lots of log 
messages. I think it still has to send them over the socket to the syslogd. 
Most people then have a syslog.conf that discards debug messages, so it 
always seemed like a possible way to overflow the syslog socket buffer and 
loose important messages (i'm pretty sure syslog never guarantees that a 
message will be logged, which is why djb implemented his own way that uses 
streams rather than dgrams). This is probably all entirely academic and not 
a real issue...


Rob



Two bugs in 2.3.10-rc2 that cause segfaults

2007-10-16 Thread Rob Mueller

Hi Ken

After some testing on our test server, we rolled out 2.3.10-rc2 onto one of 
our production IMAP stores and almost immediately starting noticing 
segfaults for some users.


Bron and I spent quite a bit of time this afternoon debugging, and we're 
pretty sure we found both the causes.


1. In index_parse_sequence(), your realloc() doesn't multiply by 
sizeof(struct seq_range), so if you have a sequence list with more than 
about 8 items, it will start corrupting memory


2. The signed - unsigned changes in the prot stream stuff weren't as 
innocent as they looked and created a subtle and nasty bug. You might want 
to audit the other code where you changed signed - unsigned as well...


Here's an example of the problem if you use IDLE and close the connection on 
it. In that case, what happens is:


In cmd_idle(), we wait for data from the client by calling:

   c = getword(imapd_in, arg);

getword() calls prot_getc(), which looks like this:

#define prot_getc(s) ((s)-cnt--  0 ? (int)*(s)-ptr++ : prot_fill(s))

Say there's no data in the prot buffer, then cnt == 0. However when we call 
the above macro, it does cnt--, which makes cnt wrap to 4294967295, and then 
calls prot_fill. prot_fill() calls read(), and if read() returns = 0 (eg 
other end closed the connection) it does.


   if (n = 0) {
   if (n) s-error = xstrdup(strerror(errno));
   else s-eof = 1;
   return EOF;
   }

So it sets the eof flag and returns EOF.

Back in cmd_idle, we get the EOF, and exit out of cmd_idle and back into the 
main loop, which does.


   /* Parse tag */
   c = getword(imapd_in, tag);

Now getword calls the prot_getc() macro again... but uh, oh. cnt is 
definitely  0 now (it's 4294967295) and so we reading through unitialised 
memory trying to interpret it as IMAP commands until we hit a segfault 
somewhere.


Our fix is to change prot_getc() to only decrement cnt if it's non-zero. 
Because it's an expression macro, it uses the C comma operator which I've 
found often even experienced C people don't know about 
(http://www.eskimo.com/~scs/cclass/int/sx4db.html).


#define prot_getc(s) ((s)-cnt  0 ? (--(s)-cnt, (int)*(s)-ptr++) : 
prot_fill(s))


Both patches are here:

http://cyrus.brong.fastmail.fm/patches/cyrus-fix-segfaults-2.3.10.diff

Rob



Re: Making Replication Robust

2007-10-12 Thread Rob Mueller



This would seem to be a significant advantage of running sync_client
outside master.

When I shut down master, sync_client continues to process the outstanding 
log. I can then use sync_shutdown_file when it has finished and is idle.


We do something similar.

But it means you have to develop a bunch of your own infrastructure to make 
cyrus replication robust. It's not currently a start it and it just works 
until you shut it down solution, which means either people have to 
replicate the same extra infrastructure work everywhere separately, or 
people are going to get burnt not realising that what they're doing isn't 
safe.


That's why I'd really like this to be in cyrus itself. I think we should be 
able to say in the documentation something like:


Shuting down a cyrus master with a SIGQUIT ensures that all actions have 
been replicated to the replica side.


It makes writing init scripts and the like a lot easier.


There seems to be a spilt of opinion between BSD and SVR4: BSD tries to
retry while SVR4 throws EINTR. Linux of course can work either way:

http://www.gnu.org/software/libc/manual/html_node/Interrupted-Primitives.html


Isn't a lot of writes already wrapped up in some retry_write() function. I 
admit I haven't looked closely.


Anyway, is this really a problem. Basically shouldn't you be able to kill 
cyrus at any point, and files are left in a consistent restartable state? If 
so, if something returns EINTR, won't it just move on and eventually exit? 
Or is the problem that you have something like:


write to file 1
write to file 2

And if the first returns EINTR but is ignored, and then it writes the 
complete data to the second, things are in an inconsistent state?


Rob



Re: Making Replication Robust

2007-10-12 Thread Rob Mueller



Or is the problem that you have something like:

write to file 1
write to file 2

And if the first returns EINTR but is ignored, and then it writes the 
complete data to the second, things are in an inconsistent state?


This is my concern.


Doing an ack 'write\(' reveals a scary mix of write, retry_write and 
fwrite calls. My initial reaction was that binary files seem to use 
open/retry_write, and text files use fopen/fwrite, but doesn't quite seem to 
be the case...


mailbox.c
1242:r = write(newheader_fd, MAILBOX_HEADER_MAGIC,
1359:n = retry_write(mailbox-index_fd, buf, header_size);
1428:n = retry_write(mailbox-index_fd, buf, INDEX_RECORD_SIZE);
1477:n = retry_write(mailbox-index_fd, buf, len);
1642:fwrite(buf, 1, INDEX_HEADER_SIZE, newindex);
1659:fwrite(bufp, INDEX_RECORD_SIZE, 1, newindex);
1710:fwrite(buf, INDEX_RECORD_SIZE, 1, newindex);
1721:   fwrite(buf+OFFSET_DELETED,
1952:   n = retry_write(expunge_fd, buf, mailbox-record_size);
1979:   if (newindex) fwrite(buf, 1, mailbox-record_size, newindex);
1999:   /* fwrite will automatically call write() in a sane way */
2000:   fwrite(cacheitembegin, 1, cache_record_size, newcache);
2004:   fwrite(buf, 1, mailbox-record_size, newindex);
2058:fwrite(buf, 1, mailbox-start_offset, newindex);
2215:   fwrite(buf, 1, sizeof(bit32), newcache);
2219:fwrite(buf, 1, mailbox-start_offset, newindex);
2263:   n = retry_write(expunge_fd, buf, mailbox-start_offset);
2342:   r = quota_write(mailbox-quota, tid);
2363:   fwrite(buf, 1, mailbox-start_offset, newexpungeindex);
2424:   n = retry_write(expunge_fd, buf, mailbox-start_offset);
2719:   n = retry_write(mailbox.cache_fd, (char *)mailbox.generation_no, 
4);

2823:   r = quota_write(mailbox-quota, tid);
3056:   r = quota_write((newmailbox-quota), tid);
3309:   r = quota_write(newmailbox.quota, tid);
3319:   r2 = quota_write(newmailbox.quota, tid);
3398:n = retry_write(destfd, src_base, src_size);

It seems mixing up fd's or FILE * structs all over the place. *sigh*

Does fwrite() retry a write on EINTR? It looks like that's the whole point 
of retry_write() anyway.


If fwrite() does retry, then about the only other work would be changing any 
naked write() calls to retry_write(), which actually doesn't seem that many.


Thoughts?

Rob



Re: Making Replication Robust

2007-10-09 Thread Rob Mueller



c) MUST have a clean process to soft-failover to the
  replica machine, making sure that all replication
  events from the ex-master have been synchronised.


Something more than sync_shutdown_file plus automatic retries on
recent work files?


I think the problem at the moment is that the process you really want is:

1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shutdown

There's currently no clean way to do this. Basically you have to SIGTERM 
master which hard kills it and all children, then manually run 
sync_client -f on any remaining log files.


We've got a patch which makes master handle SIGQUIT much more nicely. 
Basically it appears there was some existing infrastructure that was 
designed to handle a cleaner shutdown, look at the code to all the places 
that call signals_poll(). It looks like the idea was that you could send 
child processes SIGQUIT and they would continue their current action until 
their main loop and check if they'd been sent a QUIT, and then exit 
cleanly. Unfortunately if you sent SIGQUIT to master, it would just SIGTERM 
all children, not SIGQUIT them.


This patch attempts to fix this, so that sending SIGQUIT to master, sends 
SIGQUIT to all children, and then waits for them to all exit cleanly.


http://cyrus.brong.fastmail.fm/#cyrus-clean-shutdown-2.3.8.diff

This solves step 1  2 above, though it doesn't deal with the case of a 
crazy child that doesn't respond to SIGQUIT. Personally our init script 
sends SIGQUIT, and if the master process is still there after 10 seconds, 
then it sends SIGTERM to force and exit. In general we find that everything 
exits after a couple of seconds of SIGQUIT.


To do step 3, I think the best might be to have a new cyrus.conf section, a 
SHUTDOWN section which gives some commands to run on shutdown. Basically 
after all children have accepted a SIGQUIT and exited, then we run the 
SHUTDOWN section, which would run a final sync_client -r on the sync dir to 
finish up any remaining log files.


With all of that in place, it means you could send a SIGQUIT to a cyrus 
master process on a master server, and it would cleanly shutdown all 
children and ensure that all replication events have been correctly played 
to the replica. You could then do the same to the replica, then reverse 
their roles, and bring them both back up and you've got a safe soft 
failover.



At the moment we replace messages (on the master knows best principle).

It would be easy enough to leave message in place and generate warnings 
instead, although this would generate a lot of warnings, one for every bad 
message every time that a given mailbox is updated.


That's what this patch does.

http://cyrus.brong.fastmail.fm/#cyrus-warnmismatcheduuids-2.3.8.diff

In theory with clean soft failovers, you should NEVER have UIDs with 
mismatched UUIDs. After a hard failover, you obviously might, but in those 
cases, just replacing the message means we're almost certainly overwriting a 
delivered message and loosing it which is bad. At least making it an option 
to overwrite or log I think is a sane idea.


My nightmare scenario is a replication engine which carries on running in 
the face of mboxlist corruption on the master: you could lose a lot of 
mailboxes on the replica that way.


That would be bad, though hard to detect and stop. I guess that's what 
backups are for...



It would be easy enough to generate multiple replication log files.

MySQL keeps a single transaction log for multiple replicas, but that file 
contains quite a lot of information about each transaction. In contrast 
the Cyrus sync log is just a list of objects we need to pay attention to: 
the files have much less state, particularly without duplicates.


The other option is rather than using the rotate log, play it, delete it 
system, you generate one log file but you keep track of offsets within the 
file to tell you where each replica is up to. That's what mysql does, so you 
can have multiple replicas because each replica is playing off the same 
log files, they're just up to different offsets at any point in time.


Rob



Re: Two bug fix patches

2007-09-14 Thread Rob Mueller



Eventually worked out that if a message is delivered by LMTP to several
recipients, then the first recipient gets a correct ENVELOPE, but all the 
following recipients get an lcase()ed version. The reason is that 
append_fromstage() in 2.3 reuses the struct body calculated for the 
first recipient. Unfortunately there are some lcase() calls in the middle 
of message_write_cache() which overwrite fields in that structure.


The incorrect cyrus.cache entries are replicated as is by sync_client 
UPLOAD PARSED, so master and replica are typically both incorrect.


I'm running a mailstore with fastmail.fm patch for  uuidmode: md5.

Should I worry?  It seems to me I should.


I don't think so. This only affects cache records. Our uuidmode patch only 
calculates the md5 for the message body. The make_md5 David mentioned is a 
completely separate script he runs and maintains to calculate the md5's of 
messages and stores the values separately to the existing cyrus structures.


At least I believe that's the case, I'm sure David will comment.

Rob



Re: *IMPORTANT* - bugfix sync_append_commit index breakage

2007-09-01 Thread Rob Mueller



I believe so.  If we also used it for appends, we wouldn't run into the
problem that Fastmail is seeing.  But, it may be too much of a
performance hit.



I don't think that any of the code does the .NEW for appends because that
makes append O(N) where N is the mailbox size, and that would suck for
huge mailboxes.


Definitely. Rewriting the cyrus.index file for every message delivered to a 
mailbox would be horrible, especially when some users have 100,000+ messages 
in a mailbox.



Speaking of which, is there any reason why cyrus.expunge isn't sorted
by UID?  We have to rewrite the entire cyrus.index each time we do an
expunge anyway.


I'm guessing it's because messages can be deleted + expunged in any order, 
so it's easy just to append the record for any message being expunged to the 
end of the cyrus.expunge file. If you had to keep it sorted, that would mean 
every time you expunged a message, you'd have to reread, resort, and rewrite 
the cyrus.expunge file. Generally the cyrus.expunge will be a lot smaller 
than the cyrus.index, but still annoying...


Rob



Re: Need advice: mailbox-based \Seen flag

2007-08-30 Thread Rob Mueller



I actually went digging into seen state and cyrus.index a little today
for another reason, and I realized that we wouldn't have to expand/upgrade 
the index file to accomodate a shared seen state.  The existing field for 
system flags is just a bitmask, so we could easily add a bit for shared 
\Seen.  We could also implement the sharedseen annotation as a mailbox 
flag in the index header, just like condstore.


Based on other changes that I'm thinking about, I'm starting to like this 
idea better, even thought it *may* involve more code.


Doesn't that just seem annoying having two *completely* different ways of 
storing seen state? There' probably quite a few places in the code that look 
for the seen state. It just seems much easier to me to have a single flag 
that controls whether to open a specific users seen db, or the anyone user 
seen db, rather than having to special case lots of code around the place 
for either pulling from the seen db, or looking up a bit in the index file.


Rob



Re: Various small patches

2007-02-15 Thread Rob Mueller



lmtpd - deliver: $msgid$, from=$from$, to=$to$, uid=$uid$,
mailbox=$mboxname$
lmtpd - sieve discard: $msgid$, from=$from$, to=$to$
lmtpd - sieve redirect: $msgid$, from=$from$, to=$to$, 
redirect=$redirect

lmtpd - sieve error: $msgid$, from=$from, to=$to$, error=$error$
imapd - expunge: $msgid$, uid=$uid$, mailbox=$mboxname$

Does something like that sound reasonable?


Yes, although I doubt that you want imapd/ipop3d to log the UID of
every single message which is expunged.


Given we're suggesting logging the msgid, from, to, mailboxname and uid of 
every message delivered, I don't see too much of a problem in logging the 
msgid, mailbox name and uid of every message expunged, it should be a pretty 
similar order of magnitude and would provide very useful tracking 
information.


Rob