Re: MESSAGE quota resource implemention
On 17/09/11 01:37, Julien Coloos wrote: > As discussed here and on IRC, I rebased my commits with all the changes: > - the 'utility' methods have been promoted to 'command' ones, which > are now generic and may (options hash) concern cyrus binaries > - the 'start_command_bg' method is now replaced by a 'background' > option in 'run_command' > - dropped the 'mode' option, leaving only the 'redirects' one > - moved 'unpack' method from Cassandane::Unit::TestCase to > Cassandane::Instance, leaving the tar utility determine itself the kind > of the tar file > -> by default destination is the current cassandane instance base > directory; alternatively one can specify a relative path - > 'path/to/file' - to this directory, or an absolute path - '/path/to/file' > - use the pre-existing 'user.cassandane' mailbox for quota upgrade tests > - fixed a few minor things (like avoiding short-life zombies by > harvesting some cassandane dead children) > - refactored a few more stuff > - removed stuff that became unnecessary > > Hoping this will work out this time :) Excellent work, thanks a lot. I particularly like the refactoring in Quota.pm that allows the tests to express checks of reported quota numbers in a way that's concise but not sensitive to the order reported by the server. I've pulled your changes and pushed them out to github and cmu. -- Greg.
Re: MESSAGE quota resource implemention
As discussed here and on IRC, I rebased my commits with all the changes: - the 'utility' methods have been promoted to 'command' ones, which are now generic and may (options hash) concern cyrus binaries - the 'start_command_bg' method is now replaced by a 'background' option in 'run_command' - dropped the 'mode' option, leaving only the 'redirects' one - moved 'unpack' method from Cassandane::Unit::TestCase to Cassandane::Instance, leaving the tar utility determine itself the kind of the tar file -> by default destination is the current cassandane instance base directory; alternatively one can specify a relative path - 'path/to/file' - to this directory, or an absolute path - '/path/to/file' - use the pre-existing 'user.cassandane' mailbox for quota upgrade tests - fixed a few minor things (like avoiding short-life zombies by harvesting some cassandane dead children) - refactored a few more stuff - removed stuff that became unnecessary Hoping this will work out this time :) Regards Julien
Re: MESSAGE quota resource implemention
On 15/09/11 03:14, Julien Coloos wrote: > Le 12/09/2011 23:09, Bron Gondwana a écrit : >> >>> - for 'old' mailboxes (those created before the annotation storage >>> usage field in the index header), current annotations usage shall be >>> computed (and added to the quota entry) upon upgrading; this way >>> users won't have to run 'quota -f' for all quotaroots after >>> switching to this new version ;) >> Definitely. Upgrading usually handles things like that. It's >> the right way[tm]. >> > Pushed a few lines of code in the 'upgrade_index' function. Though it > still lacks the annotation storage usage computing. > It's in the branch 'gnb/annotate/fixes'; commit > ed61f721487804b205e399c538fc35ddc153bd93. Thanks Julien, I've cherry-picked this commit and the "Fixes" commit from the other day into my annotate branch. BTW I'm working on making reconstruct calculate the X-ANNOTATION-STORAGE quota. Some of that code may also be useful during upgrade and for handling message expunging. > >>> Actually I just gave up the 'old' test: there is no easy way to >>> simulate upgrading mailbox index, or at least I don't feel confident >>> enough to make it in cassandane :( No, there isn't a really good way at the moment. >> Easiest way is the same way I did for the broken quotas test. >> Have a tar file with the contents of the "old mailbox" which >> you unpack onto the filesystem, and then open the mailbox and >> check that the "upgraded" fields are what you would expect. >> >> I also did something similar for the "crash on thread" test, >> where there were 5 messages which were known to be able to >> crash the THREAD command. I unpacked the folder contents >> with those messages in a test. > Nice idea :) Bron's method of saving a tarball of the state of a mailbox generated by an older release is reasonably straightforward. The difficulty however is going to be working out whether the result of the upgrade is correct. In the general case, you can't just test that the first open of the mailbox doesn't report an error from upgrading, you need to test at least: * the mailbox is re-written with the latest version number * all the "new" fields have correct or at least feasible values * the messages previously present are still present and in the same order with the same contents and the same metadata (uid, flags, internaldate, etc) To do this properly you really need to save, independently of the Cyrus mailbox store, a whole lot of information about the expected results of the mailbox open so that Cassandane can check it. The two other alternatives are both worse: a) don't check and b) hardcode checks into the Cassandane code which depend intimately on details of the data in the mailbox state tarball. I'm thinking that what we need is a standalone utility built from the Cassandane source tree. Someone would point this utility at a live working Cyrus instance (not one controlled by Cassandane), and it would create a new folder and fill the folder with known data, then harvest the on-disk state left by Cyrus and packages that state up into a tarball along with a separate description of the messages. That tarball could then be checked into the Cassandane tree and used to run upgrade tests with reasonable post-upgrade checks. The advantages I see for this approach are: * when (not 'if') we need to improve either the data generated or the checks run, we can improve this utility and then re-capture new state as time permits. * the versions of Cyrus tested can be on separate machines from Cassandane itself - which allows testing upgrades from really old versions which don't install or run on modern OSes due to library issues, and - allows testing upgrades from Cyrus on other platforms (e.g. platforms with different endianness or wordsize). BTW at some point in the future I want Cassandane to be able to handle multiple installed versions of Cyrus, so that we can test configurations like cross-version replication. But that's not really the same problem as upgrade testing. For upgrade testing, we don't need functioning code for the old Cyrus versions, just a snapshot of their spoor. For cross-version replication or murder testing we need live installs to poke. The difference is critical, because I figure we'll need to support upgrade from much older versions than we support in a cross-version murder, and those older versions will be quite challenging to install or get running on the same platform that we develop on. > Pushed a few things. Thanks. I pulled in your commit "Added quota MESSAGE resource (RFC 2087) test cases." yesterday and then refactored the new tests a bit, but didn't get a chance to push it out again until today. > Added an option hash so that when running commands: > - test can run system commands (based on the current code which would > run cyrus commands inside the current cassandane instance directory) > - finer I/O redirections are possible > - working dire
Re: MESSAGE quota resource implemention
Le 12/09/2011 23:09, Bron Gondwana a écrit : - for 'old' mailboxes (those created before the annotation storage usage field in the index header), current annotations usage shall be computed (and added to the quota entry) upon upgrading; this way users won't have to run 'quota -f' for all quotaroots after switching to this new version ;) Definitely. Upgrading usually handles things like that. It's the right way[tm]. Pushed a few lines of code in the 'upgrade_index' function. Though it still lacks the annotation storage usage computing. It's in the branch 'gnb/annotate/fixes'; commit ed61f721487804b205e399c538fc35ddc153bd93. Actually I just gave up the 'old' test: there is no easy way to simulate upgrading mailbox index, or at least I don't feel confident enough to make it in cassandane :( Easiest way is the same way I did for the broken quotas test. Have a tar file with the contents of the "old mailbox" which you unpack onto the filesystem, and then open the mailbox and check that the "upgraded" fields are what you would expect. I also did something similar for the "crash on thread" test, where there were 5 messages which were known to be able to crash the THREAD command. I unpacked the folder contents with those messages in a test. Nice idea :) Pushed a few things. Added an option hash so that when running commands: - test can run system commands (based on the current code which would run cyrus commands inside the current cassandane instance directory) - finer I/O redirections are possible - working directory can be specified While adding those new 'features', I still kept the 'run_utility' method (cyrus commands) and simply added the 'run_command' method for other commands. Then, as you suggested on IRC, I added an 'unpack' method to extract tar/gz/bz2 (and combinations) files using system commands. And I added back my quota upgrade test from a cyrus v2.4(.11) mailbox, using two tar.gz files containing a mailbox content and its quota file. It's in the 'quotamessage/gnb/annotate' branch; commits d2bf4e4f42f7f8c9b53713441116ddbea5b0a265, 86a52daeed5a03f078b88de67d3d10b51a7f8cc4 and fb827e8fc77529a5e23465d2c19d6e88adf7cae8. Maybe you won't keep everything I pushed, but I hope some parts will be helpful :) Regards Julien
Re: MESSAGE quota resource implemention
On Mon, Sep 12, 2011 at 04:46:52PM +0200, Julien Coloos wrote: > Le 09/09/2011 14:18, Greg Banks a écrit : > >Ok, here you go. Not completely tested yet, so caveat emptor. > Remaining things concerning annotations: > - when deleting messages, annotations length is not substracted; > the solution may not be that simple, since I believe users are > allowed to unexpunge mails: so since index entry is still here - > until real unlinking -, annotations may have to stay there too - > until unlinking too When they unexpunge, you add it back. The decision to have delayed expunge and WHEN to expunge is a server administrator decision that's outside the control of the user. With storage, we subtract the usage when we mark the record expunged. Annotation length should be handled the same way. This will have to be done when the message gets expunged, by finding the size of the annotations and subtracting that also. > - for 'old' mailboxes (those created before the annotation storage > usage field in the index header), current annotations usage shall be > computed (and added to the quota entry) upon upgrading; this way > users won't have to run 'quota -f' for all quotaroots after > switching to this new version ;) Definitely. Upgrading usually handles things like that. It's the right way[tm]. > Actually I just gave up the 'old' test: there is no easy way to > simulate upgrading mailbox index, or at least I don't feel confident > enough to make it in cassandane :( Easiest way is the same way I did for the broken quotas test. Have a tar file with the contents of the "old mailbox" which you unpack onto the filesystem, and then open the mailbox and check that the "upgraded" fields are what you would expect. I also did something similar for the "crash on thread" test, where there were 5 messages which were known to be able to crash the THREAD command. I unpacked the folder contents with those messages in a test. Regards, Bron.
Re: MESSAGE quota resource implemention
Le 09/09/2011 14:18, Greg Banks a écrit : Ok, here you go. Not completely tested yet, so caveat emptor. Had to change two things: - mailbox_quota_check now expects a quota diff array which is good :), but change is now really applied if all diffs are '> 0' (instead of '>= 0' previously); in some cases '0' is used to check that mailbox is not currently overquota (e.g. in LMTP service), so either we should go back to testing '>= 0', or change callers that did rely on that test - changing the value of annotations quota storage in the index (mailbox_use_annot_quota function) do dirty the quota, but not the index; the new value is thus not committed (at least when setting mailbox annotations), and the quota entry becomes false when later deleting the mailbox If needed, you can look in our 'cyrus-imapd' repository at the 'gnb/annotate/fixes' branch. There is one commit for those matters. Remaining things concerning annotations: - when deleting messages, annotations length is not substracted; the solution may not be that simple, since I believe users are allowed to unexpunge mails: so since index entry is still here - until real unlinking -, annotations may have to stay there too - until unlinking too - for 'old' mailboxes (those created before the annotation storage usage field in the index header), current annotations usage shall be computed (and added to the quota entry) upon upgrading; this way users won't have to run 'quota -f' for all quotaroots after switching to this new version ;) Then maybe some of the cassandane tests I pushed on our repository would need to be refreshed (at least the one that checks what happens for legacy mailboxes on which we add one of the newly handled quota resources). That's next week's focus I think. Actually I just gave up the 'old' test: there is no easy way to simulate upgrading mailbox index, or at least I don't feel confident enough to make it in cassandane :( Other tests do work. Once the annotations usage is subtracted upon messages deletion (see before), all tests shall pass :) I rebased the 'quotamessage/gnb/annotate' branch of our 'cassandane' repository today, leaving that test aside. Regards Julien
Re: MESSAGE quota resource implemention
On 08/09/11 00:45, Julien Coloos wrote: > Le 06/09/2011 10:23, Greg Banks a écrit : >> > >> >> ... >> >> I'm still not convinced we'll need quota.sets[], but I'll play along. >> >> Thanks again for your work, and sorry that my annotate branch wasn't >> quite as stable a base as you first thought :) > So, I saved my current branch to 'quotamessage-0/gnb/annotate' and > rebased my patches on current 'annotate' branch (with less racy 'quota > -f'). > I removed everything related to recomputing from my patches (as well as > quota.sets[]). > > What is missing now is the new index field, which value will be used in > mailbox_get_usage function. Since my changes do rely on this function, > and sometimes computes a delta compared to a previous call of that > function, it may not need to be updated afterwards ... I hope. Ok, here you go. Not completely tested yet, so caveat emptor. The following changes since commit af8d5bd3b40c4cb49fb7c943c85cb0aeab209215: Make quota -f less racy. (2011-09-06 15:53:49 +1000) are available in the git repository at: ssh://g...@github.com/gnb/cyrus-imapd.git annotate Greg Banks (9): Fix compile warning in cmd_setquota append_setup() et al take an array of quota diffs unit: restore simplicity to quota TESTCASE macro annotate: kill annotate_state_write_{start,finish} annotate: _delete and _msg_copy take mailbox* annotate: use mailbox* instead of name in API Fix annotation memory leak in sync_server annotate: track quota in mailbox header annotate: woops, forgot this hunk Julien Coloos (5): Added quota MESSAGE resource (RFC 2087) management. List and compute all set quota resources. Do not mess with mailbox index struct fields upon deleting. When tracking quota usage changes, check all resources. Added helper function to get messages current usage from mailbox struct. Handle quota resources other than STORAGE in DUMP/UNDUMP. Also fix protocol breaking upon dumping annotations. cunit/annotate.c | 84 + cunit/quota.c| 72 +-- imap/annotate.c | 507 -- imap/annotate.h | 19 ++- imap/append.c| 30 ++- imap/append.h|6 +- imap/cyr_virusscan.c |2 +- imap/imap_err.et |2 +- imap/imapd.c | 219 --- imap/index.c |7 +- imap/lmtpd.c | 33 +++- imap/lmtpengine.c|7 +- imap/lmtpengine.h|4 +- imap/mailbox.c | 104 --- imap/mailbox.h | 12 +- imap/mbdump.c| 146 --- imap/mboxlist.c | 31 ++-- imap/nntpd.c |4 +- imap/quota.c | 94 +++--- imap/quota.h | 18 ++- imap/quota_db.c | 36 +++- imap/smmapd.c| 15 +- imap/sync_client.c |4 +- imap/sync_server.c | 19 ++- imap/sync_support.c |6 - lib/imapoptions | 11 +- 26 files changed, 933 insertions(+), 559 deletions(-) > Then maybe some of the cassandane tests I pushed on our repository would > need to be refreshed (at least the one that checks what happens for > legacy mailboxes on which we add one of the newly handled quota resources). > That's next week's focus I think. -- Greg.
Re: MESSAGE quota resource implemention
Sent from my iPhone On 08/09/2011, at 0:45, Julien Coloos wrote: Le 06/09/2011 10:23, Greg Banks a écrit : a) my commit "Make quota -f less racy" is going to cause lots of clashes (sorry!) b) Bron and I both think that your commit "Compute each quota resource upon setting it for the first time." is unnecessary, given that i) quota -f doesn't suck now, and ii) soon, all of the quota-able quantities will be tracked in fields in the index header. So I think we'll need another round, sorry :( Given that the annotations quota is broken and I'll be reimplementing it anyway, you may as well ignore QUOTA_ANNOTSTORAGE in all commits, and leave out the function annotatemore_computequota() for now. We'll use something very much like it for reconstruct later. I'm hoping to be able to pull your next round of changes into my annotate branch before I reimplement the annotation quota in the next few days. ... I'm still not convinced we'll need quota.sets[], but I'll play along. Thanks again for your work, and sorry that my annotate branch wasn't quite as stable a base as you first thought :) So, I saved my current branch to 'quotamessage-0/gnb/annotate' and rebased my patches on current 'annotate' branch (with less racy 'quota -f'). I removed everything related to recomputing from my patches (as well as quota.sets[]). Excellent, I'll take a look at these when I get into the office. What is missing now is the new index field, which value will be used in mailbox_get_usage function. Since my changes do rely on this function, and sometimes computes a delta compared to a previous call of that function, it may not need to be updated afterwards ... I hope. That seems likely. I have an almost-building diff which adds that field. Then maybe some of the cassandane tests I pushed on our repository would need to be refreshed (at least the one that checks what happens for legacy mailboxes on which we add one of the newly handled quota resources). Yep. Greg.
Re: MESSAGE quota resource implemention
Le 06/09/2011 10:23, Greg Banks a écrit : a) my commit "Make quota -f less racy" is going to cause lots of clashes (sorry!) b) Bron and I both think that your commit "Compute each quota resource upon setting it for the first time." is unnecessary, given that i) quota -f doesn't suck now, and ii) soon, all of the quota-able quantities will be tracked in fields in the index header. So I think we'll need another round, sorry :( Given that the annotations quota is broken and I'll be reimplementing it anyway, you may as well ignore QUOTA_ANNOTSTORAGE in all commits, and leave out the function annotatemore_computequota() for now. We'll use something very much like it for reconstruct later. I'm hoping to be able to pull your next round of changes into my annotate branch before I reimplement the annotation quota in the next few days. ... I'm still not convinced we'll need quota.sets[], but I'll play along. Thanks again for your work, and sorry that my annotate branch wasn't quite as stable a base as you first thought :) So, I saved my current branch to 'quotamessage-0/gnb/annotate' and rebased my patches on current 'annotate' branch (with less racy 'quota -f'). I removed everything related to recomputing from my patches (as well as quota.sets[]). What is missing now is the new index field, which value will be used in mailbox_get_usage function. Since my changes do rely on this function, and sometimes computes a delta compared to a previous call of that function, it may not need to be updated afterwards ... I hope. Then maybe some of the cassandane tests I pushed on our repository would need to be refreshed (at least the one that checks what happens for legacy mailboxes on which we add one of the newly handled quota resources). Regards Julien
Re: MESSAGE quota resource implemention
Le 06/09/2011 19:48, Julien Coloos a écrit : Le 06/09/2011 10:23, Greg Banks a écrit : So I think we'll need another round, sorry :( Given that the annotations quota is broken and I'll be reimplementing it anyway, you may as well ignore QUOTA_ANNOTSTORAGE in all commits, and leave out the function annotatemore_computequota() for now. We'll use something very much like it for reconstruct later. I'm hoping to be able to pull your next round of changes into my annotate branch before I reimplement the annotation quota in the next few days. If the new code is capable of returning actual resource usages upon getquota (I think Bron wanted it that way), then I guess I can drop some more of the functions I added (annotatemore_computequota, but also mboxlist_updatequota and associated code including the quota.sets[]; then maybe I could also change the code as proposed earlier and prevent writing of resource limit in quota entry if it is QUOTA_UNLIMITED). Will look at it tomorrow. When I said 'actual', I meant based on the upcoming mailbox index field. But wait, I got a bit confused with the last point about playing along with quota.sets[]. If what you discussed with Bron is that, in the end, usage (re)computing will only be done with the quota utility (no more automatic computing upon setquota, neither for getquota), then after upgrading people shall call 'quota -f' once and for all and quota.sets[] must disappear - and all resources must be written in the quota entry -, otherwise users would need to call 'quota -f' on a given quotaroot each time they set a new quota resource limit to a mailbox. Regards Julien
Re: MESSAGE quota resource implemention
Le 06/09/2011 10:23, Greg Banks a écrit : So I think we'll need another round, sorry :( Given that the annotations quota is broken and I'll be reimplementing it anyway, you may as well ignore QUOTA_ANNOTSTORAGE in all commits, and leave out the function annotatemore_computequota() for now. We'll use something very much like it for reconstruct later. I'm hoping to be able to pull your next round of changes into my annotate branch before I reimplement the annotation quota in the next few days. If the new code is capable of returning actual resource usages upon getquota (I think Bron wanted it that way), then I guess I can drop some more of the functions I added (annotatemore_computequota, but also mboxlist_updatequota and associated code including the quota.sets[]; then maybe I could also change the code as proposed earlier and prevent writing of resource limit in quota entry if it is QUOTA_UNLIMITED). Will look at it tomorrow. Worked on DUMP/UNDUMP today: https://github.com/worldline-messaging/cyrus-imapd/commit/45148b20a4f2343d72aa7436e7255a92508d7bf8 I used an IMAP list to send data, as for annotations. For future usages, maybe the UNDUMP code could try to skip IMAP lists if it finds any instead of the expected file content LITERAL ? (to prevent breaking compatibility too many times, in case other things would need to be transmitted this way through DUMP/UNDUMP). By the way, I noticed that UNDUMPing annotations fails for now (function annotate_state_write uses 'int_mboxname' in the annotation state struct, but it is NULL in this context). I'm still not convinced we'll need quota.sets[], but I'll play along. Thanks again for your work, and sorry that my annotate branch wasn't quite as stable a base as you first thought :) No problem, I was somehow prepared to that kind of scenario :) Regards Julien
Re: MESSAGE quota resource implemention
On 06/09/11 02:55, Julien Coloos wrote: > Le 05/09/2011 06:12, Greg Banks a écrit : >> >> Julien, I think we agreed on everything else, right? I'm looking >> forward to your next iteration. > After picking your 'uquota_t removal' commit, I also removed it on my > end, and changed the code according to our previous discussions. Ok, I've read these: commit "Remove uquota_t" commit "Minor code cleanup" commit "Do not mess with mailbox index struct fields upon deleting." commit "When tracking quota usage changes, check all resources." and it all looks fine to me, except that: a) my commit "Make quota -f less racy" is going to cause lots of clashes (sorry!) b) Bron and I both think that your commit "Compute each quota resource upon setting it for the first time." is unnecessary, given that i) quota -f doesn't suck now, and ii) soon, all of the quota-able quantities will be tracked in fields in the index header. So I think we'll need another round, sorry :( Given that the annotations quota is broken and I'll be reimplementing it anyway, you may as well ignore QUOTA_ANNOTSTORAGE in all commits, and leave out the function annotatemore_computequota() for now. We'll use something very much like it for reconstruct later. I'm hoping to be able to pull your next round of changes into my annotate branch before I reimplement the annotation quota in the next few days. > Adding > an helper function which fills a quota_t array with the current messages > usage from mailbox struct made the code a bit cleaner and more generic :) Yes, I'm very pleased with that :) > I still have to look for transferring limits upon DUMP/UNDUMP. And for > the time being, I still kept the quota.sets[] thing (we will see later > if we can do it more smartly). I'm still not convinced we'll need quota.sets[], but I'll play along. Thanks again for your work, and sorry that my annotate branch wasn't quite as stable a base as you first thought :) -- Greg.
Re: MESSAGE quota resource implemention
On 05/09/11 20:16, Greg Banks wrote: > On 02/09/11 20:03, Bron Gondwana wrote: >> On Fri, Sep 02, 2011 at 07:36:20PM +1000, Greg Banks wrote: >> >>> How's about this for a strategy? >>> >>> When a quota resource is first enabled, (i.e. the limit is changed from >>> UNLIMITED to some finite value), the usage is stored as some special >>> value which I'll call INDETERMINATE. >> >> What about 'getquota'? I don't support any solution which leaves getquota >> returning bogus values or failing to respond. That's just icky and >> confusing. >> >> I don't think you can avoid two passes, and I don't even think you can >> avoid two values during if you really want to be good about it. >> >> [...] >> >> There are pure ways to do this, that guarantee consistency. [...] >> >> That's a real, robust solution. But it's pretty heavy engineering. > > After some thought, I agreed, decided it's quite doable, and decided to > start. I have an implementation coded, so far the hardest bit is testing it. Implemented, tested, and I'd appreciate a review. https://github.com/gnb/cyrus-imapd/commit/af8d5bd3b40c4cb49fb7c943c85cb0aeab209215 https://github.com/gnb/cassandane/commit/c529f745b51aa0b566020ecbffca774e783124ee https://github.com/gnb/cassandane/commit/569eea2b9a2ad56b98c3c5cea60509689fa49bad -- Greg.
Re: MESSAGE quota resource implemention
On Mon, Sep 05, 2011 at 07:19:00PM +0200, Julien Coloos wrote: > Le 05/09/2011 12:06, Bron Gondwana a écrit : > >On Mon, Sep 05, 2011 at 02:32:40PM +1000, Greg Banks wrote: > > > >> - add a 32b mailbox index header entry to track the storage in bytes > >>used by all annotations for the mailbox itself or for messages in the > >>mailbox > >Why not 64b? Admittedly hanging gigabytes of annotations off a single > >folders is probably evil, but this is the mailbox header - 4 bytes per > >mailbox to not even have to think about it is super-cheap. We keep a > >handful of "blanks" around in the header already. > > > >... > > > >> - in mailbox_commit_quota(), if the field is not "unknown", then > >>calculate the delta in usage and apply to the quota db. > > > This new field seems useful for the quota utility and when setting a > new quotaroot: it wouldn't need to check each annotation associated > to the mailbox (or messages) and would rely on that field, as it > currently does for total message size and now the number of messages > (with my patch). > I don't know what other people do with cyrus, but here those two > situations do not happen that often: > - quotaroot is usually set once upon creating the user mailbox > - quota utility is rarely used, and usually on one mailbox at a time > So maybe I am missing something, but my question would be: is it > really worth it ? I would challenge the "usually on one mailbox at a time" theory - it's often on one quota root at a time (i.e. - a single user), but also can be used on the entire server. Or for someone with a domain quota, on all the users in a single domain. This can be quite the race condition if someone has a lot of mailboxes and a lot of traffic on said mailboxes. Short of killing their connections and blocking LMTP traffic, you can't guarantee atomicity on actions that cross multiple mailboxes. But - the new field definitely has value. I like the idea of just trusting that field in the quota utility and for regular tasks. Bron.
Re: MESSAGE quota resource implemention
Le 05/09/2011 12:06, Bron Gondwana a écrit : On Mon, Sep 05, 2011 at 02:32:40PM +1000, Greg Banks wrote: - add a 32b mailbox index header entry to track the storage in bytes used by all annotations for the mailbox itself or for messages in the mailbox Why not 64b? Admittedly hanging gigabytes of annotations off a single folders is probably evil, but this is the mailbox header - 4 bytes per mailbox to not even have to think about it is super-cheap. We keep a handful of "blanks" around in the header already. ... - in mailbox_commit_quota(), if the field is not "unknown", then calculate the delta in usage and apply to the quota db. This new field seems useful for the quota utility and when setting a new quotaroot: it wouldn't need to check each annotation associated to the mailbox (or messages) and would rely on that field, as it currently does for total message size and now the number of messages (with my patch). I don't know what other people do with cyrus, but here those two situations do not happen that often: - quotaroot is usually set once upon creating the user mailbox - quota utility is rarely used, and usually on one mailbox at a time So maybe I am missing something, but my question would be: is it really worth it ? Regards Julien
Re: MESSAGE quota resource implemention
Le 05/09/2011 06:12, Greg Banks a écrit : Julien, I think we agreed on everything else, right? I'm looking forward to your next iteration. After picking your 'uquota_t removal' commit, I also removed it on my end, and changed the code according to our previous discussions. Adding an helper function which fills a quota_t array with the current messages usage from mailbox struct made the code a bit cleaner and more generic :) I still have to look for transferring limits upon DUMP/UNDUMP. And for the time being, I still kept the quota.sets[] thing (we will see later if we can do it more smartly). Regards Julien
Re: MESSAGE quota resource implemention
On 05/09/11 20:06, Bron Gondwana wrote: > On Mon, Sep 05, 2011 at 02:32:40PM +1000, Greg Banks wrote: >> Ok, I'm now convinced my first attempt at annotation quotas sucked too >> hard, here's how I want to re-implement it. Let me know what you think. > >> - add a 32b mailbox index header entry to track the storage in bytes >> used by all annotations for the mailbox itself or for messages in the >> mailbox > > Why not 64b? Admittedly hanging gigabytes of annotations off a single > folders is probably evil, Do we even have a db backend that would support that? > but this is the mailbox header - 4 bytes per > mailbox to not even have to think about it is super-cheap. We keep a > handful of "blanks" around in the header already. > >> - at header upgrade time, write the special value ~0 into this field, >> meaning "unknown" > > Funky. If you didn't want to get all special-value about it, you could > use a flag over in the flags field as well, there are only about 5 of > the 32 in use so far. Sure. >> - in mailbox_commit_quota(), if the field is not "unknown", then >> calculate the delta in usage and apply to the quota db. > > I assume this is done the same way it's done now? Something like that. > By taking a "snapshot" > of the value at the start, and calculating a diff to apply at the end. > That was actually a filthy workaround (one of many, dammit) for using > unsigned datatypes. If we made this a signed 64 bit datatype as well, > with a flag off to the side for "unknown", then we could actually store > a diff in the mailbox object directly - and update both this field and > the quota's value during the commit. Ok. >> - make reconstruct scan the annotations db to figure out the correct >> value for this new field. > > Yep - that makes sense. Reconstruct already does this for the > quota_mailbox_used field, doesn't it. Yes. > And any other derived > "aggregate" values in the header. > >> This means that the annotation STORE and SETMETADATA paths will be >> updating the quota db in the same place that APPEND et al do, >> mailbox_commit_quota(). This should work around Bug #3529 and also make >> the code neater. > > Nice. Yeah, all I have to do is work out how to futz with the annotate API yet again :( -- Greg.
Re: MESSAGE quota resource implemention
On 02/09/11 20:03, Bron Gondwana wrote: > On Fri, Sep 02, 2011 at 07:36:20PM +1000, Greg Banks wrote: > >> How's about this for a strategy? >> >> When a quota resource is first enabled, (i.e. the limit is changed from >> UNLIMITED to some finite value), the usage is stored as some special >> value which I'll call INDETERMINATE. > > What about 'getquota'? I don't support any solution which leaves getquota > returning bogus values or failing to respond. That's just icky and > confusing. > > I don't think you can avoid two passes, and I don't even think you can > avoid two values during if you really want to be good about it. > > [...] > > There are pure ways to do this, that guarantee consistency. [...] > > That's a real, robust solution. But it's pretty heavy engineering. After some thought, I agreed, decided it's quite doable, and decided to start. I have an implementation coded, so far the hardest bit is testing it. -- Greg.
Re: MESSAGE quota resource implemention
On Mon, Sep 05, 2011 at 02:32:40PM +1000, Greg Banks wrote: > Ok, I'm now convinced my first attempt at annotation quotas sucked too > hard, here's how I want to re-implement it. Let me know what you think. > - add a 32b mailbox index header entry to track the storage in bytes > used by all annotations for the mailbox itself or for messages in the > mailbox Why not 64b? Admittedly hanging gigabytes of annotations off a single folders is probably evil, but this is the mailbox header - 4 bytes per mailbox to not even have to think about it is super-cheap. We keep a handful of "blanks" around in the header already. > - at header upgrade time, write the special value ~0 into this field, > meaning "unknown" Funky. If you didn't want to get all special-value about it, you could use a flag over in the flags field as well, there are only about 5 of the 32 in use so far. > - in mailbox_commit_quota(), if the field is not "unknown", then > calculate the delta in usage and apply to the quota db. I assume this is done the same way it's done now? By taking a "snapshot" of the value at the start, and calculating a diff to apply at the end. That was actually a filthy workaround (one of many, dammit) for using unsigned datatypes. If we made this a signed 64 bit datatype as well, with a flag off to the side for "unknown", then we could actually store a diff in the mailbox object directly - and update both this field and the quota's value during the commit. > - make reconstruct scan the annotations db to figure out the correct > value for this new field. Yep - that makes sense. Reconstruct already does this for the quota_mailbox_used field, doesn't it. And any other derived "aggregate" values in the header. > This means that the annotation STORE and SETMETADATA paths will be > updating the quota db in the same place that APPEND et al do, > mailbox_commit_quota(). This should work around Bug #3529 and also make > the code neater. Nice. Bron.
Re: MESSAGE quota resource implemention
On 01/09/11 22:22, Bron Gondwana wrote: > On Thu, Sep 01, 2011 at 01:27:00PM +0200, Julien Coloos wrote: >> Le 01/09/2011 03:03, Greg Banks a écrit : > >>> I'm thinking that there's now three places in the code which take >>> a mailbox* and fill out an array of quota diffs, interpreting the >>> contents of the struct mailbox. That should really be >>> centralised. >> I'm not sure to see what you have in mind here. >> Are you talking about the places where the QUOTA_STORAGE and >> QUOTA_MESSAGE entries of the quota diff array are computed >> relatively to the 'quota_mailbox_used' and 'exists' index fields of >> the struct mailbox ? If so I guess some of the code could indeed be >> centralised. > > One place please :) Ideally I'd like to absorb more of the quota > stuff into mailbox.c. Greg and I have some debate about this - how > much is too much for that file to be doing. Probably it should be > abstracted into a couple of layers of stuff - but I really do like > the consistency of having just a couple of function calls: > > mailbox_append_index_record; and > mailbox_update_index_record > > which do all the consistency checking and counter updating inside. > Plus of course a mailbox_check_quota thing that takes a set of > quota checks to do and sees if there will be space for the > planned changes! Ok, I'm now convinced my first attempt at annotation quotas sucked too hard, here's how I want to re-implement it. Let me know what you think. - rejig the annotation code to always have an open and locked mailbox* before opening any annotations db (for per-mailbox and per-message annotations) - add a 32b mailbox index header entry to track the storage in bytes used by all annotations for the mailbox itself or for messages in the mailbox - at header upgrade time, write the special value ~0 into this field, meaning "unknown" - when setting or clearing annotations, if the field's value is ~0 then recalculate it by scanning the annotations databases - when setting or clearing annotations, after the field is no longer in it's "unknown" state, increment or decrement the field by the size of the annotation - in mailbox_commit_quota(), if the field is not "unknown", then calculate the delta in usage and apply to the quota db. - make reconstruct scan the annotations db to figure out the correct value for this new field. This means that the annotation STORE and SETMETADATA paths will be updating the quota db in the same place that APPEND et al do, mailbox_commit_quota(). This should work around Bug #3529 and also make the code neater. -- Greg.
Re: MESSAGE quota resource implemention
On 01/09/11 21:27, Julien Coloos wrote: >> >> [...] Perhaps >> we should a) document it clearly and b) detect the situation and put >> an obvious message saying something like "you may need to run quota -f >> ..." where the human making the change will see it. Also, such a >> message might be useful when usage underflow is detected. >> > My concern here is that there are times when quota changes are not done > manually by a human, but happen automatically as part of platform > provisioning scheme. For example, on many platforms we manage, if a user > subscribes to some offer his/her mailbox quota will automatically be > upgraded (IMAP action). So, at least in our case, I though it might be > useful. > > But I agree that in case of underflow detection throwing a warning in > syslog might help draw the attention when logs are analysed. Ok, done: https://github.com/gnb/cyrus-imapd/commit/084dcf9b2f586c6f6c0e2298a66b73f46ae43f16 Julien, I think we agreed on everything else, right? I'm looking forward to your next iteration. -- Greg.
Re: MESSAGE quota resource implemention
On 01/09/11 03:21, Bron Gondwana wrote: > On Wed, Aug 31, 2011 at 05:50:36PM +0200, Julien Coloos wrote: >> Things that may be worth noting: >> - DUMP/UNDUMP currently does nothing special about MESSAGE or >> X-ANNOTATION-STORAGE quota resources >> -> should it be transferred ? > > I'd like to replace DUMP/UNDUMP with replication protocol > communications for XFER. > >> -> without breaking backward compatibility, limits could only be >> transferred through a 'fake' file entry, as for annotations > > But for now, that's definitely the pragmatic way to go. > >> - quota usage is currently stored in a uquota_t variable, and >> delta is computed as quota_t; so theorically there could be overflow >> issues if quota usage to add/substract cannot be held in a quota_t; >> in practice it should be unlikely since that would mean a usage of >> over 2^63-1 > > I propose we scrap uquota_t - it is un-necessary for the medium-term > future, now that we're requiring 64 bit types. Agreed, and done: https://github.com/gnb/cyrus-imapd/commit/836eae7b4bda663882c7f135e76b7453b1665436 -- Greg.
Re: MESSAGE quota resource implemention
On Fri, Sep 02, 2011 at 07:36:20PM +1000, Greg Banks wrote: > If the software was robust, underflow would not happen and we would not > need to test for it and handle it. Thus the log messages are not > operational messages intended for the sysadmin, but warnings about > internal Cyrus problems intended for Cyrus developers, and syslog is a > suitable place for them. In theory, yes - assuming you can keep blackbox control over all the filesystems that Cyrus is operating over, and the sysadmin never restores from backup or otherwise screws with any of the underlying files. > How's about this for a strategy? > > When a quota resource is first enabled, (i.e. the limit is changed from > UNLIMITED to some finite value), the usage is stored as some special > value which I'll call INDETERMINATE. What about 'getquota'? I don't support any solution which leaves getquota returning bogus values or failing to respond. That's just icky and confusing. I don't think you can avoid two passes, and I don't even think you can avoid two values during if you really want to be good about it. Anyway - moving a new folder into a quotaroot is NOT racy. You just need to read quota_mailbox_used on the mailbox, then lock the old quota root, subtract N bytes and unlock it - then lock the NEW quota root, add N bytes and unlock it again. No problem. The only issue is updating WITHOUT changing the quotaroot, which is an issue because a particular mailbox doesn't know if it's already been counted in the new quota value or not, so if it should be updating the value. There are pure ways to do this, that guarantee consistency. I think the best way is probably some sort of A/B thing, where you label the quotaroot as A or B in the mailbox - AND in the quota root. So the initial state looks like this: ROOT: A A: $usage B: INVALID When you want to run a quota -f you set 'B' to zero, and then run the update logic over all mailboxes, updating B with the value as you go, and setting the quotaroot in the mailbox to be in state 'B', so it also updates B. Any mailbox in state 'B' will update both A AND B, because the root is still in state A. Mailboxes in state 'A' will only update 'A', because they match the root. When you have finished quota -f, both values are being updated simultaneously by all mailboxes. You also have two fields which you can compare, and a guarantee that they were both atomically updated, so if they're not the same then there was definitely corruption, not just a race condition. So you can report that. Then you update the quota root to say 'ROOT: B', and 'A' invalid. Or even just A: zero. If anything continues to update the wrong field then you also have corruption (probably a mailbox outside the quotaroot pointing to it, which is pretty silly) That's a real, robust solution. But it's pretty heavy engineering. > As far as I can see the only point of that first append_check() call is > to fail early in the case of a permission fail. It's nice to fail before the client starts uploading. Failing any later than that is kinda pointless, because you're not saving the client bandwidth - which may matter. The disk IO is unlikely to matter to the server, since it's a rare case. > convinced I got it very wrong, and should have left all the quota > updating in mailbox_commit_quota(). I was trying hard to avoid adding a > field to the index header to track the storage used by all the > annotations for the mailbox and for messages in the mailbox; but I'm > really not happy with the results :( Well, we're not committed to keeping it that way - it's not as if it's even "in the wild" except for some really early adopters of the master branch, who deserve whatever pain they get (mostly, that's us - and we know enough to be able to clean up any mess) Bron.
Re: MESSAGE quota resource implemention
On 01/09/11 22:22, Bron Gondwana wrote: > On Thu, Sep 01, 2011 at 01:27:00PM +0200, Julien Coloos wrote: >> Le 01/09/2011 03:03, Greg Banks a écrit : > > But more generally, the "update the quotaroot" is atomic-safe, > because the mailbox doesn't add/remove things from the quotaroot > racily - but quota -f IS racy. The only way around that would > be a dual pass mark and collect thing, where you marked each > mailbox as "we're re-calculating, so don't update the quota usage" > in the first sweep, then came back and removed the mark as you > read the value. Tricky, but doable. The downside is that a > crash part way through can leave you in a broken state. But > that's true of everything. > >> But I agree that in case of underflow detection throwing a warning >> in syslog might help draw the attention when logs are analysed. > > "when". haha. > > (maybe at a few sites that care... but for the vast majority of > sites, if you're depending on them reading syslog you've already > lost. Software that understands that and is robust in the face > of errors is much nicer for the poor suckers on the receiving end > of all this) If the software was robust, underflow would not happen and we would not need to test for it and handle it. Thus the log messages are not operational messages intended for the sysadmin, but warnings about internal Cyrus problems intended for Cyrus developers, and syslog is a suitable place for them. How's about this for a strategy? When a quota resource is first enabled, (i.e. the limit is changed from UNLIMITED to some finite value), the usage is stored as some special value which I'll call INDETERMINATE. Changing a mailbox' quotaroot to an existing quotaroot, resets all the quotaroot's usages to INDETERMINATE. Usage deltas applied to the INDETERMINATE value silently yield INDETERMINATE back. Usage deltas which would underflow a finite (i.e. not INDETERMINATE) usage value are clamped with a warning about an internal consistency problem. A regularly run process such as cyr_expire finds quotaroots which have one or more INDETERMINATE usages and runs quota -f on them. In this model, INDETERMINATE value acts like your "dual pass mark". > >>> In the 1st hunk in cmd_append(), at this point in the code I >>> believe totalsize = 0, so you could easily pass 0 for the new last >>> argument as well. >> Yes at this point totalsize is still 0. >> The current code, which handles MULTIAPPEND, does a preliminary >> check to see if mailbox is not overquota, then receives all >> messages, and finally checks that all messages can fit in the >> mailbox. >> Since we know for sure at least one message is coming, I though it >> could be a good idea to early-check that at least one more message >> would fit in the mailbox before receiving anything. Otherwise we are >> staging new file only to trash it at the end (when overquota). As far as I can see the only point of that first append_check() call is to fail early in the case of a permission fail. So >> actually I wonder if the code could be updated to check (LITERAL >> parameter) that each message about to be received would fit in the >> mailbox, instead of checking only once at the end ? Literals, and binary, and urls... I do think there's an argument to made for moving the maintenance of the totalsize parameter into the struct appendstate, and once you do that it's possible to do quota/permission checks at finer grained points. But why bother? Why slow down the common case of not exceeding quota? > > There are some bugs about this already. In particularl the opposite > case, checking that we actually want to append something before > aborting a sieve delivery - because it may be discarded or redirected > or even duplicate suppressed anyway. Something to keep in mind. > > >>> I'm thinking that there's now three places in the code which take >>> a mailbox* and fill out an array of quota diffs, interpreting the >>> contents of the struct mailbox. That should really be >>> centralised. >> I'm not sure to see what you have in mind here. >> Are you talking about the places where the QUOTA_STORAGE and >> QUOTA_MESSAGE entries of the quota diff array are computed >> relatively to the 'quota_mailbox_used' and 'exists' index fields of >> the struct mailbox ? Yes. > One place please :) Ideally I'd like to absorb more of the quota > stuff into mailbox.c. Greg and I have some debate about this - how > much is too much for that file to be doing. Probably it should be > abstracted into a couple of layers of stuff - but I really do like > the consistency of having just a couple of function calls: > > mailbox_append_index_record; and > mailbox_update_index_record > > which do all the consistency checking and counter updating inside. > Plus of course a mailbox_check_quota thing that takes a set of > quota checks to do and sees if there will be space for the > planned changes! After implementing the X-ANNOTATION-STORAGE quota, I'm
Re: MESSAGE quota resource implemention
On Fri, Sep 02, 2011 at 09:37:24AM +1000, Rob Mueller wrote: > > >Actually, really I'd like to create a new UNIQUEID - and store > >all the files in paths based on uniqueid rather than on folder > >name. This would not only mean renames could be fast and > >atomic, but that delayed delete would be fast. The downside is > >a more opaque filesystem layout. Oh, another upside - file path > >limitations don't exist so much any more. > > While UNIQUEID is nice, the opaqueness is annoying. Personally, I > liked the idea that we talked about a while back which I think was: > > $spooldir/a/user/aardvark/user.aardvark/ > $spooldir/a/user/aardvark/user.aardvark.drafts/ > $spooldir/a/user/aardvark/user.aardvark.trash/ > $spooldir/a/user/aardvark/user.aardvark.foo/ > $spooldir/a/user/aardvark/user.aardvark.foo.bar/ > $spooldir/a/user/aardvark/user.aardvark.abc.xyz/ I still have a patch somewhere that does that. > So you end up with every folder for a user in one dir. This solves > the current messy handling of sub-dirs (eg currently you have to > create the intermediate dir /abc/ even though there's no entry in > mailboxes.db for it), and makes renaming any folder very cheap still > (because you can do it with a single dir rename, rather than having > to move each message file), but you don't go completely opaque. It does give you a 256 character folder tree limit on many operating systems. > Delete handling is still easier, just rename to > DELETED.$oldname.$UNIQUEID or something like that, because it's > cheap to rename anyway. Sorry, make that 239, once you make space for timestamps and dots. > Of course it means re-organising folder layout for every > installation out there, but maybe we need to bump major versions > anyway, cyrus 3.0 here we come :) tools/rehash - I have one of those that can handle this plus all the other existing formats as well. It's on a git branch somewhere. Bron.
Re: MESSAGE quota resource implemention
Actually, really I'd like to create a new UNIQUEID - and store all the files in paths based on uniqueid rather than on folder name. This would not only mean renames could be fast and atomic, but that delayed delete would be fast. The downside is a more opaque filesystem layout. Oh, another upside - file path limitations don't exist so much any more. While UNIQUEID is nice, the opaqueness is annoying. Personally, I liked the idea that we talked about a while back which I think was: $spooldir/a/user/aardvark/user.aardvark/ $spooldir/a/user/aardvark/user.aardvark.drafts/ $spooldir/a/user/aardvark/user.aardvark.trash/ $spooldir/a/user/aardvark/user.aardvark.foo/ $spooldir/a/user/aardvark/user.aardvark.foo.bar/ $spooldir/a/user/aardvark/user.aardvark.abc.xyz/ So you end up with every folder for a user in one dir. This solves the current messy handling of sub-dirs (eg currently you have to create the intermediate dir /abc/ even though there's no entry in mailboxes.db for it), and makes renaming any folder very cheap still (because you can do it with a single dir rename, rather than having to move each message file), but you don't go completely opaque. Delete handling is still easier, just rename to DELETED.$oldname.$UNIQUEID or something like that, because it's cheap to rename anyway. Of course it means re-organising folder layout for every installation out there, but maybe we need to bump major versions anyway, cyrus 3.0 here we come :) Rob
Re: MESSAGE quota resource implemention
Le 01/09/2011 14:22, Bron Gondwana a écrit : A provisioning system could run quota -f itself after making the change, of course. Sure. But since the quota is being changed, clients would wonder why they need to call another quota utility to finish the job. Plus I wanted to have something similar to how it works when the quota entry is first created (which previously only managed one resource): people didn't need to call the quota utility after creating the quotaroot, so if it's doable for other resources it's better. But I agree that in case of underflow detection throwing a warning in syslog might help draw the attention when logs are analysed. "when". haha. (maybe at a few sites that care... but for the vast majority of sites, if you're depending on them reading syslog you've already lost. Software that understands that and is robust in the face of errors is much nicer for the poor suckers on the receiving end of all this) :D It's sometimes hard to find a good compromise between "we have good faith current data are correct, so for non-vital ones we perform minimal checks and provide tools to repair punctually" and "we are paranoid and check/repair each and every data all the time" ;) But I am a bit biased here due to past experiences with non-robust/buggy code that tries to correct itself and often ends up in endless iterations... FYI - I'm planning to be a bit more lazy about mailbox deletes at some point. It would be super-cool to not even move the files until someone trys to create a new mailbox with the same name, otherwise just clean them up in the regular course of cyr_expire. Need to think about that some more though. Actually, really I'd like to create a new UNIQUEID - and store all the files in paths based on uniqueid rather than on folder name. This would not only mean renames could be fast and atomic, but that delayed delete would be fast. The downside is a more opaque filesystem layout. Oh, another upside - file path limitations don't exist so much any more. Nice :) One place please :) Ideally I'd like to absorb more of the quota stuff into mailbox.c. Greg and I have some debate about this - how much is too much for that file to be doing. Probably it should be abstracted into a couple of layers of stuff - but I really do like the consistency of having just a couple of function calls: mailbox_append_index_record; and mailbox_update_index_record which do all the consistency checking and counter updating inside. Plus of course a mailbox_check_quota thing that takes a set of quota checks to do and sees if there will be space for the planned changes! Makes sense. Regards Julien
Re: MESSAGE quota resource implemention
On Thu, Sep 01, 2011 at 02:50:30PM +0200, Michael Menge wrote: > Quoting Bron Gondwana : > > >Actually, really I'd like to create a new UNIQUEID - and store > >all the files in paths based on uniqueid rather than on folder > >name. This would not only mean renames could be fast and > >atomic, but that delayed delete would be fast. The downside is > >a more opaque filesystem layout. Oh, another upside - file path > >limitations don't exist so much any more. > > > > How do you restore a folder from a filebased backup? > Or exclude some folders form backup (e.g. trash folders)? Tools :) More specfically, an actual backup command rather than doing file-based backups in the first place. That way we can do consistent snapshots. Bron.
Re: MESSAGE quota resource implemention
Quoting Bron Gondwana : Actually, really I'd like to create a new UNIQUEID - and store all the files in paths based on uniqueid rather than on folder name. This would not only mean renames could be fast and atomic, but that delayed delete would be fast. The downside is a more opaque filesystem layout. Oh, another upside - file path limitations don't exist so much any more. How do you restore a folder from a filebased backup? Or exclude some folders form backup (e.g. trash folders)? M.MengeTel.: (49) 7071/29-70316 Universität Tübingen Fax.: (49) 7071/29-5912 Zentrum für Datenverarbeitung mail: michael.me...@zdv.uni-tuebingen.de Wächterstraße 76 72074 Tübingen smime.p7s Description: S/MIME Signatur
Re: MESSAGE quota resource implemention
On Thu, Sep 01, 2011 at 01:27:00PM +0200, Julien Coloos wrote: > Le 01/09/2011 03:03, Greg Banks a écrit : > >This one's tough, I wasn't sure what to do. However, I'm happy to > >leave it to the administrator to have to manually run quota -f > >(maybe twice!) if they set a quota on a resource that is already > >being used. I'm unconvinced that automatically doing the > >equivalent of -f as a side effect of setting the first limit is > >necessary or wise. Perhaps we should a) document it clearly and > >b) detect the situation and put an obvious message saying > >something like "you may need to run quota -f ..." where the human > >making the change will see it. Also, such a message might be > >useful when usage underflow is detected. > > > My concern here is that there are times when quota changes are not > done manually by a human, but happen automatically as part of > platform provisioning scheme. For example, on many platforms we > manage, if a user subscribes to some offer his/her mailbox quota > will automatically be upgraded (IMAP action). So, at least in our > case, I though it might be useful. A provisioning system could run quota -f itself after making the change, of course. But more generally, the "update the quotaroot" is atomic-safe, because the mailbox doesn't add/remove things from the quotaroot racily - but quota -f IS racy. The only way around that would be a dual pass mark and collect thing, where you marked each mailbox as "we're re-calculating, so don't update the quota usage" in the first sweep, then came back and removed the mark as you read the value. Tricky, but doable. The downside is that a crash part way through can leave you in a broken state. But that's true of everything. > But I agree that in case of underflow detection throwing a warning > in syslog might help draw the attention when logs are analysed. "when". haha. (maybe at a few sites that care... but for the vast majority of sites, if you're depending on them reading syslog you've already lost. Software that understands that and is robust in the face of errors is much nicer for the poor suckers on the receiving end of all this) > Le 01/09/2011 10:15, Greg Banks a écrit : > >commit 2d4fb0 "Added quota MESSAGE resource (RFC 2087) management." > > > >imap/quota.h > > > >Why #include ? It's not like we're using any of the > >typedefs there. There's a good argument to be made that we > >*should* do so, but until then the #include doesn't seem useful. > > > As you realised in the next commit, I removed QUOTA_REPORT_FMT > (appearing only in quota.c) to use PRIdMAX/PRIuMAX and cast to > (u)intmax_t upon formatting. Using those appeared as a future-proof > solution in the discussion about 32/64-bit variables thread I > started some time ago :) > So yeah, the include could have been done *only* in quota.c, but I > remembered that Bron wanted to get rid of (u)quota_t later. So I > decided to put the include in quota.h for later use. I don't mind keeping quota_t around. Using it makes it a bit clearer that you intend to store a quota in this variable. It's like free documentation. Just uquota_t is pointless in a 64bit required world. > >In the 1st hunk in cmd_append(), at this point in the code I > >believe totalsize = 0, so you could easily pass 0 for the new last > >argument as well. > Yes at this point totalsize is still 0. > The current code, which handles MULTIAPPEND, does a preliminary > check to see if mailbox is not overquota, then receives all > messages, and finally checks that all messages can fit in the > mailbox. > Since we know for sure at least one message is coming, I though it > could be a good idea to early-check that at least one more message > would fit in the mailbox before receiving anything. Otherwise we are > staging new file only to trash it at the end (when overquota). So > actually I wonder if the code could be updated to check (LITERAL > parameter) that each message about to be received would fit in the > mailbox, instead of checking only once at the end ? There are some bugs about this already. In particularl the opposite case, checking that we actually want to append something before aborting a sieve delivery - because it may be discarded or redirected or even duplicate suppressed anyway. Something to keep in mind. > >imap/mailbox.c > > > >I'm not sure it's a good idea to set mailbox->i.exists = 0 in > >mailbox_delete(). Perhaps it would be better to check for > >(mailbox->i.options & OPT_MAILBOX_DELETED) when sampling > >mailbox->i.exists in mailbox_commit_quota(). > Hmm, I wondered about it too. At the end of mailbox_delete, upon > calling mailbox_close, all physical files in the mailbox are > deleted, and it didn't seem like 'exists' was used anywhere > in-between, so I thought it would be alright. > But given the complexity of the code at that point (lot of cleaning > when deleting the mailbox), and unforeseen future usages, I guess > you are right. Actually maybe even q
Re: MESSAGE quota resource implemention
Hi, thanks for your comments and reviewing the code Le 01/09/2011 03:03, Greg Banks a écrit : I think that there are two things that may also be done concerning quota entries: - always recompute resource usage when changing its limit from unlimited to bounded -> currently it is only done once when setting the usage limit for the first time -> that way, it may not be necessary to track resource presence when reading/writing quota entries -> but maybe it could be too time consuming in some cases, since it seems to be possible to associate a quota resource to a whole domain (recomputing usage for all mailboxes would then take some time) - do not write resource in quota entry when its usage is unlimited -> except for the STORAGE resource, for backward compatibility -> would also help keeping quota entries size to the bare minimum What do you think ? This one's tough, I wasn't sure what to do. However, I'm happy to leave it to the administrator to have to manually run quota -f (maybe twice!) if they set a quota on a resource that is already being used. I'm unconvinced that automatically doing the equivalent of -f as a side effect of setting the first limit is necessary or wise. Perhaps we should a) document it clearly and b) detect the situation and put an obvious message saying something like "you may need to run quota -f ..." where the human making the change will see it. Also, such a message might be useful when usage underflow is detected. My concern here is that there are times when quota changes are not done manually by a human, but happen automatically as part of platform provisioning scheme. For example, on many platforms we manage, if a user subscribes to some offer his/her mailbox quota will automatically be upgraded (IMAP action). So, at least in our case, I though it might be useful. But I agree that in case of underflow detection throwing a warning in syslog might help draw the attention when logs are analysed. Le 01/09/2011 10:15, Greg Banks a écrit : commit 2d4fb0 "Added quota MESSAGE resource (RFC 2087) management." imap/quota.h Why #include ? It's not like we're using any of the typedefs there. There's a good argument to be made that we *should* do so, but until then the #include doesn't seem useful. As you realised in the next commit, I removed QUOTA_REPORT_FMT (appearing only in quota.c) to use PRIdMAX/PRIuMAX and cast to (u)intmax_t upon formatting. Using those appeared as a future-proof solution in the discussion about 32/64-bit variables thread I started some time ago :) So yeah, the include could have been done *only* in quota.c, but I remembered that Bron wanted to get rid of (u)quota_t later. So I decided to put the include in quota.h for later use. I'm unconvinced about the need for quota.sets[]. After a lot of reading it seems to me that (quota.sets[x]) and (quota.limits[x] != QUOTA_UNLIMITED) are identical or should be. Also, are you setting sets[] anywhere except when loading from the db? These would be perfectly identical if implementing the last point I brought up (not writing the resource in the quota entry if it's usage is unlimited). But then I need to compute actual quota usage at the right time. And when I saw that the code would allow to associate a quota to a whole domain (ouch, performance issue foreseen if quota is to be computed too often), I decided on the solution I proposed: compute it only once the first time it is set, hence the need of quota.sets[] (since when changing from bounded to unlimited limit, the resource is still kept in the quota entry to prevent recomputing its usage next time). Otherwise, sets[] is updated when setting quota limits (in mboxlist_setquotas). imap/quota_db.c In quota_write(), your sanity check should goto the buf_free() at the end of the function rather than return early. Otherwise, looks good. Ah, forgot that one. Actually I took a shortcut here, since buf_free should not be necessary with the current code (which does append strings; so if the len is still 0, no string was malloc'ed). But for sanity reasons you are right, it is wiser to goto the end of the function and do the expected cleanup stuff. This shall be future-proof. imap/imapd.c In the 1st hunk in cmd_append(), at this point in the code I believe totalsize = 0, so you could easily pass 0 for the new last argument as well. Yes at this point totalsize is still 0. The current code, which handles MULTIAPPEND, does a preliminary check to see if mailbox is not overquota, then receives all messages, and finally checks that all messages can fit in the mailbox. Since we know for sure at least one message is coming, I though it could be a good idea to early-check that at least one more message would fit in the mailbox before receiving anything. Otherwise we are staging new file only to trash it at the end (when overquota). So actually I wonder if the code could be u
Re: MESSAGE quota resource implemention
Julien Coloos wrote: Hi, As discussed on IRC last week, I worked on implementing MESSAGE quota resource in cyrus (see RFC 2087; STORAGE resource already being handled). I created a branch based on Greg's 'annotate' one on github, since his work on annotation storage management made mine a lot easier. Details on the changes I made: cyrus-imapd: - added MESSAGE quota resource management -> updated cunit test -> added 'quotawarnmsg' option which behaviour is similar to 'quotawarnkb'; warning message shown when selecting folder in IMAP tells which resource limit was triggered -> added 'autocreatequotamsg' option, to set MESSAGE limit (unlimited by default) when user auto-creates its mailbox - xfer transfers all non-unlimited resources - added helper function to compute annotation storage usage for a given mailbox - changed the way quota entries are read/written -> resources presence in fetched entry is remembered -> when writing entry, only present resources are written -> when setting resources limits, entries which were not present upon fetching are now marked as present and their current usage is computed - quota utility lists and computes (-f) all resources associated to mailbox The branch 'quotamessage/gnb/annotate' is available here: git://github.com/worldline-messaging/cyrus-imapd.git. It is based on Greg's 'annotate' branch on github. Reviewed: commit 2d4fb0 "Added quota MESSAGE resource (RFC 2087) management." imap/quota.h Why #include ? It's not like we're using any of the typedefs there. There's a good argument to be made that we *should* do so, but until then the #include doesn't seem useful. I'm unconvinced about the need for quota.sets[]. After a lot of reading it seems to me that (quota.sets[x]) and (quota.limits[x] != QUOTA_UNLIMITED) are identical or should be. Also, are you setting sets[] anywhere except when loading from the db? I like quota_update_useds(), it does seem necessary to be able to update several usages together. Otherwise, looks good. imap/quota_db.c In quota_write(), your sanity check should goto the buf_free() at the end of the function rather than return early. Otherwise, looks good. imap/append.c Looks good. imap/append.h Looks good imap/cyr_virusscan.c Looks good imap/imap_err.et Looks good imap/imapd.c In the 1st hunk in cmd_append(), at this point in the code I believe totalsize = 0, so you could easily pass 0 for the new last argument as well. Otherwise looks good. BTW, reading your code I realised a number of places I failed to account for X-ANNOTATION-STORAGE quota properly :( I think I'll have to change a number of those new pairs of quota_t arguments to a quota_t quotadiff[QUOTA_NUMRESOURCES]. imap/index.c Looks good imapd/lmtpd.c Looks good imap/lmtpengine.c Looks good imap/lmtpengine.h Looks good imap/mailbox.c I'm not sure it's a good idea to set mailbox->i.exists = 0 in mailbox_delete(). Perhaps it would be better to check for (mailbox->i.options & OPT_MAILBOX_DELETED) when sampling mailbox->i.exists in mailbox_commit_quota(). Otherwise, looks good. imap/mailbox.h Looks good imap/mbdump.c Looks good imap/nntpd.c Looks good lib/imapoptions Looks good. commit 0e613c "Removed unused code." Looks good. commit c8aeef "Compute each quota resource upon setting it for the first time." commit b30c10 "List and compute all set quota resources." Oh, here's why you include stdint.h: you're using PRIdMAX and not QUOTA_REPORT_FMT, which is used only in imapd.c. I'm thinking that there's now three places in the code which take a mailbox* and fill out an array of quota diffs, interpreting the contents of the struct mailbox. That should really be centralised. -- Greg.
Re: MESSAGE quota resource implemention
Julien Coloos wrote: Hi, As discussed on IRC last week, I worked on implementing MESSAGE quota resource in cyrus (see RFC 2087; STORAGE resource already being handled). I created a branch based on Greg's 'annotate' one on github, since his work on annotation storage management made mine a lot easier. Details on the changes I made: cyrus-imapd: - added MESSAGE quota resource management -> updated cunit test -> added 'quotawarnmsg' option which behaviour is similar to 'quotawarnkb'; warning message shown when selecting folder in IMAP tells which resource limit was triggered -> added 'autocreatequotamsg' option, to set MESSAGE limit (unlimited by default) when user auto-creates its mailbox - xfer transfers all non-unlimited resources - added helper function to compute annotation storage usage for a given mailbox - changed the way quota entries are read/written -> resources presence in fetched entry is remembered -> when writing entry, only present resources are written -> when setting resources limits, entries which were not present upon fetching are now marked as present and their current usage is computed - quota utility lists and computes (-f) all resources associated to mailbox The branch 'quotamessage/gnb/annotate' is available here: git://github.com/worldline-messaging/cyrus-imapd.git. It is based on Greg's 'annotate' branch on github. cassandane: - added tests for MESSAGE resource - modified current tests to play a bit with subfolders and setting quota after adding messages/annotations - special test to check new resource usage is computed when necessary for pre-existing mailboxes (which only have usage for STORAGE resource) The branch 'quotamessage/gnb/master' is available here: git://github.com/worldline-messaging/cassandane.git. It is based on Greg's 'master' branch on github. That all sounds promising and I look forward to reading it. Things that may be worth noting: - DUMP/UNDUMP currently does nothing special about MESSAGE or X-ANNOTATION-STORAGE quota resources -> should it be transferred ? Yes. I've been lazy updating the dump/undump code. -> without breaking backward compatibility, limits could only be transferred through a 'fake' file entry, as for annotations - quota usage is currently stored in a uquota_t variable, and delta is computed as quota_t; so theorically there could be overflow issues if quota usage to add/substract cannot be held in a quota_t; in practice it should be unlikely since that would mean a usage of over 2^63-1 That seems...unlikely. Why the change concerning quota entries read/write ? The thing is that I wanted to make version upgrading as painless as possible, both for users and in the code. With previous code, when quota entry exists and is read, missing resources are initialised with default values (0 usage, unlimited). Thus only usage delta is tracked, and actual usage computing would only happen if quota entry was missing: this is not nice when upgrading, since lots of mailboxes are likely to have an entry with only the STORAGE resource present. So actual usage has to be computed at some point for newly handled resources. The idea here is to compute it when setting the resource for the first time. To do that it was necessary to know when the resource was not previously present and keep it that way until its limit is finally set for the first time. This scheme has another advantage: for platforms where only STORAGE quota is used, quota entries size remains as it is now. Only people using those new quota resources will have their quota entries grow to store the new data. Comments are welcomed :) I'll look at the code before commenting. I think that there are two things that may also be done concerning quota entries: - always recompute resource usage when changing its limit from unlimited to bounded -> currently it is only done once when setting the usage limit for the first time -> that way, it may not be necessary to track resource presence when reading/writing quota entries -> but maybe it could be too time consuming in some cases, since it seems to be possible to associate a quota resource to a whole domain (recomputing usage for all mailboxes would then take some time) - do not write resource in quota entry when its usage is unlimited -> except for the STORAGE resource, for backward compatibility -> would also help keeping quota entries size to the bare minimum What do you think ? This one's tough, I wasn't sure what to do. However, I'm happy to leave it to the administrator to have to manually run quota -f (maybe twice!) if they set a quota on a resource that is already being used. I'm unconvinced that automatically doing the equivalent of -f as a side effect of setting the first limit is necessary or wise. Perhaps we should a) document it clearly and b) detect the situation and put an obvious message s
Re: MESSAGE quota resource implemention
On Wed, Aug 31, 2011 at 05:50:36PM +0200, Julien Coloos wrote: > Things that may be worth noting: > - DUMP/UNDUMP currently does nothing special about MESSAGE or > X-ANNOTATION-STORAGE quota resources > -> should it be transferred ? I'd like to replace DUMP/UNDUMP with replication protocol communications for XFER. > -> without breaking backward compatibility, limits could only be > transferred through a 'fake' file entry, as for annotations But for now, that's definitely the pragmatic way to go. > - quota usage is currently stored in a uquota_t variable, and > delta is computed as quota_t; so theorically there could be overflow > issues if quota usage to add/substract cannot be held in a quota_t; > in practice it should be unlikely since that would mean a usage of > over 2^63-1 I propose we scrap uquota_t - it is un-necessary for the medium-term future, now that we're requiring 64 bit types. > I think that there are two things that may also be done concerning > quota entries: > - always recompute resource usage when changing its limit from > unlimited to bounded That's not too expensive really - we do it in the "quota" tool anyway. The bigger issue is locking. You need to be very careful of deadlock situations if you do this. We have that issue in the quota tool now. The only _real_ way to do it reliably would be to go through and remove all the quota roots from each mailbox, then go through again one at a time and add them back, recording usage as you go. Otherwise there's no way of knowing if your end result is consistent. > -> that way, it may not be necessary to track resource presence > when reading/writing quota entries > -> but maybe it could be too time consuming in some cases, since > it seems to be possible to associate a quota resource to a whole > domain (recomputing usage for all mailboxes would then take some > time) Yeah, now this is what you could call all sorts of bad names. It's really not sane to keep a quota counter at EVERY possible level in the tree just incase someone wants to hang a limit there later. You would get insane amounts of contention on the domain quota lock for a busy domain, which would be un-necessary. > PS: should I open a new bugzilla ticket for that ? Why not - numbers are cheap. > PPS: > cunit: on my computer cunit tests succeed except the 'foreach' one > in the 'quota' suite which timeouts (it seems messing with 4096 > quotalegacy is too much for my computer). There's stuff you can do with fd limits. Check out /etc/pam.d/* for pam_limits.so, and then /etc/security/limits.conf. I just had to learn about that stuff today! Bron.