Persistent message URIs and a mid redirector, was: Re: [Mailman-Developers] Requirements for a new archiver
Hi

Apologies for butting in without having had the time to follow the thread properly...

For me, the thing I hate most about the current Mailman web archives is the lack of persistent URIs. If you open an mbox to edit out someone's phone number that they sent to a public list by mistake, then after you have rebuilt the archives most message URIs have changed, and as a result dozens of carefully constructed wiki pages referencing these emails are broken :-(

If a new archiver resulted in nothing more than persistent URIs for messages, I'd be happy. I guess people know this classic?

Cool URIs don't change
http://www.w3.org/Provider/Style/URI

Also, are people aware of the neat hack that the W3C uses, whereby you can get to any message in their list archives with a URI like this:

http://www.w3.org/mid/$MID

In addition, each outgoing message from their list server has this URI in a header, for example:

X-Archived-At: http://www.w3.org/mid/[EMAIL PROTECTED]

This header is added with a procmail rule:

http://groups.yahoo.com/group/rss-dev/message/3163

Chris

--
Chris Croome [EMAIL PROTECTED]
web design http://www.webarchitects.co.uk/
web content management http://mkdoc.com/
___
Mailman-Developers mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/mailman-developers
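As an illustration of the mid-redirector idea above, here is a minimal Python sketch that derives a persistent URI from a Message-ID and stamps it into an X-Archived-At header. The base URL `http://lists.example.org/mid/` and the function names `mid_uri` and `add_archived_at` are assumptions made for the example; this is not the W3C's procmail rule, only one way the same effect might be obtained.

```python
from email.message import EmailMessage
from urllib.parse import quote

# Hypothetical redirector base; the W3C's own is http://www.w3.org/mid/
MID_BASE = "http://lists.example.org/mid/"


def mid_uri(message_id: str, base: str = MID_BASE) -> str:
    """Return a persistent URI keyed on the Message-ID, not on archive order."""
    # Drop surrounding whitespace and the angle brackets of the header value.
    bare = message_id.strip().strip("<>")
    # Percent-encode characters that are unsafe in a URL path; keep "@" readable.
    return base + quote(bare, safe="@")


def add_archived_at(msg: EmailMessage, base: str = MID_BASE) -> None:
    """Stamp an outgoing message with X-Archived-At if it has a Message-ID."""
    message_id = msg["Message-ID"]
    if message_id and "X-Archived-At" not in msg:
        msg["X-Archived-At"] = mid_uri(str(message_id), base)
```

Because the URI depends only on the Message-ID, rebuilding the archive after editing the mbox would not change it.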
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
At 8:20 PM -0500 2003/10/30, J C Lawrence wrote:

> While I don't disagree, this is really an MTA's job, not Mailman's. This is why I've been doing log analysis of MXes and routing mail to customised outbound MTAs on the basis of responsiveness, since early 2000. Adaptive MX routing is great stuff.

There is a need for this function, and no MTA available today does it. MLMs throughout the history of the Internet have incorporated a variety of features for SMTP performance enhancement that are unique to mailing lists or are usually found primarily in mailing lists, and this is no different.

If you want to externalize all these functions outside of mailman, that's fine. But then someone has to pick up the ball and start hacking on bulk_mailer or some other program to provide these features.

> Yup. I did it at the first level with an initial SMTP proxy which routed based on MX response records pulled from a DB.

Again, this is a feature which is not found on any MTA available today, and which is known to have a huge impact on mailing list performance. This feature needs to be provided somewhere, by someone.

> I'm generally of the view that Mailman should do opportunistic domain sorting and per-MTA customised VERP handoffs (because nobody has standardised VERP across MTAs), and beyond that to back off. Mailman's job is to get the outbound mail into the MTA's spool as quickly as possible, wrapped in transactions (ie RCPT TO bundles) that are friendly to efficient processing, and that's it.

If you go back to Barry's message, he was talking about getting even further involved, by doing a mail-merge process. Since there is no MMTP (something that Bryan Costales, Eric Allman, and I had worked on for a while, before we realized that it would just make the spam problem worse and then dropped all further efforts), there is a need for an intermediate program that is called by mailman and then hands the messages off to the MTA.

Either that intermediate program can be provided by mailman itself, or it can come from a third party. But it needs to come from somewhere.

> We're not in the game of second guessing the MTAs. That way lies wasted time and madness.

If there were MLTAs which were optimized for this function, I would agree with you. Since we're trying to take standard MTAs which may have only some optimizations that might be generally applicable to most situations (including mailing lists), I must disagree.

For the mailing list specific optimizations that we know are not provided by many common MTAs or MTA versions, we need to perform those optimizations before the message gets to the MTA. We also need to be able to selectively turn them off, in the case that there are MTAs that can do that specific job themselves and don't need our interference.

> Where Mailman's performance hurts is in the handling of the list configs, especially for lists with very large membership rosters and in queue runner performance and overhead (try watching queue runner's system resource profile in v2.1 for lists with 50,000 members). For me those are the obvious low hanging fruit,

You should definitely go after the low-hanging fruit when you can. However, you also have to consider how much work would go into fixing those problems. A high priority item that would require re-engineering the entire system is something that should be planned for the long term, perhaps in conjunction with other things that would likewise require significant re-engineering efforts as well.

Meanwhile, if there are other performance issues that can be addressed which do not require such significant re-engineering, those should be given serious consideration in the shorter term.

--
Brad Knowles, [EMAIL PROTECTED]

They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
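For readers unfamiliar with the "opportunistic domain sorting" and "RCPT TO bundles" mentioned in the quoted text above, a rough Python sketch follows. It only groups recipients by domain and cuts each group into bounded chunks; the function names and the `max_rcpts` limit of 50 are assumptions for illustration and are not taken from Mailman's SMTPDirect.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an address."""
    return address.rsplit("@", 1)[-1].lower()


def rcpt_bundles(recipients: Iterable[str],
                 max_rcpts: int = 50) -> Iterator[List[str]]:
    """Yield recipient lists, one SMTP transaction each, grouped by domain."""
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for addr in recipients:
        by_domain[domain_of(addr)].append(addr)
    # Emit domains in a stable order; each chunk never mixes domains,
    # so the MTA can reuse one connection per bundle.
    for domain in sorted(by_domain):
        addrs = by_domain[domain]
        for start in range(0, len(addrs), max_rcpts):
            yield addrs[start:start + max_rcpts]
```

Each yielded list would be handed to the MTA as the RCPT TO set of a single transaction. Whether this belongs in Mailman or in the MTA is exactly the point under debate in this thread.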
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
On Fri, 31 Oct 2003 16:04:43 +0100 Brad Knowles [EMAIL PROTECTED] wrote:

> At 8:20 PM -0500 2003/10/30, J C Lawrence wrote:

>> While I don't disagree, this is really an MTA's job, not Mailman's. This is why I've been doing log analysis of MXes and routing mail to customised outbound MTAs on the basis of responsiveness, since early 2000. Adaptive MX routing is great stuff.

> There is a need for this function, and no MTA available today does it. MLMs throughout the history of the Internet have incorporated a variety of features for SMTP performance enhancement that are unique to mailing lists or are usually found primarily in mailing lists, and this is no different.

True. It's not a very difficult process, and is absurdly expensive the way I handle it. At some point in my copious spare time I should whack another couple config tokens into Exim, just to up the ante.

> If you want to externalize all these functions outside of mailman, that's fine. But then someone has to pick up the ball and start hacking on bulk_mailer or some other program to provide these features.

Aye, but some care should be taken here defining who the people are, between the Good-For-Mailman, and Good-For-Large-Mail-Systems camps. They're related, but not synonymous.

>> Yup. I did it at the first level with an initial SMTP proxy which routed based on MX response records pulled from a DB.

> Again, this is a feature which is not found on any MTA available today, and which is known to have a huge impact on mailing list performance. This feature needs to be provided somewhere, by someone.

True.

> If you go back to Barry's message, he was talking about getting even further involved, by doing a mail-merge process. Since there is no MMTP (something that Bryan Costales, Eric Allman, and I had worked on for a while, before we realized that it would just make the spam problem worse and then dropped all further efforts), there is a need for an intermediate program that is called by mailman and then hands the messages off to the MTA.

<nod>

Mailmerge and VERP customisation, and the standards for the communication of those things to the MTA are areas that need attention, both for Mailman and the rest of the market (tho the IronPort and related guys might argue). This would be a good point to get some cross-MTA discussion going on.

>> We're not in the game of second guessing the MTAs. That way lies wasted time and madness.

> If there were MLTAs which were optimized for this function,

IIRC QMail has a (typically DJB) VERP/rewrite handoff method. I also recall that it is very bound into QMail's process and IO model, but perhaps this should be examined?

> I would agree with you. Since we're trying to take standard MTAs which may have only some optimizations that might be generally applicable to most situations (including mailing lists), I must disagree.

There's that audience problem again. I actually agree with you in the general case, and am willing to spend time and effort in that direction. However I see this as somewhat disjoint from Mailman in specific.

--
J C Lawrence                -(*)    Satan, oscillate my metallic sonatas.
[EMAIL PROTECTED]                   He lived as a devil, eh?
http://www.kanga.nu/~claw/          Evil is a name of a foeman, as I live.
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 2003-10-30 at 00:08, J C Lawrence wrote:

> Hang-on. Apache isn't the target. Mailman's UI is a CGI app. As such it works with any web server that supports CGI-bin, which pretty much means any web server with no exceptions. That's a pretty large gain, especially in the novice admin or simple deployment case territory.

Sure, but I suspect that plumbing Mailman out to http will be just a proxy rule away from integrating with an existing web server. That's not without its headaches too, but should be as widely supported.

> Doing our own thing for HTTP handling can quickly be another Pandora's box, security concern, and integration problem for the (majority of) people who do want to run Apache/Boa/Thttpd/Zeus/etc.

We do need to worry about the security of the http framework (e.g. Twisted), but past that, it's still our responsibility. I mostly see this as a thin veneer between the web and the core logic for Mailman. Wanna use CGI? I suspect it's just a little extra glue. Same goes for mod_python or whatever.

>> An approach like Exim + elspy affords some really cool possibilities.

> Absolutely, but that is outside of Mailman's territory.

Definitely for now, that's for sure. I don't want to write it off completely, but we need to be practical too.

> More interesting would be things like TMDA integration, or implementing support for Yakov Shafranovich extension of my consent token protocol:
>
> http://www.ietf.org/internet-drafts/draft-irtf-asrg-cri-00.txt
>
> Getting early buy-in as a sample implementation for an MLM wouldn't be a Bad Thing. There's a lot of really neat and useful integration and feature set territory to explore before you start staring down the MTA's throat.

Sure. I just skimmed the CRI draft, but here's some questions (hmm, if you answer this please start a new thread).

If you send 10 messages to a list within 10 minutes and I've never heard of you before, should I send you 10 challenges or one? If I send you 10, should I consider a response to any one of them good enough to free all 10 posts? Also, isn't any CRI system going to have to have mail bomb defenses?

-Barry
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 09:15:35 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:

> On Thu, 2003-10-30 at 00:08, J C Lawrence wrote:

>> Hang-on. Apache isn't the target. Mailman's UI is a CGI app. As such it works with any web server that supports CGI-bin, which pretty much means any web server with no exceptions. That's a pretty large gain, especially in the novice admin or simple deployment case territory.

> Sure, but I suspect that plumbing Mailman out to http will be just a proxy rule away from integrating with an existing web server. That's not without its headaches too, but should be as widely supported.

Considerably more web servers support CGI-bin than support proxy rules.

--
J C Lawrence                -(*)    Satan, oscillate my metallic sonatas.
[EMAIL PROTECTED]                   He lived as a devil, eh?
http://www.kanga.nu/~claw/          Evil is a name of a foeman, as I live.
Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
Ok, I'm beat up enough, so let me open things up to a hopefully more productive thread. How can Mailman more efficiently hand off messages to a local mail server for final delivery? Some problems with the current approach include:

- The desire/requirement that Mailman chunk and sort recipients
- The ability for Mailman to swamp the mail server or cause the mail server to consume all available cpu
- The fact that failures in upstream mail server are reported to Mailman as bounces instead of as error codes
- Inefficiencies in VERP/personalization/mail-merge because of the lack of cooperation
- The need for Mailman to queue outgoing messages that aren't completely delivered

I'm sure you guys can identify more issues <wink>. Look at the complexity in SMTPDirect.py, and even there, we still have problems.

So how do we design a system where we can push the complexity and efficiency concerns out past our boundary? Here's a rough sketch of what I'd like:

Mailman has a list of recipients, or at least knows how to calculate that list. It has a message template as encoded 7-bit ascii. It has a dictionary (association table, hash table) of substitution placeholders to values for each recipient, or knows how to calculate that.

Mailman wants to simply hand that data off to some agent and forget about it. It wants to know that the agent will make best effort to mail merge and deliver. It wants to be informed of any final delivery failures. And that's it.

Mailman doesn't want to chunkify recipients, and it doesn't want to sort them. It doesn't want to worry about a mail server effectively managing system resources. I'd rather not have to hand it a couple of meg of recipient or substitution data, but there seems to be no other way.

So what can we do here to improve matters?

-Barry
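The handoff described in the rough sketch above (a recipient list, a message template, per-recipient substitutions, and notification of final failures) could be modelled roughly as follows. This is a hypothetical Python illustration, not an existing Mailman interface: `MergeJob`, `merge`, `deliver` and the `send` callable are all invented names, and `string.Template` stands in for whatever placeholder syntax an actual agent would use.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class MergeJob:
    """Everything the list manager hands to the delivery agent, then forgets."""
    template: str                                   # message text with $placeholders
    recipients: List[str]
    substitutions: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Called once per recipient whose delivery finally failed.
    on_failure: Callable[[str, str], None] = lambda rcpt, reason: None


def merge(job: MergeJob) -> Iterator[Tuple[str, str]]:
    """Yield (recipient, personalised message) pairs."""
    tmpl = Template(job.template)
    for rcpt in job.recipients:
        values = job.substitutions.get(rcpt, {})
        # safe_substitute leaves unknown placeholders in place instead of raising.
        yield rcpt, tmpl.safe_substitute(values)


def deliver(job: MergeJob, send: Callable[[str, str], None]) -> None:
    """Best-effort delivery; `send` is the agent's own transport (assumed)."""
    for rcpt, text in merge(job):
        try:
            send(rcpt, text)
        except Exception as exc:
            job.on_failure(rcpt, str(exc))
```

In this shape the caller never chunks or sorts recipients; any such optimisation would live behind `send`, which is where the agent or MTA would apply it.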
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 07:04:19 +0100 Brad Knowles [EMAIL PROTECTED] wrote:

> At 12:40 AM -0500 2003/10/30, J C Lawrence wrote:

>> I've already said my bits there and proposed what I see as the cheap, easy, incremental improvement course: Twisted's NNTP supports for storage, Message IDs for keys, a variant best-effort detection and rewriting policy for collisions, and a MeoWWW derivative for HTML presentation/posting.

> I don't know anything about Twisted or MeoWWW, so I can't say how they address the subjects above.

Twisted is a pythonic library that implements most of the basic network protocols. Among other things it has an RFC conformant NNTP server and client implementations. Creating an NNTP server with a backing message store is, literally, three lines in Python. Of course it doesn't support all the nifties that real netnews servers do ala expires, administrative controls, feeds, etc. It's not intended for that market, and Mailman doesn't need those supports. If deployment sites need that, they're going to be using inn2|[BCD]News|Diablo anyway.

MeoWWW is a (very inefficient but fixable) pythonic CGI which supports reading and posting to netnews via NNTP. It has various nice UI points, a decent feature set (more than we have now), and does The Right Thing in almost every aspect I've checked except for performance in the spool reads.

> I can say that I'm not sure about an NNTP-based storage solution...

We should really start out by splitting that discussion. NNTP is an access protocol. Netnews servers have various storage formats and techniques. Currently NNTP and IMAP are the only standardised wide-deployment protocols for message spool access. I'm not interested in IMAP for the reasons previously discussed. NNTP isn't great, but it is already supported by Mailman for the new gating features and adds a clean abstraction model which allows trivial replacement of Mailman's implementation by inn2|[BCD]news|Diablo|whatever should the deployment site wish.

Additionally, again as a standards-etc based protocol, it allows clean abstraction for archive presentation: anything that talks NNTP can now be an effective Mailman archive presenter. Ditto for archive indexing.

As a dev I'm interested in arguments about how to handle the store behind the NNTP interface -- I find that stuff fun and intriguing -- but also think they are fairly uninteresting right now for Mailman specifically. The 90% case for Mailman will have less than 200K messages in their site-wide spool, and most of those an order of magnitude less. For me the interesting point is that once we abstract the message storage behind a well-supported standards-based protocol we can incrementally improve our implementation and those really concerned with the larger cases can throw in inn2 or whatever else, like a filter to SQL, instead. ITMT we get the flexibility and time to grow and do it Really Right.

Additionally, having adopted such a well defined abstraction model once, moving down the road should something else better appear it should be a comparatively small cost to support that in addition or instead.

> ... although certain storage techniques we've recently discussed borrow a lot from extant NNTP implementations, and I'm not sure how much sense it would make to rip out just those parts we know we need, or if we could actually reasonably take the whole thing, kit-n-caboodle.

Which may indeed happen.

> I do believe that we need an alternative solution to the message-id header as it was presented to us in the message, as a stable guaranteed unique (well, as good as MD-5 or SHA-1 gets) message identifier that can always be used to refer to the exact same message no matter what.

I'm in split minds here. I see the temptation. I like using Message-IDs, and they are a natural fit to the model semantically, but messing with Message-IDs has unpleasant effects for some other systems.

<shrug>

> Whether we use this message identifier as a replacement for the message-id header value as it was presented to us -- I think that's a more philosophical discussion, and I think we should address it by allowing both options but deciding which would be a reasonable default to take.

<nod>

I'm on the side of rewriting Message-IDs if we do generate our own keys. I don't like it, but it seems the cleanest approach.

> Given that the mailman UI is basically completely contained within the CGI, I'm inclined to leave it there and work on improving it internally, allowing us to continue to work with most any webserver the client may have.

Agreed.

> I don't know how MeoWWW addresses this issue, either by replacing the webserver, or providing additional tools that may make it easier to present a good and consistent UI.

MeoWWW is a CGI as discussed above. Twisted implements both sides of HTTP in addition to the NNTP discussed above, but I haven't looked at the details.

--
J C Lawrence -(*)
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 30, 2003, at 6:38 AM, J C Lawrence wrote:

>> Sure, but I suspect that plumbing Mailman out to http will be just a proxy rule away from integrating with an existing web server. That's not without its headaches too, but should be as widely supported.

> Considerably more web servers support CGI-bin than support proxy rules.

And think about all of the colo environments where it's getting installed. Proxy stuff may not be welcome there. And you make it difficult for someone to integrate Mailman into a larger site environment where they want to use tools (like mod_layout) to skin things.

Do we know what a typical installation of Mailman is like? Do we know how it's used? Do we know what kind of hardware it's really running on, or what environments? Do we know what the user base is? What their top ten wish list is?

Excuse me for sounding like a product manager, but are these features because they're needed, or because we think they'd be fun to implement? And are we building an upgrade the user base can use, or only alpha geek hardware owners?

(And in reality, I think Barry has a good intuitive sense of these issues, but I wanted to have all of us remember it, and maybe it wouldn't be a bad idea to get some objective data.)
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
At 9:53 AM -0500 2003/10/30, Barry Warsaw wrote:

> I'm sure you guys can identify more issues <wink>. Look at the complexity in SMTPDirect.py, and even there, we still have problems.

I'm not a programmer, so I can't really help you there. ;-(

> So how do we design a system where we can push the complexity and efficiency concerns out past our boundary?

I can say that I think we need to look at all of the recommendations in the following papers:

    Tuning Sendmail for Large Mailing Lists
    Rob Kolstad
    Proceedings of LISA '97
    http://tinyurl.com/t09c

    Drinking from the Fire(walls) Hose: Another Approach to Very Large Mailing Lists
    Strata Rose Chalup, Christine Hogan, Greg Kulosa, Bryan McDonald, and Bryan Stansell
    Proceedings of LISA '98
    http://tinyurl.com/t09k

There may be others that we need to look at, but of which I am not (yet) aware. If anyone knows of any, please let me know.

We're already doing some of the things recommended in these papers, but not everything. And I think there may be a couple more things we can do that are not mentioned, but which would be a further help.

However, if you want to hand all this work to an external final mail-merge delivery agent, this is moot. We just need to make sure that the selected FMMDA addresses all these issues. We could use an existing tool (e.g., bulk_mailer from ftp://cs.utk.edu/pub/moore/bulk_mailer/), or we could create a separate package to address this issue (of course, that brings the ball back into our court).

Or, you could just have Chuq solve this problem for you, as he mentioned in http://mail.python.org/pipermail/mailman-developers/2000-May/006820.html. ;-)

> So what can we do here to improve matters?

Sounds to me like you want to externalize this whole process. Problem is, bulk_mailer is the only tool I know of that currently exists as a partial attempt to address this problem, although perhaps some additional work on it could fill in the rest.

Alternatively, you develop, or work with someone else to develop, an alternative to bulk_mailer that does all the things you want and which can be used as an external tool.

--
Brad Knowles, [EMAIL PROTECTED]

They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
    -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
On Oct 30, 2003, at 7:48 AM, Brad Knowles wrote:

> Tuning Sendmail for Large Mailing Lists
> http://tinyurl.com/t09c

400K/day aggregate max

> Drinking from the Fire(walls) Hose:
> http://tinyurl.com/t09k

380K/day aggregate max

(yawn. My server's bored. snicker)

But seriously, both of them are built around pre sendmail 8.12 environments. There's some interesting stuff there, but it's now fairly dated, since sendmail 8.12 really changes the landscape. And all of those other environments

> Or, you could just have Chuq solve this problem for you, as he mentioned in http://mail.python.org/pipermail/mailman-developers/2000-May/006820.html. ;-)

gack.

>> So what can we do here to improve matters?

> Sounds to me like you want to externalize this whole process. Problem is, bulk_mailer is the only tool

Because pretty much every MLM has internalized the process.

By the end of November, I'll have completely retired any use of bulk_mailer on my systems for other solutions. One big reason: increasing spam blocking (stupid or otherwise) of non-individually addressed email. The old list server setup of:

    to: subscribers of list [EMAIL PROTECTED]
    bcc: [EMAIL PROTECTED]

is increasingly risky as far as delivery is concerned. I also don't think it allows for the kind of personalization that's needed for your general audiences (help URLs, unsub URLs, etc).

And with sendmail 8.12, queue groups and envelope splitting, frankly, bulk_mailer does more harm to the delivery stream than good. Just stuff it into sendmail, tune sendmail to split intelligently. bulk_mailer is obsolete... and much to my amusement, a few sites block based on its use in headers (idiots), which is why my copy identifies itself as ulkbay_ailermay.
Re: [Mailman-Developers] Requirements for a new archiver
"claw" == J C Lawrence, "Re: [Mailman-Developers] Requirements for a new archiver", Wed, 29 Oct 2003 21:22:32 -0500:

claw> I may be unusual in this regard, but I generally consider
claw> list archives as one-way systems: messages go in and never
claw> come out.

Out of idle curiosity, why doesn't 'write once read many' indicate a directory more than a database?

jam
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 16:20:10 -0500 John A Martin [EMAIL PROTECTED] wrote:

> Out of idle curiosity, why doesn't 'write once read many' indicate a directory more than a database?

1) The filesystem is a database.

2) Unix filesystems have extremely limited meta-data.

3) A discussed format is putting the messages on the filesystem (as a DB), and the meta data in a different DB (primarily due to open(2)/stat(2) expense).

--
J C Lawrence                -(*)    Satan, oscillate my metallic sonatas.
[EMAIL PROTECTED]                   He lived as a devil, eh?
http://www.kanga.nu/~claw/          Evil is a name of a foeman, as I live.
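Point 3 above (message bodies on the filesystem, metadata in a separate database so that listings avoid open(2)/stat(2) on every file) might look something like the following Python sketch. The schema, the SHA-1 content key, and the two-character fan-out directory are assumptions chosen for the example; nothing here reflects a decided Mailman design.

```python
import hashlib
import sqlite3
from email import message_from_bytes
from pathlib import Path


def open_index(db_path: str) -> sqlite3.Connection:
    """Open (or create) the metadata index; message bodies live elsewhere."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "key TEXT PRIMARY KEY, message_id TEXT, subject TEXT, "
        "date TEXT, path TEXT)"
    )
    return db


def store(db: sqlite3.Connection, spool: Path, raw: bytes) -> str:
    """Write the raw message once to the spool and index its headers."""
    key = hashlib.sha1(raw).hexdigest()       # stable, content-derived key
    path = spool / key[:2] / key              # fan out to keep directories small
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():                     # write once, read many
        path.write_bytes(raw)
    msg = message_from_bytes(raw)
    with db:                                  # commit on success
        db.execute(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?, ?)",
            (key, msg["Message-ID"], msg["Subject"], msg["Date"], str(path)),
        )
    return key
```

Index pages and thread views could then be built from the database alone, touching the spool only when a message body is actually requested.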
Rewriting Message-ID (was Re: [Mailman-Developers] Requirements for a new archiver)
On Tue, 2003-10-28 at 13:30, J C Lawrence wrote:

> Yup. Of course this heads directly into that beautiful debate of whether MLMs should rewrite Message IDs. Summarising briefly: If we rewrite all IDs we'll piss off the people who use ID to do dupe detection/deletion for courtesy copies. If we don't do some rewriting some messages won't make it through NNTP and some other people will be pissed off.
>
> Two contrasting approaches:
>
> 1) We guarantee uniqueness of all Message IDs. The only way to do this is to rewrite all IDs. This will piss off some people.
>
> 2) We best-effort guarantee uniqueness by only guaranteeing uniqueness within the last N messages to the list. This could be done by rewriting all IDs, in which case we might as well guarantee total uniqueness, or it could be done by keeping a DB of the last N (cf CDBD) and either discarding or rewriting detected collisions. This of course means that some messages will be discarded by NNTP and we won't know about it. Some may be willing to accept those risks.

Nice summary, thanks. Here's a strawman:

In the spirit of RFC 2369 we define a new header called List-Message-ID, and as in that standard, this field MUST only be generated by a mailing list, not by end users. Nested lists SHOULD remove the parent's List-Message-ID and supply its own. List-Message-ID conforms to the same syntax as for Message-ID in RFC 2822. Of course, for now read the header as if it had an X- prefix.

When an MLM receives a message, it generates a List-Message-ID header which is guaranteed to be globally unique. A cooperating archiver should use this header as its primary key, and must provide a mechanism whereby the List-Message-ID can be presented and the archived message can be returned. It may fall back to Message-ID when there is no List-Message-ID header present.

Internally, we use List-Message-ID as the primary key into our message store.

We further define a header (X-)List-Archived-Message which contains a url pointing directly to this message in a cooperating archive.

Now we have some knobs we can tweak.

Q. When posting a message to News, when should Mailman copy the List-Message-ID header to Message-ID?
A. Never, Only to resolve duplicate rejections, Always

Q. When reflecting a posted message back to the list, when should Mailman copy the List-Message-ID header to Message-ID?
A. Never, Always

I think it's time we started filling in the missing holes in the RFCs for mailing list functions, such as the interactions we're describing here. I propose to start a section of the wiki (or perhaps www.list.org) to collect these. Eventually we should try to get consensus with other archivers and MLMs, and then push a standard, but that's a long way off.

-Barry
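To make the strawman above concrete, here is a hedged Python sketch of stamping a message with the proposed headers and applying the "Never / Only to resolve duplicates / Always" knob. The header names carry the X- prefix as suggested; the `Rewrite` enum, the `stamp` function, and the archive base URL are hypothetical. `email.utils.make_msgid` is used as a stand-in generator: it gives a practically unique ID, which is weaker than the "guaranteed globally unique" wording of the proposal.

```python
from email.message import Message
from email.utils import make_msgid
from enum import Enum
from urllib.parse import quote


class Rewrite(Enum):
    """When to copy the list-generated ID into Message-ID."""
    NEVER = "never"
    ON_DUPLICATE = "on_duplicate"
    ALWAYS = "always"


ID_HEADER = "X-List-Message-ID"
URL_HEADER = "X-List-Archived-Message"


def stamp(msg: Message, list_domain: str, policy: Rewrite,
          is_duplicate: bool = False,
          archive_base: str = "http://lists.example.org/mid/") -> str:
    """Add the list's own ID and archive URL; optionally rewrite Message-ID."""
    # A nested list removes the parent's headers and supplies its own.
    del msg[ID_HEADER]
    del msg[URL_HEADER]
    list_id = make_msgid(domain=list_domain)
    msg[ID_HEADER] = list_id
    msg[URL_HEADER] = archive_base + quote(list_id.strip("<>"), safe="@")
    if policy is Rewrite.ALWAYS or (policy is Rewrite.ON_DUPLICATE and is_duplicate):
        del msg["Message-ID"]
        msg["Message-ID"] = list_id
    return list_id
```

The `is_duplicate` flag is supplied by the caller, since deciding what counts as a duplicate (per list, per host, last N messages) is one of the open questions in this thread.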
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 13:30, J C Lawrence wrote:

> 2) Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases.

Ah, which reminds me, elaborating on my strawman, the answers to when should Mailman rewrite Message-ID on posts should be: Never, Only to resolve duplicates, Always.

-Barry
Re: Rewriting Message-ID (was Re: [Mailman-Developers] Requirements for a new archiver)
On Thu, 30 Oct 2003 17:47:18 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:

> In the spirit of RFC 2369 we define a new header called List-Message-ID, and as in that standard, this field MUST only be generated by a mailing list, not by end users. Nested lists SHOULD remove the parent's List-Message-ID and supply its own. List-Message-ID conforms to the same syntax as for Message-ID in RFC 2822. Of course, for now read the header as if it had an X- prefix.

> When an MLM receives a message, it generates a List-Message-ID header which is guaranteed to be globally unique. A cooperating archiver should use this header as its primary key, and must provide a mechanism whereby the List-Message-ID can be presented and the archived message can be returned. It may fall back to Message-ID when there is no List-Message-ID header present.

I haven't finished musing on this (busy day, thus slow on other replies as well), but my first thought:

What happens when a given message is sent to several lists on the same host? Does each list do its own munge? Do we do USENET-style crossposting?

I want to do crossposting. I don't think we can due to per-list customisations.

--
J C Lawrence                -(*)    Satan, oscillate my metallic sonatas.
[EMAIL PROTECTED]                   He lived as a devil, eh?
http://www.kanga.nu/~claw/          Evil is a name of a foeman, as I live.
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 17:51:27 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:

> On Wed, 2003-10-29 at 13:30, J C Lawrence wrote:

>> 2) Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases.

> Ah, which reminds me, elaborating on my strawman, the answers to when should Mailman rewrite Message-ID on posts should be: Never, Only to resolve duplicates, Always.

Does that mean that we keep a database of all Message-IDs that all lists on that host have ever seen? If so, what happens when a single message is CC'ed to multiple lists? NetNews servers require global uniqueness across all newsgroups.

I'm rapidly coming to the conclusion that we have to rewrite all Message-IDs whenever the internal archive is enabled.

--
J C Lawrence                -(*)    Satan, oscillate my metallic sonatas.
[EMAIL PROTECTED]                   He lived as a devil, eh?
http://www.kanga.nu/~claw/          Evil is a name of a foeman, as I live.
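The "Only to resolve duplicates" option, combined with the earlier suggestion of remembering only the last N Message-IDs rather than every ID ever seen, can be sketched as below. `RecentIds`, `resolve`, and the default capacity are assumptions for illustration; a real deployment would need the table to persist across restarts and, as noted above, to decide what a message CC'ed to several lists should do.

```python
from collections import OrderedDict
from email.utils import make_msgid


class RecentIds:
    """Bounded memory of the last N Message-IDs seen (best-effort uniqueness)."""

    def __init__(self, capacity: int = 10000) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def check_and_add(self, message_id: str) -> bool:
        """Return True if message_id collides with a remembered one."""
        if message_id in self._seen:
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)    # forget the oldest entry
        return False


def resolve(message_id: str, recent: RecentIds, domain: str) -> str:
    """Keep the original ID unless it collides; then issue a fresh one."""
    if recent.check_and_add(message_id):
        new_id = make_msgid(domain=domain)
        recent.check_and_add(new_id)
        return new_id
    return message_id
```

A collision older than N messages goes undetected, which is the accepted risk of the best-effort approach; rewriting every ID, as the post above concludes, avoids the table entirely.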
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
On Thu, 30 Oct 2003 18:20:56 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 8:41 AM -0800 2003/10/30, Chuq Von Rospach wrote: One of them is recipient sorting by average delivery time over the past week (probably want a decaying geometric mean), which would require tracking log data on a per-recipient basis. While I don't disagree, this is really an MTA's job, not Mailman's. This is why I've been doing log analysis of MXes and routing mail to customised outbound MTAs on the basis of responsiveness, since early 2000. Adaptive MX routing is great stuff. Another is two-level message handling, by configuring the MTA for the initial delivery attempt to use very low timeouts, but then to fall back to a secondary MTA (or MTA pool) that uses more standard timeouts for those sites that are slower. Yup. I did it at the first level with an initial SMTP proxy which routed based on MX response records pulled from a DB. Perhaps in its current form, that is true. However, not all sites are using sendmail 8.12, and of the ones that are, most are probably not using it in a manner that is more suitable for mailing lists. I'm generally of the view that Mailman should do opportunistic domain sorting and per-MTA customised VERP handoffs (because nobody has standardised VERP across MTAs), and beyond that to back off. Mailman's job is to get the outbound mail into the MTA's spool as quickly as possible, wrapped in transactions (ie RCPT TO bundles) that are friendly to efficient processing, and that's it. We're not in the game of second guessing the MTAs. That way lies wasted time and madness. However, given the issues you've mentioned, it would probably be a good idea to be able to turn off selected bulk_mailer type features, so that you can let the MTA do more of its job better -- if it is configured to do so. There are thresholds for covering up for broken software. There are also thresholds for covering up for SysAdm negligence or oversight.
You've got to pick where you stop accepting the problem. Ideally we should be resilient and friendly to both. Realistically we need to do something reasonable and not worry too hard about the rest. Priorities. Mailman's primary performance problems are not at the MTA hand off. MTA configuration and tuning for mailing lists is only a minor art. There is not-inconsiderable documentation and understanding of the field. A US$2K commodity box subjected to moderate tuning efforts using readily available documentation can sustain 2,400 outbound deliveries per minute. You do the arithmetic. In a perfect world that maps out to 3.4 million per day. Cut that under half for queue injection overhead and other crap and you're still talking a million deliveries per day for a US$2K host.[1] A million messages a day already puts us above the 99th percentile for list server audiences. I'm not really concerned about that problem. Where Mailman's performance hurts is in the handling of the list configs, especially for lists with very large membership rosters, and in queue runner performance and overhead (try watching queue runner's system resource profile in v2.1 for lists with 50,000 members). For me those are the obvious low hanging fruit, and those are the points that will help not just the performance hounds, but also the lower 80% who are running under-provisioned under-configured under-admined multi-purpose boxes who want Mailman to be a bit more reasonable and forgiving about their not-so-brilliant systems. [1] That's of course assuming reasonable sustained queue size and responsive MXes. However, those are separate problems and ignoring MTA-specific behaviours (like Exim's active hatred of large queues), the methods and systems to segment and tame those problems are fairly well known. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
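The "you do the arithmetic" step in the message above can be written out. The 2,400 per minute figure and the million per day target come from the message; the function name is hypothetical and the derating fraction is simply computed from those two numbers rather than measured.

```python
def deliveries_per_day(per_minute):
    """Sustained per-minute rate scaled to a 24-hour day."""
    return per_minute * 60 * 24


# Figure quoted in the message: 2,400 outbound deliveries per minute.
ideal = deliveries_per_day(2400)

# Fraction of the ideal rate that still yields a million deliveries per day.
fraction_for_one_million = 1_000_000 / ideal

print(ideal)                                # 3456000
print(round(fraction_for_one_million, 2))   # 0.29
```

So the ideal figure is about 3.46 million per day, and a million per day needs only about 29% of it, which is consistent with "cut that under half".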
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
On Thu, 30 Oct 2003 08:41:17 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote: On Oct 30, 2003, at 7:48 AM, Brad Knowles wrote: One big reason: increasing spam blocking (stupid or otherwise) of non-individually addressed email. The old list server setup of: to: subscribers of list [EMAIL PROTECTED] bcc: [EMAIL PROTECTED] is increasingly risky as far as delivery is concerned. I've seen a couple mail BCPs and internal spam-handling plans at large ISPs and corporates which explicitly include the line item: Discard all mail with more than one address in the envelope. Scary, stupid, true: They want the pain to stop. I find it hard to blame them. I also don't think it allows for the kind of personalization that's needed for your general audiences (help URLs, unsub URLs, etc). Aye, such VERPish attributes are becoming a necessity. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
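For readers unfamiliar with the "VERPish attributes" mentioned above, here is a minimal sketch of one common VERP convention, in which the recipient is folded into the envelope sender so a bounce identifies who it was for. The thread itself notes that VERP is not standardised across MTAs, so the "+user=domain" layout and the function names below are assumptions for illustration only.

```python
def verp_encode(bounce_local, list_domain, recipient):
    """Build a per-recipient envelope sender, e.g. list-bounces+user=host@listhost."""
    user, _, domain = recipient.rpartition("@")
    return f"{bounce_local}+{user}={domain}@{list_domain}"


def verp_decode(envelope_to):
    """Recover the original recipient from the address a bounce was sent to."""
    local, _, _ = envelope_to.rpartition("@")
    # Everything after the first "+" is the encoded recipient.
    _, _, encoded = local.partition("+")
    # Domains cannot contain "=", so the last "=" separates user from domain.
    user, _, domain = encoded.rpartition("=")
    return f"{user}@{domain}"
```

The cost is the one the message describes: each recipient needs an envelope of its own, so the old single-message, many-recipient handoff no longer applies.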
Re: Efficient final message disposition (was Re: [Mailman-Developers] Requirements for a new archiver)
On Thu, 30 Oct 2003 09:53:19 -0500 Barry Warsaw [EMAIL PROTECTED] wrote: - The desire/requirement that Mailman chunk and sort recipients This shouldn't be any more complex than domain sorting, and need not be perfect. - The ability for Mailman to swamp the mail server or cause the mail server to consume all available cpu Rate limiting. - The fact that failures in upstream mail server are reported to Mailman as bounces instead of as error codes I don't know that Mailman can do anything about this. We can't reliably distinguish between system errors and delivery failures for MTAs beyond Mailman's borders. There's a protocol hole here I don't know we can or should attempt to fix. - Inefficiencies in VERP/personalization/mail-merge because of the lack of cooperation Oh yeah. - The need for Mailman to queue outgoing messages that aren't completely delivered Queue runner could do with some more intelligence in that dept. Mailman wants to simply hand that data off to some agent and forget about it. It wants to know that the agent will make best effort to mail merge and deliver. It wants to be informed of any final delivery failures. And that's it. Mailman doesn't want to chunkify recipients, and it doesn't want to sort them. It doesn't want to worry about a mail server effectively managing system resources. I'd rather not have to hand it a couple of meg of recipient or substitution data, but there seems to be no other way. So what can we do here to improve matters? Start yelling at DJB, Wietse, Phillip, and Eric about a standardised SMTP extension for VERP. With a little luck and minor work we can probably get some of the other commercial mail people involved as well. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
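The "domain sorting" and RCPT TO bundling discussed in this part of the thread might look something like the sketch below. It is deliberately imperfect, as the message says it need not be perfect: it groups by the literal domain after the "@" and knows nothing about MX records. The function name and the max_rcpts default are hypothetical.

```python
from collections import defaultdict


def chunk_by_domain(recipients, max_rcpts=50):
    """Group recipients by domain, then split each group into RCPT TO bundles."""
    by_domain = defaultdict(list)
    for addr in recipients:
        # Domain part compared case-insensitively; local parts are left untouched.
        by_domain[addr.rpartition("@")[2].lower()].append(addr)

    chunks = []
    for domain in sorted(by_domain):
        addrs = by_domain[domain]
        for i in range(0, len(addrs), max_rcpts):
            chunks.append(addrs[i:i + max_rcpts])
    return chunks
```

Each returned chunk would become one SMTP transaction handed to the MTA; anything smarter (rate limiting, responsiveness tracking) is what the thread argues belongs in the MTA.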
Re: [Mailman-Developers] Requirements for a new archiver
At 3:06 PM -0500 2003/10/27, Kevin McCann wrote: I was thinking about using MHonarc to enhance the archive experience but it doesn't work with MySQL directly so Mail::Box just might be what the doctor ordered. No database handles BLOB (Binary Large OBject) storage well. Even high-end databases have problems in this area. IMO, this is a bad idea. Better would be to use a mailbox format that handles simultaneous multiple access reasonably well. You can use c-client and mbx format, or MH format, or something else reasonably decent. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
At 12:41 AM -0500 2003/10/28, J C Lawrence wrote: Quite, this is how/why NNTP uses Message-IDs as unique indexing qualifiers. Problem is that client-assigned message-ids are not guaranteed unique. Too many people are using RFC 1918 private addressing space, and if the machine doesn't know its own name, then it stuffs in just the IP address for that portion. Everything else could quite feasibly collide, and you'd wind up with multiple non-unique message-ids. You need a guaranteed unique id to be used as a primary index field. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
At 3:12 PM -0500 2003/10/27, Barry Warsaw wrote: What would then be in the database would be records providing easy lookup by message-id (at least) into the on-disk message store. Putting meta-data into the database would work. Then use that index information to actually access the files. I recommended the same in my invited talk at http://www.shub-internet.org/brad/papers/dihses/. Of course, if you're going to use a USENET interface, you should use Diablo as the back-end. ;-) -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 16:29:09 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 12:41 AM -0500 2003/10/28, J C Lawrence wrote: Quite, this is how/why NNTP uses Message-IDs as unique indexing qualifiers. Problem is that client-assigned message-ids are not guaranteed unique. Right, and that was the point. If we do nothing to Message IDs we don't change external behaviour. If we use a netnews backing store for the archives and we don't dick with the message IDs we run the risk of some messages never reaching the archives. If we use a netnews backing store and dick with message IDs we can offer various levels of guarantee that messages reach the archives, and of pissing off users because we messed with the Message IDs. As always, you get to pick. Everything else could quite feasibly collide, and you'd wind up with multiple non-unique message-ids. In which case the many people currently using ID-based dupe collapsing (eg default Exchange config) will lose messages, and the archives will lose messages... OR we offer some level of guarantee (see yesterday's discussion) with the matching trade-offs. You need a guaranteed unique id to be used as a primary index field. Need is a strong word. It's very deployment and use-case sensitive. There are a large number of cases where I'm content to rest on the assurance that the Message IDs arriving at my lists will always be unique. There are also a large number of cases where I'm not willing to make that assessment, as well as a large number of cases where I'm willing to simply discard any duplicated Message ID messages at the archiver level. Similarly, there are cases where re-writing the Message IDs in any form is significantly troubling, and cases where it's not. Need? No. It is a deployment choice with easily understood ramifications. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
Re: [Mailman-Developers] Requirements for a new archiver
At 11:48 AM -0500 2003/10/29, J C Lawrence wrote: You need a guaranteed unique id to be used as a primary index field. Need is a strong word. It's very deployment and use-case sensitive. In the case of a database, it is a hard requirement. A primary index field must be guaranteed unique. There is absolutely no way around this issue. Need? No. It is a deployment choice with easily understood ramifications. Perhaps for the application, but this is a totally different ballgame when it comes to a database. Google for primary index field, and hopefully you will understand. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 10:13, Brad Knowles wrote: At 3:06 PM -0500 2003/10/27, Kevin McCann wrote: I was thinking about using MHonarc to enhance the archive experience but it doesn't work with MySQL directly so Mail::Box just might be what the doctor ordered. No database handles BLOB (Binary Large OBject) storage well. Even high-end databases have problems in this area. IMO, this is a bad idea. Agreed. I was thinking more along the lines of storing the message body as is, which, yes, might sometimes be base-64 encoded. Content headers, boundary string, etc. could also be stored so as to make decoding (by a web app) a cinch. You could go further and create attachment files and point to them in a URL or file field. But keep the message intact, as it was received. That way if you want to get into after-the-fact message delivery (manual resend, or maybe a member missed a message and wants it in his/her inbox), it's not a chore. The Messages_ table that Lyris uses in its database is a good starting point if one wants to do the same kind of thing. I can dig up the specs if there is interest. - Kevin
Re: [Mailman-Developers] Requirements for a new archiver
Brad Knowles [EMAIL PROTECTED] writes: At 3:06 PM -0500 2003/10/27, Kevin McCann wrote: I was thinking about using MHonarc to enhance the archive experience but it doesn't work with MySQL directly so Mail::Box just might be what the doctor ordered. No database handles BLOB (Binary Large OBject) storage well. Even high-end databases have problems in this area. IMO, this is a bad idea. Better would be to use a mailbox format that handles simultaneous multiple access reasonably well. You can use c-client and mbx format, or MH format, or something else reasonably decent. Hmm... Maildirs. With just a bit of minor trickery the unique filename created to receive a message as it arrives at Mailman might be put into the saved rfc822 header (much like MTAs place a queue id), or into the message trailer if you must, and perhaps could be preserved in the filename as the message is moved/copied from one directory to another, thereby providing a unique index that can be included in the message Mailman puts on the wire. jam
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 9:41 AM, Brad Knowles wrote: In the case of a database, it is a hard requirement. A primary index field must be guaranteed unique. There is absolutely no way around this issue. which is why it many times makes sense to generate your own. Consider, say, identifying all messages with an MD5 hash of the message then use that for all of your link generating and access work. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
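The suggestion above, keying every message by an MD5 hash of its contents and building links from that, can be sketched in a few lines. The function names and the base URL parameter are hypothetical; MD5 is used only because the message names it, and a different digest from hashlib could be substituted without changing the shape of the code.

```python
import hashlib


def message_key(raw_message: bytes) -> str:
    """Hex MD5 digest of the message exactly as received."""
    return hashlib.md5(raw_message).hexdigest()


def archive_url(base_url: str, raw_message: bytes) -> str:
    """Link that depends only on the message bytes, not on its archive position."""
    return f"{base_url.rstrip('/')}/{message_key(raw_message)}"
```

One consequence worth noting: byte-identical messages map to the same key, and editing an archived message (for example, to remove a phone number) changes its key unless the key is computed once and stored.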
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 18:41:20 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 11:48 AM -0500 2003/10/29, J C Lawrence wrote: You need a guaranteed unique id to be used as a primary index field. Need is a strong word. It's very deployment and use-case sensitive. In the case of a database, it is a hard requirement. A primary index field must be guaranteed unique. There is absolutely no way around this issue. Right, and I'm not arguing that. My point is two fold: 1) Using Message ID as a primary key is attractive. 2) Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases. We don't have to guarantee key uniqueness for all messages BEFORE they are submitted to the message store. The unique property can be assumed from external sources (with all that implies) should the deployment case want that. There are tradeoffs here, and it is not clear to me that there is an instant and obvious global solution. Need? No. It is a deployment choice with easily understood ramifications. Perhaps for the application, but this is a totally different ballgame when it comes to a database. Google for primary index field, and hopefully you will understand. I'm neither an idiot nor a neophyte in this game. Yes, a database needs a primary unique key. That's not in debate. The questions are: Do we know the key before submission to the store? (If we don't the store operation shouldn't be asynchronous) Is the risk of discarded messages due to key collisions acceptable? (Some deployment cases consider such losses acceptable, others can guarantee uniqueness without Mailman's involvement) Rotely assuming that Mailman must guarantee key uniqueness before we hit the message store is not a given, it's a choice. Let's at least be on the same page. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
Re: [Mailman-Developers] Requirements for a new archiver
At 1:28 PM -0500 2003/10/29, John A. Martin wrote: Hmm... Maildirs. Not. From http://www.washington.edu/imap/documentation/formats.txt.html: . mh This is supported for compatibility with the past. This is the format used by the old mh program. mh is very inefficient; the entire directory must be read and each file stat()'d, and in order to determine the size of a message, the entire file must be read and newline conversion performed. mh is deficient in that it does not support any permanent flags or keywords; and has no means to store UIDs (because the mh compress command renames all the files, that's why). [ ... deletia ... ] The Maildir format used by qmail has all of the performance disadvantages of mh noted above, with the additional problem that the files are renamed in order to change their status so you end up having to rescan the directory frequently to locate the current names (particularly in a shared mailbox scenario). It doesn't scale, and it represents a support nightmare; [ ... deletia ... ] So what does this all mean? A database (such as used by Exchange) is really a much better approach if you want to move away from flat files. mx and especially Cyrus take a tentative step in that direction; mx failed mostly because it didn't go anywhere near far enough. Cyrus goes much further, and scores remarkable benefits from doing so. However, a well-designed pure database without the overhead of separate files would do even better. Of course, we all know about the database problems of Exchange, and how Exchange admins have to frequently shut everything down and clean their databases, how often they crash, how often they completely trash all e-mail for all their users, etc. I submit that the reason for this is the combination of crappy Microsoft-style programming and the fact that no database handles BLOBs well.
Even top-notch programmers have real problems with these kinds of implementations -- I am intimately familiar with the database implementation methods used in the AOL mail system, and suffice it to say that this is a really, really hairy nightmare that you do *NOT* want. That said, storing meta-data in a real database and then using external filesystem techniques for actually accessing the data, should give you the best of both worlds -- the speed of access of the database, and the reliability and well-understood access and backup mechanisms of filesystems. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
At 1:30 PM -0500 2003/10/29, J C Lawrence wrote: Right, and I'm not arguing that. My point is two fold: 1) Using Message ID as a primary key is attractive. Agreed. 2) Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases. Outside of a database, this may be something you can decide whether or not to live with. Within the confines of a database, this simply is not possible. The ANSI SQL specification has some hard requirements for a primary index key: 1. It cannot ever be null. 2. It must always be guaranteed unique. I'm sure there are other requirements. But these two are a good start. We don't have to guarantee key uniqueness for all messages BEFORE they are submitted to the message store. All other keys could potentially be non-unique, or null, but not the primary index key. This is why many applications have the database assign the primary index key itself on insertion into the table, so that all the necessary requirements can be met. I'm neither an idiot nor a neophyte in this game. Yes, a database needs a primary unique key. Then you must realize that we could not possibly use message-id as the primary index key, unless this is a field that we generate ourselves in such a way that all the necessary requirements are met. Rotely assuming that Mailman must guarantee key uniqueness before we hit the message store is not a given, it's a choice. The message-id is not necessarily the primary index key. See above. With regards to a primary index key, there simply is no choice. The message-id could continue to be one of the many secondary index keys, which is a totally different issue. Let's at least be on the same page. Agreed. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
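The arrangement described in the message above (the database assigns the primary key on insertion, and Message-ID is kept as a non-unique secondary index) can be illustrated with SQLite from the Python standard library. The table and column names are hypothetical, and SQLite is only a convenient stand-in for whatever database a real archiver would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id         INTEGER PRIMARY KEY,  -- assigned by the database on insert
        message_id TEXT,                 -- as received; may repeat or be NULL
        path       TEXT NOT NULL         -- where the message body lives on disk
    )
    """
)
# Secondary, non-unique index: fast lookup without a uniqueness guarantee.
conn.execute("CREATE INDEX idx_message_id ON messages (message_id)")


def store(message_id, path):
    """Insert one message record and return the key the database assigned."""
    cur = conn.execute(
        "INSERT INTO messages (message_id, path) VALUES (?, ?)",
        (message_id, path),
    )
    conn.commit()
    return cur.lastrowid


def lookup(message_id):
    """All primary keys whose Message-ID matches; there may be more than one."""
    rows = conn.execute(
        "SELECT id FROM messages WHERE message_id = ? ORDER BY id",
        (message_id,),
    )
    return [row[0] for row in rows]
```

With this layout a colliding Message-ID no longer causes a lost message; it simply returns more than one row, and the application decides what to do about it.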
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 10:45 AM, Brad Knowles wrote: That said, storing meta-data in a real database and then using external filesystem techniques for actually accessing the data, should give you the best of both worlds -- the speed of access of the database, and the reliability and well-understood access and backup mechanisms of filesystems. Hint: look at what INN did when they implemented cycbufs. Effectively, you create 1-N files, or create files as needed. Each file is N bytes long, pre-allocated on file creation. When you store messages, they're written into the file sequentially (or any other way you want. If you want to get into best fit allocations and turn this into a malloc() style heap, be my guest). Metadata to access the info is then a filename, and an lseek() pointer into the file, and # of bytes to read, plus your normal identifying info. It's fast, it's an efficient use of file pointers, it avoids the worst aspects of the unix file system, and I'm amazed nobody ever thinks to use it for other purposes (or that it took that long for usenet people to discover it, I suggested a simpler variant of it back in the 80s and was told inodes are our friends...) You can even do expiration/purge/etc if you want, by moving stuff around and changing the pointers. I've even thought of using it as the backing store for a picture library. With a nice relational database and a series of these data boxes, I think you can store data in the best and fastest possible way...
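The storage scheme described above (fixed-size files, sequential writes, and a filename plus offset plus length as the only metadata) can be sketched as follows. This is not INN's implementation: the class and function names are hypothetical, the file length is set with truncate (which may leave a sparse file rather than truly pre-allocated blocks), and a full buffer raises an error instead of wrapping around.

```python
import os
import tempfile


class Buffer:
    """One fixed-size storage file that messages are appended into."""

    def __init__(self, path, size):
        self.path = path
        self.size = size
        self.offset = 0  # next free byte
        with open(path, "wb") as f:
            f.truncate(size)  # set the file length up front

    def append(self, data: bytes):
        """Write data at the next free offset; return (path, offset, length)."""
        if self.offset + len(data) > self.size:
            raise ValueError("buffer full")  # a real cycbuf would wrap instead
        with open(self.path, "r+b") as f:
            f.seek(self.offset)
            f.write(data)
        ref = (self.path, self.offset, len(data))
        self.offset += len(data)
        return ref


def read(ref):
    """Fetch one message using only the metadata kept in the database."""
    path, offset, length = ref
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The tuple returned by append is what would go into the relational database alongside the usual identifying fields; the message bytes themselves never enter the database.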
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, Oct 29, 2003 at 07:45:53PM +0100, Brad Knowles wrote: At 1:28 PM -0500 2003/10/29, John A. Martin wrote: Hmm... Maildirs. Not. From http://www.washington.edu/imap/documentation/formats.txt.html: [deletia] I don't know why a reasonable person would cite documentation pertaining to UW-IMAP, a server that has been a standards, security and performance bummer. Why not cite http://www.courier-mta.org/mbox-vs-maildir/? quote Painting just about every filesystem in existence with the same brush, and assuming that every filesystem works pretty much in the same way, is very misleading. Many contemporary high performance filesystems are designed explicitly for parallel access. For example, consider the SGI XFS filesystem: The free space and inodes within each AG are managed independently and in parallel so multiple processes can allocate free space throughout the file system simultaneously.[2] It took me about 6 months to write the first revision of the maildir-based Courier-IMAP server. The absence of maildir support in the UW-IMAP server is the reason I wrote it. Many people have found that it needed less memory, and was faster than UW-IMAP. Many people observed that upgrading to Courier-IMAP lowered their overall system load, and increased performance. Large mail clusters with a network-based fault tolerant, scalable, architecture frequently have problems deploying mbox-based mailboxes, due to many documented problems with file locking (file locking is required for mbox-based mailboxes) with network-based filesystems.[3] As referenced in [3], maildirs have no issues with NFS (the most common type of a network-based filesystem) since maildirs do not use locking. After looking around for some time, I did not find any independent benchmarks that directly measured the relative performance of mboxes and maildirs. Therefore I decided to run some actual benchmarks myself. I defined the test conditions according to UW-IMAP server's documentation.
I created a test environment that stacked the deck in favor of mboxes. This was done in accordance with the claimed shortcomings of maildirs as stated in UW-IMAP server's documentation, in order to accurately measure the magnitude of the claimed problems. /quote and at the end: quote The final conclusion is that -- except in some specific instances -- using maildirs will be just as fast -- and sometimes much faster -- than mbox files, while placing less of a load on the rest of the mail system. The claims in the UW-IMAP server's documentation regarding maildir performance can be supported only in certain, specific, very narrowly-defined conditions. There is no simple answer on which mail storage format is better. A lot depends on many variables that vary widely in different situations. Besides the raw benchmarks shown above, other factors include the mail server software being used, what kind of storage is being used, and the available network bandwidth. The final answer depends on all of the above. /quote [flame-bait deleted] A database (such as used by Exchange) is really a much better approach if you want to move away from flat files. mx and especially Cyrus take a tentative step in that direction; mx failed mostly because it didn't go anywhere near far enough. Cyrus goes much further, and scores remarkable benefits from doing so. However, a well-designed pure database without the overhead of separate files would do even better. It always confounds me that people will go for database voodoo and deride filesystems when a filesystem is a highly specialised database in and of itself. Putting things that are in a filesystem into a database offers the power and flexibility of querying, but certainly should not be done for the sake of speed (assuming the filesystem-based implementation meets whatever other requirements are present).
Of course, we all know about the database problems of Exchange, and how Exchange admins have to frequently shut everything down and clean their databases, how often they crash, how often they completely trash all e-mail for all their users, etc. Which is a good lesson about databases: because of their flexibility, they cannot be qa'd to cope with all of their uses without being put into production and losing data and being subsequently fixed. Filesystems, which have a more narrowly-defined scope, tend to suffer this less. That's why database logs that live on filesystems are used for data recovery when a database eats itself. I submit that the reason for this is the combination of crappy Microsoft-style programming and the fact that no database handles BLOBs well. Even top-notch programmers have real problems with these kinds of implementations -- I am intimately familiar with the database implementation methods used in the AOL mail system, and suffice it to say that this is a really, really hairy nightmare that you
Re: [Mailman-Developers] Requirements for a new archiver
At 11:38 AM -0800 2003/10/29, Chuq Von Rospach wrote: Hint: look at what INN did when they implemented cycbufs. I did. See http://www.shub-internet.org/brad/papers/dihses/. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
At 11:54 AM -0800 2003/10/29, Peter C. Norton wrote: It always confounds me that people will go for database voodoo and deride filesystems when a filesystem is a highly specialised database in and of itself. I am aware of that. I was aware of that when I first gave my invited talk entitled Design and Implementation of Highly Scalable E-mail Systems, which you can find at http://www.shub-internet.org/brad/papers/dihses/. Note that Eric Allman (author of the original Ingres database, among many other things) and Kirk McKusick (author of the Berkeley Fast File System) were in the audience. I did not embarrass myself. Databases aren't meant to be storage for abstract binary data. They're meant to be a searchable index of data of types they understand. Correct. And despite all claims to the contrary from the vendors, no database properly understands binary large objects, nor do they give you another datatype they do actually understand that would be suitable for the storage of e-mail message bodies. Assuming I had a clean slate to start a database project for a mail store, personally I'd much rather prototype it in something like postgresql where I could add data types to deal with email. I could then make header types, text types, mime types classes, etc. Then I could test to see if it was a good idea to implement it. IMO, that would be an exercise in futility. We've been down this road a million times before. We don't need to go down it again to know that the result is not likely to be successful, especially when we have alternatives that are proven to work well -- we store the message meta-data in the database, and then the message bodies in a separate message store akin to INN timecaf/timehash heaps (see http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld090.htm). I think using a standard sql database for doing mail operations is asking for trouble.
Standard databases don't know how to parse rfc822/2822 headers and that means that you've got to either write a whole lot of stored procedures in a clunky query language (or java!?!?!) and then maintain it, or you've got to do it all in the imap/pop3/whatever server which means a whole lot of yammering traffic between the database and the I/P/W server all the time, which == slow. You don't ask the database to understand or parse RFC2822 headers or messages. That's up to your application. You just store data using the formats known to the database, and the message bodies according to the methods above. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, Oct 29, 2003 at 09:25:53PM +0100, Brad Knowles wrote:

Assuming I had a clean slate to start a database project for a mail store, personally I'd much rather prototype it in something like postgresql where I could add data types to deal with email. I could then make header types, text types, mime types classes, etc. Then I could test to see if it was a good idea to implement it.

IMO, that would be an exercise in futility. We've been down this road a million times before. We don't need to go down it again to know that the result is not likely to be successful, especially when we have alternatives that are proven to work well -- we store the message meta-data in the database, and then the message bodies in a separate message store akin to INN timecaf/timehash heaps (see http://www.shub-internet.org/brad/papers/dihses/lisa2000/sld090.htm).

It seems like you're only partially agreeing/disagreeing with me (optimist/pessimist). Disagreeing: you're saying that using datatypes in the database which are appropriate to the kind of data being stored (mail messages) is an exercise in futility. But, agreeing: that storing these in a database in another way is OK. I don't get why you'd just want to store these as text when you have databases that can be made more suitable to the problem.

I think using a standard sql database for doing mail operations is asking for trouble. Standard databases don't know how to parse rfc822/2822 headers and that means that you've got to either write a whole lot of stored procedures in a clunky query language (or java!?!?!) and then maintain it, or you've got to do it all in the imap/pop3/whatever server which means a whole lot of yammering traffic between the database and the I/P/W server all the time, which == slow.

You don't ask the database to understand or parse RFC2822 headers or messages. That's up to your application. You just store data using the formats known to the database, and the message bodies according to the methods above.

So all the parsing happens in the database client side. Which is slow.

-Peter
-- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 11:38:33 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote:

Hint: look at what INN did when they implemented cycbufs.

Aye, it's a cute system.

Effectively, you create 1-N files, or create files as needed. Each file is N bytes long, pre-allocated on file creation. When you store messages, they're written into the file sequentially (or any other way you want. If you want to get into best fit allocations and turn this into a malloc() style heap, be my guest). Metadata to access the info is then a filename, and an lseek() pointer into the file, and # of bytes to read, plus your normal identifying info. It's fast, it's efficient use of file pointers, it avoids the worst aspects of the unix file system, and I'm amazed nobody ever thinks to use it for other purposes (or that it took that long for usenet people to discover it, I suggested a simpler variant of it back in the 80s and was told inodes are our friends...)

Small caveat: Some modern filesystems make operating on the one-file-per-message stores extremely efficient. Admittedly they aren't in wide cross-platform deployment, but the filesystems and file op behaviour of today and yesteryear are not quite the same.

I've even thought of using it as the backing store for a picture library. With a nice relational database and a series of these data boxes, I think you have stored data in the best and fastest possible way...

Some years back I talked to Mike Belshe (used to be at Remarq) about their storage techniques (I caught him shortly after Critical Path bought Remarq). Keying off other LISA papers they segmented their storage space by object size, customising and configuring each segment to suit (things like RAID stripe size, number of spindles, FS tuning parameters, etc). He asserted that the rewards were very significant.

However, these are very large archive problems and are a bit outside of Mailman's home turf.

-- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
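The storage scheme described in the message above (pre-allocated files, sequential writes, and a filename plus offset plus length as the access metadata) can be sketched roughly as follows. This is only an illustration and not INN's actual cycbuf code; the names `BufferFile`, `store` and `fetch` are hypothetical, and it assumes the write-once case with no wrap-around and no free-space management.

```python
class BufferFile:
    """One pre-allocated file holding many messages back to back (hypothetical sketch)."""

    def __init__(self, path, size):
        self.path = path
        self.size = size
        self.next_offset = 0
        # Pre-allocate the whole file at creation time.
        with open(path, "wb") as f:
            f.truncate(size)

    def store(self, data):
        """Write data at the next free offset and return (path, offset, length)."""
        if self.next_offset + len(data) > self.size:
            raise ValueError("buffer file full")
        with open(self.path, "r+b") as f:
            f.seek(self.next_offset)
            f.write(data)
        ref = (self.path, self.next_offset, len(data))
        self.next_offset += len(data)
        return ref


def fetch(ref):
    """Read one message back using only its (path, offset, length) reference."""
    path, offset, length = ref
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The tuple returned by `store` is what would live in the metadata database next to the usual identifying information; the buffer file itself never needs to be scanned to retrieve a message.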
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 13:11:01 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote:

On Oct 29, 2003, at 1:05 PM, David Birnbaum wrote:

2. third-party add-ons make it that much harder to install. If I have to set up a Mysql or Postgres database to use Mailman, it's a step that will put off people who don't already have it going.

actually, if you do it right, it's much easier -- because when you build in those tools, you build in standardized interfaces that third party add-ons can access, instead of the current case, which are code hacks that break every time Barry burps at the CVS server...

Aye, picking the right interface abstractions is key. There's also a disjoint between the novice SysAdm case who loves the fact of Mailman's all-in-one service, and the more meaty chap who integrates what he needs to. Much of Mailman's appeal at the low end is its all-in-one simple-to-install nature. (Well, ignoring the GID FAQ...)

Mailman v2.1 has a plugin layer for the membership roster. It's not a fully mature interface, but there are LDAP and SQL adaptors in the wild. At some point those adaptors will move into the Mailman core. If we move the archiving components (storage, presentation, index) behind plugin interfaces as well there's a reasonable opportunity for similar third parties to build adaptor layers which then also move into the Mailman core.

Oh yeah, and just to keep Nigel Metheringham hopping: Mailman just doesn't have enough configuration options.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, Oct 29, 2003 at 10:14:52PM +0100, Brad Knowles wrote:

I don't believe that there are any databases in existence that ... can be made more suitable to the problem.

In theory you can add data types to postgresql. Not that I've done it myself, but it's been done.

-Peter
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 13:59:06 -0800 Peter C Norton [EMAIL PROTECTED] wrote:

On Wed, Oct 29, 2003 at 10:14:52PM +0100, Brad Knowles wrote:

I don't believe that there are any databases in existence that ... can be made more suitable to the problem.

In theory you can add data types to postgresql. Not that I've done it myself, but it's been done.

True, but that doesn't answer the question of whether an RDBMS is a good storage tool for messages. I spent a couple months of spare time last year building an archiving system I liked atop PostgreSQL using fully decomposed SQL structures for all the message bits. It was not a pretty exercise, and the results were worse. Brad makes excellent points in his comments on poor BLOB support, the value of DBs for meta-data, and disaster recovery ease.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 20:11:49 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 1:30 PM -0500 2003/10/29, J C Lawrence wrote: 2) Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases. Outside of a database, this may be something you can decide whether or not to live with. Within the confines of a database, this simply is not possible. Of course, and that's the point. We are in violent agreement. The ANSI SQL specification has some hard requirements for a primary index key: I know, but that's not what I'm asserting. I'll also ignore the DB types which don't require primary keys of any form, as that's essentially what we have now and we're assuming an indexed store instead. We don't have to guarantee key uniqueness for all messages BEFORE they are submitted to the message store. All other keys could potentially be non-unique, or null, but not the primary index key. Ahh, I think I see the disjoint. We're using key in two contexts without distinguishing between them: 1) The property of a message which identifies that message with a high probability of uniqueness. This can be a Message ID, MD5SUM, whatever, but it is not guaranteed unique, it merely is unique most of the time for large definitions of most. 2) The primary key as used in an indexed DB or other store which is guaranteed unique for all cases. Between the two there's a conflict. One requires perfect uniqueness. The other delivers merely a good Best Effort. The assertion is that we don't always have to solve that mismatch. We can elect to live with the collisions. This is why many applications have the database assign the primary index key itself on insertion into the table, so that all the necessary requirements can be met. Sure, except that doing that in our case requires that storage be a synchronous operation (otherwise we don't know the key at rewrite/delivery time). 
That would be a significant change from the current model and rather unfriendly to a wide range of deployment cases. Keeping the storage procedure asynchronous with an a-priori key (for whatever guarantee of uniqueness) makes for a more interesting system.

I'm neither an idiot nor a neophyte in this game. Yes, a database needs a primary unique key.

Then you must realize that we could not possibly use message-id as the primary index key, unless this is a field that we generate ourselves in such a way that all the necessary requirements are met.

No, I don't realise that because it is false. We can use Message IDs as the primary key right now, today. In fact, I am, right now, this minute, today. You are assuming that every message submitted to the store must be accepted by the store. That is an assumption that hasn't been defined as a requirement and which some evidence suggests isn't a hard requirement. A very small percentage of the messages I submit to my store don't make it. They have duplicate Message IDs. They run through Mailman just fine. They never reach my list archives. I know, expect, accept this.

The primary key has to be unique for every message IN THE STORE.

Accepted. That does not dictate that the primary key for every message SUBMITTED to the store has to be unique (note that key assignment is occurring before collision check), or that the store has to ACCEPT every message which is submitted to it.

Guaranteeing perfect uniqueness of the keys prior to submission to the store is fragile and expensive. It is tempting to do some form of very good approximation (cf. Chuq's MD5SUM). Without perfect synchrony with the store's keys, if we calculate keys prior to insertion, or merely accept the keys that are given us in the form of Message IDs, we're going to get occasional collisions. The question is how to handle messages whose a priori assigned keys collide with keys already in the store.

We can handle the collision case in several ways:

1) Ignore it and discard messages bearing colliding keys.
2) Best Effort attempt to guarantee uniqueness within a window, with collisions outside the window discarded.
3) Fully guarantee uniqueness.

The first is easy. The second is fairly easy. The third isn't trivial. In all three cases the population of key values in the store remains unique. It's just that the population of keys submitted to the store may or may not be unique. Lossage at the insertion layer can be acceptable.

Let's at least be on the same page.

Agreed. Cool.

-- J C Lawrence
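As an illustration of the first option in the message above (discard any message whose a-priori key collides with one already in the store), here is a minimal sketch. It assumes an in-memory SQLite table keyed on Message ID; the table layout and the `open_store` and `submit` names are invented for the example and do not describe any existing Mailman archiver.

```python
import sqlite3


def open_store():
    """Create a throwaway in-memory store keyed on Message ID (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (msgid TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return db


def submit(db, msgid, body):
    """Try to insert a message; return True if accepted, False if the key collided."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO messages (msgid, body) VALUES (?, ?)", (msgid, body)
            )
        return True
    except sqlite3.IntegrityError:
        # Collision: the store keeps the earlier message and drops this one.
        return False
```

The keys inside the store stay unique because the store refuses the colliding insert, while the caller never has to check uniqueness before submitting.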
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, Oct 29, 2003 at 05:10:40PM -0500, J C Lawrence wrote:

True, but that doesn't answer the question of whether an RDBMS is a good storage tool for messages. I spent a couple months of spare time last year building an archiving system I liked atop PostgreSQL using fully decomposed SQL structures for all the message bits. It was not a pretty exercise, and the results were worse. Brad makes excellent points in his comments on poor BLOB support, the value of DBs for meta-data, and disaster recovery ease.

I may not have made it clear, but I'm focusing on the metadata. Once you've parsed rfc822/2822, then it may become easier to have things in the database that can manipulate those types. I.e. to be able to do simple searches for a property of given arbitrary headers (w/o having to have a database schema that consists of a few known headers and others which you then have to treat as a blob or as text).

-Peter
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 2:28 PM, Peter C. Norton wrote: I may not have made it clear, but I'm focusing on the metadata. Once you've parsed rfc822/2822, then it may become easier to have things in the database that can manipulate those types. I.e. to do be able to do simple searches for a property of given arbitrary headers (w/o having to have a database schema that consists of a few known headers and others which you then have to treat as a blob or as text). my only real worry is that from what I've seen, 99.99% of the time, the user is going to want content searches. header stuff is fine, but of really low priority in the scheme of things (necessary to put useful things together, meaningless if you can't content/context search in fulltext). that's why I'm leaning, blob issues or no, towards full-text storage in MySQL 4. Because if you can't easily chop up the message body content and find the messages you want to deal with, elegant storage of the headers is irrelevant... I think you need that, too. But until you get a reasonable context search for the message body, designing the rest is silly. And it seems to me there are few better methods than dumping the text into MySQL and letting it do the work. Compromises, tradeoffs and etc notwithstanding... ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
by the way, this statement is in conflict with my previous statement of "use cycbufs". I'm fully aware of that conflict, too. resolving it will be one of the big challenges.

On Oct 29, 2003, at 4:12 PM, Chuq Von Rospach wrote:

that's why I'm leaning, blob issues or no, towards full-text storage in MySQL 4. Because if you can't easily chop up the message body content and find the messages you want to deal with, elegant storage of the headers is irrelevant...
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 16:12:50 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote:

On Oct 29, 2003, at 2:28 PM, Peter C. Norton wrote:

I may not have made it clear, but I'm focusing on the metadata. Once you've parsed rfc822/2822, then it may become easier to have things in the database that can manipulate those types. I.e. to be able to do simple searches for a property of given arbitrary headers (w/o having to have a database schema that consists of a few known headers and others which you then have to treat as a blob or as text).

my only real worry is that from what I've seen, 99.99% of the time, the user is going to want content searches. header stuff is fine, but of really low priority in the scheme of things (necessary to put useful things together, meaningless if you can't content/context search in fulltext).

I see two needs, for significantly different populations. The first wants a browsing interface keyed and indexed by date, thread, and author. The second wants full text search with rapid location and retrieval of matching messages. Often a single user will move between the access methods, reading by thread, bouncing over to a search, then reading all an author has written that match, then searching again, etc. As such two distinct sets of indexes seem called for: full text and message meta-data.

that's why I'm leaning, blob issues or no, towards full-text storage in MySQL 4. Because if you can't easily chop up the message body content and find the messages you want to deal with, elegant storage of the headers is irrelevant...

True. However, this seems to conflate two distinct problems. If you're going to do unindexed searches then this makes sense; however, except for minimal cases that's an uninteresting space. It scales like crap and has an even worse feature set. It is more interesting to split storage and indexing into distinct solution designs, and to build or pick something tailored for that smaller problem. That way you don't do full text searching, you do full text indexing and then search the indexes.

I think you need that, too. But until you get a reasonable context search for the message body, designing the rest is silly.

Is searching message bodies really interesting, or is building indexes of message bodies such that you can later search those indexes the actually interesting point?

And it seems to me there are few better methods than dumping the text into MySQL and letting it do the work. Compromises, tradeoffs and etc notwithstanding...

How does MySQL help you in building language-sensitive rapid response indexes of large text blobs?

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
At 4:12 PM -0800 2003/10/29, Chuq Von Rospach wrote: that's why I'm leaning, blob issues or no, towards full-text storage in MySQL 4. Because if you can't easily chop up the message body content and find the messages you want to deal with, elegant storage of the headers is irrelevant... I think you could do full word indexing per message, and then store that index information in the database. Searching for phrases would require hitting the message bodies themselves, but searching for individual words could be done on indexed fields. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 16:40:53 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote:

by the way, this statement is in conflict with my previous statement of use cycbufs. I'm fully aware of that conflict, too. resolving it will be one of the big challenges.

cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.

Does that model really apply to list archives? It doesn't for me. I may be unusual in this regard, but I generally consider list archives as one-way systems: messages go in and never come out.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 02:52:52 +0100 Brad Knowles [EMAIL PROTECTED] wrote:

I think you could do full word indexing per message, and then store that index information in the database. Searching for phrases would require hitting the message bodies themselves, but searching for individual words could be done on indexed fields.

Consider an index which records not just the fact of a token's presence in an entity, but also the offsets at which it occurs within the entity. Searching for phrases then consists of searching for objects which satisfy the boolean X AND Y, as well as the smaller clause offset(X) + length(X) + 1|2 == offset(Y). Larger phrases extend the equivalence language linearly, though they create exponential search costs.

-- J C Lawrence
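The positional-index idea in the message above can be sketched in a few lines. This is a simplification and not anyone's actual archiver code: it records word positions rather than character offsets, so the adjacency test becomes position(Y) == position(X) + 1 instead of the offset-plus-length clause, and it tokenises naively on whitespace. The function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to {doc_id: [word positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents containing the phrase as consecutive tokens."""
    tokens = phrase.lower().split()
    if not tokens:
        return set()
    postings = [index.get(t, {}) for t in tokens]
    # Boolean step: documents that contain every token (X AND Y ...).
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    # Positional step: the tokens must sit at consecutive positions.
    hits = set()
    for doc_id in candidates:
        for start in postings[0][doc_id]:
            if all(start + i in postings[i][doc_id] for i in range(1, len(tokens))):
                hits.add(doc_id)
                break
    return hits
```

With this layout a phrase query never touches the message bodies; it only intersects posting lists and compares positions.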
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 5:52 PM, Brad Knowles wrote: I think you could do full word indexing per message, and then store that index information in the database. Searching for phrases would require hitting the message bodies themselves, but searching for individual words could be done on indexed fields. you could, but is it worth doing it yourself when MySQL is building it for you? http://www.mysql.com/doc/en/Fulltext_Search.html http://jeremy.zawodny.com/blog/archives/000576.html http://www.zend.com/zend/tut/tutorial-ferrara1.php If you were just storing into a TEXT and then doing SELECT LIKE into it, I'd agree with you. But MySQL is doing interesting things here. Why not leverage it? ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 6:16 PM, J C Lawrence wrote:

I see two needs, for significantly different populations. The first wants a browsing interface keyed and indexed by date, thread, and author. The second wants full text search with rapid location and retrieval of matching messages. Often a single user will move between the access methods, reading by thread, bouncing over to a search, then reading all an author has written that match, then searching again, etc. As such two distinct sets of indexes seem called for: full text and message meta-data.

I think you need that, too. But until you get a reasonable context search for the message body, designing the rest is silly.

Is searching message bodies really interesting, or is building indexes of message bodies such that you can later search those indexes the actually interesting point?

You're basically asking why do you need google when you have yahoo? ask the folks who depend on google. (and yes, I'm oversimplifying to make a point).

How does MySQL help you in building language-sensitive rapid response indexes of large text blobs?

just posted a bunch of links.
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 6:22 PM, J C Lawrence wrote: cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm) As such they are specifically tuned for the case where the number of calls to malloc() are of a similar magnitude to the calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it. Does that model really apply to list archives? It doesn't for me. I may be unusual in this regard, but I generally consider list archives as one-way systems: messages go in and never come out. and in general, you're mostly right. Deletions out of archives are pretty minimal. But I think cycbufs still make a lot of sense as a way to reduce design complexity needed to avoid using up potentially infinite numbers of inodes, and the performance and design complexity inherent in building a storage structure around a typical unix filesystem. It's just so much less hassle on any number of levels dealing with 50 100 megabyte files than it is a directory structure with 500 megabytes of messages spread around 100,000 individual files. whether it's backups and restores, migrating data to a new server, etc, etc etc, you make life much simpler. And god help you if you're updating that structure when the system crashes and you have to fsck and put it back together again. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
At 9:22 PM -0500 2003/10/29, J C Lawrence wrote: cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm) As such they are specifically tuned for the case where the number of calls to malloc() are of a similar magnitude to the calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it. So long as the calls to malloc() are kept reasonably small (which is typically true in this case), it shouldn't matter whether or not there are any free() calls. Yes, you slowly build up more disk space in utilization, but all archive solutions will have the same problem, and this solution will scale as well as, or better than, any other that I know of. Consider the case where you are trying to store all news articles that have ever been posted -- not really much difference. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 04:08:45 +0100 Brad Knowles [EMAIL PROTECTED] wrote:

At 9:22 PM -0500 2003/10/29, J C Lawrence wrote:

cycbufs implement a filesystem-based heap with pool semantics. (There's a fair bit of literature on that space in the OS and application realm.) As such they are specifically tuned for the case where the number of calls to malloc() is of a similar magnitude to the number of calls to free(). This makes sense in a netnews world where news articles expire regularly, and in general as much data is added to the spool as is removed from it.

So long as the calls to malloc() are kept reasonably small (which is typically true in this case), it shouldn't matter whether or not there are any free() calls.

I've written several heap managers including several pool based systems as well as other sorts of custom allocators. There are a great many simplifications that come along with the write-once approach, especially in terms of the trade-offs between allocation expense and free space management.

Yes, you slowly build up more disk space in utilization, but all archive solutions will have the same problem, and this solution will scale as well as, or better than, any other that I know of.

Which is not exactly my point. cycbufs are a useful technique to be sure, much as Chuq has discussed from a management perspective. My point is more that I don't see that they add anything essentially different to the storage space in terms of storage semantics. You get a higher rate of file handle re-use, a more friendly filesystem behaviour for older filesystem designs (pleasant optimisations), but exactly the same single key -> byte stream mapping, without adding any more interesting verbs or transforms to the solution space. This is not a Bad Thing, just not something that seems applicable at this stage in the design discussion. First come ontology and semantics, then comes implementation.
Consider the case where you are trying to store all news articles that have ever been posted -- not really much difference. Actually the two cases are considerably different. In the delete case I have to do pool management, with some eye toward fragmentation control and optimisations of average latency for free heap searches, as well as heap integrity audits. In the write-only case I just build on the end and need pay no mind to prior data once it is allocated. In both cases I have to do predictive work on the distribution of allocation sizes, but that's far cheaper in the write-only case as the multiple-pool search overhead can be entirely skipped. There's a considerable difference in complexity between the two. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
At 7:00 PM -0800 2003/10/29, Chuq Von Rospach wrote: you could, but is it worth doing it yourself when MySQL is building it for you? http://www.mysql.com/doc/en/Fulltext_Search.html From the top of this page: 6.8 MySQL Full-text Search As of Version 3.23.23, MySQL has support for full-text indexing and searching. Full-text indexes in MySQL are an index of type FULLTEXT. FULLTEXT indexes are used with MyISAM tables only and can be created from CHAR, VARCHAR, or TEXT columns at CREATE TABLE time or added later with ALTER TABLE or CREATE INDEX. For large datasets, it will be much faster to load your data into a table that has no FULLTEXT index, then create the index with ALTER TABLE (or CREATE INDEX). Loading data into a table that already has a FULLTEXT index could be significantly slower. Moreover, mail messages will be a undetermined variable length. Can MySQL support a 32-bit VARCHAR? What about type TEXT? Or 8-bit or even 16-bit character sets? Since you might be storing a lot of MIME bodypart types, can it handle BLOBs, and can it handle them well? Or, do you do parsing within your archive application and store the entire message somewhere outside of the database, while storing a FULLTEXT index of only the bodypart types you declare to be human-readable? What if you want to do a case-sensitive search? In that case, it doesn't look like FULLTEXT or MATCH will do you any good, since MATCH is declared to be case-insensitive. Or what if you want to search for hyphenated literals? It seems that MATCH considers them to be word breaks even within literal searches. If you were just storing into a TEXT and then doing SELECT LIKE into it, I'd agree with you. But MySQL is doing interesting things here. Why not leverage it? I'm not sure it really helps in this case. I'm not sure it can handle the amounts of data that might need to be stored into a field, or the different character sets that might need to be used. 
I'm also concerned about what using this function might do to the overall speed and size of the database. On the page quoted above, look for benchmark data reported by Jim Nguyen and John Takacs. Two million rows with text and multiple word searches (three or more) taking 30-seconds to a minute to complete, is not good performance. Three to five million rows, with searches taking 50 seconds or more for single words, is not good performance. Now, consider how many words might be in a single message (hundreds to thousands or even tens of thousands), and how many messages might be in a single archive (thousands to millions). If each message was contained within a row, this would be dead-Universe slow. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 13:45, Brad Knowles wrote: That said, storing meta-data in a real database and then using external filesystem techniques for actually accessing the data, should give you the best of both worlds -- the speed of access of the database, and the reliability and well-understood access and backup mechanisms of filesystems. I'm strongly in favor of this kind of approach. I don't know what the best on-disk storage format is (although cycbuf sounds interesting), but I'm pretty sure we want the raw messages stored as plain files on the file system. We may even want both the encoded and decoded messages stored on the file system -- at the very least, we should have attachments decoded and stored in separate files. Then we want metadata about the messages stored in a database. We should be able to regenerate or update the metadata by trolling over the raw message storage, and we should be able to vend messages from the message store via any number of protocols. The message store should be a central component of Mailman, but it should be defined by an interface in case we decide to change the implementation of the message store. -Barry ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
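A rough illustration of the "regenerate the metadata by trolling over the raw message storage" idea follows. It is only a sketch under stated assumptions: the one-file-per-message layout, the `.eml` suffix, the table columns, the use of SQLite, and the name `rebuild_metadata` are all hypothetical choices made for the example, not anything Mailman ships.

```python
import sqlite3
from email import message_from_binary_file
from pathlib import Path


def rebuild_metadata(store_dir, db_path=":memory:"):
    """Recreate the metadata table from nothing but the raw message files.

    Assumes (hypothetically) one raw message per *.eml file under store_dir.
    """
    db = sqlite3.connect(db_path)
    db.execute("DROP TABLE IF EXISTS messages")
    db.execute(
        "CREATE TABLE messages ("
        "message_id TEXT, subject TEXT, sender TEXT, date TEXT, path TEXT)"
    )
    for path in sorted(Path(store_dir).rglob("*.eml")):
        with open(path, "rb") as fp:
            msg = message_from_binary_file(fp)
        # str() guards against Header objects returned for non-ASCII headers.
        row = tuple(
            str(msg.get(name, ""))
            for name in ("Message-ID", "Subject", "From", "Date")
        )
        db.execute("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", row + (str(path),))
    db.commit()
    return db
```

Because the table is dropped and rebuilt on every run, the database stays a disposable cache and the files on disk remain the only authoritative copy, which is the property asked for above.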
Re: [Mailman-Developers] Requirements for a new archiver
At 10:27 PM -0500 2003/10/29, J C Lawrence wrote: Actually the two cases are considerably different. In the delete case I have to do pool management, with some eye toward fragmentation control and optimisations of average latency for free heap searches, as well as heap integrity audits. In the write-only case I just build on the end and need pay no mind to prior data once it is allocated. Not really. You still have to maintain all the indexes, make sure that if things get moved around all the links get updated, etc. True, you don't have to worry about fragmentation control or other more complex aspects of heap management, but that's a further cost saving over other techniques and not a drawback to using this technique for this purpose. Now, if you want to consider what would happen to you if the Scientologists ever came after you, or if you had court orders to remove postings that linked to bomb-making instructions, you'd probably want to keep all those other tools related to heap management around anyway. They'd be less likely to be used, but at least you wouldn't have to take the entire site down while you went and wrote the tools from scratch to handle a situation that you had not foreseen. -- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 14:38, Chuq Von Rospach wrote: Hint: look at what INN did when they implemented cycbufs. Effectively, you create 1-N files, or create files as needed. Each file is N bytes long, pre-allocated on file creation. When you store messages, they're written into the file sequentially (or any other way you want. If you want to get into best fit allocations and turn this into a malloc() style heap, be my guest). Metadata to access the info is then a filename, and an lseek() pointer into the file, and # of bytes to read, plus your normal identifying info. It's fast, it's efficient use of file pointers, it avoids the worst aspects of the Unix file system, and I'm amazed nobody ever thinks to use it for other purposes (or that it took that long for Usenet people to discover it, I suggested a simpler variant of it back in the 80s and was told inodes are our friends...) I'm not sure if Andrew Koenig is on this list, but he described an algorithm he developed to quickly find messages in an mbox file. If he's here, maybe he can talk about it. I really don't like mbox files, primarily because they require munging From lines in the body of the message. MMDF would be better, but I think ideal from a philosophical point of view would be one-message-per-file if it can be done efficiently cross-platform. Maybe file system experts here can provide pointers or advice on exactly which file and operating systems make this approach feasible, even for huge message counts. You can even do expiration/purge/etc if you want, by moving stuff around and changing the pointers. I've even thought of using it as the backing store for a picture library. With a nice relational database and a series of these data boxes, I think you can store data in the best and fastest possible way... It's a very interesting idea. -Barry
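The cycbuf-style layout described above (a fixed-size file, sequential writes, and a pointer made of filename, offset and byte count) might be sketched roughly like this. It is not INN's actual on-disk format; the class name `BufferFile`, the function `fetch`, and the in-memory write position are assumptions made purely for illustration.

```python
import os


class BufferFile:
    """One fixed-size file; messages are written into it sequentially."""

    def __init__(self, path, size):
        self.path = path
        self.size = size
        self.write_pos = 0
        # Set the full length at creation time. On many filesystems this
        # yields a sparse file rather than physically reserved blocks.
        with open(path, "wb") as fp:
            fp.truncate(size)

    def store(self, data):
        """Write data at the current position; return (path, offset, length)."""
        if self.write_pos + len(data) > self.size:
            raise OSError("buffer full; a real store would move to the next file")
        with open(self.path, "r+b") as fp:
            fp.seek(self.write_pos)
            fp.write(data)
        pointer = (self.path, self.write_pos, len(data))
        self.write_pos += len(data)
        return pointer


def fetch(pointer):
    """Read one message back using only the stored pointer."""
    path, offset, length = pointer
    with open(path, "rb") as fp:
        fp.seek(offset)
        return fp.read(length)
```

The pointer tuple is exactly the metadata that would live in the database; the buffer file itself needs no per-message inode.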
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 15:41, J C Lawrence wrote: Some years back I talked to Mike Belshe (used to be at Remarq) about their storage techniques (I caught him shortly after Critical Path bought Remarq). Keying off other LISA papers they segmented their storage space by object size, customising and configuring each segment to suit (things like RAID stripe size, number of spindles, FS tuning parameters, etc). He asserted that the rewards were very significant. However, these are very large archive problems and are a bit outside of Mailman's home turf. Mailman's philosophy is, keep it as simple as possible to handle 80% of the installations out there, but provide enough framework for the other 20% to extend for extreme uses. Strategies to accomplish this include defining interfaces to key components, and shipping something that works out of the box and is good enough for most people. It's not always easy, of course, to architect something that scales this way. I think we have a pretty good idea of the scaling problems with Mailman 2, and I hope we can push the envelope significantly for Mailman 3. -Barry
Re: [Mailman-Developers] Requirements for a new archiver
At 10:47 PM -0500 2003/10/29, Barry Warsaw wrote: I'm not sure if Andrew Koenig is on this list, but he described an algorithm he developed to quickly find messages in an mbox file. If he's here, maybe he can talk about it. 7th edition mbox files are a pain. There are other mailbox file formats that are much better and easier to parse (UW-IMAP .mbx being one). I really don't like mbox files, primarily because they require munging From lines in the body of the message. MMDF would be better, but I think ideal from a philosophical point of view would be one-message-per-file if it can be done efficiently cross-platform. Therein lies the problem. Some filesystems make this more feasible than others, at least on larger scale systems. Maybe file system experts here can provide pointers or advice on exactly which file and operating systems make this approach feasible, even for huge message counts. SGIs XFS on Irix does a pretty good job, with hashed directory structures, and an extent-based journaling filesystem. Regretfully, I don't think that all of these features are fully supported under the Linux version of XFS, and that work has basically ground to a halt with the lay-offs of all the key SGI people who had been working on XFS. Veritas VxFS also does a good job in this area. Other than SGI XFS for Irix and Veritas VxFS, I don't know of any good solutions to this problem at the filesystem level. Kirk McKusick and Eric Allman agree with you that this is a proper filesystem problem that should be solved at the filesystem level (at least, that's what they've said to me when I brought this issue up to them), and they feel you should not attempt to solve filesystem problems with tricks like INN timecaf/timehash cycbufs. However, while that's nice in theory, that doesn't necessarily help us here in the real world. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. 
-Benjamin Franklin, Historical Review of Pennsylvania.
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 04:45:37 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 10:27 PM -0500 2003/10/29, J C Lawrence wrote: Actually the two cases are considerably different. In the delete case I have to do pool management, with some eye toward fragmentation control and optimisations of average latency for free heap searches, as well as heap integrity audits. In the write-only case I just build on the end and need pay no mind to prior data once it is allocated. Not really. You still have to maintain all the indexes, make sure that if things get moved around that all the links get updated, etc. With a write-once system you don't actually need to ever move anything. At its core it is: Open one file, repeatedly append to end until file size exceeds size N, create new file, repeat. You can do object size clustering across files or other optimisation techniques, but the basic pattern remains the same. For the few cases you have to support delete you either just NULL the byte stream for the pointed-to object, or you invalidate the key. As the frequency and number of such deletes is infinitesimal, they require no special management complexity. You can afford to just swallow the lost free space as the cost of attempting to manage it is simply never rewarded. True, you don't have to worry about fragmentation control or other more complex aspects of heap management, but that's a further cost savings over other techniques and not a drawback to using this technique for this purpose. True. I'm not labelling it a drawback, just a boon of dubious advantage. Now, if you want to consider what would happen to you if the Scientologists ever came after you, or if you had court orders to remove postings that linked to bomb-making instructions, you'd probably want to keep all those other tools related to heap management around anyway. Not really.
The percentage of such deleted posts over the lifetime of the store can be generally assumed to be less than 1 in 10^5, and is probably considerably lower, if not in the 1:10^8 range. Add a simple invalid key semantic and you're done. Caveat: Continual addition and deletion of SPAM from an archive would change this balance. They'd be less likely to be used, but at least you wouldn't have to take the entire site down while you went and wrote the tools from scratch to handle a situation that you had not foreseen. You're going to need tools when the percentage of such deleted postings is sufficiently high that the cost of the lost free space and its overhead exceeds the cost of managing that free space. That's not a quick thing. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
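The write-once pattern argued for in this message (append to one file until it exceeds size N, start a new file, and handle the rare delete by invalidating the key or nulling the bytes) could look something like the sketch below. The class name `AppendOnlyStore`, the segment file naming, and the in-memory index are hypothetical; a real archiver would keep the index and the invalid-key set in its database.

```python
import os


class AppendOnlyStore:
    """Append-only segment files with a key index and invalid-key deletes."""

    def __init__(self, directory, max_segment_bytes):
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes
        self.segment = 0
        self.index = {}       # key -> (segment number, offset, length)
        self.invalid = set()  # keys that must no longer be served

    def _segment_path(self, number):
        return os.path.join(self.directory, "segment-%06d" % number)

    def append(self, key, data):
        path = self._segment_path(self.segment)
        # Start a new file once the current one has exceeded size N.
        if os.path.exists(path) and os.path.getsize(path) > self.max_segment_bytes:
            self.segment += 1
            path = self._segment_path(self.segment)
        with open(path, "ab") as fp:
            offset = fp.seek(0, os.SEEK_END)
            fp.write(data)
        self.index[key] = (self.segment, offset, len(data))

    def read(self, key):
        if key in self.invalid:
            raise KeyError(key)
        segment, offset, length = self.index[key]
        with open(self._segment_path(segment), "rb") as fp:
            fp.seek(offset)
            return fp.read(length)

    def delete(self, key, expunge=False):
        """Invalidate the key; optionally overwrite the bytes with NULs."""
        self.invalid.add(key)
        if expunge:
            segment, offset, length = self.index[key]
            with open(self._segment_path(segment), "r+b") as fp:
                fp.seek(offset)
                fp.write(b"\0" * length)
```

Nothing is ever moved: a delete leaves a hole of lost space in the segment, which is the trade-off being defended here. The `expunge` flag is one possible answer to the case where simply marking a message as deleted is not enough.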
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 16:54, J C Lawrence wrote: Aye, picking the right interface abstractions is key. Right on. There's also a disjoint between the novice SysAdm case who loves the fact of Mailman's all-in-one service, and the more meaty chap who integrates what he needs to. Much of Mailman's appeal at the low end is its all-in-one simple-to-install nature. (Well, ignoring the GID FAQ...) Yep, and I really really want Mailman 3 to take this concept farther. Some things that I think will help include using Twisted to eliminate the /requirement/ of Apache integration and possibly the incoming mail server integration, as well as implementing a bulk mailer to eliminate the need for an outgoing mail server. Ideally, it will still be possible to integrate with a Postfix for incoming and outgoing, but it shouldn't be necessary to get up and running. Mailman v2.1 has a plugin layer for the membership roster. It's not a fully mature interface, but there are LDAP and SQL adaptors in the wild. This interface was largely bolted on, so it's clumsy. Mailman 3 will be defined by interfaces from the start. At some point those adaptors will move into the Mailman core. If we move the archiving components (storage, presentation, index) behind plugin interfaces as well there's a reasonable opportunity for similar third parties to build adaptor layers which then also move into the Mailman core. Oh yeah, and just to keep Nigel Metheringham hopping: Mailman just doesn't have enough configuration options. Heh. That's another issue. I'm sure Mailman 3 will grow many more configuration options. The trick is making them manageable (and mostly ignorable -- i.e. the defaults Usually Work out of the box). I've been experimenting with ideas for list styles which will make list admins' lives easier I think, without reducing the flexibility for experts. -Barry
Re: [Mailman-Developers] Requirements for a new archiver
And since Barry's underlying philosophy is to minimize the number of things Mailman depends on, that sort of lets out depending on them having an OS with a high-performance journaling filesystem, no? (giggle) On Oct 29, 2003, at 8:00 PM, Brad Knowles wrote: However, while that's nice in theory, that doesn't necessarily help us here in the real world. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 23:01:14 -0500 Barry Warsaw [EMAIL PROTECTED] wrote: On Wed, 2003-10-29 at 16:54, J C Lawrence wrote: Aye, picking the right interface abstractions is key. Right on. I'm still debating if I can run down there on the 8th. I'd love to go to EuroQuest, but I also really need to be in Providence on the 7th, and back at work on the 9th. Aaaarrrgh. _IF_ I can make it we must go hit a pub with whiteboards in hand. Sorry for no earlier reply on this BTW, I'm in drowning eyeballs mode. ...as well as implement a bulk mailer to eliminate the need for an outgoing mail server. Eeeek! I trust this would be for immediate handoff to a real MTA versus handling final delivery directly? Quite the Pandora's box if not. Mailman v2.1 has a plugin layer for the membership roster. It's not a fully mature interface, but there are LDAP and SQL adaptors in the wild. This interface was largely bolted on, so it's clumsy. Mailman 3 will be defined by interfaces from the start. nod BTW Whatever happened to Michel Pelletier's interfaces PEP? I see the draft, and I see signs that something got done, but not what... Oh yeah, and just to keep Nigel Metheringham hopping: Mailman just doesn't have enough configuration options. Heh. That's another issue. Last I heard Nigel was still running screaming into the hills. I'm sure Mailman 3 will grow many more configuration options. The trick is making them manageable (and mostly ignorable -- i.e. the defaults Usually Work out of the box). nod I've been experimenting with ideas for list styles which will make list admins' lives easier I think, without reducing the flexibility for experts. Aye, that's something the Plone folk have been digging at with some success: a base library of waffle-stomp configuration patterns. I'm not sure for Mailman if we want just a picklist, or a very simple wizard.
I suspect something more akin to the very brief Q&A wizard at Creative Commons for choosing a license type may be more effective and interesting than a picklist: http://creativecommons.org/license/ Very simple, very general, covers the basic cases, hides all the ugly stuff and picks sane defaults. It becomes even more interesting if site admins can tailor the configs for the basic cases. -- J C Lawrence [EMAIL PROTECTED] http://www.kanga.nu/~claw/
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 16:14, Brad Knowles wrote: One key factor here is that all of the information in the database should be able to be re-created from the message bodies alone, if there should happen to be a catastrophic system crash. Just to be dense, let me ask for clarification: by message body you mean the entire original message, as received on the wire, not just the message payload (i.e. sans RFC 2822 headers). If so, I agree completely. But I also think the decoded message should be stored on the file system somehow as well. I.e. decode attachments and store them as separate files too. -Barry
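As a rough picture of "keep the original as received, and also decode attachments into separate files", here is a minimal sketch using Python's standard `email` package. The function name `store_message`, the `raw.eml` filename, and the `part-N` naming scheme are assumptions made for the example only.

```python
import os
from email import message_from_bytes


def store_message(raw_bytes, directory):
    """Save the untouched wire message plus each decoded, named attachment."""
    os.makedirs(directory, exist_ok=True)
    # The raw message is written byte for byte so everything can be rebuilt.
    with open(os.path.join(directory, "raw.eml"), "wb") as fp:
        fp.write(raw_bytes)

    written = []
    msg = message_from_bytes(raw_bytes)
    for number, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename is None:
            continue  # body text or an unnamed part; left inside raw.eml
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        # Never use the sender-supplied name as a path; keep only its extension.
        ext = os.path.splitext(os.path.basename(filename))[1]
        name = "part-%d%s" % (number, ext)
        with open(os.path.join(directory, name), "wb") as fp:
            fp.write(payload)
        written.append(name)
    return written
```

Since the decoded files are derived entirely from `raw.eml`, they can be deleted and regenerated at any time, which keeps this compatible with the requirement that the raw message be the sole source of truth.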
Re: [Mailman-Developers] Requirements for a new archiver
At 11:01 PM -0500 2003/10/29, J C Lawrence wrote: With a write-once system you don't actually need to ever move anything. Depends on how you manage the storage of those large files. If you have an infinitely large filesystem that is guaranteed 100% reliable in all possible circumstances, you're right. Otherwise, you might find that the filesystem is getting full and things need to be moved around, or you suffer a disk or storage system crash and you have to restore from backups, or you use an HSM solution to move older files to slower/higher capacity storage, or you have issues with too many large files in a single directory and need to implement your own directory hashing scheme, etc Not really. The percentage of such deleted posts over the lifetime of the store can be generally assumed to be less than 1 in 10^5, and is probably considerably lower, if not in the 1:10^8 range. Add a simple invalid key semantic and you're done. It depends on whether or not the court order allows you to just mark things as deleted and be done with it. If they force you to actually expunge all copies of that data from your systems, you will have to do more work. You're going to need tools when the percentage of such deleted postings is sufficiently high that the cost of the lost free space and its overhead exceeds the cost of managing that free space. That's not a quick thing. True enough, but as you've pointed out, there have been a number of implementations of this sort of solution, and you've worked on at least a couple yourself. These sorts of tools should already be reasonably well understood and not too difficult to write or borrow from other sources. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. 
Re: [Mailman-Developers] Requirements for a new archiver
At 11:01 PM -0500 2003/10/29, Barry Warsaw wrote: Yep, and I really really want Mailman 3 to take this concept farther. Some things that I think will help include, using Twisted to eliminate the /requirement/ of Apache integration and possibly the incoming mail server integration, as well as implement a bulk mailer to eliminate the need for an outgoing mail server. There, I have to disagree. Both the web server and the mail server issues are complex enough that I don't believe it would be a good idea to try and re-invent this wheel. There are already enough bad web server and mail server implementations out there -- we don't need to make this situation worse. There may be some mailing-list specific issues that we can (and should) handle better inside mailman before we hand these things off to the other servers, but both Apache and postfix/sendmail/exim have enough experience and world-wide testing behind them to make it little else than folly resulting from hubris to try and replace them. There's just no substitute for having hundreds of millions of people world-wide pounding on these things day-in and day-out 365 days a year. Components like this should be scheduled for replacement if, and only if, you can demonstrate beyond a reasonable doubt that there are inherent problems that are insurmountable otherwise, and there is no feasible alternative. You don't just take a Tom Mix pocket knife and cut open your own chest and remove your heart, to replace it with a mechanical pump that you designed yourself out of a tin can, a turkey baster, some bailing wire, and some garden hose. If you absolutely require a heart transplant and there are no human alternatives, you get a world-respected heart surgeon to perform the operation using the latest techniques and the Jaarvik 9 (or whatever). And then you get everyone in your family, all your friends, all your neighbors, all your church members, and hopefully all religious people world-wide to pray for you. 
-- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 22:06, Chuq Von Rospach wrote: It's just so much less hassle on any number of levels dealing with 50 100-megabyte files than it is a directory structure with 500 megabytes of messages spread around 100,000 individual files. Whether it's backups and restores, migrating data to a new server, etc., you make life much simpler. And god help you if you're updating that structure when the system crashes and you have to fsck and put it back together again. We should just throw everything into a ZODB FileStorage Data.fs file, and let it grow to gigs in size <1/2 wink>. -Barry
Re: [Mailman-Developers] Requirements for a new archiver
At 11:18 PM -0500 2003/10/29, Barry Warsaw wrote: Just to be dense, let me ask for clarification: by message body you mean the entire original message, as received on the wire, not just the message payload (i.e. sans RFC 2822 headers). If so, I agree completely. Yes, you are correct. At issue is that there might be some headers which some users might wish to search on (or maybe just see) which might not be put into one or more of the fields, and you don't want to take the risk of losing those by assuming that you can always re-generate all the headers from what you've stored inside the database. But I also think the decoded message should be stored on the file system somehow as well. I.e. decode attachments and store them as separate files too. My experience is that this is a bad idea. However, if the implementation is fully modularized at the API level, then we can always rip out the mailman solution and instead put in something that actually works. -- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, Oct 30, 2003 at 05:00:48AM +0100, Brad Knowles wrote: SGI's XFS on Irix does a pretty good job, with hashed directory structures, and an extent-based journaling filesystem. Regretfully, I don't think that all of these features are fully supported under the Linux version of XFS, and that work has basically ground to a halt with the lay-offs of all the key SGI people who had been working on XFS. Veritas VxFS also does a good job in this area. [ A cursory Google search indicates that hashed dirs, extents, and journalling are all in Linux XFS. I can't imagine an unsupported feature making its way into the filesystem that SGI is putting on its latest and greatest systems, but if you know about this, please share ] In the case of a one-file-per-message approach, my experience with VxFS is that it creates a rather slow filesystem when you get your filesystem to the point of having a few hundred thousand small files (lots of wasted space in the extents and I believe, though I may be wrong, that there were lots of metadata lookups through multiple layers of indirection slowing things down). However, reiserfs was built to handle a mix of lots of small files, a la maildir or mh spools. I'm not too current on BSD goings-on, but I'd bet that ffs2 has something to offer in this arena, too, since it looks like it almost does extent-based allocation now. Kirk McKusick and Eric Allman agree with you that this is a proper filesystem problem that should be solved at the filesystem level (at least, that's what they've said to me when I brought this issue up to them), and they feel you should not attempt to solve filesystem problems with tricks like INN timecaf/timehash cycbufs. Err... then to relate this to a prior post, why not just use maildirs on filesystems that are engineered to handle that sort of thing? However, while that's nice in theory, that doesn't necessarily help us here in the real world. Unless you are using a filesystem that works for this, right?
Like xfs, vxfs, reiserfs, and probably ffs2. I believe that Linux's ext3 has support for hashing directories (or soon will - I don't precisely know as I've been focusing on other things) -Peter -- The 5 year plan: In five years we'll make up another plan. Or just re-use this one.
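Where the filesystem does not hash directories itself, the application-level directory hashing mentioned earlier in the thread can be approximated along these lines. This is only a sketch: the function name `hashed_path`, the choice of SHA-1 over the Message-ID, and the two-level fan-out are illustrative assumptions, not an existing Mailman layout.

```python
import hashlib
import os


def hashed_path(root, message_id, levels=2):
    """Map a Message-ID to root/xx/yy/<digest> so no directory grows huge."""
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # Each level uses two hex digits, giving 256 subdirectories per level.
    fanout = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return os.path.join(root, *fanout, digest)
```

Two levels spread one-message-per-file storage over 65,536 leaf directories, and since the path depends only on the Message-ID, it stays the same however often the archive is rebuilt.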
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 8:26 PM, Brad Knowles wrote: There may be some mailing-list specific issues that we can (and should) handle better inside mailman before we hand these things off to the other servers, but both Apache and postfix/sendmail/exim have enough experience and world-wide testing behind them to make it little else than folly resulting from hubris to try and replace them. +1. I've experimented with direct-out-the-pipe delivery systems. Trust me, you don't want to go there. It's not trivial. Well, it's trivial for 90% of the world that follows the RFCs and behaves as expected and has the right DNS setups and isn't trying to outsmart spammers by being stupid. And you'll spend the other 90% of your time trying to build compatibility in with the other 10%.
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 8:27 PM, Barry Warsaw wrote: We should just throw everything into a ZODB FileStorage Data.fs file, and let it grow to gigs in size 1/2 wink. <troll> Until you have to split it across two disks because one is full. And don't forget, a single monolithic storage file gets backed up fully every time you change it. The guy in charge of buying tapes to back up your system just screamed in agony, since there's no possibility of an incremental backup for what is 99.999% static data. </troll>
Re: [Mailman-Developers] Requirements for a new archiver
And Windows? And older hardware? Solaris 8? Hell, Solaris 6 and 7? You going to depend on people only running year-old-or-less hardware and OS? On Oct 29, 2003, at 8:35 PM, Peter C. Norton wrote: I'm not too current on current bsd going-ons, but I'd bet that ffs2 has something to offer in this arena, too, since it looks like it almost does extent-based allocation now.
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:17, J C Lawrence wrote: I'm still debating if I can run down there on the 8th. I'd love to go to EuroQuest, but I also really need to be in Providence on the 7th, and back at work on the 9th. Aaaarrrgh. _IF_ I can make it we must go hit a pub with whiteboards in hand. Sounds great. Bring a laptop and we'll bang out some code (anyone else up for a mini-Mailman-3 sprint at my house? :). I'll probably be heading to Fedex Field on the 9th for a 'Skins game, so the 8th would be perfect. ...as well as implement a bulk mailer to eliminate the need for an outgoing mail server. Eeeek! I trust this would be for immediate handoff to a real MTA versus handling final delivery directly? Quite the Pandora's box if not. Yep, which makes me nervous, but which does have a certain standalone-ability appeal. I don't want to write it off, and of course, we'll have an interface for this so the first (only?) implementation will be MTA hand-off. BTW Whatever happened to Michel Pelletier's interfaces PEP? I see the draft, and I see signs that something got done, but not what... Dead in the water AFAIK. But there are lots of folks using a more formal interface system for Python applications, such as for Zope3. Just writing the interface down, with good docstrings, goes a long way. Last I heard Nigel was still running screaming into the hills. Hey, I love Exim -- Greg's done some very cool stuff with it on mail.{python,zope}.org. But man, I find it hard to track down just the right knob I need to tweak. :) Aye, that's something the Plone folk have been digging at with some success: a base library of waffle-stomp configuration patterns. I'm not sure for Mailman if we want just a picklist, or a very simple wizard. I haven't even thought about how to surface it in the u/i -- it's mostly machinery right now. But yeah, a wizard is just the ticket, at least for canned styles (which again, will solve 80% of the problem). 
Which reminds me -- I'm really hoping we can get some web u/i jockies and CSS geeks in to eventually make things real purty. Dammit Jim, I'm a musician, not a graphic artist. :) -Barry ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
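As a rough illustration of "just writing the interface down, with good docstrings", here is a minimal sketch of what a delivery interface with an MTA hand-off implementation might look like. The names DeliveryAgent, MTAHandoff and connect are hypothetical and not taken from Mailman; the only real API assumed is that the connection object behaves like smtplib.SMTP (sendmail() and quit()).

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage


class DeliveryAgent(ABC):
    """Hypothetical interface: how finished messages leave the list manager."""

    @abstractmethod
    def deliver(self, message, recipients):
        """Deliver `message` to `recipients`.

        Returns a dict of refused recipients (empty if all were accepted).
        """


class MTAHandoff(DeliveryAgent):
    """First (only?) implementation: pass everything to a real MTA."""

    def __init__(self, connect):
        # `connect` is a zero-argument callable returning an object with
        # smtplib.SMTP-style sendmail() and quit() methods (an assumption).
        self._connect = connect

    def deliver(self, message, recipients):
        conn = self._connect()
        try:
            # smtplib.SMTP.sendmail returns a dict of refused recipients.
            return conn.sendmail(
                message["From"], list(recipients), message.as_string()
            )
        finally:
            conn.quit()
```

With a local MTA this could be wired up as MTAHandoff(lambda: smtplib.SMTP("localhost", 25)); a standalone bulk mailer would be a second class behind the same interface.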
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 05:15:58 +0100 Brad Knowles [EMAIL PROTECTED] wrote:
> At 11:01 PM -0500 2003/10/29, J C Lawrence wrote:
>> With a write-once system you don't actually need to ever move anything.
> Depends on how you manage the storage of those large files. If you have an infinitely large filesystem that is guaranteed 100% reliable in all possible circumstances, you're right. Otherwise, you might find that the filesystem is getting full and things need to be moved around, or you suffer a disk or storage system crash and you have to restore from backups, or you use an HSM solution to move older files to slower/higher capacity storage, or you have issues with too many large files in a single directory and need to implement your own directory hashing scheme, etc.

True, but most of those really end up being a meta-indexing problem. You have many big files. You have indexes which point into those many big files. Occasionally you move those big files about, so your meta-indexes need to be changed to point to the new locations of the big files, but the same offsets within the big files... It's really not an expensive or difficult space.

If you really need to move individual messages about between file blobs at a respectable rate, then you're in another world of pain, but we don't have any evidence of that requirement, or that such a requirement can't be handled by simply unrolling the big file and respooling the individual messages onto the ends of other big files in different locations.

>> Not really. The percentage of such deleted posts over the lifetime of the store can be generally assumed to be less than 1 in 10^5, and is probably considerably lower, if not in the 1:10^8 range. Add a simple invalid key semantic and you're done.
> It depends on whether or not the court order allows you to just mark things as deleted and be done with it. If they force you to actually expunge all copies of that data from your systems, you will have to do more work.

Ahem.

    # Blank out each condemned message in place, then invalidate its key.
    for key in list_of_bad_message_keys:
        big_file, offset, length = get_message_big_file(key)
        with open(big_file, "r+b") as handle:  # update in place, no truncation
            handle.seek(offset)
            handle.write(b" " * length)  # overwrite exactly `length` bytes
        key.invalidate()

Not a whole lot more complexity. You're just invalidating the pointed-to data as well as the key. You're still not doing free space management.

-- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
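The meta-indexing argument above can be sketched as follows. This is an illustrative toy, not anything from Mailman: the MetaIndex class and its method names are assumptions. It shows that moving a big file only changes one path in the index while every (offset, length) pair stays valid, and that expunging a message blanks its bytes without any free space management.

```python
import os
import shutil


class MetaIndex:
    """Maps a message key to (blob_id, offset, length); blob paths live apart."""

    def __init__(self):
        self.blob_paths = {}  # blob_id -> current path of the big file
        self.entries = {}     # key -> (blob_id, offset, length)

    def append(self, blob_id, key, data):
        """Write-once: spool `data` onto the end of a big file."""
        with open(self.blob_paths[blob_id], "ab") as blob:
            offset = blob.seek(0, os.SEEK_END)
            blob.write(data)
        self.entries[key] = (blob_id, offset, len(data))

    def fetch(self, key):
        blob_id, offset, length = self.entries[key]
        with open(self.blob_paths[blob_id], "rb") as blob:
            blob.seek(offset)
            return blob.read(length)

    def relocate(self, blob_id, new_path):
        """Move a big file; only its path changes, offsets stay the same."""
        shutil.move(self.blob_paths[blob_id], new_path)
        self.blob_paths[blob_id] = new_path

    def expunge(self, key):
        """Blank the pointed-to bytes and drop the key."""
        blob_id, offset, length = self.entries.pop(key)
        with open(self.blob_paths[blob_id], "r+b") as blob:
            blob.seek(offset)
            blob.write(b" " * length)
```

Keeping the path table separate from the per-message entries is the design point: a relocation touches one record no matter how many messages the big file holds.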
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 23:27:46 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:
> On Wed, 2003-10-29 at 22:06, Chuq Von Rospach wrote:
> We should just throw everything into a ZODB FileStorage Data.fs file, and let it grow to gigs in size <1/2 wink>.

There are good reasons I use DirectoryStorage:

    $ find /var/lib/zope/instance/default/var/Data_fs_dir -type f | wc -l
    499266

Lotsa little teensy files!

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:26, Brad Knowles wrote:
> There, I have to disagree. Both the web server and the mail server issues are complex enough that I don't believe it would be a good idea to try and re-invent this wheel. There are already enough bad web server and mail server implementations out there -- we don't need to make this situation worse.

Let's not discount the integration problems, which are a huge headache for newbies.

I'm fairly certain that Twisted is the right approach for surfacing the web u/i to Mailman. The requirements are not overwhelming and fronting Mailman's u/i with Apache really doesn't buy us that much. We all agree that CGI sucks, and we could make that better with mod_python or some other such glue, but why go to the trouble?

Relying on Twisted for the incoming mail protocols is something I'm less certain about, although there is a lot of appeal to this approach. We could throw lots of smarts into a Python port-25 listener, including global spam fighting and bounce processing. An approach like Exim + elspy affords some really cool possibilities. A bigger negative is that there's less precedent for proxying smtpd than there is for httpd, so it's harder to fit Mailman into the mix with an existing mail server.

-Barry
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:36, Chuq Von Rospach wrote:
> I've experimented with direct-out-the-pipe delivery systems. Trust me, you don't want to go there. It's not trivial. Well, it's trivial for 90% of the world that follows the RFCs and behaves as expected and has the right DNS setups and isn't trying to outsmart spammers by being stupid. And you'll spend the other 90% of your time trying to build compatibility in with the other 10%.

Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble?

Also, there's incoming SMTP and outgoing SMTP. It may be possible to build in support for one direction without providing the other. (It also may not be worth it.)

-Barry
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 20:37:49 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote: On Oct 29, 2003, at 8:27 PM, Barry Warsaw wrote: and don't forget, a single monolithic storage file gets backed up fully every time you change it. The guy in charge of buying tapes to back up your system just screamed in agony, since there's no possibility of an incremental backup for what is 99.999% static data. Ha! So just why do you think I moved off FileStorage for Data.fs? That said there's some value in getting a versioning data store with rollback support for list configs. The data volume isn't huge, but it is highly sensitive. I'd also like to see flat text logging of all configuration changes in addition to moderation activity. It would save the help and support desks a lot of hurt. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
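A minimal sketch of the flat text logging of configuration changes suggested above, under stated assumptions: the tab-separated line layout, the function names and the file name are all hypothetical, and list names and user names are assumed to contain no tabs. Recording the old value next to the new one is what lets a support desk see what to roll back to.

```python
import time


def log_config_change(log_path, list_name, who, option, old, new, now=None):
    """Append one tab-separated line describing a list configuration change."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    # repr() escapes tabs and newlines inside values, keeping one line per change.
    fields = [stamp, list_name, who, option, repr(old), repr(new)]
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("\t".join(fields) + "\n")


def history(log_path, list_name, option):
    """Return the (timestamp, who, old, new) changes recorded for one option."""
    changes = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            stamp, name, who, opt, old, new = line.rstrip("\n").split("\t")
            if (name, opt) == (list_name, option):
                changes.append((stamp, who, old, new))
    return changes
```

Because the log is append-only plain text, it also backs up incrementally and can be read with grep when nothing else is working.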
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:40, Chuq Von Rospach wrote:
> And Windows?

Hey, ignoring Windows has been a successful strategy so far, why stop now? Plus, Longhorn will save us all, right? Oh, and Everything Will Be Faster Next Year Anyway. <wink>

-Barry
Re: [Mailman-Developers] Requirements for a new archiver
At 8:35 PM -0800 2003/10/29, Peter C. Norton wrote:
> [ A cursory google search indicates that hashed dirs, extents, and journalling are all in linux xfs. I can't imagine an unsupported feature making its way into the filesystem that SGI is putting on its latest and greatest systems, but if you know about this, please share ]

My understanding is that the port of XFS to Linux was only about 70% done at the time the critical software engineers were laid off by SGI, and that no further work in this area has been done. Maybe the features are supposedly there but incomplete.

> However reiserfs was built to handle a mix of lots of small files, a la maildir or mh spools.

I'm sorry, I don't trust ReiserFS at all. I'd trust XFS if it was on Irix, or IBM's JFS, but not ReiserFS. Hell, on a Linux system, I'd use ext2fs before I'd use Reiser.

> I'm not too current on current bsd goings-on, but I'd bet that ffs2 has something to offer in this arena, too, since it looks like it almost does extent-based allocation now.

No, not yet. There are improvements in the areas of handling synchronous meta-data updates, background fsck, etc., but nothing like extent-based filesystems or integrated hashed directory schemes.

> Err... then to relate this to a prior post, why not just use maildirs on filesystems that are engineered to handle that sort of thing?

Because we can't guarantee that everyone (or anyone) would be willing/able to use the selected filesystems that we have blessed? You think requiring everyone to install PostgreSQL would be bad; do you really want to try to force them all to use ReiserFS on Linux as their only supported option?

> Unless you are using a filesystem that works for this, right? Like xfs, vxfs, reiserfs, and probably ffs2. I believe that linux's ext3 has support for hashing directories (or soon will - I don't precisely know as I've been focusing on other things)

My understanding is that ext3fs is dead. The work that Stephen Tweedie had been doing stopped long ago, and even then it was only a minor tweak over ext2fs. I don't believe that this work has been picked up again or extended to include other features.

-- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++)
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 8:53 PM, Barry Warsaw wrote: Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand-off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble. I think you have enough on your plate to not re-invent what others have already done pretty well. When you run out of features to implement, then think about this. Not until. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:37, Chuq Von Rospach wrote:
> <troll> until you have to split it across two disks because one is full. and don't forget, a single monolithic storage file gets backed up fully every time you change it. The guy in charge of buying tapes to back up your system just screamed in agony, since there's no possibility of an incremental backup for what is 99.999% static data. </troll>

Actually, newer versions of ZODB have a script called repozo.py which makes incremental backups feasible. It knows a lot about FileStorage's formats. Also note that there are alternative storage implementations such as BerkeleyDB-based storage (slow, but presumably more reliable) and the 3rd party DirectoryStorage.

We'll talk about databases in another thread. I have my own biases, but I'm too tired now to get into it.

-Barry
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:55, J C Lawrence wrote: That said there's some value in getting a versioning data store with rollback support for list configs. +1 The data volume isn't huge, but it is highly sensitive. I'd also like to see flat text logging of all configuration changes in addition to moderation activity. +1 -Barry ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 2003-10-29 at 23:46, J C Lawrence wrote: There are good reasons I use DirectoryStorage: $ find /var/lib/zope/instance/default/var/Data_fs_dir -type f | wc -l 499266 Lotsa little teensy files! :) -Barry ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 23:50:22 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:
> On Wed, 2003-10-29 at 23:26, Brad Knowles wrote:
>> There, I have to disagree. Both the web server and the mail server issues are complex enough that I don't believe it would be a good idea to try and re-invent this wheel. There are already enough bad web server and mail server implementations out there -- we don't need to make this situation worse.
> Let's not discount the integration problems, which are a huge headache for newbies.

I thought the prevalence of canned Mailman packages was doing a lot there? I haven't watched the -users list in a while.

> I'm fairly certain that Twisted is the right approach for surfacing the web u/i to Mailman. The requirements are not overwhelming and fronting Mailman's u/i with Apache really doesn't buy us that much.

Hang on. Apache isn't the target. Mailman's UI is a CGI app. As such it works with any web server that supports CGI, which pretty much means any web server with no exceptions. That's a pretty large gain, especially in the novice admin or simple deployment case territory. Doing our own thing for HTTP handling can quickly be another Pandora's box, security concern, and integration problem for the (majority of) people who do want to run Apache/Boa/Thttpd/Zeus/etc.

> We all agree that CGI sucks, and we could make that better with mod_python or some other such glue, but why go to the trouble?

CGI sucks yes, but it is the guaranteed common denominator, and CD counts for more than feature whiz-bang at this level.

> Relying on Twisted for the incoming mail protocols is something I'm less certain about, although there is a lot of appeal to this approach.

-1. Tarbaby, Pandora's box, security nightmare, unbounded security envelope.

> We could throw lots smarts into a Python port-25 listener, including global spam fighting and bounce processing.

You ___really___ don't want to get into your own SMTP-level bounce processing. Really. That's one huge endlessly sucking time sink. Let Philip Hazel, Wietse and the rest spend their time there.

> An approach like Exim + elspy affords some really cool possibilities.

Absolutely, but that is outside of Mailman's territory. More interesting would be things like TMDA integration, or implementing support for Yakov Shafranovich's extension of my consent token protocol:

http://www.ietf.org/internet-drafts/draft-irtf-asrg-cri-00.txt

Getting early buy-in as a sample implementation for an MLM wouldn't be a Bad Thing. There's a lot of really neat and useful integration and feature set territory to explore before you start staring down the MTA's throat.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
At 11:43 PM -0500 2003/10/29, J C Lawrence wrote:
> True, but most of those really end up being a meta-indexing problem.

Fair enough.

> Not a whole lot more complexity. You're just invalidating the pointed-to data as well as the key. You're still not doing free space management.

What about your backups? And your off-site backups? And your mirror sites around the world? Any other copies of those files that might have been copied off somewhere else?

-- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
At 11:53 PM -0500 2003/10/29, Barry Warsaw wrote: Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand-off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble. Bryan Costales and Eric Allman had this debate at InfoBeat/Mercury Mail. Bryan said that he could write a better simple MTA that could handle the easy 80% and leave the hard 20% to sendmail. Eric showed that he could improve sendmail to the point where it would perform at or near the level of performance of Bryan's code without throwing everything out, and would out-perform every other aspect of the system in question (so that the MTA was no longer the bottleneck at any stage). I'm confident that the same sort of approach is appropriate for other well-respected MTAs (e.g., postfix, and exim in my personal experience). Also, there's incoming SMTP and outgoing SMTP. It may be possible to build in support for one direction without providing the other. (It also may not be worth it.) It's hard enough writing an incoming SMTP handler, and doing it right. Many large service providers have seriously screwed up when trying to do so (bigfoot anyone?), and others have only implemented half of the inbound solution (AOL), leaving the harder parts to standard programs like sendmail. Even then I argued violently against this approach at AOL, and felt that we could do a better job by leaving all the external interfacing/queueing issues to sendmail, and instead make the in-house developed code an LMTP Local Delivery Agent. I was over-ruled, primarily because we had already gone too far down the road that had been chosen for us. Note that none of the original Internet Mail Operations team members are left at AOL (almost all bugged out when the new mail server software came online), and I don't think any of the original Internet Mail Development team members are left, either. Bad Juju, Bwana. 
I've been down this road before. Trust me, you don't want to do this. -- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 20:59:56 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote: On Oct 29, 2003, at 8:53 PM, Barry Warsaw wrote: Chuq, do you think it would be feasible for Mailman to try to handle that 90% itself, and then only hand-off to a Real MTA when it runs into trouble with the other 10% -- assuming it could know when it runs into trouble. I think you have enough on your plate to not re-invent what others have already done pretty well. When you run out of features to implement, then think about this. Not until. Seconded, in spades. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
engineering details. On Oct 29, 2003, at 8:59 PM, Brad Knowles wrote: What about your backups? And your off-site backups? And your mirror sites around the world? Any other copies of those files that might have been copied off somewhere else? ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 9:08 PM, Brad Knowles wrote: Bryan Costales and Eric Allman had this debate at InfoBeat/Mercury Mail. Bryan said that he could write a better simple MTA that could handle the easy 80% and leave the hard 20% to sendmail. There is no such thing as a simple MTA. This gets hairy quickly. Really quickly. you are much better off spending money on a good fast disk RAID (since the chances that you'll win the lottery are on par with the chances that your bottleneck is NOT disk I/O in mail sending) than on a programmer to try to build fast MTAs. that none of the original Internet Mail Operations team members are left at AOL (almost all bugged out when the new mail server software came online), and I don't think any of the original Internet Mail Development team members are left, either. And boy, does it show. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 05:59:48 +0100 Brad Knowles [EMAIL PROTECTED] wrote:
> At 11:43 PM -0500 2003/10/29, J C Lawrence wrote:
>> Not a whole lot more complexity. You're just invalidating the pointed-to data as well as the key. You're still not doing free space management.
> What about your backups? And your off-site backups? And your mirror sites around the world? Any other copies of those files that might have been copied off somewhere else?

I'm not going to touch the aspects of attempting to rewrite the data in backup sets without invalidating the backups. Uhh uhh. No deal. I'm also not going to touch the management of data that has been copied outside of the store's purview. It's no longer in the store's scope and so isn't really under discussion. I can run strings on my Oracle tables as well, but that really doesn't make the resulting data files part of Oracle's data-management model.

At its core this is a snapshot issue. What you're really arguing for is the ability to revert, recover, or synchronise (they're all the same thing under the covers) the state of the store in a logically consistent fashion. As such you're interested in logical consistency for not just one Big File, but across files, and across the meta-indexes; logical consistency of the store as a whole.

This really isn't a storage format problem. It's a transaction framing problem and a snapshotting problem (which is really just a transaction framing problem). You need to not only know the state of the data files, but the state of the meta-indexes, and that they are synchronised with each other.

This is not a trivial space, but it's also not an unknown space. File versioning systems have been messing here for years with change keys and signatures. Ultimately it comes down to a shared transaction key. The old AT&T SCCS papers are a particularly good read in this regard.

-- J C Lawrence
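To make the "shared transaction key" idea concrete, here is a toy sketch, not a description of any existing store: the Store class, the snapshot layout and is_consistent are all hypothetical. Every commit stamps both the data records and the meta-index entries with the same key, so a snapshot (or a restored backup) can be checked for data and index that were captured at different points.

```python
class Store:
    """Toy store: a data log and a meta-index stamped with one shared key."""

    def __init__(self):
        self.data = []   # list of (txn, payload), append-only
        self.index = {}  # key -> (txn, position in self.data)
        self.txn = 0     # last committed transaction key

    def commit(self, items):
        """Apply {key: payload} as one transaction under a fresh shared key."""
        txn = self.txn + 1
        for key, payload in items.items():
            self.data.append((txn, payload))
            self.index[key] = (txn, len(self.data) - 1)
        self.txn = txn
        return txn

    def snapshot(self):
        """Copy the data, the index and the transaction key together."""
        return {"txn": self.txn, "data": list(self.data), "index": dict(self.index)}


def is_consistent(snap):
    """True if every index entry points at data carrying its own transaction
    key and nothing in the snapshot is newer than the snapshot's key."""
    for txn, pos in snap["index"].values():
        if txn > snap["txn"] or pos >= len(snap["data"]):
            return False
        if snap["data"][pos][0] != txn:
            return False
    return all(txn <= snap["txn"] for txn, _ in snap["data"])
```

A "torn" restore, for example last night's data files combined with this morning's index, fails the check because the index refers to a transaction key the data has never seen.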
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 06:08:32 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 11:53 PM -0500 2003/10/29, Barry Warsaw wrote: Note that none of the original Internet Mail Operations team members are left at AOL (almost all bugged out when the new mail server software came online), and I don't think any of the original Internet Mail Development team members are left, either. Eeek. Not fun. I've been down this road before. Trust me, you don't want to do this. Barry, listen to this man. He speaks sooth. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
On Oct 29, 2003, at 9:21 PM, J C Lawrence wrote:
> This is not a trivial space, but it's also not an unknown space. File versioning systems have been messing here for years with change keys and signatures. Ultimately it comes down to a shared transaction key. The old AT&T SCCS papers are a particularly good read in this regard.

How does this statement reconcile with Barry's not wanting to require MySQL or PostgreSQL for Mailman because he doesn't want to layer on too many dependencies to get Mailman running?

We seem to be heading off into places where the answer is "if we're lucky, it'll run on that cluster of G5's at Uvirginia -- slowly".

Unless Barry wants to throw his simplicity requirements out the window, we can't expect high performance filesystems, SANs, fiber optic RAID connects, or for that matter, Linux over Windows over SGI over Solaris 2.5. This stuff that's floating around is great, if we were writing an enterprise-class, mega-bugger IS-supported system for a corporate data center.

How does that all relate to Mailman, anyway? Maybe we should refocus and not wander down interesting but entirely philosophical ratholes?
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 23:41:37 -0500 Barry Warsaw [EMAIL PROTECTED] wrote:
> On Wed, 2003-10-29 at 23:17, J C Lawrence wrote:
> Sounds great. Bring a laptop and we'll bang out some code (anyone else up for a mini-Mailman-3 sprint at my house? :). I'll probably be heading to Fedex Field on the 9th for a 'Skins game, so the 8th would be perfect.

Hopefully I'll know by this Sunday. Will see.

>> Eeeek! I trust this would be for immediate handoff to a real MTA versus handling final delivery directly? Quite the Pandora's box if not.
> Yep...

In what way would this be different from the current SMTP delivery support?

>> BTW Whatever happened to Michel Pelletier's interfaces PEP? I see the draft, and I see signs that something got done, but not what...
> Dead in the water AFAIK.

Ahh.

>> Last I heard Nigel was still running screaming into the hills.
> Hey, I love Exim -- Greg's done some very cool stuff with it on mail.{python,zope}.org. But man, I find it hard to track down just the right knob I need to tweak. :)

Hehn. I like Exim a lot, and compared to the competition the documentation is superb. I got a note after my "Mailman doesn't have enough config options" remark that he'd, err, had a somewhat explosive reaction.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 21:31:51 -0800 Chuq Von Rospach [EMAIL PROTECTED] wrote:
> On Oct 29, 2003, at 9:21 PM, J C Lawrence wrote:
> How's that all relate to Mailman, anyway? Maybe we should refocus and not wander down interesting but entirely philosophical ratholes?

Agreed, but then I've said my piece several times on those scores. We need a requirements definition for the abstractions for storage, indexing and presentation. I've already stated my bits there. So far there's been neither argument nor commentary, just a bunch of cross-purposes violent agreement between Brad and me.

While I like a netnews model as it suits my needs, I really don't care what the store is so long as it solves the problems I've laid out. We need a priori key determination, a collision policy, key handoffs to an indexer (which could be NULL in Chuq's MySQL case), and an improved/adapted presentation layer.

I've already said my bits there and proposed what I see as the cheap, easy, incremental improvement course: Twisted's NNTP support for storage, Message-IDs for keys, a variant best-effort detection and rewriting policy for collisions, and a MeoWWW derivative for HTML presentation/posting.

Counters?

-- J C Lawrence
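One way the "Message-IDs for keys" proposal with a detect-and-rewrite collision policy could look, as a hedged sketch only: the function names, the dict-backed archive, the X-Original-Message-ID header and the archive.invalid domain are all assumptions for illustration. Duplicate detection here is a simple whole-message digest, which is only best-effort (the same post arriving by two routes will differ in its Received headers).

```python
import hashlib
from email import message_from_string
from email.utils import make_msgid


def archive_key(msg):
    """A priori key: derived from the Message-ID alone, before storage."""
    return (msg["Message-ID"] or "").strip().strip("<>")


def store(archive, raw):
    """Store `raw` in the dict `archive`; return the key it ended up under."""
    msg = message_from_string(raw)
    key = archive_key(msg)
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    if key and key not in archive:
        archive[key] = (digest, raw)
        return key
    if key and archive[key][0] == digest:
        return key  # best-effort duplicate detection: same bytes, same key
    # Missing or colliding Message-ID: rewrite it and keep the original
    # in a (hypothetical) X-Original-Message-ID header.
    original = msg["Message-ID"]
    del msg["Message-ID"]
    msg["Message-ID"] = make_msgid(domain="archive.invalid")
    if original:
        msg["X-Original-Message-ID"] = original
    key = archive_key(msg)
    archive[key] = (digest, msg.as_string())
    return key
```

The returned key is what would be handed off to an indexer, and because it depends only on the message it does not change when the archive is rebuilt.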
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 05:00:48 +0100 Brad Knowles [EMAIL PROTECTED] wrote: At 10:47 PM -0500 2003/10/29, Barry Warsaw wrote: SGIs XFS on Irix does a pretty good job, with hashed directory structures, and an extent-based journaling filesystem. ReiserFS also does particularly well here. I haven't yet tested IBM's JFS. Last time I hit VxFS hard (back in the HP-UX 20.20 days) it really didn't like huge directories, but that may have changed since then. -- J C Lawrence -(*)Satan, oscillate my metallic sonatas. [EMAIL PROTECTED] He lived as a devil, eh? http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live. ___ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers
Re: [Mailman-Developers] Requirements for a new archiver
At 9:16 PM -0800 2003/10/29, Chuq Von Rospach wrote:
> There is no such thing as a simple MTA. This gets hairy quickly. Really quickly.

Bryan is one of the few people I would expect to be able to do something that could actually handle the easy 80%. Writing the book _sendmail_ (now in its fourth edition) is just one of his many talents.

> you are much better off spending money on a good fast disk RAID (since the chances that you'll win the lottery are on par with the chances that your bottleneck is NOT disk I/O in mail sending) than on a programmer to try to build fast MTAs.

They were already using pure RAM disks for this application. Disk I/O was not the problem.

Bryan and Eric were two major contributors to my invited talks "Sendmail Performance Tuning for Large Systems" (see http://www.shub-internet.org/brad/papers/sendmail-tuning/) and "Design and Implementation of Highly Scalable E-mail Systems" (see http://www.shub-internet.org/brad/papers/dihses/). These guys are not lightweights in this field.

> And boy, does it show.

Indeed.

-- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 05:51:32 +0100 Brad Knowles [EMAIL PROTECTED] wrote:
> I'm sorry, I don't trust ReiserFS at all. I'd trust XFS if it was on Irix, or IBM's JFS, but not ReiserFS. Hell, on a Linux system, I'd use ext2fs before I'd use Reiser.

I'll simply note that I've been using ReiserFS on just over a dozen systems ranging from million+ messages a day list servers to build, dev, web, and desktop boxes. I've yet to have problems.

> ... do you really want to try to force them all to use ReiserFS on Linux as their only supported option?

Err, want or consider reasonable?

> My understanding is that ext3fs is dead.

I'd thought that Ted Ts'o took over some of the reins in his move to IBM, but I haven't chatted to him in a long while.

-- J C Lawrence
Re: [Mailman-Developers] Requirements for a new archiver
On Thu, 30 Oct 2003 00:56:29 -0500 J C Lawrence wrote: On Thu, 30 Oct 2003 05:51:32 +0100 Brad Knowles [EMAIL PROTECTED] wrote: I'm sorry, I don't trust ReiserFS at all. I'd trust XFS if it was on Irix, or IBM's JFS, but not ReiserFS. Hell, on a Linux system, I'd use ext2fs before I'd use Reiser. I'll simply note that I've been using ReiserFS on just over a dozen systems ranging from million+ messages a day list servers to build, dev, web, and desktop boxes. I've yet to have problems. Err, add in just under three years. -- J C Lawrence [EMAIL PROTECTED] http://www.kanga.nu/~claw/
Re: [Mailman-Developers] Requirements for a new archiver
At 12:49 AM -0500 2003/10/30, J C Lawrence wrote: ReiserFS also does particularly well here. I haven't yet tested IBM's JFS. Last time I hit VxFS hard (back in the HP-UX 20.20 days) it really didn't like huge directories, but that may have changed since then. HP-UX 20.20? I wasn't aware that they had gone much beyond HP-UX 11.x. Did you mean HP-UX 10.20? Now that's a beast I remember, and remember loathing with a passion. HP-UX 9 was slow, but rock-solid -- no matter how hard you beat on the damn thing, it just slowed down but never stopped. HP-UX 10.x was a real dog. HP-UX 11.x looked like it was going to shape up better, but then I got out of AOL before we had many of those systems in house. -- Brad Knowles, [EMAIL PROTECTED]
Re: [Mailman-Developers] Requirements for a new archiver
At 12:40 AM -0500 2003/10/30, J C Lawrence wrote: We need a priori key determination, a collision policy, key handoffs to an indexer (which could be NULL in Chuq's MySQL case), and an improved/adapted presentation layer. As far as this goes, I agree. I've already said my bits there and proposed what I see as the cheap, easy, incremental improvement course: Twisted's NNTP support for storage, Message IDs for keys, a variant best-effort detection and rewriting policy for collisions, and a MeoWWW derivative for HTML presentation/posting. I don't know anything about Twisted or MeoWWW, so I can't say how they address the subjects above. I can say that I'm not sure about an NNTP-based storage solution, although certain storage techniques we've recently discussed borrow a lot from extant NNTP implementations, and I'm not sure how much sense it would make to rip out just those parts we know we need, or if we could actually reasonably take the whole thing, kit-n-caboodle. I do believe that we need an alternative solution to the message-id header as it was presented to us in the message, as a stable, guaranteed unique (well, as good as MD5 or SHA-1 gets) message identifier that can always be used to refer to the exact same message no matter what. Whether we use this message identifier as a replacement for the message-id header value as it was presented to us -- I think that's a more philosophical discussion, and I think we should address it by allowing both options but deciding which would be a reasonable default to take. Given that the mailman UI is basically completely contained within the CGI, I'm inclined to leave it there and work on improving it internally, allowing us to continue to work with most any webserver the client may have. I don't know how MeoWWW addresses this issue, either by replacing the webserver, or providing additional tools that may make it easier to present a good and consistent UI.
-- Brad Knowles, [EMAIL PROTECTED]
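To make the key discussion above concrete, here is a minimal sketch, not anything taken from Mailman or from either poster's code. It assumes the archiver assigns a key once, when the message arrives, and records it. The names assign_key, store, and prefer_message_id are hypothetical. The flag stands in for the "allow both options, pick a default" idea: use the Message-ID as presented, or use a SHA-1 digest of the message as received. The collision policy shown (fall back to the digest when two different messages claim the same Message-ID) is one possible best-effort rewrite, chosen only for illustration.

```python
import hashlib
from email import message_from_bytes


def assign_key(raw, store, prefer_message_id=True):
    """Pick an archive key for one raw message (bytes) and record it.

    `store` is a hypothetical dict mapping each assigned key to the
    SHA-1 digest of the message it was assigned to.
    """
    # Digest of the message exactly as received.
    digest = hashlib.sha1(raw).hexdigest()

    # Message-ID as presented, without surrounding angle brackets.
    msg = message_from_bytes(raw)
    message_id = (msg.get("Message-ID") or "").strip().strip("<>")

    # Default: trust the Message-ID if there is one; otherwise use the digest.
    key = message_id if (prefer_message_id and message_id) else digest

    # Collision: the key is already taken by a different message.
    # Assumed policy: rewrite the key to the digest.
    known = store.get(key)
    if known is not None and known != digest:
        key = digest

    store[key] = digest
    return key
```

Because the key is stored at arrival time, later edits to the archived copy would not change it under this sketch; whether that matches what either poster had in mind is an assumption.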