subject:"Re\: \[Mailman\-Developers\] Improving the archives"

Re: [Mailman-Developers] Improving the archives

2007-11-03 Thread Stephen J. Turnbull

Craig Loomis writes: >Globally unique IDs, hashed IDs, etc., are very appealing from > various CS-y and techie points of view, but are simply not memorable > to humans or knowable by dumb external programs. I think as much, or > more, effort should be put into delivering a straightfo

Re: [Mailman-Developers] Improving the archives

2007-11-03 Thread Jeff Breidenbach

>but if you can trust yourself to generate them, consecutive >integers provide minimal, order-preserving, perfect hashing, too! Hmm, this sounds pretty sensible to me. Jeff

Re: [Mailman-Developers] Improving the archives

2007-10-30 Thread Craig Loomis

Or Re: [Mailman-Developers 10417] Improving the archives I would like to interject and highlight some use cases for stable and predictable IDs. For us, "message IDs" are directly used both by people and ignorant programs. Our mailing lists serve as a permanent and concise record of ou

Re: [Mailman-Developers] Improving the archives

2007-10-03 Thread Ian Eiloart

--On 2 October 2007 22:47:35 -0400 Barry Warsaw <[EMAIL PROTECTED]> wrote: > One question: should the angle brackets on the Message-ID be part of > the hash or not? I think they should, or IOW, the entire value of > the Message-ID header is taken as the hash, though they should be > stripped o
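
The bracket question above can be made concrete with a minimal sketch. It assumes SHA-1 as the hash and base32 as the text encoding; the snippet itself only says "hash", and base32 is mentioned elsewhere in the thread. The function name `archive_hash` and the `strip_brackets` flag are hypothetical, not Mailman API.

```python
import base64
import hashlib


def archive_hash(message_id_header: str, strip_brackets: bool = False) -> str:
    """Return a base32-encoded SHA-1 digest of a Message-ID header value.

    Assumption: SHA-1 + base32. The thread does not fix the algorithm here.
    """
    value = message_id_header.strip()
    # Optionally drop the surrounding angle brackets before hashing.
    if strip_brackets and value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    # A 20-byte digest encodes to exactly 32 base32 characters, no padding.
    return base64.b32encode(digest).decode("ascii")


# Hypothetical Message-ID, for illustration only.
with_brackets = archive_hash("<12345@example.com>")
without_brackets = archive_hash("<12345@example.com>", strip_brackets=True)
```

The two results differ, so whichever convention is chosen has to be applied identically by every party that computes the hash.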

Re: [Mailman-Developers] Improving the archives

2007-10-02 Thread Jeff Breidenbach

Question: what about crossposted messages? Let's say a message gets sent to a list called mailman-developers with a CC to a list called pet-bunnies. Hypothetically, of course. Presumably, the person who got the message from pet-bunnies should probably end up at the pet-bunnies archive, where the m

Re: [Mailman-Developers] Improving the archives

2007-10-02 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Aug 8, 2007, at 1:04 AM, Dale Newfield wrote: > Jeff Breidenbach wrote: >> 5.85 million messages > >> That's 0.03% if you count all the messages. It is 0.008% if you >> discard the top three offenders, all of which I have contacted. > > I'd say tha

Re: [Mailman-Developers] Improving the archives

2007-08-07 Thread Dale Newfield

Jeff Breidenbach wrote: > 5.85 million messages > That's 0.03% if you count all the messages. It is 0.008% if you > discard the top three offenders, all of which I have contacted. I'd say that's a strong argument for just using the Message-ID and simplifying this tremendously... ...Barry, do yo

Re: [Mailman-Developers] Improving the archives

2007-08-07 Thread Jeff Breidenbach

> What we really want to know is how many (non-empty) Message-ID > collisions are there that *don't* share a Date? This is the number of > messages that only-messageid loses, and that the composite identifier > method would not lose. I took a look at a larger dataset, 5.85 million messages from s

Re: [Mailman-Developers] Improving the archives

2007-08-01 Thread Jeff Breidenbach

> 704 messages fall into this category. Of these, 596 come from a > single (malfunctioning and duplicate spewing) list server. I have > not yet examined the remaining 208 messages, but I'll bet anything > many also have duplicate message bodies. Or are spam. So for this > data set, we have an upper

Re: [Mailman-Developers] Improving the archives

2007-08-01 Thread Jeff Breidenbach

> What we really want to know is how many (non-empty) Message-ID > collisions are there that *don't* share a Date? This is the number of > messages that only-messageid loses, and that the composite identifier > method would not lose. It took longer than expected, but I now have numbers from looki
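
One way to compute the number asked for in the quoted question is sketched below. It assumes each message has already been reduced to a `(message_id, date)` pair of header strings, and that two messages sharing both values are treated as the same message. The function name and the sample data are hypothetical and are not taken from the mail-archive.com dataset.

```python
from collections import defaultdict


def lost_by_message_id_only(messages):
    """Count messages that a Message-ID-only key would lose but a
    composite (Message-ID, Date) key would keep.

    `messages` is an iterable of (message_id, date) string pairs.
    Empty Message-IDs are skipped, as in the quoted question.
    """
    dates_by_id = defaultdict(set)
    for message_id, date in messages:
        if not message_id:
            continue
        dates_by_id[message_id].add(date)
    # Each extra distinct Date under one Message-ID is one message
    # that the Message-ID-only scheme would collapse away.
    return sum(len(dates) - 1 for dates in dates_by_id.values())


# Made-up sample: one true duplicate, one collision with a different Date.
sample = [
    ("<a@example.com>", "Wed, 01 Aug 2007 10:00:00 +0000"),
    ("<a@example.com>", "Wed, 01 Aug 2007 10:00:00 +0000"),
    ("<a@example.com>", "Thu, 02 Aug 2007 09:30:00 +0000"),
    ("<b@example.com>", "Thu, 02 Aug 2007 11:00:00 +0000"),
    ("", "Thu, 02 Aug 2007 12:00:00 +0000"),
]
lost = lost_by_message_id_only(sample)
```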

Re: [Mailman-Developers] Improving the archives

2007-07-26 Thread Jeff Breidenbach

> If you are relying on the sender to do the right thing, then > why not force them to create proper message-ids? I think Barry's proposal is essentially a numbers game - e.g. he's hoping for significantly better results using "Date" in the calculation than not using it. http://wiki.list.org/disp

Re: [Mailman-Developers] Improving the archives

2007-07-26 Thread Dale Newfield

Jeff Breidenbach wrote: > So I just looked at 2 million raw messages from 2007, spread over > a few thousand mailing lists (all data is from mail-archive.com). My > first question was - when comparing only with messages from the > same list - how many times do I see a repeated message-id? The > ans

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Jeff Breidenbach

> If you improve the script or find numbers that lead to different > conclusions, now's the time to know! Live and learn! So I just looked at 2 million raw messages from 2007, spread over a few thousand mailing lists (all data is from mail-archive.com). My first question was - when comparing only

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Stephen J. Turnbull

Barry Warsaw writes: > Yes, definitely. What do you think of the base32 examples I have on > the wiki page? They're somewhat better than Message-IDs for readability, but they're not user-friendly. > On Jul 24, 2007, at 1:11 PM, Terri Oda wrote: > > > It seems silly to generate nice shor

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Stephen J. Turnbull

Barry Warsaw writes: > I agree, I just don't think message-ids are user friendly enough to > be this canonical url. Especially in this context, which is exactly > where urls are thrown in users faces. An archiving service is > exactly the right place for redirecting human readable urls

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Gustav H Meyer

Hi, I think this is the first time that I'm posting here but hopefully not the last. Thanks to everyone involved for an incredible project. I'm not much of a developer but I like practical solutions and will do everything possible to help improve in this area even if it's just to give some feedbac

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Jason Fesler

> Guarantee is a pretty strong word. A malicious person could post two > messages with the same message-id, same date, but different bodies. This is my concern too. Especially since this is known information; it is trivial to be malicious. Whatever was done, I think would *have* to deal with '

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 25, 2007, at 12:47 AM, Jeff Breidenbach wrote: >> What you gain from my proposal over a pure Message-ID approach >> is guaranteed uniqueness given the list copy > > Guarantee is a pretty strong word. A malicious person could post two > messages

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 11:04 PM, Stephen J. Turnbull wrote: >>> So we just specify a header to put it in, and subscribers will be >>> able >>> to use it, per definition of a canonical URL. >> >> It is the archive server's job to decide what is the "can

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 2:03 PM, Jeff Breidenbach wrote: >> Regardless of whether we *need* to generate our own unique ID, I'm >> leaning towards the thought that we're going to *want* to generate >> our own for usability reasons. In a perfect world, i t

Re: [Mailman-Developers] Improving the archives

2007-07-25 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 1:11 PM, Terri Oda wrote: > On 24-Jul-07, at 12:31 PM, Jeff Breidenbach wrote: >>> So we just specify a header to put it in, and subscribers will be >>> able >>> to use it, per definition of a canonical URL. >> It is the archive se

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach

> What you gain from my proposal over a pure Message-ID approach > is guaranteed uniqueness given the list copy Guarantee is a pretty strong word. A malicious person could post two messages with the same message-id, same date, but different bodies. Sometimes the channel between the MLM and the arc

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Stephen J. Turnbull

Jeff Breidenbach writes: > >So we just specify a header to put it in, and subscribers will be able > >to use it, per definition of a canonical URL. > > It is the archive server's job to decide what is the "canonical" URL > for a message. There's a good chance these archival URLs will be > s

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 12:31 PM, Jeff Breidenbach wrote: >> What complexity? Mailman just does >> >> msg['X-List-Archive-Received-ID'] = Email.msgid() > > Easy to introduce, harder to deal with. The archival server would now > keep track of both the me
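
A runnable approximation of the quoted one-liner, as a sketch only: `Email.msgid()` from the message is replaced here by the standard library's `email.utils.make_msgid`, which is a stand-in chosen for illustration. The domain `lists.example.org` and the function name `stamp_received_id` are hypothetical.

```python
from email.message import Message
from email.utils import make_msgid

HEADER = "X-List-Archive-Received-ID"


def stamp_received_id(msg: Message) -> str:
    """Add a list-generated ID header unless one is already present.

    The sender's own Message-ID header is left untouched.
    """
    if HEADER not in msg:
        # Stand-in generator; passing a domain avoids a hostname lookup.
        msg[HEADER] = make_msgid(domain="lists.example.org")
    return msg[HEADER]


msg = Message()
msg["Message-ID"] = "<12345@example.com>"  # hypothetical sender-supplied ID
first = stamp_received_id(msg)
second = stamp_received_id(msg)  # second call reuses the existing header
```

This also illustrates the objection in the reply: once the header exists, an archive has two identifiers per message to keep track of.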

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 2:56 AM, Stephen J. Turnbull wrote: > I simply think we should be prepared for applications where relying on > the sender to supply a UUID is not acceptable; we need to be able to > provide one ourselves. Creating UUIDs is a solve

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 24, 2007, at 2:02 AM, Jeff Breidenbach wrote: > Which brings me to suggestion #2, which is go ahead and write > an RFC on how list servers should embed archival links in messages. > This sounds like an internet wide interoperability issue as mu

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 22, 2007, at 12:33 PM, Terri Oda wrote: > On 20-Jul-07, at 8:39 AM, Barry Warsaw wrote: >> I've looked at a few lurker archivers and I wasn't blown away by its >> user interface. That's apparently highly configurable though. > > I've been doin

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach

> Regardless of whether we *need* to generate our own unique ID, I'm > leaning towards the thought that we're going to *want* to generate > our own for usability reasons. In a perfect world, I think we'd have > a sequence number so I could visit http://example.com/mailman/ > archives/listname/204.

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Terri Oda

On 24-Jul-07, at 12:31 PM, Jeff Breidenbach wrote: >> So we just specify a header to put it in, and subscribers will be >> able >> to use it, per definition of a canonical URL. > It is the archive server's job to decide what is the "canonical" URL > for a message. There's a good chance these arch

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Dale Newfield

Jeff Breidenbach wrote: > In addition, Barry was talking about concocting a unique > identifier from the Date field and Message-ID. I'm not a big fan of > this idea, because the date field comes from the mail user agent > and is often wildly corrupt; e.g. coming from 100 years in the future. Oh--I

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach

There are three different parties coming to the table. One is the mail transfer agent of the sender, another is the list server, and the third is the archive server. Ideally, all three will be happy campers. >So we just specify a header to put it in, and subscribers will be able >to use it, per de

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Stephen J. Turnbull

John A. Martin writes: > >> better to go ahead and use the message-id, rather than concoct > >> yet another "this time we mean it!" unique identifier. > > st> That's not the point. We're not going to impose this on > st> senders; > > I read the quote as meaning "this time

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread John A. Martin

>>>>> "st" == Stephen J Turnbull >>>>> "Re: [Mailman-Developers] Improving the archives" >>>>> Tue, 24 Jul 2007 15:56:35 +0900 st> Jeff Breidenbach writes: >> > Notice that of 325146 total messages, 624 of t

Re: [Mailman-Developers] Improving the archives

2007-07-23 Thread Stephen J. Turnbull

Jeff Breidenbach writes: > > Notice that of 325146 total messages, 624 of them had no message-id > > header. Even if you aggregate dup+col, you're still looking at a > > total duplicate rate of 0.29%. > > Message ID's are supposed to be unique. Fortunately, a rule more honored in the obser

Re: [Mailman-Developers] Improving the archives

2007-07-23 Thread Jeff Breidenbach

> Notice that of 325146 total messages, 624 of them had no message-id > header. Even if you aggregate dup+col, you're still looking at a > total duplicate rate of 0.29%. Message ID's are supposed to be unique. This is discussed in RFC 822: 4.6.1 and RFC 1036: 2.1.5, and probably other places.

Re: [Mailman-Developers] Improving the archives

2007-07-22 Thread Dale Newfield

Terri Oda wrote: > I've been doing a lot of thinking about interface, and I'm coming to > the conclusion that something more like a web bulletin board is > probably the way to go For public lists, the answer may lie in external tools like nabble.com or mailinglistarchive.com Of course, that

Re: [Mailman-Developers] Improving the archives

2007-07-22 Thread Terri Oda

On 20-Jul-07, at 8:39 AM, Barry Warsaw wrote: > I've looked at a few lurker archivers and I wasn't blown away by its > user interface. That's apparently highly configurable though. I've been doing a lot of thinking about interface, and I'm coming to the conclusion that something more like a web

Re: [Mailman-Developers] Improving the archives

2007-07-21 Thread A.M. Kuchling

On Fri, Jul 20, 2007 at 11:16:19AM -0400, Barry Warsaw wrote: > Cool. I wonder if lurker is compatible with Python 2.5's > mailbox.Maildir implementation and whether the two could share the > maildirs. Thanks for the information! It had better be -- Maildir has a published specification. If

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Stephen J. Turnbull

Barry Warsaw writes: > But it would have to be subject to the same bounce rules as any other > auto-response which could be used as a spam vector, e.g. limit the > number of bounces per time period and don't include the entire > original message in the bounce But that prevents detecting

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 20, 2007, at 10:59 AM, Nigel Metheringham wrote: > On 20 Jul 2007, at 15:52, Barry Warsaw wrote: >> Mailman gets the From_ line before passing off to the archiver. >> But that's interesting, does lurker /require/ the From_ line? >> > > Well lur

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Nigel Metheringham

On 20 Jul 2007, at 15:52, Barry Warsaw wrote: > Mailman gets the From_ line before passing off to the archiver. > But that's interesting, does lurker /require/ the From_ line? > Well lurker handles Maildir - no From_ but the same info is in the filename, and it can take messages on stdin wit

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi Nigel, On Jul 20, 2007, at 10:38 AM, Nigel Metheringham wrote: > On 20 Jul 2007, at 15:26, Barry Warsaw wrote: >>> BTW lurker gives all messages an ID which is 3 parts separated by >>> periods. The first part is a date field - ie 20070720, the sec

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Nigel Metheringham

On 20 Jul 2007, at 15:26, Barry Warsaw wrote: >> BTW lurker gives all messages an ID which is 3 parts separated by >> periods. The first part is a date field - ie 20070720, the second >> part is the receive time, UTC, as 6 digits, and the final part >> is some form of hex id. The nice part is if y
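
The three-part ID layout described in the quoted text can be parsed as sketched below. This assumes the six-digit time is HHMMSS and that the date part is also in UTC; neither detail is stated in the snippet. The example ID, the class name, and the function name are hypothetical and are not taken from lurker itself.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class LurkerStyleId(NamedTuple):
    received: datetime  # receive timestamp, interpreted as UTC
    hex_id: str         # trailing hexadecimal part, kept as text


def parse_lurker_style_id(identifier: str) -> LurkerStyleId:
    """Split 'YYYYMMDD.HHMMSS.hex' into a timestamp and the hex part."""
    date_part, time_part, hex_part = identifier.split(".")
    received = datetime.strptime(
        date_part + time_part, "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)
    int(hex_part, 16)  # raises ValueError if the last part is not hex
    return LurkerStyleId(received, hex_part)


# Made-up identifier in the described shape.
parsed = parse_lurker_style_id("20070720.142600.1a2b3c4d")
```

Because the date and time fields are fixed-width and come first, identifiers of this shape sort chronologically when compared as plain strings.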

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 20, 2007, at 9:17 AM, Nigel Metheringham wrote: > > On 20 Jul 2007, at 13:39, Barry Warsaw wrote: >> I've looked at a few lurker archivers and I wasn't blown away by its >> user interface. That's apparently highly configurable though. > > I'd

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 20, 2007, at 9:31 AM, Stephen J. Turnbull wrote: > Barry Warsaw writes: > >> Second, things can happen to a list >> that might cause this sequence number to get corrupted. > > Add an X-Mailman-Sequence-Number header if not already present. > >

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Nigel Metheringham

On 20 Jul 2007, at 13:39, Barry Warsaw wrote: > I've looked at a few lurker archivers and I wasn't blown away by its > user interface. That's apparently highly configurable though. I'd be inclined to agree wrt user interface. Documentation regarding this, and anything else to do with lurker, app

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 20, 2007, at 9:21 AM, Stephen J. Turnbull wrote: >> How likely is it that two messages with the same message-id and >> date are /not/ duplicates? > > For message id generators that include a time-stamp in the generated > id, approximately the s

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Stephen J. Turnbull

Barry Warsaw writes: > Second, things can happen to a list > that might cause this sequence number to get corrupted. Add an X-Mailman-Sequence-Number header if not already present. That doesn't deal with your other comments, but as I point out elsewhere, if you don't use *any* Mailman-specif

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Stephen J. Turnbull

Barry Warsaw writes: > First, I want to avoid talking about file system layout. To me, > that's an implementation detail we needn't worry about right now. Agreed. > How likely is it that two messages with the same message-id and > date are /not/ duplicates? For message id generators t

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 9, 2007, at 11:09 PM, Stephen J. Turnbull wrote: > John A. Martin writes: > >> In the absence of a Message-ID >> on an outgoing mail message many if not most MTAs will add one. Why >> not let Mailman anticipate the need to add a Message-ID whe

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 8, 2007, at 1:06 AM, Paul Wise wrote: > My personal opinion is that pipermail should be removed and mailman > should not contain a default archiver since there are plenty of good > archivers already (lurker, mhonarc etc). Adding wrappers around

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 5, 2007, at 12:09 PM, John Dennis wrote: > A little over a year ago I went on a search to find the best open > source > archiver and at that time I came up with Lurker > (http://lurker.sourceforge.net) Since then I believe Lurker has seen a >

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 4, 2007, at 3:30 PM, Jeff Breidenbach wrote: >> Maybe a way to think about this is that the canonical url is based on >> the message-id, but then there's some way to distill even this down >> to a tinyurl or simple integer that would be stable

Re: [Mailman-Developers] Improving the archives

2007-07-20 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 4, 2007, at 1:16 PM, Dale Newfield wrote: > Barry Warsaw wrote: >> Maybe a way to think about this is that the canonical url is based on >> the message-id, but then there's some way to distill even this down >> to a tinyurl or simple integer th

Re: [Mailman-Developers] Improving the archives

2007-07-09 Thread Stephen J. Turnbull

John A. Martin writes: > In the absence of a Message-ID > on an outgoing mail message many if not most MTAs will add one. Why > not let Mailman anticipate the need to add a Message-ID when archiving > the message rather than leaving it to the outgoing MTA? Quite. My reason for saying "last

Re: [Mailman-Developers] Improving the archives

2007-07-07 Thread Paul Wise

On 7/3/07, Terri Oda <[EMAIL PROTECTED]> wrote: > I'm trying to remember all the things people have suggested for the > archives in the past so I can figure out what needs to be done and > what might be nice to have, and see if this is doable in the time I > have in the foreseeable future. At lis

Re: [Mailman-Developers] Improving the archives

2007-07-05 Thread Terri Oda

On 5-Jul-07, at 12:09 PM, John Dennis wrote: > A little over a year ago I went on a search to find the best open > source > archiver and at that time I came up with Lurker > (http://lurker.sourceforge.net) Since then I believe Lurker has seen a > major new revision. I also believe Lurker is the a

Re: [Mailman-Developers] Improving the archives

2007-07-05 Thread John Dennis

On Tue, 2007-07-03 at 20:05 -0400, Barry Warsaw wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On Jul 2, 2007, at 11:06 PM, Terri Oda wrote: > > > Since I've largely finished up the coding contract that was eating up > > a lot of my time, I'm thinking that I'd like to do some coding

Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Jeff Breidenbach

>In which case [the message body link] would be set to something like. > >http://third-party-service/[EMAIL PROTECTED] Just for fun, I did a trial implementation. It works, but the URLs are too long. For example, the URL below spends 59 characters on the message-id, and 27 characters on the listnam

Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Jeff Breidenbach

>Maybe a way to think about this is that the canonical url is based on >the message-id, but then there's some way to distill even this down >to a tinyurl or simple integer that would be stable in the face of >full archive regenerations. I'd suggest the reverse. Keep the canonical archive URL shor

Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Dale Newfield

I'm all for someone taking ownership of this long-neglected component -- thank you for doing so! Barry Warsaw wrote: > Maybe a way to think about this is that the canonical url is based on > the message-id, but then there's some way to distill even this down > to a tinyurl or simple integer t

Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread John A. Martin

>>>>> "st" == Stephen J Turnbull >>>>> "Re: [Mailman-Developers] Improving the archives" >>>>> Wed, 04 Jul 2007 16:49:58 +0900 st> The main drawback to using Message IDs that I can see is that st> broken MUAs may s

Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Stephen J. Turnbull

Barry Warsaw writes: > > - archive links that won't break if the archive is rebuilt > > Yes, this is absolutely critical, in fact, I'd put it right at the > top of the list, even more so than a u/i overhaul. Stable urls, with > backward compatible redirecting links if at all possible, wo

Re: [Mailman-Developers] Improving the archives

2007-07-03 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Steve makes me think of a couple of other wish list items. On Jul 3, 2007, at 7:36 AM, Steve Huston wrote: > On 7/2/07 11:06 PM, Terri Oda wrote: >> - better address obfuscation (maybe by generating pages through cgi) > > I run a few Wordpress sites,

Re: [Mailman-Developers] Improving the archives

2007-07-03 Thread Barry Warsaw

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Jul 2, 2007, at 11:06 PM, Terri Oda wrote: > Since I've largely finished up the coding contract that was eating up > a lot of my time, I'm thinking that I'd like to do some coding for > fun. And nothing says fun like trying to fix the Mailman arch

Re: [Mailman-Developers] Improving the archives

2007-07-03 Thread Steve Huston

I'll admit to not having read previous discussions on this topic, but I'll also add my 2¢ here: On 7/2/07 11:06 PM, Terri Oda wrote: > - better address obfuscation (maybe by generating pages through cgi) I run a few WordPress sites, and there's a plugin I use called PHPEnkoder which does a good j

66 matches
