Re: [Mailman-Developers] Improving the archives

2007-08-07 Thread Jeff Breidenbach
 What we really want to know is how many (non-empty) Message-ID
 collisions are there that *don't* share a Date?  This is the number of
 messages that only-messageid loses, and that the composite identifier
 method would not lose.
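
One way to compute the quantity asked about above, as a minimal sketch. The input shape (an iterable of (message_id, date) string pairs) and the function name are assumptions for illustration, not something taken from the archive code. It treats each extra distinct Date seen under one Message-ID as a message that an ID-only key would lose.

```python
from collections import defaultdict


def lost_by_message_id_only(messages):
    """Count messages an ID-only key loses but a (Message-ID, Date) key keeps.

    `messages` is an iterable of (message_id, date) string pairs; this
    input shape is hypothetical and chosen only for the example.
    """
    dates_by_id = defaultdict(set)
    for message_id, date in messages:
        if message_id:  # skip empty Message-IDs, as the question specifies
            dates_by_id[message_id].add(date)
    # Every additional distinct Date under the same Message-ID is one
    # message that collapses away when only the Message-ID is used.
    return sum(len(dates) - 1 for dates in dates_by_id.values())


sample = [
    ("<a@example.org>", "Tue, 07 Aug 2007 10:00:00 +0000"),
    ("<a@example.org>", "Tue, 07 Aug 2007 10:00:00 +0000"),  # same ID and Date
    ("<a@example.org>", "Wed, 08 Aug 2007 09:30:00 +0000"),  # same ID, new Date
    ("", "Wed, 08 Aug 2007 09:31:00 +0000"),                 # empty ID, ignored
    ("<b@example.org>", "Wed, 08 Aug 2007 09:32:00 +0000"),
]
print(lost_by_message_id_only(sample))
```

Other counting conventions are possible (for example, counting every message in a colliding group), so totals from this sketch need not match figures produced another way.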

I took a look at a larger dataset, 5.85 million messages from several
thousand lists. Of the messages that share message-id but not date,
most come from a small number of web-based services.

  875 come from forums.slimdevices.com
  378 come from lists.openplans.org
  265 come from nabble.com
  164 come from egroups.com
  135 come from yahoo.com
  166 come from elsewhere

That's 0.03% if you count all the messages. It is 0.008% if you
discard the top three offenders, all of which I have contacted.
I didn't try contacting Yahoo/eGroups because in my past
experience, talking to a brick wall is easier. I have not analyzed
how many of these messages are spam or have duplicate bodies,
which would lower the percentages further.
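
For anyone who wants to check the arithmetic, here is a small sketch that recomputes both percentages from the counts listed above. The total is taken as exactly 5,850,000, which is an assumption since the post only gives the rounded figure "5.85 million"; the variable names are illustrative.

```python
# Rounded total from the post ("5.85 million messages"); the exact
# count is not given, so this value is an assumption.
total_messages = 5_850_000

# Messages sharing a Message-ID but not a Date, by source.
collisions = {
    "forums.slimdevices.com": 875,
    "lists.openplans.org": 378,
    "nabble.com": 265,
    "egroups.com": 164,
    "yahoo.com": 135,
    "elsewhere": 166,
}

top_three = ["forums.slimdevices.com", "lists.openplans.org", "nabble.com"]

all_count = sum(collisions.values())
remaining = all_count - sum(collisions[name] for name in top_three)

pct_all = 100 * all_count / total_messages
pct_remaining = 100 * remaining / total_messages

print(f"all sources:         {all_count} messages, {pct_all:.3f}%")
print(f"without top three:   {remaining} messages, {pct_remaining:.4f}%")
```

The counts sum to 1983, and 465 remain once the top three sources are dropped, which rounds to the 0.03% and 0.008% quoted above.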

Hope this data helps.
___
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp


Re: [Mailman-Developers] Improving the archives

2007-08-07 Thread Dale Newfield
Jeff Breidenbach wrote:
 5.85 million messages

 That's 0.03% if you count all the messages. It is 0.008% if you
 discard the top three offenders, all of which I have contacted.

I'd say that's a strong argument for just using the Message-ID and 
simplifying this tremendously...

...Barry, do you disagree?

(It can still be a base32-encoded SHA hash to make it less user-hostile.)
http://wiki.list.org/display/DEV/Stable+URLs
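
A rough sketch of what that could look like, assuming SHA-1 (the message only says "SHA") and assuming the Message-ID is hashed exactly as it appears in the header, angle brackets included. The function name stable_id is hypothetical, and none of this is taken from the wiki page.

```python
import base64
import hashlib


def stable_id(message_id: str) -> str:
    """Return a base32-encoded SHA-1 digest of a Message-ID.

    SHA-1 and hashing the raw header value are assumptions made for
    this example; the thread does not settle either choice.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    # A SHA-1 digest is 20 bytes, which base32 encodes to exactly
    # 32 characters (A-Z and 2-7) with no '=' padding.
    return base64.b32encode(digest).decode("ascii")


print(stable_id("<example-message-id@example.org>"))
```

The result is a fixed-length, case-insensitive token that avoids the characters in raw Message-IDs that are awkward in URLs.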

-Dale