Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> On Monday 01 June 2009, Christian Perrier wrote:
> > To be even more efficient, I wonder if there's a possibility to download list archives as a mailbox. That would make spam tagging more efficient than going through the web interface.
>
>     scp master.debian.org:~debian/lists/debian-boot/debian-boot.mm.gz .
>
> Only works for DDs obviously. Disadvantage is that this archive will still have all spam that's already been removed... I'm sticking with the web interface myself.

Yesterday, I grabbed several such mailboxes. Before working on them, I passed the messages through CRM114, which I have been using for a while to score my incoming messages:

    zcat debian-boot.200608.gz | formail -s /usr/bin/crm -u /home/bubulle/.crm114/ mailfilter.crm > debian-boot.200608.scored

That creates a new scored mailbox where messages have additional headers, including:

    X-CRM114-Status: Good ( pR: 161.9126 )
or
    X-CRM114-Status: UNSURE (1.1278) This message is 'unsure'; please train it!
or
    X-CRM114-Status: SPAM ( pR: -15.1978 )

In my .muttrc, I have this:

    color header white black ^X-CRM114-Status:.*Good.*
    color header blue black ^X-CRM114-Status:.*SPAM.*
    color header red black ^X-CRM114-Status:.*UNSURE.*

Then I read this mailbox with mutt. Unsure messages appear in cyan and sure spams appear in red.

Then, I can tag messages ('T' in mutt's default key mapping) easily by using the colors as a helper (of course I *do* check for false positives). I also go through the messages identified as non-spam and tag those that are actually spam.

Then, all these tagged messages are piped to my "report list spam" macro and also identified as spam to CRM114 (pipe them to $HOME/.crm114/mailfilter.crm -u $HOME/.crm114/ ss-pam --force). Finally, all good messages are identified as ham to CRM114.

In conclusion, I found this method considerably more efficient than using the web interface and, of course, it allows working offline, which is a must-have for me.
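For anyone who prefers to pre-sort a scored mailbox before opening it in mutt, the same X-CRM114-Status header can be read with a short script. This is only a minimal sketch using Python's standard mailbox module, not part of the workflow described above; the function names are made up for illustration, and it assumes the header value always starts with Good, SPAM or UNSURE as in the three examples quoted in the message.

```python
import mailbox
from collections import defaultdict


def crm114_verdict(header_value):
    """Map an X-CRM114-Status header value to 'good', 'spam' or 'unsure'.

    Assumption: the first word of the header is the verdict, as in the
    examples quoted above. Anything else (including a missing header)
    is treated as 'unsure' so that a human looks at it.
    """
    if header_value is None:
        return "unsure"
    words = str(header_value).split()
    first = words[0].upper() if words else ""
    if first == "GOOD":
        return "good"
    if first == "SPAM":
        return "spam"
    return "unsure"


def split_by_verdict(mbox_path):
    """Group the Message-IDs of a scored mbox file by CRM114 verdict."""
    groups = defaultdict(list)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            verdict = crm114_verdict(msg.get("X-CRM114-Status"))
            # Fall back to the mailbox key if a message has no Message-ID.
            groups[verdict].append(msg.get("Message-ID", key))
    finally:
        box.close()
    return dict(groups)
```

Calling split_by_verdict("debian-boot.200608.scored") would return a dictionary with up to three lists. The script only classifies; tagging, reporting and training CRM114 still happen as described in the message.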
signature.asc Description: Digital signature
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Stefano Canepa (s...@linux.it):
> I started with 2006/01, added my nick into the table on the wiki.

Could you do 2008/08 to 2009/01? These are the most recent ones that still have only 4 reviews. 2007/08 to 2007/12 are also good targets.

Stefano, also don't forget to increase the number of reviews when adding your nick to a month (I corrected the two months you did yesterday, FWIW).

Great work, everybody, by the way. I recently went through a month that had already got its 5 reviews and where spam had obviously been cleaned out, and this is impressive. Before that action, we had huge spam storms from time to time that completely cluttered the archives.

To be even more efficient, I wonder if there's a possibility to download list archives as a mailbox. That would make spam tagging more efficient than going through the web interface.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Mon, 01/06/2009 at 08.59 +0200, Christian Perrier wrote:
> Could you do 2008/08 to 2009/01? These are the most recent ones that still have only 4 reviews. 2007/08 to 2007/12 are also good targets.

OK, I'm going to review them today.

> Stefano, also don't forget to increase the number of reviews when adding your nick to a month (I corrected the two months you did yesterday, FWIW).

Sorry for my mistake.

> To be even more efficient, I wonder if there's a possibility to download list archives as a mailbox. That would make spam tagging more efficient than going through the web interface.

I think that:
- a link to get back to the list you are reviewing from the thanks page, and
- a link added at the end of the email so that you can mark spam from your MUA
would be helpful. I'm thinking of opening wishlist bugs.

Bye
Stefano
-- 
Stefano Canepa aka sc: s...@linux.it - http://www.stefanocanepa.it
Three great virtues of a programmer: laziness, impatience and hubris.
Le tre grandi virtù di un programmatore: pigrizia, impazienza e arroganza. (Larry Wall)
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Monday 01 June 2009, Christian Perrier wrote: To be even more efficient, I wonder if there's a possibility to download list archives as a mailbox. That would make spam tagging more efficient than going through the web interface. scp master.debian.org:~debian/lists/debian-boot/debian-boot.mm.gz . Only works for DDs obviously. Disadvantage is that this archive will still have all spam that's already been removed... I'm sticking with the web interface myself. -- To UNSUBSCRIBE, email to debian-boot-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> Only works for DDs obviously. Disadvantage is that this archive will still have all spam that's already been removed... I'm sticking with the web interface myself.

Thanks. I'll make a few tries. It's probably OK to use the mailbox for the first reviews, when it's very likely that very little spam has already been removed.

With the web interface, I have already found a fairly fast way to move around the archives, particularly when there's a big bunch of successive spams. It works with Konqueror:

- Click on the first spam
- Tab, quickly read the file to check this is a spam, Enter
- Alt-Left, Alt-Left
- Tab (moves to the next message), and so on

That saves many clicks, which, with a web interface, is often the most time-consuming activity..:-)
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Mon, 01/06/2009 at 20.28 +0200, Christian Perrier wrote:
> With the web interface, I have already found a fairly fast way to move around the archives, particularly when there's a big bunch of successive spams. It works with Konqueror: Click on the first spam; Tab, quickly read the file to check this is a spam, Enter; Alt-Left, Alt-Left; Tab (moves to the next message), and so on

The same applies to iceweasel and epiphany.

Bye
Stefano

PS: Christian, sorry, I hit reply to sender instead of reply to list.
-- 
Stefano Canepa aka sc: s...@linux.it - http://www.stefanocanepa.it
Three great virtues of a programmer: laziness, impatience and hubris.
Le tre grandi virtù di un programmatore: pigrizia, impazienza e arroganza. (Larry Wall)
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
> ...
> Current status can be seen on: http://wiki.debian.org/DebianInstaller/SpamClean
> Additional help to scan the archive and nominate posts is always welcome.

I can do some work, tell me which month needs more help.

Bye
Stefano
-- 
Stefano Canepa aka sc: s...@linux.it - http://www.stefanocanepa.it
Three great virtues of a programmer: laziness, impatience and hubris.
Le tre grandi virtù di un programmatore: pigrizia, impazienza e arroganza. (Larry Wall)
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sunday 31 May 2009, Stefano Canepa wrote:
> On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
> > Current status can be seen on: http://wiki.debian.org/DebianInstaller/SpamClean
> > Additional help to scan the archive and nominate posts is always welcome.
>
> I can do some work, tell me which month needs more help.

That's great. Basically any month that has not yet had 5 reviews is a target. I'd suggest starting with the months having the lowest number of reviews (2006/01-04) and then the months in 2009, 2008 and 2007 with only 4 reviews.

It would be great if 2008 and 2009 could get full coverage (5 reviews for all months) this week.

Cheers,
FJP

P.S. I plan to start on years before 2006 next week.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sun, 31/05/2009 at 19.51 +0200, Frans Pop wrote:
> On Sunday 31 May 2009, Stefano Canepa wrote:
> > On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
> > > Current status can be seen on: http://wiki.debian.org/DebianInstaller/SpamClean
> > > Additional help to scan the archive and nominate posts is always welcome.
> >
> > I can do some work, tell me which month needs more help.
>
> That's great. Basically any month that has not yet had 5 reviews is a target. I'd suggest starting with the months having the lowest number of reviews (2006/01-04) and then the months in 2009, 2008 and 2007 with only 4 reviews.

I started with 2006/01, added my nick into the table on the wiki.

Bye
sc
-- 
Stefano Canepa aka sc: s...@linux.it - http://www.stefanocanepa.it
Three great virtues of a programmer: laziness, impatience and hubris.
Le tre grandi virtù di un programmatore: pigrizia, impazienza e arroganza. (Larry Wall)
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Hi,

Christian Perrier <bubu...@debian.org> wrote:
> I found it interesting to see that among the months I recently worked on, September 2007 had a huge amount of spam (including a terrible spam storm in the middle of the month), August 2007 had a fairly high number, but May, June and July had nearly no spam at all.

Did they receive at least 5 nominations? On http://wiki.debian.org/DebianInstaller/SpamClean there is only a small number of months in 2006 and 2007 that received checks by at least 5 persons.

Holger
-- 
Created with Sylpheed 2.5.0 under the NEW DEBIAN GNU/LINUX 5.0.0 - L E N N Y
http://counter.li.org/, Registered LinuxUser #311290
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Holger Wansing (li...@wansing-online.de):
> Hi,
>
> Christian Perrier <bubu...@debian.org> wrote:
> > I found it interesting to see that among the months I recently worked on, September 2007 had a huge amount of spam (including a terrible spam storm in the middle of the month), August 2007 had a fairly high number, but May, June and July had nearly no spam at all.
>
> Did they receive at least 5 nominations? On http://wiki.debian.org/DebianInstaller/SpamClean there is only a small number of months in 2006 and 2007 that received checks by at least 5 persons.

Yes. That's what surprised me. Then I figured out that maybe some *other* people did reviews without mentioning it on the wiki page, or did such reviews before the wiki page was set up.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sunday 17 May 2009, Frans Pop wrote:
> On Sunday 03 May 2009, Frans Pop wrote:
> > I'm looking forward to a cleaner archive! If we share the workload a bit, that should be possible.
>
> We now have a solid team working on this and if we keep this up it looks like in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006 and later.

Excellent progress again. This week a massive 1125 spams got removed. The number of new posts available for review remains fairly constant: 600.

It's also fairly clear what's left to do. "considered" is almost fully explained by "removed" + "classified ham" + the 600 available for review.

That leaves the difference between "nominated" and "considered" as our to-do list. These are posts that have received at least one nomination, but not yet the five needed to enter the review stage. There will be some incorrect nominations in there, but I expect most to be spam from months that have not had a full scan yet.

Updated status can be seen on: http://wiki.debian.org/DebianInstaller/SpamClean

Cheers,
FJP
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> On Sunday 17 May 2009, Frans Pop wrote:
> > On Sunday 03 May 2009, Frans Pop wrote:
> > > I'm looking forward to a cleaner archive! If we share the workload a bit, that should be possible.
> >
> > We now have a solid team working on this and if we keep this up it looks like in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006 and later.
>
> Excellent progress again. This week a massive 1125 spams got removed. The number of new posts available for review remains fairly constant: 600.

As every week, I performed a full review of the 620 I had to review, this morning. I guess you did so too, Frans.

I found it interesting to see that among the months I recently worked on, September 2007 had a huge amount of spam (including a terrible spam storm in the middle of the month), August 2007 had a fairly high number, but May, June and July had nearly no spam at all.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
(cleaning spam in debian-boot: see http://wiki.debian.org/DebianInstaller/SpamClean)

Quoting Frans Pop (elen...@planet.nl):
> have been done. This has already resulted in 676 spams being removed from the archive and for this week another 650 posts are waiting for review.

Done this morning. No ham found, only spam.

I found out that I inadvertently left a few posts rated as "Unsure", which is apparently not easy to go back to once you've clicked "Send and Continue". That happens because my fast "Click, Page Down" repetitions on the page sometimes fail (the click does not change the button status to "Spam"). So, my reviews for this week probably have a few (less than 10) posts erroneously rated as "Unsure" while everything was undoubtedly spam.

> We now have a solid team working on this and if we keep this up it looks like in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006 and later.

I wish we had as many people working on patches to D-I..:-)

A few enhancements I would propose to listmasters (or anyone behind the review tool):

- have a display mode for the list archives where messages already nominated would be shown in a different way (maybe sorting messages by number of spam nominations?). That would help people who review archives after 1 or 2 others have already done so to spot possible spam more easily
- allow reviewing more than 10 nominated posts at a time. This is probably what slows me down the most when reviewing. For the record, this morning, I spent about 40 minutes reviewing 550 nominated posts.
- allow coming back to messages once rated as "Unsure"

Finally, once we've done 2006-2009, I think we should post something in http://wiki.debian.org/DeveloperNews. I don't know whether other people are doing such reviews, but the method you (Frans) proposed and which we now use could indeed inspire other lists to follow suit.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Hello! You (Christian Perrier) wrote:

> A few enhancements I would propose to listmasters (or anyone behind the review tool):
> - have a display mode for the list archives where messages already nominated would be shown in a different way (maybe sorting messages by number of spam nominations?). That would help people who review archives after 1 or 2 others have already done so to spot possible spam more easily

Although I don't fully understand your suggestion, I fear that this would lower the quality of the review, because reviewers would rely on other reviewers.

> - allow reviewing more than 10 nominated posts at a time. This is probably what slows me down the most when reviewing. For the record, this morning, I spent about 40 minutes reviewing 550 nominated posts.

I chose 10 because I think that's a value that doesn't drive away 'part-time' reviewers. But I'm thinking about providing pages with more.

> - allow coming back to messages once rated as "Unsure"

Once a week (Sunday 6:00 GMT) a job runs which picks up all ratings and removes the articles and so on. After that, articles rated as 'Unsure' will be displayed again.

Yours,
Cord, Debian Listmaster of the day
-- 
http://lists.debian.org
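Since the question of when the weekly job runs comes up more than once in this thread, the schedule Cord gives can be turned into a date with a few lines of Python. This is a small sketch based only on the "Sunday 6:00 GMT" statement above; it treats GMT as UTC, the function name is invented for illustration, and it says nothing about how long the job itself takes.

```python
from datetime import datetime, timedelta, timezone


def next_batch_run(now):
    """Return the next Sunday 06:00 UTC strictly after `now`.

    `now` must be a timezone-aware datetime. The Sunday 06:00 schedule
    is taken from the message above; treating GMT as UTC is an
    assumption of this sketch.
    """
    now = now.astimezone(timezone.utc)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=6, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # It is already Sunday 06:00 or later: wait for next week.
        candidate += timedelta(days=7)
    return candidate
```

For example, a rating left as 'Unsure' on Monday 18 May 2009 would, under these assumptions, be picked up by the run of Sunday 24 May 2009 at 06:00 UTC and shown again after that.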
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Cord Beermann (c...@debian.org):
> Hello! You (Christian Perrier) wrote:
>
> > A few enhancements I would propose to listmasters (or anyone behind the review tool):
> > - have a display mode for the list archives where messages already nominated would be shown in a different way (maybe sorting messages by number of spam nominations?). That would help people who review archives after 1 or 2 others have already done so to spot possible spam more easily
>
> Although I don't fully understand your suggestion, I fear that this would lower the quality of the review, because reviewers would rely on other reviewers.

Yes, this is what I was suggesting, roughly: making it easier to spot what is more likely to be spam. I agree this makes reviewers depend on other reviewers, so that can be seen as debatable..:)

> > - allow reviewing more than 10 nominated posts at a time. This is probably what slows me down the most when reviewing. For the record, this morning, I spent about 40 minutes reviewing 550 nominated posts.
>
> I chose 10 because I think that's a value that doesn't drive away 'part-time' reviewers. But I'm thinking about providing pages with more.

*That* would help a lot.

> > - allow coming back to messages once rated as "Unsure"
>
> Once a week (Sunday 6:00 GMT) a job runs which picks up all ratings and removes the articles and so on. After that, articles rated as 'Unsure' will be displayed again.

Oh, that's perfect, in that case.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sunday 03 May 2009, Frans Pop wrote: I'm looking forward to a cleaner archive! If we share the workload a bit, that should be possible. First of all: many thanks for the great response to this RFH! Progress on the review of the archive has been huge. Since the start over 11,000 nominations as spam have been submitted and about 3400 reviews have been done. This has already resulted in 676 spams being removed from the archive and for this week another 650 posts are waiting for review. We now have a solid team working on this and if we keep this up it looks that in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006 and later. Current status can be seen on: http://wiki.debian.org/DebianInstaller/SpamClean Additional help to scan the archive and nominate posts is always welcome. Cheers, FJP signature.asc Description: This is a digitally signed message part.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> > OK, starting to go through the review process...I guess you will go through it as well.
>
> Well, I'm already done... Think I had 1 or 2 ham messages.

I'm done too, finished yesterday. I confirm there were 1 or 2 ham messages, that's all.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> Note that the review batch job (including actual removals, updating messages to be reviewed and review statistics) is only run once a week.

Any idea when this is run?

Ever since some months started getting entries on the wiki page (a few days ago), I have been checking the "messages to be reviewed" page, but there were none for -boot, so I guess this is because that batch hasn't run yet.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sunday 10 May 2009, Christian Perrier wrote:
> Quoting Frans Pop (elen...@planet.nl):
> > Note that the review batch job (including actual removals, updating messages to be reviewed and review statistics) is only run once a week.
>
> Any idea when this is run?

Well, they are updated now while they were not yet updated yesterday. I'll let you draw your own conclusions from that ;-)

233 messages removed so far and 700 new messages available for review...
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> Well, they are updated now while they were not yet updated yesterday. I'll let you draw your own conclusions from that ;-)
>
> 233 messages removed so far and 700 new messages available for review...

Ouch...we worked too well.

OK, starting to go through the review process...I guess you will go through it as well.

I think it would be better if no other DD wastes time doing reviews, as I guess that all messages that we have both rated as spam will be dropped from the review queue as of next Sunday (if my guess is right that only two Spam ratings are enough, provided nobody rates the same messages as Ham or Inappropriate).

One should note that, so far, all the messages I have had to review over the last half hour or so were indeed spam, while several of the 299 messages I reviewed before were ham.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Sunday 10 May 2009, Christian Perrier wrote:
> Quoting Frans Pop (elen...@planet.nl):
> > Well, they are updated now while they were not yet updated yesterday. I'll let you draw your own conclusions from that ;-)
> >
> > 233 messages removed so far and 700 new messages available for review...
>
> Ouch...we worked too well.
>
> OK, starting to go through the review process...I guess you will go through it as well.

Well, I'm already done... Think I had 1 or 2 ham messages.

> I think it would be better if no other DD wastes time doing reviews, as I guess that all messages that we have both rated as spam will be dropped from the review queue as of next Sunday (if my guess is right that only two Spam ratings are enough, provided nobody rates the same messages as Ham or Inappropriate).

No, we need 3 undisputed reviews.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Hi,

Christian Perrier <bubu...@debian.org> wrote:
> From my own experience, it's fairly easy to miss spams in the lists of messages, so we really need a few more people (about 1 or 2, I think) to go through the archives.

I want to help here. I will start at April 2009 and work backwards (I will document this at http://wiki.debian.org/DebianInstaller/SpamClean).

Holger
-- 
Created with Sylpheed 2.5.0 under the NEW DEBIAN GNU/LINUX 5.0.0 - L E N N Y
http://counter.li.org/, Registered LinuxUser #311290
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Holger Wansing (li...@wansing-online.de):
> Hi,
>
> Christian Perrier <bubu...@debian.org> wrote:
> > From my own experience, it's fairly easy to miss spams in the lists of messages, so we really need a few more people (about 1 or 2, I think) to go through the archives.
>
> I want to help here. I will start at April 2009 and work backwards (I will document this at http://wiki.debian.org/DebianInstaller/SpamClean).

Unless Frans advises otherwise, I'd suggest concentrating on months that haven't had 5 reviews already.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Hi,

Christian Perrier <bubu...@debian.org> wrote:
> Unless Frans advises otherwise, I'd suggest concentrating on months that haven't had 5 reviews already.

At the moment there are only two months which have already had 5 (or more) reviews.

Holger
-- 
Created with Sylpheed 2.5.0 under the NEW DEBIAN GNU/LINUX 5.0.0 - L E N N Y
http://counter.li.org/, Registered LinuxUser #311290
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Saturday 09 May 2009, Christian Perrier wrote:
> Quoting Holger Wansing (li...@wansing-online.de):
> > Hi,
> >
> > Christian Perrier <bubu...@debian.org> wrote:
> > > From my own experience, it's fairly easy to miss spams in the lists of messages, so we really need a few more people (about 1 or 2, I think) to go through the archives.
> >
> > I want to help here. I will start at April 2009 and work backwards (I will document this at http://wiki.debian.org/DebianInstaller/SpamClean).
>
> Unless Frans advises otherwise, I'd suggest concentrating on months that haven't had 5 reviews already.

Note that the review batch job (including actual removals, updating messages to be reviewed and review statistics) is only run once a week.

This means that if a month in the archive has already had 2 or 3 checks by others, it probably makes sense to wait a week or two [1], as there's a good chance that some messages will already have been removed by then.

I expect quite a few spam messages to already have a few nominations, so having 2 or 3 additional nominations may be enough to start getting them reviewed and removed. We should eventually have 5 scans for every month, but it seems smart to delay the last 2 for some time to avoid needless work.

[1] One week for messages with 5+ nominations to be included for review and one more week for actual reviews by DDs followed by the actual removal.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> Well, new messages will be nominated all the time, especially if enough people respond to this RFH.

As of now, there seem to be 5 people working on the initial step (identifying spam in list archives): fjp, philbat, bubulle, fp, dww.

That's just enough to bring potential spam mails to the second step (review), as each must receive 5 nominations.

From my own experience, it's fairly easy to miss spams in the lists of messages, so we really need a few more people (about 1 or 2, I think) to go through the archives.

Interestingly, this is a task that one can do in relatively small chunks: it takes me about 15 minutes to go through one month (about 1000 messages)...

> Take a look at http://wiki.debian.org/DebianInstaller/SpamClean, the second table, which has these statistics for d-boot.
>
> That shows we now have 18297 nominations in total for 4657 different messages. Of these 659 are "considered" (have received enough nominations to get to the review stage), so IIUC that's the number that needs to be reviewed and that should probably be the total you see for d-boot when you select the list for reviewing.
>
> There have been 591 reviews (although I doubt that number a bit) and the end result is that so far 16 messages have been removed from the archive.

I went through all the mails to be reviewed.

For a post to be removed, it has to be rated as spam by at least THREE DDs (assuming nobody rates it as Ham or Inappropriate).

So we need at least one more DD to commit to doing reviews.
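The two thresholds mentioned in this message (5 nominations to reach review, 3 spam ratings by DDs with no Ham or Inappropriate rating for removal) can be written down as a small decision function. This is only a sketch of the rules as stated in the thread, not the listmasters' actual tool; the function name, the status labels and the handling of disputed or 'Unsure' ratings are assumptions made for illustration.

```python
NOMINATIONS_FOR_REVIEW = 5    # from the thread: 5 nominations to enter review
SPAM_RATINGS_FOR_REMOVAL = 3  # from the thread: 3 undisputed spam reviews


def post_status(nominations, ratings):
    """Classify a post given its nomination count and DD review ratings.

    `ratings` is a list of strings: 'spam', 'ham', 'inappropriate' or
    'unsure'. How the real tool treats disputed posts or 'unsure'
    ratings is not described in the thread; here a single 'ham' or
    'inappropriate' rating blocks removal and 'unsure' is ignored.
    """
    if nominations < NOMINATIONS_FOR_REVIEW:
        return "nominated"
    if any(r in ("ham", "inappropriate") for r in ratings):
        return "disputed"
    if ratings.count("spam") >= SPAM_RATINGS_FOR_REMOVAL:
        return "remove"
    return "in_review"
```

With only two DDs reviewing, post_status(5, ["spam", "spam"]) stays at "in_review", which is why a third reviewer is needed before anything gets removed.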
Re: Request for help - cleaning spam from the debian-boot mailing list archive
Quoting Frans Pop (elen...@planet.nl):
> The removal of spam gets done in three stages:

I started working on this yesterday (apparently a few others have too, which is good).

I did a little bit of:

> 1) a spam message needs to be reported by multiple people (using the "Report as spam" button displayed at the top of each message)

But also tried to go through:

> 2) this then needs to be reviewed by multiple DDs (using the new tools)

Is there a way, with that step, to know *how many* messages are left?

When going to these new tools, recorded spam messages are shown in batches of 10, and clicking on "Send and continue" shows you another batch of 10 and so on, in a more or less random order. So, indeed, there is no indication whether we are close to the end or far away from it, etc...

I have seen this page: http://lists.debian.org/archive-spam-removals/review/stats.html

But I don't know if some of the numbers there are meaningful wrt this task of reviewing reported spam.

Anyway, thanks for the initiative, Frans. I missed the appearance of these new tools and it seems that they'll be very useful to clean out our archives.
Re: Request for help - cleaning spam from the debian-boot mailing list archive
On Monday 04 May 2009, Christian Perrier wrote:
> > 2) this then needs to be reviewed by multiple DDs (using the new tools)
>
> Is there a way, with that step, to know *how many* messages are left?

The page where you select a mailing list shows how many nominated messages there are to be reviewed and how many have already been reviewed by you.

> When going to these new tools, recorded spam messages are shown in batches of 10, and clicking on "Send and continue" shows you another batch of 10 and so on, in a more or less random order. So, indeed, there is no indication whether we are close to the end or far away from it, etc...

Well, new messages will be nominated all the time, especially if enough people respond to this RFH.

> I have seen this page: http://lists.debian.org/archive-spam-removals/review/stats.html
>
> But I don't know if some of the numbers there are meaningful wrt this task of reviewing reported spam.

Take a look at http://wiki.debian.org/DebianInstaller/SpamClean, the second table, which has these statistics for d-boot.

That shows we now have 18297 nominations in total for 4657 different messages. Of these 659 are "considered" (have received enough nominations to get to the review stage), so IIUC that's the number that needs to be reviewed and that should probably be the total you see for d-boot when you select the list for reviewing.

There have been 591 reviews (although I doubt that number a bit) and the end result is that so far 16 messages have been removed from the archive.

I have some questions about some of the numbers and will send a mail to listmasters a bit later to request clarification.

Cheers,
FJP
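To put the figures quoted above in perspective, a quick back-of-the-envelope calculation helps. The sketch below uses only the numbers from this message; the function and key names are invented for illustration, and the wiki table may define its columns slightly differently.

```python
def nomination_summary(nominations, nominated_messages, considered):
    """Derive two simple figures from the wiki statistics.

    - average number of nominations per nominated message
    - number of nominated messages that have not yet reached the
      review stage (nominated minus considered)
    """
    return {
        "avg_nominations": nominations / nominated_messages,
        "not_yet_considered": nominated_messages - considered,
    }


# Figures quoted in the message of 4 May 2009.
summary = nomination_summary(
    nominations=18297, nominated_messages=4657, considered=659
)
print(f"average nominations per message: {summary['avg_nominations']:.2f}")
print(f"nominated but not yet in review: {summary['not_yet_considered']}")
```

On these numbers the average is a little under 4 nominations per message, and 3998 nominated messages had not yet reached the review stage, which fits the call for one or two more people to scan each month.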
Request for help - cleaning spam from the debian-boot mailing list archive
Hi all,

Yes, I do mean all: D-I team, D-I users, lurkers, everybody!

Now that the listmasters have created a working toolset to actually remove spam from the mailing list archives [1], it seems like a good idea to make a coordinated effort to clean spam from the list archives.

The removal of spam gets done in three stages:
1) a spam message needs to be reported by multiple people (using the "Report as spam" button displayed at the top of each message)
2) this then needs to be reviewed by multiple DDs (using the new tools)
3) the actual removal

The call for help here is mainly for 1) and is something everybody can help with. To make sure not too many people work on the same messages, I've created a wiki page to coordinate this step: http://wiki.debian.org/DebianInstaller/SpamClean

The idea is that you add your name/nick/initials when you start the review of a month. 3 or 4 people for each month should be enough. Please do review the complete month when you add your name!

But we'll also need a few DDs doing 2) on a regular basis.

I myself have today done 1) for the months Jan-Apr of 2009 and 2) for all open reports. I intend to continue working on this for the next few months.

This also showed that the three-step procedure is very much needed: even with the safeguards, surprisingly many of the messages reported as spam were valid messages.

I'm looking forward to a cleaner archive! If we share the workload a bit, that should be possible.

Cheers,
FJP

[1] http://lists.debian.org/debian-devel-announce/2009/04/msg00012.html (item: "RFH: Removing spam from the listarchive")
    http://wiki.debian.org/Teams/ListMaster/ListArchiveSpam