Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-08 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):
 On Monday 01 June 2009, Christian Perrier wrote:
  To be even more efficient, I wonder if there's a possibility to
  download list archives as a mailbox. That would make spam tagging more
  efficient than going through the web interface.
 
 scp master.debian.org:~debian/lists/debian-boot/debian-boot.mm.gz .
 
 Only works for DDs obviously. Disadvantage is that this archive will still 
 have all spam that's already been removed...
 I'm sticking with the web interface myself.


Yesterday, I grabbed several such mailboxes.

Before working on them, I passed the messages through CRM114, which I
have already been using for a while to score my incoming messages:

zcat debian-boot.200608.gz | formail -s /usr/bin/crm -u /home/bubulle/.crm114/ mailfilter.crm > debian-boot.200608.scored


That creates a new scored mailbox where messages have additional
headers, including:

X-CRM114-Status: Good  ( pR: 161.9126 )
or
X-CRM114-Status: UNSURE (1.1278) This message is 'unsure'; please train it!
or
X-CRM114-Status: SPAM  ( pR: -15.1978 )
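To get a quick count of how a scored mailbox splits before opening it in mutt, the three header forms shown above can be parsed. This is a minimal Python sketch, assuming the scored output is an uncompressed mbox file and that the header always takes one of these three shapes; `parse_status` and `tally_mbox` are hypothetical helper names, not part of CRM114 or formail.

```python
import mailbox
import re
from collections import Counter

# Matches the three header shapes quoted above, e.g.
#   "Good  ( pR: 161.9126 )", "UNSURE (1.1278) ...", "SPAM  ( pR: -15.1978 )"
STATUS_RE = re.compile(
    r"^\s*(Good|UNSURE|SPAM)\s*\(\s*(?:pR:\s*)?(-?\d+(?:\.\d+)?)\s*\)",
    re.IGNORECASE,
)

def parse_status(value):
    """Return (verdict, score) for an X-CRM114-Status value, or None."""
    if value is None:
        return None
    match = STATUS_RE.match(value)
    if not match:
        return None
    return match.group(1).upper(), float(match.group(2))

def tally_mbox(path):
    """Count verdicts in a mailbox produced by the formail | crm pipeline."""
    counts = Counter()
    for msg in mailbox.mbox(path):
        parsed = parse_status(msg.get("X-CRM114-Status"))
        counts[parsed[0] if parsed else "UNSCORED"] += 1
    return counts
```

Anything with a missing or unrecognised header is counted as UNSCORED, so a large UNSCORED count would suggest the scoring step did not run as expected.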


In my .muttrc, I have this:
color header white black ^X-CRM114-Status:.*Good.*
color header blue black ^X-CRM114-Status:.*SPAM.*
color header red black ^X-CRM114-Status:.*UNSURE.*

Then I read this mailbox with mutt.

Unsure messages appear in cyan and certain spam appears in red.

Then I can easily tag messages ('T' in mutt's default keymapping)
using the colors as a guide (of course I *do* check for false
positives), and also go through the messages identified as non-spam
and tag those that are actually spam.

Then all these tagged messages are piped to my "report list spam"
macro and are also reported as spam to CRM114 (by piping them to
$HOME/.crm114/mailfilter.crm -u $HOME/.crm114/ ss-pam --force).

Then, all good messages are identified as ham to CRM114.


In conclusion, I found this method much more efficient than using
the web interface and, of course, it allows working offline, which
is a must-have for me.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-01 Thread Christian Perrier
Quoting Stefano Canepa (s...@linux.it):

 I started with 2006/01, added my nick into the table on the wiki.


Could you do 2008/08 to 2009/01? These are the most recent ones that
still have only 4 reviews. 2007/08 to 2007/12 are also good
targets.

Stefano, also don't forget about increasing the number of reviews
when adding your nick to a month (I corrected the two months you did
yesterday FWIW).

Great work, everybody, by the way. I recently went through a month
that had already had its 5 reviews and where spam had obviously been
cleaned out, and the result is impressive. Before this effort, we had
huge spam storms from time to time that completely cluttered the
archives.

To be even more efficient, I wonder if there's a possibility to download
list archives as a mailbox. That would make spam tagging more
efficient than going through the web interface.







Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-01 Thread Stefano Canepa
On Mon, 01/06/2009 at 08.59 +0200, Christian Perrier wrote:
 Could you do 2008/08 to 2009/01? These are the most recent ones that
 still have only 4 reviews. 2007/08 to 2007/12 are also good
 targets.

OK, I'm going to review them today.

 Stefano, also don't forget about increasing the number of reviews
 when adding your nick to a month (I corrected the two months you did
 yesterday FWIW).

Sorry for my mistake.

 To be even more efficient, I wonder if there's a possibility to download
 list archives as a mailbox. That would make spam tagging more
 efficient than going through the web interface.

I think two things would help: a link on the "thanks" page to get back
to the list you are reviewing, and a link added at the end of each
email so that you can mark spam from your MUA. I'm thinking of opening
wishlist bugs.

Bye
Stefano

-- 
Stefano Canepa aka sc: s...@linux.it - http://www.stefanocanepa.it
Three great virtues of a programmer: laziness, impatience and hubris.
Le tre grandi virtù di un programmatore: pigrizia, impazienza e
arroganza. (Larry Wall)




Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-01 Thread Frans Pop
On Monday 01 June 2009, Christian Perrier wrote:
 To be even more efficient, I wonder if there's a possibility to
 download list archives as a mailbox. That would make spam tagging more
 efficient than going through the web interface.

scp master.debian.org:~debian/lists/debian-boot/debian-boot.mm.gz .

Only works for DDs obviously. Disadvantage is that this archive will still 
have all spam that's already been removed...
I'm sticking with the web interface myself.
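Fetched this way, the archive arrives as one file rather than as per-month mailboxes like the debian-boot.200608.gz mentioned in this thread. A minimal Python sketch, assuming the file has been gunzipped to a plain mbox, of grouping messages by the YYYY/MM of their Date header so they line up with the per-month table on the wiki. `month_key` and `group_by_month` are hypothetical names, and grouping by the Date header (in the sender's own time zone) rather than by archive position is an assumption: badly dated spam lands in the wrong month or in "unknown".

```python
import mailbox
from collections import defaultdict
from email.utils import parsedate_to_datetime

def month_key(date_header):
    """Map an RFC 2822 Date header to 'YYYY/MM', or None if it cannot be parsed."""
    if not date_header:
        return None
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError, IndexError):
        # Older Pythons raise TypeError on garbage, newer ones ValueError.
        return None
    return f"{dt.year:04d}/{dt.month:02d}"

def group_by_month(path):
    """Return {'YYYY/MM' or 'unknown': [mbox keys]} for an uncompressed mbox."""
    groups = defaultdict(list)
    box = mailbox.mbox(path)
    try:
        for key, msg in box.iteritems():
            groups[month_key(msg.get("Date")) or "unknown"].append(key)
    finally:
        box.close()
    return dict(groups)
```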


-- 
To UNSUBSCRIBE, email to debian-boot-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-01 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

 Only works for DDs obviously. Disadvantage is that this archive will still 
 have all spam that's already been removed...
 I'm sticking with the web interface myself.


Thanks. I'll give it a try. It's probably OK to use the mailbox for
the first reviews, when it's very likely that very little spam has
already been removed.

With the web interface, I have already found a fairly fast way to move
around the archives, particularly when there's a long run of successive
spams. This works with Konqueror:

Click on first spam
Tab, quickly read the file to check this is a spam, Enter
Alt-Left, Alt-Left
Tab (moves to the next message)
and so on

That saves many clicks, which with a web interface are often the most
time-consuming part. :-)






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-06-01 Thread Stefano Canepa
On Mon, 01/06/2009 at 20.28 +0200, Christian Perrier wrote:
 With the web interface, I found a quite fast way to move around
 archives already, particularly when there's a big bunch of successive
 spams. That works with Konqueror:
 
 Click on first spam
 Tab, quickly read the file to check this is a spam, Enter
 Alt-Left, Alt-Left
 Tab (moves to the next message)
 and so on
 
The same works with Iceweasel and Epiphany.

Bye
Stefano

PS: Christian, sorry I hit reply to sender instead of reply to list.



Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-31 Thread Stefano Canepa
On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
...

 Current status can be seen on:
 http://wiki.debian.org/DebianInstaller/SpamClean
 
 Additional help to scan the archive and nominate posts is always welcome.

I can do some work, tell me which month needs more help.

Bye
Stefano



Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-31 Thread Frans Pop
On Sunday 31 May 2009, Stefano Canepa wrote:
 On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
  Current status can be seen on:
  http://wiki.debian.org/DebianInstaller/SpamClean
 
  Additional help to scan the archive and nominate posts is always
  welcome.

 I can do some work, tell me which month needs more help.

That's great.

Basically any month that has not yet had 5 reviews is a target.
I'd suggest starting with the months that have the lowest number of reviews
(2006/01-04) and then the months in 2009, 2008 and 2007 with only 4 
reviews.

It would be great if 2008 and 2009 could get full coverage (5 reviews for 
all months) this week.
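The suggested order (any month below 5 reviews is a target, lowest counts first, then 2009, 2008 and 2007) can be sketched in Python. The dict passed in is a hypothetical stand-in for the per-month table on the wiki, and breaking ties by taking newer months first is an assumption read from the suggestion above.

```python
REQUIRED_REVIEWS = 5  # each month should eventually get 5 full scans

def review_targets(reviews_per_month):
    """Return months ('YYYY/MM') with fewer than REQUIRED_REVIEWS scans.

    Months with the fewest scans come first; among months with the same
    count, newer months come first (an assumed tie-break).
    """
    def sort_key(month):
        year, mon = (int(part) for part in month.split("/"))
        return (reviews_per_month[month], -year, -mon)

    pending = [m for m, n in reviews_per_month.items() if n < REQUIRED_REVIEWS]
    return sorted(pending, key=sort_key)
```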

Cheers,
FJP

P.S. I plan to start on years before 2006 next week.





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-31 Thread Stefano Canepa
On Sun, 31/05/2009 at 19.51 +0200, Frans Pop wrote:
 On Sunday 31 May 2009, Stefano Canepa wrote:
  On Sun, 17/05/2009 at 06.29 +0200, Frans Pop wrote:
   Current status can be seen on:
   http://wiki.debian.org/DebianInstaller/SpamClean
  
   Additional help to scan the archive and nominate posts is always
   welcome.
 
  I can do some work, tell me which month needs more help.
 
 That's great.
 
 Basically any month that has not yet had 5 reviews is a target.
 I'd suggest starting with the months that have the lowest number of reviews
 (2006/01-04) and then the months in 2009, 2008 and 2007 with only 4 
 reviews.

I started with 2006/01, added my nick into the table on the wiki.

Bye
sc



Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-26 Thread Holger Wansing
Hi,

Christian Perrier bubu...@debian.org wrote:
 I found it interesting that among the months I recently worked on,
 September 2007 had a huge amount of spam (including a terrible spam
 storm in the middle of the month), August 2007 had a fairly high
 number, but May, June and July had nearly no spam at all.

Did they receive at least 5 nominations?
On http://wiki.debian.org/DebianInstaller/SpamClean there are only a
few months in 2006 and 2007 that have been checked by at least 5
people.



Holger

-- 

==
Created with Sylpheed 2.5.0
under the NEW DEBIAN GNU/LINUX 5.0.0 - L E N N Y
http://counter.li.org/,  Registered LinuxUser #311290
=





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-26 Thread Christian Perrier
Quoting Holger Wansing (li...@wansing-online.de):
 Hi,
 
 Christian Perrier bubu...@debian.org wrote:
  I found it interesting that among the months I recently worked on,
  September 2007 had a huge amount of spam (including a terrible spam
  storm in the middle of the month), August 2007 had a fairly high
  number, but May, June and July had nearly no spam at all.
 
 Did they receive at least 5 nominations?
 On http://wiki.debian.org/DebianInstaller/SpamClean there are only a
 few months in 2006 and 2007 that have been checked by at least 5
 people.


Yes. That's what surprised me. Then I figured that maybe some
*other* people did reviews without mentioning it on the wiki
page, or did such reviews before the wiki page was set up.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-24 Thread Frans Pop
On Sunday 17 May 2009, Frans Pop wrote:
 On Sunday 03 May 2009, Frans Pop wrote:
  I'm looking forward to a cleaner archive! If we share the workload a
  bit, that should be possible.

 We now have a solid team working on this and if we keep this up it
 looks like in 4 or 5 weeks we can have d-boot virtually clean of spam
 for 2006 and later.

Excellent progress again. This week a massive 1125 spams got removed.
The number of new posts available for review remains fairly constant: 600.

It's also fairly clear what's left to do. "Considered" is almost fully
explained by "removed" + "classified ham" + the 600 available for review.
That leaves the difference between "nominated" and "considered" as our
to-do list. These are posts that have received at least one nomination,
but not yet the five needed to enter the review stage. There will be some
incorrect nominations in there, but I expect most to be spams from months
that have not had a full scan yet.
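The accounting in this paragraph amounts to two subtractions. A minimal Python sketch; the class and field names are hypothetical and only mirror the status columns named above, and the first check is expected to be near zero rather than exactly zero, since "considered" is only almost fully explained.

```python
from dataclasses import dataclass

@dataclass
class ListStats:
    nominated: int       # distinct posts with at least one spam nomination
    considered: int      # posts that reached the review stage (5+ nominations)
    removed: int         # posts removed after review
    classified_ham: int  # posts reviewers rated as ham
    for_review: int      # posts currently waiting for DD review

    def unexplained(self):
        """'considered' not accounted for by removed + ham + for_review (ideally ~0)."""
        return self.considered - (self.removed + self.classified_ham + self.for_review)

    def todo(self):
        """Posts nominated at least once but not yet 5 times: the remaining scan work."""
        return self.nominated - self.considered
```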

Updated status can be seen on:
http://wiki.debian.org/DebianInstaller/SpamClean

Cheers,
FJP




Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-24 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):
 On Sunday 17 May 2009, Frans Pop wrote:
  On Sunday 03 May 2009, Frans Pop wrote:
   I'm looking forward to a cleaner archive! If we share the workload a
   bit, that should be possible.
 
  We now have a solid team working on this and if we keep this up it
  looks like in 4 or 5 weeks we can have d-boot virtually clean of spam
  for 2006 and later.
 
 Excellent progress again. This week a massive 1125 spams got removed.
 The number of new posts available for review remains fairly constant: 600.

As every week, I did a full review this morning of the 620 posts I had
to review. I guess you did so too, Frans.

I found it interesting that among the months I recently worked on,
September 2007 had a huge amount of spam (including a terrible spam
storm in the middle of the month), August 2007 had a fairly high
number, but May, June and July had nearly no spam at all.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-17 Thread Christian Perrier
(cleaning spam in debian-boot: see
http://wiki.debian.org/DebianInstaller/SpamClean)

Quoting Frans Pop (elen...@planet.nl):

 have been done. This has already resulted in 676 spams being removed from 
 the archive and for this week another 650 posts are waiting for review.

Done this morning. No ham found, only Spam.

I found out that I inadvertently left a few posts rated as "Unsure",
which apparently are not easy to come back to once you've clicked
"Send and Continue". That happens because my fast "Click, Page Down"
repetitions on the page sometimes fail (the click does not change the
button status to "Spam"). So my reviews for this week probably have a
few (fewer than 10) posts erroneously rated as "Unsure" although they
were all undoubtedly spam.

 
 We now have a solid team working on this and if we keep this up it looks 
 like in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006
 and later.

I wish we had as many people working on patches to D-I. :-)



A few enhancements I would propose to listmasters (or anyone behind
the review tool):

- have a display mode for the list archives where messages already
nominated would be shown in a different way (maybe sorting messages
by number of spam nominations?). That would help those people who
review archives after 1 or 2 people already did it to spot possible
spam more easily

- allow reviewing more than 10 nominated posts at a time. This is
probably what slows me down the most when reviewing. For the record,
this morning, I spent about 40 minutes reviewing 550 nominated posts.

- allow coming back to messages once they have been rated as Unsure
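The figures in the second item show why the page size matters. Assuming each batch of 10 is submitted separately, 550 posts in 40 minutes (2400 s) work out to:

```latex
\left\lceil \tfrac{550}{10} \right\rceil = 55\ \text{pages},\qquad
\frac{2400\ \text{s}}{55} \approx 43.6\ \text{s per page},\qquad
\frac{2400\ \text{s}}{550} \approx 4.4\ \text{s per post}
```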

Finally, once we've done 2006-2009, I think we should post something
in http://wiki.debian.org/DeveloperNews. I don't know whether other
people are doing such reviews, but the method you (Frans) proposed and
which we now use could well inspire similar efforts on other lists.








Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-17 Thread Cord Beermann
Hello! You (Christian Perrier) wrote:

 A few enhancements I would propose to listmasters (or anyone behind
 the review tool):
 
 - have a display mode for the list archives where messages already
 nominated would be shown in a different way (maybe sorting messages
 by number of spam nominations?). That would help those people who
 review archives after 1 or 2 people already did it to spot possible
 spam more easily

Although I don't fully understand your suggestion, I fear that this
would lower the quality of the reviews, because reviewers would rely
on other reviewers.

 - allow reviewing more than 10 nominated posts at a time. This is
 probably what slows me down the most when reviewing. For the record,
 this morning, I spent about 40 minutes reviewing 550 nominated posts.

I chose 10 because I think that's a value that doesn't drive away
'part-time' reviewers. But I am thinking about providing pages with more.

 - allow coming back to messages once they have been rated as Unsure

Once a week (Sunday 6:00 GMT) a job runs which picks up all ratings
and removes the articles, among other things. After that, articles rated as
'Unsure' will be displayed again. 
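Given that schedule, a minimal Python sketch of working out when the next batch run is due from any local time. It only assumes what is stated here (weekly, Sunday 06:00 GMT, treated as UTC); `next_batch_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

RUN_WEEKDAY = 6  # datetime.weekday(): Monday == 0 ... Sunday == 6
RUN_HOUR = 6     # 06:00 GMT, taken here as UTC

def next_batch_run(now):
    """Return the first Sunday 06:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=RUN_HOUR, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(RUN_WEEKDAY - now.weekday()) % 7)
    if candidate <= now:
        # Already past this week's run (or exactly at it): use next week's.
        candidate += timedelta(days=7)
    return candidate

# Example: a rating made on Sunday 17 May 2009 at 07:30 CEST (05:30 UTC)
print(next_batch_run(datetime(2009, 5, 17, 7, 30, tzinfo=timezone(timedelta(hours=2)))))
# 2009-05-17 06:00:00+00:00
```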

Yours,
Cord, Debian Listmaster of the day
-- 
http://lists.debian.org





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-17 Thread Christian Perrier
Quoting Cord Beermann (c...@debian.org):
 Hello! You (Christian Perrier) wrote:
 
  A few enhancements I would propose to listmasters (or anyone behind
  the review tool):
  
  - have a display mode for the list archives where messages already
  nominated would be shown in a different way (maybe sorting messages
  by number of spam nominations?). That would help those people who
  review archives after 1 or 2 people already did it to spot possible
  spam more easily
 
 Although I don't fully understand your suggestion, I fear that this
 would lower the quality of the reviews, because reviewers would rely
 on other reviewers.

Yes, that is roughly what I was suggesting: making it easier to spot
what is more likely to be spam.

I agree this makes reviewers depend on other reviewers, so it is
debatable. :)


 
  - allow reviewing more than 10 nominated posts at a time. This is
  probably what slows me down the most when reviewing. For the record,
  this morning, I spent about 40 minutes reviewing 550 nominated posts.
 
 I chose 10 because I think that's a value that doesn't drive away
 'part-time' reviewers. But I am thinking about providing pages with more.

*That* would help a lot.

 
  - allow coming back to messages once they have been rated as Unsure
 
 Once a week (Sunday 6:00 GMT) a job runs which picks up all ratings
 and removes the articles, among other things. After that, articles rated as
 'Unsure' will be displayed again. 


Oh, that's perfect, then.







Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-16 Thread Frans Pop
On Sunday 03 May 2009, Frans Pop wrote:
 I'm looking forward to a cleaner archive! If we share the workload a
 bit, that should be possible.

First of all: many thanks for the great response to this RFH!

Progress on the review of the archive has been huge. Since the start over 
11,000 nominations as spam have been submitted and about 3400 reviews 
have been done. This has already resulted in 676 spams being removed from 
the archive and for this week another 650 posts are waiting for review.

We now have a solid team working on this and if we keep this up it looks 
like in 4 or 5 weeks we can have d-boot virtually clean of spam for 2006
and later.

Current status can be seen on:
http://wiki.debian.org/DebianInstaller/SpamClean

Additional help to scan the archive and nominate posts is always welcome.

Cheers,
FJP




Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-14 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

  OK, starting to go through the review process...I guess you will go
  through it as well.
 
 Well, I'm already done...
 Think I had 1 or 2 ham messages.


I'm done too; I finished yesterday. I confirm there were 1 or 2 ham
messages, that's all.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-10 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

 Note that the review batch job (including actual removals, updating
 messages to be reviewed and review statistics) is only run once a week.

Any idea when this is run?

Since the moment (a few days ago) when some months reached enough
mentions on the wiki page, I have checked the "messages to be reviewed"
page, but there were none for -boot, so I guess this is because that
batch job hasn't run yet.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-10 Thread Frans Pop
On Sunday 10 May 2009, Christian Perrier wrote:
 Quoting Frans Pop (elen...@planet.nl):
  Note that the review batch job (including actual removals, updating
  messages to be reviewed and review statistics) is only run once a
  week.

 Any idea when this is run?

Well, they are updated now while they were not yet updated yesterday. I'll 
let you draw your own conclusions from that ;-)

233 messages removed so far and 700 new messages available for review...





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-10 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

 Well, they are updated now while they were not yet updated yesterday. I'll 
 let you draw your own conclusions from that ;-)
 
 233 messages removed so far and 700 new messages available for review...


Ouch...we worked too well.

OK, starting to go through the review process... I guess you will go
through it as well. I think it would be better if no other DD wastes
time doing reviews, as I guess that all messages that we have both
rated as spam will be dropped from the review queue as of next Sunday
(if my guess is right that only two Spam ratings are enough, provided nobody
rates the same messages as Ham or Inappropriate).

Note that, so far, all the messages I have had to review in the last
half hour were indeed spam, while several of the 299 messages I reviewed
before were ham.








Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-10 Thread Frans Pop
On Sunday 10 May 2009, Christian Perrier wrote:
 Quoting Frans Pop (elen...@planet.nl):
  Well, they are updated now while they were not yet updated yesterday.
  I'll let you draw your own conclusions from that ;-)
 
  233 messages removed so far and 700 new messages available for
  review...

 Ouch...we worked too well.

 OK, starting to go through the review process...I guess you will go
 through it as well.

Well, I'm already done...
Think I had 1 or 2 ham messages.

 I think it would be better if no other DD wastes 
 time doing reviews as I guess that all messages that we have both
 rated as spam will be dropped from the review queue as of next Sunday
 (if my guess is right that only two Spam ratings are enough, provided nobody
 rates the same messages as Ham or Inappropriate).

No, we need 3 undisputed reviews.





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-09 Thread Holger Wansing
Hi,

Christian Perrier bubu...@debian.org wrote:
 From my own experience, it's fairly easy to miss spams in the lists
 of messages, so we really need a few more people (about 1 or 2, I
 think) to go through the archives.

I want to help here.
I will start with April 2009 and work backwards
(and will document my progress at http://wiki.debian.org/DebianInstaller/SpamClean)


Holger




Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-09 Thread Christian Perrier
Quoting Holger Wansing (li...@wansing-online.de):
 Hi,
 
 Christian Perrier bubu...@debian.org wrote:
  From my own experience, it's fairly easy to miss spams in the lists
  of messages, so we really need a few more people (about 1 or 2, I
  think) to go through the archives.
 
 I want to help here.
  I will start with April 2009 and work backwards
  (and will document my progress at http://wiki.debian.org/DebianInstaller/SpamClean)


Unless Frans has other advice, I'd suggest concentrating on months
that haven't already had 5 reviews.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-09 Thread Holger Wansing
Hi,

Christian Perrier bubu...@debian.org wrote:
 Unless Frans has other advice, I'd suggest concentrating on months
 that haven't already had 5 reviews.

At the moment, only two months have already had 5 (or more)
reviews.


Holger




Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-09 Thread Frans Pop
On Saturday 09 May 2009, Christian Perrier wrote:
 Quoting Holger Wansing (li...@wansing-online.de):
  Hi,
 
  Christian Perrier bubu...@debian.org wrote:
   From my own experience, it's fairly easy to miss spams in the
   lists of messages, so we really need a few more people (about 1 or
   2, I think) to go through the archives.
 
  I want to help here.
  I will start with April 2009 and work backwards
  (and will document my progress at http://wiki.debian.org/DebianInstaller/SpamClean)

 Unless Frans has other advice, I'd suggest concentrating on months
 that haven't already had 5 reviews.

Note that the review batch job (including actual removals, updating
messages to be reviewed and review statistics) is only run once a week.

This means that if a month in the archive has already had 2 or 3 checks by 
others, it probably makes sense to wait a week or two [1] as there's a 
good chance that some messages will already be removed by then.

I expect quite a few spam messages to already have a few nominations, so 
having 2 or 3 additional nominations may be enough to start getting them 
reviewed and removed.
We should eventually have 5 scans for every month, but it seems smart to 
delay the last 2 for some time to avoid needless work.

[1] One week for messages with 5+ nominations to be included for review 
and one more week for actual reviews by DDs followed by the actual 
removal.
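The footnote describes a lag of two batch runs between a post collecting its fifth nomination and its removal: the first run puts it in the review queue, the second applies the DD reviews and removes it. A minimal Python sketch of that timing, assuming the reviews are completed in the week between runs; `earliest_removal` and its list of batch-run times are hypothetical.

```python
def earliest_removal(fifth_nomination, batch_runs):
    """Return the earliest removal time for a post, or None if not yet scheduled.

    fifth_nomination: datetime when the post received its 5th spam nomination.
    batch_runs: datetimes of the weekly batch job (any order).
    """
    upcoming = sorted(run for run in batch_runs if run > fifth_nomination)
    # 1st run after the 5th nomination: the post enters the review queue.
    # 2nd run: completed reviews are applied and the post is removed.
    return upcoming[1] if len(upcoming) >= 2 else None
```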





Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-05 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

 Well, new messages will be nominated all the time, especially if enough 
 people respond to this RFH.

As of now, there seem to be 5 people working on the initial step
(identifying spam in list archives): fjp, philbat, bubulle, fp, dww.

That's just enough to bring potential spam mails to the second step
(review), as each must receive 5 nominations.

From my own experience, it's fairly easy to miss spams in the lists
of messages, so we really need a few more people (about 1 or 2, I
think) to go through the archives.

Interestingly, this is a task that one can do in relatively small
chunks: it takes me about 15 minutes to go through one month (about 1000
messages)...

 Take a look at http://wiki.debian.org/DebianInstaller/SpamClean, the 
 second table, which has these statistics for d-boot. That shows we now 
 have 18297 nominations in total for 4657 different messages. Of these 659 
 are considered (have received enough nominations to get to the review 
 stage), so IIUC that's the number that needs to be reviewed and that 
 should probably be the total you see for d-boot when you select the list 
 for reviewing.
 
 There have been 591 reviews (although I doubt that number a bit) and the 
 end result is that so far 16 messages have been removed from the archive.


I went through all the to-be-reviewed mails.

For a post to be removed, it has to be rated as spam by at least THREE
DDs (assuming nobody rates it as Ham or Inappropriate).

So we need at least one more DD to commit to doing reviews.
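The thresholds mentioned in this thread (5 nominations before a post enters review, then at least three Spam ratings from DDs with none rating it Ham or Inappropriate) can be combined into one small Python sketch. The state labels are hypothetical; the thread does not say what happens to a disputed post or how Unsure ratings count, so the sketch just reports "disputed" and ignores Unsure.

```python
from collections import Counter

NOMINATIONS_FOR_REVIEW = 5    # spam reports needed before a post enters review
SPAM_RATINGS_FOR_REMOVAL = 3  # undisputed DD "Spam" ratings needed for removal

def post_state(nominations, ratings):
    """Return a (hypothetical) state label for one archived post.

    nominations: number of "Report as spam" nominations received.
    ratings: DD review ratings, each "Spam", "Ham", "Inappropriate" or "Unsure".
    """
    if nominations < NOMINATIONS_FOR_REVIEW:
        return "nominated"   # still waiting for enough reports
    votes = Counter(ratings)
    if votes["Ham"] or votes["Inappropriate"]:
        return "disputed"    # any dissenting rating blocks removal
    if votes["Spam"] >= SPAM_RATINGS_FOR_REMOVAL:
        return "remove"      # picked up by the next weekly batch job
    return "in review"       # Unsure ratings are ignored in this sketch
```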






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-04 Thread Christian Perrier
Quoting Frans Pop (elen...@planet.nl):

 The removal of spam gets done in three stages:

I started working on this yesterday (apparently a few others have
also, which is good).

I did a little bit of:

 1) a spam message needs to be reported by multiple people (using the
Report as spam button displayed at the top of each message)

But also tried to go through:


 2) this then needs to be reviewed by multiple DDs (using the new tools)


Is there a way, at that step, to know *how many* messages are
left?

When using these new tools, recorded spam messages are shown in
batches of 10, and clicking on "Send and Continue" shows you
another batch of 10, and so on, in a more or less random order.

So there is no indication of whether we are close to the end
or far from it.

I have seen this page:
http://lists.debian.org/archive-spam-removals/review/stats.html

But I don't know whether any of the numbers there are meaningful for
this task of reviewing reported spam.

Anyway, thanks for the initiative, Frans. I had missed the appearance of
these new tools, and it seems that they'll be very useful to clean out
our archives.






Re: Request for help - cleaning spam from the debian-boot mailing list archive

2009-05-04 Thread Frans Pop
On Monday 04 May 2009, Christian Perrier wrote:
  2) this then needs to be reviewed by multiple DDs (using the new
  tools)

 Is there a way, at that step, to know *how many* messages are
 left?

The page where you select a mailing list shows how many nominated messages
there are to be reviewed and how many have already been reviewed by you.

 When using these new tools, recorded spam messages are shown in
 batches of 10, and clicking on "Send and Continue" shows you
 another batch of 10, and so on, in a more or less random order.

 So there is no indication of whether we are close to the end
 or far from it.

Well, new messages will be nominated all the time, especially if enough
people respond to this RFH.

 I have seen this page:
 http://lists.debian.org/archive-spam-removals/review/stats.html

 But I don't know whether any of the numbers there are meaningful for
 this task of reviewing reported spam.

Take a look at http://wiki.debian.org/DebianInstaller/SpamClean, the 
second table, which has these statistics for d-boot. That shows we now 
have 18297 nominations in total for 4657 different messages. Of these 659 
are considered (have received enough nominations to get to the review 
stage), so IIUC that's the number that needs to be reviewed and that 
should probably be the total you see for d-boot when you select the list 
for reviewing.

There have been 591 reviews (although I doubt that number a bit) and the 
end result is that so far 16 messages have been removed from the archive.

I have some questions about some of the numbers and will send a mail to 
listmasters a bit later to request clarification.

Cheers,
FJP

