Update of /cvsroot/spambayes/spambayes/testtools
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv14471/testtools
Modified Files:
incremental.HOWTO.txt
Log Message:
Minor updates.
Index: incremental.HOWTO.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/testtools/incremental.HOWTO.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** incremental.HOWTO.txt 28 Dec 2003 01:12:11 -0000 1.5
--- incremental.HOWTO.txt 7 Apr 2005 01:34:41 -0000 1.6
***************
*** 1,9 ****
- Yes, this is a lame attempt at explaining what I've built,
- in the vain hope that someone will read it and improve it.
- I'm writing this with only about 4 hours sleep, so my
- coherency may not be particularly high.
-
-
-
There are a few steps to doing incremental training tests:
--- 1,2 ----
***************
*** 12,24 ****
sequence and group them. The corpora need to be in the
good old familiar Data/{Ham,Spam}/{reservoir,Set*} tree.
! For my purposes, I wrote the es2hs.py tool to grab stuff
! out of my real MH mail archive folders; other people may
! want some other method of getting the corpora into the
! tree.
2. Sort and group the corpora. When testing, messages will
be processed in sorted order. The messages should all
have unique names with a group number and an id number
! separated by a dash (eg. 0123-004556). I wrote
sort+group.py for this. sort+group.py sorts the messages
into chronological order (by topmost Received header) and
--- 5,18 ----
sequence and group them. The corpora need to be in the
good old familiar Data/{Ham,Spam}/{reservoir,Set*} tree.
! For my (Alex) purposes, I wrote the es2hs.py tool to grab
! stuff out of my real MH mail archive folders; other people
! may want some other method of getting the corpora into the
! tree. If you're using Outlook, then the
! Outlook2000/export.py script is what you are after.
2. Sort and group the corpora. When testing, messages will
be processed in sorted order. The messages should all
have unique names with a group number and an id number
! separated by a dash (eg. 0123-004556). I (Alex) wrote
sort+group.py for this. sort+group.py sorts the messages
into chronological order (by topmost Received header) and
***************
*** 30,35 ****
the oldest msg found.
! Note that this script will run through *all* the files in
! the Data directory, not just those in Data/Ham and Data/Spam.
3. Distribute the corpora into multiple sets so you can do
--- 24,32 ----
the oldest msg found.
! With 1.0.x, note that this script will run through *all* the
! files in the Data directory, not just those in Data/Ham and
! Data/Spam. With 1.1, only those specified in the
! ham_directories and spam_directories will be used, unless
! the -a option is used.
3. Distribute the corpora into multiple sets so you can do
***************
*** 64,71 ****
to do this, outputting datasets for plotmtv. plotmtv is
a really neat data visualization tool. Use it. Love it.
! Gods, I need more sleep.
See dotest.sh for a sample of automating steps 4 & 5.
-
- Please, somebody rewrite this file.
-
--- 61,65 ----
to do this, outputting datasets for plotmtv. plotmtv is
a really neat data visualization tool. Use it. Love it.
! XXX tools for Excel.
See dotest.sh for a sample of automating steps 4 & 5.
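The naming scheme in step 2 (a group number and an id separated by a dash, assigned in chronological order of the topmost Received header) can be sketched in Python, the language the spambayes testtools use. This is a minimal illustration, not sort+group.py itself. It assumes, hypothetically, that the group number counts whole days since the oldest message and that the id is a running sequence number across all messages. The helper names topmost_received_time and group_names are made up for this sketch.

```python
import email
from email.utils import parsedate_tz, mktime_tz

SECONDS_PER_DAY = 24 * 60 * 60


def topmost_received_time(raw):
    """Return the UTC timestamp from the topmost Received header, or None."""
    msg = email.message_from_string(raw)
    received = msg.get_all("Received") or []
    if not received:
        return None
    # The date stamp of a Received header follows its last ';'.
    date_part = received[0].rsplit(";", 1)[-1].strip()
    parsed = parsedate_tz(date_part)
    return mktime_tz(parsed) if parsed else None


def group_names(messages):
    """Map each message key to a hypothetical 'GGGG-IIIIII' name.

    Assumption: GGGG is whole days since the oldest message and
    IIIIII is the message's position in overall chronological order.
    Messages without a parseable Received date are skipped.
    """
    stamped = []
    for key, raw in messages.items():
        t = topmost_received_time(raw)
        if t is not None:
            stamped.append((t, key))
    stamped.sort()
    if not stamped:
        return {}
    oldest = stamped[0][0]
    names = {}
    for seq, (t, key) in enumerate(stamped):
        group = int((t - oldest) // SECONDS_PER_DAY)
        names[key] = "%04d-%06d" % (group, seq)
    return names
```

For example, with one message received at midnight on 1 Jan 2001, another six hours later, and a third two and a half days after the first, the names come out as 0000-000000, 0000-000001 and 0002-000002. Only the topmost Received header is used, because it was added last, by the receiving host, and so best reflects when the message actually arrived.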
_______________________________________________
Spambayes-checkins mailing list
http://mail.python.org/mailman/listinfo/spambayes-checkins