Re: [dspam-users] Trouble training DSPAM

2008-11-04 Thread Chris Baldwin

Kyle,


Thanks, that made a world of difference. Luckily, my co-workers like to 
hang on to old email, so I have a reasonably large selection of both 
spam and ham. My first pass, just to make sure everything worked, well, 
worked. This is what dspam_stats is giving me.


   TP:   368 TN:91 FP:14 FN: 0 SC: 0 NC: 0
   SHR:  100.00%   HSR:   13.33%   OCA:   97.04%
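
For reference, the three percentages in the dspam_stats output above can be reproduced from the four counters. This is a minimal sketch, assuming the usual definitions (SHR = spam hit rate, HSR = ham strike rate, OCA = overall accuracy); the function name `dspam_rates` is made up for illustration and is not part of DSPAM.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) as percentages from dspam_stats counters.

    Assumed definitions (not taken from the DSPAM source):
      SHR = TP / (TP + FN)   share of spam that was caught
      HSR = FP / (TN + FP)   share of ham wrongly flagged as spam
      OCA = (TP + TN) / all  share of messages classified correctly
    """
    spam_total = tp + fn
    ham_total = tn + fp
    total = spam_total + ham_total
    shr = 100.0 * tp / spam_total if spam_total else 0.0
    hsr = 100.0 * fp / ham_total if ham_total else 0.0
    oca = 100.0 * (tp + tn) / total if total else 0.0
    return shr, hsr, oca


# Counters from the output above: TP 368, TN 91, FP 14, FN 0.
shr, hsr, oca = dspam_rates(tp=368, tn=91, fp=14, fn=0)
print(f"SHR: {shr:.2f}%  HSR: {hsr:.2f}%  OCA: {oca:.2f}%")
```

Under those definitions, the 13.33% HSR means 14 of the 105 ham messages were flagged as spam.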

I just fed dspam_train around 10GB of mail, both spam and ham, so we'll 
see what happens.



-Chris

Kyle Johnson wrote:

Hello Chris,

You should be using the dspam_train program for training.  What you 
were doing, using dspam with --class (though you would also need 
--source), is used for retraining (correcting an error).


You need to pass a username, a path to spam, and a path to ham (both 
of which are in maildir format), to dspam_train:

dspam_train username /path/to/spam /path/to/ham

You can also use an index file, which tells dspam_train where to find 
the spam and ham files.


If you have an mbox, there are a number of programs on the web which 
will convert your mail into maildir format.


But remember that you need to train spam and ham.  If you train only 
one, you will probably mess up accuracy.


Hope this helps,
-Kyle

On Tue, Nov 4, 2008 at 2:02 PM, Chris Baldwin 
<[EMAIL PROTECTED]> wrote:


Hi,

I'm having some trouble training DSPAM. I am using an mbox that I
dumped a fair amount of spam into, and then I'm running this command:

  formail -s dspam --client --user my.username --class=spam --source=corpus --mode=teft < Spam.train &

However, when I look at the results, all the spam is tagged as
Innocent:

15347: [11/04/2008 13:56:14] libdspam returned probability of 0.00
15347: [11/04/2008 13:56:14] message result: NOT SPAM
15347: [11/04/2008 13:56:14] appending header X-DSPAM-Result: Innocent

Am I missing something here?

To make things more confusing from my end, dspam_stats tells me
that every single piece of mail that I've fed dspam is a True
Negative. The problem is that I've also fed it a few hundred ham
messages, using the same syntax as above (w/ --class=innocent),
and getting a similar result as above.

Here's the configure, just so you know how it's set up:
./configure --enable-daemon --enable-syslog
--enable-long-usernames --with-storage-driver=hash_drv
--with-delivery-agent=procmail --enable-verbose-debug

I'd appreciate any ideas or suggestions on what to do at this point.

-Chris Baldwin

Re: [dspam-users] Trouble training DSPAM

2008-11-04 Thread Kyle Johnson

Hello Chris,

You should be using the dspam_train program for training.  What you were
doing, using dspam with --class (though you would also need --source), is
used for retraining (correcting an error).

You need to pass a username, a path to spam, and a path to ham (both of
which are in maildir format), to dspam_train:

dspam_train username /path/to/spam /path/to/ham

You can also use an index file, which tells dspam_train where to find the
spam and ham files.

If you have an mbox, there are a number of programs on the web which will
convert your mail into maildir format.
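
As one possible way to do that conversion without a separate tool, here is a minimal sketch using the `mailbox` module from the Python standard library. The function name `mbox_to_maildir` is made up for illustration, and the sketch has not been tested against DSPAM itself.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a maildir.

    The maildir (with its tmp/, new/ and cur/ subdirectories) is created
    if it does not exist. Returns the number of messages copied.
    """
    # create=False makes a missing mbox an error instead of an empty file.
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            dest.add(message)  # new messages land in the new/ subdirectory
            count += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return count
```

For example, `mbox_to_maildir("Spam.train", "/path/to/spam")` would fill the spam directory, and the same call on a ham mbox would fill `/path/to/ham`, after which both paths can be handed to dspam_train.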

But remember that you need to train spam and ham.  If you train only one,
you will probably mess up accuracy.

Hope this helps,
-Kyle

On Tue, Nov 4, 2008 at 2:02 PM, Chris Baldwin
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm having some trouble training DSPAM. I am using an mbox that I dumped a
> fair amount of spam into, and then I'm running this command:
>
>   formail -s dspam --client --user my.username --class=spam --source=corpus --mode=teft < Spam.train &
>
> However, when I look at the results, all the spam is tagged as Innocent:
>
> 15347: [11/04/2008 13:56:14] libdspam returned probability of 0.00
> 15347: [11/04/2008 13:56:14] message result: NOT SPAM
> 15347: [11/04/2008 13:56:14] appending header X-DSPAM-Result: Innocent
>
> Am I missing something here?
>
> To make things more confusing from my end, dspam_stats tells me that every
> single piece of mail that I've fed dspam is a True Negative. The problem is
> that I've also fed it a few hundred ham messages, using the same syntax as
> above (w/ --class=innocent), and getting a similar result as above.
>
> Here's the configure, just so you know how it's set up: ./configure
> --enable-daemon --enable-syslog --enable-long-usernames
> --with-storage-driver=hash_drv --with-delivery-agent=procmail
> --enable-verbose-debug
>
> I'd appreciate any ideas or suggestions on what to do at this point.
>
> -Chris Baldwin