Re: bayes learning '0 messages found'

2010-02-15 Thread smfabac



John Hardin wrote:
 
 On Sat, 13 Feb 2010, smfabac wrote:
 
 Is there a message size limit for sa-learn?
 
 Yes, there is, and sadly sa-learn does not explicitly tell you a message 
 has been skipped because it's too large.
 
 If there's a non-text attachment, try deleting it and re-learning the
 message.
 
 
 

OK, it's a size problem.

I edited the notspam message, deleted 1000 lines (lines 3000 to 4000),
saved the file and then reprocessed notspam.

I kept getting 0 messages examined until I had deleted 3000 lines
of the message:

Message size as received:

$ wc -l notspam
   6408 notspam  -- sa-learn --ham failed on notspam (one message of 6000+ lines)
$

After deleting 3003 lines:

$ wc -l notspam
   3405 notspam
$ vi notspam

 1  ^A^A^A^A
 2  From smf  Thu Feb 11 01:30:02 2010
 3  From: Boyd Lynn Gerber gerb...@zenez.com
 4  To: distribut...@registry.ca
 5  Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX 8/OpenServer 6 FAQ
 6  Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
 7  Message-Id: ou8faqqt_1265871...@news.xmission.com

  3395
  3396   filepriv -f setuid programfile.exe
  3397
  3398  --
  3399  Boyd Gerber gerb...@zenez.com 801 849-0213
  3400  ZENEZ   1042 East Fort Union #135, Midvale Utah  84047
  3401
  3402
  3403  =_4B73B21B.8398EDEC--
  3404
  3405  ^A^A^A^A

$ sa-learn --showdots --ham --mbox notspam
.
Learned tokens from 1 message(s) (1 message(s) examined)
$ 
$ wc notspam
  lines: 3405  words:  18735  characters: 130876 notspam
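Deleting lines by hand does narrow the problem down, but a quicker first step is to list the size of every message in the folder and look for outliers. The sketch below uses Python's standard `mailbox` module and assumes the MMDF layout visible in the `vi` listing (each message wrapped in `^A^A^A^A` lines and starting with a `From ` line). It only reports sizes; it makes no claim about what limit, if any, sa-learn applies. The function names are illustrative, not part of any SpamAssassin tool.

```python
import mailbox


def message_sizes(path):
    """Return (index, subject, bytes, lines) for every message in an MMDF file.

    Assumes each message is wrapped in ^A^A^A^A (four Ctrl-A) lines and
    begins with a "From " line, as in the listing above.
    """
    box = mailbox.MMDF(path, create=False)
    try:
        rows = []
        for key in box.keys():
            raw = box.get_bytes(key)  # message text without the "From " line
            subject = box[key].get("Subject", "")
            # The stored text has no trailing newline, so lines = newlines + 1.
            rows.append((key, subject, len(raw), raw.count(b"\n") + 1))
        return rows
    finally:
        box.close()


def largest_first(rows):
    """Sort the rows so the biggest messages (by bytes) come first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Usage: `for key, subject, size, lines in largest_first(message_sizes("notspam")): print(size, lines, subject)`.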


So, does the documentation on sa-learn indicate that there is 
a size limit on the message to be processed?

-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27590620.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-15 Thread smfabac


Kai Schaetzl wrote:
 
 Smfabac wrote on Mon, 15 Feb 2010 00:20:06 -0800 (PST):
 
 So, does the documentation on sa-learn indicate that there is 
 a size limit on the message to be processed?
 
 Why not check yourself?
 
 Kai
 
 
 
 
 
 

Thanks for your help Kai.

After checking
http://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html

I see that there is no official answer to the question: what is the
message size limit at which sa-learn fails?

The question "So, does the documentation on sa-learn indicate that there is
a size limit on the messages to be processed?" was a veiled request to the SA
developers/maintainers: people may be interested in that information.

-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27595445.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-13 Thread smfabac


RW-15 wrote:
 
 On Fri, 12 Feb 2010 17:51:12 +
 RW rwmailli...@googlemail.com wrote:
 
 On Fri, 12 Feb 2010 09:17:54 -0800 (PST)
 smfabac smfa...@att.net wrote:
 
  
 
  Mark, 
  
  On UNIX any file is a mbox file if it contains mail messages in the
  form:
  
  ^A^A^A^A
  mail headers
  mail body
  ^A^A^A^A
  ^A^A^A^A
  Next Message mail headers
  mail body
  ^A^A^A^A
 
 I don't know what that is, but it's not a standard mbox format.
 
 In mbox format the emails all start with a blank line and a From.
 
 
 It appears to be mmdf format
 
 http://www.washington.edu/imap/documentation/formats.txt.html
 
 

OK.

Now that we're all on the same page, how do I find out why sa-learn
is not processing the valid not-spam file? To recap: sa-learn --spam
--mbox isspam works, but sa-learn --ham --mbox not-spam does not.

The sa-learn --dump magic output shows that messages are being
added by the sa-learn command:

$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  12551  0  non-token data: nspam
0.000  0  68020  0  non-token data: nham
0.000  0 143948  0  non-token data: ntokens
0.000  0 1260104403  0  non-token data: oldest atime
0.000  0 1266048014  0  non-token data: newest atime
0.000  0 1266049794  0  non-token data: last journal sync atime
0.000  0 1265630710  0  non-token data: last expiry atime
0.000  0    5529600  0  non-token data: last expire atime delta
0.000  0  19095  0  non-token data: last expire reduction count

$ sa-learn --spam --mbox isspam
Learned tokens from 1 message(s) (1 message(s) examined)
$

$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  12552  0  non-token data: nspam
0.000  0  68020  0  non-token data: nham
0.000  0 144608  0  non-token data: ntokens
0.000  0 1260104403  0  non-token data: oldest atime
0.000  0 1266048014  0  non-token data: newest atime
0.000  0 1266049794  0  non-token data: last journal sync atime
0.000  0 1265630710  0  non-token data: last expiry atime
0.000  0    5529600  0  non-token data: last expire atime delta
0.000  0  19095  0  non-token data: last expire reduction count
$ 

As you can see the nspam has incremented by 1.
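Comparing two dumps by eye works, but it is easy to script. The sketch below is a minimal Python parser for the `non-token data` lines; it assumes the value of each field is the third whitespace-separated column, as in every dump quoted in this thread, and deliberately does not interpret the other columns. `parse_magic` and `changed_fields` are illustrative names.

```python
import re

# One line of `sa-learn --dump magic` output, for example:
#   0.000  0  12551  0  non-token data: nspam
# ASSUMPTION: the value is the third column, as in the dumps in this
# thread; the other numeric columns are matched but not interpreted.
MAGIC_LINE = re.compile(
    r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$"
)


def parse_magic(text):
    """Map each magic field name (e.g. "nspam") to its integer value."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def changed_fields(before, after):
    """Return {field: after - before} for every field whose value changed."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }
```

The dump text could be captured with `subprocess.run(["sa-learn", "--dump", "magic"], capture_output=True, text=True).stdout` once before and once after a learning run; if `changed_fields()` shows no change in `nham` after `--ham`, nothing was learned.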

$ sa-learn --ham --mbox not-spam
Learned tokens from 0 message(s) (0 message(s) examined)
$ 

Read Create Save Delete Undelete Print Folder Options Quit
Set mail options and preferences
Folder: not-spam   Saturday February 13, 2010 2:34
-- [1] Message

  1 gerb...@zenez.co  11 Feb 10 6404  Quarterly ASCII posting of SCO Uni


Is there a message size limit for sa-learn? The message in not-spam is
plain ASCII, no HTML.

$ wc -l not-spam
   6408 not-spam  -- sa-learn --ham failed on the not-spam folder (one message)
$
$ wc -l isspam
   1039 isspam    -- sa-learn --spam worked on the isspam folder (one message)
$
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27573012.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-13 Thread smfabac


Charles Gregory wrote:
 
 On Sat, 13 Feb 2010, smfabac wrote:
 Now that we're all on the same page. How do I find out why sa-learn
 is not processing the legal not-spam file?  To re-cap, sa-learn --spam
 --mbox isspam works but sa-learn --ham --mbox not-spam is not
 working.
 
 Well, I would expect if this suggestion were right you would have had all 
 sorts of warning messages about syntax, but just in case
 
 Maybe linux is interpreting the dash in the filename as a switch 
 indicator? Try enclosing the file name in single quotes or use a filename 
 without a dash...
 
 - C
 
 
 
 

$ ls -lt | head -3
total 15868
-rw-------   1 smf  group 249046 Feb 13 02:37 not-spam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam
$ mv not-spam notspam
$ ls -lt | head -3
total 15868
-rw-------   1 smf  group 249046 Feb 13 02:37 notspam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam

$ sa-learn --showdots --ham --mbox notspam

Learned tokens from 0 message(s) (0 message(s) examined)
$

On the off chance that permissions on the file are an issue:

$ chmod 666 notspam
$ ls -lt | head -3
total 15868
-rw-rw-rw-   1 smf  group 249046 Feb 13 02:37 notspam
-rw-rw-rw-   1 smf  group  94762 Feb 13 02:29 isspam

$ sa-learn --showdots --ham --mbox notspam

Learned tokens from 0 message(s) (0 message(s) examined)

Still no luck.

-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27576922.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: spamassasin: sa-learn --dump magic intrepretation

2010-02-12 Thread smfabac


Michael Scheidell wrote:
 
 Is there a document regarding the interpretation of
 
 
  sa-learn --dump magic
 config: could not find site rules directory
 
 0.000  0          3  0  non-token data: bayes db version
 0.000  0     261451  0  non-token data: nspam
 0.000  0      18530  0  non-token data: nham
 0.000  0     143599  0  non-token data: ntokens
 0.000  0 1231533845  0  non-token data: oldest atime
 0.000  0 1237223892  0  non-token data: newest atime
 0.000  0 1237214668  0  non-token data: last journal sync atime
 0.000  0 1237059740  0  non-token data: last expiry atime
 0.000  0    5529600  0  non-token data: last expire atime delta
 0.000  0       9311  0  non-token data: last expire reduction count
 
 
 Let me take a stab at it.
 The db version is 3

 You have learned 261,451 messages as 'spam'.
 You have learned 18,530 messages as 'ham'.

 You have 143,599 tokens (remember, some tokens could appear in both spam
 and ham).

 The oldest token is date -j -f %s 1231533845
 Fri Jan  9 15:44:05 EST 2009

 The newest token is date -j -f %s 1237223892
 Mon Mar 16 13:18:12 EDT 2009

 The rest should be easy to figure out.
 
 Two questions: which date program accepts -j -f %s 1231533845
 (on what OS)? Neither Windows nor SCO UNIX accepts these options.

 What about the other fields in the output of dump magic (field 1: 0.000,
 field 2: 0, and field 4: 0)? Are they a secret known only to the
 SpamAssassin developers, kept secret for some reason?
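The `date -j -f %s` form is BSD `date(1)` syntax (FreeBSD, macOS). On a system whose `date` lacks those options, the same conversion of an atime value (seconds since 1970-01-01 00:00 UTC) can be done with Python's standard `datetime` module. This sketch prints UTC; the EST/EDT times in the quote are the same instants in US Eastern time. `atime_to_utc` is an illustrative name.

```python
from datetime import datetime, timezone


def atime_to_utc(seconds):
    """Format a Unix timestamp (seconds since 1970-01-01 00:00 UTC) in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )
```

For example, `atime_to_utc(1231533845)` gives `2009-01-09 20:44:05 UTC`, which is the `Fri Jan  9 15:44:05 EST 2009` shown above.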
 
 
 
 -- 
 Michael Scheidell, CTO
 SECNAP Network Security
 Finalist 2009 Network Products Guide Hot Companies
 FreeBSD SpamAssassin Ports maintainer
 
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/spamassasin%3A-sa-learn---dump-magic-intrepretation-tp22543157p27565677.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-12 Thread smfabac


tonjg wrote:
 
 raq550 server
 OS: strongbolt2
 spamassassin.i386 0:3.2.5-1.el4
 
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages examined'
 error.
 I've also tried copying the mbox file to another location, removing all
 the restrictions on it but I still get '0 messages learned'.
 I know the sa-learn command is working properly because I previously
 pointed it to a wrong location and it picked up 3 tokens but it won't pick
 up anything from the mbox file. I've even tried renaming the (copied) mbox
 file and restarting spamassassin but no joy.
 The mbox file contains about 200 spam mails and is 3.5Mb. Thanks for any
 help.
 

I am having a similar problem to the original poster. I have run
SpamAssassin successfully for several years, but today, when I used
sa-learn to process the mailbox into which I had moved a misclassified
message (not-spam), I got:

$ sa-learn --showdots --ham --mbox not-spam

Learned tokens from 0 message(s) (0 message(s) examined)
$

Check the mail folder not-spam:

$ mail -f not-spam
SCO OpenServer Mail Release 5.0.7  Type ? for help.
not-spam: 1 message
   1 gerb...@zenez.co Thu Feb 11 01:30 6405/248986 Quarterly ASCII posting of


And reading the message:

Message  1:
From smf  Thu Feb 11 01:30:02 2010
From: Boyd Lynn Gerber gerb...@zenez.com
To: distribut...@registry.ca
Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX 8/OpenServer 6 FAQ
Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
Message-Id: ou8faqqt_1265871...@news.xmission.com
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on unix.smfabac.com
X-Spam-Level: ***
X-Spam-Status: Yes, score=3.4 required=3.0 tests=HEADER_SPAM
	autolearn=unavailable version=3.2.5
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=--=_4B73B21B.8398EDEC
Status: RO

This is a multi-part message in MIME format.

=_4B73B21B.8398EDEC
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system unix.smfabac.com, has


And sa-learn --dump magic shows:

$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  12551  0  non-token data: nspam
0.000  0  67987  0  non-token data: nham
0.000  0 143194  0  non-token data: ntokens
0.000  0 1260104403  0  non-token data: oldest atime
0.000  0 1265990403  0  non-token data: newest atime
0.000  0 1265991303  0  non-token data: last journal sync atime
0.000  0 1265630710  0  non-token data: last expiry atime
0.000  0    5529600  0  non-token data: last expire atime delta
0.000  0  19095  0  non-token data: last expire reduction count
$

I have run sa-learn --ham --mbox not-spam successfully in the past, so
why is it failing now?

How do I determine why the message is not being processed by sa-learn?
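One thing worth checking is which on-disk format the file actually uses, because `--mbox` expects mbox-style `From ` separator lines, while RW's reply in this thread identifies the `^A^A^A^A` layout as MMDF. The sketch below is a heuristic that looks only at the first non-blank line of the file; `detect_mailbox_format` is an illustrative name, not part of sa-learn.

```python
def detect_mailbox_format(path):
    """Guess a mail file's layout from its first non-blank line.

    Returns "mmdf" for a ^A^A^A^A (four Ctrl-A) delimiter line,
    "mbox" for a "From " separator line, and "unknown" otherwise.
    Heuristic only: a single line is inspected.
    """
    with open(path, "rb") as f:
        for line in f:
            if not line.strip():
                continue  # skip leading blank lines
            if line.rstrip(b"\r\n") == b"\x01\x01\x01\x01":
                return "mmdf"
            if line.startswith(b"From "):
                return "mbox"
            return "unknown"
    return "unknown"  # empty file
```

On the not-spam file described here, this would report `mmdf`, since its first line is the Ctrl-A delimiter.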


-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27566005.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: bayes learning '0 messages found'

2010-02-12 Thread smfabac


Mark Martinec wrote:
 
 tonjg wrote:
 I'm trying to run:
 sa-learn --spam --showdots --dir /path/to...mbox
 but it fails with:
 'Learned tokens from 0 message(s) (0 messages examined)'
 my spam mail is in a file called mbox but when I run the above command to
 the directory containing mbox it always fails with the '0 messages
 examined' error.
 
 If your messages are in a mbox *file*, you need an option --mbox,
 not --dir .
 
 smfabac wrote: 
 I am having a similar problem as the  poster but I have successfully run
 spamassassin for several years and today when I used the sa-learn
 command to process the mailbox where I moved the mis-classified
 mail message (not-spam) I get:
 
 $ sa-learn --showdots --ham --mbox not-spam
 
 Learned tokens from 0 message(s) (0 message(s) examined)
 
 Check the mail folder not-spam:
 
 If not-spam is a folder (not a mbox file), you must not
 use the option --mbox.
 
   Mark
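Mark's rule (an mbox *file* needs `--mbox`; a folder of messages must not get it) can be encoded in a small helper that builds the command line. This is a sketch under an explicit assumption: every regular file is treated as an mbox file. `sa_learn_argv` is a hypothetical helper; the sa-learn options it emits (`--showdots`, `--spam`, `--ham`, `--mbox`) are the ones used in this thread.

```python
import os


def sa_learn_argv(path, as_spam):
    """Build (but do not run) an sa-learn command line for one mail source.

    ASSUMPTION: any regular file is an mbox file and gets --mbox;
    a directory (one message per file) gets no format option.
    """
    argv = ["sa-learn", "--showdots", "--spam" if as_spam else "--ham"]
    if os.path.isfile(path):
        argv.append("--mbox")
    elif not os.path.isdir(path):
        raise FileNotFoundError(path)
    argv.append(path)
    return argv
```

The list can be passed straight to `subprocess.run(sa_learn_argv("not-spam", as_spam=False), check=True)`; because no shell is involved, there is no quoting to get wrong around a name such as `not-spam`.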
 
 
 

Mark, 

On UNIX any file is an mbox file if it contains mail messages in the form:

^A^A^A^A
mail headers
mail body
^A^A^A^A
^A^A^A^A
Next Message mail headers
mail body
^A^A^A^A

And my not-spam file meets this requirement:

^A^A^A^A
From smf  Thu Feb 11 01:30:02 2010
From: Boyd Lynn Gerber gerb...@zenez.com
To: distribut...@registry.ca
...
stuff deleted
...
=_4B73B21B.8398EDEC--

^A^A^A^A

Also, running mail -f not-spam opens the file in the UNIX mail reader,
which shows that the file is a legal mbox file.
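If the file is in fact MMDF, as RW points out, one possible workaround is to copy its messages into a standard mbox file and run `sa-learn --ham --mbox` on the copy. Python's standard `mailbox` module can read MMDF and write mbox. This is a sketch assuming each MMDF message begins with a `From ` line (Python's `MMDF` class treats the first line after the delimiter that way); `mmdf_to_mbox` is an illustrative name, and it appends to `dst_path` if that file already exists.

```python
import mailbox


def mmdf_to_mbox(src_path, dst_path):
    """Copy every message from an MMDF file into an mbox file.

    Returns the number of messages copied. Messages are appended
    if dst_path already exists.
    """
    src = mailbox.MMDF(src_path, create=False)
    try:
        dst = mailbox.mbox(dst_path, create=True)
        try:
            dst.lock()  # guard against a concurrent writer
            for key in src.keys():
                dst.add(src.get_message(key))  # keeps the original From_ line
            dst.flush()
            return len(src)
        finally:
            dst.unlock()  # safe even if lock() never succeeded
            dst.close()
    finally:
        src.close()
```

For example, `mmdf_to_mbox("not-spam", "not-spam.mbox")` followed by `sa-learn --ham --mbox not-spam.mbox`.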
-- 
View this message in context: 
http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27566692.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.