Re: bayes learning '0 messages found'
John Hardin wrote:
> On Sat, 13 Feb 2010, smfabac wrote:
>> Is there a message size limit for sa-learn?
>
> Yes, there is, and sadly sa-learn does not explicitly tell you that a
> message has been skipped because it's too large. If there's a non-text
> attachment, try deleting it and re-learning the message.
>
> --
> John Hardin KA7OHZ  http://www.impsec.org/~jhardin/

Ok. It's a size problem. I edited the notspam message, deleted 1000 lines (lines 3000 to 4000), saved the file, and then reprocessed notspam. I kept getting "0 message(s) examined" until I had deleted about 3000 lines of the message.

Message size as received:

    $ wc -l notspam
    6408 notspam    -- sa-learn --ham failed on notspam folder with one message of 6000+ lines

After deleting 3003 lines:

    $ wc -l notspam
    3405 notspam
    $ vi notspam
       1 ^A^A^A^A
       2 From smf Thu Feb 11 01:30:02 2010
       3 From: Boyd Lynn Gerber gerb...@zenez.com
       4 To: distribut...@registry.ca
       5 Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX 8/OpenServer6 FAQ
       6 Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
       7 Message-Id: ou8faqqt_1265871...@news.xmission.com
    [...]
    3395
    3396 filepriv -f setuid programfile.exe
    3397
    3398 --
    3399 Boyd Gerber gerb...@zenez.com 801 849-0213
    3400 ZENEZ 1042 East Fort Union #135, Midvale Utah 84047
    3401
    3402
    3403 =_4B73B21B.8398EDEC--
    3404
    3405 ^A^A^A^A
    $ sa-learn --showdots --ham --mbox notspam
    .
    Learned tokens from 1 message(s) (1 message(s) examined)
    $ wc notspam
    lines: 3405 words: 18735 characters: 130876 notspam

So, does the documentation on sa-learn indicate that there is a size limit on the message to be processed?
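The manual trimming described above can be repeated with a small helper instead of an editor. This is only a sketch: trim_mmdf is a hypothetical name, and it assumes the last line of the file is the closing ^A^A^A^A delimiter, as in the listing above. It keeps the first KEEP lines and that final line, so progressively smaller copies can be fed to sa-learn.

```shell
# trim_mmdf SRC KEEP (hypothetical helper): print the first KEEP lines of SRC,
# followed by the final line of SRC, which is assumed to be the closing
# ^A^A^A^A delimiter of a single-message MMDF-style file.
trim_mmdf() {
    src=$1
    keep=$2
    # $(( ... )) strips the padding some wc implementations emit.
    total=$(( $(wc -l < "$src") ))
    if [ "$keep" -ge "$total" ]; then
        cat "$src"              # nothing to trim
    else
        head -n "$keep" "$src"  # headers plus the start of the body
        tail -n 1 "$src"        # closing delimiter line
    fi
}
```

For example, the output of "trim_mmdf notspam 3400" could be redirected to a scratch file and passed to the same sa-learn --showdots --ham --mbox command used in the thread. The original file is never modified.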
-- View this message in context: http://old.nabble.com/bayes-learning-%270-messages-found%27-tp27358517p27590620.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: bayes learning '0 messages found'
Kai Schaetzl wrote:
> Smfabac wrote on Mon, 15 Feb 2010 00:20:06 -0800 (PST):
>> So, does the documentation on sa-learn indicate that there is a size
>> limit on the message to be processed?
>
> Why not check yourself?
>
> Kai
> --
> Get your web at Conactive Internet Services: http://www.conactive.com

Thanks for your help, Kai. After checking

http://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn.html

I see that there is no official answer to the question: what is the message size limit at which sa-learn fails?

The question "So, does the documentation on sa-learn indicate that there is a size limit on the messages to be processed?" is a veiled request to the SA developers/maintainers to document the limit, since people may be interested in that information.
Re: bayes learning '0 messages found'
RW-15 wrote:
> On Fri, 12 Feb 2010 17:51:12 + RW rwmailli...@googlemail.com wrote:
>> On Fri, 12 Feb 2010 09:17:54 -0800 (PST) smfabac smfa...@att.net wrote:
>>> Mark, On UNIX any file is a mbox file if it contains mail messages
>>> in the form:
>>>
>>> ^A^A^A^A
>>> mail headers
>>> mail body
>>> ^A^A^A^A
>>> ^A^A^A^A
>>> Next Message mail headers
>>> mail body
>>> ^A^A^A^A
>>
>> I don't know what that is, but it's not a standard mbox format. In
>> mbox format the emails all start with a blank line and a From.
>
> It appears to be mmdf format
> http://www.washington.edu/imap/documentation/formats.txt.html

Ok, Now that we're all on the same page. How do I find out why sa-learn is not processing the legal not-spam file?

To re-cap, sa-learn --spam --mbox isspam works but sa-learn --ham --mbox not-spam is not working.

The sa-learn --dump magic shows that messages have been added by the sa-learn command:

    $ sa-learn --dump magic
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      12551          0  non-token data: nspam
    0.000          0      68020          0  non-token data: nham
    0.000          0     143948          0  non-token data: ntokens
    0.000          0 1260104403          0  non-token data: oldest atime
    0.000          0 1266048014          0  non-token data: newest atime
    0.000          0 1266049794          0  non-token data: last journal sync atime
    0.000          0 1265630710          0  non-token data: last expiry atime
    0.000          0    5529600          0  non-token data: last expire atime delta
    0.000          0      19095          0  non-token data: last expire reduction count
    $ sa-learn --spam --mbox isspam
    Learned tokens from 1 message(s) (1 message(s) examined)
    $ sa-learn --dump magic
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      12552          0  non-token data: nspam
    0.000          0      68020          0  non-token data: nham
    0.000          0     144608          0  non-token data: ntokens
    0.000          0 1260104403          0  non-token data: oldest atime
    0.000          0 1266048014          0  non-token data: newest atime
    0.000          0 1266049794          0  non-token data: last journal sync atime
    0.000          0 1265630710          0  non-token data: last expiry atime
    0.000          0    5529600          0  non-token data: last expire atime delta
    0.000          0      19095          0  non-token data: last expire reduction count

As you can see the nspam has incremented by 1.

    $ sa-learn --ham --mbox not-spam
    Learned tokens from 0 message(s) (0 message(s) examined)

The mail reader shows one message in the folder:

    Read Create Save Delete Undelete Print Folder Options Quit
    Set mail options and preferences
    Folder: not-spam        Saturday February 13, 2010 2:34
    -- [1] Message
    1 gerb...@zenez.co 11 Feb 10 6404 Quarterly ASCII posting of SCO Uni

Is there a message size limit for sa-learn? The message in not-spam is plain ascii, no html.

    $ wc -l not-spam
    6408 not-spam    -- sa-learn --ham failed on not-spam folder with one message
    $ wc -l isspam
    1039 isspam      -- sa-learn --spam worked on isspam folder with one message
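Comparing two full dumps by eye is error-prone, so the before/after check above can be reduced to the two counters that matter. The sketch below is a hypothetical helper (magic_counts is not part of SpamAssassin). It assumes the dump layout shown in this thread, where the value sits in the third whitespace-separated column and the counter name is the last word on the line. nspam and nham are the numbers of spam and ham messages learned.

```shell
# magic_counts (hypothetical helper): read the output of
# "sa-learn --dump magic" on stdin and print the nspam and nham counters.
# Assumes the value is in column 3 and the counter name is the last field.
magic_counts() {
    awk '
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END { printf "nspam=%s nham=%s\n", nspam, nham }
    '
}
```

Piping sa-learn --dump magic into magic_counts before and after a learning run shows directly whether either counter moved.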
Re: bayes learning '0 messages found'
Charles Gregory wrote:
> On Sat, 13 Feb 2010, smfabac wrote:
>> Now that we're all on the same page. How do I find out why sa-learn
>> is not processing the legal not-spam file? To re-cap, sa-learn --spam
>> --mbox isspam works but sa-learn --ham --mbox not-spam is not working.
>
> Well, I would expect that if this suggestion were right you would have
> had all sorts of warning messages about syntax, but just in case...
> Maybe linux is interpreting the dash in the filename as a switch
> indicator? Try enclosing the file name in single quotes or use a
> filename without a dash...
>
> - C

    $ ls -lt | head -3
    total 15868
    -rw------- 1 smf group 249046 Feb 13 02:37 not-spam
    -rw-rw-rw- 1 smf group  94762 Feb 13 02:29 isspam
    $ mv not-spam notspam
    $ ls -lt | head -3
    total 15868
    -rw------- 1 smf group 249046 Feb 13 02:37 notspam
    -rw-rw-rw- 1 smf group  94762 Feb 13 02:29 isspam
    $ sa-learn --showdots --ham --mbox notspam
    Learned tokens from 0 message(s) (0 message(s) examined)

On the off chance that the permissions on the file are an issue:

    $ chmod 666 notspam
    $ ls -lt | head -3
    total 15868
    -rw-rw-rw- 1 smf group 249046 Feb 13 02:37 notspam
    -rw-rw-rw- 1 smf group  94762 Feb 13 02:29 isspam
    $ sa-learn --showdots --ham --mbox notspam
    Learned tokens from 0 message(s) (0 message(s) examined)

Still no luck.
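The existence, permission, and size checks tried above can be folded into one pre-check that runs before sa-learn. This is a minimal sketch with a hypothetical function name (preflight). It only rules out the obvious causes and says nothing about what sa-learn itself will accept.

```shell
# preflight FILE (hypothetical helper): report obvious problems with a mail
# file before it is handed to sa-learn, and print its size in bytes if none.
preflight() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    elif [ ! -r "$f" ]; then
        echo "unreadable: $f"
        return 1
    elif [ ! -s "$f" ]; then
        echo "empty: $f"
        return 1
    fi
    # $(( ... )) strips the padding some wc implementations emit.
    echo "ok: $f $(( $(wc -c < "$f") )) bytes"
}
```

On the dash question: by the usual convention only a leading dash makes an argument look like an option, so a name such as not-spam is not affected, and writing ./-name sidesteps the problem for names that do start with a dash.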
Re: spamassasin: sa-learn --dump magic intrepretation
Michael Scheidell wrote:
>> Is there a document regarding the interpretation of sa-learn --dump magic
>>
>> config: could not find site rules directory
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0     261451          0  non-token data: nspam
>> 0.000          0      18530          0  non-token data: nham
>> 0.000          0     143599          0  non-token data: ntokens
>> 0.000          0 1231533845          0  non-token data: oldest atime
>> 0.000          0 1237223892          0  non-token data: newest atime
>> 0.000          0 1237214668          0  non-token data: last journal sync atime
>> 0.000          0 1237059740          0  non-token data: last expiry atime
>> 0.000          0    5529600          0  non-token data: last expire atime delta
>> 0.000          0       9311          0  non-token data: last expire reduction count
>
> Let me take a stab at it.
>
> The db version is 3
> You have 261,451 tokens that appeared in spam¹.
> You have 18,530 tokens that appeared in ham¹
> You have 143,599 tokens (remember, some tokens could appear in both
> spam and ham)
>
> The oldest token is
>   date -j -f %s 1231533845
>   Fri Jan 9 15:44:05 EST 2009
> The newest token is
>   date -j -f %s 1237223892
>   Mon Mar 16 13:18:12 EDT 2009
>
> The rest should be easy to figure out.

Two questions:

What is the date program above that accepts -j -f %s 1231533845 (what OS)? Neither Windows nor SCO UNIX accepts these options.

What about the other fields in the output of dump magic (field 1: 0.000, field 2: 0 and field 4: 0)? Are they a secret known only to spamassassin developers and kept secret for some reason?

> --
> Michael Scheidell, CTO | SECNAP Network Security
> FreeBSD SpamAssassin Ports maintainer
Re: bayes learning '0 messages found'
tonjg wrote:
> raq550 server
> OS: strongbolt2
> spamassassin.i386 0:3.2.5-1.el4
>
> I'm trying to run:
> sa-learn --spam --showdots --dir /path/to...mbox
> but it fails with:
> 'Learned tokens from 0 message(s) (0 messages examined)'
>
> My spam mail is in a file called mbox, but when I run the above command
> against the directory containing mbox it always fails with the
> '0 messages examined' error. I've also tried copying the mbox file to
> another location and removing all the restrictions on it, but I still
> get '0 messages learned'. I know the sa-learn command is working
> properly because I previously pointed it to a wrong location and it
> picked up 3 tokens, but it won't pick up anything from the mbox file.
> I've even tried renaming the (copied) mbox file and restarting
> spamassassin, but no joy. The mbox file contains about 200 spam mails
> and is 3.5Mb.
>
> Thanks for any help.

I am having a similar problem to the poster's, but I have successfully run spamassassin for several years. Today, when I used the sa-learn command to process the mailbox where I moved the mis-classified mail message (not-spam), I got:

    $ sa-learn --showdots --ham --mbox not-spam
    Learned tokens from 0 message(s) (0 message(s) examined)

Check the mail folder not-spam:

    $ mail -f not-spam
    SCO OpenServer Mail Release 5.0.7 Type ? for help.
    not-spam: 1 message
    1 gerb...@zenez.co Thu Feb 11 01:30 6405/248986 Quarterly ASCII posting of

And reading the message:

    Message 1:
    From smf Thu Feb 11 01:30:02 2010
    From: Boyd Lynn Gerber gerb...@zenez.com
    To: distribut...@registry.ca
    Subject: Quarterly ASCII posting of SCO UnixWare 7/OpenUNIX 8/OpenServer 6 FAQ
    Date: Thu, 11 Feb 2010 00:05:18 -0700 (MST)
    Message-Id: ou8faqqt_1265871...@news.xmission.com
    X-Spam-Flag: YES
    X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on unix.smfabac.com
    X-Spam-Level: ***
    X-Spam-Status: Yes, score=3.4 required=3.0 tests=HEADER_SPAM
            autolearn=unavailable version=3.2.5
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary=--=_4B73B21B.8398EDEC
    Status: RO

    This is a multi-part message in MIME format.

    =_4B73B21B.8398EDEC
    Content-Type: text/plain; charset=iso-8859-1
    Content-Disposition: inline
    Content-Transfer-Encoding: 8bit

    Spam detection software, running on the system unix.smfabac.com, has

And sa-learn --dump magic shows:

    $ sa-learn --dump magic
    0.000          0          3          0  non-token data: bayes db version
    0.000          0      12551          0  non-token data: nspam
    0.000          0      67987          0  non-token data: nham
    0.000          0     143194          0  non-token data: ntokens
    0.000          0 1260104403          0  non-token data: oldest atime
    0.000          0 1265990403          0  non-token data: newest atime
    0.000          0 1265991303          0  non-token data: last journal sync atime
    0.000          0 1265630710          0  non-token data: last expiry atime
    0.000          0    5529600          0  non-token data: last expire atime delta
    0.000          0      19095          0  non-token data: last expire reduction count

I have successfully run sa-learn --ham --mbox not-spam in the past, so why is it failing me now? How do I determine why the message is not being processed by sa-learn?
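The mail reader reports one message in the folder. The same count can be taken without an interactive reader by counting the ^A^A^A^A delimiter lines, assuming the file uses the layout discussed in this thread, where every message is wrapped in a pair of such lines. count_mmdf is a hypothetical name for this sketch.

```shell
# count_mmdf FILE (hypothetical helper): count messages in an MMDF-style file,
# assuming each message sits between two lines that consist of exactly four
# ^A (octal 001) bytes.
count_mmdf() {
    awk '
        $0 == "\001\001\001\001" { n++ }   # one hit per delimiter line
        END { print int(n / 2) }           # two delimiter lines per message
    ' "$1"
}
```

If count_mmdf not-spam prints 1 while sa-learn still reports 0 messages examined, the file layout is probably not what is stopping the message from being learned.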
Re: bayes learning '0 messages found'
Mark Martinec wrote:
> tonjg wrote:
>> I'm trying to run:
>> sa-learn --spam --showdots --dir /path/to...mbox
>> but it fails with:
>> 'Learned tokens from 0 message(s) (0 messages examined)'
>> My spam mail is in a file called mbox, but when I run the above
>> command against the directory containing mbox it always fails with
>> the '0 messages examined' error.
>
> If your messages are in a mbox *file*, you need the option --mbox,
> not --dir.
>
> smfabac wrote:
>> I am having a similar problem to the poster's, but I have successfully
>> run spamassassin for several years. Today, when I used the sa-learn
>> command to process the mailbox where I moved the mis-classified mail
>> message (not-spam), I got:
>> $ sa-learn --showdots --ham --mbox not-spam
>> Learned tokens from 0 message(s) (0 message(s) examined)
>> Check the mail folder not-spam:
>
> If not-spam is a folder (not a mbox file), you must not use the
> option --mbox.
>
> Mark

Mark,

On UNIX any file is a mbox file if it contains mail messages in the form:

    ^A^A^A^A
    mail headers
    mail body
    ^A^A^A^A
    ^A^A^A^A
    Next Message mail headers
    mail body
    ^A^A^A^A

And my not-spam file meets this requirement:

    ^A^A^A^A
    From smf Thu Feb 11 01:30:02 2010
    From: Boyd Lynn Gerber gerb...@zenez.com
    To: distribut...@registry.ca
    ... stuff deleted ...
    =_4B73B21B.8398EDEC--
    ^A^A^A^A

Also, reading the file with the command mail -f not-spam launches the UNIX mail reader, which shows that the file is a legal mbox file.
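To rule the delimiter format out as a cause, the ^A^A^A^A lines can be stripped so that the file looks like a plain "From "-separated mbox. The sketch below is a hypothetical helper. It assumes every message already begins with a "From " line, as in the sample above, and it does not escape body lines that start with "From ". Whether sa-learn actually needs this conversion is not established in this thread.

```shell
# mmdf_to_mbox FILE (hypothetical helper): write FILE to stdout without its
# ^A^A^A^A delimiter lines, putting one blank line between messages.
# Assumes each message already starts with a "From " line.
mmdf_to_mbox() {
    awk '
        $0 == "\001\001\001\001" { boundary = 1; next }  # drop delimiter lines
        boundary && count > 0    { print "" }            # separator before later messages
        boundary                 { count++; boundary = 0 }
                                 { print }
    ' "$1"
}
```

The output can be redirected to a scratch file and learned from with --mbox, which leaves the original folder untouched.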