Re: Bayes filter marking everything as ham

2016-05-31 Thread David Jones
>https://wiki.apache.org/spamassassin/ImproveAccuracy >I have gone through this wiki (and ones like it) at least a dozen times. >My server is blocking about 50% of the spam, thanks to some of the >other layers of spam protection. It's just bayes that I can't seem to get >right Are you getting

Re: Bayes filter marking everything as ham

2016-05-31 Thread Martin Gregorie
On Wed, 2016-06-01 at 00:38 +, David Jones wrote: > > Too bad we couldn't make SA do something very annoying and > more obvious when the URIBL_BLOCKED rule was hit. > I notice, rather to my surprise, that the SA Wiki doesn't seem to have an entry for the URIBL_BLOCKED rule. However, since

Re: Bayes filter marking everything as ham

2016-05-31 Thread Martin Gregorie
On Tue, 2016-05-31 at 17:04 -0700, Peter Carlson wrote: > > URIBL_BLOCKED == read some basics > your reply == useless.  You have no idea what I may or may not have  > read.  You are under no obligation to provide any help to me or > anyone  > else.  I suggest that if for whatever reason you find

Re: Bayes filter marking everything as ham

2016-05-31 Thread John Hardin
On Tue, 31 May 2016, Peter Carlson wrote: I will investigate this (URIBL_BLOCKED) further tomorrow (https://wiki.apache.org/spamassassin/CachingNameserver), Note: caching != recursing. You can have a caching forwarding local nameserver, which won't fix URIBL_BLOCKED. however I doubt that

Re: Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson
not everyone is an email expert that understands how RBLs work and that it's bad to share a recursive DNS server on an SA server. I will investigate this (URIBL_BLOCKED) further tomorrow (https://wiki.apache.org/spamassassin/CachingNameserver), however I doubt

Re: Bayes filter marking everything as ham

2016-05-31 Thread John Hardin
On Tue, 31 May 2016, Peter Carlson wrote: On 05/31/2016 04:27 PM, Reindl Harald wrote: On 31.05.2016 at 23:58, Peter Carlson wrote: > May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN > {RelayedInbound}, Tests: >

Re: Bayes filter marking everything as ham

2016-05-31 Thread David Jones
>From: Reindl Harald >Sent: Tuesday, May 31, 2016 6:27 PM >To: users@spamassassin.apache.org >Subject: Re: Bayes filter marking everything as ham >On 31.05.2016 at 23:58, Peter Carlson wrote: >> May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN >>

Re: Bayes filter marking everything as ham

2016-05-31 Thread shanew
Kind of a shot in the dark, but are you sure everyone is promptly moving their spam out of the inboxes? I worry about automated learning like this. Even then, it seems unlikely that every mail would get tagged by bayes as likely ham. Someone just today suggested in another thread to add the

Re: Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson
On 05/31/2016 04:27 PM, Reindl Harald wrote: On 31.05.2016 at 23:58, Peter Carlson wrote: May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN {RelayedInbound}, Tests: [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001], autolearn=ham

Re: Bayes filter marking everything as ham

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 23:58, Peter Carlson wrote: May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN {RelayedInbound}, Tests: [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001], autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms
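
The log line quoted in this message packs the per-rule scores into a single `Tests: [...]` field. A minimal Python sketch for pulling that field apart might look like the following. It assumes the fields appear on one line exactly as in the excerpt; the helper name `parse_tests` and the variable names are hypothetical and not part of amavis or SpamAssassin.

```python
import re

# Sample line copied from the log excerpt quoted in the thread.
LOG_LINE = (
    "May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN "
    "{RelayedInbound}, Tests: "
    "[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,"
    "URIBL_BLOCKED=0.001], autolearn=ham autolearn_force=no, "
    "autolearnscore=-0.001, 3992 ms"
)


def parse_tests(line):
    """Return {rule_name: score} from the 'Tests: [...]' field, or {} if absent.

    Hypothetical helper: it only understands the NAME=score,NAME=score layout
    seen in the excerpt above.
    """
    match = re.search(r"Tests:\s*\[([^\]]*)\]", line)
    if not match:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        name, sep, value = item.partition("=")
        if sep:  # skip anything that is not NAME=score
            scores[name.strip()] = float(value)
    return scores


scores = parse_tests(LOG_LINE)
total = round(sum(scores.values()), 3)   # sum of the listed rule scores
blocked = "URIBL_BLOCKED" in scores      # the rule discussed in the replies

print(scores)
print("total:", total, "URIBL_BLOCKED hit:", blocked)
```

Run over a whole mail log, a loop like this would show how many messages hit BAYES_00 and how many hit URIBL_BLOCKED, which are the two symptoms the replies in this thread focus on.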

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread David Jones
>From: RW >Sent: Tuesday, May 31, 2016 5:20 PM >To: users@spamassassin.apache.org >Subject: Re: SA Concepts - plugin for email semantics >On Tue, 31 May 2016 15:20:56 -0400 >Bill Cole wrote: >> On 29 May 2016, at 11:07, RW wrote: >> >> > Statistical filters are

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread RW
On Tue, 31 May 2016 15:20:56 -0400 Bill Cole wrote: > On 29 May 2016, at 11:07, RW wrote: > > > Statistical filters are based on some statistical theory combined > > with pragmatic kludges and assumptions. Practical filters have been > > developed based on what's been found to work, not on

Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson
(sorry if this is a repost, I don't see my messages coming through... the irony of spamassassin.apache.org trapping my request for help as spam. I have snipped the logfile entries which I think were causing it to be tagged as spam) All of my messages

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Dianne Skoll
On Tue, 31 May 2016 21:23:11 +0100 Paul Stead wrote: > The implementation was undertaken from a personal interest - I asked > the question of what people thought of the implementation and the > impact to Bayes DB. I think what the "concepts" concept ends up doing

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Paul Stead
On 31/05/16 20:20, Bill Cole wrote: It is no shock that while this implementation has Paul Stead's name on it, it is apparently mostly the product of the anti-spam community's most spectacular case of Dunning-Kruger Syndrome, who has apparently figured out that his personal 'brand' has

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Bill Cole
On 29 May 2016, at 11:07, RW wrote: On Sat, 28 May 2016 15:37:21 -0400 Bill Cole wrote: More importantly (IMHO) they aren't designed to collide with existing common tokens and be added back into messages that may contain those tokens already in order to influence Bayesian classification.

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Agreed that I do not have experience. I am just playing my cards out here to get a corpus of mails. Thanks guys! On Tue, May 31, 2016 at 11:20 AM, Reindl Harald wrote: > > > On 31.05.2016 at 20:16, Antony Stone wrote: > >> On Tuesday 31 May 2016 at 20:11:14, Shivram

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 20:16, Antony Stone wrote: On Tuesday 31 May 2016 at 20:11:14, Shivram Krishnan wrote: In the glue - like spamass-mailer, there would be two folders which are created. One would be the mailbox and the other would be a spambox (don't know the term). Can't you access the spambox

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
In the glue - like spamass-mailer, there would be two folders which are created. One would be the mailbox and the other would be a spambox (don't know the term). Can't you access the spambox to extract the mail? On Tue, May 31, 2016 at 11:01 AM, Reindl Harald wrote: > > >

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Hi Antony, I have an ongoing collection of Blacklists since Jan 1, 2016. This way I would know how long it has stayed on the Blacklist. "Dealing with email "after the event" (especially with regard to blacklists) will give you very different results from dealing with it as it happens, if for no

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:47:56, Shivram Krishnan wrote: > I am using SA as an oracle for Blacklisting. Our research is concerned with > combining multiple blacklist sources and also considering the historical > importance of an IP in a blacklist to create a very effective master > blacklist. > >

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 19:25, Shivram Krishnan wrote: Thanks guys. What I am going to ask might be a long shot. But is it possible for anyone who is running a mailserver to give a list of sources of SPAM (recent, anytime this year) and the SA score associated? It will be extremely useful for my

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Thanks guys. What I am going to ask might be a long shot. But is it possible for anyone who is running a mailserver to give a list of sources of SPAM (recent, anytime this year) and the SA score associated? It will be extremely useful for my research and credit would be given. Example:-

Re: Accidental Spam Forward

2016-05-31 Thread Joe Quinn
On 5/31/2016 12:06 PM, Anthony Hoppe wrote: All, I accidentally forwarded some spam to this list. Autocomplete got the best of me and I chose "spamassassin" instead of "spamcop" in the "TO" field of the message. I haven't received the message myself (not sure if I will), but wanted to

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread RW
On Tue, 31 May 2016 12:05:39 -0400 Bill Cole wrote: > On 31 May 2016, at 2:21, Henrik K wrote: > > > On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote: > >> On Mon, 30 May 2016 17:45:52 -0400 > >> "Bill Cole" wrote: > >> > >>> So you could

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread RW
On Mon, 30 May 2016 17:45:52 -0400 Bill Cole wrote: > The "Naive Bayes" classification approach is theoretically moored to > Bayes' Theorem FWIW Bayes hasn't been "Naive Bayes" for a long time.

Accidental Spam Forward

2016-05-31 Thread Anthony Hoppe
All, I accidentally forwarded some spam to this list. Autocomplete got the best of me and I chose "spamassassin" instead of "spamcop" in the "TO" field of the message. I haven't received the message myself (not sure if I will), but wanted to apologize in case any of you got it. Happy, uh,

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Bill Cole
On 31 May 2016, at 2:21, Henrik K wrote: On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote: On Mon, 30 May 2016 17:45:52 -0400 "Bill Cole" wrote: So you could have 'sex' and 'meds' and 'watches' tallied up into frequency counts that sum

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Bowie Bailey
On 5/31/2016 1:38 AM, @lbutlr wrote: On May 30, 2016, at 11:06 PM, Shivram Krishnan wrote: 2) I have set a threshold of -10 to see how spamassassin assigns a score for every mail. No. Do not do this. Instead, set this option in your local.cf file: add_header all

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 17:13, Shivram Krishnan wrote: I might be forced to do this. Take the corpus from Mailinator and manually mark it as SPAM or HAM and use sa-learn to train spamassassin. But this is what is confusing me. Doesn't SA use a lot more tags to determine if it is SPAM or HAM? does

Re: Odd results when using whitelisting

2016-05-31 Thread Bowie Bailey
On 5/30/2016 10:35 AM, Nick Howitt wrote: Just for a bit of closure, it looks like when you use amavisd-new with SA, it is amavisd-new and not SA which is adding the X-Spam headers. In /etc/amavisd/api.conf there is a parameter, $sa_tag_level_deflt, defaulted to -99, below which no X-Spam

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
I might be forced to do this. Take the corpus from Mailinator and manually mark it as SPAM or HAM and use sa-learn to train spamassassin. But this is what is confusing me. Doesn't SA use a lot more tags to determine if it is SPAM or HAM? Does this mean that sa-learn is not only for bayes but

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 17:02:26, Reindl Harald wrote: > On 31.05.2016 at 16:59, Antony Stone wrote: > > > > I had read SA documentation such as > > https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html > that's all based on opinions - the only question is the quality of > training

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 16:59, Antony Stone wrote: On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote: On 31.05.2016 at 15:28, Antony Stone wrote: 2. You should be aware (*especially* if using this stuff as the basis of a research project - any competent referee should pick up on something

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote: > On 31.05.2016 at 15:28, Antony Stone wrote: > > 2. You should be aware (*especially* if using this stuff as the basis of > > a research project - any competent referee should pick up on something > > like this) that SA works best when

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
BTW I am using SA as an oracle for Blacklisting. Our research is concerned with combining multiple blacklist sources and also considering the historical importance of an IP in a blacklist to create a very effective master blacklist. Let me give you an example. Suppose an IP address 1.2.3.4 appeared

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
The data set which I use for bayes consists of both ham and spam (https://www.cs.cmu.edu/~./enron/). Let's consider a scenario where I have a domain and I point it to a mailserver. It might take a while for me to generate 50,000 mails a day (mailinator provides me this). I need to embed

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 15:28, Antony Stone wrote: 2. You should be aware (*especially* if using this stuff as the basis of a research project - any competent referee should pick up on something like this) that SA works best when the emails it is asked to process are from the same source as it has

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 15:21, Shivram Krishnan wrote: Here is my scenario. I am using SA as an oracle/ground truth for a research project. It is generally hard to get hold of a real time mail corpus nope, just point a cheap domain to a mailserver accepting all incoming stuff and spread some

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:21:19, Shivram Krishnan wrote: > Here is my scenario. I am using SA as an oracle/ground truth for a research > project. Okay. > It is generally hard to get hold of a real time mail corpus Er, what?? > I opted for a service provided by mailinator. > I have also

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Here is my scenario. I am using SA as an oracle/ground truth for a research project. It is generally hard to get hold of a real time mail corpus, so I opted for a service provided by mailinator. Mailinator is a company which provides users with disposable email IDs and it offers an API to obtain

Re: Multiple RBLs and dynamic IPs

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 10:43, Matus UHLAR - fantomas wrote: On 30 May 2016, at 15:07, Alex wrote: Yeah, that's it exactly. Particularly overseas where it doesn't appear NAT and/or submission are used as readily as they are here. On 31.05.2016 at 03:09, Bill Cole wrote: Irrelevant in this case

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 08:18, Shivram Krishnan wrote: It is not on production. I am using this to evaluate spamassassin. how will you evaluate something when you slay your setup that way? On Mon, May 30, 2016 at 10:38 PM, @lbutlr wrote: On May

Re: Multiple RBLs and dynamic IPs

2016-05-31 Thread Matus UHLAR - fantomas
On 30 May 2016, at 15:07, Alex wrote: Yeah, that's it exactly. Particularly overseas where it doesn't appear NAT and/or submission are used as readily as they are here. On 31.05.2016 at 03:09, Bill Cole wrote: Irrelevant in this case because if you trust that header not to be an

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 04:24, Shivram Krishnan wrote: I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is not picking up an obvious spam like in this case http://pastebin.com/MbNRNFWy . your sample is mangled and hence crap it's even damaged because a leading newline frankly

Re: Multiple RBLs and dynamic IPs

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 03:09, Bill Cole wrote: On 30 May 2016, at 15:07, Alex wrote: Yeah, that's it exactly. Particularly overseas where it doesn't appear NAT and/or submission are used as readily as they are here. Irrelevant in this case because if you trust that header not to be an

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 02:30, Bill Cole wrote: On 30 May 2016, at 18:25, Dianne Skoll wrote: On Mon, 30 May 2016 17:45:52 -0400 "Bill Cole" wrote: So you could have 'sex' and 'meds' and 'watches' tallied up into frequency counts that sum up natural

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Dave Funk
OK, So you are testing to see how SA scores artificial mail messages. However SA is designed to evaluate real mail messages, not botched fabrications of them, so I don't understand what you are trying to achieve. You have (either deliberately or unknowingly) omitted the necessary information

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread LuKreme
On May 31, 2016, at 00:18, Shivram Krishnan wrote: > It is not on production. I am using this to evaluate spamassassin. You are not testing or evaluating properly when you break the configuration.

Re: SA Concepts - plugin for email semantics

2016-05-31 Thread Henrik K
On Mon, May 30, 2016 at 06:25:08PM -0400, Dianne Skoll wrote: > On Mon, 30 May 2016 17:45:52 -0400 > "Bill Cole" wrote: > > > So you could have 'sex' and 'meds' and 'watches' tallied up into > > frequency counts that sum up natural (word) and synthetic

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
It is not on production. I am using this to evaluate spamassassin. On Mon, May 30, 2016 at 10:38 PM, @lbutlr wrote: > On May 30, 2016, at 11:06 PM, Shivram Krishnan > wrote: > > 2) I have set a threshold of -10 to see how spamassassin assigns a score >