Re: Spamassassin not capturing obvious Spam

2016-06-04 Thread Bill Cole
On 31 May 2016, at 2:18, Shivram Krishnan wrote: It is not on production. I am using this to evaluate spamassassin. That is entirely unnecessary and will break the autolearning subsystem if you have it enabled. To get a full report of the rules hit and their scores, use the '-t' option wit

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Agreed that I do not have experience. I am just playing my cards out here to get a corpus of mails. Thanks guys! On Tue, May 31, 2016 at 11:20 AM, Reindl Harald wrote: > > > On 31.05.2016 at 20:16, Antony Stone wrote: > >> On Tuesday 31 May 2016 at 20:11:14, Shivram Krishnan wrote: >> >> In th

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 20:16, Antony Stone wrote: On Tuesday 31 May 2016 at 20:11:14, Shivram Krishnan wrote: In the glue - like spamass-mailer, there would be two folders which are created. One would be the mailbox and the other would be a spambox (don't know the term). Can't you access the spambox

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 20:11, Shivram Krishnan wrote: In the glue - like spamass-mailer, there would be two folders which are created. One would be the mailbox and the other would be a spambox (don't know the term). Can't you access the spambox to extract the mail? in the glue there are no folders as

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 20:11:14, Shivram Krishnan wrote: > In the glue - like spamass-mailer, there would be two folders which are > created. One would be the mailbox and the other would be a spambox (don't > know the term). Can't you access the spambox to extract the mail? It sounds to me that y

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
In the glue - like spamass-mailer, there would be two folders which are created. One would be the mailbox and the other would be a spambox (don't know the term). Can't you access the spambox to extract the mail? On Tue, May 31, 2016 at 11:01 AM, Reindl Harald wrote: > > > Am 31.05.2016 um 19:55 sch

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 19:55, Shivram Krishnan wrote: There will be a point where the decision to drop the mail is made based on the headers. Can't we log it there? SA doesn't make any drop/reject decisions; the glue does (spamass-milter, amavis or whatever). And even if it did, I would find it perverse to

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Hello Reindl, There will be a point where the decision to drop the mail is made based on the headers. Can't we log it there? On Tue, May 31, 2016 at 10:30 AM, Reindl Harald wrote: > > > On 31.05.2016 at 19:25, Shivram Krishnan wrote: > >> Thanks guys. >> >> What I am going to ask might be a longsh

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Hi Antony, I have an ongoing collection of Blacklists since Jan 1, 2016. This way I would know how long it has stayed on the Blacklist. "Dealing with email "after the event" (especially with regard to blacklists) will give you very different results from dealing with it as it happens, if for no o

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:47:56, Shivram Krishnan wrote: > I am using SA as an oracle for Blacklisting. Our research is concerned with > combining multiple blacklist sources and also considering the historical > importance of an IP in a blacklist to create a very effective master > blacklist. > >

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 19:25, Shivram Krishnan wrote: Thanks guys. What I am going to ask might be a longshot. But is it possible for anyone who is running a mailserver to give a list of sources of SPAM (recent, anytime this year) and the associated SA score? It will be extremely useful for my rese

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Thanks guys. What I am going to ask might be a longshot. But is it possible for anyone who is running a mailserver to give a list of sources of SPAM (recent, anytime this year) and the associated SA score? It will be extremely useful for my research and credit would be given. Example:- efetunisie.

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Bowie Bailey
On 5/31/2016 1:38 AM, @lbutlr wrote: On May 30, 2016, at 11:06 PM, Shivram Krishnan wrote: 2) I have set a threshold of -10 to see how spamassassin assigns a score for every mail. No. Do not do this. Instead, set this option in your local.cf file: add_header all Report _REPORT_ This will
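A rough sketch of how the report added by the `add_header all Report _REPORT_` setting mentioned above could be read back out of a scanned message. With that setting SpamAssassin writes an `X-Spam-Report` header; the exact line layout assumed here ("* score RULE_NAME description"), the sample message, and the rule names are illustrative assumptions, not output from a real scan.

```python
import re
from email import message_from_string

# Assumed layout of one report line: "*  1.2 RULE_NAME description".
# Continuation lines without a leading score are skipped.
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z0-9_]+)")

# Hypothetical message; rule names and scores are made up for illustration.
SAMPLE = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: test\n"
    "X-Spam-Report: *  1.2 EXAMPLE_RULE_A hypothetical description\n"
    "\t*  2.7 EXAMPLE_RULE_B hypothetical description\n"
    "\t*      continuation of the description\n"
    "\n"
    "body text\n"
)


def rules_hit(raw_message):
    """Return (rule_name, score) pairs found in the X-Spam-Report header."""
    msg = message_from_string(raw_message)
    report = msg.get("X-Spam-Report", "")
    hits = []
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            hits.append((match.group(2), float(match.group(1))))
    return hits


if __name__ == "__main__":
    for name, score in rules_hit(SAMPLE):
        print(f"{score:5.1f}  {name}")
```

This leaves the required score untouched, so every message still carries its per-rule scores without the threshold being changed.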

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 17:13, Shivram Krishnan wrote: I might be forced to do this. Take the corpus from Mailinator and manually mark it as SPAM or HAM and use sa-learn to train spamassassin. But this is what is confusing me. Doesn't SA use a lot more tags to determine whether it is SPAM or HAM? Does

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
I might be forced to do this. Take the corpus from Mailinator and manually mark it as SPAM or HAM and use sa-learn to train spamassassin. But this is what is confusing me. Doesn't SA use a lot more tags to determine whether it is SPAM or HAM? Does this mean that sa-learn is not only for bayes but als

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 17:02:26, Reindl Harald wrote: > On 31.05.2016 at 16:59, Antony Stone wrote: > > > > I had read SA documentation such as > > https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html > that's all based on opinions - the only question is the quality of > training and

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 16:59, Antony Stone wrote: On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote: On 31.05.2016 at 15:28, Antony Stone wrote: 2. You should be aware (*especially* if using this stuff as the basis of a research project - any competent referee should pick up on something li

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote: > On 31.05.2016 at 15:28, Antony Stone wrote: > > 2. You should be aware (*especially* if using this stuff as the basis of > > a research project - any competent referee should pick up on something > > like this) that SA works best when the

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
BTW I am using SA as an oracle for Blacklisting. Our research is concerned with combining multiple blacklist sources and also considering the historical importance of an IP in a blacklist to create a very effective master blacklist. Let me give you an example. Suppose an IP address 1.2.3.4 appeared on

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
The data set which I use for bayes consists of both ham and spam (https://www.cs.cmu.edu/~./enron/). Let's consider a scenario where I have a domain and I point it to a mailserver. It might take a while for me to generate 50,000 mails a day (Mailinator provides me this). I need to embed multipl

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 15:28, Antony Stone wrote: 2. You should be aware (*especially* if using this stuff as the basis of a research project - any competent referee should pick up on something like this) that SA works best when the emails it is asked to process are from the same source as it has be

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 15:21, Shivram Krishnan wrote: Here is my scenario. I am using SA as an oracle/ground truth for a research project. It is generally hard to get hold of a real-time mail corpus nope, just point a cheap domain to a mailserver accepting all incoming stuff and spread some hidden

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Antony Stone
On Tuesday 31 May 2016 at 15:21:19, Shivram Krishnan wrote: > Here is my scenario. I am using SA as an oracle/ground truth for a research > project. Okay. > It is generally hard to get hold of a real-time mail corpus Er, what?? > I opted for a service provided by mailinator. > I have also trai

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Shivram Krishnan
Here is my scenario. I am using SA as an oracle/ground truth for a research project. It is generally hard to get hold of a real-time mail corpus, so I opted for a service provided by Mailinator. Mailinator is a company which provides users with disposable email IDs and it offers an API to obtain th

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 08:18, Shivram Krishnan wrote: It is not on production. I am using this to evaluate spamassassin. how will you evaluate something when you slay your setup that way? On Mon, May 30, 2016 at 10:38 PM, @lbutlr <krem...@kreme.com> wrote: On May 30, 2016, at 11:06 P

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Reindl Harald
On 31.05.2016 at 04:24, Shivram Krishnan wrote: I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is not picking up an obvious spam like in this case http://pastebin.com/MbNRNFWy . your sample is mangled and hence crap; it's even damaged because of a leading newline. frankly y

Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Dave Funk
OK, So you are testing to see how SA scores artificial mail messages. However SA is designed to evaluate real mail messages, not botched fabrications of them, so I don't understand what you are trying to achieve. You have (either deliberately or unknowingly) omitted the necessary information tha

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread LuKreme
On May 31, 2016, at 00:18, Shivram Krishnan wrote: > It is not on production. I am using this to evaluate spamassassin. You are not testing or evaluating properly when you break the configuration. --

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Shivram Krishnan
It is not on production. I am using this to evaluate spamassassin. On Mon, May 30, 2016 at 10:38 PM, @lbutlr wrote: > On May 30, 2016, at 11:06 PM, Shivram Krishnan > wrote: > > 2) I have set a threshold of -10 to see how spamassassin assigns a score > for every mail. > > No. Do not do this. >

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread @lbutlr
On May 30, 2016, at 11:06 PM, Shivram Krishnan wrote: > 2) I have set a threshold of -10 to see how spamassassin assigns a score for > every mail. No. Do not do this. -- When the routine bites hard / and ambitions are low And the resentment rides high / but emotions won't grow And we're chang

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Shivram Krishnan
1) The message is indeed fabricated. I had to generate an RFC 2822 mail from JSON. I am harvesting SPAM mails from mailinator.com (public emails). So that is an error in my generation of the RFC 2822. I did not change it as spamassassin did not assign a score. 2) I have set a threshold of -10 to s

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Dave Funk
That message is either a fabrication or something from a messed up system. There's no sign of an IP address (neither IPv4 nor IPv6) in it. There are two identical 'Received:' headers which have '()' where there should be at least the IP address of the incoming connection. This indicates that the
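The check described above, looking for an IP address in the 'Received:' headers, can be sketched roughly as follows. It assumes the common convention of putting the connecting address in square brackets (optionally prefixed with `IPv6:`); the sample headers and host names are hypothetical, and this is not how SpamAssassin itself parses them.

```python
import ipaddress
import re
from email import message_from_string

# Assumption: the connecting IP appears in square brackets, e.g.
# "[192.0.2.10]" or "[IPv6:2001:db8::1]".
BRACKETED = re.compile(r"\[(?:IPv6:)?([0-9A-Fa-f:.]+)\]")


def received_ips(raw_message):
    """Return every valid IP address found in the Received headers."""
    msg = message_from_string(raw_message)
    found = []
    for value in msg.get_all("Received", []):
        for token in BRACKETED.findall(value):
            try:
                found.append(str(ipaddress.ip_address(token)))
            except ValueError:
                pass  # bracketed text that is not an IP address
    return found


# Hypothetical header with an empty "()" where the address should be.
BROKEN = (
    "Received: from mail.example.com () by mx.example.net;"
    " Mon, 30 May 2016 10:15:00 +0000\n"
    "Subject: x\n\nbody\n"
)

# Hypothetical header in the usual form, using a documentation address.
INTACT = (
    "Received: from mail.example.com (mail.example.com [192.0.2.10])"
    " by mx.example.net; Mon, 30 May 2016 10:15:00 +0000\n"
    "Subject: x\n\nbody\n"
)

if __name__ == "__main__":
    print(received_ips(BROKEN))
    print(received_ips(INTACT))
```

A message for which this returns an empty list gives IP-based checks such as blacklist lookups nothing to work with.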

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread LuKreme
On May 30, 2016, at 20:24, Shivram Krishnan wrote: > I have followed the guidelines on > https://wiki.apache.org/spamassassin/ImproveAccuracy . No, you really haven't. > Content analysis details: (3.9 points, -10.0 required) This makes no sense at all. Either you have set the spam scores neg
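To illustrate why a required score of -10 is odd, here is a small sketch that reads the "Content analysis details" line quoted above and compares score against threshold. The regular expression and the `parse_details` helper are illustrative assumptions; the comparison reflects that SpamAssassin marks a message as spam once its score reaches the required score (5.0 by default).

```python
import re

# Matches the "(3.9 points, -10.0 required)" part of the report line.
DETAILS = re.compile(
    r"\((-?\d+(?:\.\d+)?) points, (-?\d+(?:\.\d+)?) required\)"
)


def parse_details(line):
    """Return (score, required, flagged) from a content-analysis line."""
    match = DETAILS.search(line)
    if match is None:
        raise ValueError("no score/threshold pair found")
    score = float(match.group(1))
    required = float(match.group(2))
    return score, required, score >= required


if __name__ == "__main__":
    print(parse_details("Content analysis details: (3.9 points, -10.0 required)"))
    print(parse_details("Content analysis details: (3.9 points, 5.0 required)"))
```

With the threshold at -10.0 the 3.9-point message counts as spam, as would nearly any other message; with the default of 5.0 the same message would not.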

Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Rob McEwen
On 5/30/2016 10:24 PM, Shivram Krishnan wrote: I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is not picking up an obvious spam like in this case http://pastebin.com/MbNRNFWy . Your pastebin example didn't show the "last external" sending IP. Could have been there o

Spamassassin not capturing obvious Spam

2016-05-30 Thread Shivram Krishnan
Hey guys, I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is not picking up an obvious spam like in this case http://pastebin.com/MbNRNFWy . I have followed the guidelines on https://wiki.apache.org/spamassassin/ImproveAccuracy . Let me know how to catch this type of spam