Re: Corpus of Spam/Ham headers(Source IP) for research

2016-07-01 Thread Bill Cole
On 29 Jun 2016, at 11:38, Shivram Krishnan wrote: Hello Bill, Enough research has been done in this field where the authors have obtained the data from network operators. This for instance is a

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread David Jones
> From: Shivram Krishnan <rorryk...@gmail.com>
> Sent: Wednesday, June 29, 2016 10:50 AM
> To: Antony Stone
> Cc: users@spamassassin.apache.org
> Subject: Re: Corpus of Spam/Ham headers(Source IP) for research
>
> Hello Antony,
> We will be getting headers from our Univers

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Joe Quinn
On 6/29/2016 11:50 AM, Shivram Krishnan wrote: Hello Antony, We will be getting headers from our University. The only reason we want other lists is that we are tailoring blacklists for specific networks, to see how these blacklists perform. The idea being, your network may not be seeing

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Shivram Krishnan
Hello Antony, We will be getting headers from our University. The only reason we want other lists is that we are tailoring blacklists for specific networks, to see how these blacklists perform. The idea being, your network may not be seeing the same attack vectors as what the USC network

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Antony Stone
On Wednesday 29 June 2016 at 17:38:35, Shivram Krishnan wrote:
> Enough research has been done in this field where the authors have obtained the data from network operators. This for instance is

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Antony Stone
On Wednesday 29 June 2016 at 17:35:28, Shivram Krishnan wrote:
> We could solve this problem if you could submit the set of IPs by anonymising the last octet of the IP addresses.
What good is that going to do you in your research project?
> Also we could sign an NDA (if you are willing to

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Shivram Krishnan
Hello Bill, Enough research has been done in this field where the authors have obtained the data from network operators. This for instance is a paper from UPenn, which has collected over 31 million Mail

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Shivram Krishnan
Hey guys, I see there is a lot of concern about revealing the set of Spam IPs and Ham IPs, from which one could learn a customer-company relation (which may be private) and which might enable sophisticated phishing attacks. We could solve this problem if you could submit the set of IPs

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Bill Cole
On 29 Jun 2016, at 1:00, Shivram Krishnan wrote: Hello Bill, Thank you so much for your views. I agree that your customers would not like it if you share information. But Olivier suggested I need only the source IP addresses of the Spam and Ham emails, which can even be anonymized in the

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Rob McEwen
On 6/29/2016 1:00 AM, Shivram Krishnan wrote: Thank you so much for your views. I agree that your customers would not like it if you share information. But Olivier suggested I need only the source IP addresses of the Spam and Ham emails, which can even be anonymized in the last octet.

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Reindl Harald
On 29.06.2016 at 13:14, Olivier wrote: Reindl Harald writes: forget the big ones - just filter them out and look at the small ones where PTR/Sender is from the same domain, connect it to your destination domains which are easy to find out and voila you have

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Olivier
Reindl Harald writes:
> you underestimate the combination "ip from host xyz sent ham to one of my customers" combined with easy to find customer domains as possible targets
You could/should hide your identity when providing the data. In fact, I am not even sure

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Olivier
Reindl Harald writes:
> forget the big ones - just filter them out and look at the small ones where PTR/Sender is from the same domain, connect it to your destination domains which are easy to find out and voila you have company-to-company relations by looking

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Reindl Harald
On 29.06.2016 at 12:59, Antony Stone wrote: On Wednesday 29 June 2016 at 12:42:02, Reindl Harald wrote: On 29.06.2016 at 12:35, Olivier wrote: Reindl Harald writes: he asked *exactly the same* with "dataset of source IP addresses of emails received" but for a

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Antony Stone
On Wednesday 29 June 2016 at 12:42:02, Reindl Harald wrote:
> On 29.06.2016 at 12:35, Olivier wrote:
> > Reindl Harald writes:
> >> he asked *exactly the same* with "dataset of source IP addresses of emails received" but for a ton of relations you just need

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Reindl Harald
On 29.06.2016 at 12:35, Olivier wrote: Reindl Harald writes: On 29.06.2016 at 06:45, Olivier wrote: Though I have devised a mechanism to generate these blacklists, I am not finding a suitable evaluation metric. It would be great if somebody could give me a dataset

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Olivier
Reindl Harald writes:
> On 29.06.2016 at 06:45, Olivier wrote:
>>> Though I have devised a mechanism to generate these blacklists, I am not finding a suitable evaluation metric. It would be great if

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Reindl Harald
On 29.06.2016 at 06:45, Olivier wrote: Though I have devised a mechanism to generate these blacklists, I am not finding a suitable evaluation metric. It would be great if somebody could give me a dataset of source IP addresses of emails received by your network which have been marked as

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-28 Thread Shivram Krishnan
Hello Bill, Thank you so much for your views. I agree that your customers would not like it if you share information. But Olivier suggested I need only the source IP addresses of the Spam and Ham emails, which can even be anonymized in the last octet. Will that still be a privacy concern?

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-28 Thread Olivier
Shivram
> Though I have devised a mechanism to generate these blacklists, I am not finding a suitable evaluation metric. It would be great if somebody could give me a dataset of source IP addresses of emails received by your network which have been marked as HAM/SPAM by

Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-28 Thread Bill Cole
On 28 Jun 2016, at 20:33, Shivram Krishnan wrote: Hey Guys, I am a researcher at the University of Southern California ( https://steel.isi.edu/ ), and I have been working on making blacklists more effective by combining different sources of blacklists, and creating a blacklist specific

Corpus of Spam/Ham headers(Source IP) for research

2016-06-28 Thread Shivram Krishnan
Hey Guys, I am a researcher at the University of Southern California ( https://steel.isi.edu/ ), and I have been working on making blacklists more effective by combining different sources of blacklists, and creating a blacklist specific to a particular network. Though I have devised a