Re: Re: rbldnsd blacklist question

2008-09-16 Thread Dallas Engelken

Rob McEwen wrote:
John Hardin wrote:

On Tue, 16 Sep 2008, Marc Perkel wrote:

Looking for opinions from people running rbl blacklists.

I have a list that contains a lot of name based information. I'm 
about to add a lot more information to the list and what will happen 
is that when you look up a name you might get several results. For 
example, a hostname might be blacklisted, be in a URIBL list, be in 
a day old bread list, and a NOT QUIT list. So it might return 4 
results like 127.0.0.2, 127.0.0.6, 127.0.0.7, 127.0.0.8.


Is this what would be considered "best practice"? My thinking is
that having one list that returns everything is very efficient.

Isn't general practice to bitmap the last octet if you're going to
convey multiple pieces of information?


If you have a situation where there might be more than one "answer" 
for a given query, and you are content with having a maximum of 7 
possible answers, then... 


Why just 7?  You have 2 other octets to use: 127.X.Y.Z - X and Y
don't have to be zeros...


512 possibilities if you use all the bits on all 3 octets (but I'd avoid
loopback 127.0.0.1).


448 possibilities if you only count bit 1 settable on octets 2 and 3 (i.e.
127.1.1.2)


343 if you avoid setting bit 1 altogether on any octet (i.e. 127.2.2.2)
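Hardin's bitmap suggestion can be sketched in a few lines: one A record whose last octet ORs together one bit per list, which the client then decodes. The list names and bit assignments below are made up for illustration, not any real zone's:

```python
# Decode a bitmasked DNSBL answer octet into list memberships.
# List names and bit assignments are hypothetical, not any real zone's.
LISTS = {
    2: "blacklist",        # 127.0.0.2
    4: "uri-blacklist",    # 127.0.0.4
    8: "day-old-bread",    # 127.0.0.8
    16: "not-quit",        # 127.0.0.16
}

def decode(addr):
    """Return the list names encoded in the last octet of a 127.x.y.z reply."""
    octets = [int(o) for o in addr.split(".")]
    if octets[0] != 127:
        raise ValueError("not a DNSBL result: %s" % addr)
    return [name for bit, name in sorted(LISTS.items()) if octets[3] & bit]

print(decode("127.0.0.14"))  # ['blacklist', 'uri-blacklist', 'day-old-bread']
print(decode("127.0.0.2"))   # ['blacklist']
```

Avoiding bit 0 (so the answer can never be 127.0.0.1) leaves 7 usable bits in the last octet, which is where the "maximum of 7 possible answers" above comes from.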

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: rbldnsd blacklist question

2008-09-16 Thread Dallas Engelken

John Hardin wrote:
On Tue, 16 Sep 2008, Marc Perkel wrote:



Looking for opinions from people running rbl blacklists.

I have a list that contains a lot of name based information. I'm 
about to add a lot more information to the list and what will happen 
is that when you look up a name you might get several results. For 
example, a hostname might be blacklisted, be in a URIBL list, be in a 
day old bread list, and a NOT QUIT list. So it might return 4 results 
like 127.0.0.2, 127.0.0.6, 127.0.0.7, 127.0.0.8.


Is this what would be considered "best practice"? My thinking is that
having one list that returns everything is very efficient.


Isn't general practice to bitmap the last octet if you're going to 
convey multiple pieces of information?




Isn't it simple enough to write the zone file in 2 different formats and
map them to 2 different zone names, supporting both bitmasked and
multiple-response lookups, if there is value in having both?


URIBL uses bitmasks, but doesn't need to, as we don't cross-list domains
to multiple lists.
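Serving both styles from one membership table is mostly a generation problem: the same data can be emitted either as a single bitmasked A value or as one A record per list. A sketch (domains, list names, and bit values are all invented):

```python
# Emit one domain's listing data in two response styles: a single
# bitmasked A record vs. one A record per list. All data here is invented.
MEMBERSHIPS = {"spam.example": ["black", "grey"], "junk.example": ["black"]}
BITS = {"black": 2, "grey": 4, "red": 8}

def bitmasked(domain):
    """One answer: OR all list bits into the last octet."""
    mask = 0
    for name in MEMBERSHIPS[domain]:
        mask |= BITS[name]
    return ["127.0.0.%d" % mask]

def multi(domain):
    """Multiple answers: one A record per list membership."""
    return ["127.0.0.%d" % BITS[name] for name in MEMBERSHIPS[domain]]

print(bitmasked("spam.example"))  # ['127.0.0.6']
print(multi("spam.example"))      # ['127.0.0.2', '127.0.0.4']
```

Each zone variant would then be built from the corresponding function's output, so clients can pick whichever query style they prefer.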


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: Incorrect DNSBL evaluation

2008-07-21 Thread Dallas Engelken

Karsten Bräckelmann wrote:

On Mon, 2008-07-21 at 23:17 +0200, Matthias Leisi wrote:
  

Yves Goergen schrieb:



  
What do you mean? My mail server uses the DNS servers of the computing 
centre. What SpamAssassin does, I don't know. The IP addresses are:
  


The same as everyone else... Sic.

  

# cat /etc/resolv.conf
nameserver 213.133.100.100
nameserver 213.133.99.99
nameserver 213.133.98.98
nameserver 213.133.98.97
  

Ah, Hetzner. I had a lot less problems since I started to run my own:

main:~> cat /etc/resolv.conf
nameserver 127.0.0.1



Every Hetzner customer using the same DNS by default? Yeah, that indeed
looks like these DNS servers are being blocked by the BL operators (see
my previous post). Most likely not only URIBL, but every major BL out
there...
  


I have looked, and there are no ACLs on 213.133.0.0/16 whatsoever, so
it's not coming from the URIBL mirror side.


Could those DNS servers be monetizers? Have you (Yves) even tried manual
lookups to see how the ISP DNS server is responding? Do this and report
your results:


$ dig @213.133.100.100 unclassified.de.multi.uribl.com A

Those NS IPs are not reachable from here, so I can't test to see how they
respond.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Starting a URIBL - Howto? [OT]

2008-04-29 Thread Dallas Engelken

Rob McEwen wrote:

(on-list follow-up)

By "proactive listings", I discovered in my off-list conversation with 
Dallas that this refers to URIBL-Gold listings... where items are 
listed in "uribl-gold" in advance of seeing them in actual spams. But 
this uribl-gold list isn't available to the public and is not even 
prescribed as a list to use for fighting spam.


We do ask anyone with access to it to use it.  Since it's basically
URIBL black for domains that we believe will show up in future spam
campaigns, there is no reason not to.  I'm sure there are some on this
list who can comment further on its effectiveness.


I'm really disappointed that Dallas would have presented that kind of 
comparison to ivmURI. This is like comparing some kid's best 
basketball game on an X-Box to Michael Jordan's best basketball game 
on the court. I'm glad that URIBL-Gold is helping URIBL black get 
better... but until the listing actually makes it into URIBL-Black... 
and is then actually *usable* for blocking spam...


From an RBL perspective, the purpose of the data in there is to catch
the front end of spam runs.  Assuming it takes ~5 minutes to list,
rebuild, and redistribute new zone data in reactive mode, we could miss
50% of a 10-minute campaign.  Obviously the longer the campaign draws
out, the better the miss rate looks.  But those using gold+black have
100% hit rates on a lot of these campaigns, which is something that is
difficult if not impossible to achieve on a reactive blacklist based
solely on trap data or user feedback.


As you can see at http://www.uribl.com/gold.shtml, over 20% (14k of 57k)
of the domains that have been listed in gold for hours, days, even
weeks, have since moved to black.  So, assuming each of those 14k
domains returned NXDOMAIN on black.uribl.com for the first ~5 minutes of
its campaign, how much spam do you think we missed?  Quite a lot, I'd
say.  That short window is what we are targeting here.  It doesn't
result in a huge hit rate, because it only hits in gold during the
rebuild-and-redistribute window, but it does serve its purpose quite well.


Aside from client-side spam filtering, I could see registries,
registrars, web hosts, IP space owners and the like benefiting from this
data as well.  Knowing there is potential for abuse before the abuse
actually occurs could be quite a powerful tool.
For example, I can tell you that ns1.tuhaerge.com is the next NS that
will be spewing up VPXL crapmail
(http://www.spamtrackers.hk/wiki/index.php?title=VPXL).  That NS and
every domain registered against that NS should be instantly nuked, but
getting those Chinese registrars to action anything like this, even with
proper evidence, is nearly impossible... just think if you asked them to
kill it before the abuse started.  ;)


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: Starting a URIBL - Howto? [OT]

2008-04-29 Thread Dallas Engelken

Rob McEwen wrote:
Dallas Engelken wrote:
Yes, of course, but your results.txt is biased as it only shows
where ivmURI hits.


Based on the last 20k adds to URIBL, it appears to me that ivmURI
has less coverage?

:

Dallas,

Yes, you are right!

URIBL *does* cast a wider net than ivmURI.

So, in general, I agree with your statement that ivmURI has less
coverage than URIBL. But I'm confused about your stats... and they
look really weird. (but maybe I'm just not understanding them?)


So here is what I did.

I took the last 500 additions to URIBL, (not including geocity and 
blogspot items... so that this comparison would compare apples to 
apples!) I then ran those against ivmURI.


186 of the 500 latest additions to URIBL were also found in ivmURI.

I then reversed this testing and ran URIBL against the last 500 
additions to ivmURI.


328 of the latest 500 additions to ivmURI were listed on URIBL.

So yes, basically, you're right, URIBL does have greater coverage than 
ivmURI.


Your point is well made. For the most part, URIBL casts a wider net 
than ivmURI. Also, if you were to include geocity and blogspot hits, 
of course, that would throw the comparison wildly in URIBL's favor... 
but I'm not so sure that would be a fair comparison.


No, you're right, that's not fair.  If I compare only recent reactive
listings, minus the subdomain hosters that we list, you hit about 60%,
whereas before it was more like 27%.


ivmURI stats from last 5000 URIBL black listings
-> 2981 hits
-> 2019 misses



(In both tests, I checked against the 2nd list just about 2-3 minutes
after grabbing the latest data from the first list. This is important as
I was seeing those stats quickly grow for BOTH after my initial
collection of stats... because items not yet in both lists are
continuously getting into the other list fast. So timing is mission
critical in this kind of testing and the time between gathering and
checking MUST be the same both ways.)
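The two-way comparison described above boils down to set overlap measured in both directions at (nearly) the same time. As a toy model (domains invented):

```python
# Toy model of the two-way overlap test described above; domains invented.
uribl_latest = {"a.test", "b.test", "c.test", "d.test", "e.test"}
ivmuri_latest = {"c.test", "d.test", "e.test", "f.test", "g.test"}

def coverage(newest, other):
    """Fraction of the newest additions to one list already on the other."""
    return len(newest & other) / float(len(newest))

print(coverage(uribl_latest, ivmuri_latest))  # 0.6
print(coverage(ivmuri_latest, uribl_latest))  # 0.6
```

Because both snapshots are taken together, the asymmetry in the two numbers reflects real coverage differences rather than one list simply having had more time to catch up.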


However, I think you missed my point about 
http://invaluement.com/results.txt


I wasn't saying that this proved that ivmURI is better than URIBL or 
SURBL. Only that this proves ivmURI as being *relevant* and *useful* 
...even for those who are already using *both* URIBL and SURBL.  (and 
this is just one such proof!)


you said,

"and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE: 
http://invaluement.com/results.txt )"

your EXAMPLE contradicts the statement that precedes it.  I can only
take it in the context of how I read it.





For example, if ivmURI were only catching stuff already caught by 
URIBL and SURBL, ivmURI wouldn't be relevant or helpful to anyone. 
Moreover, I believe that URIBL or SURBL could easily create a 
similarly impressive page as my http://invaluement.com/results.txt page.


Probably.



Bottom line is that you are correct... AND... I'm sorry you took this 
as me dissing URIBL!




I didn't take it that way.  I was just pointing out that your statement
didn't match your accompanying example.


Simply put, there are some series of spams that each of the three URI 
blacklists are better at catching than the other two. That is ALL that 
I meant by this.



Okay, if you would have said that, I would have agreed and never posted :)

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: Starting a URIBL - Howto? [OT]

2008-04-29 Thread Dallas Engelken

Rob McEwen wrote:


 and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE: 
http://invaluement.com/results.txt )


Yes, of course, but your results.txt is biased as it only shows where
ivmURI hits.


Based on the last 20k adds to URIBL, it appears to me that ivmURI has
less coverage?


ivmURI stats from last 20k URIBL reactive listings:
-> 5519 hits
-> 14481 misses

ivmURI stats from last 20k URIBL proactive listings:
-> 351 hits
-> 19649 misses


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: Looking for hosts to white list

2008-04-23 Thread Dallas Engelken

Benny Pedersen wrote:

On Tue, April 22, 2008 23:47, Marc Perkel wrote:
  

I'm looking for people who are running URI blacklists, but I'm more
interested in your whitelist information. I have an extensive list
myself and looking for partners to swap data with.

but uribl.com has a hidden whitelist; there might be others that see a
point in hiding it :)
  


Are you sure you mean uribl.com?

white.uribl.com is a publicly available zone.  It is not part of
multi.uribl.com, but is available for stand-alone queries.


# host -tTXT microsoft.com.white.uribl.com
microsoft.com.white.uribl.com text "Whitelisted, see 
http://lookup.uribl.com/?domain=microsoft.com";


URIBL white hits are also visible on the lookup form, ie
http://lookup.uribl.com/?d=godaddy.com

We're not scared to show it off, as we don't use it for false remediation
(for the most part).


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: OT: uribl.com folks awake?

2008-03-27 Thread Dallas Engelken

Jonathan Nichols wrote:
Sorry for the OT. I've been trying to get in touch with whoever is in
charge of URIBL zonefile mirrors without success.


Is this thing on? Ping me offlist, por favor. I may have just been 
pinging the wrong people.




http://www.uribl.com/contact.shtml
--->  For DNS questions not related to listings.. that includes zone 
information, transfers, outages, etc. Use dnsadmin at uribl dot com 
<mailto:[EMAIL PROTECTED]>.


Have you done that?

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: util_rb_2tld

2008-03-26 Thread Dallas Engelken

> McDonald, Dan wrote:
>
>
>> On Tue, 2008-03-25 at 16:44 +0100, Yet Another Ninja wrote:
>>
>> util_rb_2tld  by.ru
>> util_rb_2tld  tripod.com
>
> So, the man page is wrong?
> [luser  sa ~]$ man Mail::SpamAssassin::Conf
> /util_rb_2tld
> [...]
>util_rb_2tld 2tld-1.tld 2tld-2.tld ...

No, I don't think this was a message regarding util_rb_2tld usage format.

I think the point he was making was that if you add by.ru and tripod.com
to your util_rb_2tld config, it will help filter spam abusing those hosts.


hotmail.ru would be another one, as the tripod spammers started hitting
hotmail.ru with it today.


+-----------------------+---------------------+
| domain                | seen                |
+-----------------------+---------------------+
| skn24n.hotmail.ru     | 2008-03-26 11:41:46 |
| fe0ky.hotmail.ru      | 2008-03-26 11:35:45 |
| xyw7dgf.hotmail.ru    | 2008-03-26 11:33:21 |
| mmyjolyn.tripod.com   | 2008-03-25 14:48:51 |
| taviamarya.tripod.com | 2008-03-25 14:47:17 |
| roljanna.tripod.com   | 2008-03-25 14:47:08 |
+-----------------------+---------------------+

# host -tTXT skn24n.hotmail.ru.multi.uribl.com
skn24n.hotmail.ru.multi.uribl.com text "Blacklisted, see 
http://lookup.uribl.com/?domain=skn24n.hotmail.ru";


See http://rss.uribl.com/hosters/  for host abuse listings.
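What adding a domain to util_rb_2tld changes can be modeled outside SA: the registrar-boundary trim keeps one extra label for configured two-level TLDs, so the abused subdomain itself (rather than the shared parent like by.ru) is what gets looked up. This is a simplified stand-in, not SA's actual Mail::SpamAssassin::Util::RegistrarBoundaries logic:

```python
# Simplified model of the util_rb_2tld effect on domain trimming.
# Real SA logic lives in Mail::SpamAssassin::Util::RegistrarBoundaries.
TWO_LEVEL = {"by.ru", "tripod.com", "hotmail.ru"}  # from the thread

def trim_domain(host):
    """Reduce a hostname to the domain that would be queried in a URIBL."""
    parts = host.lower().split(".")
    last2 = ".".join(parts[-2:])
    if last2 in TWO_LEVEL and len(parts) >= 3:
        return ".".join(parts[-3:])   # keep one extra label
    return last2

print(trim_domain("skn24n.hotmail.ru"))  # skn24n.hotmail.ru
print(trim_domain("www.example.com"))    # example.com
```

Without the 2tld entry, every hotmail.ru subdomain would collapse to hotmail.ru, and listing it would FP on the whole hoster.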

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Time to make multi.uribl.org optional rather than default?

2008-02-19 Thread Dallas Engelken

Andy Dills wrote:
It appears (from email recently sent to the admins of a few small 
mailservers I help admin) that the people in charge of uribl.com have 
decided to set a pretty low threshold for blacklisting DNS servers from 
querying, demanding that people who hit that threshold pay them a rather 
exorbitant rate for a data feed.
  


Demanding?  I believe the first thing that excessive-query-volume email
tells you is to simply shut it off and be done.  The data feed option
is just that, an option.  If you see no value in it, then you won't be
missing anything by us not answering your queries.


I have judged this threshold to be low based on the size of some of the 
mail/dns servers whose admins have gotten this email, along with the fact 
that this is the only blacklist to have taken this obnoxious stance.


  


What is your definition of low volume?  db2.xecu.net and dns02.xecu.net
account for nearly 500k queries/day (~3GB of data/mo).


There are over 40k unique IPs that query URIBL public DNS.  As any mirror
operator can see, we have around 180 IPs in the ACL.  So that's ~0.45%.
And those 180 blocked IPs represent far fewer organizations/companies,
as many have more than 1 IP on that list.


Filtering the top 0.45% of IPs results in 20% fewer queries/second to the
mirrors.  I don't see trying to limit excessive bandwidth usage on
donated mirrors as an "obnoxious stance".



because right now the default inclusion of tests against 
multi.surbl.com is in reality just a "trial service" and an opportunity 
for this for-profit organization to create revenue streams.


  


If you remove it from SA by default, you're doing so at the expense of 
the other 99.55%.


We asked you to shut off your queries on 2007-12-27 19:15:09.  Nearly 3
months later, we still saw the same high-volume queries from your
systems.



I really don't care much either way, for me it's a done deal, I'm 
disabling the tests on my mail servers and advising others to do the same.
I'm just wondering if the community at large is aware of this and has an 
opinion.
  


Superb.  That's all you had to do in the first place, without raising a stink.

If SA wants to completely remove uribl.com tests because we don't allow
the heavy hitters to query the public mirrors, that's their choice.


Although, the usage policy for Spamhaus
(http://www.spamhaus.org/organization/dnsblusage.html) doesn't prevent
inclusion of RCVD_IN_SBL in SA.


Thanks,

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com




Re: Re: Can anyone help me? surbl.org FP problems?

2008-01-31 Thread Dallas Engelken

John Hardin wrote:

On Tue, 2008-01-29 at 15:25 -0800, John Hardin wrote:
  

On Tue, 2008-01-29 at 17:51 -0500, Matt Kettler wrote:


Perhaps Verizon is screwing up their DNS?

Ahh, yes they are:

http://www.freedom-to-tinker.com/?p=1227
  

Hrm.

As a troubleshooting hack for this increasingly-common "feature",
perhaps a URIBL/DNSBL rule could be defined that checks a domain that
will *never* be in the zones (apache.org maybe) and if it ever hit then
add -20 to the score (to override all the FP hits) and emit a warning to
inspect your DNS service for ISP hijacking? 



...duh, that won't work. Where would the domain name to test come from?

Perhaps a check for ISP DNS tomfoolery could be put in the --lint checks
somehow?

  


Or better yet, just fix the URIDNSBL plugin code to expect responses
matching ^127\.

Anything else is a DNS monetizer.
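The suggested sanity check is simple to express. A sketch of the idea (not the actual SpamAssassin plugin code): any answer outside 127.0.0.0/8 means the resolver is almost certainly a wildcarding "monetizer" rewriting NXDOMAIN into its own landing-page IP.

```python
import re

# Sketch of the response sanity check suggested above: a DNSBL answer
# must be inside 127.0.0.0/8, otherwise the resolver is likely a
# wildcarding "monetizer" rewriting NXDOMAIN. Not SA's actual plugin code.
_DNSBL_RE = re.compile(r"^127\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")

def is_dnsbl_hit(addr):
    """True only for a well-formed address in 127.0.0.0/8."""
    m = _DNSBL_RE.match(addr)
    return bool(m) and all(int(o) <= 255 for o in m.groups())

print(is_dnsbl_hit("127.0.0.2"))      # True
print(is_dnsbl_hit("208.69.32.132"))  # False
```

Dropping non-127 answers turns ISP DNS hijacking from a mass-FP event into a silent no-op.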

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: URIWhois-0.02

2007-09-26 Thread Dallas Engelken

Robert - elists wrote:

DOB, for example, is run by ar.com, who are a registrar.  Since they are a
domain registrar, they have full, direct access to the whois database.

Jeff C.




Well there ya go Jeff...

Become a registrar and bam! More data to help you cause

  


That's the easy answer, but do you know what it costs to become a registrar?

Just for com/net from Verisign you have $6500 up front, and $4k
recurring.  To get your ICANN credentials, you have $2500 up front with
the application, $4k yearly, a variable fee to ICANN once you start
registering domains, and obviously the $0.25/registration that goes to
ICANN.  You also have to be able to show $70,000 in working capital.


And that only gets you com and net.  I'd want org from PIR, info from
Afilias, and any other TLD that takes a lot of abuse (cn is big right now).


At least those fees keep the bad guys from becoming registrars too.

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: fdf spam

2007-08-10 Thread Dallas Engelken

David B Funk wrote:

On Sat, 11 Aug 2007, wolfgang wrote:

  

In an older episode (Friday, 10. August 2007), Mike Cisar wrote:


Has anyone else been seeing the empty-body "PDF" spam, but with a
.fdf file extension.  Had a whole pile in my inbox here this morning.
  

Thousands of them went through our mail gateways at work. A typo in some
bot?



No, merely the next episode in the never-ending spam-wars saga.

A ".fdf" file is yet another Adobe file type and double-clicking on one
(in a Windows box) will launch Acrobat-reader and display its contents.
However anti-spam weapons such as PDFinfo are explicitly coded to look
for ".pdf" files, thus ".fdf" is given a pass.
This shows the cleverness behind (at least some of) the spammers.

A quick edit will update PDFinfo to check ".fdf" files too.

  


That was done this morning, if you want to grab a new version...
http://www.rulesemporium.com/plugins/PDFInfo.pm
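The ".fdf" bypass described above comes down to the attachment-extension match. A minimal sketch of the kind of widening involved (patterns illustrative, not PDFInfo.pm's actual code):

```python
import re

# Widen a .pdf-only attachment-name check to also catch .fdf files,
# as described in the thread. Illustrative only; PDFInfo.pm's real
# MIME-part matching is more involved than a filename regex.
OLD = re.compile(r"\.pdf$", re.I)
NEW = re.compile(r"\.[fp]df$", re.I)

names = ["invoice.pdf", "report.FDF", "notes.txt"]
print([n for n in names if OLD.search(n)])  # ['invoice.pdf']
print([n for n in names if NEW.search(n)])  # ['invoice.pdf', 'report.FDF']
```

The general lesson is to match on the family of risky types rather than one literal extension, since spammers rotate through equivalent ones.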

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Detecting short-TTL domains?

2007-08-10 Thread Dallas Engelken

Jared Hall wrote:
Great overview on DNS and Net::DNS.  While there is a
difference between RR and zone TTL times, my
observation was based upon zone SOA TTL records of
recent spamvertized URIs in emails.

  


Well then that is even simpler...  using the same code below..

&lookup('uribl.com', 'SOA');

and add support for the SOA type in the lookup() sub:

  # support other rr types below...
  if ($type eq 'SOA') {
    print "$host $ttl IN $type ", $a->minimum, "\n";
  }

above, $ttl would still be the ttl on the SOA answer record (possibly
cached), and $a->minimum would be the minimum ttl found in the SOA.

from the docs:
minimum() - Returns the minimum (default) TTL for records in this zone.






There is nothing wrong with using URI BLs.  But most
URI BLs are simply triggered from a "problem" that
somebody else already had.  It still seems to me that
the problems presented by Fast-Flux systems can be
mitigated by some coding relevant to current statistical
norms.


While I have no doubt that Dallas is technically accurate,
I'm wondering if there is a Net::DNS function that can be 
used to extract zone SOA TTL: values (at least until 
Joe Spammer starts tweaking individual RRs)?



Jared Hall
General Telecom, LLC.





On Friday 10 August 2007 13:59, Dallas Engelken wrote:
  

John Rudd wrote:


I'm a prophet now!?

:-)

Hm.  So, I'm sure I can figure this out eventually, but does anyone
know the right Net::DNS way to extract the TTL?
  

Net::DNS::RR has a ttl() function.

# perl ttl_test
Lookup: A www.uribl.com
www.uribl.com 591 IN A 209.200.135.149



use Net::DNS;

my $res = Net::DNS::Resolver->new;
&lookup('www.uribl.com', 'A');
exit;

sub lookup {
  my ($host, $rr) = @_;
  print "Lookup: $rr $host\n";

  my $packet = $res->send($host, $rr);
  return unless $packet;

  my $header = $packet->header;
  return if ($header->rcode =~ m/NXDOMAIN|SERVFAIL|REFUSED/i);

  my @answer = $packet->answer;
  foreach my $a (@answer) {
    my $type = $a->type;
    my $ttl  = $a->ttl;
    if ($type eq 'A') {
      print "$host $ttl IN $type ", $a->address, "\n";
    }
    # support other rr types below...
  }
}


Note that Net::DNS returns the ttl from the answer record, which means
if you have a caching nameserver, your ttl may be lower than the value
returned from the authoritative nameservers.  Pulling a ttl from an SOA
won't work either, as ttl can be set per RR.  The only proper way to do
this is to perform a lookup, set the $res->nameservers() to those from
the $packet->authority, and re-run the query.  That will give you
authoritative results, and the ttl will be the proper one.

Something like this...

my @authority = $packet->authority;

if (scalar @authority) {
  my @ns = ();  # reset nameservers...
  foreach my $a (@authority) {
    my $type = $a->type;
    my $s = $a->rdatastr;
    if ($type =~ m/ns/i) {
      $s =~ s/\.$//;
      push(@ns, $s);
    }
  }
  $res->nameservers(@ns);
}

Pulling authoritative results can be quite slow, so you may want to
alarm it to prevent timeouts from hanging you up.




--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Detecting short-TTL domains?

2007-08-10 Thread Dallas Engelken

John Rudd wrote:


I'm a prophet now!?

:-)

Hm.  So, I'm sure I can figure this out eventually, but does anyone 
know the right Net::DNS way to extract the TTL?




Net::DNS::RR has a ttl() function.

# perl ttl_test
Lookup: A www.uribl.com
www.uribl.com 591 IN A 209.200.135.149



use Net::DNS;

my $res = Net::DNS::Resolver->new;
&lookup('www.uribl.com', 'A');
exit;

sub lookup {
  my ($host, $rr) = @_;
  print "Lookup: $rr $host\n";

  my $packet = $res->send($host, $rr);
  return unless $packet;

  my $header = $packet->header;
  return if ($header->rcode =~ m/NXDOMAIN|SERVFAIL|REFUSED/i);

  my @answer = $packet->answer;
  foreach my $a (@answer) {
    my $type = $a->type;
    my $ttl  = $a->ttl;
    if ($type eq 'A') {
      print "$host $ttl IN $type ", $a->address, "\n";
    }
    # support other rr types below...
  }
}


Note that Net::DNS returns the ttl from the answer record, which means
if you have a caching nameserver, your ttl may be lower than the value
returned from the authoritative nameservers.  Pulling a ttl from an SOA
won't work either, as ttl can be set per RR.  The only proper way to do
this is to perform a lookup, set the $res->nameservers() to those from
the $packet->authority, and re-run the query.  That will give you
authoritative results, and the ttl will be the proper one.


Something like this...

my @authority = $packet->authority;

if (scalar @authority) {
  my @ns = ();  # reset nameservers...
  foreach my $a (@authority) {
    my $type = $a->type;
    my $s = $a->rdatastr;
    if ($type =~ m/ns/i) {
      $s =~ s/\.$//;
      push(@ns, $s);
    }
  }
  $res->nameservers(@ns);
}

Pulling authoritative results can be quite slow, so you may want to 
alarm it to prevent timeouts from hanging you up.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: New PDF?

2007-07-22 Thread Dallas Engelken

WebTent wrote:

I have a few PDF's getting through now after doing pretty good, the
latest 0.4 pdfinfo + sa 3.1.7 + sare rules + sa-update is not scoring
enough on these:

  


Current version is v0.6.   And sigs for those were added last 
Thursday...   


http://esmtp.webtent.net/mail1.txt
  


   *  0.6 GMD_PDF_ENCRYPTED BODY: Attached PDF is encrypted
   *  2.0 GMD_PDF_FUZZY2_T11 BODY: Fuzzy tags Match
   *  5A4CB7600371063164BB7AFA6EDE7FE9
   *  0.2 GMD_PDF_EMPTY_BODY BODY: Attached PDF with empty message body
   *  3.0 GMD_PDF_STOX_M4 PDF Stox spam


http://esmtp.webtent.net/mail2.txt

  

   *  2.0 GMD_PDF_FUZZY2_T9 BODY: Fuzzy tags Match
   *  875C8F0810E6524EF0C3A7C4221A4C28
   *  0.6 GMD_PDF_ENCRYPTED BODY: Attached PDF is encrypted
   *  0.2 GMD_PDF_EMPTY_BODY BODY: Attached PDF with empty message body
   *  3.0 GMD_PDF_STOX_M4 PDF Stox spam

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: PDF spam

2007-07-19 Thread Dallas Engelken

R.Smits wrote:

Matt Kettler wrote:
  

Tarak Ranjan wrote:


greetings,
i'm getting pdf attached spam. please help me stop that using
spamassassin...

Horacio_FILE_506292_6906.pdf

/tarak

  
  

The PDFInfo plugin from rulesemporium is designed for this kind of thing.

http://www.rulesemporium.com/plugins.htm

Personally, I've been able to keep them under control with good bayes
training, automated training by spamtraps, and a selective greylist, so
I have not yet tried this plugin.





Plugin seems to work great, but is it stable enough for big production
environments ? Any issues ?

  


I've heard of no performance problems.  It's only going to run on
messages with MIME parts that it believes contain PDFs anyway... so
what is that, <1% of the time?


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Errors with PDFInfo.pm

2007-07-17 Thread Dallas Engelken

Wolfgang Zeikat wrote:

Hello again,

On 07/12/07 16:22, Dallas Engelken wrote:

Wolfgang Zeikat wrote:
I noticed that some of the latest pdf spam mails do not contain a 
filename in the mime headers, could that be a reason for the above 
behaviour?



Possibly, but seeing that line 300 is just a dbg() line itself, you
can either comment it out, or change it to something that will not
throw a warn:

   # dbg("pdfinfo: found part, type=$type file=$name cte=$cte");
   dbg("pdfinfo: found part, type=".($type ? $type : '').
       " file=".($name ? $name : '')." cte=".($cte ? $cte : ''));




Thanks, that fixed those. Lately, I see a lot of:
Jul 17 14:27:10 spamlock2 spamd[9786]: Use of uninitialized value in 
concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 
272,  line 1579.
Jul 17 14:27:10 spamlock2 spamd[9786]: Use of uninitialized value in 
hash element at /etc/mail/spamassassin/PDFInfo.pm line 283,  
line 1579.


Line 272 is (after the earlier changes):
dbg("pdfinfo: MD5 results for ".($name ? $name : '')." - md5=$md5 
fuzzy1=$fuzzy_md5 fuzzy2=$tags_md5");


Line 283 is:
$pms->{pdfinfo}->{fuzzy_md5}->{$tags_md5} = 1;



I'd say $tags_md5 is undef then, which is odd, because if it made it
that far, then the message has a pdf in it, and all pdfs have tag
structures.


Got samples that make that warn appear?

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Who can tell me where the latest sa-stats can be found.

2007-07-16 Thread Dallas Engelken

Steven W. Orr wrote:
I used to use it but it's old and has bugs. I recently found out that
it's *not* part of the sa distro. Is this still supported and if so,
where do I get it?


I looked around and found hugely conflicting version info. e.g., 
version 0.93 seems to support sa-3.1.x but version 1.03 seems to be 
for sa-3.0.
(BTW, they both seem to be dated 2007-01-30 at 
http://rulesemporium.com/programs/

)


what the hell are you reading?

http://rulesemporium.com/programs/sa-stats-1.0.txt   =  v1.03  is the 
latest, for SA 3.1


# version: 1.03
# author:  Dallas Engelken <[EMAIL PROTECTED]>
# desc:Generates Top Spam/Ham Rules fired for SA 3.1.x installations.


http://rulesemporium.com/programs/sa-stats.txt = v0.93, for  SA 3.0

# version: 0.93
# author:  Dallas Engelken <[EMAIL PROTECTED]>
# desc:Generates Top Spam/Ham Rules fired for SA 3.x installations.


I haven't touched them for a while and haven't checked if v1.03 even works
with SA 3.2.  If something needs to be done, let me know.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: PDFText Plugin for PDF file scoring - not for PDF images

2007-07-13 Thread Dallas Engelken

James MacLean wrote:

Hi folks,

Regrets if this is the wrong list.

Wanted to be able to score on text found in PDF files. Did not see any 
obvious route, so made a plugin that calls XPDF's pdfinfo and 
pdftotext to get the text that is then scored.


Sample local.cf could be :

pdftotext_cmd /usr/local/bin/pdftotext
pdfinfo_cmd /usr/local/bin/pdfinfo
body PDF_TO_TEXT eval:check_pdftext("^Error","sex","drugs",'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice.org 1.1.4:4')


Notice that a :4 gives a find of that regex 4 points.

Really don't know if this was the right road to follow, as I copied 
the AntiVirus.pm and came up with this:

http://support.ednet.ns.ca/SpamAssassin/PDFText.pm

So far... it appears to work as expected and didn't take down a pretty 
busy server ;).


Enjoy hearing any positive criticisms :).


I did this the other day with CAM::PDF, but Theo recommended this work
should be done in the post_message_parse() plugin call.  Then you could
just write body rules against the text, uris would get checked by the
URIDNSBL plugin, etc.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: New spam getting by PDFInfo?

2007-07-13 Thread Dallas Engelken

McDonald, Dan wrote:

On Fri, 2007-07-13 at 12:28 -0400, Robert Fitzpatrick wrote:
  

Just verified a couple of PDF attachments getting through with our
PDFInfo rules. Can someone test these to see if my PDF rules are working
or if you're able to block? I believe the rules are working as the
latter message is hitting one, just not enough to block. I tried my
access to the PDFInfo link sent to me by the webmaster to see if there
was an update, but it is not working now :(



running pdfinfo 0.3, I see the first one being analyzed, but not stopped
by the pdfinfo rule:

  


There is a more current version than 0.3 that probably hits these. When
I tried to access the urls, they were already gone, but I'd guess they
were the ones that used 'pdf crypt'.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium

2007-07-13 Thread Dallas Engelken

John D. Hardin wrote:

On Fri, 13 Jul 2007, Christopher X. Candreva wrote:

  

On Fri, 13 Jul 2007, John D. Hardin wrote:


Is there some reason pointing everyone at the coral cache of the 
website won't work? Granted, coral is also intended for large files, 
but it is distributed and is almost transparent...
  

Well right now, www.rulesemporium.com came up in a few seconds
directly, and took over a minute via the Coral Cache.

So I would answer "because it doesn't help, and slows things down
in fact".



The initial retrieval of the cached pages *does* require a regular 
connection to the primary website, so the coral network would be just 
as impacted by a DDoS as regular users are. However, once it has its 
copy response should be quite fast. I just tried it and it took just a 
few seconds, whereas I haven't been able to get directly to the 
primary website at all for a week or more.
  



Hi John,

Prolexic says...


If you could ask any users with connectivity issues to submit a 'host
www.rulesemporium.com' and 'tcptraceroute www.rulesemporium.com' along
with a complaint of connectivity problems, that would be very helpful.



So, if you want to send that to me, I can get the info to them so they 
can get to the bottom of it.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium

2007-07-12 Thread Dallas Engelken

Anders Norrbring wrote:

Henrik Krohns wrote:

On Wed, Jul 11, 2007 at 07:44:37PM -0400, Phil Barnett wrote:
We can't be the first people to come up against this problem. How 
have others solved it?


Bunch'o'Mirrors? Crude and effective.



*raise a hand* I volunteer to mirror, I have lots of both hd and bw 
capacity to spare.


Sure, until you get your first DDoS... 

SURBL had about 10 mirrors for www when they started getting the ddos, 
and all of them took over 200mbit/s... some upwards of 450mbit.   URIBL 
had 3, and Spamhaus has 2 that I know of.   If they can ddos at well 
over 3gbit/s (15*200), it really doesn't matter how many damn mirrors 
there are.  Even if your mirror providers would take 20mbit/s each and 
not null route your ass, you'd need well over 150 mirrors.


I do not believe "Bunch'o'Mirrors" is "the solution".  It may be all 
fine and good for distributing load/bandwidth, but it is not a way to 
thwart a DDoS.


The proper solution would be to dismantle the botnets that are capable 
of mass ddos.  Some ISPs need to gain a clue, step it up, and do their 
part to cut off access to infected PCs.
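The back-of-the-envelope numbers in this thread can be written out explicitly (a quick illustrative check; the function name and structure are not from any real tool):

```python
import math

# Quick sanity check on the bandwidth estimate: if an attack sustains
# `attack_mbit` and each mirror's provider null-routes anything above
# `per_mirror_mbit`, how many mirrors would it take to absorb it?

def mirrors_needed(attack_mbit: float, per_mirror_mbit: float) -> int:
    return math.ceil(attack_mbit / per_mirror_mbit)

attack = 15 * 200   # ~15 mirrors each observed taking ~200 mbit/s
print(mirrors_needed(attack, 20))   # -> 150, matching "well over 150 mirrors"
```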


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Errors with PDFInfo.pm

2007-07-12 Thread Dallas Engelken

Wolfgang Zeikat wrote:

Hi,

On 07/12/07 15:39, Robert Schetterer wrote:

> Hi, @ll
> the newest version of pdfinfo plugin
> matched some new pdf spam right now
>
> *  2.0 GMD_PDF_FUZZY2_T3 BODY: Fuzzy MD5 Match
> *  3D4E25DE4A05695681D694716D579474
>

yes it does that here too in SA 3.1.8, but I get errors like:

Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in 
concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 
300,  line 532.
Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in 
concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 
261,  line 532.
Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in 
concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 
262,  line 532.


I noticed that some of the latest pdf spam mails do not contain a 
filename in the mime headers, could that be a reason for the above 
behaviour?


Possibly, but seeing that line 300 is just a dbg() line itself, you can 
either comment it out, or change it to something that will not throw a 
warning:


   # dbg("pdfinfo: found part, type=$type file=$name cte=$cte");
   dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : ''));


Thanks,

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium

2007-07-11 Thread Dallas Engelken

Robert - eLists wrote:

Praise God Almighty!

We were able to spend more than a few seconds and many click on the
rulesemporium website.

Awesome.

As it says, was it moved over to vr.org ???

  


A couple of years ago...  yup.   Which is now netactuate.com.

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: PDFInfo plugin with SA 3.1.7

2007-07-11 Thread Dallas Engelken
# counts GMD_PRODUCER_GPL       85s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF   0s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF   0s/0h of 5641 corpus (4064s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1       159s/0h of 6132 corpus (555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1        40s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_STOX_M2       223s/0h of 6132 corpus (555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M2        29s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: So what about rulesemporium.com and these anti-PDF rules?

2007-07-04 Thread Dallas Engelken

Henrik Krohns wrote:

On Wed, Jul 04, 2007 at 10:08:29AM +0100, Justin Mason wrote:
  

Bear in mind that the spammer who is developing this PDF spam is only one
person, and he/she probably has at least one non-spammy-looking email
address at his disposal.

What's to stop him/her from asking Dallas for a copy of the ruleset and
plugin, same as any other SpamAssassin user, waiting a few days to cover
his/her tracks, then fixing the spam to avoid it again?

And if you think this isn't already happening, I have a bridge for sale ;)



If I was a spammer, I couldn't care less if few people were using some
secret PDF blocking stuff. It's not like AOL or some big companies are using
it. :)
  


Based on that logic, it makes no difference if it gets released or not.

You don't think big companies utilize SpamAssassin, SARE, or other open 
source products for solutions, or even ideas for similar solutions?   I 
think you would be pleasantly surprised.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Re: So what about rulesemporium.com and these anti-PDF rules?

2007-07-03 Thread Dallas Engelken

Jason Haar wrote:

Theo Van Dinter wrote:
  

All in all, you're better off just making things public.
  


I agree. It's sort of like saying that Open Source cannot work as a
model in the antivirus/antispam arena...
  


It can, if you have the people willing to contribute new dats on every 
revision of .



...and it may be true - but no-one on this list believes it ;-)
  


The method used in the plugin is very simple, and very easy to work 
around if made public.   What happens here is that when that 
"workaround" occurs, we have to release a new plugin and a new 
ruleset.  It's not like we just release a new ruleset, someone runs 
RDJ/sa-update, and they are off.    There is currently no way to 
auto-update the plugin besides announcing it and hoping people install 
it.   I foresee a major failure there.


If you think you can improve it so that the plugin remains static, and 
only the rules need changing, then be my guest...


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: RE: So what about rulesemporium.com and these anti-PDF rules?

2007-07-03 Thread Dallas Engelken

Chris Santerre wrote:


You didn't miss anything. I don't believe they are released yet. Final 
testing is being done. Results look great. I'll see if they can get 
released soon.


--Chris

> -Original Message-
> From: Michal Jeczalik [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 03, 2007 9:47 AM
> To: users@spamassassin.apache.org
> Subject: So what about rulesemporium.com and these anti-PDF rules?
>
>
> It's been announced that these rules are coming soon and...?
> Or maybe I
> missed something?



The PDFInfo.pm and accompanying ruleset will not be public.  If you want 
it, please go to

http://www.rulesemporium.com/plugins.htm#pdfinfo and request it.

I'll try to get PDF support added into ImageInfo.pm soon, but it will 
only extend the capabilities that you currently have for gif/jpg/png... 
that being attachment count, file name matching, pdf image dimensions, 
pixel coverage (area), etc.    However, that's not an ideal solution; the 
rules you can write with that will stop the spam, but also have a 
greater chance of falsing.


The mechanism used for accurate detection in the PDFInfo plugin is not 
going to be a part of this...   and I'd recommend you request the plugin 
and use it privately.   If the information gets publicized, that method 
would soon be useless...  and I don't feel like reworking it if I don't 
have to, nor maintaining a ruleset that is highly dependent on the 
plugin.   Updates to the ruleset could very well mean updating the 
plugin, and you can't get people to update a plugin en masse as easily 
as you can get them to RDJ a new ruleset.  :)


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: RulesDuJour lint failed. Updates rolled back.

2007-06-28 Thread Dallas Engelken
This must be an issue that needs to be raised with Prolexic, as they are 
doing the DDoS protection for rulesemporium.com.


Can anyone reproduce this redirect outside of RDJ, and give me a dump of 
the full transaction including http headers?


I'd rather fix the actual problem and not patch around it.

Thanks,
Dallas


Lindsay Haisley wrote:

This problem is probably due to the way Rules Emporium is handling
traffic.  If requests come too fast from the same address, or if their
server is busy, they send an HTML redirect page instructing the client
to try again in 0.1 second.  Curl and wget don't understand the META
HTTP-EQUIV refresh tag and simply store the refresh page as the
output of the request.  rules_du_jour is just a shell script, so a proper
fix should be pretty easy.  The following is a quick and dirty patch
which sort of solves the problem, at least for the next run of
rules_du_jour.

  
--- /root/rules_du_jour.orig2007-06-17 21:01:24.0 -0500
+++ /var/lib/spamassassin/rules_du_jour 2007-06-18 12:37:44.0 -0500
@@ -907,6 +907,8 @@
 [ "${SEND_THE_EMAIL}" ] && echo -e "${MESSAGES}" | sh -c "${MAILCMD} -s 
\"RulesDuJour Run Summary on ${HOSTNAME}\" ${MAIL_ADDRESS}";
 fi
 
+grep -il 'META HTTP-EQUIV' ${TMPDIR}/*|xargs -n1 rm -f 
+

 cd ${OLDDIR};
 
 exit;

  

rules_du_jour will still fail, but this will clean up the mess and next
time (hopefully) it'll run properly.  A proper fix would sense when this
happens and retry the download after a suitable short wait.  It may also
be helpful to insert some "sleep .5" instructions at appropriate points
(or "sleep 1" if your implementation of sleep(1) doesn't understand
floating point numbers).
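A retry along those lines might look like the following sketch (assumed names and a 3-attempt limit for illustration; this is not the actual rules_du_jour code):

```shell
#!/bin/sh
# Sketch: wrap the download in a retry loop that detects the
# meta-refresh interstitial page and waits briefly before retrying.

# The interstitial can be recognized by its meta-refresh tag.
is_refresh_page() {
    grep -qi 'META HTTP-EQUIV' "$1"
}

# fetch URL OUTFILE -- the actual downloader; curl here, wget works too.
fetch() {
    curl -s -o "$2" "$1"
}

# Try up to 3 times, sleeping briefly whenever we got the refresh page.
fetch_with_retry() {
    url=$1 out=$2
    for attempt in 1 2 3; do
        fetch "$url" "$out"
        is_refresh_page "$out" || return 0
        sleep 1
    done
    return 1    # still the refresh page; let the caller roll back
}
```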


On Thu, 2007-06-28 at 11:22 +0100, Nigel Frankcom wrote:
  

On Wed, 27 Jun 2007 16:42:39 -0400, "Daryl C. W. O'Shea"
<[EMAIL PROTECTED]> wrote:



Nigel Frankcom wrote:
  

On Wed, 27 Jun 2007 08:48:02 -0400, David Boltz <[EMAIL PROTECTED]>
wrote:



I've been getting the lint failures found below on my Rules Du Jour
updates for a few weeks now.  Yes, this would be since the DDoS attacks
on rulesemporium.  It looks like the same problem people have been
having with the tripwire rules, but for me it's the adult and, since just
recently, the spoof rules.  The solutions I've seen don't seem to work
for me.  I see that my cron job (run nightly) is pulling some HTML
source instead of the rules.  I've tried removing the faulty
70_sare_adult.* from etc/mail/spamassassin/RulesDuJour/ and manually
replacing it with the 'actual' file using wget.  I've even manually
updated the used /etc/mail/spamassassin/70_sare_adult.cf to ensure
that it was correct.  When I use 'wget
http://rulesemporium.com/rules/70_sare_adult.cf' to grab the file, it
works without problems.  Does anyone have any ideas on how I might fix
this problem?


***WARNING***: spamassassin --lint failed.
Rolling configuration files back, not restarting SpamAssassin.
Rollback command is:  mv -f /etc/mail/spamassassin/70_sare_adult.cf
  

The quick cure is to delete anything in the
/etc/mail/spamassassin/RulesDuJour/ directory and rerun RDJ by hand.

That worked for me on CentOS 4.5

The bug has been reported and a fix is due in 3.2.2 I believe.


Huh?  What's SA have to do with RDJ triggering Prolexic's DoS protection?

  

Daryl is right, there is no fix due in 3.2.2 - I got the RDJ and the
sa-update errors confused. I guess maybe I should dye my hair blonde.

Apologies for any confusion I've caused.

Kind regards

Nigel




--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Spam PDF

2007-06-28 Thread Dallas Engelken

Robert Schetterer wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dallas Engelken schrieb:
  

John Thompson wrote:


Raymond Myren wrote:

 
  

Just today I started receiving spam mails with attached .pdf files with
a spam image.
Any ideas how to stop this spam type?



Nothing, yet. But since these appear to be an image file encapsulated in
a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.

  
  

As was stated earlier...

Until its publicly released, you can request a solution from SARE with a
simple email via the information at
http://www.rulesemporium.com/plugins.htm#pdfinfo



Hi Dallas,
i am lucky to report that your rules matched
all pdf spam ( i had 4 ) caught in the past at my servers
good work!


  


Good, as expected.   Thanks for the feedback.

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Spam PDF

2007-06-28 Thread Dallas Engelken

John Thompson wrote:

Raymond Myren wrote:

  

Just today I started receiving spam mails with attached .pdf files with
a spam image.
Any ideas how to stop this spam type?



Nothing, yet. But since these appear to be an image file encapsulated in
a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.

  


As was stated earlier...

Until its publicly released, you can request a solution from SARE with a 
simple email via the information at 
http://www.rulesemporium.com/plugins.htm#pdfinfo


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: pdf spam solution idea

2007-06-27 Thread Dallas Engelken

arni wrote:

Hi,

It's come up several times now that people ask for a way to directly 
detect pdf spam by the pdf content and not only through headers or 
other means (hashes, bayes).
I've found a solution that should be pretty easy to realise in a 
FuzzyOCR-like plugin. Here is what it should do:


Use xpdf (http://www.foolabs.com/xpdf/download.html) to read the pdf 
document

export the images to ppm files using `pdfimages`
export the text parts to a simple text using `pdftotext`

This plugin should run as one of the first to make the raw text read 
available (for example by attaching it as an extra mime part or 
somehow internally) as well as make the images available to FuzzyOCR 
or similar by the same means as above.


Unfortunately I won't be able to write such a plugin myself; it should 
be rather easy to do, but I can't start to learn Perl just for this ;-)


I already have... I'll be releasing the info soon.

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Spam PDF

2007-06-27 Thread Dallas Engelken

Raymond Dijkxhoorn wrote:

Hi!


We just caught one:

Content analysis details:   (5.0 points, 4.0 required)

 pts rule name          description
---- ------------------ --------------------------------------------------
 0.6 SPF_SOFTFAIL       SPF: sender does not match SPF record (softfail)
 0.4 BAYES_60           BODY: Bayesian spam probability is 60 to 80%
                        [score: 0.7404]
 2.2 TVD_SPACE_RATIO    BODY: TVD_SPACE_RATIO
 0.9 RCVD_IN_SORBS_DUL  RBL: SORBS: sent directly from dynamic IP address
                        [201.32.227.251 listed in dnsbl.sorbs.net]
 0.9 RCVD_IN_PBL        RBL: Received via a relay in Spamhaus PBL
                        [201.32.227.251 listed in zen.spamhaus.org]


Jun 27 14:50:03 vmx80 MailScanner[4491]: Message l5RCnxP8019756 from 
212.127.254.149 ([EMAIL PROTECTED]) to quicknet.nl is spam, 
SpamAssassin (not cached, score=24.191, required 5, BAYES_50 0.00, 
BODY_EMPTY 0.50, GMD_PDF_BAD_FUZZY 20.00, GMD_PDF_HORIZ 0.25, 
GMD_PDF_STOX 1.00, PROLO_NO_URI 0.01, RCVD_IN_WHOIS_BOGONS 2.43)


Dallas rocks!



The cat's out of the bag now!   :)

More details on this will be made available later today hopefully.

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Status of Spamassassin

2007-06-13 Thread Dallas Engelken

The Doctor wrote:

On Wed, Jun 13, 2007 at 07:30:10AM -0500, Dallas Engelken wrote:
  

The Doctor wrote:


Can rules_du_jour work?


Still getting a no update state.
 
  

SARE is back up (knock on wood).  Delete your .cf files and re-run RDJ...

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.





I got:


Script started on Wed Jun 13 06:38:41 2007
doctor.nl2k.ab.ca//etc/mail/spamassassin$ rules_du_jour

exec: curl -w %{http_code} --compressed -O -R -s -S -z 
/etc/mail/spamassassin/RulesDuJour/rules_du_jour http://sandgnat.com/rdj/rules_du_jour 
2>&1

curl_output: 304

Performing preliminary lint (sanity check; does the CURRENT config lint?).

No files updated; No restart required.











Rules Du Jour Run Summary:RulesDuJour Run Summary on doctor.nl2k.ab.ca:



***NOTICE***: /usr/contrib/bin/spamassassin -p 
/usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint failed.  This 
means that you have an error somwhere in your SpamAssassin configuration.  To 
determine what the problem is, please run '/usr/contrib/bin/spamassassin -p 
/usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint' from a shell and 
notice the error messages it prints.  For more (debug) information, add the -D 
switch to the command.  Usually the problem will be found in local.cf, 
user_prefs, or some custom rulelset found in /etc/mail/spamassassin.  Here are 
the errors that '/usr/contrib/bin/spamassassin -p 
/usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint' reported:



[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/local.cf": socre FORGED_HOTMAIL_RCVD2 45.0

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/local.cf": socre SARE_URGBIZ 45.0

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/local.cf": terse_report This message came 
for a spam friendly e-mail server.

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": 

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": 

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": 302 Found

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": 

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": Found

[15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": 
The document has moved http://www.sa-blacklist.stearns.org/sa-blacklist/random.current.cf";>here.

[15745] warn: config: failed to parse line, skipping, in 
"/usr/contrib/etc/mail/spamassassin/random.cf": 
  



where do you get /usr/contrib/etc/mail/spamassassin/random.cf from?

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Status of Spamassassin

2007-06-13 Thread Dallas Engelken

The Doctor wrote:

Can rules_du_jour work?


Still getting a no update state.
  


SARE is back up (knock on wood).  Delete your .cf files and re-run RDJ...

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium down?

2007-06-09 Thread Dallas Engelken

Jerry Durand wrote:

At 09:19 AM 6/9/2007, Dallas Engelken wrote:

Rulesemporium.com will be coming back online at approximately 1800 
GMT.   Special thanks to Prolexic (http://www.prolexic.com) for the 
DDoS protection.



Great news and good work!  I assume we can re-enable sa-update for 
tonight's run.


Thanks for keeping this running.





Yes, I just verified http://www.rulesemporium.com/rules/ is serving data 
now.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium down?

2007-06-09 Thread Dallas Engelken

Yet Another Ninja wrote:

On 6/9/2007 6:50 PM, Jerry Durand wrote:

At 09:19 AM 6/9/2007, Dallas Engelken wrote:

Rulesemporium.com will be coming back online at approximately 1800 
GMT.   Special thanks to Prolexic (http://www.prolexic.com) for the 
DDoS protection.



Great news and good work!  I assume we can re-enable sa-update for 
tonight's run.


Thanks for keeping this running.



Guys

There's really no need to automate RDJ.

SARE rules aren't being updated too frequently and any rule change 
will be announced on the list.


Each RDJ empty hit adds to traffic, which, atm, is a precious luxury.

Pls be considerate and help SARE keep the site alive.



Prolexic will be providing proper caching of the rules shortly, so this 
shouldn't be much of an issue going forward.   As long as people keep 
their automation at 1-2 times a day, it's cool.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Rulesemporium down?

2007-06-09 Thread Dallas Engelken

Yet Another Ninja wrote:

On 6/7/2007 2:52 PM, Jake Vickers wrote:

Steven Stern wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

My systems all were unable to connect for their daily RDJ update
yesterday.  I time out trying to reach http://rulesemporium.com.  Does
anyone know what's happening?
- --
  

Same issue here. 404 errors.


Pls Disable all RDJ till further notice...



Rulesemporium.com will be coming back online at approximately 1800 
GMT.   Special thanks to Prolexic (http://www.prolexic.com) for the DDoS 
protection.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Spamassassin is very slow...

2007-06-08 Thread Dallas Engelken

Sven Schuster wrote:

Hi,

On Fri, Jun 08, 2007 at 12:26:38PM +0200, Devilish Entity told us:
  

On 6/8/07, Theo Van Dinter <[EMAIL PROTECTED]> wrote:


On Thu, Jun 07, 2007 at 03:15:35PM -0700, geist_ wrote:
  

One AMD Unknown 1300MHz processor, 2601.92 total bogomips, 95M RAM


[...]
  

Any help should be usefull...


Get more RAM. :)   Seriously, 95M is not really enough for anything these
days, let alone resource-intensive apps such as SA.

  

Well, I assume that it is really little, but it never was as slow... Plus,
it's only a little server, I get at most 20 mails per day... So...
before, it took about 3~4 secs to parse/scan a message.



Do you have network tests enabled, especially URIBL?  If so, it might
be due to the recent DDoS on uribl.com, which causes the scans to
take longer due to DNS timeouts.



  


There should be no DNS timeouts for URIBL currently.   The DNS mirrors 
are all up...  just the websites are ddos'd.


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: Using SA code to extract URLs ?

2007-01-13 Thread Dallas Engelken

Michael W. Cocke wrote:
I was 
told a while back that the best way to extract urls from emails was to 
use code from SpamAssassin.  Ok - Now, I need to do just that. Any 
pointers?  I've looked thru the code in SpamCopURI, but unless there 
are some docs hidden somewhere I can't even figure out the entry 
point.  Are there some docs hidden somewhere (I hope!)?


Thanks!

Mike-



Here is a little something I use to extract URLs from messages.   It 
takes a message on STDIN, runs it through an empty instance of SA (no 
rules, no configs loaded), and prints to STDOUT.


#!/usr/bin/perl

use Mail::SpamAssassin;
use Mail::SpamAssassin::PerMsgStatus;

&main;

# 

sub main {
 my $msg;
 while (<>) { $msg .= $_; }
 my $data = &geturi(\$msg);
 print $data;
 exit;
}

# 

sub geturi {
 my ($message) = shift;
 my $sa = create_saobj();
 $sa->init(0);
 my $mail = $sa->parse($$message);
 my $msg = Mail::SpamAssassin::PerMsgStatus->new($sa, $mail);
 my @uris = $msg->get_uri_list();
 my %uri_list;
 foreach my $uri (@uris) {
   next if ($uri =~ m/^(cid|mailto|javascript):/i);
   $uri_list{$uri} = 1;
 }
 my $uris = join("\n", keys %uri_list, "");
 return $uris;
}

# 

sub create_saobj {
 my %setup_args = ( rules_filename => undef, site_rules_filename => undef,
userprefs_filename => undef, userstate_dir => undef,
local_tests_only => 1, dont_copy_prefs => 1
  );
 my $sa = Mail::SpamAssassin->new(\%setup_args);
 return $sa;
}

# 
# EOF



# cat corpus/spam/canselon.com.html | perl parse_uri.pl
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_02.gif
./unsubscribeOffers.html
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_01.gif
http://images.loveouroffers.com/general/8675_usub/spacer.gif
list.html?clientid=12&em=&offerid=1&mailerid=1&emailid=0
http://list.html/?clientid=12&em=&offerid=1&mailerid=1&emailid=0
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_03.jpg
http:///unsubscribeOffers.html
http://./unsubscribeOffers.html


Enjoy.  Also, I only get digest copies from this list and don't check 
them all, so please cc me if you want me to see it. :)
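For contexts where SpamAssassin isn't available, the same idea can be roughly approximated with a plain regex (a sketch only; as the sample output above shows, SA's parser also recovers URIs from HTML attributes and relative links that a naive scan will miss):

```python
import re

# A crude stand-in for SA's get_uri_list(): scan raw text for URL-ish
# tokens. SA's version also decodes MIME parts, parses HTML href/src
# attributes, and normalizes relative URIs, none of which happens here.
URL_RE = re.compile(r'\b(?:https?|ftp)://[^\s<>"\']+', re.IGNORECASE)

def extract_urls(text: str) -> list[str]:
    """Return unique URLs in first-seen order."""
    seen = {}
    for url in URL_RE.findall(text):
        seen.setdefault(url)
    return list(seen)

print(extract_urls("see http://uribl.com and http://uribl.com/faq"))
# -> ['http://uribl.com', 'http://uribl.com/faq']
```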


--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



Re: ImageInfo Bug

2006-10-04 Thread Dallas Engelken

Stuart Johnston wrote:

Dallas,

I think there is a bug in the image_size_range function.

my $name = $type.'_dems';

Should probably be more like:

my $name = "dems_$type";

Thanks,
Stuart
Yup... Craig Green made me aware of that last week, and I've been too 
busy to address it.  I'll get it updated on the SARE side shortly.   I 
haven't looked at Theo's sandbox lately, but I'd guess it's incorrect 
there also, then.


Thanks,

--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com



ImageInfo plugin updated!

2006-08-05 Thread Dallas Engelken

Greetings,

I've added a few enhancements to the ImageInfo plugin for SpamAssassin. 


You can get it from..
http://www.rulesemporium.com/plugins.htm#imageinfo

Updates:
- added optimization changes by Theo Van Dinter
- added jpeg support
- added function image_named()
- added function image_size_exact()
- added function image_size_range()
- added function image_to_text_ratio()

See the updated ruleset for some examples.  Tweak the rules/scores to 
meet your needs...


--
dallase
http://uribl.com



RE: Strange problem

2006-07-10 Thread Dallas Engelken
> -Original Message-
> From: Rick Macdougall [mailto:[EMAIL PROTECTED] 
> Sent: Monday, July 10, 2006 11:59
> To: [EMAIL PROTECTED]
> Cc: users@spamassassin.apache.org
> Subject: Re: Strange problem
> 
> Sanford Whiteman wrote:
> >> Both  servers have exactly the same config except for the 
> auto-learn 
> >> and bayes/user prefs are stored in mysql on the FreeBSD server.
> > 
> 
> Thanks to all who replied.
> 
> I found the problem and it's related to ixhash, the timeout 
> doesn't work correctly / work at all.
> 
> I see
> 
> Jul 10 11:13:01 spa010 spamd[29830]: ixhash timeout reached 
> at /etc/mail/spamassassin/ixhash.pm line 91,  line 2226.
> 
> Jul 10 11:13:01 spa010 spamd[29830]: ixhash timeout reached 
> at/etc/mail/spamassassin/ixhash.pm line 91,  line 2226.
> 
> In the logs and the child never exits from processing the message.
> 
> I've cc'd Dallas to see if he has any insights into the problem.
> 

The warnings are being generated because the timeout value has been exceeded...


my $timeout = $permsgstatus->{main}->{conf}->{'ixhash_timeout'} || 5;
eval {
  Mail::SpamAssassin::Util::trap_sigalrm_fully(sub { die "ixhash timeout reached"; });

The code is right... you need to figure out why it times out.  Have you
hardcoded ixhash_timeout to some other value?   Have you tried manual
lookups from that box?

# host -tA abc.ix.dnsbl.manitu.net
Host abc.ix.dnsbl.manitu.net not found: 3(NXDOMAIN)

d



RE: DNS Whitelists

2006-06-22 Thread Dallas Engelken
> Actually what I was thinking of was a DNS version of this list so that
other applications can use it. 

Oh, I see...  well, SA couldn't use it without someone writing a plugin then.

dallase
http://uribl.com




  




RE: DNS Whitelists

2006-06-22 Thread Dallas Engelken
> -Original Message-
> From: Marc Perkel [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, June 22, 2006 09:30
> To: [EMAIL PROTECTED]
> Cc: users@spamassassin.apache.org
> Subject: Re: DNS Whitelists
> 
> I'm not thinking links, What I want to do is whitelist based 
> on the host name of the server connecting to my server.
> 

Isn't that what whitelist_rcvd_from is for?

Isn't that what http://www.rulesemporium.com/rules/70_sare_whitelist.cf is for?

What am I missing here?

dallase
http://uribl.com



RE: DNS Whitelists

2006-06-22 Thread Dallas Engelken
> -Original Message-
> From: Marc Perkel [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, June 22, 2006 09:15
> To: users@spamassassin.apache.org
> Subject: DNS Whitelists
> 
> Are there any DNS bases whitelists out there? If not - 
> shouldn't we build one?
> 
> I need two different kinds of DNS whitelists. One would be 
> hosts that NEVER send spam. Large banks, etc.
> 
> The second list is a list of hosts that should never be blacklisted. 
> These are hosts that might send some spam but should never 
> accidentally be blacklisted because of it. Examples would be 
> *.aol.com, *.earthlink.nat, *.yahoo.com. The idea here for 
> those of us who are trying to build really reliable 
> blacklists to reference these lists as hosts to never blacklist.
> 
> Any thoughts on this?
> 
> 

# ping aol.com.white.uribl.com
PING aol.com.white.uribl.com (127.0.0.2) 56(84) bytes of data.
64 bytes from localhost (127.0.0.2): icmp_seq=1 ttl=64 time=0.095 ms

# ping otherdomain.com.white.uribl.com
ping: unknown host otherdomain.com.white.uribl.com


white.uribl.com will probably do exactly what you want here... but just
realize spammers can include these domains in their spam also.

You could always do something like...

urirhssub  URIBL_BLACK  multi.uribl.com.  A  2
body       URIBL_BLACK  eval:check_uridnsbl('URIBL_BLACK')
describe   URIBL_BLACK  Contains an URL listed in the URIBL blacklist
tflags     URIBL_BLACK  net
score      URIBL_BLACK  3

urirhssub  URIBL_WHITE  white.uribl.com.  A  2
body       URIBL_WHITE  eval:check_uridnsbl('URIBL_WHITE')
describe   URIBL_WHITE  Contains an URL listed in the URIBL whitelist
tflags     URIBL_WHITE  net
score      URIBL_WHITE  -2

meta       URIBL_COMPENSATE  (URIBL_BLACK && URIBL_WHITE)
describe   URIBL_COMPENSATE  Contains an URL listed on both URIBL black and white
score      URIBL_COMPENSATE  1
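Mechanically, such a lookup is just a DNS A query for the domain prepended to the zone, with list membership encoded in the 127.0.0.x answer. A minimal sketch (hypothetical helper names; the mask mirrors the 2 used above, and error handling is simplified):

```python
import socket

# Illustrative helpers mirroring a urirhssub lookup: the queried name is
# "<domain>.<zone>", and a 127.0.0.x answer encodes membership in its
# last octet.

def uribl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name queried for a URI list lookup."""
    return f"{domain}.{zone}"

def is_listed(domain: str, zone: str, mask: int = 2) -> bool:
    """Live DNS check: NXDOMAIN means 'not listed'."""
    try:
        addr = socket.gethostbyname(uribl_query_name(domain, zone))
    except socket.gaierror:
        return False
    return bool(int(addr.rsplit(".", 1)[1]) & mask)

print(uribl_query_name("aol.com", "white.uribl.com"))
# -> aol.com.white.uribl.com, matching the ping example above
```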

dallase
http://uribl.com



RE: Latest sa-stats from last week

2006-05-08 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Monday, May 08, 2006 14:50
> To: [EMAIL PROTECTED]
> Cc: users@spamassassin.apache.org
> Subject: Re: Latest sa-stats from last week
> 
> Dallas Engelken wrote:
> >> -Original Message-
> >> From:  [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, May 08, 2006 14:07
> >> To: users@spamassassin.apache.org
> >> Subject: Latest sa-stats from last week
> >>
> >> Email:   561313  Autolearn: 0  AvgScore:   6.77  
> >> AvgScanTime:  2.41 sec
> >> Spam:209359  Autolearn: 0  AvgScore:  16.99  
> >> AvgScanTime:  2.30 sec
> >> Ham: 351954  Autolearn: 0  AvgScore:   0.70  
> >> AvgScanTime:  2.48 sec
> >>
> >> Time Spent Running SA:   376.39 hours
> >> Time Spent Processing Spam:  133.76 hours
> >> Time Spent Processing Ham:   242.62 hours
> >>
> >> TOP SPAM RULES FIRED
> >> 
> >> RANKRULE NAME   COUNT %OFRULES 
> >> %OFMAIL %OFSPAM  %OFHAM
> >> 
> >>1URIBL_BLACK 1633977.09   
> >> 29.11   78.050.50
> > 
> > Nice.
> > 
> > How does that Queen song go??  We... are...  ;)
> > 
> 
> I would be proud of those numbers Dallas.. However, I'd also 
> take them as a warning of areas needing improvement.
> 
> URIBL has the highest spam hit rate, but you nonspam hit-rate 
> is more than 5 times that of JP, your closest competitor in 
> the world of uridnsbl's.
> 
>1URIBL_BLACK 1633977.09   
> 29.11   78.050.50
>5URIBL_JP_SURBL  1182515.13   
> 21.07   56.480.09
> 
> Given that your spam hit rate is 1.5 times that of JP, 
> compared to the 5 times higher nonspam rate, it suggests JP 
> is doing a whole lot better in the accuracy department.
> 
> (note: I do realize this can be biased by overall FNs in SA. 
> Some of those 0.50 might be SA FN's. That said, such FNs 
> would likely also affect other URIBLs.)
> 
> This isn't to say that URIBL_BLACK isn't useful, or that you 
> guys aren't doing a good job. However, this is good evidence 
> you guys are doing great, but you do still have some areas 
> that could use improvement.
> 

Thanks, I think. ;)

Our FP ratio for ham has always been hanging at that level.  I think 
that's a good sign: it means the data in our zones that is causing those 
ham hits has not changed, and no one has notified us that it needs 
removal.  Doesn't worry me a bit.

We welcome your delist requests if you actually find an FP (that we can 
agree on) on black.uribl.com.  :)

d



RE: Latest sa-stats from last week

2006-05-08 Thread Dallas Engelken
> -Original Message-
> From:  [mailto:[EMAIL PROTECTED] 
> Sent: Monday, May 08, 2006 14:07
> To: users@spamassassin.apache.org
> Subject: Latest sa-stats from last week
> 
> Email:  561313  Autolearn: 0  AvgScore:   6.77  AvgScanTime:  2.41 sec
> Spam:   209359  Autolearn: 0  AvgScore:  16.99  AvgScanTime:  2.30 sec
> Ham:    351954  Autolearn: 0  AvgScore:   0.70  AvgScanTime:  2.48 sec
> 
> Time Spent Running SA:   376.39 hours
> Time Spent Processing Spam:  133.76 hours
> Time Spent Processing Ham:   242.62 hours
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------
> RANK  RULE NAME       COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> ----------------------------------------------------------------
>    1  URIBL_BLACK    163397      7.09    29.11    78.05    0.50

Nice.

How does that Queen song go??  We... are...  ;)



RE: URIBL_BLACK + OB_SURBL double-listed nonspam domain

2006-02-18 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, February 19, 2006 06:09
> To: jdow
> Cc: users@spamassassin.apache.org
> Subject: Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
> 
> Right now JP+SC scores  8.585, which even BAYES_00 can't 
> bring back down under the 5.0 line. I trust the URIBLs a lot, 
> I think they're great. But I don't trust them so much that 
> two of them should be able to over-ride BAYES_00 without any 
> other spam rules firing.
> 

So score BAYES_00 at -10.. unless you don't "trust" BAYES_00 either.  :)



RE: Over-scoring of SURBL lists...

2006-02-18 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, February 19, 2006 06:27
> To: [EMAIL PROTECTED]
> Cc: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> Dallas Engelken wrote:
> >
> >
> > So please... put this f'ing thread to bed and send a delist request.
> >   
> Yes, but dallas.. this thread IS NOT about how to keep the 
> URIBLs cleaner. I really don't care how it got there. I 
> understand that mistakes happen. No big deal. I'm not trying 
> to start a witch-hunt demanding greater purity in URIBL 
> listings. If I wanted to do that, I'd do it on the uribl and 
> surbl lists.
> 
> I *AM* trying to get people to think about the STRUCTURE OF 
> THE RULES and how they are scored in SpamAssassin. The 
> problem is nobody's even willing to discuss that end of 
> things without miles of proof that a problem exists.
> 
> I've proven a problem exists. Submitting delist requests 
> will NOT work as a sole fix because it's just going to 
> happen again, and again, and again. Yes, delists are a good 
> thing. But we need to realize that human error will continue 
> to happen, and thus the spamassassin rules need to be 
> structured accordingly.
> 

You've proven a problem of obscure FPs.  We (surbl/uribl) both maintain
internal whitelists.  They don't cover every ham domain out there, but they
are pretty god damn big and remove nearly every possibility of causing what
I would call substantial "damage".

Your examples (to this point) are of very narrow scope.  I have not heard
anyone else on SA-users ever complain of rampant URIBL only FPs..  and these
people will normally let you know if it exists.

> So can we put all the arguments about whose URIBL is bigger 
> than whose to rest and start looking at the spamassassin end 
> of the problem?

I'm not sure where you got this, but I've never said anything of the sort.
I've also never heard Jeff say anything of the sort.  We have different list
structures, different listing philosophies, and different sources, but we
have a similar interest and many times reach the same final result.

> Because I really don't give a damn about who made what 
> mistakes and who makes more mistakes.
> 
> Simple fact. Mistakes get made. Sometimes multiple mistakes 
> coincide with each other. For some reason, many people on 
> this list seem to refuse to accept that can happen. So I've 
> had to make a lot of proof it can happen. Some folks have 
> taken that as criticism of the URIBLs affected. It's not, 
> it's just facts to support the obvious.
> 

IMHO, your "proof" has been small and insignificant to this point.

> I *like* both surbl.org and uribl.com. I think they're great. 
> So will you guys quit painting me as attacking the URIBLS 
> because I point out some problems with how SA implements 
> checking them?
> 
> Can we address the real question here:
> 
> How can we keep the spam tagged, and try to mitigate the FPs 
> by keeping additive scores for multiple URIBLs more moderate? 
> +20 worth of URIBL hits is fine on spam, but astronomically 
> high scores don't really help SA when the tagging threshold 
> is +5. However, they do hurt SA when overlapping mistakes happen.
> 
> 

If this is the issue that you are really trying to address, it would be
better done on the dev list... Because I think the users list (in general)
is happy with the current implementation.  If they are not, I guess now is
the time to speak up. 

I am going to bow out of this thread now as I have spent far more time on it
than it warrants.  I appreciate your feedback to uribl, and welcome your
delist requests for any FPs you come across.  In the end, we are all working
towards a common goal.

Thanks,
Dallas





RE: Over-scoring of SURBL lists...

2006-02-18 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, February 19, 2006 02:07
> To: jdow
> Cc: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> jdow wrote:
> >
> >> rbl/uribl overlap.
> >
> > Matt, I think your worry about overlap is faulty. If the 
> lists all fed 
> > off one common database it would be a worry. Then the correlation 
> > would be a symptom of the system not working. If they all work off 
> > more or less individual captures and submissions their raw 
> databases 
> > have low correlation. If their results correlate well, as 
> in "overlap"
> > as you are using it, that is an indication of their goodness.
> 
> Yes, but the frequency of overlap in nonspam that I'm seeing 
> at my site is disturbing.
> I've posted examples of this, and they keep getting ignored.
> 
> This IS a real problem. I am not speculating. I've posted two 
> real domains on this list that have had the problem for me in 
> the past 7 days.
> ultraedit-updates.com: OB + uribl black (delisted from both at my 
> request)
> winterizewithscotts.com: OB + uribl black (I have 
> intentionally NOT submitted a delist request for this domain)
> 

"honey, our grass is less green this year because URIBL blocked my
winterizer reminder."  :)

winterizewithscotts.com was manually added on oct-14; no delist requests in
over 4 months.  It was not a web submission.  It was not an automated add.
Rather, it was a direct add by someone who has added over 8k entries to uribl
black in the last week.  Now I'm not saying it's wrong or right, I'm just
saying it was a judgement call based on human review.

So please... put this f'ing thread to bed and send a delist request.

D 






RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, February 18, 2006 00:05
> To: Raymond Dijkxhoorn
> Cc: jdow; users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> Raymond Dijkxhoorn wrote:
> > Hi!
> 
> >>>
>  I consider that "highly similar" for JP, SC, AB, OB and WS.
> >>>
> >>> As similar as 30 and 40, and 0, .3 and 7 are, I suppose.
> > 
> >> On another paw how "independent" are these lists? Do any 
> inherit from 
> >> other lists or are they all separately maintained?
> > 
> > They use different datasources and no cross links between them. If 
> > there is a real nasty one we could/would talk about it on 
> the private 
> > list but thats really sporadic.
> 
> Untrue. AB and SC use a common data source, spamcop reports. 
> However, each has its own processing/listing criteria and 
> each is separately maintained.
> 
> And, realistically, since WS and uribl accept direct reports 
> from more-or-less anyone, their data sources could be 
> redundant with any other URIBLs depending on what the
> 
> It's really straight forward for an end-user to report the 
> email to spamcop, then report the spamverized URI to WS and 
> URIBL_BLACK via web forms.
> 
> Pickup on surbl's SC list appears to involve multiple reports 
> to spamcop, but there's still potential for common inputs.
> 
> Let's see a show of hands.. How many people here have ever 
> filed a spam report with multiple lists, including doing 
> spamcop + either WS or URIBL.
> 
> (raises own hand)
> 

FWIW, web submissions account for less than 1% (119 of 12652 listings) of
URIBL data for the last 7 days.  All submissions are reviewed, so I find it
hard to believe that the FPs are coming in via this mechanism, seeing that
a human reports it (I hope) and a human reviews it.   From what I see, FPs
normally come from automation and overzealous mass adds.

D



RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken
> -Original Message-
> From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 17, 2006 21:34
> To: Dallas L. Engelken
> Cc: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> Dallas L. Engelken wrote:
> > The result will be no URIBL only FPs.  OTOH, you may end up with a 
> > shit-ton of people bitching about spam accuracy dropping in 
> stock 3.2 
> > installs if you make these changes.
> 
> I'm not sure it'd be *that* bad.
> 
> A grep of my logs from this week shows that 1.1% of my spam 
> scores under a score of 8 and only 13% of those spams hit 
> *any* URIBLs.
> 
> So yeah, there'd be more FNs, but I'm not sure that it'd a 
> shit-ton of them.
> 
> 

All I know is I've had a few systems bit by 
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4767

And when this happens, I hear about it, because people are complaining about
getting a "shit-ton" of spam.  And they all just run SBL/URIBL/SURBL.  No
other RBLs.   Now I understand losing URIBL tests completely versus scoring
between say 2 and 5.5 are completely different things... but I still believe
a considerable increase will be seen in FNs.

Which phone call would you rather have?

  Q)  My client is trying to send me email and its being rejected because
they are listed on URIBL, what should I do?  
  A)  whitelist the sender or request delisting from the URIBL

  Q)  How do I block all this stock, pill, porn, etc spam that started
coming in since we "upgraded"?
  A)  Well, you can re-adjust your heuristic scoring for your URIBL tests
back to their previous values, let me walk you through that.. 

The first question can be answered in 1 minute.  

The second question OTOH could take you a considerable amount of time.
Especially if you have to do it for them.  Oh, and you have to wait for them
to reconfigure their PIX (that they don't know how to administer) before you
can get in, and they want you to wait on the line until they figure it out.
Meanwhile client X, Y, and Z are waiting for you to get off the phone so you
can do the same thing for them ;)
 
All I'm saying here is, I'll take the easy route :)

Dallas








RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 17, 2006 18:47
> To: Matt Kettler
> Cc: Jeff Chan; users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> Matt Kettler wrote:
> 
> > I'll even re-quote myself:
> >> I personally would like to see some statistics, but  at 
> this point, 
> >> we  don't have any test data on this so we're arguing your 
> theory vs mine.
> > And your quote that I was counter-pointing:
> >> As you can see the performance of the lists are different, 
> and the way they're created is different too.
> > 
> > I don't see enough of a difference to clearly rule out 
> significant overlap.
> > 
> > I'll define my test of "significant overlap" as:
> >> 10% of total hits redundant across 3 or more lists and >1% nonspam 
> >> hits
> > redundant across 2 or more lists.
> > 
> 
> Messages received today that are double-listed in two or more 
> of SC, JP, AB, OB and WS:
> grep "SURBL_MULTI2" /var/log/maillog |grep "Feb 17" |wc -l
> 292
> 
> All surbl.org hits in same timeframe (includes ph, but no matter):
> 
> grep "_SURBL" /var/log/maillog |grep "Feb 17" |wc -l
> 583
> 
> So we at least have a 50% double-listing rate. That 
> in-and-of-itself isn't much of a problem, but it also doesn't 
> rule out overlap. It's still a whole lot higher than my first 
> criteria of 10% overlap
> 
> However, right now I don't have more than 100 FPs so I can't 
> really comment on the nonspam hit rate of SURBL_MULTI2. 
> That's the important one.
> 
> I also added multi3, multi4 and another rule to detect 
> overlap between uribl.com's black and surbl.org:
> 
> meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL || 
> URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL || 
> URIBL_SC_SURBL))
> score URIBL_BLACK_OVERLAP -1.0
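As a quick sanity check on the grep counts quoted above (292 double-listed messages out of 583 total surbl.org hits), the double-listing rate works out to just over 50%. A trivial sketch of the arithmetic (Python used purely for illustration; the counts are the ones from the log greps above):

```python
# Double-listing rate from the quoted grep counts:
# 292 SURBL_MULTI2 hits out of 583 total surbl.org hits on Feb 17.
multi2_hits = 292   # messages listed on two or more SURBL lists
total_hits = 583    # all surbl.org hits in the same timeframe

rate = multi2_hits / total_hits * 100
print(f"double-listing rate: {rate:.1f}%")  # prints "double-listing rate: 50.1%"
```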
> 

if anyone is interested, here is an alternative scoring method for
25_uribl.cf -> http://www.uribl.com/tools/25_uribl.cf (make sure you wipe
out the scores for uribl tests in 50_scores.cf if you replace this file).

This should make SBL/URIBL/SURBL hits range in score from 2.0 to 5.5... 

- 2.0 (SBL ONLY) 
- 2.5 (URIBL_ONLY)
- 2.5 (SURBL_ONLY)
- 3.0 (SBL + URIBL)
- 3.0 (SBL + SURBL)
- 3.0 (SURBL_ONLY x2)
- 4.0 (URIBL + SURBL)
- 5.0 (SBL + URIBL + SURBL)
- 5.5 (SBL + URIBL + SURBLx2)

If you want to reduce the possibility of URIBL-only FPs, this is the way to
go.  
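For readers who want to see the tier structure at a glance, the score table above can be modeled as a simple lookup. This is only an illustrative sketch: the actual 25_uribl.cf presumably implements these tiers with SpamAssassin meta rules, and the `tiered_score` function and its argument names here are invented for illustration.

```python
def tiered_score(sbl: bool, uribl: bool, surbl_hits: int) -> float:
    """Illustrative lookup of the tiered URIBL/SURBL scores listed above.

    sbl:        message URI hit the SBL
    uribl:      message URI hit uribl.com black
    surbl_hits: number of distinct surbl.org lists hit (capped at 2)
    """
    table = {
        (True,  False, 0): 2.0,   # SBL only
        (False, True,  0): 2.5,   # URIBL only
        (False, False, 1): 2.5,   # one SURBL list only
        (True,  True,  0): 3.0,   # SBL + URIBL
        (True,  False, 1): 3.0,   # SBL + SURBL
        (False, False, 2): 3.0,   # two SURBL lists
        (False, True,  1): 4.0,   # URIBL + SURBL
        (True,  True,  1): 5.0,   # SBL + URIBL + SURBL
        (True,  True,  2): 5.5,   # SBL + URIBL + two SURBLs
    }
    return table.get((sbl, uribl, min(surbl_hits, 2)), 0.0)
```

The point of the tiers is visible in the lookup: no single list can push a message past the 5.0 tagging threshold on its own; only full agreement across SBL, URIBL, and multiple SURBL lists does.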

D



RE: Over-scoring of SURBL lists...

2006-02-16 Thread Dallas Engelken
> -Original Message-
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 17, 2006 01:09
> To: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> On Thu, Feb 16, 2006 at 10:42:19PM -, Dallas Engelken wrote:
> > So.. I have moved partypoker.com to grey for now.  I'll let you and 
> > Theo thumb wrestle over it :)
> 
> Warning: I have big hands. ;)
> 

Yea, "that's what she said"  ;)

> I'm happy to show samples of mails to certain folks, btw.  
> There are several personal and spamtrap entries in my which 
> refer to them:
> 

Oh, I'm not saying anyone's wrong.. I'm just tired of hearing people say we
are wrong.  We actually have 2 black submissions for partypoker.com.  One
actually had a sample attached.

> 
> BTW, I'm wondering when URIBL will be able to keep up with my 
> submissions so I can start them up again.  ;P
> 

Once we get our summer interns ;)

> BTW2: if anyone's curious, here's the URIBL stats for last 
> weeks' SA net
> checks:
> 
>   MSECS     SPAM%     HAM%     S/O   RANK  SCORE  NAME
>       0    220791    50435   0.814   0.00   0.00  (all messages)
>     0.0   81.4048  18.5952   0.814   0.00   0.00  (all messages as %)
>  34.512   42.3169   0.3430   0.992   0.77   0.00  URIBL_BLACK
>   0.701    0.7555   0.4640   0.620   0.49   0.00  URIBL_GREY
>   0.000    0.0000   0.0000   0.500   0.45   0.00  URIBL_RED
> 
> and SURBL's for comparison:
> 
>  25.585   31.4293   0.0000   1.000   1.00   0.00  URIBL_SC_SURBL
>  33.248   40.8409   0.0099   1.000   1.00   0.00  URIBL_JP_SURBL
>  36.254   44.5226   0.0555   0.999   0.94   0.00  URIBL_OB_SURBL
>   4.291    5.2710   0.0000   1.000   0.91   0.01  T_URIBL_XS_SURBL
>   3.907    4.7996   0.0020   1.000   0.90   0.00  URIBL_AB_SURBL
>  39.914   48.6415   1.7071   0.966   0.65   0.00  URIBL_WS_SURBL
>   0.195    0.2391   0.0000   1.000   0.63   0.00  URIBL_PH_SURBL
> 

SPAM% is crap when it comes to ruleqa on uribls.  Spammers rotate domains
daily.  We expire dead domains daily.  I guess we could keep all the bloat
around to pump our numbers ;)  If you had a daily rotated corpus, we'd own
it in SPAM%...  

Today's stats.

RANK  RULE NAME         COUNT  %OFMAIL  %OFSPAM  %OFHAM
-------------------------------------------------------
   4  URIBL_BLACK        4714    31.44    76.16    1.21
   8  URIBL_JP_SURBL     1562    18.60    52.33    0.06
   9  URIBL_OB_SURBL     1351    16.28    45.26    0.35
  10  URIBL_WS_SURBL     1186    14.39    39.73    0.46
  12  URIBL_SC_SURBL      959    11.40    32.13    0.00
-------------------------------------------------------

Today's stats from a bigger install.

-------------------------------------------------------
   5  URIBL_BLACK      134850    46.70    71.79    0.68
   9  URIBL_JP_SURBL    70659    24.35    37.62    0.01
  10  URIBL_OB_SURBL    69151    23.84    36.82    0.03
  12  URIBL_WS_SURBL    57786    19.95    30.77    0.13
  19  URIBL_SC_SURBL    28533     9.83    15.19    0.00
-------------------------------------------------------

At the end of the day, if you run a GA, all uribls may look similar, but
real-time stats show a much different picture.   I don't think our detection
speed is any faster than JP's, because I've seen some of the timestamps on
new additions, but maybe it's our rebuild and distribution time to our
mirrors.  I don't know, but SA user numbers tend to agree ->
http://www.gossamer-threads.com/lists/spamassassin/users/67936

D



RE: Over-scoring of SURBL lists...

2006-02-16 Thread Dallas Engelken
> -Original Message-
> From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 16, 2006 21:51
> To: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
> 
> Matt Kettler wrote:
> > List Mail User wrote:
> 
> > My FPs fall into two categories:
> 
> Like Matt, I've had similar electronics newsletters trigger 
> on apparently non-spammed domains.
> 
> I've also had a number of users complain about FPs on emails 
> from a number of online poker sites.
> 
> 
> > And yet it's in URIBL's blacklist. (I've already requested a delist)
> 
> Do they actually delist domains by request?  I've long ago given up 
> trying after having all of my requests rejected.
> 

If "all of your requests" are referring to URIBL.COM, I think you are
exaggerating.

You have submitted 1 time to uribl, and that was a delist request for
partypoker.com, which was requested to be blacklisted by Theo.  You'd think
2 people working on an anti-spam project together could agree on something?
One man's spam is another man's ham (or addiction?), and it's up to URIBL to
make that classification.  It's not always going to be right for everyone.   

If you don't like our classification, submit a delist request.  If we reject
it, submit another.   We take notice when multiple requests come in for the
same domain, especially from unique uids.   For a delist request, give us a
reason in the "Your Message Regarding this submission (optional)" section.
That goes a long way.

So.. I have moved partypoker.com to grey for now.  I'll let you and Theo
thumb wrestle over it :)

Dallas



Re: Post your top 10 from sa-stats

2006-01-31 Thread Dallas Engelken
On Tue, 2006-01-31 at 07:37 -0600, DAve wrote:
> And mine, note that these are *post* MailScanner and RBLs, which are 
> running on my mail gateways. By the time SA gets the mail I've pruned 
> anywhere from 45% to 75% of the messages, depending on the day.
> 
> TOP SPAM RULES FIRED
> RANK  RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> 1     URIBL_BLACK    162360    8.88   55.25   88.86    2.10

is that 2% ham hit rate really missed spam, or are you having false positives
due to URIBL_BLACK??

Thanks,

-- 
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com



Re: Post your top 10 from sa-stats

2006-01-31 Thread Dallas Engelken
On Mon, 2006-01-30 at 16:45 -0600,  wrote:
> Here is mine:
> 
> TOP SPAM RULES FIRED
> -----------------------------------------------------------------
> RANK    RULE NAME       COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> -----------------------------------------------------------------
>    1    URIBL_BLACK    257778      7.36    44.54    77.31

amen to that!

-- 
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com



RE: Post your top 10 from sa-stats

2006-01-31 Thread Dallas Engelken
On Tue, 2006-01-31 at 11:20 -0600, Kristopher Austin wrote:
> Hmm, I guess that's a question for Dallas.  This is the version I'm
> using:
> # file: sa-stats.pl
> # date: 2005-08-03
> # version: 1.0
> # author: Dallas Engelken <[EMAIL PROTECTED]>
> # desc: SA 3.1.x log parser
> 
> I don't seem to be the only one showing that strange math.  Dave had the
> same sort of entry in his:
> TOP HAM RULES FIRED
> RANK  RULE NAME       COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 1     HTML_MESSAGE    63067     21.17    21.46    63.61    56.74
> 
> Dallas, is there a bug or are we interpreting these numbers incorrectly?
> 

Ok, let's take the following sample data:

Email: 2766 
Spam:   975
Ham:   1791

TOP SPAM RULES FIRED
------------------------------------------------------
RANK  RULE NAME        COUNT  %OFMAIL  %OFSPAM  %OFHAM
------------------------------------------------------
   7  HTML_MESSAGE       629    22.74    64.51    34.51
------------------------------------------------------

TOP HAM RULES FIRED
------------------------------------------------------
RANK  RULE NAME        COUNT  %OFMAIL  %OFSPAM  %OFHAM
------------------------------------------------------
   6  HTML_MESSAGE       618    22.34    64.51    34.51
------------------------------------------------------

we had 2766 total emails.  

for %OFMAIL:
629 spam messages hit HTML_MESSAGE, which is 629/2766 = 22.74%.
618 ham messages hit HTML_MESSAGE, which is 618/2766 = 22.34%.

for %OFSPAM and %OFHAM:
629 spam messages hit HTML_MESSAGE, which is 629/975 = 64.51%.
618 ham messages hit HTML_MESSAGE, which is 618/1791 = 34.51%.

If you want to know what percent of all email triggered the rule
HTML_MESSAGE, you'd need to compute (SPAM + HAM) / TOTAL:
(629 + 618) / 2766 = 45.08%.
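The percentage arithmetic above can be checked with a few lines of code. Python is used here just for illustration (the actual sa-stats script is Perl); the counts are the ones from the sample data above.

```python
# Counts from the sample data above.
total, spam, ham = 2766, 975, 1791
spam_hits, ham_hits = 629, 618   # HTML_MESSAGE hits in spam and in ham

def pct(part, whole):
    """Percentage rounded to two decimals, as sa-stats reports it."""
    return round(part / whole * 100, 2)

print(pct(spam_hits, total))             # %OFMAIL on the spam line: 22.74
print(pct(ham_hits, total))              # %OFMAIL on the ham line:  22.34
print(pct(spam_hits, spam))              # %OFSPAM: 64.51
print(pct(ham_hits, ham))                # %OFHAM:  34.51
print(pct(spam_hits + ham_hits, total))  # overall hit rate: 45.08
```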

The %OFMAIL category is misleading because it compares the hit count
(on that line) against the total email.   I've gone ahead and changed
that in v1.02 and v0.92 respectively.   If you like the old way it
works, don't get the new version :)

SA 3.0.x - http://www.rulesemporium.com/programs/sa-stats.txt
SA 3.1.x - http://www.rulesemporium.com/programs/sa-stats-1.0.txt

Hope this clarifies!

Thanks,

-- 
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com 



Re: SpamAssassin 3.1.0-pre2 PRERELEASE available!

2005-06-30 Thread Dallas Engelken
On Thu, 2005-06-30 at 06:39 -0500, Michael Parker wrote:
> Kai Schaetzl wrote:
> 
> >
> >>SQL 
> >> storage is now recommended for Bayes
> >>
> >>
> >
> >Hm, time to check the documents how to set this up ...
> >BTW: is my impression correct that Bayes on SQL won't do any auto-expire, 
> >you have to do it yourself with some SQL code?
> >  
> >
> 
> No, it does auto expire just fine.  Not sure what gave you that impression.
> 

maybe confused with an SQL auto-whitelist?
d