Re: Re: rbldnsd blacklist question
Rob McEwen wrote: John Hardin wrote: On Tue, 16 Sep 2008, Marc Perkel wrote:

Looking for opinions from people running RBL blacklists. I have a list that contains a lot of name-based information. I'm about to add a lot more information to the list, and what will happen is that when you look up a name you might get several results. For example, a hostname might be blacklisted, be in a URIBL list, be in a day-old-bread list, and a NOT QUIT list. So it might return 4 results like 127.0.0.2, 127.0.0.6, 127.0.0.7, 127.0.0.8. Is this what would be considered "best practice"? My thinking is that having one list that returns everything is very efficient.

Isn't it general practice to bitmap the last octet if you're going to convey multiple pieces of information?

If you have a situation where there might be more than one "answer" for a given query, and you are content with having a maximum of 7 possible answers, then...

Why just 7? You have 2 other octets to use. 127.X.Y.Z - X and Y don't have to be zeros... 512 possibilities if you use all the bits on all 3 octets (but I'd avoid loopback 127.0.0.1). 448 possibilities if you only count bit 1 settable on octets 2 and 3 (i.e. 127.1.1.2). 343 if you avoid setting bit 1 altogether on any octet (i.e. 127.2.2.2). -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
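For reference, the bitmask approach the thread converges on can be decoded like this. A minimal Python sketch; the flag names and bit assignments below are invented for illustration (Marc's example answers 127.0.0.6/.7/.8 are not actually bit values, which is John's point), and any real DNSBL documents its own return codes:

```python
# Hypothetical bit assignments for illustration only -- a real DNSBL
# documents its own return codes.
FLAGS = {
    2: "blacklist",
    4: "uribl",
    8: "day-old-bread",
    16: "not-quit",
}

def decode_bitmask(addr):
    """Decode a bitmasked DNSBL answer such as 127.0.0.14 into list names."""
    o = [int(x) for x in addr.split(".")]
    if o[0] != 127:
        raise ValueError("not a DNSBL answer: %s" % addr)
    # Fold the last three octets into one integer so flags can also live
    # in octets 2 and 3, as Dallas suggests (127.X.Y.Z).
    value = (o[1] << 16) | (o[2] << 8) | o[3]
    return [name for bit, name in sorted(FLAGS.items()) if value & bit]

print(decode_bitmask("127.0.0.14"))  # bits 2+4+8 -> three lists in one answer
```

The advantage over multiple A records is that one answer carries every membership, so the client does one lookup and a little arithmetic.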
Re: Re: rbldnsd blacklist question
John Hardin wrote: On Tue, 16 Sep 2008, Marc Perkel wrote:

Looking for opinions from people running RBL blacklists. I have a list that contains a lot of name-based information. I'm about to add a lot more information to the list, and what will happen is that when you look up a name you might get several results. For example, a hostname might be blacklisted, be in a URIBL list, be in a day-old-bread list, and a NOT QUIT list. So it might return 4 results like 127.0.0.2, 127.0.0.6, 127.0.0.7, 127.0.0.8. Is this what would be considered "best practice"? My thinking is that having one list that returns everything is very efficient.

Isn't it general practice to bitmap the last octet if you're going to convey multiple pieces of information?

Isn't it simple enough to write the zone file in 2 different formats and map them to 2 different zone names, to support both bitmasked and multiple-response lookups, if there is value in having both? URIBL uses bitmasks, but doesn't need to, as we don't cross-list domains to multiple lists. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
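For anyone wanting to try the two-zone idea, here is a rough sketch of what the rbldnsd data might look like. The zone names and entries are invented, and the exact dnset value syntax (and whether a repeated key yields multiple A records) should be checked against the rbldnsd documentation before use; treat this as an untested assumption, not working config:

```
# bitmask.example.org -- one entry per name, flags OR'd into a single A value
$DATASET dnset bitmask
:127.0.0.2:Listed, see http://lookup.example.org/?d=$
badhost.example :127.0.0.14

# multi.example.org -- the same data as one entry per sub-list
$DATASET dnset multi
badhost.example :127.0.0.2
badhost.example :127.0.0.4
badhost.example :127.0.0.8
```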
Re: Re: Incorrect DNSBL evaluation
Karsten Bräckelmann wrote: On Mon, 2008-07-21 at 23:17 +0200, Matthias Leisi wrote: Yves Goergen schrieb:

What do you mean? My mail server uses the DNS servers of the computing centre. What SpamAssassin does, I don't know. The IP addresses are:

The same as everyone else... Sic.

# cat /etc/resolv.conf
nameserver 213.133.100.100
nameserver 213.133.99.99
nameserver 213.133.98.98
nameserver 213.133.98.97

Ah, Hetzner. I have had far fewer problems since I started to run my own:

main:~> cat /etc/resolv.conf
nameserver 127.0.0.1

Every Hetzner customer using the same DNS by default? Yeah, that indeed looks like these DNS servers are being blocked by the BL operators (see my previous post). Most likely not only URIBL, but every major BL out there...

I have looked, and there are no ACLs on 213.133.0.0/16 whatsoever, so it's not coming from the URIBL mirror side. Could those DNS servers be monetizers? Have you (Yves) even tried manual lookups to see how the ISP DNS server is responding? Do this and report your results...

$ dig @213.133.100.100 unclassified.de.multi.uribl.com A

Those NS IPs are not reachable from here, so I can't test to see how they respond. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Starting a URIBL - Howto? [OT]
Rob McEwen wrote: (on-list follow-up) By "proactive listings", I discovered in my off-list conversation with Dallas that this refers to URIBL-Gold listings... where items are listed in "uribl-gold" in advance of seeing them in actual spams. But this uribl-gold list isn't available to the public and is not even prescribed as a list to use for fighting spam.

We do ask anyone with access to it to use it. Since it's basically URIBL Black for domains that we believe will show up in future spam campaigns, there is no reason not to. I'm sure there are some on this list that can comment further in regards to its effectiveness.

I'm really disappointed that Dallas would have presented that kind of comparison to ivmURI. This is like comparing some kid's best basketball game on an X-Box to Michael Jordan's best basketball game on the court. I'm glad that URIBL-Gold is helping URIBL Black get better... but until the listing actually makes it into URIBL-Black... and is then actually *usable* for blocking spam...

From an RBL perspective, the purpose of the data in there is to catch the front end of spam runs. Assuming it takes ~5 minutes to list, rebuild, and redistribute new zone data in reactive mode, we could miss 50% of a 10-minute campaign. Obviously the longer the campaign draws out, the better the miss rate looks. But those using gold+black have 100% hit rates on a lot of these campaigns, which is something that is difficult if not impossible to achieve on a reactive blacklist based solely on trap data or user feedback. As you can see at http://www.uribl.com/gold.shtml, over 20% (14k of 57k) of the domains that have been listed in gold for hours, days, even weeks have since moved to black. So, assuming each of those 14k domains returned NXDOMAIN on black.uribl.com for the first ~5 minutes of each of their campaigns, how much spam do you think we missed? Quite a lot, I'd say. That short window is what we are targeting here.
It doesn't result in a huge hit rate, because it only hits in gold during the rebuild-and-redistribute window, but it does serve its purpose quite well. Aside from client-side spam filtering, I could see registries/registrars, web hosts, IP space owners and the like benefiting from this data as well. Knowing there is potential for abuse before the abuse actually occurs could be quite a powerful tool. For example, I can tell you that ns1.tuhaerge.com is the next NS that will be spewing up VPXL crapmail (http://www.spamtrackers.hk/wiki/index.php?title=VPXL)... That NS and every domain registered against that NS should be instantly nuked, but getting those Chinese registrars to action anything like this, even with proper evidence, is nearly impossible... just think if you asked them to kill it before the abuse started. ;) -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
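The ~5-minute reactive window Dallas describes is easy to put numbers on. A back-of-envelope sketch, assuming a campaign sends at a uniform rate (real campaigns don't, so this is illustrative only):

```python
def reactive_miss_fraction(listing_delay_min, campaign_min):
    """Fraction of a campaign that arrives before a reactive listing
    (list, rebuild, redistribute) takes effect."""
    if campaign_min <= 0:
        raise ValueError("campaign length must be positive")
    return min(listing_delay_min / campaign_min, 1.0)

print(reactive_miss_fraction(5, 10))   # 0.5 -- half of a 10-minute run slips by
print(reactive_miss_fraction(5, 120))  # ~0.04 -- long campaigns look much better
```

This is the same arithmetic as the post: the shorter the campaign, the more of it falls inside the listing delay, and a proactive listing shrinks that delay to zero.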
Re: Re: Starting a URIBL - Howto? [OT]
Rob McEwen wrote: Dallas Engelken wrote:

Yes, of course, but your results.txt is biased as it only shows where ivmURI hits. Based on the last 20k adds to URIBL, it appears to me that ivmURI has less coverage?

Dallas, Yes, you are right! URIBL *does* cast a wider net than ivmURI. So, in general, I agree with your statement that ivmURI has less coverage than URIBL. But I'm confused about your stats... and they look really weird. (But maybe I'm just not understanding them?) So here is what I did. I took the last 500 additions to URIBL (not including GeoCities and Blogspot items... so that this comparison would compare apples to apples!) and then ran those against ivmURI. 186 of the 500 latest additions to URIBL were also found in ivmURI. I then reversed this testing and ran URIBL against the last 500 additions to ivmURI. 328 of the latest 500 additions to ivmURI were listed on URIBL. So yes, basically, you're right, URIBL does have greater coverage than ivmURI. Your point is well made. For the most part, URIBL casts a wider net than ivmURI. Also, if you were to include GeoCities and Blogspot hits, of course, that would throw the comparison wildly in URIBL's favor... but I'm not so sure that would be a fair comparison.

No, you're right, that's not fair. If I compare only recent reactive listings, minus the subdomain hosters that we list, you hit about 60%, whereas before it was more like 27%.

ivmURI stats from the last 5000 URIBL Black listings:
-> 2981 hits
-> 2019 misses

(In both tests, I checked against the 2nd list just about 2-3 minutes after grabbing the latest data from the first list. This is important, as I was seeing those stats quickly grow for BOTH after my initial collection of stats... because items not yet in both lists are continuously getting into the other list fast. So timing is mission-critical in this kind of testing, and the time between gathering and checking MUST be the same both ways.)
However, I think you missed my point about http://invaluement.com/results.txt. I wasn't saying that this proved ivmURI is better than URIBL or SURBL. Only that this proves ivmURI is *relevant* and *useful*... even for those who are already using *both* URIBL and SURBL. (And this is just one such proof!)

You said, "and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE: http://invaluement.com/results.txt )". Your EXAMPLE contradicts the statement that precedes it. I can only take it in the context of how I read it.

For example, if ivmURI were only catching stuff already caught by URIBL and SURBL, ivmURI wouldn't be relevant or helpful to anyone. Moreover, I believe that URIBL or SURBL could easily create a similarly impressive page as my http://invaluement.com/results.txt page.

Probably.

Bottom line is that you are correct... AND... I'm sorry you took this as me dissing URIBL!

I didn't take it that way. I was just pointing out that your statement didn't match your accompanying example.

Simply put, there are some series of spams that each of the three URI blacklists is better at catching than the other two. That is ALL that I meant by this.

Okay, if you had said that, I would have agreed and never posted :) -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: Starting a URIBL - Howto? [OT]
Rob McEwen wrote: and ALL 3 catch stuff the other 2 miss... FOR EXAMPLE: http://invaluement.com/results.txt )

Yes, of course, but your results.txt is biased as it only shows where ivmURI hits. Based on the last 20k adds to URIBL, it appears to me that ivmURI has less coverage?

ivmURI stats from the last 20k URIBL reactive listings:
-> 5519 hits
-> 14481 misses

ivmURI stats from the last 20k URIBL proactive listings:
-> 351 hits
-> 19649 misses

-- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: Looking for hosts to white list
Benny Pedersen wrote: On Tue, April 22, 2008 23:47, Marc Perkel wrote:

I'm looking for people who are running URI blacklists, but I'm more interested in your whitelist information. I have an extensive list myself and am looking for partners to swap data with.

But uribl.com has a hidden whitelist; there might be others that have a point in hiding it :)

Are you sure you mean uribl.com? white.uribl.com is a publicly available zone. It is not a part of multi.uribl.com, but is available for stand-alone queries.

# host -tTXT microsoft.com.white.uribl.com
microsoft.com.white.uribl.com text "Whitelisted, see http://lookup.uribl.com/?domain=microsoft.com"

URIBL white hits are also visible on the lookup form, e.g. http://lookup.uribl.com/?d=godaddy.com. We're not scared to show it off, as we don't use it for false remediation (for the most part). -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: OT: uribl.com folks awake?
Jonathan Nichols wrote: Sorry for the OT. I've been trying to get in touch with whoever is in charge of URIBL zonefile mirrors, without success. Is this thing on? Ping me offlist, por favor. I may have just been pinging the wrong people.

http://www.uribl.com/contact.shtml says: "For DNS questions not related to listings... that includes zone information, transfers, outages, etc. Use dnsadmin at uribl dot com <mailto:[EMAIL PROTECTED]>." Have you done that? -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: util_rb_2tld
> McDonald, Dan wrote:
>> On Tue, 2008-03-25 at 16:44 +0100, Yet Another Ninja wrote:
>>
>> util_rb_2tld by.ru
>> util_rb_2tld tripod.com
>
> So, the man page is wrong?
> [luser sa ~]$ man Mail::SpamAssassin::Conf
> /util_rb_2tld
> [...]
> util_rb_2tld 2tld-1.tld 2tld-2.tld ...

No, I don't think this was a message regarding util_rb_2tld usage format. I think the point he was making was that if you add by.ru and tripod.com to your util_rb_2tld config, it will help filter spam abusing those hosts. hotmail.ru would be another one... as the tripod spammers started hitting hotmail.ru with it today.

+-----------------------+---------------------+
| domain                | seen                |
+-----------------------+---------------------+
| skn24n.hotmail.ru     | 2008-03-26 11:41:46 |
| fe0ky.hotmail.ru      | 2008-03-26 11:35:45 |
| xyw7dgf.hotmail.ru    | 2008-03-26 11:33:21 |
| mmyjolyn.tripod.com   | 2008-03-25 14:48:51 |
| taviamarya.tripod.com | 2008-03-25 14:47:17 |
| roljanna.tripod.com   | 2008-03-25 14:47:08 |
+-----------------------+---------------------+

# host -tTXT skn24n.hotmail.ru.multi.uribl.com
skn24n.hotmail.ru.multi.uribl.com text "Blacklisted, see http://lookup.uribl.com/?domain=skn24n.hotmail.ru"

See http://rss.uribl.com/hosters/ for host abuse listings. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
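To see why adding hotmail.ru to util_rb_2tld matters: SpamAssassin normally trims a hostname to its last two labels before querying a URI blacklist, so skn24n.hotmail.ru would be looked up as hotmail.ru, and it could never be listed without listing all of hotmail.ru. A simplified Python sketch of the trimming idea (the real logic lives in SpamAssassin's registrar-boundary code and handles many more cases):

```python
# Domains from util_rb_2tld: treat "x.y" as a registrar-like boundary.
TWO_LEVEL = {"by.ru", "tripod.com", "hotmail.ru"}

def query_domain(hostname):
    """Return the label a URI blacklist lookup should use (simplified)."""
    parts = hostname.lower().rstrip(".").split(".")
    if len(parts) >= 3 and ".".join(parts[-2:]) in TWO_LEVEL:
        # Keep three labels so each abusive subdomain is listable on its own.
        return ".".join(parts[-3:])
    return ".".join(parts[-2:])

print(query_domain("skn24n.hotmail.ru"))  # skn24n.hotmail.ru -- listable alone
print(query_domain("www.example.com"))    # example.com
```

With hotmail.ru in the 2tld set, each throwaway subdomain becomes its own listable key instead of collapsing into the shared parent.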
Re: Time to make multi.uribl.org optional rather than default?
Andy Dills wrote: It appears (from email recently sent to the admins of a few small mailservers I help admin) that the people in charge of uribl.com have decided to set a pretty low threshold for blacklisting DNS servers from querying, demanding that people who hit that threshold pay them a rather exorbitant rate for a data feed.

Demanding? I believe the first thing that excessive-query-volume email tells you is to simply shut it off and be done. The data feed option is just that, an option. If you see no value in it, then you won't be missing anything by us not answering your queries.

I have judged this threshold to be low based on the size of some of the mail/dns servers whose admins have gotten this email, along with the fact that this is the only blacklist to have taken this obnoxious stance.

What is your definition of low volume? db2.xecu.net + dns02.xecu.net account for nearly 500k queries/day (~3GB of data/mo). There are over 40k unique IPs that query URIBL public DNS. As any mirror operator can see, we have around 180 IPs in the ACL. So that's ~0.45%. And those 180 blocked IPs consist of far fewer organizations/companies, as many have more than 1 IP on that list. Filtering the top 0.45% of IPs results in 20% fewer queries/second to the mirrors. I don't see trying to limit excessive bandwidth usage on donated mirrors as an "obnoxious stance".

Because right now the default inclusion of tests against multi.surbl.com is in reality just a "trial service" and an opportunity for this for-profit organization to create revenue streams.

If you remove it from SA by default, you're doing so at the expense of the other 99.55%. We asked you to shut off your queries on 2007-12-27 19:15:09. Nearly 3 months later, we still saw the same high-volume queries from your systems.

I really don't care much either way; for me it's a done deal. I'm disabling the tests on my mail servers and advising others to do the same.
I'm just wondering if the community at large is aware of this and has an opinion.

Superb. That's all you had to do in the first place, without raising a stink. If SA wants to completely remove uribl.com tests because we don't allow the heavy hitters to query the public mirrors, that's their choice. Although, the usage policy for Spamhaus (http://www.spamhaus.org/organization/dnsblusage.html) doesn't prevent inclusion of RCVD_IN_SBL in SA. Thanks, -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: Can anyone help me? surbl.org FP problems?
John Hardin wrote: On Tue, 2008-01-29 at 15:25 -0800, John Hardin wrote: On Tue, 2008-01-29 at 17:51 -0500, Matt Kettler wrote:

Perhaps Verizon is screwing up their DNS?

Ahh, yes they are: http://www.freedom-to-tinker.com/?p=1227

Hrm. As a troubleshooting hack for this increasingly common "feature", perhaps a URIBL/DNSBL rule could be defined that checks a domain that will *never* be in the zones (apache.org, maybe), and if it ever hit, then add -20 to the score (to override all the FP hits) and emit a warning to inspect your DNS service for ISP hijacking?

...duh, that won't work. Where would the domain name to test come from? Perhaps a check for ISP DNS tomfoolery could be put in the --lint checks somehow?

Or better yet, just fix the URIDNSBL plugin code to expect responses matching ^127\. Anything else is a DNS monetizer. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
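The ^127 check Dallas suggests is simple to express. A sketch of the idea in Python (not the actual plugin, which is Perl inside SpamAssassin):

```python
import ipaddress

def plausible_dnsbl_answer(addr):
    """True only for answers in 127.0.0.0/8. A wildcarding/monetizing
    resolver rewrites NXDOMAIN into a routable web IP, which would
    otherwise register as a false blacklist hit."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return ip in ipaddress.ip_network("127.0.0.0/8")

print(plausible_dnsbl_answer("127.0.0.2"))       # True  -- a genuine listing
print(plausible_dnsbl_answer("208.67.219.132"))  # False -- typical hijack answer
```

Since every legitimate DNSBL return code lives in loopback space, discarding anything outside 127/8 filters the hijacked answers without needing to know which domain the ISP rewrote.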
Re: URIWhois-0.02
Robert - elists wrote: DOB, for example, is run by ar.com, who are a registrar. Since they are a domain registrar, they have full, direct access to the whois database. Jeff C.

Well there ya go, Jeff... Become a registrar and bam! More data to help your cause.

That's the easy answer, but do you know what it costs to become a registrar? Just for com/net, from VeriSign you have $6500 up front and $4k recurring. To get your ICANN credentials, you have $2500 up front with the application, $4k yearly, a variable fee to ICANN once you start registering domains, and obviously the $0.25/registration that goes to ICANN. You also have to be able to show $70,000 in working capital. And that only gets you com and net. I'd want org from PIR, info from Afilias, and any other TLD that takes a lot of abuse (cn is big right now). At least those fees keep the bad guys from becoming registrars too. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: fdf spam
David B Funk wrote: On Sat, 11 Aug 2007, wolfgang wrote: In an older episode (Friday, 10 August 2007), Mike Cisar wrote:

Has anyone else been seeing the empty-body "PDF" spam, but with a .fdf file extension? Had a whole pile in my inbox here this morning.

Thousands of them went through our mail gateways at work. A typo in some bot?

No, merely the next episode in the never-ending spam-wars saga. A ".fdf" file is yet another Adobe file type, and double-clicking on one (on a Windows box) will launch Acrobat Reader and display its contents. However, anti-spam weapons such as PDFInfo are explicitly coded to look for ".pdf" files, thus ".fdf" is given a pass. This shows the cleverness behind (at least some of) the spammers. A quick edit will update PDFInfo to check ".fdf" files too.

That was done this morning, if you want to grab a new version... http://www.rulesemporium.com/plugins/PDFInfo.pm -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Detecting short-TTL domains?
Jared Hall wrote: Great overview on DNS and Net::DNS. While there is a difference between RR and zone TTL times, my observation was based upon zone SOA TTL records of recent spamvertized URIs in emails.

Well then that is even simpler... using the same code below:

&lookup('uribl.com', 'SOA');

and add support for the SOA type in the lookup() sub:

# support other rr types below...
if ($type eq 'SOA') {
    print "$host $ttl IN $type ", $a->minimum, "\n";
}

Above, $ttl would still be the TTL on the SOA answer record (possibly cached), and $a->minimum would be the TTL found in the SOA. From the docs: minimum() returns the minimum (default) TTL for records in this zone.

There is nothing wrong with using URI BLs. But most URI BLs are simply triggered from a "problem" that somebody else already had. It still seems to me that the problems presented by fast-flux systems can be mitigated by some coding relevant to current statistical norms. While I have no doubt that Dallas is technically accurate, I'm wondering if there is a Net::DNS function that can be used to extract zone SOA TTL values (at least until Joe Spammer starts tweaking individual RRs)? Jared Hall, General Telecom, LLC.

On Friday 10 August 2007 13:59, Dallas Engelken wrote: John Rudd wrote: I'm a prophet now!? :-) Hm. So, I'm sure I can figure this out eventually, but does anyone know the right Net::DNS way to extract the TTL?

Net::DNS::RR has a ttl() function.
# perl ttl_test
Lookup: A www.uribl.com
www.uribl.com 591 IN A 209.200.135.149

use Net::DNS;

my $res = Net::DNS::Resolver->new;
&lookup('www.uribl.com', 'A');
exit;

sub lookup {
    my ($host, $rr) = @_;
    print "Lookup: $rr $host\n";
    my $packet = $res->send($host, $rr);
    return unless $packet;
    my $header = $packet->header;
    return if ($header->rcode =~ m/NXDOMAIN|SERVFAIL|REFUSED/i);
    my @answer = $packet->answer;
    foreach $a (@answer) {
        my $type = $a->type;
        my $ttl  = $a->ttl;
        if ($type eq 'A') {
            print "$host $ttl IN $type ", $a->address, "\n";
        }
        # support other rr types below...
    }
}

Note that Net::DNS returns the TTL from the answer record, which means if you have a caching nameserver, your TTL may be lower than the value returned from the authoritative nameservers. Pulling a TTL from an SOA won't work either, as TTL can be set per RR. The only proper way to do this is to perform a lookup, set the $res->nameservers() to those from the $packet->authority, and re-run the query. That will give you authoritative results, and the TTL will be the proper one. Something like this...

my @authority = $packet->authority;
if (scalar @authority) {
    my @ns = ();   # reset nameservers...
    foreach my $a (@authority) {
        my $type = $a->type;
        my $s    = $a->rdatastr;
        if ($type =~ m/ns/i) {
            $s =~ s/\.$//;
            push(@ns, $s);
        }
    }
    $res->nameservers(@ns);
}

Pulling authoritative results can be quite slow, so you may want to wrap it in alarm() to prevent timeouts from hanging you up. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Detecting short-TTL domains?
John Rudd wrote: I'm a prophet now!? :-) Hm. So, I'm sure I can figure this out eventually, but does anyone know the right Net::DNS way to extract the TTL?

Net::DNS::RR has a ttl() function.

# perl ttl_test
Lookup: A www.uribl.com
www.uribl.com 591 IN A 209.200.135.149

use Net::DNS;

my $res = Net::DNS::Resolver->new;
&lookup('www.uribl.com', 'A');
exit;

sub lookup {
    my ($host, $rr) = @_;
    print "Lookup: $rr $host\n";
    my $packet = $res->send($host, $rr);
    return unless $packet;
    my $header = $packet->header;
    return if ($header->rcode =~ m/NXDOMAIN|SERVFAIL|REFUSED/i);
    my @answer = $packet->answer;
    foreach $a (@answer) {
        my $type = $a->type;
        my $ttl  = $a->ttl;
        if ($type eq 'A') {
            print "$host $ttl IN $type ", $a->address, "\n";
        }
        # support other rr types below...
    }
}

Note that Net::DNS returns the TTL from the answer record, which means if you have a caching nameserver, your TTL may be lower than the value returned from the authoritative nameservers. Pulling a TTL from an SOA won't work either, as TTL can be set per RR. The only proper way to do this is to perform a lookup, set the $res->nameservers() to those from the $packet->authority, and re-run the query. That will give you authoritative results, and the TTL will be the proper one. Something like this...

my @authority = $packet->authority;
if (scalar @authority) {
    my @ns = ();   # reset nameservers...
    foreach my $a (@authority) {
        my $type = $a->type;
        my $s    = $a->rdatastr;
        if ($type =~ m/ns/i) {
            $s =~ s/\.$//;
            push(@ns, $s);
        }
    }
    $res->nameservers(@ns);
}

Pulling authoritative results can be quite slow, so you may want to wrap it in alarm() to prevent timeouts from hanging you up. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: New PDF?
WebTent wrote: I have a few PDFs getting through now after doing pretty well; the latest 0.4 pdfinfo + SA 3.1.7 + SARE rules + sa-update is not scoring enough on these:

Current version is v0.6, and sigs for those were added last Thursday...

http://esmtp.webtent.net/mail1.txt
* 0.6 GMD_PDF_ENCRYPTED BODY: Attached PDF is encrypted
* 2.0 GMD_PDF_FUZZY2_T11 BODY: Fuzzy tags Match
*     5A4CB7600371063164BB7AFA6EDE7FE9
* 0.2 GMD_PDF_EMPTY_BODY BODY: Attached PDF with empty message body
* 3.0 GMD_PDF_STOX_M4 PDF Stox spam

http://esmtp.webtent.net/mail2.txt
* 2.0 GMD_PDF_FUZZY2_T9 BODY: Fuzzy tags Match
*     875C8F0810E6524EF0C3A7C4221A4C28
* 0.6 GMD_PDF_ENCRYPTED BODY: Attached PDF is encrypted
* 0.2 GMD_PDF_EMPTY_BODY BODY: Attached PDF with empty message body
* 3.0 GMD_PDF_STOX_M4 PDF Stox spam

-- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: PDF spam
R.Smits wrote: Matt Kettler wrote: Tarak Ranjan wrote:

Greetings, I'm getting PDF-attached spam. Please help me stop that using SpamAssassin... Horacio_FILE_506292_6906.pdf /tarak

The PDFInfo plugin from rulesemporium is designed for this kind of thing. http://www.rulesemporium.com/plugins.htm Personally, I've been able to keep them under control with good Bayes training, automated training by spamtraps, and a selective greylist, so I have not yet tried this plugin.

The plugin seems to work great, but is it stable enough for big production environments? Any issues?

I've heard of no performance problems... It's only going to run on messages with MIME parts that it believes contain a PDF anyway... so what is that, <1% of the time? -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Errors with PDFInfo.pm
Wolfgang Zeikat wrote: Hello again, On 07/12/07 16:22, Dallas Engelken wrote: Wolfgang Zeikat wrote:

I noticed that some of the latest PDF spam mails do not contain a filename in the MIME headers; could that be a reason for the above behaviour?

Possibly, but seeing that line 300 is just a dbg() line itself, you can either comment it out or change it to something that will not throw a warning.

# dbg("pdfinfo: found part, type=$type file=$name cte=$cte");
dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : ''));

Thanks, that fixed those. Lately, I see a lot of:

Jul 17 14:27:10 spamlock2 spamd[9786]: Use of uninitialized value in concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 272, line 1579.
Jul 17 14:27:10 spamlock2 spamd[9786]: Use of uninitialized value in hash element at /etc/mail/spamassassin/PDFInfo.pm line 283, line 1579.

Line 272 is (after the earlier changes):

dbg("pdfinfo: MD5 results for ".($name ? $name : '')." - md5=$md5 fuzzy1=$fuzzy_md5 fuzzy2=$tags_md5");

Line 283 is:

$pms->{pdfinfo}->{fuzzy_md5}->{$tags_md5} = 1;

I'd say $tags_md5 is undef then, which is odd, because if it made it that far, the message has a PDF in it, and all PDFs have tag structures. Got samples that make that warning appear? -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Who can tell me where the latest sa-stats can be found.
Steven W. Orr wrote: I used to use it, but it's old and has bugs. I recently found out that it's *not* part of the SA distro. Is this still supported, and if so, where do I get it? I looked around and found hugely conflicting version info. E.g., version 0.93 seems to support SA 3.1.x, but version 1.03 seems to be for SA 3.0. (BTW, they both seem to be dated 2007-01-30 at http://rulesemporium.com/programs/ )

What the hell are you reading?

http://rulesemporium.com/programs/sa-stats-1.0.txt = v1.03 is the latest, for SA 3.1:

# version: 1.03
# author: Dallas Engelken <[EMAIL PROTECTED]>
# desc: Generates Top Spam/Ham Rules fired for SA 3.1.x installations.

http://rulesemporium.com/programs/sa-stats.txt = v0.93, for SA 3.0:

# version: 0.93
# author: Dallas Engelken <[EMAIL PROTECTED]>
# desc: Generates Top Spam/Ham Rules fired for SA 3.x installations.

I haven't touched them for a while and haven't checked if v1.03 even works with SA 3.2. If something needs to be done, let me know. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: PDFText Plugin for PDF file scoring - not for PDF images
James MacLean wrote: Hi folks, Regrets if this is the wrong list. I wanted to be able to score on text found in PDF files. I did not see any obvious route, so I made a plugin that calls XPDF's pdfinfo and pdftotext to get the text that is then scored. A sample local.cf could be:

pdftotext_cmd /usr/local/bin/pdftotext
pdfinfo_cmd /usr/local/bin/pdfinfo
body PDF_TO_TEXT eval:check_pdftext("^Error","sex","drugs",'Title:\s+stock_tmp.pdf:4','Creator:\s+OpenOffice.org 1.1.4:4')

Notice that a :4 gives a find of that regex 4 points. I really don't know if this was the right road to follow, as I copied AntiVirus.pm and came up with this: http://support.ednet.ns.ca/SpamAssassin/PDFText.pm So far... it appears to work as expected and didn't take down a pretty busy server ;). I'd enjoy hearing any positive criticisms :).

I did this the other day with CAM::PDF, but Theo recommended this work should be done in the post_message_parse() plugin call. Then you could just write body rules against the text, URIs would get checked by the URIDNSBL plugin, etc. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: New spam getting by PDFInfo?
McDonald, Dan wrote: On Fri, 2007-07-13 at 12:28 -0400, Robert Fitzpatrick wrote:

Just verified a couple of PDF attachments getting through with our PDFInfo rules. Can someone test these to see if my PDF rules are working or if you're able to block? I believe the rules are working, as the latter message is hitting one, just not enough to block. I tried my access to the PDFInfo link sent to me by the webmaster to see if there was an update, but it is not working now :(

Running pdfinfo 0.3, I see the first one being analyzed, but not stopped by the pdfinfo rule.

There is a more current version than 0.3 that probably hits these. When I tried to access the URLs, they were already gone, but I'd guess they were the ones that used 'pdf crypt'. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium
John D. Hardin wrote: On Fri, 13 Jul 2007, Christopher X. Candreva wrote: On Fri, 13 Jul 2007, John D. Hardin wrote:

Is there some reason pointing everyone at the Coral cache of the website won't work? Granted, Coral is also intended for large files, but it is distributed and is almost transparent...

Well, right now www.rulesemporium.com came up in a few seconds directly, and took over a minute via the Coral Cache. So I would answer "because it doesn't help, and slows things down in fact".

The initial retrieval of the cached pages *does* require a regular connection to the primary website, so the Coral network would be just as impacted by a DDoS as regular users are. However, once it has its copy, response should be quite fast. I just tried it and it took just a few seconds, whereas I haven't been able to get directly to the primary website at all for a week or more.

Hi John, Prolexic says: "If you could ask any users with connectivity issues to submit a 'host www.rulesemporium.com' and 'tcptraceroute www.rulesemporium.com' along with a complaint of connectivity problems, that would be very helpful." So, if you want to send that to me, I can get the info to them so they can get to the bottom of it. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium
Anders Norrbring wrote: Henrik Krohns skrev: On Wed, Jul 11, 2007 at 07:44:37PM -0400, Phil Barnett wrote:

We can't be the first people to come up against this problem. How have others solved it?

Bunch'o'Mirrors? Crude and effective.

*raises a hand* I volunteer to mirror; I have lots of both HD and BW capacity to spare.

Sure, until you get your first DDoS... SURBL had like 10 mirrors for www when they started getting the DDoS, and all of them took over 200mbit/s... some upwards of 450mbit. URIBL had 3, and Spamhaus has 2 that I know of. If they can DDoS at well over 3gbit/s (15*200), it really doesn't matter how many damn mirrors there are. Even if your mirror providers would take 20mbit/s each and not null-route your ass, you'd need well over 150 mirrors. I do not believe "Bunch'o'Mirrors" is "the solution". It may be all fine and good for distribution of load/bandwidth, but it does not thwart a DDoS. The proper solution would be to dismantle the botnets that are capable of mass DDoS. Some ISPs need to gain a clue, step it up, and do their part to cut off access to infected PCs. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Errors with PDFInfo.pm
Wolfgang Zeikat wrote: Hi, On 07/12/07 15:39, Robert Schetterer wrote:

> Hi, @ll
> the newest version of the pdfinfo plugin
> matched some new pdf spam right now
>
> * 2.0 GMD_PDF_FUZZY2_T3 BODY: Fuzzy MD5 Match
> *     3D4E25DE4A05695681D694716D579474

Yes, it does that here too in SA 3.1.8, but I get errors like:

Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 300, line 532.
Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 261, line 532.
Jul 12 15:59:53 spamlock3 spamd[13136]: Use of uninitialized value in concatenation (.) or string at /etc/mail/spamassassin/PDFInfo.pm line 262, line 532.

I noticed that some of the latest PDF spam mails do not contain a filename in the MIME headers; could that be a reason for the above behaviour?

Possibly, but seeing that line 300 is just a dbg() line itself, you can either comment it out or change it to something that will not throw a warning.

# dbg("pdfinfo: found part, type=$type file=$name cte=$cte");
dbg("pdfinfo: found part, type=".($type ? $type : '')." file=".($name ? $name : '')." cte=".($cte ? $cte : ''));

Thanks, -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium
Robert - eLists wrote: Praise God Almighty! We were able to spend more than a few seconds and many clicks on the rulesemporium website. Awesome. As it says, was it moved over to vr.org??? A couple years ago... yup. Which is now netactuate.com -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: PDFInfo plugin with SA 3.1.7
# counts GMD_PRODUCER_GPL 85s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF 0s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF 0s/0h of 5641 corpus (4064s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1 159s/0h of 6132 corpus (555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1 40s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_STOX_M2 223s/0h of 6132 corpus (555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M2 29s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
-- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: So what about rulesemporium.com and these anti-PDF rules?
Henrik Krohns wrote: On Wed, Jul 04, 2007 at 10:08:29AM +0100, Justin Mason wrote: Bear in mind that the spammer who is developing this PDF spam is only one person, and he/she probably has at least one non-spammy-looking email address at his disposal. What's to stop him/her from asking Dallas for a copy of the ruleset and plugin, same as any other SpamAssassin user, waiting a few days to cover his/her tracks, then fixing the spam to avoid it again? And if you think this isn't already happening, I have a bridge for sale ;) If I were a spammer, I couldn't care less if a few people were using some secret PDF blocking stuff. It's not like AOL or some big companies are using it. :) Based on that logic, it makes no difference if it gets released or not. You don't think big companies utilize SpamAssassin, SARE, or other open source products for solutions, or even ideas for similar solutions? I think you would be pleasantly surprised. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Re: So what about rulesemporium.com and these anti-PDF rules?
Jason Haar wrote: Theo Van Dinter wrote: All in all, you're better off just making things public. I agree. It's sort of like saying that Open Source cannot work as a model in the antivirus/antispam arena... It can, if you have the people willing to contribute new dats on every revision of... ...and it may be true - but no-one on this list believes it ;-) The method used in the plugin is very simple, and very easy to work around if made public. What happens here is that when that "workaround" occurs, we have to release a new plugin and a new ruleset. It's not like we just release a new ruleset, someone runs RDJ/sa-update, and they are off. There is no way to auto-update the plugin (currently) besides announcing it and hoping people install it. I foresee a major failure there. If you think you can improve it so that the plugin remains static, and only the rules need changing, then be my guest... -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: RE: So what about rulesemporium.com and these anti-PDF rules?
Chris Santerre wrote: You didn't miss anything. I don't believe they are released yet. Final testing being done. Results look great. I'll see if they can get released soon. --Chris > -Original Message- > From: Michal Jeczalik [mailto:[EMAIL PROTECTED] > Sent: Tuesday, July 03, 2007 9:47 AM > To: users@spamassassin.apache.org > Subject: So what about rulesemporium.com and these anti-PDF rules? > > It's been announced that these rules are coming soon and...? > Or maybe I missed something? The PDFInfo.pm and accompanying ruleset will not be public. If you want it, please go to http://www.rulesemporium.com/plugins.htm#pdfinfo and request it. I'll try and get PDF support added into ImageInfo.pm soon, but it will only extend the capabilities that you currently have for gif/jpg/png... that being attachment count, file name matching, pdf image dimensions, pixel coverage (area), etc. However, that's not an ideal solution, and the rules you can write with that will stop the spam but also have a greater chance of falsing. The mechanism used for accurate detection in the PDFInfo plugin is not going to be a part of this... and I'd recommend you request the plugin and use it privately. If the information gets publicized, that method would soon be useless... and I don't feel like reworking it if I don't have to, nor maintaining a ruleset that is highly dependent on the plugin. Updates to the ruleset could very well mean updating the plugin, and you can't get people to update a plugin en masse as easily as you can get them to RDJ a new ruleset. :) -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: RulesDuJour lint failed. Updates rolled back.
This must be an issue that needs to be raised with Prolexic, as they are doing the DDoS protection for rulesemporium.com. Can anyone reproduce this redirect outside of RDJ, and give me a dump of the full transaction including http headers? I'd rather fix the actual problem and not patch around it. Thanks, Dallas Lindsay Haisley wrote: This problem is probably due to the way Rules Emporium is handling traffic. If requests come too fast from the same address, or if their server is busy, they send an HTML redirect page instructing the client to try again in 0.1 second. Curl and wget don't understand the "META HTTP-EQUIV" refresh tag and simply store the refresh page as the output of the request. rules_du_jour is just a shell script, so a proper fix should be pretty easy. The following is a quick and dirty patch which sort of solves the problem, at least for the next run of rules_du_jour.

--- /root/rules_du_jour.orig	2007-06-17 21:01:24.0 -0500
+++ /var/lib/spamassassin/rules_du_jour	2007-06-18 12:37:44.0 -0500
@@ -907,6 +907,8 @@
 [ "${SEND_THE_EMAIL}" ] && echo -e "${MESSAGES}" | sh -c "${MAILCMD} -s \"RulesDuJour Run Summary on ${HOSTNAME}\" ${MAIL_ADDRESS}";
 fi
+grep -il 'META HTTP-EQUIV' ${TMPDIR}/*|xargs -n1 rm -f
+
 cd ${OLDDIR};
 exit;

rules_du_jour will still fail, but this will clean up the mess and next time (hopefully) it'll run properly. A proper fix would sense when this happens and retry the download after a suitable short wait. It may also be helpful to insert some "sleep .5" instructions at appropriate points (or "sleep 1" if your implementation of sleep(1) doesn't understand floating point numbers). On Thu, 2007-06-28 at 11:22 +0100, Nigel Frankcom wrote: On Wed, 27 Jun 2007 16:42:39 -0400, "Daryl C. W. O'Shea" <[EMAIL PROTECTED]> wrote: Nigel Frankcom wrote: On Wed, 27 Jun 2007 08:48:02 -0400, David Boltz <[EMAIL PROTECTED]> wrote: I've been getting the lint failures found below on my Rules Du Jour updates for a few weeks now. Yes, this would be since the DDoS attacks on rulesemporium.
It looks like the same problem people have been having with the tripwire, but for me it's the adult and, since just recently, the spoof rules. The solutions I've seen don't seem to work for me. I see that my cron job (run nightly) is pulling some HTML source instead of the rules. I've tried removing the faulty 70_sare_adult.* from /etc/mail/spamassassin/RulesDuJour/ and manually replacing it with the "actual" file using wget. I've even manually updated the used /etc/mail/spamassassin/70_sare_adult.cf to ensure that it was correct. When I use "wget http://rulesemporium.com/rules/70_sare_adult.cf" to grab the file it works without problems. Does anyone have any ideas on how I might fix this problem? ***WARNING***: spamassassin --lint failed. Rolling configuration files back, not restarting SpamAssassin. Rollback command is: mv -f /etc/mail/spamassassin/70_sare_adult.cf The quick cure is to delete anything in the /etc/mail/spamassassin/RulesDuJour/ directory and rerun RDJ by hand. That worked for me on CentOS 4.5. The bug has been reported and a fix is due in 3.2.2 I believe. Huh? What's SA have to do with RDJ triggering Prolexic's DoS protection? Daryl is right, there is no fix due in 3.2.2 - I got the RDJ and the sa-update errors confused. I guess maybe I should dye my hair blonde. Apologies for any confusion I've caused. Kind regards Nigel -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
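The patch above only cleans up the refresh pages after the fact; the "proper fix" described (detect the refresh page and retry after a short wait) could be sketched roughly like this, in the same shell-script style as rules_du_jour. This is a hypothetical wrapper, not part of RDJ; it reuses the 'META HTTP-EQUIV' marker from the grep in the patch:

```shell
#!/bin/sh
# Prolexic's DoS protection hands back an HTML meta-refresh page instead
# of the ruleset; a real .cf file never contains this tag.
is_refresh_page() {
    grep -qi 'META HTTP-EQUIV' "$1"
}

# Hypothetical fetch-with-retry: re-request up to 5 times, pausing between
# tries, whenever the server returns a refresh page instead of rules.
fetch_with_retry() {
    url="$1"; out="$2"; tries=0
    while [ "$tries" -lt 5 ]; do
        curl -s -o "$out" "$url"
        is_refresh_page "$out" || return 0
        tries=$((tries + 1))
        sleep 1
    done
    return 1
}
```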
Re: Spam PDF
Robert Schetterer wrote: Dallas Engelken schrieb: John Thompson wrote: Raymond Myren wrote: Just today I started receiving spam mails with attached .pdf files with a spam image. Any ideas how to stop this spam type? Nothing, yet. But since these appear to be an image file encapsulated in a .pdf, it may be possible to get FuzzyOCR to parse them for spam text. As was stated earlier... Until it's publicly released, you can request a solution from SARE with a simple email via the information at http://www.rulesemporium.com/plugins.htm#pdfinfo Hi Dallas, I am glad to report that your rules matched all the pdf spam (I had 4) caught at my servers recently. Good work! Good, as expected. Thanks for the feedback. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Spam PDF
John Thompson wrote: Raymond Myren wrote: Just today I started receiving spam mails with attached .pdf files with a spam image. Any ideas how to stop this spam type? Nothing, yet. But since these appear to be an image file encapsulated in a .pdf, it may be possible to get FuzzyOCR to parse them for spam text. As was stated earlier... Until it's publicly released, you can request a solution from SARE with a simple email via the information at http://www.rulesemporium.com/plugins.htm#pdfinfo -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: pdf spam solution idea
arni wrote: Hi, it's come up several times now that people ask for a way to directly detect pdf spam by the pdf content and not only through headers or other means (hashes, bayes). I've found a solution that should be pretty easy to realise in a FuzzyOCR-like plugin. Here is what it should do: Use xpdf (http://www.foolabs.com/xpdf/download.html) to read the pdf document; export the images to ppm files using `pdfimages`; export the text parts to simple text using `pdftotext`. This plugin should run as one of the first to make the raw text available (for example by attaching it as an extra mime part or somehow internally) as well as make the images available to FuzzyOCR or similar by the same means as above. Unfortunately I won't be able to write such a plugin myself; it should be rather easy to do but I can't start to learn Perl just for this ;-) I already have... I'll be releasing the info soon. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Spam PDF
Raymond Dijkxhoorn wrote: Hi! We just caught one:

Content analysis details: (5.0 points, 4.0 required)
 pts rule name          description
---- ------------------ --------------------------------------------------
 0.6 SPF_SOFTFAIL       SPF: sender does not match SPF record (softfail)
 0.4 BAYES_60           BODY: Bayesian spam probability is 60 to 80% [score: 0.7404]
 2.2 TVD_SPACE_RATIO    BODY: TVD_SPACE_RATIO
 0.9 RCVD_IN_SORBS_DUL  RBL: SORBS: sent directly from dynamic IP address [201.32.227.251 listed in dnsbl.sorbs.net]
 0.9 RCVD_IN_PBL        RBL: Received via a relay in Spamhaus PBL [201.32.227.251 listed in zen.spamhaus.org]

Jun 27 14:50:03 vmx80 MailScanner[4491]: Message l5RCnxP8019756 from 212.127.254.149 ([EMAIL PROTECTED]) to quicknet.nl is spam, SpamAssassin (not cached, score=24.191, required 5, BAYES_50 0.00, BODY_EMPTY 0.50, GMD_PDF_BAD_FUZZY 20.00, GMD_PDF_HORIZ 0.25, GMD_PDF_STOX 1.00, PROLO_NO_URI 0.01, RCVD_IN_WHOIS_BOGONS 2.43) Dallas rocks! The cat's out of the bag now! :) More details on this will be made available later today hopefully. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Status of Spamassassin
The Doctor wrote: On Wed, Jun 13, 2007 at 07:30:10AM -0500, Dallas Engelken wrote: The Doctor wrote: Can rules_du_jour work? Still getting a no update state. SARE is back up (knock on wood). Delete your .cf files and re-run RDJ... -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. I got: Script started on Wed Jun 13 06:38:41 2007 doctor.nl2k.ab.ca//etc/mail/spamassassin$ rules_du_jour exec: curl -w %{http_code} --compressed -O -R -s -S -z /etc/mail/spamassassin/RulesDuJour/rules_du_jour http://sandgnat.com/rdj/rules_du_jour 2>&1 curl_output: 304 Performing preliminary lint (sanity check; does the CURRENT config lint?). No files updated; No restart required. Rules Du Jour Run Summary: RulesDuJour Run Summary on doctor.nl2k.ab.ca: ***NOTICE***: /usr/contrib/bin/spamassassin -p /usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint failed. This means that you have an error somewhere in your SpamAssassin configuration. To determine what the problem is, please run '/usr/contrib/bin/spamassassin -p /usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint' from a shell and notice the error messages it prints. For more (debug) information, add the -D switch to the command. Usually the problem will be found in local.cf, user_prefs, or some custom ruleset found in /etc/mail/spamassassin. Here are the errors that '/usr/contrib/bin/spamassassin -p /usr/contrib/etc/MailScanner/spam.assassin.prefs.conf --lint' reported: [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/local.cf": socre FORGED_HOTMAIL_RCVD2 45.0 [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/local.cf": socre SARE_URGBIZ 45.0 [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/local.cf": terse_report This message came for a spam friendly e-mail server.
[15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": 302 Found [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": Found [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": The document has moved http://www.sa-blacklist.stearns.org/sa-blacklist/random.current.cf";>here. [15745] warn: config: failed to parse line, skipping, in "/usr/contrib/etc/mail/spamassassin/random.cf": where do you get /usr/contrib/etc/mail/spamassassin/random.cf from? -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Status of Spamassassin
The Doctor wrote: Can rules_du_jour work? Still getting a no update state. SARE is back up (knock on wood). Delete your .cf files and re-run RDJ... -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium down?
Jerry Durand wrote: At 09:19 AM 6/9/2007, Dallas Engelken wrote: Rulesemporium.com will be coming back online at approximately 1800 GMT. Special thanks to Prolexic (http://www.prolexic.com) for the DDoS protection. Great news and good work! I assume we can re-enable sa-update for tonight's run. Thanks for keeping this running. Yes, I just verified http://www.rulesemporium.com/rules/ is serving data now. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium down?
Yet Another Ninja wrote: On 6/9/2007 6:50 PM, Jerry Durand wrote: At 09:19 AM 6/9/2007, Dallas Engelken wrote: Rulesemporium.com will be coming back online at approximately 1800 GMT. Special thanks to Prolexic (http://www.prolexic.com) for the DDoS protection. Great news and good work! I assume we can re-enable sa-update for tonight's run. Thanks for keeping this running. Guys, there's really no need to automate RDJ. SARE rules aren't being updated too frequently, and any rule change will be announced on the list. Each RDJ empty hit adds to traffic, which, atm, is a precious luxury. Pls be considerate and help SARE keep the site alive. Prolexic will be providing proper caching of the rules shortly, so this shouldn't be much of an issue going forward. As long as people keep their automation at 1-2 times a day, it's cool. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Rulesemporium down?
Yet Another Ninja wrote: On 6/7/2007 2:52 PM, Jake Vickers wrote: Steven Stern wrote: My systems all were unable to connect for their daily RDJ update yesterday. I time out trying to reach http://rulesemporium.com. Does anyone know what's happening? Same issue here. 404 errors. Pls disable all RDJ till further notice... Rulesemporium.com will be coming back online at approximately 1800 GMT. Special thanks to Prolexic (http://www.prolexic.com) for the DDoS protection. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Spamassassin is very slow...
Sven Schuster wrote: Hi, On Fri, Jun 08, 2007 at 12:26:38PM +0200, Devilish Entity told us: On 6/8/07, Theo Van Dinter <[EMAIL PROTECTED]> wrote: On Thu, Jun 07, 2007 at 03:15:35PM -0700, geist_ wrote: One AMD Unknown 1300MHz processor, 2601.92 total bogomips, 95M RAM [...] Any help would be useful... Get more RAM. :) Seriously, 95M is not really enough for anything these days, let alone resource-intensive apps such as SA. Well, I assume that it is really too little, but it never was this slow... Plus it's only a little server, I get at max 20 mails per day... So... before, it took about 3-4 secs to parse/scan a message. Do you have network tests enabled, especially URIBL? If so, it might be due to the recent DDoS on uribl.com, which causes the scans to take longer due to DNS timeouts. There should be no dns timeouts for URIBL currently. The dns mirrors are all up... just the websites are ddos'd. -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: Using SA code to extract URLs ?
Michael W. Cocke wrote: I was told a while back that the best way to extract urls from emails was to use code from SpamAssassin. Ok - Now, I need to do just that. Any pointers? I've looked thru the code in SpamCopURI, but unless there are some docs hidden somewhere I can't even figure out the entry point. Are there some docs hidden somewhere (I hope!)? Thanks! Mike- Here is a little something I use to extract urls from messages. It takes a message on STDIN, runs it through an empty instance of SA (no rules, no configs loaded), and prints to STDOUT.

#!/usr/bin/perl
#
# parse_uri.pl - extract URIs from a message on STDIN via SpamAssassin

use Mail::SpamAssassin;
use Mail::SpamAssassin::PerMsgStatus;

&main;

sub main {
    my $msg;
    while (<>) { $msg .= $_; }
    my $data = &geturi(\$msg);
    print $data;
    exit;
}

sub geturi {
    my ($message) = shift;
    my $sa = create_saobj();
    $sa->init(0);
    my $mail = $sa->parse($$message);
    my $msg  = Mail::SpamAssassin::PerMsgStatus->new($sa, $mail);
    my @uris = $msg->get_uri_list();
    my %uri_list;
    foreach my $uri (@uris) {
        next if ($uri =~ m/^(cid|mailto|javascript):/i);
        $uri_list{$uri} = 1;
    }
    my $uris = join("\n", keys %uri_list, "");
    return $uris;
}

sub create_saobj {
    my %setup_args = (
        rules_filename      => undef,
        site_rules_filename => undef,
        userprefs_filename  => undef,
        userstate_dir       => undef,
        local_tests_only    => 1,
        dont_copy_prefs     => 1
    );
    my $sa = Mail::SpamAssassin->new(\%setup_args);
    return $sa;
}

# EOF

Example run:

# cat corpus/spam/canselon.com.html | perl parse_uri.pl
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_02.gif
./unsubscribeOffers.html
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_01.gif
http://images.loveouroffers.com/general/8675_usub/spacer.gif
list.html?clientid=12&em=&offerid=1&mailerid=1&emailid=0
http://list.html/?clientid=12&em=&offerid=1&mailerid=1&emailid=0
http://images.loveouroffers.com/general/8675_usub/USUB_101_b_03.jpg
http:///unsubscribeOffers.html
http://./unsubscribeOffers.html

Enjoy.
Also, I only get digest copies from this list and don't check them all, so please cc me if you want me to see it. :) -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
Re: ImageInfo Bug
Stuart Johnston wrote: Dallas, I think there is a bug in the image_size_range function. my $name = $type.'_dems'; should probably be more like: my $name = "dems_$type"; Thanks, Stuart Yup... Craig Green made me aware of that last week, and I've been too busy to address it. I'll get it updated on the SARE side shortly. I haven't looked at Theo's sandbox lately, but I'd guess it's incorrect there also then. Thanks, -- Dallas Engelken [EMAIL PROTECTED] http://uribl.com
ImageInfo plugin updated!
Greetings, I've added a few enhancements to the ImageInfo plugin for SpamAssassin. You can get it from http://www.rulesemporium.com/plugins.htm#imageinfo

Updates:
- added optimization changes by Theo Van Dinter
- added jpeg support
- added function image_named()
- added function image_size_exact()
- added function image_size_range()
- added function image_to_text_ratio()

See the updated ruleset for some examples. Tweak the rules/scores to meet your needs... -- dallase http://uribl.com
RE: Strange problem
> -Original Message- > From: Rick Macdougall [mailto:[EMAIL PROTECTED] > Sent: Monday, July 10, 2006 11:59 > To: [EMAIL PROTECTED] > Cc: users@spamassassin.apache.org > Subject: Re: Strange problem > > Sanford Whiteman wrote: > >> Both servers have exactly the same config except for the auto-learn > >> and bayes/user prefs are stored in mysql on the FreeBSD server. > > > > Thanks to all who replied. > > I found the problem and it's related to ixhash, the timeout > doesn't work correctly / work at all. > > I see > > Jul 10 11:13:01 spa010 spamd[29830]: ixhash timeout reached > at /etc/mail/spamassassin/ixhash.pm line 91, line 2226. > > Jul 10 11:13:01 spa010 spamd[29830]: ixhash timeout reached > at /etc/mail/spamassassin/ixhash.pm line 91, line 2226. > > in the logs, and the child never exits from processing the message. > > I've cc'd Dallas to see if he has any insights into the problem. > The warnings are being generated because the timeout value has been exceeded...

my $timeout = $permsgstatus->{main}->{conf}->{'ixhash_timeout'} || 5;
eval {
    Mail::SpamAssassin::Util::trap_sigalrm_fully(sub { die "ixhash timeout reached"; });

The code is right... you need to figure out why it times out. Have you hardcoded ixhash_timeout to some other value? Have you tried manual lookups from that box?

# host -tA abc.ix.dnsbl.manitu.net
Host abc.ix.dnsbl.manitu.net not found: 3(NXDOMAIN)

d
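The ixhash snippet above uses the classic alarm-based timeout idiom: trap SIGALRM, die inside the handler, and catch the death in an eval. A minimal Python analogue of the same pattern, for readers who want to see the shape of it outside Perl (names here are invented for illustration; this is not the plugin's code):

```python
import signal
import time

class TimeoutReached(Exception):
    pass

def with_timeout(seconds, func):
    """Run func(); raise TimeoutReached if it runs longer than `seconds`.
    Same shape as trap_sigalrm_fully() + eval { ... die ... } in ixhash.pm."""
    def on_alarm(signum, frame):
        raise TimeoutReached("ixhash timeout reached")
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)                      # always cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)

# A lookup that hangs past its 1-second budget gets cut off:
try:
    with_timeout(1, lambda: time.sleep(5))
except TimeoutReached as e:
    print(e)                                 # -> ixhash timeout reached
```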
RE: DNS Whitelists
> Actually what I was thinking of was a DNS version of this list so that other applications can use it. Oh, I see... well, SA couldn't use it without someone writing a plugin then. dallase http://uribl.com
RE: DNS Whitelists
> -Original Message- > From: Marc Perkel [mailto:[EMAIL PROTECTED] > Sent: Thursday, June 22, 2006 09:30 > To: [EMAIL PROTECTED] > Cc: users@spamassassin.apache.org > Subject: Re: DNS Whitelists > > I'm not thinking links. What I want to do is whitelist based > on the host name of the server connecting to my server. > Isn't that what whitelist_rcvd_from is for? Isn't that what http://www.rulesemporium.com/rules/70_sare_whitelist.cf is for? What am I missing here? dallase http://uribl.com
RE: DNS Whitelists
> -Original Message- > From: Marc Perkel [mailto:[EMAIL PROTECTED] > Sent: Thursday, June 22, 2006 09:15 > To: users@spamassassin.apache.org > Subject: DNS Whitelists > > Are there any DNS based whitelists out there? If not, > shouldn't we build one? > > I need two different kinds of DNS whitelists. One would be > hosts that NEVER send spam. Large banks, etc. > > The second list is a list of hosts that should never be blacklisted. > These are hosts that might send some spam but should never > accidentally be blacklisted because of it. Examples would be > *.aol.com, *.earthlink.net, *.yahoo.com. The idea here is for > those of us who are trying to build really reliable > blacklists to reference these lists as hosts to never blacklist. > > Any thoughts on this?

# ping aol.com.white.uribl.com
PING aol.com.white.uribl.com (127.0.0.2) 56(84) bytes of data.
64 bytes from localhost (127.0.0.2): icmp_seq=1 ttl=64 time=0.095 ms

# ping otherdomain.com.white.uribl.com
ping: unknown host otherdomain.com.white.uribl.com

white.uribl.com will probably do exactly what you want here... but just realize spammers can include these domains in their spam also. You could always do something like...

urirhssub URIBL_BLACK multi.uribl.com. A 2
body      URIBL_BLACK eval:check_uridnsbl('URIBL_BLACK')
describe  URIBL_BLACK Contains an URL listed in the URIBL blacklist
tflags    URIBL_BLACK net
score     URIBL_BLACK 3

urirhssub URIBL_WHITE white.uribl.com. A 2
body      URIBL_WHITE eval:check_uridnsbl('URIBL_WHITE')
describe  URIBL_WHITE Contains an URL listed in the URIBL whitelist
tflags    URIBL_WHITE net
score     URIBL_WHITE -2

meta      URIBL_COMPENSATE (URIBL_BLACK && URIBL_WHITE)
describe  URIBL_COMPENSATE Contains an URL listed on both URIBL black and white
score     URIBL_COMPENSATE 1

dallase http://uribl.com
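The ping examples above show the mechanics of any DNS-based list: the client prepends the domain to the zone, resolves the result, and treats a 127.0.0.x answer as "listed" and NXDOMAIN as "not listed". A small sketch with the DNS resolution itself stubbed out, so it needs no network (function names are mine, not SpamAssassin's):

```python
from typing import Optional

def query_name(domain: str, zone: str) -> str:
    """Build the DNSBL-style query name, e.g. aol.com + white.uribl.com."""
    return f"{domain.rstrip('.')}.{zone}"

def is_listed(answer: Optional[str]) -> bool:
    """NXDOMAIN (modeled as None) means unlisted; a 127.0.0.x A record
    means the domain is on the list."""
    return answer is not None and answer.startswith("127.")

print(query_name("aol.com", "white.uribl.com"))  # -> aol.com.white.uribl.com
print(is_listed("127.0.0.2"))                    # -> True  (whitelisted)
print(is_listed(None))                           # -> False (not listed)
```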
RE: Latest sa-stats from last week
> -Original Message- > From: Matt Kettler [mailto:[EMAIL PROTECTED] > Sent: Monday, May 08, 2006 14:50 > To: [EMAIL PROTECTED] > Cc: users@spamassassin.apache.org > Subject: Re: Latest sa-stats from last week > > Dallas Engelken wrote: > >> -Original Message- > >> From: [mailto:[EMAIL PROTECTED] > >> Sent: Monday, May 08, 2006 14:07 > >> To: users@spamassassin.apache.org > >> Subject: Latest sa-stats from last week > >> > >> Email: 561313 Autolearn: 0 AvgScore: 6.77 AvgScanTime: 2.41 sec > >> Spam: 209359 Autolearn: 0 AvgScore: 16.99 AvgScanTime: 2.30 sec > >> Ham: 351954 Autolearn: 0 AvgScore: 0.70 AvgScanTime: 2.48 sec > >> > >> Time Spent Running SA: 376.39 hours > >> Time Spent Processing Spam: 133.76 hours > >> Time Spent Processing Ham: 242.62 hours > >> > >> TOP SPAM RULES FIRED > >> > >> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM > >> 1 URIBL_BLACK 163397 7.09 29.11 78.05 0.50 > > > > Nice. > > How does that Queen song go?? We... are... ;) > > I would be proud of those numbers Dallas.. However, I'd also > take them as a warning of areas needing improvement. > > URIBL has the highest spam hit rate, but your nonspam hit rate > is more than 5 times that of JP, your closest competitor in > the world of uridnsbl's. > > 1 URIBL_BLACK 163397 7.09 29.11 78.05 0.50 > 5 URIBL_JP_SURBL 118251 5.13 21.07 56.48 0.09 > > Given that your spam hit rate is 1.5 times that of JP, > compared to the 5 times higher nonspam rate, it suggests JP > is doing a whole lot better in the accuracy department. > > (note: I do realize this can be biased by overall FNs in SA. > Some of those 0.50 might be SA FNs. That said, such FNs > would likely also affect other URIBLs.) > > This isn't to say that URIBL_BLACK isn't useful, or that you > guys aren't doing a good job. However, this is good evidence > you guys are doing great, but you do still have some areas > that could use improvement. > Thanks, I think.
;) Our fp ratio for ham has always been hanging at that level. I think that's a good sign. It means the data in our zones that are causing those ham hits have not changed, and no one has notified us that they need removal. Doesn't worry me a bit. We welcome your delist requests if you actually find a FP (that we can agree on) on black.uribl.com. :) d
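The accuracy comparison in the quoted message is straightforward to reproduce from the posted sa-stats percentages:

```python
# %OFSPAM and %OFHAM figures exactly as posted for the two rules.
black_spam, black_ham = 78.05, 0.50   # URIBL_BLACK
jp_spam,    jp_ham    = 56.48, 0.09   # URIBL_JP_SURBL

print(round(black_spam / jp_spam, 2))  # -> 1.38 (roughly "1.5 times" the spam hits)
print(round(black_ham / jp_ham, 2))    # -> 5.56 ("more than 5 times" the ham hits)
```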
RE: Latest sa-stats from last week
> -Original Message- > From: [mailto:[EMAIL PROTECTED] > Sent: Monday, May 08, 2006 14:07 > To: users@spamassassin.apache.org > Subject: Latest sa-stats from last week > > Email: 561313 Autolearn: 0 AvgScore: 6.77 AvgScanTime: 2.41 sec > Spam: 209359 Autolearn: 0 AvgScore: 16.99 AvgScanTime: 2.30 sec > Ham: 351954 Autolearn: 0 AvgScore: 0.70 AvgScanTime: 2.48 sec > > Time Spent Running SA: 376.39 hours > Time Spent Processing Spam: 133.76 hours > Time Spent Processing Ham: 242.62 hours > > TOP SPAM RULES FIRED > > RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM > > 1 URIBL_BLACK 163397 7.09 29.11 78.05 0.50 Nice. How does that Queen song go?? We... are... ;)
RE: URIBL_BLACK + OB_SURBL double-listed nonspam domain
> -Original Message- > From: Matt Kettler [mailto:[EMAIL PROTECTED] > Sent: Sunday, February 19, 2006 06:09 > To: jdow > Cc: users@spamassassin.apache.org > Subject: Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain > > Right now JP+SC scores 8.585, which even BAYES_00 can't > bring back down under the 5.0 line. I trust the URIBLs a lot, > I think they're great. But I don't trust them so much that > two of them should be able to over-ride BAYES_00 without any > other spam rules firing. > So score BAYES_00 at -10.. unless you don't "trust" BAYES_00 either. :)
RE: Over-scoring of SURBL lists...
> -Original Message- > From: Matt Kettler [mailto:[EMAIL PROTECTED] > Sent: Sunday, February 19, 2006 06:27 > To: [EMAIL PROTECTED] > Cc: users@spamassassin.apache.org > Subject: Re: Over-scoring of SURBL lists... > > Dallas Engelken wrote: > > So please... put this f'ing thread to bed and send a delist request. > > Yes, but dallas.. this thread IS NOT about how to keep the > URIBLs cleaner. I really don't care how it got there. I > understand that mistakes happen. No big deal. I'm not trying > to start a witch-hunt demanding greater purity in URIBL > listings. If I wanted to do that, I'd do it on the uribl and > surbl lists. > > I *AM* trying to get people to think about the STRUCTURE OF > THE RULES and how they are scored in SpamAssassin. The > problem is nobody's even willing to discuss that end of > things without miles of proof that a problem exists. > > I've proven a problem exists. Submitting delist requests > will NOT work as a sole fix because it's just going to > happen again. And again, and again. Yes, delists are a good > thing. But we need to realize that human error will continue > to happen, and thus the SpamAssassin rules need to be > structured accordingly. > You've proven a problem of obscure FPs. We (surbl/uribl) both maintain internal whitelists. They are not fully encapsulating of every ham domain out there, but they are pretty god damn big and remove nearly all possibilities of causing what I would call substantial "damage". Your examples (to this point) are of very narrow scope. I have not heard anyone else on SA-users ever complain of rampant URIBL-only FPs... and these people will normally let you know if it exists. > So can we put all the arguments about whose URIBL is bigger > than whose to rest and start looking at the SpamAssassin end > of the problem? I'm not sure where you got this, but I've never said anything of the sort. I've also never heard Jeff say anything of the sort.
We have different list structures, different listing philosophies, and different sources, but we have a similar interest and many times reach the same final result. > Because I really don't give a damn about who made what > mistakes and who makes more mistakes. > > Simple fact. Mistakes get made. Sometimes multiple mistakes > coincide with each other. For some reason, many people on > this list seem to refuse to accept that can happen. So I've > had to make a lot of proof it can happen. Some folks have > taken that as criticism of the URIBLs affected. It's not, > it's just facts to support the obvious. > IMHO, your "proof" has been small and insignificant to this point. > I *like* both surbl.org and uribl.com. I think they're great. > So will you guys quit painting me as attacking the URIBLs > because I point out some problems with how SA implements > checking them? > > Can we address the real question here: > > How can we keep the spam tagged, and try to mitigate the FPs > by keeping additive scores for multiple URIBLs more moderate? > +20 worth of URIBL hits is fine on spam, but astronomically > high scores don't really help SA when the tagging threshold > is +5. However, they do hurt SA when overlapping mistakes happen. > If this is the issue that you are really trying to address, it would be better done on the dev list... because I think the users list (in general) is happy with the current implementation. If they are not, I guess now is the time to speak up. I am going to bow out of this thread now as I have spent far more time on it than it warrants. I appreciate your feedback to uribl, and welcome your delist requests for any FPs you come across. In the end, we are all working towards a common goal. Thanks, Dallas
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED]
> Sent: Sunday, February 19, 2006 02:07
> To: jdow
> Cc: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> jdow wrote:
> > > rbl/uribl overlap.
> >
> > Matt, I think your worry about overlap is faulty. If the lists all fed
> > off one common database it would be a worry. Then the correlation
> > would be a symptom of the system not working. If they all work off
> > more or less individual captures and submissions, their raw databases
> > have low correlation. If their results correlate well, as in "overlap"
> > as you are using it, that is an indication of their goodness.
>
> Yes, but the frequency of overlap in nonspam that I'm seeing
> at my site is disturbing.
> I've posted examples of this, and they keep getting ignored.
>
> This IS a real problem. I am not speculating. I've posted two
> real domains on this list that have had the problem for me in
> the past 7 days.
>   ultraedit-updates.com: OB + uribl black (delisted from both at my
>   request)
>   winterizewithscotts.com: OB + uribl black (I have
>   intentionally NOT submitted a delist request for this domain)

"honey, our grass is less green this year because URIBL blocked my
winterizer reminder." :)

winterizewithscotts.com was manually added on Oct 14; no delist requests
in over 4 months. It was not via web submission. It was not an automated
add. Rather, it was a direct add by someone who has added over 8k entries
to uribl black in the last week. Now I'm not saying it's wrong or right,
I'm just saying it was a judgement call based on human review.

So please... put this f'ing thread to bed and send a delist request.

D
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED]
> Sent: Saturday, February 18, 2006 00:05
> To: Raymond Dijkxhoorn
> Cc: jdow; users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> Raymond Dijkxhoorn wrote:
> > Hi!
> >
> >>> > I consider that "highly similar" for JP, SC, AB, OB and WS.
> >>>
> >>> As similar as 30 and 40, and 0, .3 and 7 are, I suppose.
> >
> >> On another paw how "independent" are these lists? Do any inherit from
> >> other lists or are they all separately maintained?
> >
> > They use different datasources and no cross links between them. If
> > there is a real nasty one we could/would talk about it on the private
> > list but that's really sporadic.
>
> Untrue. AB and SC use a common data source, spamcop reports.
> However, each has its own processing/listing criteria and
> each is separately maintained.
>
> And, realistically, since WS and uribl accept direct reports
> from more-or-less anyone, their data sources could be
> redundant with any other URIBLs depending on what the
>
> It's really straightforward for an end-user to report the
> email to spamcop, then report the spamvertised URI to WS and
> URIBL_BLACK via web forms.
>
> Pickup on surbl's SC list appears to involve multiple reports
> to spamcop, but there's still potential for common inputs.
>
> Let's see a show of hands.. How many people here have ever
> filed a spam report with multiple lists, including doing
> spamcop + either WS or URIBL.
>
> (raises own hand)

FWIW, web submissions account for less than 1% (119 of 12652 listings) of
URIBL data for the last 7 days. All submissions are reviewed, so I find
it hard to believe that the FPs are coming in via this mechanism, seeing
that a human reports it (I hope) and a human reviews it. From what I see,
FPs normally come from automation and overzealous mass adds.

D
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 17, 2006 21:34
> To: Dallas L. Engelken
> Cc: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> Dallas L. Engelken wrote:
> > The result will be no URIBL-only FPs. OTOH, you may end up with a
> > shit-ton of people bitching about spam accuracy dropping in stock 3.2
> > installs if you make these changes.
>
> I'm not sure it'd be *that* bad.
>
> A grep of my logs from this week shows that 1.1% of my spam
> scores under a score of 8 and only 13% of those spams hit
> *any* URIBLs.
>
> So yeah, there'd be more FNs, but I'm not sure that it'd be a
> shit-ton of them.

All I know is I've had a few systems bitten by
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4767

And when this happens, I hear about it, because people are complaining
about getting a "shit-ton" of spam. And they all just run
SBL/URIBL/SURBL, no other RBLs. Now I understand losing URIBL tests
completely versus scoring them between, say, 2 and 5.5 are completely
different things... but I still believe a considerable increase will be
seen in FNs.

Which phone call would you rather have?

Q) My client is trying to send me email and it's being rejected because
   they are listed on a URIBL. What should I do?
A) Whitelist the sender or request delisting from the URIBL.

Q) How do I block all this stock, pill, porn, etc. spam that started
   coming in since we "upgraded"?
A) Well, you can re-adjust your heuristic scoring for your URIBL tests
   back to their previous values; let me walk you through that..

The first question can be answered in 1 minute. The second question OTOH
could take you a considerable amount of time. Especially if you have to
do it for them. Oh, and you have to wait for them to reconfigure their
PIX (that they don't know how to administer) before you can get in, and
they want you to wait on the line until they figure it out. Meanwhile
clients X, Y, and Z are waiting for you to get off the phone so you can
do the same thing for them ;)

All I'm saying here is, I'll take the easy route :)

Dallas
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Matt Kettler [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 17, 2006 18:47
> To: Matt Kettler
> Cc: Jeff Chan; users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> Matt Kettler wrote:
>
> I'll even re-quote myself:
> >> I personally would like to see some statistics, but at this point,
> >> we don't have any test data on this so we're arguing your theory vs
> >> mine.
>
> And your quote that I was counter-pointing:
> >> As you can see the performance of the lists are different,
> >> and the way they're created is different too.
> >
> > I don't see enough of a difference to clearly rule out
> > significant overlap.
> >
> > I'll define my test of "significant overlap" as:
> >> 10% of total hits redundant across 3 or more lists and >1% nonspam
> >> hits redundant across 2 or more lists.
>
> Messages received today that are double-listed in two or more
> of SC, JP, AB, OB and WS:
>
>   grep "SURBL_MULTI2" /var/log/maillog | grep "Feb 17" | wc -l
>   292
>
> All surbl.org hits in the same timeframe (includes PH, but no matter):
>
>   grep "_SURBL" /var/log/maillog | grep "Feb 17" | wc -l
>   583
>
> So we at least have a 50% double-listing rate. That
> in and of itself isn't much of a problem, but it also doesn't
> rule out overlap. It's still a whole lot higher than my first
> criterion of 10% overlap.
>
> However, right now I don't have more than 100 FPs so I can't
> really comment on the nonspam hit rate of SURBL_MULTI2.
> That's the important one.
>
> I also added multi3, multi4 and another rule to detect
> overlap between uribl.com's black and surbl.org:
>
>   meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL ||
>       URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL ||
>       URIBL_SC_SURBL))
>   score URIBL_BLACK_OVERLAP -1.0

If anyone is interested, here is an alternative scoring method for
25_uribl.cf -> http://www.uribl.com/tools/25_uribl.cf (make sure you wipe
out the scores for uribl tests in 50_scores.cf if you replace this file).
This should make SBL/URIBL/SURBL hits range in score from 2.0 to 5.5...

- 2.0 (SBL only)
- 2.5 (URIBL only)
- 2.5 (SURBL only)
- 3.0 (SBL + URIBL)
- 3.0 (SBL + SURBL)
- 3.0 (SURBL only x2)
- 4.0 (URIBL + SURBL)
- 5.0 (SBL + URIBL + SURBL)
- 5.5 (SBL + URIBL + SURBL x2)

If you want to reduce the possibility of URIBL-only FPs, this is the way
to go.

D
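The tiered combination scheme above can be sketched as a lookup table: instead of summing full per-list scores, the total grows sub-linearly as more independent sources agree. A minimal illustration only — the `combined_score` helper and its argument names are hypothetical, not part of the actual 25_uribl.cf:

```python
# Sketch of the tiered scoring idea: the score depends on WHICH
# independent list families agree, not on a straight sum of per-list
# scores. Helper and argument names are illustrative, not real SA code.

def combined_score(sbl: bool, uribl: bool, surbl_hits: int) -> float:
    """Return a combined score for a mix of list hits, following the
    2.0-5.5 tiers quoted in the post above."""
    table = {
        (True,  False, 0): 2.0,  # SBL only
        (False, True,  0): 2.5,  # URIBL only
        (False, False, 1): 2.5,  # one SURBL list
        (True,  True,  0): 3.0,  # SBL + URIBL
        (True,  False, 1): 3.0,  # SBL + SURBL
        (False, False, 2): 3.0,  # two SURBL lists
        (False, True,  1): 4.0,  # URIBL + SURBL
        (True,  True,  1): 5.0,  # SBL + URIBL + SURBL
        (True,  True,  2): 5.5,  # SBL + URIBL + two SURBLs
    }
    # Cap SURBL hits at 2, matching the "x2" top tier.
    return table.get((sbl, uribl, min(surbl_hits, 2)), 0.0)

print(combined_score(False, True, 0))  # URIBL-only FP -> 2.5
print(combined_score(True, True, 2))   # worst-case spam -> 5.5
```

The point of the design is visible in the two prints: a message listed everywhere still lands just above the default 5-point tag threshold, while a single accidental URIBL listing contributes only 2.5 points on its own.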
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 17, 2006 01:09
> To: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> On Thu, Feb 16, 2006 at 10:42:19PM -, Dallas Engelken wrote:
> > So.. I have moved partypoker.com to grey for now. I'll let you and
> > Theo thumb wrestle over it :)
>
> Warning: I have big hands. ;)

Yea, "that's what she said" ;)

> I'm happy to show samples of mails to certain folks, btw.
> There are several personal and spamtrap entries in my which
> refer to them:

Oh, I'm not saying anyone's wrong.. I'm just tired of hearing people say
we are wrong. We actually have 2 black submissions for partypoker.com.
One actually had a sample attached.

> BTW, I'm wondering when URIBL will be able to keep up with my
> submissions so I can start them up again. ;P

Once we get our summer interns ;)

> BTW2: if anyone's curious, here's the URIBL stats for last
> week's SA net checks:
>
>   MSECS   SPAM%    HAM%    S/O   RANK  SCORE  NAME
>       0  220791   50435  0.814   0.00   0.00  (all messages)
>     0.0 81.4048 18.5952  0.814   0.00   0.00  (all messages as %)
>  34.512 42.3169  0.3430  0.992   0.77   0.00  URIBL_BLACK
>   0.701  0.7555  0.4640  0.620   0.49   0.00  URIBL_GREY
>   0.000  0.0000  0.0000  0.500   0.45   0.00  URIBL_RED
>
> and SURBL's for comparison:
>
>  25.585 31.4293  0.0000  1.000   1.00   0.00  URIBL_SC_SURBL
>  33.248 40.8409  0.0099  1.000   1.00   0.00  URIBL_JP_SURBL
>  36.254 44.5226  0.0555  0.999   0.94   0.00  URIBL_OB_SURBL
>   4.291  5.2710  0.0000  1.000   0.91   0.01  T_URIBL_XS_SURBL
>   3.907  4.7996  0.0020  1.000   0.90   0.00  URIBL_AB_SURBL
>  39.914 48.6415  1.7071  0.966   0.65   0.00  URIBL_WS_SURBL
>   0.195  0.2391  0.0000  1.000   0.63   0.00  URIBL_PH_SURBL

SPAM% is crap when it comes to ruleqa on uribls. Spammers rotate domains
daily. We expire dead domains daily. I guess we could keep all the bloat
around to pump our numbers ;) If you had a daily rotated corpus, we'd own
it in SPAM%... Today's stats:

--
RANK  RULE NAME        COUNT  %OFMAIL  %OFSPAM  %OFHAM
--
   4  URIBL_BLACK       4714    31.44    76.16    1.21
   8  URIBL_JP_SURBL    1562    18.60    52.33    0.06
   9  URIBL_OB_SURBL    1351    16.28    45.26    0.35
  10  URIBL_WS_SURBL    1186    14.39    39.73    0.46
  12  URIBL_SC_SURBL     959    11.40    32.13    0.00
--

Today's stats from a bigger install:

--
   5  URIBL_BLACK     134850    46.70    71.79    0.68
   9  URIBL_JP_SURBL   70659    24.35    37.62    0.01
  10  URIBL_OB_SURBL   69151    23.84    36.82    0.03
  12  URIBL_WS_SURBL   57786    19.95    30.77    0.13
  19  URIBL_SC_SURBL   28533     9.83    15.19    0.00
--

At the end of the day, if you run a GA, all uribls may look similar, but
real-time stats show a much different picture. I don't think our
detection speed is any faster than JP's, because I've seen some of the
timestamps on new additions, but maybe it's our rebuild and distribution
time to our mirrors. I don't know, but SA user numbers tend to agree ->
http://www.gossamer-threads.com/lists/spamassassin/users/67936

D
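The S/O column in freqs output like the stats quoted above is simply the spam-hit fraction: spam hits divided by total (spam + ham) hits for that rule. A quick sanity check of the "(all messages)" line, using a hypothetical helper name:

```python
# Verify the S/O ratio and the "(all messages as %)" row from the
# freqs output quoted above. The helper name is illustrative only.

def s_over_o(spam_hits: int, ham_hits: int) -> float:
    """Spam/overall ratio: fraction of a rule's hits that were spam."""
    return spam_hits / (spam_hits + ham_hits)

# The "(all messages)" line: 220791 spam, 50435 ham.
print(round(s_over_o(220791, 50435), 3))           # 0.814

# The percentage row follows directly from the same counts:
print(round(100 * 220791 / (220791 + 50435), 4))   # 81.4048
print(round(100 * 50435 / (220791 + 50435), 4))    # 18.5952
```

An S/O near 1.000 (as for the SURBL lists in the table) means essentially every hit was on spam, which is why S/O rather than raw SPAM% is the safer column to compare lists on.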
RE: Over-scoring of SURBL lists...
> -Original Message-
> From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 16, 2006 21:51
> To: users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> Matt Kettler wrote:
> > List Mail User wrote:
> > > My FPs fall into two categories:
>
> Like Matt, I've had similar electronics newsletters trigger
> on apparently non-spammed domains.
>
> I've also had a number of users complain about FPs on emails
> from a number of online poker sites.
>
> > And yet it's in URIBL's blacklist. (I've already requested a delist)
>
> Do they actually delist domains by request? I've long ago given up
> trying after having all of my requests rejected.

If "all of your requests" refers to URIBL.COM, I think you are
exaggerating. You have submitted 1 time to uribl, and that was a delist
request for partypoker.com, which was requested to be blacklisted by
Theo. You'd think 2 people working on an anti-spam project together could
agree on something?

One man's spam is another man's ham (or addiction?), and it's up to URIBL
to make that classification. It's not always going to be right for
everyone. If you don't like our classification, submit a delist request.
If we reject it, submit another. We take notice when multiple requests
come in for the same domain, especially from unique uids. For a delist
request, give us a reason in the "Your Message Regarding this submission
(optional)" section. That goes a long way.

So.. I have moved partypoker.com to grey for now. I'll let you and Theo
thumb wrestle over it :)

Dallas
Re: Post your top 10 from sa-stats
On Tue, 2006-01-31 at 07:37 -0600, DAve wrote:
> And mine, note that these are *post* MailScanner and RBLs, which are
> running on my mail gateways. By the time SA gets the mail I've pruned
> anywhere from 45% to 75% of the messages, depending on the day.
>
> TOP SPAM RULES FIRED
> RANK  RULE NAME    COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>    1  URIBL_BLACK  16236     08.88    55.25    88.86    2.10

Is that 2% ham hits really missed spam, or are you having false positives
due to URIBL_BLACK??

Thanks,
--
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com
Re: Post your top 10 from sa-stats
On Mon, 2006-01-30 at 16:45 -0600, wrote:
> Here is mine:
>
> TOP SPAM RULES FIRED
>
> RANK  RULE NAME    COUNT   %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>    1  URIBL_BLACK  257778      7.36    44.54    77.31

amen to that!
--
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com
RE: Post your top 10 from sa-stats
On Tue, 2006-01-31 at 11:20 -0600, Kristopher Austin wrote:
> Hmm, I guess that's a question for Dallas. This is the version I'm
> using:
> # file: sa-stats.pl
> # date: 2005-08-03
> # version: 1.0
> # author: Dallas Engelken <[EMAIL PROTECTED]>
> # desc: SA 3.1.x log parser
>
> I don't seem to be the only one showing that strange math. Dave had the
> same sort of entry in his:
>
> TOP HAM RULES FIRED
> RANK  RULE NAME     COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>    1  HTML_MESSAGE  63067     21.17    21.46    63.61   56.74
>
> Dallas, is there a bug or are we interpreting these numbers incorrectly?

OK, let's take the following sample data:

Email: 2766
Spam: 975
Ham: 1791

TOP SPAM RULES FIRED
--
RANK  RULE NAME     COUNT  %OFMAIL  %OFSPAM  %OFHAM
--
   7  HTML_MESSAGE    629    22.74    64.51   34.51
--

TOP HAM RULES FIRED
--
RANK  RULE NAME     COUNT  %OFMAIL  %OFSPAM  %OFHAM
--
   6  HTML_MESSAGE    618    22.34    64.51   34.51
--

We had 2766 total emails. For %OFMAIL: 629 spam messages hit
HTML_MESSAGE, which is 629/2766 = 22.74%; 618 ham messages hit
HTML_MESSAGE, which is 618/2766 = 22.34%.

For %OFSPAM and %OFHAM: 629 spam messages hit HTML_MESSAGE, which is
629/975 = 64.51%; 618 ham messages hit HTML_MESSAGE, which is 618/1791 =
34.51%.

If you want to know what percentage of all email the rule HTML_MESSAGE
triggered on, you'd need (SPAM + HAM) / TOTAL: (618+629)/2766 = 45.08%.

The %OFMAIL category is misleading because it compares the hit count (on
that line) against the total email. I've gone ahead and changed that in
v1.02 and v0.92 respectively. If you like the old way it works, don't get
the new version :)

SA 3.0.x - http://www.rulesemporium.com/programs/sa-stats.txt
SA 3.1.x - http://www.rulesemporium.com/programs/sa-stats-1.0.txt

Hope this clarifies!

Thanks,
--
Dallas Engelken <[EMAIL PROTECTED]>
http://uribl.com
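The percentage math Dallas walks through is easy to sanity-check. A minimal sketch of the same arithmetic — variable names here are mine, not sa-stats.pl internals:

```python
# Reproduce the sa-stats percentage math from the sample data in the
# post above. Names are illustrative only, not the actual sa-stats.pl
# code.

total_mail = 2766
total_spam = 975
total_ham = 1791

spam_hits = 629   # spam messages that hit HTML_MESSAGE
ham_hits = 618    # ham messages that hit HTML_MESSAGE

def pct(part: int, whole: int) -> float:
    """Percentage rounded to 2 places, as shown in the report."""
    return round(100.0 * part / whole, 2)

print(pct(spam_hits, total_mail))             # %OFMAIL, spam line: 22.74
print(pct(ham_hits, total_mail))              # %OFMAIL, ham line:  22.34
print(pct(spam_hits, total_spam))             # %OFSPAM: 64.51
print(pct(ham_hits, total_ham))               # %OFHAM:  34.51
print(pct(spam_hits + ham_hits, total_mail))  # rule vs all mail: 45.08
```

This makes the complaint concrete: the old %OFMAIL column divides a per-class hit count by *total* mail, so neither the spam line (22.74) nor the ham line (22.34) alone tells you the rule's overall 45.08% hit rate.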
Re: SpamAssassin 3.1.0-pre2 PRERELEASE available!
On Thu, 2005-06-30 at 06:39 -0500, Michael Parker wrote:
> Kai Schaetzl wrote:
> >
> >> SQL storage is now recommended for Bayes
> >
> > Hm, time to check the documents how to set this up ...
> > BTW: is my impression correct that Bayes on SQL won't do any
> > auto-expire, you have to do it yourself with some SQL code?
>
> No, it does auto expire just fine. Not sure what gave you that
> impression.

Maybe confused with an SQL auto-whitelist?

d