Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Kris Deugau
Kris Deugau wrote:
 From the problems I'm having with supposedly malformed signatures, it
 looks like there's an effective complexity limit;  from the problems in
 *matching* a signature that's finally been found to be acceptable, it
 looks like there's a (lower) limit on what Clam can actually use in
 matching.
 
 Any suggestions on what I might be doing wrong?

Just to try to bring this interesting discussion back to the problem I'm
having... <g>

- Image-based spam is slipping past the existing spam detection tool.
Upgrading said tool is Not Possible due to system load, and the fact that
this system was due to be retired about eight months ago.

- I already do virus scans with a fairly stock ClamAV install...   the
meat of the spams that are getting through is embedded in an image
file...  So I'll create signatures for these files.

- Due to the variety of hiding techniques used, it's rare to find two
identical image files, therefore MD5 sums are mostly useless.  (On a
*very* large scale, there might be enough duplication for effective use
of MD5 sigs.)

- Hex dumps of a collection of these image files show *some*
similarity that could be used with the extended signature format.

- Scripts have been created to munge this data into what are supposedly
valid signatures.

- These supposedly-valid signatures are either:
  a) Rejected outright by Clam as malformed
  b) Accepted, but don't actually match on any of the files that were
used to create them.

As I said originally, it looks like there is a limit somewhere on how
complex a signature Clam can accept, and a lower limit on what it can
use effectively.

Am I just seeing things, or am I triggering an odd corner-case bug in
Clam's signature handling?  (Or just tripping over a designed limit?)

I would guess that it's rare for viruses to be quite as mutable as these
image spams, so where a pair of 30-character hex strings separated by
30-50 unknown characters may easily identify a virus, along with 3 or 4
variants (and continues to do so for the in-the-wild life of the virus),
that wouldn't identify very many imagespam images for very long.

-kgd
___
http://lurker.clamav.net/list/clamav-users.html


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Steve Holdoway
On Mon, 30 Oct 2006 19:35:13 +0100
aCaB [EMAIL PROTECTED] wrote:

 So, this:
 474946383761??(01|00)??0044
 Should really read:
 47494638376144

Or even 

  474946383761??0(0|1)??0044


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Steve Holdoway
On Tue, 31 Oct 2006 07:48:46 +1300
Steve Holdoway [EMAIL PROTECTED] wrote:

 On Mon, 30 Oct 2006 19:35:13 +0100
 aCaB [EMAIL PROTECTED] wrote:
 
  So, this:
  474946383761??(01|00)??0044
  Should really read:
  47494638376144
 
 Or even 
 
   474946383761??0(0|1)??0044

Sorry, scrap that. No coffee yet this morning (:


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Tomasz Kojm
On Mon, 30 Oct 2006 19:35:13 +0100
aCaB [EMAIL PROTECTED] wrote:

 Kris Deugau wrote:
  ImgSpam.Misc.5:0:0:474946383761??(01|00)??00442c??(01|00)??0084(00|48|53)(00|15)(00|30|1c)f0f0f0(f0|e0|c0)f0(e0|b0|f0|d0|c0)f0(00|f0|40)(00|d0|e0|60|70)(f0|90|00|c0)(e0|90|00|b0|70)f0??(00|90|40|7d|10)(f0|ea)??(f0|00|e0|d0|46)
 
 Hi Kris,
 There are a number of problems with your sample sig.
 The most important rules you should obey are:

A few corrections :-)

 1) you always need at least 2 static bytes before and after a wildcard
 (though a series of ?? is fine)

with 0.9x it's enough to have a block of 2 static bytes somewhere in a part
of signature (by 'part' I mean a sequence delimited by range wildcards (*,
{})).

 2) a static block must not start with 00

it can start with 00 :-)

 3) The alt syntax is (aa|bb), not (aa|bb|cc..)

(aa|bb|cc..) is just fine

 So, just looking at the beginning of your sig:

The above sig looks OK to me.
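
For readers who want to try the rule as it is worded above (every part delimited by `*` or `{n}` needs a block of at least 2 static bytes somewhere in it), here is a minimal Python sketch. It is not ClamAV's parser: the function name and the regular expressions are assumptions made for illustration, and it checks only this one rule as stated in this message.

```python
import re

# Range wildcards that split a signature body into "parts": * or {n}.
RANGE_WILDCARD = re.compile(r"\*|\{[0-9-]*\}")
# Alternations such as (00|01|02); blanked out before looking for static bytes.
ALTERNATION = re.compile(r"\([^)]*\)")
# Two consecutive literal bytes, i.e. four hex digits in a row.
TWO_STATIC_BYTES = re.compile(r"[0-9a-fA-F]{4}")


def parts_missing_static_block(body):
    """Return the parts of `body` that lack a block of 2 static bytes."""
    bad = []
    for part in RANGE_WILDCARD.split(body):
        # Replace alternations and ?? with a space so that static bytes on
        # either side of a wildcard are not glued together.
        literal = ALTERNATION.sub(" ", part).replace("??", " ")
        if not TWO_STATIC_BYTES.search(literal):
            bad.append(part)
    return bad


if __name__ == "__main__":
    print(parts_missing_static_block("0aefbf{12}ff"))
```

An empty list means every part has its 2-byte static block. For a body such as `0aefbf{12}ff` the function returns `['ff']`, because the part after `{12}` holds a single static byte. Whether a given ClamAV version actually rejects such a signature is a separate question that this sketch cannot answer.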

-- 
   oo. Tomasz Kojm [EMAIL PROTECTED]
  (\/)\. http://www.ClamAV.net/gpg/tkojm.gpg
 \..._ 0DCA5A08407D5288279DB43454822DC8985A444B
   //\   /\  Mon Oct 30 19:53:57 CET 2006


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread aCaB
Steve Holdoway wrote:
 Or even 
 
   474946383761??0(0|1)??0044

Nope! Bytes only, no nibbles.
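
The "bytes only" point can be illustrated with a small tokenizer. This is a sketch in Python, not ClamAV code: the `tokenize` name and the token grammar are assumptions that cover only the constructs used in this thread (static bytes, `??`, whole-byte alternations, `{n}` and `*`); other range forms are left out.

```python
import re

# One token of a signature body, as the syntax is used in this thread.
TOKEN = re.compile(
    r"[0-9a-fA-F]{2}"                              # static byte
    r"|\?\?"                                       # any single byte
    r"|\([0-9a-fA-F]{2}(?:\|[0-9a-fA-F]{2})*\)"    # (aa|bb|cc...), whole bytes only
    r"|\{\d+\}"                                    # {n}: skip exactly n bytes
    r"|\*"                                         # any number of bytes
)


def tokenize(body):
    """Split a signature body into tokens; raise ValueError on anything else."""
    tokens = []
    pos = 0
    while pos < len(body):
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(
                f"unexpected text at offset {pos}: {body[pos:pos + 8]!r}"
            )
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("474946383761??(01|00)??0044"))
    try:
        tokenize("474946383761??0(0|1)??0044")
    except ValueError as err:
        print("rejected:", err)
```

The first body splits cleanly into bytes, wildcards and one alternation. The second stops at `0(0|1)`, since a lone hex digit followed by a nibble alternation is not a whole byte.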



Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread aCaB
Tomasz Kojm wrote:
 with 0.9x

Indeed! :)


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Kris Deugau
aCaB wrote:
 Kris Deugau wrote:
 ImgSpam.Misc.5:0:0:474946383761??(01|00)??00442c??(01|00)??0084(00|48|53)(00|15)(00|30|1c)f0f0f0(f0|e0|c0)f0(e0|b0|f0|d0|c0)f0(00|f0|40)(00|d0|e0|60|70)(f0|90|00|c0)(e0|90|00|b0|70)f0??(00|90|40|7d|10)(f0|ea)??(f0|00|e0|d0|46)
 
 Hi Kris,
 There are a number of problems with your sample sig.

Advice appreciated... but I'm not sure you're at all correct.  As a
matter of fact, that's about a *third* of an active, live sig that Clam
seems to be using quite happily right now.  ;)

 The most important rules you should obey are:
 1) you always need at least 2 static bytes before and after a wildcard
 (though a series of ?? is fine)

Ick.  Clam doesn't seem to complain consistently about this, though.

Just for clarification, you mean:
Valid:   ...0ef2bc34...
Invalid: ...(00|01)f3(89|bc)... or
 ...??f4(00|01)b4??...

?

 2) a static block must not start with 00

Ick again.  Clam doesn't consistently complain about this, either.

Again, for clarity:
Valid: ...??0101ab...  or
   ...(a3|45)ff003400...
Invalid:   ...??0001ab...  or
   ...(a3|45)3400...

 3) The alt syntax is (aa|bb), not (aa|bb|cc..)

Ick cubed.  <g>

Clam happily *accepts* (aa|bb|cc..), and in fact I think it's working
just fine.  Except when it doesn't.  :(

In short:  If it's invalid, why doesn't Clam complain?
and:  If it's valid, why doesn't it work?

<G>

Let's take a new example.  This is one I've just pushed out to
production now:
(This is also one of the simpler ones I've generated!)

test.test:0:*:2c??01??010003ff48badcfe30ca49abbd38ebcdbbff60288e64699e68aaae6cebbe702ccf746ddf78aeef7cefffc0a070482c1a8fc8a472c96c3a9fd0a8744aad5aaf(58|15|ed|e8)(70|2c|0a|7a)(b7|ba|16|05)(de|dc|8b)(6f|2f|2e|af)(97|d8|38|37)(20|4b|fc|2c)(1e|18|25|06)(9f|93|90|13)(cc|4f|cb|ca)(5a|e7|27|e6)(99|ad|34|53){180}ff{37}

Clam complains about this...  but once I trim the trailing {180}ff{37}
(something like that usually comes up), Clam is happy (and this time,
not only accepts the sig as valid, but tags the files I used to generate
it - which, for this class of imagespam, are almost disturbingly *regular*).

However, quite often I have to keep trimming bits off the end, with NO
pattern I can see (your rules don't seem to apply, if memory serves from
past attempts).  Eventually I reach a point where a) the sig is accepted
as valid by Clam, and b) Clam tags the source files using it.  b) is
quite often a much shorter sig than a).
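
The trial-and-error trimming described above could be automated along these lines. This is only a sketch: `trim_until_accepted` and the `accepts` callback are hypothetical names, and the callback stands in for whatever test is used to decide that the scanner loads the signature and still flags the sample files. Unrecognised characters in the body are silently skipped by the simple token pattern.

```python
import re

# Tokens as they appear in the signatures in this thread.
TOKEN = re.compile(r"\([^)]*\)|\{\d+\}|\?\?|\*|[0-9a-fA-F]{2}")


def trim_until_accepted(body, accepts):
    """Drop trailing tokens until `accepts(candidate)` is true.

    `accepts` is a caller-supplied function (hypothetical here) that says
    whether the candidate signature body is usable.  Returns the longest
    accepted prefix, or None if nothing is accepted.
    """
    tokens = TOKEN.findall(body)
    while tokens:
        # Strip wildcards exposed at the end (??, {n}, *) before testing,
        # since they pin nothing down.
        while tokens and tokens[-1][0] in "?{*":
            tokens.pop()
        if not tokens:
            break
        candidate = "".join(tokens)
        if accepts(candidate):
            return candidate
        tokens.pop()
    return None


if __name__ == "__main__":
    # Stand-in acceptance test: pretend any body containing {n} is refused.
    print(trim_until_accepted("aabb(01|02){180}ff{37}", lambda s: "{" not in s))
```

With the stand-in test shown, the tail `{180}ff{37}` is removed and the remaining prefix is returned. A real `accepts` would have to run the scanner against the sample files, which is outside the scope of this sketch.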

Just to thoroughly confuse things, here's a sig that Clam doesn't
complain about, which *still* violates all three of your rules above (I
think...):

testsig:0:0:0aefbf??(00|01){12}(00|01|02|03)??ff

I don't know if it would actually match on any files though.  (Just
working on a quick hack to test this now.)

 So, just looking at the beginning of your sig:
 474946383761 = all fine, static
 ?? = wildcard following a static block, fine
 (01|00) = wildcard not following a static block, bad
 ?? = wildcard not following a static block, bad
 0044 = static block starting with 00, bad
 
 So, this:
 474946383761??(01|00)??0044
 Should really read:
 47494638376144

The problem with doing that is that I end up with something like:

474946383761??01ae{185}(ae|01){200}(0e|f0)

Or worse:

474946383761{400}

(I've come pretty close to that - *big*, *long* strings of anything
with maybe one or two solitary static bytes.)

If more than two possibilities for any given byte (and that's pretty much
normal for these images) have to be turned into ??, I generally end up
with a VERY long string of ??, which compresses down to {nn}.  Which
doesn't make a very useful signature.  :/

-kgd


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-30 Thread Kris Deugau
Tomasz Kojm wrote:
 A few corrections :-)

Ah!  The Voice of Authority!  <g>

 aCaB [EMAIL PROTECTED] wrote:
 1) you always need at least 2 static bytes before and after a wildcard
 (though a series of ?? is fine)
 
 with 0.9x it's enough to have a block of 2 static bytes somewhere in a part
 of signature (by 'part' I mean a sequence delimited by range wildcards (*,
 {})).

FWIW, the sig above is working fine, with 0.88.2 (haven't been inspired
enough to get 0.88.5 backported to Debian woody).

 The above sig looks OK to me.

And aside from the fact that it's not hitting new traffic any more, Clam
has been happy with it too.

I'm still curious about what seem to be inconsistencies in what's valid
and what's not, though.

-kgd


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-29 Thread Henrik Krohns
On Sat, Oct 28, 2006 at 04:28:47PM -0700, Dennis Peterson wrote:
 
 I don't get it.. unless you have some big honeypot, maybe 5% of traffic
 contain small images to be OCRd. If your server can't handle that, I guess
 it's running out of juice anyway. :)
 
 You can even easily create separate scanning queue for OCR, so it doesn't
 interfere with normal traffic.
 
 You may have missed that I'm in the image industry - a great deal of 
 what we do is imagery including imagery with text in it, and as we have 
 to scan all images over a particular size, it would require more cpu 
 than is worth it.

Ok that's fair. But you probably meant: scan everything _under_ SpamAssassin
scan size. That's only whole messages less than ~256kB to be scanned by
default in most software. I guess if you get images from all over, you can't
whitelist etc then.

Cheers,
Henrik


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-29 Thread Dennis Peterson

Henrik Krohns wrote:

On Sat, Oct 28, 2006 at 04:28:47PM -0700, Dennis Peterson wrote:

I don't get it.. unless you have some big honeypot, maybe 5% of traffic
contain small images to be OCRd. If your server can't handle that, I guess
it's running out of juice anyway. :)

You can even easily create separate scanning queue for OCR, so it doesn't
interfere with normal traffic.
You may have missed that I'm in the image industry - a great deal of 
what we do is imagery including imagery with text in it, and as we have 
to scan all images over a particular size, it would require more cpu 
than is worth it.


Ok that's fair. But you probably meant: scan everything _under_ SpamAssassin
scan size. That's only whole messages less than ~256kB to be scanned by
default in most software. I guess if you get images from all over, you can't
whitelist etc then.


Lemme run it past you one more time - images are money in my world. I 
can't make mistakes. The right image is worth millions of dollars. 
Blocking such an image is something that's going on my resume'. Nobody 
knows where the next big image is coming from, so the rule is caution, 
caution, caution. It does not apply to everyone, certainly. I envy 
others who can bitch slap image spam vendors with little regard. That 
would be cool. I can't do it. I know how but don't dare. It's probably 
why I get pissy :)


dp


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Gerard Seibert
On Friday October 27, 2006 at 08:42:34 (PM) Dennis Peterson wrote:

 Not to change the direction on you, but you might want to take advantage 
 of the work Steve Basford is doing at 
 http://www.sanesecurity.com/clamav/ for phishing problems, and also look 
 at http://www.msrbl.com/site/stats for image and spam solutions. Both 
 sites are providing excellent results on systems I'm running. The 
 patterns are downloadable and very up to date. I've not had a single 
 complaint of false positives, and the number of patterns provided is 
 quite large.
 
 Steve has also written a very useable how-to for creating these patterns.

Steve has done a remarkable job with his 'sig' files. He is constantly
updating them. I know because I use them. They are always catching
'phishing' threats on my PC.

He also has two automated installers for downloading and installing his
signature files. I wrote the 'script' version. There is also a Perl
version available on his site.


-- 
Gerard

 There is nothing wrong with making love with the light on. Just make
 sure the car door is closed.

  George Burns


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Dennis Peterson

Kris Deugau wrote:



The stock and pill spams that I'm trying to tag, however, have images
that have *very small* variations message-to-message, but over a larger
sample there's really very little that can be seen as common across
the whole set - or even a significant part of the set.  Automating the
process of finding all possible values for the byte at this position
is the only way I can usefully get anywhere.


I did a binary diff and md5 checksums on hundreds of the stock and pill 
images and never found any two to be the same. They use a random noise 
generator to sprinkle the images with enough debris to prevent analysis, 
so even splitting the files into 128 and 512 byte slices and checking 
each of the slices was not helpful. Even when you convert the image to 
black and white to remove the color element there's still sufficient 
randomness to prevent go-nogo certainty. I've explored OCR on both color 
and de-colorized images and there have been successes, but not enough to 
warrant turning it on in production. It is very cpu intensive.
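
The slice comparison described above can be sketched as follows. The post does not say which tools were used, so this Python version is only an illustration; the function names are made up here, and the slices are taken at fixed offsets from the start of each file.

```python
import hashlib


def slice_digests(data, size):
    """MD5 digest of each consecutive `size`-byte slice of `data`."""
    return [
        hashlib.md5(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def shared_slices(a, b, size):
    """Count positions where two files have a byte-identical slice."""
    pairs = zip(slice_digests(a, size), slice_digests(b, size))
    return sum(x == y for x, y in pairs)


if __name__ == "__main__":
    original = bytes(range(256)) * 4      # 1024 bytes of sample data
    noisy = bytearray(original)
    noisy[5] ^= 1                         # flip one bit in the first slice
    noisy = bytes(noisy)
    for size in (128, 512):
        print(size, shared_slices(original, noisy, size))
```

A single changed byte spoils only the slice that contains it, so this approach helps when the noise is sparse. If the noise lands in every slice, as reported above for these images, no slice matches and the count drops to zero.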


I attempted to see if there were any digital watermarks in these images 
and found nothing although the math for doing this pushes my limits.


I work in the image industry so have to be more careful than most 
regarding these, so others may have better luck than I which is another 
way of saying acceptable risk is site dependent.


I'd be very interested in any headway you make.

FWIW, I checked my current logs and found the MSRBL sigs blocked over 
6,000 images in a two week period. The Sanesecurity filters stopped an 
additional 4,000. There were a total of 16383 messages blocked using all 
ClamAV filters, and many more thousands found by various milters and 
RBL/SURBL scans. This is on one of the smaller servers I run. The bigger 
mail farms are magnitudes greater for all categories. I mention this 
only because the out of pocket cost for these successes was $0.00 USD 
and very little time invested. Which reminds me, I should send some 
donation money to all the great folks who made these successes possible.


dp


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Henrik Krohns
On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:

 I've explored OCR on both color and de-colorized images and there have
 been successes, but not enough to warrant turning it on in production. It
 is very cpu intensive.

I don't get it.. unless you have some big honeypot, maybe 5% of traffic
contain small images to be OCRd. If your server can't handle that, I guess
it's running out of juice anyway. :)

You can even easily create separate scanning queue for OCR, so it doesn't
interfere with normal traffic.

Cheers,
Henrik


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Kris Deugau
Henrik Krohns wrote:
 I don't get it.. unless you have some big honeypot, maybe 5% of traffic
 contain small images to be OCRd. If your server can't handle that, I guess
 it's running out of juice anyway. :)

Well... yeah.  <g>  The basic problem is that all the other garbage
(with the occasional inevitable exception) is getting caught by Clam
(viruses and most phishes) or SpamAssassin (all but a few text-based spams).

I've found *enough* similarities in the raw binary image data to
usefully make signatures for a lot of what is otherwise getting through;
 at the moment this is just a stopgap until these machines can be retired.

However, in the long run, OCR to feed the text to SpamAssassin's other
rules is a better solution;  it's much more flexible.

-kgd


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Bill Randle
On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:
 Henrik Krohns wrote:
  I don't get it.. unless you have some big honeypot, maybe 5% of traffic
  contain small images to be OCRd. If your server can't handle that, I guess
  it's running out of juice anyway. :)
 
 Well... yeah.  <g>  The basic problem is that all the other garbage
 (with the occasional inevitable exception) is getting caught by Clam
 (viruses and most phishes) or SpamAssassin (all but a few text-based spams).
 
 I've found *enough* similarities in the raw binary image data to
 usefully make signatures for a lot of what is otherwise getting through;
  at the moment this is just a stopgap until these machines can be retired.
 
 However, in the long run, OCR to feed the text to SpamAssassin's other
 rules is a better solution;  it's much more flexible.

Indeed. For those interested in the topic of OCR to feed SpamAssassin,
there's an active project with its own mailing list that does just this.
It turns out to be a non-trivial task because many of these image spam
are animated gifs, so you need to find the right frame to pass to the
OCR program.

Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then
subscribe to the Devel-Spam mailing list (there's a link on that page).

-Bill




Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Dennis Peterson

Henrik Krohns wrote:

On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:

I've explored OCR on both color and de-colorized images and there have
been successes, but not enough to warrant turning it on in production. It
is very cpu intensive.


I don't get it.. unless you have some big honeypot, maybe 5% of traffic
contain small images to be OCRd. If your server can't handle that, I guess
it's running out of juice anyway. :)

You can even easily create separate scanning queue for OCR, so it doesn't
interfere with normal traffic.


You may have missed that I'm in the image industry - a great deal of 
what we do is imagery including imagery with text in it, and as we have 
to scan all images over a particular size, it would require more cpu 
than is worth it. And when you consider repeating it all at a disaster 
recovery site it's starting to be a lot of computer power with a high 
false positive probability.


You cannot count on the image spam being gif as png images are showing 
up now as are jpg, and animated gifs are also out there. OCR isn't 
practical for me but may be for others for a while - at least until they 
start to use CAPTCHA technology to get around it.


dp


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Dennis Peterson

Bill Randle wrote:

On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:

Henrik Krohns wrote:

I don't get it.. unless you have some big honeypot, maybe 5% of traffic
contain small images to be OCRd. If your server can't handle that, I guess
it's running out of juice anyway. :)

Well... yeah.  <g>  The basic problem is that all the other garbage
(with the occasional inevitable exception) is getting caught by Clam
(viruses and most phishes) or SpamAssassin (all but a few text-based spams).

I've found *enough* similarities in the raw binary image data to
usefully make signatures for a lot of what is otherwise getting through;
 at the moment this is just a stopgap until these machines can be retired.

However, in the long run, OCR to feed the text to SpamAssassin's other
rules is a better solution;  it's much more flexible.


Indeed. For those interested in the topic of OCR to feed SpamAssassin,
there's an active project with its own mailing list that does just this.
It turns out to be a non-trivial task because many of these image spam
are animated gifs, so you need to find the right frame to pass to the
OCR program.

Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then
subscribe to the Devel-Spam mailing list (there's a link on that page).



You might want to consider the next level of image spam before you go 
too far down the OCR path:


http://www.iss.net/threats/Animated%20GIF.html

dp


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Bill Randle
On Sat, 2006-10-28 at 16:21 -0700, Dennis Peterson wrote:
 Bill Randle wrote:
  On Sat, 2006-10-28 at 16:54 -0400, Kris Deugau wrote:
 
  However, in the long run, OCR to feed the text to SpamAssassin's other
  rules is a better solution;  it's much more flexible.
  
  Indeed. For those interested in the topic of OCR to feed SpamAssassin,
  there's an active project with its own mailing list that does just this.
  It turns out to be a non-trivial task because many of these image spam
  are animated gifs, so you need to find the right frame to pass to the
  OCR program.
  
  Start here: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin then
  subscribe to the Devel-Spam mailing list (there's a link on that page).
 
 
 You might want to consider the next level of image spam before you go 
 too far down the OCR path:
 
 http://www.iss.net/threats/Animated%20GIF.html

Actually, the FuzzyOCR plugin already handles animated gifs using
various techniques to extract the hidden text. It also is able to
decode png and jpeg files.

-Bill
 



Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-28 Thread Dennis Peterson

Bill Randle wrote:

On Sat, 2006-10-28 at 16:21 -0700, Dennis Peterson wrote:




Actually, the FuzzyOCR plugin already handles animated gifs using
various techniques to extract the hidden text. It also is able to
decode png and jpeg files.


Ah - so it does. I hadn't looked at v. 2.3. I'll have another look. 
Thanks, Bill.


dp


[Clamav-users] Complexity limit on (custom) signatures?

2006-10-27 Thread Kris Deugau
I've been attempting to lighten the load for SpamAssassin a little by
creating signatures for the stock and pill spams that are flooding in
these days.  More specifically, I'm creating signatures for the attached
images in the spams.  (Upgrading SA, to be able to use OCR plugins and
so on, is not really possible, mostly due to system load.)

However, I'm having some odd problems with signatures that, so far as I
can tell, are *legitimate*, if perhaps a bit long.  Here's what I'm
doing to create signatures:

I take a set of images, manually sorted for rough similarity, and run
them through a script that calls sigtool --hex-dump, and picks out a
segment of the data.  (I started with just the first 400 characters of
hex, and pushed it up to 600;  with the current set I'm picking out ~600
characters starting with 2c from anywhere.)
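
The extraction step can be sketched in Python without calling sigtool, using `bytes.hex()` as a stand-in for the hex dump. The actual scripts are not shown in this thread, so `segment_from_marker` is a hypothetical helper; taking the *first* 0x2c byte is also an assumption, since the text only says "from anywhere". (In a GIF file, 0x2c is the byte that introduces an image descriptor, though the same value can occur elsewhere in the data.)

```python
def segment_from_marker(data, marker=0x2C, hex_chars=600):
    """Hex string of up to `hex_chars` characters starting at the first
    `marker` byte in `data`; None when the marker does not occur."""
    start = data.find(bytes([marker]))
    if start < 0:
        return None
    # Two hex characters per byte.
    return data[start:start + hex_chars // 2].hex()


if __name__ == "__main__":
    sample = b"GIF87a" + b"\x00\x01" + b"\x2c\x10\x20\x30"
    print(segment_from_marker(sample, hex_chars=6))
```

Each image then contributes one line of hex, and lines of equal length can be compared position by position.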

I further sort the resulting data by hand to find similar data, and then
feed that through another script that splits each line up into octets
and notes which octet has been seen in which position for the entire
data set.  It then constructs what should be a correct signature that
will match each line of the input according to the rules for ClamAV
signatures.  (More than 5 different octets at a position get converted
to ??, and finally long segments of ??...  get converted to {nn}.)
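
A minimal Python sketch of the procedure just described, for illustration only: the real scripts are not posted here, so `build_signature` and its parameters are invented names. The limit of 5 alternatives comes from the text above; the run length at which `??` collapses to `{nn}` (`min_run`) is an assumption, and the input lines are assumed to be lowercase hex of equal length.

```python
from itertools import groupby


def build_signature(hex_lines, max_alts=5, min_run=3):
    """Build a signature body that matches every line in `hex_lines`."""
    # Split each line into octets (two hex characters each).
    rows = [
        [line[i:i + 2] for i in range(0, len(line), 2)]
        for line in hex_lines
    ]
    tokens = []
    for column in zip(*rows):
        seen = sorted(set(column))
        if len(seen) == 1:
            tokens.append(seen[0])                      # same byte everywhere
        elif len(seen) <= max_alts:
            tokens.append("(" + "|".join(seen) + ")")   # a few alternatives
        else:
            tokens.append("??")                         # too many: wildcard
    out = []
    for token, group in groupby(tokens):
        count = len(list(group))
        if token == "??" and count >= min_run:
            out.append("{%d}" % count)                  # collapse long ?? runs
        else:
            out.append(token * count)
    return "".join(out)


if __name__ == "__main__":
    print(build_signature(["2c0100ff", "2c0200ff", "2c0100fe"]))
```

By construction the result matches each input line when read position by position, which is why rejected or non-matching output points at the scanner's handling of the signature rather than at the data.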

However, far too often, ClamAV rejects it as a malformed signature.
Chopping {nn} bits off the end often fixes that issue, but not always;
in some cases I've had to trim further (aa|bb|cc) blocks, along with
trailing {nn} and/or ?? segments that may get exposed at the end.

That still doesn't make a good signature for my purposes;  I often have
to trim *further* to get a signature that actually matches on the image
files I started with.  Manually spreading the data out shows it *should*
match fine before I've done any trimming.

From the problems I'm having with supposedly malformed signatures, it
looks like there's an effective complexity limit;  from the problems in
*matching* a signature that's finally been found to be acceptable, it
looks like there's a (lower) limit on what Clam can actually use in
matching.

Any suggestions on what I might be doing wrong?

I can post the scripts and some example signatures if needed.

-kgd


Re: [Clamav-users] Complexity limit on (custom) signatures?

2006-10-27 Thread Dennis Peterson

Kris Deugau wrote:




From the problems I'm having with supposedly malformed signatures, it

looks like there's an effective complexity limit;  from the problems in
*matching* a signature that's finally been found to be acceptable, it
looks like there's a (lower) limit on what Clam can actually use in
matching.

Any suggestions on what I might be doing wrong?




Not to change the direction on you, but you might want to take advantage 
of the work Steve Basford is doing at 
http://www.sanesecurity.com/clamav/ for phishing problems, and also look 
at http://www.msrbl.com/site/stats for image and spam solutions. Both 
sites are providing excellent results on systems I'm running. The 
patterns are downloadable and very up to date. I've not had a single 
complaint of false positives, and the number of patterns provided is 
quite large.


Steve has also written a very useable how-to for creating these patterns.

dp