Re: How to set a part type for an add_body_part

2007-11-15 Thread Olivier Nicole
Hi Theo, 

 Hrm.  FYI, there are assumptions made that the internal message structure is
 from the original message.  So while I don't think things will break, some
 rules may trigger inappropriately if you blow their mind.

I'll try for a while.

  $part_array->[0]="--$boundary";
  $part_array->[1]="Content-Type: image/tiff;";
  $part_array->[2]="Content-Transfer-Encoding: Base64";
  $part_array->[3]="";
 Part data doesn't have MIME boundaries and headers.  It's just the part
 content.  You would want to parse/shove this data into the object header.

OK, nor are they encoded anymore.

  open(FILE, '<', "$dir/$tifffile") or die $!;
  binmode(FILE);
  while (read(FILE, $buf, 60*57)) {
    $part_array->[$pline++]=MIME::Base64::encode_base64($buf);
  }
 Don't you have this from the message tree already?

No, this is a new image, extracted from the PDF part attached to the
message.
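To illustrate Theo's point that part data is just the part content, here is a minimal sketch in plain Perl (core MIME::Base64 only, no SpamAssassin calls) that turns the bytes of an image into base64 lines without any boundary or header lines. The helper names part_lines_from_bytes and part_lines_from_file are hypothetical, and whether a SpamAssassin node expects its raw data in exactly this shape is an assumption, not something confirmed in this thread.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: encode raw binary data (e.g. the bytes of a TIFF
# image) as an array ref of base64 lines. No MIME boundary and no header
# lines are added; the array holds only the encoded part content.
sub part_lines_from_bytes {
    my ($bytes) = @_;
    my @lines;
    # 57 input bytes encode to exactly 76 base64 characters, so each
    # chunk becomes one complete line terminated by "\n".
    for (my $off = 0; $off < length $bytes; $off += 57) {
        push @lines, encode_base64(substr($bytes, $off, 57));
    }
    return \@lines;
}

# Hypothetical helper: read a whole file in binary mode and encode it.
sub part_lines_from_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    binmode($fh);                          # image data: no text-mode I/O
    my $bytes = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    return part_lines_from_bytes(defined $bytes ? $bytes : '');
}
```

For example, `my $part_array = part_lines_from_file("$dir/$tifffile");` stores one 76-character line per array element, whereas the read loop above stores 60 such lines per element. The Content-Type and Content-Transfer-Encoding information would then go into the node's header, as Theo suggests, instead of into the array.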

  $part_msg->header("content_type", "image/tiff");
 I think you'd want content-type.

Of course :(

  push(@{$msg->{'parse_queue'}}, [ $part_msg, $boundary, $part_array, 1 ]);
  $msg->add_body_part($part_msg);
 This doesn't make sense.  You either want to add the body part to
 the correct part of the tree, or setup the parse queue
 appropriately and let it do it.

This I borrowed from SpamAssassin code directly as there are very few
examples of add_body_part around. It's in Message.pm.

  # we've created a new node object, so add it to the queue along with the
  # text that belongs to that part, then add the new part to the current
  # node to create the tree.
  push(@{$self->{'parse_queue'}}, [ $part_msg, $p_boundary, $part_array, $subparse ]);
  $msg->add_body_part($part_msg);

I could not find any proper way to associate the body of the part in
$part_array with the node defined in $part_msg: there is the header
method to add/modify a header of a part, but no similar (documented)
method to add/modify the body of the part.

Best regards,

Olivier


Re: what does whitelist_from act on

2007-11-15 Thread Matt Kettler
K Anand wrote:
 I have whitelist_from [EMAIL PROTECTED] in my conf.
 According to the docs, whitelist_from will act on

 Envelope-Sender
 Resent-Sender
 X-Envelope-From
 From
In addition, the "envelope sender" data, taken from the SMTP envelope
data where this is available, is looked up. See envelope_sender_header.

So it should also, by default, match the Return-Path header.

*HOWEVER* that assumes the header is present at the time of scanning.
Normally this header is not present at the MTA layer. It's a delivery
agent thing.
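A simplified model may make this concrete: a whitelist_from pattern can only match an address in a header that actually exists when the message is scanned. This is not SpamAssassin's implementation; the header list comes from the documentation quoted above, Return-Path stands in for the envelope sender, and the helper names glob_to_regex and whitelisted, as well as the crude address extraction, are hypothetical. The wildcard handling reflects that whitelist_from patterns accept file-glob-style * and ? wildcards.

```perl
use strict;
use warnings;

# Simplified model for illustration only, NOT SpamAssassin's code.
# Header list taken from the whitelist_from documentation quoted above,
# with Return-Path standing in for the envelope sender.
my @SENDER_HEADERS = qw(Return-Path Envelope-Sender Resent-Sender
                        X-Envelope-From From);

# Glob-style pattern: '*' and '?' are wildcards, everything else is
# literal. This model also matches case-insensitively.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split /([*?])/, $glob;
    return qr/^$re$/i;
}

# $headers: hash ref of header name => value, holding only the headers
# that exist when the message is scanned. Returns the name of the first
# header whose address matches, or nothing.
sub whitelisted {
    my ($pattern, $headers) = @_;
    my $re = glob_to_regex($pattern);
    for my $name (@SENDER_HEADERS) {
        my $value = $headers->{$name};
        next unless defined $value;
        # Crude extraction: the first token that looks like an address.
        my ($addr) = $value =~ /([^\s<>"]+\@[^\s<>"]+)/ or next;
        return $name if $addr =~ $re;
    }
    return;
}
```

For example, `whitelisted('*@example.com', { From => 'x@other.test' })` finds nothing, while adding `'Return-Path' => '<user@example.com>'`, as a delivery agent would, makes it match on Return-Path.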

Many MTA-layer SA integration tools create a fake Return-Path header
for scanning and then remove it.

SimScan (which you appear to use) doesn't do this; at least, it didn't
for the last person who asked about the same basic problem (although
that was related to SPF, it was still failing due to lack of envelope
information at scan time).

You might be able to use the same solution he did, which patches qmail
to add the envelope-from information to your Received: headers.

See also:

http://wiki.apache.org/spamassassin/QmailSpfPatch



Multiple domains, only the first is tagged

2007-11-15 Thread marcel458

I use Fedora Core 8, amavisd-new, clamav and spamassassin, all current
releases.
I have 3 domains (non-commercial); only the first domain is tagged, the
others are not. Viruses are checked for all domains.
What can I try to fix this? I already googled and searched for it but did
not find any working solution.

Thanks in advance!
-- 
View this message in context: 
http://www.nabble.com/Multiple-domains%2C-only-the-first-is-tagged-tf4816810.html#a13780608
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



It works fine now, thanks!

2007-11-15 Thread marcel458

Thanks a lot! I had found that line on Google before, but the person in
that post had all the domains between the square brackets []. After
changing the line as in your example, it works fine...

McDonald, Dan wrote:
 This is an amavisd-new issue.  You need to add all of the domains to
 the @local_domains_maps variable in amavisd.conf
 
 Example:
 @local_domains_maps = ( [".$mydomain"], "example.com","example.org",
 "example.net"  );  # list of all local domains
 
 Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
 Austin Energy
 http://www.austinenergy.com




Re: How to set a part type for an add_body_part

2007-11-15 Thread Olivier Nicole
Hi,

  push(@{$msg->{'parse_queue'}}, [ $part_msg, $boundary, $part_array, 1 ]);
  $msg->add_body_part($part_msg);
 This doesn't make sense.  You either want to add the body part to
 the correct part of the tree, or setup the parse queue
 appropriately and let it do it.

Would that be a better way to do it? It seems to work too.

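# set the node's internal (undocumented) fields directly
$part_msg->{'type'}="image/tiff";
$part_msg->{'name'}=$tifffile;
$part_msg->header("content-type", "image/tiff");
# attach the encoded part data to the node
$part_msg->{'raw'}=$part_array;
$msg->add_body_part($part_msg);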

Best regards,

Olivier


Re: Multiple domains, only the first is tagged

2007-11-15 Thread Daniel J McDonald

On Thu, 2007-11-15 at 12:07 -0800, marcel458 wrote:
 I use Fedora Core 8, amavisd-new, clamav and spamassassin, all current
 releases.
 I have 3 domains (non-commercial); only the first domain is tagged, the
 others are not. Viruses are checked for all domains.
 What can I try to fix this? I already googled and searched for it but did
 not find any working solution.

This is an amavisd-new issue.  You need to add all of the domains to
the @local_domains_maps variable in amavisd.conf

Example:
@local_domains_maps = ( [".$mydomain"], "example.com","example.org",
"example.net"  );  # list of all local domains

 
 Thanks in advance!
-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com



FuzzyOcr: selecting what frame for animated GIFs

2007-11-15 Thread Olivier Nicole
Hi,

By default FuzzyOcr analyzes only one frame of an animated GIF.

If the animation shows one word per frame ("Click", "Here", "Now"),
selecting only one frame will not give a sensible result. It would be
better to concatenate all the frames into one single image and process
that image.

The following patch does that: http://fuzzyocr.own-hero.net/ticket/420

- if the loop is finite (or if there is no loop), only the last frame
  is analyzed;

- if the loop is infinite but one frame is on display for a duration
  that is much longer than other frames, only that frame is analyzed.

  The exact definition of "much longer" is open to discussion. I
  chose that the dominant frame should be displayed for more than 50%
  of the total time of the animation, but one could instead require the
  dominant frame to last only x times the average duration of one
  frame.

- otherwise, all the frames are combined into a single image that is
  analyzed.
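The frame-selection rules listed above can be sketched as a small decision function. This is a hypothetical illustration, not the code attached to ticket 420: the name select_frames, the input format, and the use of the GIF NETSCAPE2.0 looping convention (a loop count of 0 means loop forever) are assumptions, and the sketch only decides which frames to use; it does not decode or stitch any images.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Share of the total animation time above which a frame counts as dominant.
use constant DOMINANT_SHARE => 0.5;

# Hypothetical sketch of the frame-selection rules, not the ticket's code.
#   $loop_count : undef = no looping extension, 0 = loop forever,
#                 N > 0 = finite number of loops
#   @delays     : display time of each frame, in any consistent unit
# Returns { mode => 'single', frame => $index } to analyze one frame, or
# { mode => 'all' } to stitch every frame into one image first.
sub select_frames {
    my ($loop_count, @delays) = @_;
    die "no frames\n" unless @delays;

    # No loop or a finite loop: the last frame is what stays on screen.
    if (!defined $loop_count || $loop_count > 0) {
        return { mode => 'single', frame => $#delays };
    }

    # Infinite loop: look for a frame shown more than half of the time.
    my $total = sum(@delays);
    if ($total > 0) {
        for my $i (0 .. $#delays) {
            return { mode => 'single', frame => $i }
                if $delays[$i] > DOMINANT_SHARE * $total;
        }
    }

    # No dominant frame: every frame carries part of the message.
    return { mode => 'all' };
}
```

For instance, `select_frames(0, 10, 300, 10)` picks frame 1, which is on screen for 300 of the 320 time units, while `select_frames(0, 50, 50, 50)` falls through to stitching all frames. Switching to the "x times the average frame duration" rule would only change the comparison inside the loop.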

Bests,

Olivier


FuzzyOcr: Pushing OCR'ed text back to SA

2007-11-15 Thread Olivier Nicole
Hi,

The FuzzyOcr ticket http://fuzzyocr.own-hero.net/ticket/15 proposes
sending the text resulting from the OCR process back to SA for
analysis.

I fully second that idea, but I am wondering *what* text to push back:
depending on the scanset being used, the same image decodes as:

[20834] dbg: FuzzyOcr: ocrdata=. Uíagra tl.7g
[20834] dbg: FuzzyOcr: . Cíalís t2.6g
[20834] dbg: FuzzyOcr: 
[20834] dbg: FuzzyOcr: =end

[20834] dbg: FuzzyOcr: ocrdata==end

[20834] dbg: FuzzyOcr: ocrdata=. Uíagra tl.7g
[20834] dbg: FuzzyOcr: . Cíalís t2.6g
[20834] dbg: FuzzyOcr: 
[20834] dbg: FuzzyOcr: =end

[20834] dbg: FuzzyOcr: ocrdata=' Viagra tl.79
[20834] dbg: FuzzyOcr: ' CiaIis t2.69
[20834] dbg: FuzzyOcr: =end

The last scanset is the one preferred by FuzzyOcr when we let it do the
word analysis, but the first may even be enough for SA.

So the question really is: when can we say that the OCR is giving
results clean enough to be used by SA? We should not give SA the
results of all scansets, as that would artificially raise the spam
score.

On the other hand, for a photograph, the OCR text may look like the
following; this should never be pushed to SA, so how do we decide?

[19120] dbg: FuzzyOcr: ocrdata=. ._ .
[19120] dbg: FuzzyOcr: _\
[19120] dbg: FuzzyOcr: | _
[19120] dbg: FuzzyOcr: _ |
[19120] dbg: FuzzyOcr: 
[19120] dbg: FuzzyOcr: _? _4'|
[19120] dbg: FuzzyOcr: , _ ,. . .
[19120] dbg: FuzzyOcr: 
[19120] dbg: FuzzyOcr: __ - . . _
[19120] dbg: FuzzyOcr: _ . . .
[19120] dbg: FuzzyOcr: .._ _ .
[19120] dbg: FuzzyOcr: 
[19120] dbg: FuzzyOcr: =end
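As one possible way to frame "how to decide", here is a hedged sketch of a crude quality score: the share of non-whitespace characters that are letters. This heuristic is an assumption of this example, not something FuzzyOcr implements; letter_share, pick_ocr_text and the 0.5 threshold are hypothetical. It also keeps at most one scanset result, in line with the concern about artificially raising the spam score.

```perl
use strict;
use warnings;

# Hypothetical quality score, not FuzzyOcr code: the share of
# non-whitespace characters that are letters. \p{L} also matches
# accented letters, provided the OCR output has been decoded into
# Perl character strings.
sub letter_share {
    my ($text) = @_;
    (my $chars = $text) =~ s/\s+//g;
    return 0 unless length $chars;
    my $letters = () = $chars =~ /\p{L}/g;
    return $letters / length $chars;
}

# Hypothetical selector: keep at most ONE scanset result (the best one),
# so the same image text is never fed to SA several times. Returns undef
# when no candidate reaches the threshold.
sub pick_ocr_text {
    my ($threshold, @candidates) = @_;
    my ($best, $best_score);
    for my $text (@candidates) {
        next unless defined $text && $text =~ /\S/;
        my $score = letter_share($text);
        ($best, $best_score) = ($text, $score)
            if !defined $best_score || $score > $best_score;
    }
    return (defined $best_score && $best_score >= $threshold) ? $best : undef;
}
```

On the outputs quoted above, the first scanset scores 17/24 (about 0.71) and the last one 15/24 (about 0.63), while the photograph's output scores 0, so a 0.5 threshold would push the first scanset's text and nothing at all for the photograph.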

Best regards,

Olivier




Force plugins order

2007-11-15 Thread Olivier Nicole
Hello,

Is there a way to force the order in which SA tries the various plugins?

Typically I would have:

- a first plugin that analyzes PDF attachments and pushes the text part
  back as text (post_message_parse) and the images as new message
  parts (add_body_part);

- a second plugin that does OCR of the image attachments and pushes
  the discovered text back as text (post_message_parse).

Obviously the first plugin should come first as it may discover images
that will be OCRed by the second plugin.

How can I force the order of the plugins?

Best regards,

Olivier