I have successfully set up the ExtractText plugin with the proposed settings (those in the POD/manual page), and here are a couple of tips:

- put extracttext.pm into /etc/spamassassin or a similar directory
   (extracttext settings aren't loaded from user_prefs)

- tesseract takes too long to process (at least on my server),
   so I recommend setting:

extracttext_timeout     20      60
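
To show where that line would live, here is a minimal sketch of a site-wide config file. The file name extracttext.cf and the /usr/bin tool paths are my assumptions; the extracttext_external / extracttext_use lines follow the examples in the plugin's POD as I remember them, so check them against your installed version. As far as I read the documentation, the first value of extracttext_timeout is the per-tool timeout and the second is the total per message (default 5 10).

```
# /etc/spamassassin/extracttext.cf  (hypothetical file name)
# On a 4.0 install the loadplugin line is normally uncommented in v400.pre:
#   loadplugin Mail::SpamAssassin::Plugin::ExtractText

ifplugin Mail::SpamAssassin::Plugin::ExtractText

  # PDF text extraction (tool path is an assumption, adjust to your system)
  extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
  extracttext_use       pdftotext  .pdf application/pdf

  # OCR for images (this is the slow one)
  extracttext_external  tesseract  {OMP_THREAD_LIMIT=1} /usr/bin/tesseract -c page_separator= {} -
  extracttext_use       tesseract  .jpg .png .bmp .tif .tiff image/(?:jpeg|png|x-ms-bmp|tiff)

  # Raised from the default so tesseract has time to finish
  extracttext_timeout   20 60

endif
```

Running "spamassassin --lint" after editing should report any line the parser does not accept.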

On 06.03.23 12:23, Alex wrote:
> Have you noticed an increase in false positives due to legitimate "invoice"
> PDFs or other attachments being processed by body filters and getting
> tagged incorrectly?

Update:

So far I have not; I am just happy to be catching spam with BAYES, e.g.:

X-Spam-ExtractText-Chars: 118
X-Spam-ExtractText-Words: 19
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
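
Headers like these are produced by add_header lines in the site config. A sketch, assuming the template tag names given in the plugin's POD (I am quoting them from memory, so verify against your version); SpamAssassin prepends "X-Spam-" to the header name itself:

```
ifplugin Mail::SpamAssassin::Plugin::ExtractText
  # Statistics about what was extracted and by which tool
  add_header all ExtractText-Chars       _EXTRACTTEXTCHARS_
  add_header all ExtractText-Words       _EXTRACTTEXTWORDS_
  add_header all ExtractText-Tools       _EXTRACTTEXTTOOLS_
  add_header all ExtractText-Types       _EXTRACTTEXTTYPES_
  add_header all ExtractText-Extensions  _EXTRACTTEXTEXTENSIONS_
endif
```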

I believe training Bayes on legitimate invoices (as ham) would quickly fix any such problem.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Nothing is fool-proof to a talented fool.
