I have successfully set up the ExtractText plugin with the proposed settings
(those in the POD/manual page), and here are some tips:
- put extracttext.pm into /etc/spamassassin or a similar directory
(extracttext settings aren't loaded from user_prefs)
- tesseract takes too long to process (at least on my server),
so I recommend setting:
extracttext_timeout 20 60
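
For reference, a minimal sketch of what such a site-wide file could look
like, for example /etc/spamassassin/extracttext.cf. The file name, the tool
path and the tool options are assumptions for illustration; take the real
ones from the plugin's POD and from your distribution:

```
# load the plugin (it may already be listed, commented out, in a *.pre file)
loadplugin Mail::SpamAssassin::Plugin::ExtractText

# assumed tool path and options, adjust to your system
extracttext_external  pdftotext  /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use       pdftotext  .pdf application/pdf

# as I read the POD: first value is the timeout per attachment,
# second value is the total for all attachments (both in seconds)
extracttext_timeout 20 60
```

Running "spamassassin --lint" afterwards should show whether the file
parses cleanly.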
On 06.03.23 12:23, Alex wrote:
> Have you noticed an increase in false positives due to legitimate "invoice"
> PDFs or other attachments being processed by body filters and getting
> tagged incorrectly?
Update:
so far I am nothing but happy with it, as it is catching spam via BAYES:
X-Spam-ExtractText-Chars: 118
X-Spam-ExtractText-Words: 19
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
I believe training Bayes on legitimate invoices would quickly fix any such problem.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Nothing is fool-proof to a talented fool.