On 11/3/2025 11:35 AM, [email protected] wrote:
On 11/3/25 4:10 PM, Jared Hall via users wrote:
On 11/2/2025 9:14 AM, Matus UHLAR - fantomas wrote:

extracttext_external    pdftotext       /usr/bin/pdftotext -nopgbrk -layout -enc UTF-8 {} -
extracttext_use         pdftotext       .pdf application/pdf

Yes. Using that syntax exactly.
I am not sure whether a PDF can contain a link like this HTML one, or what would be extracted from it:

<a href="http://example.com">see this link</a>
You are correct.  ExtractText works well with visible text.  It does not seem to
pick up anchored URL references.

fwiw Mail::SpamAssassin::Plugin::PDFInfo has some code to extract anchored URL references.

Yes, I see that the $pms->add_uri_detail_list($location); call runs by default in PDFInfo.pm.
I just loaded the plugin in v341.pre.  Works fine now.
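
For anyone following along, loading the plugin is normally a one-line change. A minimal sketch, assuming a stock install where v341.pre ships the PDFInfo line commented out (the directory holding the .pre files varies by distribution and is not given in this thread):

```
# In v341.pre
# Assumption: the stock file carries this line commented out; uncomment it.
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
```

Running `spamassassin --lint` afterwards should confirm that the configuration still parses, and any running spamd or other daemon that embeds SpamAssassin would need a restart to pick up the change.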

Thanks,

-- Jared Hall
[email protected]


