If this is the wrong forum to report this, let me know.

I'm trying to create a couple of rules to identify questionable PDFs
(phishing, etc.).  While evaluating spamassassin's debug output for the
pdfinfo plugin, I noticed that some of the test file's attributes aren't
being populated correctly when compared against exiftool, Adobe Reader,
Firefox, etc.  Specifically, the Producer and Creator fields are left as
unknown.

I get similar results with other emails and PDFs, so I suspect it's an
issue with the plugin or with how it parses the PDF.  I do have this
example available; however, it is malicious (it links to a phishing site),
so I'd rather not link to it directly in this thread.

For example:

$ less Invoice0098539.pdf
%PDF-1.4
1 0 obj
<<
/Title (<FE><FF>)
/Creator (<FE><FF>^@w^@k^@h^@t^@m^@l^@t^@o^@p^@d^@f^@ ^@0^@.^@1^@2^@.^@5)
/Producer (<FE><FF>^@Q^@t^@ ^@4^@.^@8^@.^@7)
/CreationDate (D:20220302192255Z)
>>
...

$  exiftool Invoice0098539.pdf
ExifTool Version Number         : 12.30
File Name                       : Invoice0098539.pdf
Directory                       : .
File Size                       : 21 KiB
File Modification Date/Time     : 2022:03:02 16:34:04-05:00
File Access Date/Time           : 2022:03:02 16:37:43-05:00
File Inode Change Date/Time     : 2022:03:02 16:34:04-05:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : No
Title                           :
Creator                         : wkhtmltopdf 0.12.5
Producer                        : Qt 4.8.7
Create Date                     : 2022:03:02 19:22:55Z
Page Count                      : 1

$ sa-debug
...
config: fixed relative path: /var/lib/spamassassin/3.004004/updates_spamassassin_org/20_pdfinfo.cf
config: using "/var/lib/spamassassin/3.004004/updates_spamassassin_org/20_pdfinfo.cf" for included file
config: read file /var/lib/spamassassin/3.004004/updates_spamassassin_org/20_pdfinfo.cf
pdfinfo: Identified 1 possible mime parts that need checked for PDF content
pdfinfo: found part, type=1 file=Invoice0098539.pdf cte=base64
pdfinfo: set_tag called for PDFVERSION 1.4
pdfinfo: set_tag called for PDFNAME Invoice0098539.pdf
...
pdfinfo: Filename=Invoice0098539.pdf Total HxW: 560 x 824 (55232 area)
pdfinfo: Filename=Invoice0098539.pdf Title=untitled Author=unknown Producer=unknown Created=20220302192255 Modified=0
pdfinfo: MD5 results for Invoice0098539.pdf -
md5=3F6F5C7CB71BDB101BADEF3CFFA9FE63
fuzzy1=32531F1D9420EE5721866DF28A3C6A17
fuzzy2=549DC099D6DFEF65AEA67FA0DF151C14
pdfinfo: set_tag called for PDFPRODUCER unknown
pdfinfo: set_tag called for PDFTITLE untitled
pdfinfo: set_tag called for PDFCREATOR unknown
pdfinfo: set_tag called for PDFAUTHOR unknown
pdfinfo: set_tag called for PDFMD5 32531F1D9420EE5721866DF28A3C6A17
pdfinfo: set_tag called for PDFMD5FUZZY1 32531F1D9420EE5721866DF28A3C6A17
pdfinfo: set_tag called for PDFMD5FUZZY2 549DC099D6DFEF65AEA67FA0DF151C14
pdfinfo: set_tag called for PDFCOUNT 1
pdfinfo: set_tag called for PDFIMGCOUNT 8
pdfinfo: image ratio=0.00103201042873696, min=0.000 max=0.005
pdfinfo: is_empty_body = 23 bytes
pdfinfo: pdf_name_regex hit on Invoice0098539.pdf
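The raw /Creator and /Producer strings in the `less` output start with the
bytes FE FF, the UTF-16BE byte-order mark that PDF uses for Unicode text
strings, and every ASCII character is preceded by a NUL byte (shown by
`less` as ^@).  exiftool decodes these; the plugin reports "unknown".  My
guess, which I have not confirmed against the plugin source, is that the
plugin doesn't handle this encoding.  Below is a minimal Python sketch (not
the PDFInfo plugin's code) of how such entries can be read and decoded.
The names SAMPLE, read_literal, decode_text_string and info_entries are
hypothetical, Latin-1 is only a rough stand-in for PDFDocEncoding, and the
entries are found by regex over the raw bytes rather than by following the
trailer's /Info reference.

```python
import re

# Hypothetical sample: the Info dictionary from the `less` output above,
# rebuilt as raw bytes (<FE><FF> is the BOM, each ^@ is a NUL byte).
SAMPLE = b"".join([
    b"1 0 obj\n<<\n/Title (\xfe\xff)\n",
    b"/Creator (\xfe\xff", "wkhtmltopdf 0.12.5".encode("utf-16-be"), b")\n",
    b"/Producer (\xfe\xff", "Qt 4.8.7".encode("utf-16-be"), b")\n",
    b"/CreationDate (D:20220302192255Z)\n>>\n",
])

ESCAPES = {ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
           ord("b"): 0x08, ord("f"): 0x0C}


def read_literal(data: bytes, start: int) -> bytes:
    """Return the bytes of the PDF literal string whose '(' is at data[start].

    Handles balanced nested parentheses, backslash escapes, octal escapes
    and backslash-newline continuations.
    """
    out = bytearray()
    depth = 1
    i = start + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C:  # backslash escape
            i += 1
            if i >= len(data):
                break
            nxt = data[i]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])
                i += 1
            elif 0x30 <= nxt <= 0x37:  # \ddd: one to three octal digits
                j = i
                while j < len(data) and j - i < 3 and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i:j], 8) & 0xFF)
                i = j
            elif nxt in (0x0A, 0x0D):  # backslash at end of line: skip the EOL
                i += 1
                if nxt == 0x0D and i < len(data) and data[i] == 0x0A:
                    i += 1
            else:  # \( \) \\ and any other escaped byte stand for themselves
                out.append(nxt)
                i += 1
            continue
        if c == 0x28:  # '('
            depth += 1
        elif c == 0x29:  # ')'
            depth -= 1
            if depth == 0:
                return bytes(out)
        out.append(c)
        i += 1
    raise ValueError("unterminated literal string")


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string: FE FF marks UTF-16BE, otherwise treat it
    as Latin-1 (a rough stand-in for PDFDocEncoding)."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def info_entries(data: bytes, keys=("Title", "Creator", "Producer", "Author")):
    """Naively find Info entries (literal or hex strings) in the raw bytes."""
    found = {}
    for key in keys:
        name = re.escape(key.encode("ascii"))
        lit = re.search(rb"/" + name + rb"\s*\(", data)
        if lit:
            found[key] = decode_text_string(read_literal(data, lit.end() - 1))
            continue
        hexs = re.search(rb"/" + name + rb"\s*<([0-9A-Fa-f\s]*)>", data)
        if hexs:
            digits = re.sub(rb"\s", b"", hexs.group(1))
            if len(digits) % 2:
                digits += b"0"  # an odd final hex digit is padded with 0
            found[key] = decode_text_string(bytes.fromhex(digits.decode("ascii")))
    return found


if __name__ == "__main__":
    print(info_entries(SAMPLE))
    # {'Title': '', 'Creator': 'wkhtmltopdf 0.12.5', 'Producer': 'Qt 4.8.7'}
```

On the sample this recovers the same Creator and Producer values exiftool
reports.  A real file may keep its Info dictionary in a compressed object
stream, which this sketch ignores.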
