JT DeLys wrote, on 16/07/07 02:14 PM:
Hi,

Could someone perhaps succinctly summarize the various & sundry anti-pdf-image-spam tools that are currently in play?

  PDFText
   -- works in 3.2, not 3.1

This one is my fault :(. PDFText _does_ work in 3.1, and that is where we are getting the most use from it. PDFText2 is for 3.2.

Its goal was/is to get the text from PDFs and do your spam matching on it. With PDFText, you have to request the match tests with the execute command, e.g.:

body PDF_TO_TEXT eval:check_pdftext('stock','profit','Symbol::4')

That example gives the match "Symbol:" a value of 4 points.
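
To make the weighting concrete, here is a rough Python sketch of how arguments like those could be turned into a score. It is not the plugin's code: the names parse_spec and score_text are made up for illustration, and the default weight of 1 for unweighted words and the case-insensitive matching are assumptions, not documented PDFText behaviour.

```python
def parse_spec(spec, default_weight=1.0):
    """Split 'Symbol::4' into ('Symbol:', 4.0).

    Words without a numeric ':N' suffix get default_weight
    (an assumption made for this sketch).
    """
    pattern, sep, weight = spec.rpartition(":")
    if sep and pattern:
        try:
            return pattern, float(weight)
        except ValueError:
            pass  # trailing part is not a number; treat the whole spec as the pattern
    return spec, default_weight


def score_text(text, specs):
    """Add up the weight of every pattern that occurs in the extracted text."""
    lowered = text.lower()  # case-insensitive matching is an assumption
    total = 0.0
    for spec in specs:
        pattern, weight = parse_spec(spec)
        if pattern.lower() in lowered:
            total += weight
    return total


if __name__ == "__main__":
    sample = "Symbol: SZSN stands to profit"
    print(score_text(sample, ["stock", "profit", "Symbol::4"]))
```

In this sketch "stock" does not occur in the sample, "profit" adds the assumed default of 1, and "Symbol:" adds its 4 points.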

With PDFText2, the found text is added (rendered) to the main tests that SpamAssassin does.

Both get the info that comes from running pdfinfo and pdftotext on the PDFs attached, which gives you access to information like "Title:".
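
For anyone who wants to see what those two tools hand back, here is a small Python sketch of the same idea outside SpamAssassin. It assumes the pdfinfo and pdftotext binaries (from xpdf or poppler) are on the PATH; the function names are hypothetical and this is not how the plugins themselves are written.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's 'Key:   value' lines into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def extract_pdf(path):
    """Return (metadata dict, body text) for one PDF file.

    Requires the external pdfinfo and pdftotext commands to be installed.
    """
    info = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    ).stdout
    # A '-' as the output file makes pdftotext write to stdout.
    text = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return parse_pdfinfo(info), text
```

The metadata dict is where a field like "Title" would show up, and the body text is what the match tests would then be run against.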

PDFText2 can also use gocr to do OCR on any images in the PDF. I'm not sold on that, as the first one I tested it on gave back:

SZSN St_nd_ To Proflt 1,4 mllll On In D_V_lopm_nt ProJ_otI

/Sh_ndon_ Zhouyu_n S__d _nd Nur__ry Co,, Ltd (SZsN7
fo,2g up ao,Bx (9_51 EST7

SZSN _nnouno_d lt_ _nt_rln_ lnto _n __r__m_nt ln _ r__l
__t_t_ d_v_lopm_nt th_t _t_nd_ to proflt th_ o an p_ny f1,4
mllllOn_ Thl_ oomp_ny l_ Ju_t ___rln_ up_ Aot f__t _nd __t
on SZSN,

Good luck with that :).

Hope that helps,
JES
