JT DeLys wrote, on 16/07/07 02:14 PM:
> Hi,
> Could someone perhaps succinctly summarize the various & sundry
> anti-pdf-image-spam tools that are currently in play?
> PDFText
> -- works in 3.2, not 3.1
This one is my fault :(. PDFText _does_ work in 3.1, and that is where we
are getting the most use from it. PDFText2 is for 3.2.
Its goal was/is to get the text from PDFs and do your SPAM matching on
it. With PDFText, you have to request the match tests with the
execute command, i.e.:
body PDF_TO_TEXT eval:check_pdftext('stock','profit','Symbol::4')
That example gives the match "Symbol:" a value of 4 points.
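
To make the "::4" weighting concrete, here is a rough Python sketch of how such a pattern list could be scored. This is not the plugin's code: the function names parse_spec and score_text are made up for illustration, and the default weight of 1 for unweighted patterns and the case-insensitive substring matching are both assumptions.

```python
import re

def parse_spec(spec, default_weight=1):
    """Split 'Symbol::4' into ('Symbol:', 4).

    A trailing ':<digits>' is read as the weight. Patterns without one
    get default_weight, which is an assumption, not plugin behaviour.
    """
    m = re.fullmatch(r"(.*):(\d+)", spec)
    if m:
        return m.group(1), int(m.group(2))
    return spec, default_weight

def score_text(text, specs):
    """Sum the weights of every pattern found in text.

    Matching here is a case-insensitive substring test (assumed).
    """
    lowered = text.lower()
    total = 0
    for spec in specs:
        pattern, weight = parse_spec(spec)
        if pattern.lower() in lowered:
            total += weight
    return total
```

With the three patterns from the rule above, a text that contains "profit" and "Symbol:" but not "stock" would score 1 + 4 under these assumptions.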
With PDFText2, the found text is added (rendered) to the main tests that
SpamAssassin does.
Both get the info that comes from running pdfinfo and pdftotext on the
PDFs attached, which gives you access to information like "Title:".
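
If you want to see what the two tools hand back before writing rules, something like the following Python sketch may help. It assumes pdfinfo and pdftotext are on your PATH, and the helper names parse_pdfinfo and extract_pdf are hypothetical; it is not how the plugins themselves call the tools.

```python
import subprocess

def parse_pdfinfo(output):
    """Turn pdfinfo's 'Key:   value' lines into a dict of strings."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates
        # that contain colons are kept whole.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def extract_pdf(path):
    """Run pdfinfo and pdftotext on one PDF; return (metadata, text).

    Assumes both binaries are installed. '-' as the output file makes
    pdftotext write the extracted text to stdout.
    """
    info = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    ).stdout
    text = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return parse_pdfinfo(info), text
```

Calling extract_pdf on a saved attachment and looking up the "Title" key in the returned dict shows the kind of metadata line you can then match on.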
PDFText2 can also use gocr to do OCR on any PDF images. I'm not sold on
that, as the first one I tested it on gave back:
SZSN St_nd_ To Proflt 1,4 mllll On In D_V_lopm_nt ProJ_otI
/Sh_ndon_ Zhouyu_n S__d _nd Nur__ry Co,, Ltd (SZsN7
fo,2g up ao,Bx (9_51 EST7
SZSN _nnouno_d lt_ _nt_rln_ lnto _n __r__m_nt ln _ r__l
__t_t_ d_v_lopm_nt th_t _t_nd_ to proflt th_ o an p_ny f1,4
mllllOn_ Thl_ oomp_ny l_ Ju_t ___rln_ up_ Aot f__t _nd __t
on SZSN,
Good luck with that :).
Hope that helps,
JES