JT DeLys wrote, on 16/07/07 06:36 PM:
Hi,
With PDFText2, the found text is added (rendered) to the main
tests that SpamAssassin does.
Do you mean to those tests defined in 80_additional.cf? or others?
It means any test you run on the body of the e-mail will also be run
against this text. For example, in your local.cf you might have:
body STOCK_TEST /stock/i
describe STOCK_TEST Found the word stock
score STOCK_TEST 4.5
When PDFText2 is loaded, its rendered text will be tested for the word
stock just like everything else that SpamAssassin offers for your tests
to match against. You might consider it to be the more
SpamAssassin-natural way of matching against PDF text :).
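To make the idea concrete, here is a rough Python sketch of what "added
to the rendered text" means for a body rule. It is only an illustration,
not how SpamAssassin or PDFText2 is implemented internally; the names
rendered_body and rule_hits are made up for the example, and the pattern
simply mirrors the STOCK_TEST rule above.

```python
import re

# Stand-in for the STOCK_TEST body rule above: /stock/i
STOCK_TEST = re.compile(r"stock", re.IGNORECASE)


def rendered_body(message_text, pdf_text=""):
    """Join the normal body text with text extracted from a PDF (hypothetical helper)."""
    return "\n".join(part for part in (message_text, pdf_text) if part)


def rule_hits(message_text, pdf_text=""):
    """Return True if the rule pattern matches anywhere in the combined text."""
    return STOCK_TEST.search(rendered_body(message_text, pdf_text)) is not None
```

With this sketch, a message whose visible body never mentions the word
still hits the rule once the PDF text is appended, which is the
behaviour described above.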
PDFText2 can also use gocr to do OCR on any PDF images. I'm not
sold on that as the first one I tested it on gave back :
Is that different capability/functionality than FuzzyOCR is undertaking?
Well, I am going to say similar, yet different :). PDFText2 currently
does an OCR of the images and adds the result to the rendered text. The
OCRed text may not be very accurate, so exact-match rules will often
miss it.
FuzzyOCR, if I understand what I have seen so far (and the author will
be much better placed than I to respond), takes the OCR output from any
one of the available OCR engines and uses String::Approx (and maybe
other tools) to match against a word list you supply specifically for
FuzzyOCR. That gives a much better chance of getting a hit on images.
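The difference between exact and approximate matching can be sketched in
a few lines of Python. This is not FuzzyOCR's code and does not use
String::Approx; it swaps in difflib from the Python standard library
purely to show the idea. The function name fuzzy_hits and the 0.75
threshold are assumptions picked for the example.

```python
from difflib import SequenceMatcher


def fuzzy_hits(ocr_text, word_list, threshold=0.75):
    """Return (word, token) pairs where an OCR token is close to a listed word.

    threshold is an arbitrary similarity cut-off chosen for this sketch.
    """
    hits = []
    tokens = ocr_text.lower().split()
    for word in word_list:
        target = word.lower()
        for token in tokens:
            # ratio() is 1.0 for identical strings and 0.0 for nothing in common
            if SequenceMatcher(None, token, target).ratio() >= threshold:
                hits.append((word, token))
                break  # one hit per listed word is enough
    return hits
```

An OCR result such as "buy st0ck n0w" would slip past an exact /stock/i
body rule, but the approximate comparison still pairs "st0ck" with
"stock" from the word list.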
--
Thanks,
JTDeLys
Quite welcome,
JES