Daniel J McDonald wrote:
On Wed, 2007-07-11 at 14:49 +0530, Suhas Ingale wrote:
Has anyone tried running PDFInfo plugin with 3.1.7 version?
No, finally got it working yesterday evening using 3.2.1, but the
initial results are underwhelming. Almost 100% overlap with
TVD_SPACE_RATIO. Only one miss:
First of all, TVD_SPACE_RATIO only applies to those running v3.2,
whereas PDFInfo.pm can be used with any 3.x version.
Secondly, TVD_SPACE_RATIO can fire almost at will without a body.
$ echo "" | spamassassin
2.9 TVD_SPACE_RATIO BODY: TVD_SPACE_RATIO
Take the basic MIME part from a PDF stock spam. It looks similar to this:
--------------050701020003040207010006
Content-Type: text/plain; charset=iso-8859-2; format=flowed
Content-Transfer-Encoding: 7bit
--------------050701020003040207010006
and TVD_SPACE_RATIO fires on it just fine.
$ cat /root/sample2.txt | spamassassin -D 2>&1 | grep -i tvd
[26686] dbg: tvd: word [SPAM-8.3]- Re: warning_6042146166.pdf
[26686] dbg: tvd: len=39
[26686] dbg: tvd: spaces 2 nonspaces 37
[26686] dbg: tvd: pct = 5
[26686] dbg: tvd: final = 5
[26686] dbg: rules: ran eval rule TVD_SPACE_RATIO ======> got hit (1)
Change the MIME part to:
--------------050701020003040207010006
Content-Type: text/plain; charset=iso-8859-2; format=flowed
Content-Transfer-Encoding: 7bit
tvd no longer fires now
--------------050701020003040207010006
$ cat /root/sample2.txt | spamassassin -D 2>&1 | grep -i tvd
[26739] dbg: tvd: word [SPAM-8.3]- Re: warning_6042146166.pdf
[26739] dbg: tvd: len=39
[26739] dbg: tvd: spaces 2 nonspaces 37
[26739] dbg: tvd: pct = 5
[26739] dbg: tvd: word tvd no longer fires now
[26739] dbg: tvd: len=24
[26739] dbg: tvd: spaces 4 nonspaces 20
[26739] dbg: tvd: pct = 20
[26739] dbg: tvd: final = 20
... and 20 isn't between the limits in tvd_vertical_words('0','10').
Easy for spammers to avoid that. What's more, this rule has a good chance
of false positives. I emailed myself a PNG from webalizer without any body text.
# cat test | spamassassin -D 2>&1 |grep -i tvd
[27390] dbg: tvd: word hourly_usage_200706.png
[27390] dbg: tvd: len=24
[27390] dbg: tvd: spaces 0 nonspaces 24
[27390] dbg: tvd: pct = 0
[27390] dbg: tvd: final = 0
[27390] dbg: rules: ran eval rule TVD_SPACE_RATIO ======> got hit (1)
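To make the arithmetic in those debug lines easier to follow, here is a rough Python sketch of how the numbers appear to be derived. It is inferred only from the output above, not taken from the plugin source: the function names are made up, and I am assuming that pct is spaces as an integer percentage of non-space characters (the trailing newline counting as a non-space), that "final" is the largest pct seen, and that the rule hits when low <= final < high.

```python
def space_ratio_pct(line: str) -> int:
    """Spaces as an integer percentage of non-space characters."""
    spaces = line.count(" ")
    nonspaces = len(line) - spaces
    if nonspaces == 0:
        return 100  # assumption: treat a line of only spaces as 100%
    return (100 * spaces) // nonspaces


def final_ratio(lines) -> int:
    """Assumed aggregation: largest per-line percentage, 0 if no lines."""
    return max((space_ratio_pct(line) for line in lines), default=0)


def rule_hits(lines, low: int = 0, high: int = 10) -> bool:
    """Assumed range check for tvd_vertical_words('0','10')."""
    return low <= final_ratio(lines) < high


# Lines as they appear in the debug output, with a trailing newline
# so the lengths match (len=39, len=24, len=24).
subject = "[SPAM-8.3]- Re: warning_6042146166.pdf\n"
filler = "tvd no longer fires now\n"
png = "hourly_usage_200706.png\n"

for lines in ([subject], [subject, filler], [png]):
    print(final_ratio(lines), rule_hits(lines))
```

Under those assumptions the three cases give final values of 5, 20 and 0, matching the logs: the bare subject line and the bare PNG filename both hit, and one ordinary sentence of body text is enough to push the ratio out of range.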
The fact is, email is "FTP for Dummies", so attachment-only messages are
common, and IMHO TVD_SPACE_RATIO may be scored a bit high at 2.9.
BTW, v0.3 of PDFInfo.pm is now posted, so those who already have it
might want to sync up.
# counts GMD_PDF_HORIZ 135s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_HORIZ 31s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_SQUARE 36s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_SQUARE 11s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_VERT 24s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_VERT 10s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_FUZZY1_T1 591s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_FUZZY1_T1 199s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_FUZZY2_T1 199s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_FUZZY2_T1 591s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_FUZZY2_T2 118s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_FUZZY2_T2 1s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_FUZZY2_T3 0s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_FUZZY2_T3 25s/0h of 5641 corpus (4064s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_FUZZY2_T4 105s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_FUZZY2_T4 28s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_AUTHOR_COLET 1s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_AUTHOR_COLET 2s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_AUTHOR_MOBILE 2s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_AUTHOR_MOBILE 55s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_AUTHOR_OOO 1s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_AUTHOR_OOO 118s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_AUTHOR_HPADMIN 105s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_AUTHOR_HPADMIN 27s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_GPL 227s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PRODUCER_GPL 85s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF 0s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
# counts GMD_PRODUCER_POWERPDF 0s/0h of 5641 corpus (4064s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1 159s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M1 40s/0h of 11773 corpus (10988s/785h AxB2-TRAPS) 07/11/07
# counts GMD_PDF_STOX_M2 223s/0h of 6132 corpus (4555s/1577h AxB-MANUAL) 07/11/07
# counts GMD_PDF_STOX_M2 29s/0h of 10767 corpus (9986s/781h AxB2-TRAPS) 07/11/07
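For anyone who wants to turn those count comments into hit rates, a small Python sketch follows. It is only an illustration: the regex and the names parse_counts and COUNTS_RE are mine, and I am reading "135s/0h" as spam hits / ham hits and "(4555s/1577h NAME)" as the spam/ham sizes of the named corpus. Matching on whitespace lets it cope with records that have been wrapped across two lines by a mail client.

```python
import re

# Hypothetical pattern for the "# counts" comment lines shown above.
COUNTS_RE = re.compile(
    r"#\s*counts\s+(?P<rule>\S+)\s+(?P<spam_hits>\d+)s/(?P<ham_hits>\d+)h"
    r"\s+of\s+(?P<total>\d+)\s+corpus\s+"
    r"\((?P<spam>\d+)s/(?P<ham>\d+)h\s+(?P<corpus>[^)\s]+)\)"
)


def parse_counts(text):
    """Return one dict per counts record, with a spam hit rate added."""
    rows = []
    for m in COUNTS_RE.finditer(text):
        row = {
            key: (value if key in ("rule", "corpus") else int(value))
            for key, value in m.groupdict().items()
        }
        # Sanity check: spam + ham should add up to the stated corpus size.
        row["consistent"] = row["spam"] + row["ham"] == row["total"]
        row["spam_rate"] = row["spam_hits"] / row["spam"] if row["spam"] else 0.0
        rows.append(row)
    return rows


sample = """\
# counts GMD_PDF_HORIZ 135s/0h of 6132 corpus (4555s/1577h
AxB-MANUAL) 07/11/07
# counts GMD_PDF_HORIZ 31s/0h of 11773 corpus (10988s/785h
AxB2-TRAPS) 07/11/07
"""

for row in parse_counts(sample):
    print(f"{row['rule']} {row['corpus']}: {row['spam_rate']:.2%} of spam, "
          f"consistent={row['consistent']}")
```

The consistency flag is a cheap way to catch a mistyped corpus size before trusting a rate computed from it.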
--
Dallas Engelken
[EMAIL PROTECTED]
http://uribl.com