[Bug 5292] URIDNSBL erroneously matches substrings of words with accents
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5292

Henrik Krohns changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
                 CC|                            |apa...@hege.li
             Status|NEW                         |RESOLVED

--- Comment #7 from Henrik Krohns ---
Improved with latest commit, schemeless parser will only match pure
alphanumeric now.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1865612.

--
You are receiving this mail because:
You are the assignee for the bug.
[auto] bad sandbox rules report
HTTP get: https://ruleqa.spamassassin.org/1-days-ago?xml=1
HTTP get: https://ruleqa.spamassassin.org/2-days-ago?xml=1
HTTP get: https://ruleqa.spamassassin.org/3-days-ago?xml=1

Bad performing rules, from the past 3 night's mass-checks.

(Note: 'net' rules will be listed as 'no hits' unless you set 'tflags net'.
This also applies for meta rules which use 'net' rules.)

rulesrc/sandbox/smf/30_smf_nontest.cf (4 rules, 1 bad):
  FSL_LINK_AWS_S3_WEB: bad, avg S/O=0.01 avg Spam%=0.00 avg Ham%=0.05

rulesrc/sandbox/smf/20_smf.cf (42 rules, 32 bad):
  FSL_ABUSED_WEB_1: bad, avg S/O=0.40 avg Spam%=1.77 avg Ham%=2.61
  FSL_ABUSED_WEB_2: bad, avg S/O=0.60 avg Spam%=0.92 avg Ham%=0.60
  FSL_ABUSED_WEB_3: bad, avg S/O=0.57 avg Spam%=1.94 avg Ham%=1.46
  FSL_NOT_FROM_HOTMAIL: bad, avg S/O=0.43 avg Spam%=0.05 avg Ham%=0.06
  FSL_NOT_FROM_YAHOO: bad, avg S/O=0.62 avg Spam%=0.00 avg Ham%=0.00
  FSL_NO_RCVD_1: bad, avg S/O=0.02 avg Spam%=0.05 avg Ham%=2.36
  FSL_RCVD_EX_0: bad, avg S/O=0.02 avg Spam%=0.05 avg Ham%=2.49
  FSL_RCVD_EX_1: bad, avg S/O=0.65 avg Spam%=60.80 avg Ham%=33.09
  FSL_RCVD_EX_2: bad, avg S/O=0.47 avg Spam%=24.55 avg Ham%=27.93
  FSL_RCVD_EX_3: bad, avg S/O=0.14 avg Spam%=3.82 avg Ham%=23.62
  FSL_RCVD_EX_4: bad, avg S/O=0.57 avg Spam%=6.76 avg Ham%=5.00
  FSL_RCVD_EX_5: bad, avg S/O=0.43 avg Spam%=2.46 avg Ham%=3.28
  FSL_RCVD_EX_GT_5: bad, avg S/O=0.25 avg Spam%=1.56 avg Ham%=4.60
  FSL_RCVD_TR_1: bad, avg S/O=0.09 avg Spam%=2.09 avg Ham%=22.36
  FSL_RCVD_TR_4: bad, avg S/O=0.15 avg Spam%=0.01 avg Ham%=0.12
  FSL_RCVD_TR_5: bad, avg S/O=0.69 avg Spam%=58.15 avg Ham%=26.66
  FSL_RCVD_TR_GT_5: bad, avg S/O=0.13 avg Spam%=0.02 avg Ham%=0.16
  FSL_RCVD_UT_1: bad, avg S/O=0.65 avg Spam%=60.80 avg Ham%=32.46
  FSL_RCVD_UT_2: bad, avg S/O=0.47 avg Spam%=24.55 avg Ham%=27.93
  FSL_RCVD_UT_3: bad, avg S/O=0.14 avg Spam%=3.82 avg Ham%=23.62
  FSL_RCVD_UT_4: bad, avg S/O=0.57 avg Spam%=6.76 avg Ham%=5.00
  FSL_RCVD_UT_5: bad, avg S/O=0.43 avg Spam%=2.46 avg Ham%=3.28
  FSL_RCVD_UT_GT_5: bad, avg S/O=0.25 avg Spam%=1.56 avg Ham%=4.60
  __FSL_COUNT_EXTERN: bad, avg S/O=0.51 avg Spam%=99.95 avg Ham%=97.51
    # used in: FSL_RCVD_EX_0 FSL_RCVD_EX_1 FSL_RCVD_EX_2 FSL_RCVD_EX_3 FSL_RCVD_EX_4 FSL_RCVD_EX_5 FSL_RCVD_EX_GT_5
  __FSL_COUNT_TRUST: bad, avg S/O=0.62 avg Spam%=88.41 avg Ham%=53.24
    # used in: FSL_NO_RCVD_1 FSL_RCVD_TR_1 FSL_RCVD_TR_4 FSL_RCVD_TR_5 FSL_RCVD_TR_GT_5
  __FSL_COUNT_UNTRUST: bad, avg S/O=0.51 avg Spam%=99.95 avg Ham%=96.88
    # used in: FSL_NO_RCVD_1 FSL_RCVD_UT_1 FSL_RCVD_UT_2 FSL_RCVD_UT_3 FSL_RCVD_UT_4 FSL_RCVD_UT_5 FSL_RCVD_UT_GT_5
  __FSL_ENVFROM_HOTMAIL: bad, avg S/O=0.43 avg Spam%=0.05 avg Ham%=0.06
    # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_ENVFROM_LIVE: bad, avg S/O=0.09 avg Spam%=0.00 avg Ham%=0.00
    # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_ENVFROM_YAHOO: bad, avg S/O=0.32 avg Spam%=0.00 avg Ham%=0.01
    # used in: FSL_NOT_FROM_YAHOO
  __FSL_RELAY_GOOGLE: bad, avg S/O=0.04 avg Spam%=0.16 avg Ham%=3.42
    # used in: TO_IN_SUBJ URI_GOOGLE_PROXY
  __FSL_RELAY_HOTMAIL: bad, avg S/O=0.25 avg Spam%=0.01 avg Ham%=0.02
    # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_RELAY_YAHOO: bad, avg S/O=0.34 avg Spam%=0.16 avg Ham%=0.30
    # used in: FSL_NOT_FROM_YAHOO

rulesrc/sandbox/sidney/70_other.cf (1 rules, 1 bad):
  T_UPPERCASE_HTTP: bad, avg S/O=0.47 avg Spam%=0.03 avg Ham%=0.03

rulesrc/sandbox/pds/20_ntld.cf (21 rules, 6 bad):
  BULK_RE_SUSP_NTLD: bad, avg S/O=0.15 avg Spam%=0.00 avg Ham%=0.00
  GOOGLE_DRIVE_REPLY_BAD_NTLD: no hits at all
  SENT_TO_EMAIL_ADDR: no hits at all
  VPS_NO_NTLD: no hits at all
  __PDS_SENT_TO_EMAIL_ADDR: no hits at all
    # used in: SENT_TO_EMAIL_ADDR
  __VPSNUMBERONLY_TLD: no hits at all
    # used in: VPS_NO_NTLD

rulesrc/sandbox/pds/20_gdocs.cf (10 rules, 4 bad):
  __PDS_GOOGLE_DRIVE_SHARE: no hits of target type
    # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD
  __PDS_GOOGLE_DRIVE_SHARE_1: bad, avg S/O=0.06 avg Spam%=0.00 avg Ham%=0.00
    # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE
  __PDS_GOOGLE_DRIVE_SHARE_2: no hits of target type
    # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE
  __PDS_GOOGLE_DRIVE_SHARE_3: no hits at all
    # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE

rulesrc/sandbox/pds/10_menaces.cf (16 rules, 4 bad):
  BODY_QUOTE_MALF_MSGID: bad, avg S/O=0.70 avg Spam%=1.79 avg Ham%=0.75
  PDS_DOUBLE_URL: bad, avg S/O=0.08 avg Spam%=0.73 avg Ham%=8.15
  __PDS_BODY_QUOTE: bad, avg S/O=0.48 avg Spam%=19.78 avg Ham%=21.75
    # used in: BODY_QUOTE_MALF_MSGID
  __PDS_DOUBLE_URL: bad, avg S/O=0.08 avg Spam%=0.73 avg Ham%=8.15
    # used in: PDS_DOUBLE_URL

rulesrc/sandbox/mmartinec/20_rpvalid.cf (2 rules, 1 bad):
  __RP_MATCHES_RCVD: bad, avg S/O=0.16 avg Spam%=9.06 avg Ham%=48.47
    # used in: ADVANCE_FEE_3_NEW LIST_PRTL_SAME_USER THIS_AD GAPPY_HTML PDS_FROM_2_EMAILS SUBJ_BROKEN_WORDS SUBJ_OBFU_PUNCT_FE
Re: text/plain in MIME format mail
Hey Henrik,

Thanks for pointing out the error. Will use the stripped function instead.

Regards,
Shreyansh Shrivastava

On Wed, 21 Aug 2019, 11:20 Henrik K, wrote:

> Everything written there is wrong.
>
> SA Bayes uses $pms->get_decoded_stripped_body_text_array(), which returns
> the text that is supposed to be displayed to user / MUAs, with text/html
> part rendered to text if exists.
>
> So use the stripped function, unless your engine handles mime multipart,
> HTML rendering etc.
>
> get_decoded_stripped_body_text_array is what 'body' rules process
> get_decoded_body_text_array is what 'rawbody' rules process
>
> I've written detailed info about the rule types here:
>
> https://cwiki.apache.org/confluence/display/spamassassin/WritingRulesAdvanced
>
> The PerMsgStatus docs are quite poor in this regard, I tried to describe a
> bit more in current SVN versions..
>
> Cheers,
> Henrik
>
> On Wed, Aug 21, 2019 at 03:12:22AM +0530, Shreyansh Shrivastava. wrote:
> > Hey Kris,
> > Thanks for the pointer. Will try to accommodate both the sections.
> >
> > Also, I found the answer. $pms->get_decoded_body_text_array() returns an array
> > of strings where each string represented one newline-separated line of the
> > body. Also since the newline gets converted into int text/html, the whole
> > text/html part becomes the last element of the array. Using pop() on the array
> > will leave you with only the text/plain part.
> >
> > Thanks,
> > Shreyansh Shrivastava
> >
> >
> > On Wed, Aug 21, 2019 at 3:06 AM Kris Deugau <[1]kdeu...@vianet.ca> wrote:
> >
> >     Shreyansh Shrivastava. wrote:
> >     > I wanted to process only the text/plain part of the mail hence I was
> >     > looking for a sub in SA. The closest I could get was
> >     > $pms->get_decoded_body_text_array () which returns an array of strings
> >     > comprising both text/plain and text/html part of the mail.
> >     >
> >     > Is there any other way of retrieving the text/plain part only?
> >
> >     I can't really answer what you're asking, but I will point out that the
> >     text/plain part is often empty or at least different from the text/html
> >     part - on both spam and ham. Looking only at the text/html would be
> >     slightly better, but using both would be better still.
> >
> >     The HTML formatting/structure itself is often valuable for spam signs
> >     too, on top of whatever readable text content it contains.
> >
> >     -kgd
> >
> >
> > References:
> >
> > [1] mailto:kdeu...@vianet.ca
[Bug 7219] Incorrect use of __BODY_TEXT_LINE
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7219

Henrik Krohns changed:

           What    |Removed                     |Added
                 CC|                            |apa...@hege.li

--- Comment #2 from Henrik Krohns ---
Could someone please fix these rules, now that we have a nosubject tflag too?
Needs a feature if check for it..

Seems Bill also made some test rules, but they are still just that?

billcole/21_bug_7219.cf
jhardin/20_misc_testing.cf

--
You are receiving this mail because:
You are the assignee for the bug.
Re: text/plain in MIME format mail
Everything written there is wrong.

SA Bayes uses $pms->get_decoded_stripped_body_text_array(), which returns
the text that is supposed to be displayed to user / MUAs, with text/html
part rendered to text if exists.

So use the stripped function, unless your engine handles mime multipart,
HTML rendering etc.

get_decoded_stripped_body_text_array is what 'body' rules process
get_decoded_body_text_array is what 'rawbody' rules process

I've written detailed info about the rule types here:

https://cwiki.apache.org/confluence/display/spamassassin/WritingRulesAdvanced

The PerMsgStatus docs are quite poor in this regard, I tried to describe a
bit more in current SVN versions..

Cheers,
Henrik

On Wed, Aug 21, 2019 at 03:12:22AM +0530, Shreyansh Shrivastava. wrote:
> Hey Kris,
> Thanks for the pointer. Will try to accommodate both the sections.
>
> Also, I found the answer. $pms->get_decoded_body_text_array() returns an array
> of strings where each string represented one newline-separated line of the
> body. Also since the newline gets converted into int text/html, the whole
> text/html part becomes the last element of the array. Using pop() on the array
> will leave you with only the text/plain part.
>
> Thanks,
> Shreyansh Shrivastava
>
>
> On Wed, Aug 21, 2019 at 3:06 AM Kris Deugau <[1]kdeu...@vianet.ca> wrote:
>
>     Shreyansh Shrivastava. wrote:
>     > I wanted to process only the text/plain part of the mail hence I was
>     > looking for a sub in SA. The closest I could get was
>     > $pms->get_decoded_body_text_array () which returns an array of strings
>     > comprising both text/plain and text/html part of the mail.
>     >
>     > Is there any other way of retrieving the text/plain part only?
>
>     I can't really answer what you're asking, but I will point out that the
>     text/plain part is often empty or at least different from the text/html
>     part - on both spam and ham. Looking only at the text/html would be
>     slightly better, but using both would be better still.
>
>     The HTML formatting/structure itself is often valuable for spam signs
>     too, on top of whatever readable text content it contains.
>
>     -kgd
>
>
> References:
>
> [1] mailto:kdeu...@vianet.ca
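
For the case mentioned in the message above, where an external engine handles
MIME multipart itself instead of relying on the stripped body text, the
following is a rough sketch of picking out only the text/plain parts. It is
not SpamAssassin code: it uses Python's standard email package, and the
function name plain_text_parts and the SAMPLE message are made up for
illustration. It walks the MIME tree by content type rather than relying on
the position of elements in an array.

```python
from email import message_from_string
from typing import List

# Hypothetical sample: a multipart/alternative mail with a text/plain
# and a text/html rendition of the same content.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Hello in plain text\n"
    "--b1\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Hello in <b>HTML</b></p>\n"
    "--b1--\n"
)


def plain_text_parts(raw: str) -> List[str]:
    """Return the decoded text of every non-attachment text/plain part."""
    msg = message_from_string(raw)
    texts = []
    for part in msg.walk():
        # Skip multipart containers, text/html and any other type.
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        # decode=True undoes base64 / quoted-printable and yields bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 with replacement.
            texts.append(payload.decode("utf-8", errors="replace"))
    return texts


if __name__ == "__main__":
    for text in plain_text_parts(SAMPLE):
        print(text)
```

As noted earlier in the thread, the text/plain part is often empty or differs
from the text/html part, so a sketch like this only covers part of what a
message shows to the user.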