[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

2019-08-21 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5292

Henrik Krohns changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |apa...@hege.li
             Status|NEW                         |RESOLVED

--- Comment #7 from Henrik Krohns ---
Improved with the latest commit: the schemeless parser now matches only pure
alphanumeric text.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1865612.
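
For readers who want to see the class of problem, here is a minimal sketch in
Python. It is not the PerMsgStatus.pm code from the commit; the two patterns
and the helper find_domains are hypothetical. It assumes the false positives
came from a schemeless domain pattern that treated an accented letter as a
word boundary, and shows how stricter boundaries avoid matching the tail of
such a word.

```python
import re

# Hypothetical naive pattern: with re.ASCII, \b treats an accented letter as a
# non-word character, so a match may begin in the middle of an accented word.
NAIVE = re.compile(
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b",
    re.ASCII | re.IGNORECASE,
)

# Hypothetical stricter pattern: Unicode-aware lookarounds refuse to start or
# end a match next to any word character, accented or not.
STRICT = re.compile(
    r"(?<![\w.-])[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?![\w-])",
    re.IGNORECASE,
)


def find_domains(pattern, text):
    """Return every schemeless domain candidate that the pattern finds."""
    return [m.group(0) for m in pattern.finditer(text)]


accented = "d\u00e9tails.fr"          # "détails.fr", an accented word plus TLD
plain = "see example.com today"

print(find_domains(NAIVE, accented))   # picks up the ASCII tail of the word
print(find_domains(STRICT, accented))  # finds nothing
print(find_domains(STRICT, plain))     # still finds the ordinary domain
```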

-- 
You are receiving this mail because:
You are the assignee for the bug.

[auto] bad sandbox rules report

2019-08-21 Thread Rules Report Cron
HTTP get: https://ruleqa.spamassassin.org/1-days-ago?xml=1
HTTP get: https://ruleqa.spamassassin.org/2-days-ago?xml=1
HTTP get: https://ruleqa.spamassassin.org/3-days-ago?xml=1

Badly performing rules from the past 3 nights' mass-checks.

(Note: 'net' rules will be listed as 'no hits' unless you set 'tflags net'.
This also applies to meta rules which use 'net' rules.)


rulesrc/sandbox/smf/30_smf_nontest.cf (4 rules, 1 bad):

  FSL_LINK_AWS_S3_WEB:  bad, avg S/O=0.01 avg Spam%=0.00 avg Ham%=0.05

rulesrc/sandbox/smf/20_smf.cf (42 rules, 32 bad):

  FSL_ABUSED_WEB_1:  bad, avg S/O=0.40 avg Spam%=1.77 avg Ham%=2.61
  FSL_ABUSED_WEB_2:  bad, avg S/O=0.60 avg Spam%=0.92 avg Ham%=0.60
  FSL_ABUSED_WEB_3:  bad, avg S/O=0.57 avg Spam%=1.94 avg Ham%=1.46
  FSL_NOT_FROM_HOTMAIL:  bad, avg S/O=0.43 avg Spam%=0.05 avg Ham%=0.06
  FSL_NOT_FROM_YAHOO:  bad, avg S/O=0.62 avg Spam%=0.00 avg Ham%=0.00
  FSL_NO_RCVD_1:  bad, avg S/O=0.02 avg Spam%=0.05 avg Ham%=2.36
  FSL_RCVD_EX_0:  bad, avg S/O=0.02 avg Spam%=0.05 avg Ham%=2.49
  FSL_RCVD_EX_1:  bad, avg S/O=0.65 avg Spam%=60.80 avg Ham%=33.09
  FSL_RCVD_EX_2:  bad, avg S/O=0.47 avg Spam%=24.55 avg Ham%=27.93
  FSL_RCVD_EX_3:  bad, avg S/O=0.14 avg Spam%=3.82 avg Ham%=23.62
  FSL_RCVD_EX_4:  bad, avg S/O=0.57 avg Spam%=6.76 avg Ham%=5.00
  FSL_RCVD_EX_5:  bad, avg S/O=0.43 avg Spam%=2.46 avg Ham%=3.28
  FSL_RCVD_EX_GT_5:  bad, avg S/O=0.25 avg Spam%=1.56 avg Ham%=4.60
  FSL_RCVD_TR_1:  bad, avg S/O=0.09 avg Spam%=2.09 avg Ham%=22.36
  FSL_RCVD_TR_4:  bad, avg S/O=0.15 avg Spam%=0.01 avg Ham%=0.12
  FSL_RCVD_TR_5:  bad, avg S/O=0.69 avg Spam%=58.15 avg Ham%=26.66
  FSL_RCVD_TR_GT_5:  bad, avg S/O=0.13 avg Spam%=0.02 avg Ham%=0.16
  FSL_RCVD_UT_1:  bad, avg S/O=0.65 avg Spam%=60.80 avg Ham%=32.46
  FSL_RCVD_UT_2:  bad, avg S/O=0.47 avg Spam%=24.55 avg Ham%=27.93
  FSL_RCVD_UT_3:  bad, avg S/O=0.14 avg Spam%=3.82 avg Ham%=23.62
  FSL_RCVD_UT_4:  bad, avg S/O=0.57 avg Spam%=6.76 avg Ham%=5.00
  FSL_RCVD_UT_5:  bad, avg S/O=0.43 avg Spam%=2.46 avg Ham%=3.28
  FSL_RCVD_UT_GT_5:  bad, avg S/O=0.25 avg Spam%=1.56 avg Ham%=4.60
  __FSL_COUNT_EXTERN:  bad, avg S/O=0.51 avg Spam%=99.95 avg Ham%=97.51
  # used in: FSL_RCVD_EX_0 FSL_RCVD_EX_1 FSL_RCVD_EX_2 FSL_RCVD_EX_3
  #          FSL_RCVD_EX_4 FSL_RCVD_EX_5 FSL_RCVD_EX_GT_5
  __FSL_COUNT_TRUST:  bad, avg S/O=0.62 avg Spam%=88.41 avg Ham%=53.24
  # used in: FSL_NO_RCVD_1 FSL_RCVD_TR_1 FSL_RCVD_TR_4 FSL_RCVD_TR_5
  #          FSL_RCVD_TR_GT_5
  __FSL_COUNT_UNTRUST:  bad, avg S/O=0.51 avg Spam%=99.95 avg Ham%=96.88
  # used in: FSL_NO_RCVD_1 FSL_RCVD_UT_1 FSL_RCVD_UT_2 FSL_RCVD_UT_3
  #          FSL_RCVD_UT_4 FSL_RCVD_UT_5 FSL_RCVD_UT_GT_5
  __FSL_ENVFROM_HOTMAIL:  bad, avg S/O=0.43 avg Spam%=0.05 avg Ham%=0.06
  # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_ENVFROM_LIVE:  bad, avg S/O=0.09 avg Spam%=0.00 avg Ham%=0.00
  # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_ENVFROM_YAHOO:  bad, avg S/O=0.32 avg Spam%=0.00 avg Ham%=0.01
  # used in: FSL_NOT_FROM_YAHOO
  __FSL_RELAY_GOOGLE:  bad, avg S/O=0.04 avg Spam%=0.16 avg Ham%=3.42
  # used in: TO_IN_SUBJ URI_GOOGLE_PROXY
  __FSL_RELAY_HOTMAIL:  bad, avg S/O=0.25 avg Spam%=0.01 avg Ham%=0.02
  # used in: FSL_NOT_FROM_HOTMAIL
  __FSL_RELAY_YAHOO:  bad, avg S/O=0.34 avg Spam%=0.16 avg Ham%=0.30
  # used in: FSL_NOT_FROM_YAHOO

rulesrc/sandbox/sidney/70_other.cf (1 rules, 1 bad):

  T_UPPERCASE_HTTP:  bad, avg S/O=0.47 avg Spam%=0.03 avg Ham%=0.03

rulesrc/sandbox/pds/20_ntld.cf (21 rules, 6 bad):

  BULK_RE_SUSP_NTLD:  bad, avg S/O=0.15 avg Spam%=0.00 avg Ham%=0.00
  GOOGLE_DRIVE_REPLY_BAD_NTLD:  no hits at all
  SENT_TO_EMAIL_ADDR:  no hits at all
  VPS_NO_NTLD:  no hits at all
  __PDS_SENT_TO_EMAIL_ADDR:  no hits at all
  # used in: SENT_TO_EMAIL_ADDR
  __VPSNUMBERONLY_TLD:  no hits at all
  # used in: VPS_NO_NTLD

rulesrc/sandbox/pds/20_gdocs.cf (10 rules, 4 bad):

  __PDS_GOOGLE_DRIVE_SHARE:  no hits of target type
  # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD
  __PDS_GOOGLE_DRIVE_SHARE_1:  bad, avg S/O=0.06 avg Spam%=0.00 avg Ham%=0.00
  # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE
  __PDS_GOOGLE_DRIVE_SHARE_2:  no hits of target type
  # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE
  __PDS_GOOGLE_DRIVE_SHARE_3:  no hits at all
  # used in: GOOGLE_DRIVE_REPLY_BAD_NTLD __PDS_GOOGLE_DRIVE_SHARE

rulesrc/sandbox/pds/10_menaces.cf (16 rules, 4 bad):

  BODY_QUOTE_MALF_MSGID:  bad, avg S/O=0.70 avg Spam%=1.79 avg Ham%=0.75
  PDS_DOUBLE_URL:  bad, avg S/O=0.08 avg Spam%=0.73 avg Ham%=8.15
  __PDS_BODY_QUOTE:  bad, avg S/O=0.48 avg Spam%=19.78 avg Ham%=21.75
  # used in: BODY_QUOTE_MALF_MSGID
  __PDS_DOUBLE_URL:  bad, avg S/O=0.08 avg Spam%=0.73 avg Ham%=8.15
  # used in: PDS_DOUBLE_URL

rulesrc/sandbox/mmartinec/20_rpvalid.cf (2 rules, 1 bad):

  __RP_MATCHES_RCVD:  bad, avg S/O=0.16 avg Spam%=9.06 avg Ham%=48.47
  # used in: ADVANCE_FEE_3_NEW LIST_PRTL_SAME_USER THIS_AD GAPPY_HTML
  #          PDS_FROM_2_EMAILS SUBJ_BROKEN_WORDS SUBJ_OBFU_PUNCT_FE
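
As a reading aid for the S/O column, here is a minimal Python sketch. It
assumes S/O is the spam hit percentage divided by the sum of the spam and ham
hit percentages; the function name so_ratio is hypothetical. The report prints
averages over several nights, so recomputing from the averaged percentages
only approximates the listed "avg S/O", and rows whose percentages round to
0.00 cannot be recomputed at all.

```python
def so_ratio(spam_pct, ham_pct):
    """Spam/overall ratio from hit percentages (assumed formula).

    Returns None when the rule hit nothing, to avoid dividing by zero.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return None
    return spam_pct / total


# (Spam%, Ham%) pairs copied from the report above.
ROWS = {
    "FSL_RCVD_EX_1": (60.80, 33.09),
    "FSL_ABUSED_WEB_1": (1.77, 2.61),
    "PDS_DOUBLE_URL": (0.73, 8.15),
}

for name, (spam, ham) in ROWS.items():
    print(f"{name}: S/O={so_ratio(spam, ham):.2f}")
```

A value near 1.0 means the rule hits almost only spam, a value near 0.0 means
it hits almost only ham, and a value near 0.5 means it does not separate the
two.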

Re: text/plain in MIME format mail

2019-08-21 Thread Shreyansh Shrivastava.
Hey Henrik,

Thanks for pointing out the error. Will use the stripped function instead.

Regards,
Shreyansh Shrivastava

On Wed, 21 Aug 2019, 11:20 Henrik K wrote:

>
> Everything written there is wrong.
>
> SA Bayes uses $pms->get_decoded_stripped_body_text_array(), which returns
> the text that is supposed to be displayed to the user / MUAs, with the
> text/html part rendered to text if one exists.
>
> So use the stripped function, unless your engine handles MIME multipart,
> HTML rendering, etc. itself.
>
> get_decoded_stripped_body_text_array is what 'body' rules process
> get_decoded_body_text_array is what 'rawbody' rules process
>
> I've written detailed info about the rule types here:
>
> https://cwiki.apache.org/confluence/display/spamassassin/WritingRulesAdvanced
>
> The PerMsgStatus docs are quite poor in this regard; I tried to describe it
> a bit more in the current SVN versions.
>
> Cheers,
> Henrik
>
> On Wed, Aug 21, 2019 at 03:12:22AM +0530, Shreyansh Shrivastava. wrote:
> > Hey Kris,
> > Thanks for the pointer. Will try to accommodate both the sections.
> >
> > Also, I found the answer. $pms->get_decoded_body_text_array() returns an
> > array of strings where each string represents one newline-separated line
> > of the body. Also, since the newline gets converted into  in text/html,
> > the whole text/html part becomes the last element of the array. Using
> > pop() on the array will leave you with only the text/plain part.
> >
> > Thanks,
> > Shreyansh Shrivastava
> >
> >
> > On Wed, Aug 21, 2019 at 3:06 AM Kris Deugau <[1]kdeu...@vianet.ca> wrote:
> >
> > Shreyansh Shrivastava. wrote:
> > > I wanted to process only the text/plain part of the mail, hence I was
> > > looking for a sub in SA. The closest I could get was
> > > $pms->get_decoded_body_text_array(), which returns an array of strings
> > > comprising both the text/plain and text/html parts of the mail.
> > >
> > > Is there any other way of retrieving the text/plain part only?
> >
> > I can't really answer what you're asking, but I will point out that the
> > text/plain part is often empty or at least different from the text/html
> > part - on both spam and ham.  Looking only at the text/html would be
> > slightly better, but using both would be better still.
> >
> > The HTML formatting/structure itself is often valuable for spam signs
> > too, on top of whatever readable text content it contains.
> >
> > -kgd
> >
> >
> > References:
> >
> > [1] mailto:kdeu...@vianet.ca
>


[Bug 7219] Incorrect use of __BODY_TEXT_LINE

2019-08-21 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7219

Henrik Krohns changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apa...@hege.li

--- Comment #2 from Henrik Krohns ---
Could someone please fix these rules, now that we have a nosubject tflag too?
That needs a feature check ("if") for it.

Seems Bill also made some test rules, but they are still just that?

billcole/21_bug_7219.cf
jhardin/20_misc_testing.cf

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: text/plain in MIME format mail

2019-08-21 Thread Henrik K


Everything written there is wrong.

SA Bayes uses $pms->get_decoded_stripped_body_text_array(), which returns
the text that is supposed to be displayed to the user / MUAs, with the
text/html part rendered to text if one exists.

So use the stripped function, unless your engine handles MIME multipart,
HTML rendering, etc. itself.

get_decoded_stripped_body_text_array is what 'body' rules process
get_decoded_body_text_array is what 'rawbody' rules process
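
To illustrate what "handles MIME multipart, HTML rendering etc." means for an
external engine, here is a minimal sketch using Python's standard library. It
is not SpamAssassin code and does not reproduce what either accessor returns;
part_lines, html_to_text and SAMPLE are hypothetical names, and the tag
stripper is only a crude stand-in for real HTML rendering.

```python
from email import message_from_string
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops tags (crude stand-in for rendering)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def part_lines(raw_message, subtype):
    """Return the decoded lines of every text/<subtype> part, in message order."""
    msg = message_from_string(raw_message)
    lines = []
    for part in msg.walk():
        if part.get_content_type() == f"text/{subtype}":
            payload = part.get_payload(decode=True)  # bytes for leaf parts
            charset = part.get_content_charset() or "utf-8"
            lines.extend(payload.decode(charset, errors="replace").splitlines())
    return lines


def html_to_text(html_lines):
    """Strip tags from HTML lines and return the non-empty text lines."""
    parser = _TextOnly()
    parser.feed("\n".join(html_lines))
    parser.close()
    text = "".join(parser.chunks)
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical two-part message used only for this illustration.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

Hello plain

--b1
Content-Type: text/html; charset="utf-8"

<html><body><p>Hello <b>html</b></p></body></html>

--b1--
"""

print(part_lines(SAMPLE, "plain"))               # text/plain part only
print(html_to_text(part_lines(SAMPLE, "html")))  # text/html with tags removed
```

Selecting parts by MIME type, as above, avoids guessing at array positions,
but an engine that goes this route has to cope with charsets, nested
multiparts and HTML itself, which is the work the stripped accessor already
does inside SpamAssassin.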

I've written detailed info about the rule types here:
https://cwiki.apache.org/confluence/display/spamassassin/WritingRulesAdvanced

The PerMsgStatus docs are quite poor in this regard; I tried to describe it
a bit more in the current SVN versions.

Cheers,
Henrik

On Wed, Aug 21, 2019 at 03:12:22AM +0530, Shreyansh Shrivastava. wrote:
> Hey Kris, 
> Thanks for the pointer. Will try to accommodate both the sections.
> 
> Also, I found the answer. $pms->get_decoded_body_text_array() returns an array
> of strings where each string represents one newline-separated line of the
> body. Also, since the newline gets converted into  in text/html, the whole
> text/html part becomes the last element of the array. Using pop() on the array
> will leave you with only the text/plain part.
> 
> Thanks,
> Shreyansh Shrivastava
> 
> 
> On Wed, Aug 21, 2019 at 3:06 AM Kris Deugau <[1]kdeu...@vianet.ca> wrote:
> 
> Shreyansh Shrivastava. wrote:
> > I wanted to process only the text/plain part of the mail, hence I was
> > looking for a sub in SA. The closest I could get was
> > $pms->get_decoded_body_text_array(), which returns an array of strings
> > comprising both the text/plain and text/html parts of the mail.
> >
> > Is there any other way of retrieving the text/plain part only?
> 
> I can't really answer what you're asking, but I will point out that the
> text/plain part is often empty or at least different from the text/html
> part - on both spam and ham.  Looking only at the text/html would be
> slightly better, but using both would be better still.
> 
> The HTML formatting/structure itself is often valuable for spam signs
> too, on top of whatever readable text content it contains.
> 
> -kgd
> 
> 
> References:
> 
> [1] mailto:kdeu...@vianet.ca