On 09.03.2017 at 14:42, Axb wrote:
> On 03/09/2017 02:31 PM, mar...@mejor.pl wrote:
>> On 08.03.2017 at 17:30, Axb wrote:
>>> On 03/08/2017 04:55 PM, mar...@mejor.pl wrote:
>>>> On 08.03.2017 at 16:33, Axb wrote:
>>>>> On 03/08/2017 04:16 PM, mar...@mejor.pl wrote:
>>>>>> On 08.03.2017 at 16:06, Axb wrote:
>>>>>>> On 03/08/2017 03:58 PM, mar...@mejor.pl wrote:
>>>>>>>> On 08.03.2017 at 15:27, Axb wrote:
>>>>>>>>> As your command below shows you're using --reqpatlength 0
>>>>>>>>>
>>>>>>>>> Start off with something sane, for example --reqpatlength 40
>>>>>>>>>
>>>>>>>>> you may also want to play with --maxtextread
>>>>>>>>> (I use --maxtextread 8192 for FRAUD rules)
>>>>>>>>
>>>>>>>> But with --reqpatlength 10, 40, 100 or 1000 I've got no hits. Reading the help
>>>>>>>> ("--reqpatlength: required pattern length, in characters (default: 0)")
>>>>>>>> I understand that the pattern in a generated rule will be longer than
>>>>>>>> reqpatlength (shorter strings will be ignored). Do I correctly
>>>>>>>> understand how the parameter works?
>>>>>>>
>>>>>>> --reqpatlength 40 tells seekphrases to ignore any phrases which are
>>>>>>> smaller than 40 chars
>>>>>>>
>>>>>>> just checked by line which is using
>>>>>>> --reqpatlength 37
>>>>>>
>>>>>> With any value > 0, no rule is generated.
>>>>>>
>>>>>>> body __AXB_FRAUD_LAF076 /It has come to our attention that you /
>>>>>>> body __AXB_FRAUD_UPVTRT / in order to confirm your disbursement\./
>>>>>>> body __AXB_FRAUD_NOFUX2 / approval, your funds will be deposited directly into your /
>>>>>>> body __AXB_FRAUD_Z4ZZ7D / in order to accept your disbursement\./
>>>>>>> body __AXB_FRAUD_CUXJ6X / approval, your funds will be direct deposited into your /
>>>>>>> body __AXB_FRAUD_NHWXKL /: You Are Eligible to Receive Funds up to \$.,000\. /
>>>>>>>
>>>>>>> hard to guess what is not working on your side without full insight
>>>>>>
>>>>>> What can I do to help more?
>>>>>> Should I share all_w.h and all_w.s files?
>>>>>
>>>>> before we go that way pls answer these questions
>>>>>
>>>>> how many spams/hams are you processing?
>>>>
>>>> ham: ~1400
>>>> spam: ~8200
>>>>
>>>>> do you have a file named assemble.state ? if yes, how large?
>>>>
>>>> Yes, I've got this file, it is ~9MB in size.
>>>>
>>>>> and pls zip & send me the full script you're using to generate the
>>>>> rules, OFFLIST! do NOT post to list
>>>>
>>>> Ok, I'll choose tar.bz2 ;)
>>>> Thanks for help.
>>>
>>> replying on list as much as I can so it's archived FTR
>>>
>>> first thing I see is that your logs do not contain a list of rules which
>>> hit on each message.
>>>
>>> for example my "w.s" file has lines which look like:
>>>
>>> 53 /home/mc/Maildir/cur/1487823401.M695422P29583.ruler,S=7602,W=7747:2,
>>> ADVANCE_FEE_2_NEW_MONEY,ADVANCE_FEE_3_NEW,ADVANCE_FEE_3_NEW_MONEY,ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,AXB_XM2600,AXB_XMAILER_MIMEOLE_OL_024C2,CM_XRCVD_VOOZER4,DEAR_WINNER,FORGED_MUA_OUTLOOK,FROM_MISSPACED,FROM_MISSP_MSFT,FROM_MISSP_REPLYTO,FROM_MISSP_URI,FSL_419_FP1,FSL_CTYPE_WIN1251,FSL_MISSP_REPLYTO,FSL_NEW_HELO_USER,FSL_RCVD_USER,FSL_UA,FSL_XM_419,HK_NAME_MR_MRS,LOTS_OF_MONEY,LOTTO_DEPT,MONEY_FRAUD_3,MONEY_FRAUD_5,MONEY_FROM_MISSP,MSOE_MID_WRONG_CASE,NSL_RCVD_HELO_USER,TO_NO_BRKTS_FROM_MSSP,T_AXB_XM2600,T_BIG_HEADERS_5K,T_CM_XRCVD_VOOZER4,T_FSL_FREEMAIL_1,T_FSL_HELO_NON_FQDN_2,T_HK_MUCHMONEY,T_LOTTO_AGENT,T_SINGLE_HEADER_1K,T_TO_NO_BRKTS_MSFT,__419_FROM_SIG,__ADVANCE_FEE_2_NEW,__ADVANCE_FEE_2_NEW_MONEY,__ADVANCE_FEE_3_NEW,__ADVANCE_FEE_3_NEW_MONEY,__ADVANCE_FEE_4_NEW,__ADVANCE_FEE_4_NEW_MONEY,__ADVANCE_FEE_5_NEW,__ADVANCE_FEE_5_NEW_MONEY,__AFF_LOTTERY,__ANY_OUTLOOK_MUA,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__AXB_MO_OL_024C2,__AXB_MO_OL_D8ACC,__AXB_XM_OL_024C2,__AXB_XM_OL_080C4,__AXB_XM_OL_424A6,__AXB_XM_OL_B9D6C,__BOUNCE_RPATH_NULL,__CONGRADULAT,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CT_TEXT_PLAIN,__DOS_HAS_ANY_URI,__DOS_RCVD_THU,__DOS_RCVD_WED,__DOS_RELAYED_EXT,__FB_CONGRADS,__FH_HAS_XMSMAIL,__FH_HAS_XPRIORITY,__FORGED_OE,__FRAUD_DBI,__FRAUD_FCW,__FROM_FULL_NAME,__FROM_MISSPACED,__FROM_MISSP_REPLYTO,__FROM_MISSP_URI,__FROM_RUNON,__FSL_419_1,__FSL_419_2,__FSL_419_3,__FSL_419_4,__FSL_419_5,__FSL_HELO_USER_1,__FSL_HELO_USER_3,__FSL_UA_2,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_DATE,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MIMEOLE,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAS_XMAIL,__HAS_X_MAILER,__HK_NAME_MR_MRS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LOTSA_MONEY_04,__LOTTO_ADMITS,__LOTTO_ADMITS_1,__LOTTO_WIN_01,__MIMEOLE_MS,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MONEY_FRAUD,__MONEY_FRAUD_3,__MONEY_FRAUD_5,__MONEY_LOTTERY,__MSGID_OK_DIGITS,__MSOE_MID_WRONG_CASE,__M_NOTIFIC,__NAKED_TO,__NONEMPTY_BODY,__NO_INR_YES_REF,__OE_MUA,__RCVD_VIA_APNIC_E,__RCVD_VIA_ARIN_E,__RCVD_VIA_RIPE,__RCVD_VIA_RIPE_E,__RDNS_SHORT,__REPLYTO_EXISTS,__REPLY_FREEMAIL,__SANE_MSGID,__SARE_FRAUD_BARRISTER,__SINGLE_HEADER_1K,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TO_NO_BRKTS_FROM_MSSP,__TO_NO_BRKTS_FROM_RUNON,__TO_NO_BRKTS_MSFT,__TO_NO_BRKTS_NOTLIST,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_MAILTO,__XM_MSOE6,__XM_MS_IN_GENERAL,__XM_OUTLOOK_EXPRESS,__XPRIO,__YOU_WON,__YOU_WON_01,__YOU_WON_02,__YOU_WON_SOMTIN,__hk_million,__hk_win_1,__hk_win_5,__hk_win_6,__hk_win_b
>>>
>>> time=0,scantime=0,format=f,reuse=no,set=0
>>>
>>> so apparently your masschecker is not seeing rules.
>>>
>>> I don't use --cache & --cachedir (don't remember why) - for starters
>>> maybe remove
>>
>> I started without cache.
>>
>>> I have --cf='use_bayes 0' (speeds up processing) and make sure you use
>>> --cf='required_score 5'
>>>
>>> you'll have to play with your setup till your logs show SA rule hits.
>>
>> There are no SA rules because the parameter "-C=/dev/null" is set.
>>
>> I don't understand something.
>> Why do I need to check
>> mails-that-i-classified-as-spam-or-ham against rules? If I understand
>> how creating auto rules works, masscheck only dumps strings from ham and
>> spam.
>
> the routine is supposed to create rules based on msgs in your spam
> folder and needs the ham folder to counterweight against potential FPs
> so, for example, you don't start producing rules based on phrases in
> disclaimers.
>
> in the log, each line starts with Y/N and a score - not sure how
> necessary it is, I've always had it that way and it "works for me"
>
>> And next seek-phrases-in-log should create rules using found strings.
>> I'm using the script from svn with some changes in paths. So I assumed that
>> it should be more or less working :)
>
> a wise man once said: "to assume is not to know"
> why not try avoiding modifications till you get some useful results and
> then start doing mods, one at a time.
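
The idea Axb describes above (candidate phrases come from spam, ham acts as a counterweight against false positives, and --reqpatlength drops phrases that are too short) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how seek-phrases-in-log is actually implemented: the function names, the fixed word-window size and the `min_spam_hits` threshold are all hypothetical.

```python
from collections import Counter


def candidate_phrases(text, n_words):
    """Return the set of all runs of n_words consecutive words in text."""
    words = text.split()
    return {
        " ".join(words[i:i + n_words])
        for i in range(len(words) - n_words + 1)
    }


def select_phrases(spam_bodies, ham_bodies, n_words=6,
                   req_pat_length=40, min_spam_hits=2):
    """Pick phrases that recur in spam, never occur in ham, and are long enough.

    req_pat_length mirrors the role of --reqpatlength as described in the
    thread: phrases shorter than this many characters are ignored.
    All other parameters are assumptions made for this sketch.
    """
    # Ham is the counterweight: any phrase seen in ham (for example a
    # disclaimer) is excluded to avoid false positives.
    ham_seen = set()
    for body in ham_bodies:
        ham_seen |= candidate_phrases(body, n_words)

    # Count in how many spam messages each phrase occurs (once per message).
    spam_hits = Counter()
    for body in spam_bodies:
        spam_hits.update(candidate_phrases(body, n_words))

    return sorted(
        phrase
        for phrase, hits in spam_hits.items()
        if hits >= min_spam_hits
        and len(phrase) >= req_pat_length
        and phrase not in ham_seen
    )
```

In this sketch a threshold of 0 keeps every recurring spam-only phrase, while a threshold longer than every candidate yields an empty result. It says nothing about why the repository script returned no rules for non-zero values.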
I only modified the "run" script; the other Perl scripts are untouched.

>> Btw, I removed -C=/dev/null; rule hits are now in the logs, but
>> seek-phrases-in-log still returns no rules if I set --reqpatlength to a
>> non-zero value.
>
> I have no idea.
> I'll send you a modified seek-phrases-in-log (offlist) for you to try...

I've got two pieces of news, good and bad. The good news is that your version of the script works! The bad news is that the script in the official repo doesn't work. Bugzilla?

Thanks