updateDNS.sh on sa-vm1.apache.org - DNS updates disabled

2017-11-08 Thread noreply

3.3.3.updates (TXT) -> "1814560"

File /usr/local/bin/updateDNS.disabled exists, not updating DNS.


Re: Eureka: truncation of 72_active.cf

2017-11-08 Thread Dave Jones

On 11/08/2017 02:52 AM, Merijn van den Kroonenberg wrote:

On 11/07/2017 01:07 PM, Merijn van den Kroonenberg wrote:

On 11/07/2017 10:24 AM, Merijn van den Kroonenberg wrote:

Merijn,

I patched the generate-new-scores.sh locally on sa-vm1 using your patch
file with a slight adjustment.  I changed the copied file name to
"72_active_before_grep.cf" just to make it a little more obvious.  We
will see how it looks tomorrow in the tmp working area on sa-vm1 and I
will reply with the results.


I looked at today's tmp/generate-new-scores/trunk-new-rules-set0 and I do
not see this patch applied. Can you even make uncommitted changes to the
masses code, or will they just be removed as a fresh checkout is done each
time?



It was set up, but I too noticed that it didn't create the
72_active_before_grep.cf and then forgot to follow up on that.  I will
check on it later this evening to get this in place before the next run
in about 10 hours.



Ok thanks, it's no longer high priority; I think I can do without it for
debugging now. But in the longer run we can use it to decide if we can get
rid of the grep statement altogether (if 72_active_before_grep.cf always
equals 72_active.cf then the grep has no use).

I just brought it up because I wondered where that code went, but guess
you removed it when it didn't work.




Well...  My local patch is still in place.  This is lines 195-206 of the
current trunk/masses/rule-update-score-gen/generate-new-scores.sh:


What is the full path? The question is in which working copy you applied
this change (what is it used for), because it isn't in the working
directories used for score generation. That's
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set0
(each set has its own fully checked-out working copy).
In generate-new-scores.sh the working copies are created completely from
scratch each night (old one removed, new one checked out). So no uncommitted
code can be used in the score generation part.



/usr/local/spamassassin/automc/svn/trunk/build/mkupdates/do-stable-update-with-scores

should be calling

/usr/local/spamassassin/automc/svn/masses/rule-update-score-gen/do-nightly-rescore-example.sh

which finally calls the locally modified, uncommitted script that is 
intended to create our 72_active_before_grep.cf, but it doesn't, so 
something is definitely not as I expected:


/usr/local/spamassassin/automc/svn/masses/rule-update-score-gen/generate-new-scores.sh

All of those scripts should be checking out fresh copies of everything 
they work with into


/usr/local/spamassassin/automc/tmp




=== Line 195 ===
date
echo "[ generating active ruleset via make ]"

perl Makefile.PL < /dev/null || exit $?
make > make.out 2>&1 || exit $?

# temp debug copy (investigate truncation issue)
cp rules/72_active.cf 72_active_before_grep.cf

# strip scores from new rules so that the garescorer can set them
grep -v ^score rules/72_active.cf > rules/72_active.cf-scoreless
mv -f rules/72_active.cf-scoreless rules/72_active.cf
=== Line 206 ===

So the 72_active_before_grep.cf should exist in the tmp area the past
several nights.  Hmmm.
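
Once the debug copy does show up, the question of whether the grep step changes anything could be answered with a small comparison. This is only a sketch: the function name is hypothetical, and it assumes it is run in the working copy right after the lines shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of generate-new-scores.sh).
# $1 = copy taken before the grep, $2 = file left behind after the grep.
# Returns 0 if both files are byte-identical, 1 otherwise.
grep_step_is_noop() {
    before=$1
    after=$2
    if cmp -s "$before" "$after"; then
        echo "identical: grep removed nothing"
        return 0
    fi
    # Count the score lines present in the copy taken before the grep.
    removed=$(grep -c '^score' "$before")
    echo "different: $removed score line(s) in $before"
    return 1
}
```

For example: grep_step_is_noop 72_active_before_grep.cf rules/72_active.cf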

I updated the do-stable-update-with-scores cron entry to write its output
to a log file, so in a few hours we can see if we get some helpful logging.


Okay now I see, all code up to the generate-new-scores.sh script itself is
run from:
/usr/local/spamassassin/automc/svn/
but everything called inside generate-new-scores.sh runs from
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set0
and
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set1

So I guess /usr/local/spamassassin/automc/svn/ is updated to latest, but
with the svn update command local changes are not overwritten so
everything in there you can test without committing.



That's what I think too.


Everything run from generate-new-scores.sh however is freshly checked out
so it can only run code which has been committed.



That's how it's worked in the past but I could be wrong.  My memory 
isn't what it used to be.  :)
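
A quick way to see which of the two kinds of tree one is looking at is to filter the output of svn status for locally modified files. A minimal sketch, assuming the standard svn status layout where the first column is 'M' for a modified item; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads `svn status` output on stdin and prints only
# the lines for locally modified items (first column 'M').
# Exit status is 0 if at least one modified item was found, 1 otherwise.
list_local_mods() {
    grep '^M'
}
```

For example: svn status /usr/local/spamassassin/automc/svn | list_local_mods
A tree that keeps an uncommitted patch across svn update would list the patched script here; a freshly checked-out tree would print nothing.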



I guess this might have frustrated some of your debugging in the past :(



Dave








Re: Mailserver at 52.169.9.191

2017-11-08 Thread Dave Jones

On 11/08/2017 01:45 PM, Kevin A. McGrail wrote:

On 11/8/2017 10:02 AM, Dave Jones wrote:


Dave, with rules how many versions do we publish now?  Just one with 
a cname for a few versions?  Which versions?  Sorry, I can't figure 
out how to get into PowerDNS to check!




http://svn.apache.org/viewvc/spamassassin/dns/spamassassin.org?view=markup 



It appears that sa-update versions 3.4.1 and above support the CNAME.


Thanks, that's what I thought.  Everything older hasn't had new versions 
in years.  I wonder why sa-updates are still being downloaded for them.  
We might consider removing the DNS entries. Or just not care as it's so 
minor.



Regards,
KAM



I think there are old "zombie" scripts out there trying to curl fetch 
updates outside of using sa-update.  Maybe they are trying to fetch them 
into their own copy/repo or something?  I don't know for sure.


Should we remove DNS entries for all "ancient" versions?  What versions 
do we officially support currently with ruleset generation?  We are only 
testing new rulesets against 3.4.x.  Should we remove the DNS records 
for anything older than 3.3.0?  That is the oldest TXT record for modern 
rulesets in 2017.
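
For checking individual records by hand, the name to query can be built from the version number. This is a sketch under the assumption that sa-update looks up a TXT record under the reversed version number in the default channel updates.spamassassin.org; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the DNS name for a given SpamAssassin version.
# $1 = version such as 3.4.1
# $2 = channel (assumed default: updates.spamassassin.org)
update_dns_name() {
    version=$1
    channel=${2:-updates.spamassassin.org}
    # Reverse the dot-separated components: 3.4.1 becomes 1.4.3
    reversed=$(printf '%s\n' "$version" |
        awk -F. '{ for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? "." : ""); print "" }')
    printf '%s.%s\n' "$reversed" "$channel"
}
```

For example: dig +short TXT "$(update_dns_name 3.3.0)"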


Dave



Re: Mailserver at 52.169.9.191

2017-11-08 Thread Kevin A. McGrail

On 11/8/2017 10:02 AM, Dave Jones wrote:


Dave, with rules how many versions do we publish now?  Just one with 
a cname for a few versions?  Which versions?  Sorry, I can't figure 
out how to get into PowerDNS to check!




http://svn.apache.org/viewvc/spamassassin/dns/spamassassin.org?view=markup 



It appears that sa-update versions 3.4.1 and above support the CNAME.


Thanks, that's what I thought.  Everything older hasn't had new versions 
in years.  I wonder why sa-updates are still being downloaded for them.  
We might consider removing the DNS entries. Or just not care as it's so 
minor.



Regards,
KAM



Re: 72_scores.cf compared to the one from march 15

2017-11-08 Thread Merijn van den Kroonenberg
> Hi,
>
> When I compare the current 72_scores.cf with the one from March 15 I can see
> we are getting closer and closer.
> The March one has 144 lines and the current one has 108.

I have been looking at this and by backtracking I see the lock-scores
script which has a definite impact on ranges.data and by that on which
rules are used by the garescorer.

Looking at the script, I remembered I already had a note about it.
It's also in
rulesrc/sandbox/dos/new-rule-score-gen/lock-scores
which has been changed compared to the
masses/rule-update-score-gen/lock-scores
version which we use now.
The changes seem to be related to assigning ranges to rules if they have
scores defined in the sandboxes.

I think it's likely this was also running in production in March. So I
would like to see what happens if these changes are ported to
masses/rule-update-score-gen/lock-scores (they must be committed to svn
for testing).

When I have some time I want to write up which rules are considered for
score generation and what happens if scores are not generated for rules.
We probably need to have a good look at what the intention should be,
after we have updates running again.

>
> When looking at the rules which are missing, then one case stands out
> clearly:
> All rules in the march version with a score like this:
> 1.000 1.000 1.000 1.000
> Are missing from our current 72_scores.cf
> [edit: they all seem to be in active.list with a tflags publish]
>
> I will see if I can find where they get lost ;)
>
> One other rule which is still missing is RP_MATCHES_RCVD, which I could
> imagine being used in custom meta rules.
>
> So I compiled a list of all rules in the March 72_scores.cf which are not
> in our current one:
>
> AC_SPAMMY_URI_PATTERNS1
> AC_SPAMMY_URI_PATTERNS10
> AC_SPAMMY_URI_PATTERNS11
> AC_SPAMMY_URI_PATTERNS12
> AC_SPAMMY_URI_PATTERNS2
> AC_SPAMMY_URI_PATTERNS3
> AC_SPAMMY_URI_PATTERNS4
> AC_SPAMMY_URI_PATTERNS8
> AC_SPAMMY_URI_PATTERNS9
> AXB_XMAILER_MIMEOLE_OL_1ECD5
> AXB_XM_FORGED_OL2600
> BODY_EMPTY
> CANT_SEE_AD
> CN_B2B_SPAMMER
> COMMENT_GIBBERISH
> ENCRYPTED_MESSAGE
> FORM_LOW_CONTRAST
> FOUND_YOU
> FREEMAIL_DOC_PDF_BCC
> FROM_WORDY_SHORT
> FSL_HELO_BARE_IP_2
> GOOGLE_DOCS_PHISH
> GOOGLE_DOCS_PHISH_MANY
> GOOG_MALWARE_DNLD
> HDRS_LCASE
> HEXHASH_WORD
> HK_SCAM_N15
> HTML_OFF_PAGE
> LIST_PRTL_PUMPDUMP
> LIST_PRTL_SAME_USER
> LOTTO_AGENT
> LOTTO_DEPT
> LUCRATIVE
> MIME_NO_TEXT
> MONEY_LOTTERY
> MSGID_NOFQDN1
> MSM_PRIO_REPTO
> PHP_NOVER_MUA
> PHP_ORIG_SCRIPT
> PHP_SCRIPT_MUA
> PP_TOO_MUCH_UNICODE02
> PP_TOO_MUCH_UNICODE05
> PUMPDUMP
> PUMPDUMP_MULTI
> RAND_HEADER_MANY
> RP_MATCHES_RCVD
> SHARE_50_50
> SPOOFED_FREEM_REPTO_CHN
> STOCK_LOW_CONTRAST
> STOCK_TIP
> SYSADMIN
> TO_NO_BRKTS_PCNT
> TW_GIBBERISH_MANY
> UC_GIBBERISH_OBFU
> URI_DATA
> URI_OPTOUT_3LD
> XPRIO_SHORT_SUBJ
>
> That is 57 rules, more than the difference in rule count. This means there
> are also many rules in our current 72_scores.cf which are not in the March
> version.
>
> Can someone explain to me why or in which cases rules are added or removed
> from the 72_scores.cf?
>
> What I already know:
> 1) during rule promotion rules are added to/removed from active.list, which
> in turn will add/remove them from 72_scores.cf


2) when the hit rate in the corpus falls below 0.01% they are removed too, it
seems. So this also depends on absolute corpus size. In this case they get
the default score (which also sounds weird to me).

>
> A few from the above list of rules can be tracked to active.list changes
> (rule promotions) between then and now. But most are still in active.list.
>
> Cheers,
> Merijn
>
>
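
A list like the one above can be regenerated for any two versions of the file. A minimal sketch, assuming each scored rule appears on a line of the form "score RULE_NAME value ..." in 72_scores.cf; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for comparing two 72_scores.cf files.

# Print the sorted, unique rule names that have a score line in file $1.
rule_names() {
    awk '$1 == "score" { print $2 }' "$1" | LC_ALL=C sort -u
}

# Print rules scored in the old file ($1) but not in the new file ($2).
missing_rules() {
    old_list=$(mktemp) || return 1
    new_list=$(mktemp) || { rm -f "$old_list"; return 1; }
    rule_names "$1" > "$old_list"
    rule_names "$2" > "$new_list"
    # comm -23 keeps only the lines unique to the first (old) list.
    LC_ALL=C comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

For example: missing_rules 72_scores.cf.march 72_scores.cf | wc -l
Swapping the two arguments gives the rules that are new in the current file.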




Re: Mailserver at 52.169.9.191

2017-11-08 Thread Dave Jones

On 11/08/2017 08:45 AM, Kevin A. McGrail wrote:




I’m very much in favor of aggregating and analyzing data, that’s 
basically what we do at dnswl.org :) Having said 
that, I usually don’t see that much load on our sa-update mirror, just 
a bit of bandwidth being used.




Well, nice sleuthing to figure out the culprit's company.  I am 
considering the issue closed with MS, and they will never let us know the 
outcome due to privacy.


How can we aggregate the logs?  We have a free Apache G Suite instance, 
which I'll mention in case that sparks some ideas.  But perhaps even if I 
just agreed to send my logs to you, it would be cool to see some stats on it.


To compare, here are the same commands on a few weeks of logs on my 
server. Really surprising to see those old versions:


  26592 3.3.2"
  12120 3.3.1"
   4531 3.1.8"
   3073 3.4.1"
   1506 3.2.5"
    561 3.4.0"
    470 3.2.1"
    464 3.2.4"
    107 3.2.3"
    104 3.1.7"
     53 3.3.0"
     19 3.2.0"
     18 3.2.2"
      2 3.1.9"

Dave, with rules how many versions do we publish now?  Just one with a 
cname for a few versions?  Which versions?  Sorry, I can't figure out 
how to get into PowerDNS to check!




http://svn.apache.org/viewvc/spamassassin/dns/spamassassin.org?view=markup

It appears that sa-update versions 3.4.1 and above support the CNAME.


BTW, Matthias, can you subscribe to this list as a mirror operator, 
please? Just email sysadmins-subscr...@spamassassin.apache.org, please?



Regards,

KAM






Re: Mailserver at 52.169.9.191

2017-11-08 Thread Kevin A. McGrail




I’m very much in favor of aggregating and analyzing data, that’s 
basically what we do at dnswl.org :) Having said 
that, I usually don’t see that much load on our sa-update mirror, just 
a bit of bandwidth being used.




Well, nice sleuthing to figure out the culprit's company.  I am 
considering the issue closed with MS, and they will never let us know the 
outcome due to privacy.


How can we aggregate the logs?  We have a free Apache G Suite instance, 
which I'll mention in case that sparks some ideas.  But perhaps even if I 
just agreed to send my logs to you, it would be cool to see some stats on it.


To compare, here are the same commands on a few weeks of logs on my 
server. Really surprising to see those old versions:


  26592 3.3.2"
  12120 3.3.1"
   4531 3.1.8"
   3073 3.4.1"
   1506 3.2.5"
    561 3.4.0"
    470 3.2.1"
    464 3.2.4"
    107 3.2.3"
    104 3.1.7"
     53 3.3.0"
     19 3.2.0"
     18 3.2.2"
      2 3.1.9"
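
The commands behind these counts are not shown in the thread. One rough way to get a similar tally is sketched below, assuming an access log in which sa-update identifies itself with a User-Agent of the form sa-update/svnNNNNNN/3.3.2 (an assumption about the log contents); the function name and log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: tally sa-update client versions in access logs.
# Arguments: one or more log files (reads stdin if none are given).
count_sa_update_versions() {
    # Pull out just the User-Agent token, e.g. sa-update/svn917659/3.3.2
    grep -o 'sa-update/[^/" ]*/[0-9][0-9.]*' "$@" |
        # Keep the last slash-separated field, the SpamAssassin version.
        awk -F/ '{ print $NF }' |
        # Count each version and list the most frequent first.
        sort | uniq -c | sort -rn
}
```

For example: count_sa_update_versions /var/log/apache2/access.log
Unlike the listing above, this prints the version without a trailing quote character.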

Dave, with rules how many versions do we publish now?  Just one with a 
cname for a few versions?  Which versions?  Sorry, I can't figure out 
how to get into PowerDNS to check!


BTW, Matthias, can you subscribe to this list as a mirror operator, 
please? Just email sysadmins-subscr...@spamassassin.apache.org, please?



Regards,

KAM



SpamAssassin SysAdmins List - Please Join

2017-11-08 Thread Kevin A. McGrail

Hi All,

As mirror operators for SpamAssassin, would you mind joining the Apache 
SpamAssassin SysAdmins list by emailing 
sysadmins-subscr...@spamassassin.apache.org, please?


Regards,

KAM



Re: Eureka: truncation of 72_active.cf

2017-11-08 Thread Merijn van den Kroonenberg
> On 11/07/2017 01:07 PM, Merijn van den Kroonenberg wrote:
>>> On 11/07/2017 10:24 AM, Merijn van den Kroonenberg wrote:
> Merijn,
>
> I patched the generate-new-scores.sh locally on sa-vm1 using your patch
> file with a slight adjustment.  I changed the copied file name to
> "72_active_before_grep.cf" just to make it a little more obvious.  We
> will see how it looks tomorrow in the tmp working area on sa-vm1 and I
> will reply with the results.

 I looked at today's tmp/generate-new-scores/trunk-new-rules-set0 and I do
 not see this patch applied. Can you even make uncommitted changes to the
 masses code, or will they just be removed as a fresh checkout is done each
 time?

>>>
>>> It was set up, but I too noticed that it didn't create the
>>> 72_active_before_grep.cf and then forgot to follow up on that.  I will
>>> check on it later this evening to get this in place before the next run
>>> in about 10 hours.
>>>
>>
>> Ok thanks, it's no longer high priority; I think I can do without it for
>> debugging now. But in the longer run we can use it to decide if we can get
>> rid of the grep statement altogether (if 72_active_before_grep.cf always
>> equals 72_active.cf then the grep has no use).
>>
>> I just brought it up because I wondered where that code went, but guess
>> you removed it when it didn't work.
>>
>>
>
> Well...  My local patch is still in place.  This is lines 195-206 of the
> current trunk/masses/rule-update-score-gen/generate-new-scores.sh:

What is the full path? The question is in which working copy you applied
this change (what is it used for), because it isn't in the working
directories used for score generation. That's
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set0
(each set has its own fully checked-out working copy).
In generate-new-scores.sh the working copies are created completely from
scratch each night (old one removed, new one checked out). So no uncommitted
code can be used in the score generation part.

>
> === Line 195 ===
> date
> echo "[ generating active ruleset via make ]"
>
> perl Makefile.PL < /dev/null || exit $?
> make > make.out 2>&1 || exit $?
>
> # temp debug copy (investigate truncation issue)
> cp rules/72_active.cf 72_active_before_grep.cf
>
> # strip scores from new rules so that the garescorer can set them
> grep -v ^score rules/72_active.cf > rules/72_active.cf-scoreless
> mv -f rules/72_active.cf-scoreless rules/72_active.cf
> === Line 206 ===
>
> So the 72_active_before_grep.cf should exist in the tmp area the past
> several nights.  Hmmm.
>
> I updated the do-stable-update-with-scores cron entry to write its output
> to a log file, so in a few hours we can see if we get some helpful logging.

Okay now I see, all code up to the generate-new-scores.sh script itself is
run from:
/usr/local/spamassassin/automc/svn/
but everything called inside generate-new-scores.sh runs from
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set0
and
/usr/local/spamassassin/automc/tmp/generate-new-scores/trunk-new-rules-set1

So I guess /usr/local/spamassassin/automc/svn/ is updated to latest, but
with the svn update command local changes are not overwritten so
everything in there you can test without committing.

Everything run from generate-new-scores.sh however is freshly checked out
so it can only run code which has been committed.

I guess this might have frustrated some of your debugging in the past :(

>
> Dave
>




Cron <automc@sa-vm1> ~/svn/trunk/build/mkupdates/run_nightly | /usr/bin/tee /var/www/automc.spamassassin.org/mkupdates/mkupdates.txt

2017-11-08 Thread Cron Daemon
+ promote_active_rules
+ pwd
/usr/local/spamassassin/automc/svn/trunk
+ /usr/bin/perl build/mkupdates/listpromotable
HTTP get: http://ruleqa.spamassassin.org/1-days-ago?xml=1
HTTP get: http://ruleqa.spamassassin.org/2-days-ago?xml=1
no 'mcviewing', 'mcsubmitters' microformats on day 2
URL: http://ruleqa.spamassassin.org/2-days-ago?xml=1
+ exit 25