> Merijn,
>
> I patched the generate-new-scores.sh locally on sa-vm1 using your patch file with a slight adjustment. I changed the copied file name to "72_active_before_grep.cf" just to make it a little more obvious. We will see how it looks tomorrow in the tmp working area on sa-vm1, and I will reply with the results.

Nice, it might be good to have this extra debugging info available for now. This request was made before I knew about the language lines, so hopefully we won't really need it now.

>
> I am not seeing how the 72_active.cf file is generated before line 200 of generate-new-scores.sh. I was going to go back a few days before Kevin's commit to remove the other languages from MILLION_USD to perform the same grep that you did. I wanted to do this on the sa-vm1 server to figure out how to fix the grep properly so Kevin can put back those other languages.

Check Bugzilla bug 7497; I put in instructions on how to reproduce/test:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7497

>
> Dave
>
> From: Merijn van den Kroonenberg <mer...@web2all.nl>
> Sent: Wednesday, November 1, 2017 8:52 AM
> To: David Jones
> Subject: Eureka: truncation of 72_active.cf
>
> Hi David,
>
> After backtracking the scripts while verifying generated files, I finally came very close to the cause of the truncation issue.
>
> The reason the scoreset files and 72_scores.cf are truncated after MILLION_USD is that rules/72_active.cf is truncated.
>
> This file is generated by build/mkrules (triggered from Makefile.pl).
>
> It ends like this:
>
> body MILLION_USD /Million\b.{0,40}\b(?:United States? Dollars?|USD)/i
> describe MILLION_USD Talks about millions of dollars
> #score MILLION_USD 2
> Binary file rules/72_active.cf matches
>
> As you can see, this last line is a typical grep message, so it was not hard to track down the script causing this:
>
> ./masses/rule-update-score-gen/generate-new-scores.sh:202:grep -v ^score rules/72_active.cf > rules/72_active.cf-scoreless
> ./masses/rule-update-score-gen/generate-new-scores.sh:203:mv -f rules/72_active.cf-scoreless rules/72_active.cf
>
> My theory is that grep encounters too many non-text characters in rules/72_active.cf, so it decides it's a binary file after all and stops grepping the rest of the file.
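If the theory holds, a likely mechanism is GNU grep's binary-file heuristic. In a UTF-8 locale, recent GNU grep releases print matching lines until they reach a line containing a byte sequence that is invalid in that locale. At that point they emit "Binary file ... matches" and suppress the rest of the output, which fits the tail shown above. The following is a minimal sketch of a drop-in replacement for lines 202-203, assuming that is the cause. It forces text mode with `-a` and byte-wise matching with `LC_ALL=C`. The function name `strip_scores` is hypothetical and is not part of generate-new-scores.sh.

```shell
#!/bin/sh
# Sketch of a replacement for lines 202-203 of generate-new-scores.sh,
# ASSUMING the truncation comes from grep's binary-file detection.
# strip_scores is a hypothetical name used only for illustration.
strip_scores() {
    f="$1"   # e.g. rules/72_active.cf
    # -a        treat the file as text, so grep never stops early with
    #           "Binary file ... matches"
    # LC_ALL=C  compare raw bytes instead of decoding them as UTF-8
    # -v        keep every line that does NOT start with "score"
    LC_ALL=C grep -a -v '^score' "$f" > "$f-scoreless"
    # grep exits 1 when no line is selected (not fatal); 2 is a real error.
    [ $? -le 1 ] || return 1
    mv -f "$f-scoreless" "$f"
}
```

Calling `strip_scores rules/72_active.cf` in place of the two original lines should keep every non-`score` line, including those after MILLION_USD. Passing `-a` rather than relying on `LC_ALL=C` alone also covers the case where a NUL byte ends up in the file, because GNU grep treats NUL bytes as a binary indicator in every locale.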
>
> As you can see in ./masses/rule-update-score-gen/generate-new-scores.sh, the original 72_active.cf is overwritten, so I cannot see what is actually in there that causes grep to panic.
>
> I think if we patch the scripts to make a copy right after mkrules runs, we will be able to see or test why grep chokes.
>
> I attached a patch file with a proposal for debugging.
>
> We are very close to the problem now, I think.
>
> Kind regards,
>
> Merijn van den Kroonenberg
>
> Web2All B.V.
> Gulickstraat 17
> 5931 LA Tegelen
> Tel. +31 475 775511
> Fax. +31 475 338290
>
> mer...@web2all.nl | www.web2all.nl
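The copy step proposed above could look roughly like the hypothetical helper below. `snapshot_active_cf` is an illustrative name, and the attached patch itself is not reproduced in this thread. The helper saves rules/72_active.cf as 72_active_before_grep.cf, the name used on sa-vm1. It then lists every line containing a byte outside printable ASCII and whitespace. This is a coarse filter, because it also flags valid UTF-8 text in other languages. Any line that could push grep into binary mode (invalid UTF-8, control bytes, NUL) will still be among the lines it reports.

```shell
#!/bin/sh
# Hypothetical debugging helper; the name and placement are assumptions,
# not the contents of the patch attached to this thread.
snapshot_active_cf() {
    src="$1"                           # e.g. rules/72_active.cf
    copy="${src%.cf}_before_grep.cf"   # e.g. rules/72_active_before_grep.cf

    # Keep the mkrules output before generate-new-scores.sh overwrites it.
    cp -f "$src" "$copy" || return 1

    # Print "LINE:TEXT" for every line holding a byte outside printable
    # ASCII and whitespace (non-ASCII text, control bytes, NUL).
    #   LC_ALL=C  match raw bytes, so no byte counts as an encoding error
    #   -a        never fall back to "Binary file ... matches"
    #   -n        prefix each reported line with its line number
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$copy"
    # 0 = suspicious lines found, 1 = none found, 2 = grep error.
    [ $? -le 1 ]
}
```

Run just before the `grep -v ^score` step at line 202 of generate-new-scores.sh, `snapshot_active_cf rules/72_active.cf` would leave the copy in place for later inspection and print the numbered lines worth examining first.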