[RD] Bigevil update info, and more help request.

Chris Santerre 20 Jul 2004 21:28:07 -0000

(Scott, just letting you know in case you want to improve your already great
script.)


I just posted another update and noticed one key thing: there were no ?: in
the regex groups! As in

(?:a|b|c|d)

OUCH! I added them in, so that should definitely speed things up for people!
I'm actually running BE again on my server after a small memory upgrade and
this tweak.
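For anyone wondering what ?: changes: in Perl (which SpamAssassin uses) and in Python's re module, (...) records what it matched, while (?:...) matches exactly the same text without recording anything. A quick Python illustration using the homegain example from this post; how much faster this makes a large rule set is not something this snippet measures.

```python
import re

# Capturing group: the engine records which alternative matched.
capturing = re.compile(r"\bhomeg(ain\.com|un\.com)\b", re.IGNORECASE)
# Non-capturing group (?:...): same matches, nothing recorded.
non_capturing = re.compile(r"\bhomeg(?:ain\.com|un\.com)\b", re.IGNORECASE)

text = "visit HomeGain.com today"
print(capturing.groups, non_capturing.groups)   # 1 0
print(capturing.search(text).groups())          # ('ain.com',)
print(non_capturing.search(text).groups())      # ()
# Both patterns match the same span of text.
print(capturing.search(text).group(0) == non_capturing.search(text).group(0))  # True
```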

Now, for the heck of it, I pulled all the "\.com"s out of the file. That
alone accounted for 100k. What I'm looking for is a script to fix this: check
every line starting with URI, and if the only TLD on that line is .com, change
that rule so that a single \.com shows up at the end. Does anyone understand
what I mean AND know how to do this? Or, even more advanced, group the whole
line of regex by ending TLD instead of alphabetical order.
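A minimal sketch of that per-line rewrite, in Python. It assumes each rule is one line shaped like uri NAME /\bPREFIX(?:alt1|alt2|...)\b/i with a single flat (non-nested) group; the rule name BE_HOMEG and the helper names are made up for illustration. Lines that don't fit that shape are returned unchanged. Grouping the alternatives by their last label covers both requests: when every alternative ends in \.com, the group collapses to one trailing \.com.

```python
import re

# Hypothetical rule shape: uri NAME /\bPREFIX(?:alt|alt|...)\b/i
# with exactly one flat (non-nested) group. Anything else is left alone.
RULE_RE = re.compile(
    r"^(uri\s+\S+\s+/\\b)([^(/]*)\(\?:([^()]*)\)(\\b/i)\s*$", re.IGNORECASE
)


def regroup_by_tld(alternatives):
    """Group escaped alternatives such as ain\\.com by their final label (the TLD)."""
    by_tld = {}
    for alt in alternatives:
        stem, sep, tld = alt.rpartition(r"\.")
        if not sep:
            return None  # no escaped dot: not something we know how to regroup
        by_tld.setdefault(tld, []).append(stem)
    parts = []
    for tld, stems in sorted(by_tld.items()):
        body = stems[0] if len(stems) == 1 else "(?:" + "|".join(stems) + ")"
        parts.append(body + r"\." + tld)
    # One TLD collapses to a single trailing \.tld; several become an alternation.
    return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"


def rewrite_line(line):
    """Rewrite one rule line, or return it untouched if it doesn't match the shape."""
    m = RULE_RE.match(line)
    if not m:
        return line
    grouped = regroup_by_tld(m.group(3).split("|"))
    if grouped is None:
        return line
    return m.group(1) + m.group(2) + grouped + m.group(4)
```

For example, rewrite_line(r"uri BE_HOMEG /\bhomeg(?:ain\.com|un\.com)\b/i") gives uri BE_HOMEG /\bhomeg(?:ain|un)\.com\b/i, and a mixed line comes back grouped by TLD. When processing a file, strip the trailing newline before calling it and add it back afterwards.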

The code we are using to make the rules only works one step deep. I wish I
could make it go 2-4 steps. What do I mean? It could have written these in a
more streamlined way:

/\bhomeg(?:ain\.com|ain\.biz|ain\.net|un\.com)\b/i

It should have SUB sections, but our parser is only one level deep :(

/\bhomel(?:oanace\.com|andunited\.com|anddefensejournal\.com|anddefenseradio\.com|andsecurityresearch\.com|ead\.net|essprelates\.com|essteens\.com)\b/i

Could be written

/\bhomel(?:oanace\.com|and(?:united|defensejournal|defenseradio|securityresearch)\.com|e(?:ad\.net|ss(?:prelates|teens)\.com))\b/i
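Multi-level factoring like this can be done recursively. Below is an independent Python sketch, not a port of Scott's prefixStringFactor.ml: it strips the longest common prefix, then the longest common suffix, then branches on the first character and recurses on each branch. It assumes the input is a list of plain lowercase domain names, and that the caller adds the surrounding \b...\b/i; the function names are made up for this example.

```python
import os
import re
from itertools import groupby


def _common_suffix(words):
    """Longest string that every word ends with."""
    return os.path.commonprefix([w[::-1] for w in words])[::-1]


def _group(words):
    """Regex for the remainders left after removing a shared prefix or suffix."""
    if "" in words:
        rest = [w for w in words if w]
        if not rest:
            return ""
        # An empty remainder means the rest of the match is optional.
        return "(?:" + factor(rest) + ")?"
    return factor(words)


def factor(words):
    """Build a nested regex matching exactly the given literal strings."""
    words = sorted(set(words))
    if len(words) == 1:
        return re.escape(words[0])
    prefix = os.path.commonprefix(words)
    if prefix:
        return re.escape(prefix) + _group([w[len(prefix):] for w in words])
    suffix = _common_suffix(words)
    if suffix:
        return _group([w[: len(w) - len(suffix)] for w in words]) + re.escape(suffix)
    # No shared prefix or suffix: branch on the first character, recurse per branch.
    branches = [factor(list(g)) for _, g in groupby(words, key=lambda w: w[:1])]
    return "(?:" + "|".join(branches) + ")"


if __name__ == "__main__":
    print(factor(["homegain.com", "homegain.biz", "homegain.net", "homegun.com"]))
    # prints homeg(?:ain\.(?:biz|com|net)|un\.com)
```

Sorting first keeps the output deterministic from run to run. Because it works character by character, it may also factor shared letters inside words, so its output can look different from a hand-written regex even when both match the same set of domains.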

I bet the file could be half the size it is now, but I don't have the
scripting experience to do this. So anyone who can improve the logic would be
a great help.

And I still can't thank Scott enough for giving me this script in the first
place. Without it, BigEvil would have died when ws.surbl.org started.
Thanks Scott!

Scott's source:
http://www.cs.rice.edu/~scrosby/datamining/src/prefixStringFactor/prefixStringFactor.ml

Chris Santerre 
System Admin and SARE Ninja
http://www.rulesemporium.com
http://www.surbl.org
'It is not the strongest of the species that survives,
not the most intelligent, but the one most responsive to change.'
Charles Darwin
