(Scott, just letting you know in case you want to improve your already great script.)
I just posted another update and noticed one key thing: there were no ?: in the regex! As in (?:a|b|c|d). OUCH! I added them in, so that should definitely speed things up for people! I'm actually running BE again on my server after a small memory upgrade and this tweak.

Now, for the heck of it, I pulled out all the "\.com"s from the file. It was 100k. What I'm looking for is a script to try to fix this: check every line starting with URI, and if the only TLD that line includes is .com, change that rule so only one \.com shows up at the end. Does anyone understand what I mean AND know how to do this? Or, even more advanced, group the whole line of regex by ending TLD instead of alpha order.

The code we are using to make the rule is like one step. I wish I could make it 2-4 steps. What do I mean? It could have written these more streamlined:

/\bhomeg(?:ain\.com|ain\.biz|ain\.net|un\.com)\b/i

It should have SUB sections, but our parser is only one level deep :(

/\bhomel(?:oanace\.com|andunited\.com|anddefensejournal\.com|anddefenseradio\.com|andsecurityresearch\.com|ead\.net|essprelates\.com|essteens\.com)\b/i

Could be written

/\bhomel(?:oanace\.com|and(?:united|defensejournal|defenseradio|securityresearch)\.com|e(?:ad\.net|ss(?:prelates|teens)\.com))\b/i

I bet the file could be half the size it is now. But I don't have the scripting experience to do this, so anyone who can improve the logic would be a great help.

And I still can't thank Scott enough for giving me this script in the first place. Without it, BigEvil would have died with the start of ws.surbl.org. Thanks Scott!

Scott's source:
http://www.cs.rice.edu/~scrosby/datamining/src/prefixStringFactor/prefixStringFactor.ml

Chris Santerre
System Admin and SARE Ninja
http://www.rulesemporium.com
http://www.surbl.org
'It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change.' Charles Darwin
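For anyone who wants to experiment with the multi-level idea described in the post, here is a minimal sketch in Python. It is not Scott's prefixStringFactor.ml and not the script that builds the real rules. The names `factor` and `DOMAINS` are made up for illustration, and the domain list is just the homel* example from above. The sketch recursively pulls out a shared prefix, then a shared suffix (so a lone \.com lands once at the end of a group), and only then splits into alternatives by first character.

```python
import os
import re


def factor(words):
    """Return a regex fragment matching exactly the strings in `words`.

    Hypothetical helper: factors common prefixes and suffixes recursively,
    so the result can nest more than one level deep.
    """
    words = sorted(set(words))
    # An empty string means "everything after this point is optional".
    if "" in words:
        rest = [w for w in words if w]
        return "(?:" + factor(rest) + ")?" if rest else ""
    if len(words) == 1:
        return re.escape(words[0])
    # Shared leading text, e.g. "homel".
    prefix = os.path.commonprefix(words)
    if prefix:
        return re.escape(prefix) + factor([w[len(prefix):] for w in words])
    # Shared trailing text, e.g. ".com": emit it once after the group.
    suffix = os.path.commonprefix([w[::-1] for w in words])[::-1]
    if suffix:
        return factor([w[:-len(suffix)] for w in words]) + re.escape(suffix)
    # Nothing shared: split by first character and factor each bucket.
    groups = {}
    for w in words:
        groups.setdefault(w[0], []).append(w)
    return "(?:" + "|".join(factor(g) for g in groups.values()) + ")"


# Example input taken from the rule quoted in the post.
DOMAINS = [
    "homeloanace.com", "homelandunited.com", "homelanddefensejournal.com",
    "homelanddefenseradio.com", "homelandsecurityresearch.com",
    "homelead.net", "homelessprelates.com", "homelessteens.com",
]

fragment = factor(DOMAINS)
rule = "/\\b" + fragment + "\\b/i"  # same /\b...\b/i wrapper as the post
print(rule)
```

The generated fragment will not be character-for-character the hand-written version above, since it factors wherever it finds shared text, but it matches the same set of domains. Whether deeper nesting is actually faster for the regex engine is a separate question and would need measuring.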