I changed my plugin.includes from urlfilter-regex To urlfilter-automaton And left everything else the same. Using the default delivered automaton-urlfilter.txt with a few addition excludes like: -http://(@\.)*twitter\.com(/@)*
Now the hadoop log has a larger number of WARN opic.OPICScoringFilter - java.net.MalformedURLException: no protocol: Does urlfilter-automaton allow in something the urlfilter-regex does not? How can I filter out the MalformedURLException Thanks Brad

