What is the correct way to verify a pattern using URLFilterChecker after adding 
it to conf/regex-urlfilter.txt? I know I'll need to rerun ant to get the conf 
change into the MapReduce job once the pattern excludes URLs as I intend.

In conf/regex-urlfilter.txt, before my whitelist, I added:
-.*cabinetobituaries/.*
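
A quick way to sanity-check the regex on its own, independent of Nutch, might be a few lines of plain Java. This is only a sketch: the class name PatternCheck and the helper excluded() are hypothetical, and it assumes the leading '-' is the rule's sign rather than part of the regex and that the filter looks for the pattern anywhere in the URL. It says nothing about how my whitelist or the other filters interact with the rule.

```java
import java.util.regex.Pattern;

public class PatternCheck {
    // Regex body of the rule "-.*cabinetobituaries/.*".
    // Assumption: the leading '-' marks the rule as an exclude and is not part of the regex.
    static final Pattern EXCLUDE = Pattern.compile(".*cabinetobituaries/.*");

    // Hypothetical helper: true if the exclude pattern is found anywhere in the URL.
    static boolean excluded(String url) {
        return EXCLUDE.matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://www.cabinet.com/cabinet/cabinetobituaries/1054824-435/robert-g.-judy.html";
        System.out.println(excluded(url) ? "pattern matches: " + url : "no match: " + url);
    }
}
```

Because the pattern begins and ends with ".*", a full match and a partial match give the same answer for this URL, so the sketch should not depend on which of the two the filter actually uses.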

At a command prompt I run:
runtime/deploy/bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined

The output says "Checking combination of all URLFilters available". Then I press 
Enter and get the following:

15/02/09 19:28:25 INFO plugin.PluginRepository: Plugins: looking in: /data/hadoop/hadoop-unjar490367744495018237/classes/plugins
15/02/09 19:28:25 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
15/02/09 19:28:25 INFO plugin.PluginRepository: Registered Plugins:
<--- SNIP --->
15/02/09 19:28:25 INFO plugin.PluginRepository: Nutch URL Filter (org.apache.nutch.net.URLFilter)
<--- SNIP --->
15/02/09 19:28:25 INFO conf.Configuration: found resource regex-urlfilter.txt at file:/data/hadoop/hadoop-unjar490367744495018237/regex-urlfilter.txt
-

If I enter
-http://www.cabinet.com/cabinet/cabinetobituaries/1054824-435/robert-g.-judy.html
and then press Enter, the output is
--http://www.cabinet.com/cabinet/cabinetobituaries/1054824-435/robert-g.-judy.html

If I press Enter without providing a URL, the output is a blank line 
followed by a dash:

-

I'm not sure what response to expect, or whether this output indicates a pass or a failure.

Scott Lundgren
Software Engineer
(704) 973-7388
slundg...@qsfllc.com

QuietStream Financial, LLC<http://www.quietstreamfinancial.com>
11121 Carmel Commons Boulevard | Suite 250
Charlotte, North Carolina 28226