Hi, I have a problem with black/white list URL filtering. I have two
directories for filtering,
called NegativeURLS and PositiveURLS.

*****************************************************************************************
In NegativeURLS, I have
www.hurriyet.com.tr

In PositiveURLS, I have
www.milliyet.com.tr

*****************************************************************************************
In the input directory for the crawl operation, I have
www.hurriyet.com.tr
www.milliyet.com.tr

I run the following commands from the shell:

$ ./nutch org.apache.nutch.crawl.bw.BWInjector bwdb ~/URL/PositiveURLS/ -white

$ ./nutch org.apache.nutch.crawl.bw.BWInjector bwdb ~/URL/NegativeURLS/ -black

Then I run inject, generate, and fetch. After that I run the following:
$ ./nutch org.apache.nutch.crawl.bw.BWUpdateDb <crawldb> bwdb ~/trace/output/segments/20060522115951/

Finally, I run GenericReader and print the output. It still contains the URLs
that are in the blacklist.
What could be the problem?
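
For anyone trying to reproduce this, the check on the printed output can be sketched roughly as below. It assumes the reader output has been saved to a plain text file and that the blacklist is a file with one host per line and no blank lines. The function name find_blacklisted and all file paths are made up for illustration; they are not part of Nutch or the black/white list tools.

```shell
# Hypothetical helper (not part of Nutch): print every line of a text dump
# that mentions a host listed in a blacklist file (one host per line).
# Usage: find_blacklisted <blacklist-file> <dump-file>
find_blacklisted() {
    # -F: treat each host as a fixed string, not a regular expression
    # -f: read the search strings from the blacklist file
    # Note: a blank line in the blacklist file would match every line.
    grep -F -f "$1" "$2"
}

# Example with throwaway data; the paths below are assumptions.
tmp=$(mktemp -d)
printf 'www.hurriyet.com.tr\n' > "$tmp/black.txt"
printf 'http://www.hurriyet.com.tr/\nhttp://www.milliyet.com.tr/\n' > "$tmp/dump.txt"

# Prints the lines of the dump that contain a blacklisted host.
find_blacklisted "$tmp/black.txt" "$tmp/dump.txt"
```

If this prints anything for the real output, the blacklisted host did make it through the update step.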



