Hi, I have a problem when using black/white list URL filtering. I have two
directories for filtering,
called NegativeURLS and PositiveURLS.
*****************************************************************************************
In NegativeURLS, I have
www.hurriyet.com.tr
In PositiveURLS, I have
www.milliyet.com.tr
*****************************************************************************************
In the input directory for the crawl operation, I have
www.hurriyet.com.tr
www.milliyet.com.tr
I run the following commands from the shell:
$ ./nutch org.apache.nutch.crawl.bw.BWInjector bwdb ~/URL/PositiveURLS/ -white
$ ./nutch org.apache.nutch.crawl.bw.BWInjector bwdb ~/URL/NegativeURLS/ -black
Then I run inject, generate and fetch. After that I run the following:
$ ./nutch org.apache.nutch.crawl.bw.BWUpdateDb <crawldb> bwdb ~/trace/output/segments/20060522115951/
Finally I run GenericReader and print the output. It still contains the URLs
that are in the blacklist.
What could be the problem?
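
In case it helps to reproduce the check, below is a minimal sketch of how the
printed output could be compared against the blacklist. The file names
blacklist.txt and dump.txt are placeholders, not the actual paths, and the
dump contents are made-up sample lines. It only assumes that the reader output
has been saved as plain text with one URL per line.

```shell
#!/bin/sh
# Hypothetical file names; the sample contents stand in for the real files.
workdir=$(mktemp -d)
printf 'www.hurriyet.com.tr\n' > "$workdir/blacklist.txt"
printf 'http://www.hurriyet.com.tr/\nhttp://www.milliyet.com.tr/\n' > "$workdir/dump.txt"

# -F treats each blacklist entry as a fixed string, -f reads patterns from a file.
matches=$(grep -F -f "$workdir/blacklist.txt" "$workdir/dump.txt")

if [ -n "$matches" ]; then
    echo "Blacklisted URLs found in output:"
    echo "$matches"
else
    echo "No blacklisted URLs in output."
fi

# Clean up the temporary sample files.
rm -rf "$workdir"
```

Any line printed under "Blacklisted URLs found in output" is a URL that the
filtering was expected to drop.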