Either you have no seed URLs or your filter is too restrictive.
Also note that the nutch crawl command uses conf/crawl-urlfilter.txt by
default, not conf/regex-urlfilter.txt!
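
To check whether a filter is rejecting every seed, it can help to run the seed list through the rules by hand. The following is only a rough Python sketch of the "+pattern / -pattern" rule style used in Nutch URL filter files, assuming that the first matching rule decides and that a URL matching no rule is dropped. The rules, the seed URLs and the helper names (parse_rules, accepts, surviving_seeds) are made up for illustration and are not part of Nutch.

```python
import re

# Hypothetical rules, written in the "+pattern / -pattern" style of
# conf/crawl-urlfilter.txt. These are illustrative, not a shipped config.
RULES = r"""
# skip common static file suffixes
-\.(gif|jpg|png|css|js)$
# accept anything on xcski.com
+^http://([a-z0-9]*\.)*xcski\.com/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found in the URL decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched: treat the URL as rejected


def surviving_seeds(rules, seeds):
    """Return the seed URLs that would pass the filter."""
    return [url for url in seeds if accepts(rules, url)]


rules = parse_rules(RULES)
seeds = [
    "http://xcski.com/index.html",
    "http://xcski.com/logo.png",
    "http://example.org/",
]
print(surviving_seeds(rules, seeds))
```

If the printed list comes back empty for your real seeds, nothing can be injected, which matches the symptom described above.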
Aditya Sakhuja wrote:
I am having issues getting the data injected into the crawldb. I have set
the filter in the
Paul Tomblin wrote:
On Wed, Aug 19, 2009 at 1:00 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
Another question: is Nutch smart enough to use that signature to
determine that, say, http://xcski.com/ and http://xcski.com/index.html
are the same page?
I believe the hashes would be the same
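
As a rough illustration of why the hashes would match, here is a small Python sketch. It assumes the signature is an MD5 digest of the fetched content bytes, which is my understanding of what Nutch's default MD5Signature does; the page bodies and the content_signature helper are invented for the example. Note that the two URLs are still different strings, so the match only shows up once both have been fetched and their signatures compared.

```python
import hashlib


def content_signature(content: bytes) -> str:
    """Hypothetical stand-in for a content-based signature: MD5 of the raw bytes."""
    return hashlib.md5(content).hexdigest()


# Made-up fetch results: both URLs are assumed to return identical bytes.
page = b"<html><body>Welcome</body></html>"
fetched = {
    "http://xcski.com/": page,
    "http://xcski.com/index.html": page,
    "http://xcski.com/about.html": b"<html><body>About</body></html>",
}

# Group URLs by signature; URLs sharing a signature have identical content.
by_signature = {}
for url, content in fetched.items():
    by_signature.setdefault(content_signature(content), []).append(url)

duplicates = [urls for urls in by_signature.values() if len(urls) > 1]
print(duplicates)
```

Any byte-level difference between the two responses (a timestamp in the page, for instance) would produce different digests under this assumption.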