date:20090822

Re: crawldb not updating

2009-08-22 Thread reinhard schwab

Either you have no seed URLs or your filter is too restrictive. Also note that nutch crawl uses conf/crawl-urlfilter.txt by default, not conf/regex-urlfilter.txt! Aditya Sakhuja wrote: I am having issues getting the data injected into the crawldb. I have set the filter in the

Re: Nutch.SIGNATURE_KEY

2009-08-22 Thread Andrzej Bialecki

Paul Tomblin wrote: On Wed, Aug 19, 2009 at 1:00 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote: Another question: is Nutch smart enough to use that signature to determine that, say, http://xcski.com/ and http://xcski.com/index.html are the same page? I believe the hashes would be the same
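
To illustrate the point about hashes, here is a minimal sketch that assumes the signature is an MD5 digest computed over the fetched page content only, with the URL playing no part. The class name SignatureSketch, the helper signature(), and the sample page bytes are hypothetical and are not taken from the Nutch code base.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {
    // Hypothetical helper: hex-encoded MD5 digest of the page content.
    // Assumption: only the content bytes are hashed, never the URL.
    static String signature(byte[] content) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up page body standing in for what both URLs would return.
        byte[] page = "<html><body>same page</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        String fromRoot = signature(page);   // as if fetched from http://xcski.com/
        String fromIndex = signature(page);  // as if fetched from http://xcski.com/index.html
        System.out.println(fromRoot.equals(fromIndex));
    }
}
```

Under that assumption, two URLs that return byte-identical content get identical signatures. Any difference in the returned bytes, such as a timestamp in the page, would produce different signatures.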