Dear all, I'm completely new to Apache Nutch. I started using it only a few days ago and was impressed by its capabilities.
I'm experiencing a small issue that I hope someone can help me fix. I configured a test instance of Apache Nutch (1.9) to crawl a news website using the following parameters:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>NewsWatcher Agent</value>
  </property>
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>50</value>
    <description></description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|scoring-depth</value>
    <description></description>
  </property>
  <property>
    <name>db.fetch.interval.default</name>
    <value>300</value>
    <description></description>
  </property>
</configuration>

I run the ./bin/crawl command from cron every five minutes with _maxdepth_=2, because I want to update my index frequently with only the new articles published on the homepage, without crawling the whole site.

The first run works fine, but after that the homepage no longer seems to be updated. According to the log file the whole process completes without errors, yet the new articles published on the homepage do not appear in my index. When I inspect the crawldb with the readdb command, I always get the same signature, even when the page has changed.

Can anyone help me understand how to investigate this issue? Is there anything else I can check besides the log file? Is there a debug option I can enable?

Thanks a lot in advance, everybody,
Matteo

