Hi everyone, I am kind of a n00b to Nutch, so here are a few questions for you to answer (or for your amusement):
1. During a Nutch crawl and subsequent crawls, does the crawler always pick up new links on a page, or does it only check the old ones? For example, suppose I set 20 as the limit on the number of links per page and 5 as the depth, and the first crawl gets me 20 links from a page. What does a subsequent crawl of the same page get me? Does it just check whether the first 20 links have been crawled, or does it also pick up new links? (My crawl command is sketched below.)

2. I know you cannot re-index a page that has already been crawled. Still, I cannot figure out why pages that were crawled earlier, and whose meta-data has since changed, do not show any change in the index (I am picking up content, description and title). I have set the maximum time between subsequent re-fetches to 1 day (see the config sketch below).

3. I am using patch 963 for deleting 404 pages, yet only a few get deleted from the index. Is it because those pages were initially picked up through a normal crawl, while I am forcing the links that need to be deleted into url.txt? (The inject step I am using is sketched below.)
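For reference on question 1, this is roughly the crawl command I am running (a sketch, assuming the standard Nutch 1.x one-step crawl tool; urls/ is my seed directory and crawl/ my output directory):

    # Crawl to depth 5, fetching at most the 20 top-scoring URLs per round
    bin/nutch crawl urls -dir crawl -depth 5 -topN 20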
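On question 2, this is how I have set the re-fetch interval in conf/nutch-site.xml (assuming the Nutch 1.x property name; the value is in seconds, so 86400 = 1 day):

    <property>
      <name>db.fetch.interval.default</name>
      <value>86400</value>
      <description>Default interval between re-fetches of a page, in seconds (here 1 day).</description>
    </property>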
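And on question 3, this is roughly how I am forcing the extra links in (a sketch: url.txt sits in my urls/ seed directory and gets injected into the existing crawldb):

    # urls/url.txt holds the 404 links I want removed from the index
    bin/nutch inject crawl/crawldb urls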
Thanks and Regards,
Tamanjit Bindra
