Hi all, this is a pretty basic question, so apologies in advance (I haven't been able to find an answer).
If I have a web site/server whose content (both HTML pages and PDF/Word/Excel documents etc.) is constantly changing, i.e. new files are being added and existing files deleted or updated, how does Nutch deal with this?

I have set up Nutch with Solr and can see a digest field for each file in the Solr index, which seems to be some form of hash. However, when I run the Nutch crawl it only seems to add new files. Does Nutch have some mechanism for detecting deleted and updated files? How does it deal with sites that are constantly changing, and how do people trigger their crawls on such sites?

Apologies if this is all a bit vague, but I'm struggling to decide the best way to explain what I'm trying to achieve without a better understanding of the underlying processes.

Regards
Paul
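To make the "triggering crawls" part of the question concrete: at the moment I just run the crawl by hand, and I'm guessing people schedule recurring crawls with something like cron. A rough sketch of what I imagine (the paths, seed directory, round count, and Solr URL below are placeholders, not my actual setup):

```shell
# Hypothetical crontab entry: recrawl every night at 02:00 and index into Solr.
# NUTCH_HOME, urls/, crawl/, and the Solr URL are placeholder values.
0 2 * * * $NUTCH_HOME/bin/crawl urls/ crawl/ http://localhost:8983/solr/ 2 >> /var/log/nutch-recrawl.log 2>&1
```

Is something along these lines what people actually do, or is there a better-supported way to keep the index in step with a changing site?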

