Lourival Júnior wrote:
> Hi all!
>
> I have a problem updating my WebDB. I have a page, test.htm, that has 4
> links to 4 PDF documents. I run the crawler, and when I then issue this command:
>
> bin/nutch readdb Mydir/db -stats
>
> I get this output:
>
> Number of pages: 5
> Number of links: 4
>
> That's fine. The problem starts when I add 4 more links to test.htm. I want a
> script that re-crawls or updates my WebDB without me having to delete the
> Mydir folder. I hope I am being clear.
> I found some shell scripts that are supposed to do this, but they don't do
> what I want: I always get the same number of pages and links.
>
> Can anyone help me?
Hi,

please re-read the mailing-list archives from around yesterday. You will have to make a small modification so that you can re-inject your URL and have it re-crawled on the next run. Otherwise a page is only re-crawled after a configurable number of days, and the same value applies to the PDFs as well.

Regards,
Stefan
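
For reference, a minimal sketch of the kind of recrawl script being discussed, written against the Nutch 0.x command-line tools. The paths, the urls.txt seed file and the -adddays value are assumptions, not something from the original thread; check the exact options against your Nutch version before running it.

    #!/bin/sh
    # Rough recrawl sketch, assuming a Nutch 0.x-style layout.
    DB=Mydir/db             # WebDB from the original question
    SEGMENTS=Mydir/segments # assumed segments directory
    ADDDAYS=31              # make pages look older than the fetch interval

    # Re-inject the seed page so the newly added links get discovered.
    bin/nutch inject $DB -urlfile urls.txt

    # Generate a fetchlist; -adddays shifts the clock forward so pages whose
    # fetch interval has not yet expired are still selected for re-fetching.
    bin/nutch generate $DB $SEGMENTS -adddays $ADDDAYS

    # Fetch the newest segment and fold its pages and links back into the WebDB.
    SEGMENT=`ls -d $SEGMENTS/* | tail -1`
    bin/nutch fetch $SEGMENT
    bin/nutch updatedb $DB $SEGMENT

    # Check whether the page/link counts picked up the new PDFs.
    bin/nutch readdb $DB -stats

The -adddays trick only makes pages appear older than the configured fetch interval (db.default.fetch.interval, 30 days by default), so they become due for re-fetching without you having to wait the interval out or delete the Mydir folder.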
