I would like to crawl some specific sites with Nutch for content. I will be actively looking for new sites all the time and would like to add them to my index on a regular basis, say 1 or 2 new sites a week. Can anyone walk through the workflow with me in pseudo-steps?
I crawled some sites with Nutch by creating a flat file of URLs and then running the crawl command. It created the directories/DBs, but when I tried to add a new site after the crawl, I got an error saying the directory or DB already exists. Do I have to recrawl all my content every time I add something? That is, delete the folder, add the new site to my flat file, and crawl everything all over again? Thanks.
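For reference, here is roughly what I did (a sketch, assuming the Nutch 1.x one-shot crawl command; the directory and file names are just what I used, and the second URL is a stand-in for the new site I tried to add):

```shell
# Seed list: a flat file with one URL per line
mkdir urls
echo "http://example.com/" > urls/seed.txt

# One-shot crawl; creates the crawl dir with crawldb, segments, index, etc.
bin/nutch crawl urls -dir crawl -depth 3 -topN 50

# Later: add a new site to the seed file and re-run the same command
echo "http://another-example.com/" >> urls/seed.txt
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
# This second run is where I get the "crawl already exists" error.
```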
