Use this command line to inject URLs into your existing db: nutch inject db -urlfile sites.txt
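For context, injecting is only the first step; the new URLs still have to be fetched and indexed. A hedged sketch of the rest of the cycle, based on the Nutch 0.x whole-web commands (the `db` and `segments` paths here are assumptions matching the inject command above, not your actual layout):

```shell
# Inject the new URLs from the flat file into the existing web db
bin/nutch inject db -urlfile sites.txt

# Generate a fetchlist segment from the db
bin/nutch generate db segments

# Fetch the newest segment, then fold the results back into the db
s=`ls -d segments/* | tail -1`
bin/nutch fetch $s
bin/nutch updatedb db $s

# Index the new segment so the added sites show up in search
bin/nutch index $s
```

This avoids deleting anything: the existing db is reused, and only the newly injected pages get fetched.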
Works for me :)

-----Original Message-----
From: Ian Reardon [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 11, 2005 00:02
To: [email protected]
Subject: Crawl some sites

I would like to crawl some specific sites with Nutch for content. I will be physically looking for sites all the time and would like to add them to my index on a regular basis, say 1 or 2 new sites a week. Can anyone walk me through this?

I crawled some sites with Nutch by creating a flat file of URLs and running the crawl command, which created the directories/dbs. But when I tried to add a new site after the crawl, I got an error saying the directory or DB already exists. Do I have to recrawl all my content every time I add something, i.e. delete the folder, add the new site to my flat file, and crawl everything all over again?

Thanks.
