Use this command line to inject URLs into your existing db:

nutch inject db -urlfile sites.txt

Works for me :)
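For the incremental case (adding a site or two a week), the usual Nutch whole-web cycle is roughly the following. This is a sketch assuming the 0.x-era command-line tools; the db/segments paths and the new-sites.txt filename are illustrative:

```shell
# Add the new URLs to the existing web db -- no need to delete or recreate it
nutch inject db -urlfile new-sites.txt

# Generate a fetchlist from the db into a new segment under segments/
nutch generate db segments

# Fetch the most recently generated segment (segment naming is illustrative)
s=`ls -d segments/2* | tail -1`
nutch fetch $s

# Update the db with the pages and links found during the fetch
nutch updatedb db $s

# Index the newly fetched segment
nutch index $s
```

Repeating the generate/fetch/updatedb/index steps picks up only the new and due-for-refetch pages, so you don't have to recrawl everything from scratch.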




-----Original Message-----
From: Ian Reardon [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 11, 2005 00:02
To: [email protected]
Subject: Crawl some sites

 I would like to crawl some specific sites with Nutch for content. I
will be continually looking for sites by hand and would like to add
them to my index on a regular basis, say 1 or 2 new sites a week. Can
anyone walk me through this?

I crawled some sites with Nutch by creating a flat file of URLs and
then running the crawl command; it created the directories/dbs. But
when I tried to add a new site after the crawl, I got an error saying
the directory or db already exists. Do I have to recrawl all my
content every time I add something? That is, delete the folder, add
the new site to my flat file, and crawl them all over again? Thanks.


