If I want to crawl the whole WWW but I don't use the DMOZ data, what should I do?
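A minimal sketch of one way to do this, assuming the Nutch 0.7-era command names from the whole-web tutorial (urls.txt is a placeholder name for your own seed list, not a required file): skip the DMOZ import entirely and seed the WebDB from a flat file.

    # Create a fresh WebDB (run once)
    bin/nutch admin db -create

    # Seed it from your own flat file of URLs, one per line,
    # instead of injecting a DMOZ dump with -dmozfile
    bin/nutch inject db -urlfile urls.txt

From there the usual generate/fetch/updatedb rounds apply, as in the cycle sketched at the end of this thread.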
On 5/12/05, Jean-Luc <[EMAIL PROTECTED]> wrote:
>
> Use this command line to inject URLs into your existing db:
>
>     nutch inject db -urlfile sites.txt
>
> Works for me :)
>
> -----Original Message-----
> From: Ian Reardon [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 11, 2005 00:02
> To: [email protected]
> Subject: Crawl some sites
>
> I would like to crawl some specific sites with Nutch for their content.
> I will be looking for new sites all the time and would like to add them
> to my index on a regular basis, say 1 or 2 a week. Can anyone walk me
> through this?
>
> I crawled some sites with Nutch by creating a flat file of URLs and
> then running the crawl command, which created the directories and dbs.
> But when I tried to add a new site after the crawl, I got an error that
> the directory or DB already exists. Do I have to recrawl all my content
> every time I add something? That is, delete the folder, add the new
> site to my flat file, and crawl everything all over again? Thanks.

--
---Letter From your friend Blue at HUST CGCL---
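To address Ian's follow-up concern: with the 0.7-era tools you do not have to recreate the db when you add sites. A minimal sketch, under the same assumptions as above (sites.txt holds only the newly found URLs; the segment paths and the index step follow the whole-web tutorial, so check bin/nutch usage for your version):

    # Add this week's new sites to the existing WebDB;
    # URLs already in the db are left as they are
    bin/nutch inject db -urlfile sites.txt

    # Generate, fetch, and index only the new round of pages
    bin/nutch generate db segments
    s=`ls -d segments/2* | tail -1`   # the segment just generated
    bin/nutch fetch $s
    bin/nutch updatedb db $s
    bin/nutch index $s

The "directory or DB already exists" error comes from rerunning the one-shot crawl command, which always tries to create its output directory from scratch; the inject/generate/fetch/updatedb cycle above works against the existing db instead.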
