If I want to crawl the whole WWW but I don't use the DMOZ data, what should I do?
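A minimal sketch of one way to do this, assuming the Nutch 0.7-era command names from the whole-web tutorial (urls.txt is a placeholder name for your own seed list, not a required file): skip the DMOZ import entirely and seed the WebDB from a flat file.

    # Create a fresh WebDB (run once)
    bin/nutch admin db -create

    # Seed it from your own flat file of URLs, one per line,
    # instead of injecting a DMOZ dump with -dmozfile
    bin/nutch inject db -urlfile urls.txt

From there the usual generate/fetch/updatedb rounds apply, as in the cycle sketched at the end of this thread.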
On 5/12/05, Jean-Luc <[EMAIL PROTECTED]> wrote:
>
> Use this command line to inject URLs into your existing db:
>
>     nutch inject db -urlfile sites.txt
>
> Works for me :)
>
> -----Original Message-----
> From: Ian Reardon [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 11, 2005 00:02
> To: [email protected]
> Subject: Crawl some sites
>
> I would like to crawl some specific sites with Nutch for their content.
> I will be looking for new sites all the time and would like to add them
> to my index on a regular basis, say 1 or 2 a week. Can anyone walk me
> through this?
>
> I crawled some sites with Nutch by creating a flat file of URLs and
> then running the crawl command, which created the directories and dbs.
> But when I tried to add a new site after the crawl, I got an error that
> the directory or DB already exists. Do I have to recrawl all my content
> every time I add something? That is, delete the folder, add the new
> site to my flat file, and crawl everything all over again? Thanks.

--
---Letter From your friend Blue at HUST CGCL---
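To address Ian's follow-up concern: with the 0.7-era tools you do not have to recreate the db when you add sites. A minimal sketch, under the same assumptions as above (sites.txt holds only the newly found URLs; the segment paths and the index step follow the whole-web tutorial, so check bin/nutch usage for your version):

    # Add this week's new sites to the existing WebDB;
    # URLs already in the db are left as they are
    bin/nutch inject db -urlfile sites.txt

    # Generate, fetch, and index only the new round of pages
    bin/nutch generate db segments
    s=`ls -d segments/2* | tail -1`   # the segment just generated
    bin/nutch fetch $s
    bin/nutch updatedb db $s
    bin/nutch index $s

The "directory or DB already exists" error comes from rerunning the one-shot crawl command, which always tries to create its output directory from scratch; the inject/generate/fetch/updatedb cycle above works against the existing db instead.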
