Re: [Nutch-general] using nutch just for crawling, not indexing?

EM Mon, 09 May 2005 22:56:40 -0700

I don't recall the exact syntax, but you can use the 'inject' command
to inject a URL as a starting point.
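
For what it's worth, here is a rough sketch of what that might look like.
It is from memory, not checked against a current build, so treat the
subcommand arguments (`admin db -create`, `inject db -urlfile`) as
assumptions and confirm them against the usage text that `bin/nutch`
prints for your version. The `seeds/urls.txt` path, the `db` directory
name, and the example.com URL are placeholders I made up.

```shell
# Write the seed URLs to a flat file, one URL per line.
# (seeds/urls.txt and the example.com URL are placeholders.)
mkdir -p seeds
printf '%s\n' 'http://www.example.com/' > seeds/urls.txt

# Assumed Nutch 0.6-era invocation: create an empty web database in ./db,
# then inject the seed file into it. Skipped when bin/nutch is not present,
# so the snippet is harmless outside a Nutch checkout.
if [ -x bin/nutch ]; then
  bin/nutch admin db -create
  bin/nutch inject db -urlfile seeds/urls.txt
fi
```

After that the usual generate/fetch/updatedb cycle should start from
those URLs instead of the DMOZ dump.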

Zhou LiBing wrote:

>Hi,
>  I have a question about the Nutch crawler: how can I crawl the web 
>starting from one or several specified URLs? I don't want to use the 
>DMOZ data.
>
>On 5/3/05, Jason Manfield <[EMAIL PROTECTED]> wrote:
>
>>We would like to use nutch just for crawling, and then index the crawled 
>>database into our proprietary datastore/index. How do we go about this? I 
>>see that nutch is a shell script, so it is possible to just crawl. Once it 
>>crawls, I suppose the crawled data is dumped into webdb. Are there exposed 
>>APIs to extract the data from webdb?
>>
>>One more catch -- our company is a .NET shop :((, so we would like to use 
>>C# to read the data of the fetched/crawled pages for further indexing.
>>
>>Ideas/suggestions?
>>
>>Any plans to have nutch for .NET (like dotLucene)?
