Hi,
I have a question about the Nutch crawler. How can I crawl the web
starting from one or several specified URLs? I don't want to use the
DMOZ data.
On 5/3/05, Jason Manfield <[EMAIL PROTECTED]> wrote:
>
> We would like to use nutch just for crawling, and then index the crawled
> database into our proprietary datastore/index. How do we go about this? I
> see that nutch is a shell script, so it is possible to just crawl. Once it
> crawls, I suppose the crawled data is dumped into webdb. Are there exposed
> APIs to extract the data from webdb?
>
> One more catch -- our company is a .NET shop :((, so we would like to use
> C# to read the data of the fetched/crawled pages for further indexing.
>
> Ideas/suggestions?
>
> Any plans to have nutch for .NET (like dotLucene)?
>
--
---Letter From your friend Blue at HUST CGCL---