We would like to use Nutch only for crawling, and then index the crawled data into our proprietary datastore/index. How do we go about this?

I see that Nutch is run through a shell script, so it should be possible to run just the crawl step. After crawling, I suppose the crawled data is stored in the webdb. Are there exposed APIs for extracting the data from the webdb?

One more catch: our company is a .NET shop :(, so we would like to use C# to read the fetched/crawled pages for further indexing. Any ideas or suggestions? Are there any plans for a .NET version of Nutch (like dotLucene)?
