Otis
 
Thanks for the pointer.
 
I take it Fetcher.java is the core class that reads content from the URLs and 
writes it to different directories in the filesystem (via Fetcher.outputPage), 
right? In that case, can I intercept this step (with local code changes) to 
write the extracted content into our proprietary system instead? Are the 
segments created by the Fetcher itself, or before the Fetcher is called?
 
Thanks
 
Jason


[EMAIL PROTECTED] wrote:
Jason - this is perfectly doable -- I do this for my social bookmarking
project, Simpy.com.

I think people tend to run Nutch using the nutch shell script that
comes with Nutch, but you can call the Fetcher Java class directly and
programmatically yourself, since it has a main method. You can do the
same with the SegmentMergeTool. So, if you can write a Java app, just
call Nutch's Java classes the same way that the shell script does.
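
A rough sketch of what "call the class the same way the shell script does" could look like from your own Java app. It uses reflection so that it compiles and runs without Nutch on the classpath. MainLauncher, invokeMain and Echo are made-up names for illustration only; the Fetcher's fully qualified class name and its arguments depend on your Nutch version, so take them from the nutch script rather than from this sketch.

```java
import java.lang.reflect.Method;

public class MainLauncher {

    /**
     * Invokes the public static main(String[]) of the named class,
     * which is what "java ClassName args..." in a shell script does.
     */
    public static void invokeMain(String className, String... args) throws Exception {
        Class<?> cls = Class.forName(className);
        Method main = cls.getMethod("main", String[].class);
        // Cast to Object so the array is passed as the single main() parameter.
        main.invoke(null, (Object) args);
    }

    /** Stand-in target so the sketch runs without Nutch (hypothetical class). */
    public static class Echo {
        public static String last = null;

        public static void main(String[] args) {
            last = String.join(" ", args);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: with Nutch on the classpath you would pass the Fetcher's
        // fully qualified class name and the arguments the nutch script uses.
        invokeMain(Echo.class.getName(), "arg1", "arg2");
        System.out.println(Echo.last);
    }
}
```

If Nutch is on your compile-time classpath you do not need reflection at all; calling the class's main method directly with a String[] of arguments should amount to the same thing.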

I can't help you with reading Nutch's files with C#, but the source is
there, so you should be able to write file readers in C#.

Otis
____________________________________________________________________
Simpy -- simpy.com -- tags, social bookmarks, personal search engine



--- Jason Manfield wrote:
> We would like to use Nutch just for crawling, and then index the
> crawled database into our proprietary datastore/index. How do we go
> about this? I see that nutch is a shell script, so it is possible to
> just crawl. Once it crawls, I suppose the crawled data is dumped into
> webdb. Are there exposed APIs to extract the data from webdb? 
> 
> One more catch -- our company is a .NET shop :((, so we would like to
> use C# to read the data of the fetched/crawled pages for further
> indexing.
> 
> Ideas/suggestions?
> 
> Any plans to have Nutch for .NET (like dotLucene)?
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 

