Hello,

Sorry if this topic has come up before, but we're trying to enhance Nutch
to accept on-the-fly injections of new content. In other words, we have
a crawler that feeds "page injection" commands to an HTTP server. For each
command, the server adds the URL to the crawldb (if necessary), generates
the fetcher output, metadata, parsed content, etc., and then reindexes.
We're in the process of making this work.
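
To make the flow concrete, here is a rough sketch of the per-URL sequence
described above. Everything in it is hypothetical: InjectionStep and
InjectionPipeline are illustrative names, not Nutch classes, the in-memory
map only stands in for the crawldb, and the HTTP front end is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical unit of per-URL work (fetch, parse, index); not a Nutch API. */
interface InjectionStep {
    void apply(String url, Map<String, String> record);
}

/** Runs the configured steps, in order, for each injected URL. */
class InjectionPipeline {
    private final List<InjectionStep> steps = new ArrayList<>();

    // Assumption: a simple in-memory map standing in for the crawldb.
    private final Map<String, Map<String, String>> crawlDb = new LinkedHashMap<>();

    InjectionPipeline addStep(InjectionStep step) {
        steps.add(step);
        return this;
    }

    /** Adds the URL if it is not already known, then runs every step on it.
     *  Returns true when the URL was new to the stand-in crawldb. */
    boolean inject(String url) {
        boolean isNew = !crawlDb.containsKey(url);
        Map<String, String> record =
                crawlDb.computeIfAbsent(url, k -> new LinkedHashMap<>());
        for (InjectionStep step : steps) {
            step.apply(url, record);
        }
        return isNew;
    }

    Map<String, String> lookup(String url) {
        return crawlDb.get(url);
    }

    int size() {
        return crawlDb.size();
    }
}
```

The HTTP server would call inject(url) once per "page injection" command,
with one step each for fetching, parsing, and reindexing. Re-injecting a
known URL reruns the steps without adding a second crawldb entry.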

Is this feasible on a large scale?  :-)   The business requirement
behind this is: company A has a search engine; company B pays company A
lots of money to include their content; company B expects injected
content to be available immediately.

I'm looking for constructive advice on how to proceed. I'd be happy
to do the work to make this all happen; I just need some guidance.

Thanks,

DaveG
