Hello,
Sorry if this topic has arisen before, but we're trying to enhance Nutch to accept on-the-fly injections of new content. In other words, we have a crawler that sends "page injection" commands to an HTTP server. This server, in turn, adds the URL to the crawldb (if necessary), generates the fetcher output, metadata, parsed content, etc., and then reindexes. We're in the process of making this work.

Is this feasible on a large scale? :-)

The business requirement behind this is: company A has a search engine; company B pays company A lots of money to include their content; company B expects injected content to be available immediately.

I'm looking for constructive advice on how to proceed. I'd be happy to do the work to make this all happen; I just need some guidance.

Thanks,
DaveG
