Dave, you could think about running a separate crawler to handle these ad-hoc 
requests: perform the crawl, generate the index, then merge it with the "live" 
index. This would also mean a shorter turn-around time for the paying 
customers.
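The separate-crawl-then-merge idea above might look roughly like this with the 
standard Nutch command-line tools. This is only a sketch: the `fast_crawl/` 
and `crawl/` directories and the `fast_urls` seed file are placeholder names 
I've assumed, and the exact commands and arguments vary between Nutch versions.

```shell
# 1. Crawl the ad-hoc URLs into their own small crawldb and segment,
#    kept apart from the live crawl data.
bin/nutch inject fast_crawl/crawldb fast_urls
bin/nutch generate fast_crawl/crawldb fast_crawl/segments
seg=`ls -d fast_crawl/segments/* | tail -1`
bin/nutch fetch $seg
bin/nutch updatedb fast_crawl/crawldb $seg
bin/nutch invertlinks fast_crawl/linkdb $seg

# 2. Index just that segment.
bin/nutch index fast_crawl/indexes fast_crawl/crawldb fast_crawl/linkdb $seg

# 3. Merge the small index with the live indexes into a fresh directory,
#    then swap the merged index in under the search servers.
bin/nutch merge crawl/index-new crawl/indexes fast_crawl/indexes
```

The swap in step 3 is the part that keeps the live index untouched while the 
ad-hoc content is being crawled, which is where the turn-around win comes from.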

kelvin

On Sat, 8 Apr 2006 16:32:30 -0400, Goldschmidt, Dave wrote:
> Hello,
>
>
> Sorry if this topic has arisen before, but we're trying to enhance
> Nutch to accept on-the-fly injections of new content.  In other
> words, we have a crawler that feeds "page injection" commands to an
> HTTP server - this server, in turn, adds the URL to the crawldb (if
> necessary), generates the fetcher output, metadata, parsed content,
> etc. - then reindexes. We're in the process of making this work.
>
>
> Is this feasible on a large scale?  :-)   The business requirement
> behind this is: company A has a search engine; company B pays
> company A lots of money to include their content; company B expects
> injected content to be available immediately.
>
>
> I'm looking for constructive advice as to how to proceed - I'd be
> happy to do the work to make this all happen, just need some
> guidance.
>
>
> Thanks,
>
> DaveG


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general