Dave, you could think about running a separate crawler to handle these ad-hoc requests: perform the crawl, generate the index, then merge it with the "live" index. This should also give you a shorter turnaround time for the paying customers.
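Roughly, the ad-hoc side of that could be driven with the stock command-line tools. This is just an untested sketch: the directory layout (adhoc/, live/) is invented for illustration, and the commands assume the 0.8-style crawldb tooling you mentioned.

```shell
# Sketch only -- adhoc/ and live/ directory names are hypothetical.
# 1. Inject the newly requested URLs into a *separate* ad-hoc crawldb.
bin/nutch inject adhoc/crawldb adhoc/urls

# 2. Generate a fetchlist and fetch the newest segment.
bin/nutch generate adhoc/crawldb adhoc/segments
s=`ls -d adhoc/segments/* | tail -1`
bin/nutch fetch $s

# 3. Fold the fetch results back into the ad-hoc crawldb and linkdb.
bin/nutch updatedb adhoc/crawldb $s
bin/nutch invertlinks adhoc/linkdb $s

# 4. Index just the new segment.
bin/nutch index adhoc/indexes adhoc/crawldb adhoc/linkdb $s

# 5. Merge the small ad-hoc index into the live index (IndexMerger).
bin/nutch merge live/index-new live/index adhoc/indexes
```

After the merge you'd still need to swap in the new index and have the searcher reopen it, so the turnaround is the small crawl plus a cheap merge rather than a full reindex.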
kelvin

On Sat, 8 Apr 2006 16:32:30 -0400, Goldschmidt, Dave wrote:
> Hello,
>
> Sorry if this topic has arisen before, but we're trying to enhance
> Nutch to accept on-the-fly injections of new content. In other
> words, we have a crawler that feeds "page injection" commands to an
> HTTP server - this server, in turn, adds the URL to the crawldb (if
> necessary), generates the fetcher output, metadata, parsed content,
> etc. - then reindexes. We're in the process of making this work.
>
> Is this feasible on a large scale? :-) The business requirement
> behind this is: company A has a search engine; company B pays
> company A lots of money to include their content; company B expects
> injected content to be available immediately.
>
> I'm looking for constructive advice as to how to proceed - I'd be
> happy to do the work to make this all happen, just need some
> guidance.
>
> Thanks,
> DaveG

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
