Thanks, Kelvin, for the idea. I'd like to streamline this further and update the live index directly. Has anyone tried this, or does anyone have thoughts on how it could be done? I can already merge content and parse data into a live index on the fly, but I'm not sure this approach will scale.
Other ideas?

Thanks,
DaveG

-----Original Message-----
From: Kelvin Tan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 11, 2006 5:43 AM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-general] Add new content on the fly!

Dave, you could think about running a separate crawler to handle these ad-hoc requests, perform the crawl, generate the index, then merge with the "live" index. This will result in a shorter turn-around time for the paying customers anyhow.

kelvin

On Sat, 8 Apr 2006 16:32:30 -0400, Goldschmidt, Dave wrote:
> Hello,
>
> Sorry if this topic has arisen before, but we're trying to enhance
> Nutch to accept on-the-fly injections of new content. In other
> words, we have a crawler that feeds "page injection" commands to an
> HTTP server - this server, in turn, adds the URL to the crawldb (if
> necessary), generates the fetcher output, metadata, parsed content,
> etc. - then reindexes. We're in the process of making this work.
>
> Is this feasible on a large scale? :-) The business requirement
> behind this is: company A has a search engine; company B pays
> company A lots of money to include their content; company B expects
> injected content to be available immediately.
>
> I'm looking for constructive advice as to how to proceed - I'd be
> happy to do the work to make this all happen, just need some
> guidance.
>
> Thanks,
>
> DaveG

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
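
For anyone following the thread, here is a rough sketch of the "small ad-hoc index merged into the live index" idea discussed above. It is a toy in plain Python, not Nutch or Lucene code: `TinyIndex`, `TwoTierSearcher`, `inject`, and `merge_delta` are hypothetical names made up for illustration, and the in-memory dictionary only stands in for a real on-disk index. The assumption is that injected pages go into a small delta index that is searched alongside the live one, so new content is visible immediately while the expensive merge happens later in batches.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (term -> set of URLs); a stand-in for a real index."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        # Naive whitespace tokenization, lowercased.
        for term in text.lower().split():
            self.postings[term].add(url)

    def search(self, term):
        # Return a copy so callers cannot mutate the postings.
        return set(self.postings.get(term.lower(), set()))

    def merge_from(self, other):
        # Fold every posting list of `other` into this index.
        for term, urls in other.postings.items():
            self.postings[term] |= urls


class TwoTierSearcher:
    """Hypothetical wrapper: a large live index plus a small delta index."""

    def __init__(self):
        self.live = TinyIndex()
        self.delta = TinyIndex()

    def inject(self, url, text):
        # Ad-hoc content only touches the small delta index.
        self.delta.add(url, text)

    def search(self, term):
        # Queries consult both tiers, so injected pages are visible at once.
        return self.live.search(term) | self.delta.search(term)

    def merge_delta(self):
        # Periodic batch step: fold the delta into the live index, then reset it.
        self.live.merge_from(self.delta)
        self.delta = TinyIndex()


searcher = TwoTierSearcher()
searcher.live.add("http://example.com/a", "nutch crawler")
searcher.inject("http://example.com/b", "paid nutch content")
results = searcher.search("nutch")  # contains both URLs before any merge
searcher.merge_delta()
```

The point of the two tiers is that the cost of an injection depends on the size of the delta, not the size of the live index; whether that holds up at scale with real index files is exactly the open question in this thread.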
