Hi all,

I've been tasked with looking into this and am not a coder. That said, Nutch is doing great, and the bean counters have asked me to look into adding sponsored link results. I'm wondering how best to add them.

It would be nice to use the Nutch engine to come up with the pages rather than just doing a keyword lookup against a flat file. However, the keyword data could change daily (or even hourly) and would need to be entered by hand, or automatically, as people sign up, so re-indexing is not really an option.

I'm not sure this would fly within the main Nutch segments and index. I could see maybe a separate index, or possibly adding a flag to the existing data, but I've not seen any easy-to-use tools to change, update, or insert records in what is already there (yes, Luke works on the index, but that does not touch the segment data, right?).

I don't want to change the existing searched data. I don't see an issue with having duplicate results (the sponsored entry up top and the existing entry somewhere below), but it would be more elegant to avoid that.

I also see issues with a simple flat-file lookup: a multiple-word search is best handled inside Nutch so that it can "score" the results, rather than having to build something similar for the sponsored results. I can see the need to control the summary text displayed, and also to pass through any codes in the URL, which are currently being stripped during the main crawl/index cycle. Finally, I see issues with seriously customizing the internals, as the changes would have to be maintained as Nutch itself is updated.
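For whoever ends up coding this, here is a rough sketch of the "separate store" idea, for illustration only. It is plain Python, not Nutch code, and every name in it (SponsoredStore, upsert, merge_results) is made up for the example. It assumes sponsored entries live apart from the Nutch index so they can be added or removed at any time, that a multiple-word query is scored by the fraction of its words a sponsor has bought, and that duplicates are detected by comparing URLs with the query string ignored.

```python
from dataclasses import dataclass


@dataclass
class SponsoredLink:
    url: str             # full URL as entered, tracking codes kept
    summary: str         # hand-written text to display for this entry
    keywords: frozenset  # lower-cased words the sponsor signed up for


class SponsoredStore:
    """Hypothetical in-memory store, kept separate from the main index."""

    def __init__(self):
        self._links = {}

    def upsert(self, url, summary, keywords):
        # Add or replace an entry at any time; no re-index involved.
        words = frozenset(k.lower() for k in keywords)
        self._links[url] = SponsoredLink(url, summary, words)

    def remove(self, url):
        self._links.pop(url, None)

    def search(self, query, limit=3):
        # Score = fraction of the query words that the sponsor bought.
        terms = set(query.lower().split())
        if not terms:
            return []
        scored = []
        for link in self._links.values():
            overlap = len(terms & link.keywords)
            if overlap:
                scored.append((overlap / len(terms), link))
        # Best score first; ties broken by URL so the order is stable.
        scored.sort(key=lambda pair: (-pair[0], pair[1].url))
        return [link for _, link in scored[:limit]]


def merge_results(sponsored, organic_urls):
    # Drop organic hits that point at a page already shown as sponsored.
    # Pages are compared with the query string (tracking codes) ignored.
    sponsored_pages = {link.url.split("?", 1)[0] for link in sponsored}
    remaining = [u for u in organic_urls
                 if u.split("?", 1)[0] not in sponsored_pages]
    return sponsored, remaining
```

In use, the search page would call search() with the user's query, show those entries up top with their own summary text and untouched URLs, and pass the normal Nutch hits through merge_results() to drop the duplicates. Whether a real version should be a second index or a database table is exactly the kind of thing I'd like opinions on.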

If anyone has looked at this and has at least some ideas on how best to do it, let me know. I need to come up with a preliminary estimate before I can engage and pay the coders to make this happen, so if there are any easy or "best practices" ways of doing this, any help or pointers would be appreciated.

--
rp


