I am only interested in searching across a corpus of injected domains. The problem, however, is that two of the most valuable signals for ranking accuracy will be missing: incoming anchor text and the authority each page inherits from the sites linking to it.
I can get backlink information for each URL I'm interested in from Yahoo Site Explorer or Alexa's set of web search tools. If I started the crawl at these backlink URLs, I would capture the anchor text and authority levels of the pages I'm really interested in, but I would then have to remove the pages I'm not interested in. Has anyone tried to do what I'm trying to do? If so, please share any tips or ideas that might make the process a little less painful. Thanks! :)

--
View this message in context: http://www.nabble.com/Mimicking-Anchor-Text-Relevance---Authority-On-a-Focused-Crawl-tf4668564.html#a13336338
Sent from the Nutch - User mailing list archive at Nabble.com.
