Hello: I am currently trying to resolve the degrees of separation between a given seed URL and a link discovered within the crawl.
Let me give an example: I seed a crawl with http://lucene.apache.org/, and a link discovered one crawl depth away is http://forrest.apache.org/, and so on. I figure I have two options for analyzing this data:

1. During the crawl, capture the degrees of separation via some Nutch plugin or aspect. This is not ideal, because keeping track of metadata for a given domain relationship is extremely difficult to manage: the link structure is not a true linked list or tree but more of a web.

2. After the crawl, *somehow* analyze the Lucene or Nutch index just in time (JIT), or perhaps use some kind of filter, to retrieve the degrees of separation from a given domain.

If anyone has any input, it would be greatly appreciated.

...Kevin Bacon...

Cheers,
Christiaan Veerman
IT Consultant
KnowledgeStorm, Inc.
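For option 2, one way to frame the problem: once the crawl's outlinks are available as a map from each page URL to the URLs it links to, a host's degree of separation from the seed host is its shortest-path hop count. A breadth-first search finds that count even though the link structure is a web with cycles rather than a tree. Below is a minimal Java sketch that assumes such an outlink map has already been exported from the crawl. The input format and the class name `DegreesOfSeparation` are hypothetical; this is not a Nutch or Lucene API. The sketch collapses pages to hosts with `java.net.URI` and then runs the BFS:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: host-level degrees of separation via BFS over an
// exported outlink map (page URL -> outlink URLs). Not a Nutch/Lucene API.
public class DegreesOfSeparation {

    // Lower-cased host of a URL, or null if it is unparseable or has no host (e.g. mailto:).
    public static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Collapse the page-level link map into a host-level adjacency map,
    // dropping same-host links so only cross-domain hops are counted.
    public static Map<String, Set<String>> hostGraph(Map<String, List<String>> outlinks) {
        Map<String, Set<String>> graph = new HashMap<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            String from = hostOf(e.getKey());
            if (from == null) continue;
            Set<String> targets = graph.computeIfAbsent(from, k -> new HashSet<>());
            for (String link : e.getValue()) {
                String to = hostOf(link);
                if (to != null && !to.equals(from)) targets.add(to);
            }
        }
        return graph;
    }

    // Breadth-first search from the seed's host; each reachable host maps to its minimum hop count.
    public static Map<String, Integer> degreesFrom(String seedUrl, Map<String, List<String>> outlinks) {
        Map<String, Set<String>> graph = hostGraph(outlinks);
        Map<String, Integer> degree = new HashMap<>();
        String seed = hostOf(seedUrl);
        if (seed == null) return degree;
        Deque<String> queue = new ArrayDeque<>();
        degree.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String host = queue.poll();
            int d = degree.get(host);
            for (String next : graph.getOrDefault(host, Set.of())) {
                // In an unweighted graph the first visit is via a shortest path;
                // hosts already seen (cycles, back-links) are skipped.
                if (!degree.containsKey(next)) {
                    degree.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return degree;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in practice this would be exported from the crawl.
        Map<String, List<String>> outlinks = new HashMap<>();
        outlinks.put("http://lucene.apache.org/",
                List.of("http://forrest.apache.org/", "http://lucene.apache.org/java/"));
        outlinks.put("http://forrest.apache.org/",
                List.of("http://www.apache.org/", "http://lucene.apache.org/"));
        outlinks.put("http://www.apache.org/", List.of("http://forrest.apache.org/"));

        Map<String, Integer> degrees = degreesFrom("http://lucene.apache.org/", outlinks);
        // TreeMap only for stable, sorted printing.
        System.out.println(new TreeMap<>(degrees));
        // prints {forrest.apache.org=1, lucene.apache.org=0, www.apache.org=2}
    }
}
```

Because BFS expands hosts in order of increasing distance, a host that is reachable by several paths (or that links back to the seed) keeps the smallest hop count. That addresses the "web, not a tree" concern without any per-link bookkeeping during the crawl. Working at host level matches the domain-to-domain relationship described above; skipping the host collapse would give page-level hop counts instead.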
