Degrees of Separation

Veerman, Christiaan Fri, 26 May 2006 05:33:10 -0700

Hello:

I am currently trying to work out the degrees of separation between a given
seed URL and a link discovered during the crawl.


Let me give an example: I seed a crawl with http://lucene.apache.org/;
a discovered link one crawl depth away is http://forrest.apache.org/,
and so on.

I figure I have two options to analyze this data:

1. During the crawl, capture the degrees of separation via some Nutch plugin
or aspect. This is not ideal, as keeping track of metadata for a given
domain relationship is extremely difficult to manage: the link structure is
not a true linked list or tree but more of a web.

2. After a crawl, *somehow* analyze the Lucene or Nutch index JIT, or
maybe use some kind of filter to retrieve the degrees of separation
from a given domain.
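
To make option 2 concrete, here is a minimal sketch of what I have in mind, assuming the link graph has already been exported from the crawl data into a plain mapping of URL to outlinks. That export step is not shown, and nothing here uses a Nutch or Lucene API; the function name, the dictionary and every URL other than the two from my example are made up for illustration. A breadth-first search from the seed gives the minimum number of hops to each page, and because each URL is recorded only the first time it is seen, cycles in the "web" do not cause trouble.

```python
from collections import deque


def degrees_of_separation(outlinks, seed):
    """Return {url: minimum number of link hops from seed} using BFS."""
    degrees = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in outlinks.get(page, ()):
            # BFS reaches each URL first along a shortest path, so a URL
            # that is already recorded is skipped (this also breaks cycles).
            if target not in degrees:
                degrees[target] = degrees[page] + 1
                queue.append(target)
    return degrees


# Hypothetical link graph: only the lucene and forrest URLs come from the
# example above; the remaining entries are invented for the sketch.
outlinks = {
    "http://lucene.apache.org/": [
        "http://forrest.apache.org/",
        "http://www.apache.org/",
    ],
    "http://forrest.apache.org/": [
        "http://lucene.apache.org/",  # link back to the seed (a cycle)
        "http://example.org/",
    ],
    "http://www.apache.org/": ["http://example.org/"],
}

degrees = degrees_of_separation(outlinks, "http://lucene.apache.org/")
for url, hops in sorted(degrees.items(), key=lambda item: item[1]):
    print(hops, url)
```

Pages that are never reached from the seed simply do not appear in the result. Degrees per domain rather than per URL could presumably be obtained by reducing each URL to its host before building the mapping.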

If anyone has any input, it would be greatly appreciated.


...Kevin Bacon...


Cheers,

Christiaan Veerman
IT Consultant
KnowledgeStorm, Inc.
Direct: 678-597-5941
Cell: 404-771-6126
Fax: 678-597-5917
mailto:[EMAIL PROTECTED]
2520 Northwinds Parkway, Suite 600
Alpharetta, Georgia 30004
http://www.knowledgestorm.com/
KnowledgeStorm - Reach. Search. Results.

