The Nutch Crawler and the Web Link Graph

John Casey Tue, 15 Aug 2006 05:00:47 -0700

Hi All, is there any way to extract the outlinks of a particular webpage/URL?
I have had a look at the LinkDBReader, but this will only give me a listing of
pages that link to the page in question. Any ideas? I have been having a
look in the segments directory and have been trying to read/parse the files
using Hadoop's SequenceFile.Reader, but haven't had much luck getting the
format right. Is there any documentation on this? My intuition tells me that
Nutch probably does store the outlinks of a URL somewhere, but it's hard to
tell where.


Any ideas appreciated.
