Hi all, is there any way to extract the outlinks of a particular webpage/URL? I have had a look at the LinkDBReader, but it only gives me a listing of the pages that link to the page in question (its inlinks). I have also been looking in the segments directory and trying to read/parse the files using Hadoop's SequenceFile.Reader, but I haven't had much luck getting the format right. Is there any documentation on this? My intuition tells me that Nutch probably does store the outlinks of a URL somewhere, but it's hard to tell where.
Any ideas appreciated.
