[Nutch-general] The Nutch Crawler and the Web Link Graph

John Casey Tue, 15 Aug 2006 05:00:54 -0700

Hi all, is there any way to extract the outlinks of a particular web page/URL?
I have had a look at the LinkDBReader, but it only gives me a listing of
pages that link to the page in question. I have also been looking in the
segments directory and trying to read/parse the files using Hadoop's
SequenceFile.Reader, but haven't had much luck getting the format right. Is
there any documentation on this? My intuition tells me that Nutch probably
does store the outlinks of a URL somewhere, but it's hard to tell where.
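
As a fallback, the inlink listing could in principle be inverted to recover
outlinks: a page's outlinks are exactly the pages whose inlink lists contain
it. Below is a minimal sketch in plain Java, assuming the inlinks have already
been dumped into an in-memory map and that the listing is complete. The class
OutlinkInverter and the method invert are hypothetical names for illustration
and are not part of Nutch or Hadoop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not a Nutch class: turns "target -> pages linking to it"
// into "source -> pages it links to".
public class OutlinkInverter {

    static Map<String, Set<String>> invert(Map<String, Set<String>> inlinks) {
        // Sorted collections keep the output order deterministic.
        Map<String, Set<String>> outlinks = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : inlinks.entrySet()) {
            String target = entry.getKey();
            for (String source : entry.getValue()) {
                // Each inlink (source -> target) is an outlink of the source.
                outlinks.computeIfAbsent(source, k -> new TreeSet<>()).add(target);
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        // Made-up example data: a links to b and c, b links to c.
        Map<String, Set<String>> inlinks = new HashMap<>();
        inlinks.put("http://b.example/",
                new HashSet<>(Arrays.asList("http://a.example/")));
        inlinks.put("http://c.example/",
                new HashSet<>(Arrays.asList("http://a.example/", "http://b.example/")));

        // prints [http://b.example/, http://c.example/]
        System.out.println(invert(inlinks).get("http://a.example/"));
    }
}
```

This only recovers links whose targets appear in the inlink listing, so it is
a workaround rather than a substitute for reading the outlinks directly.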


Any ideas appreciated.


