The linkdb contains the information about the web graph. After fetching
the segments, run bin/nutch invertlinks to build the linkdb, which is
stored as a MapFile. The entries in the MapFile are <key,value> pairs,
where the keys are Text objects (containing URLs) and the values are
Inlinks objects. FYI, the linkdb can also easily be processed by
map-reduce jobs.
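
To make the key/value layout concrete, here is a minimal plain-Java sketch of what link inversion does conceptually: it turns "page -> pages it links to" into "URL -> pages that link to it". This is not Nutch code and does not use the Nutch or Hadoop APIs. The class name LinkInversion, the method invert, and the example URLs are all made up for illustration, and plain String and Set stand in for the Text keys and Inlinks values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: models the idea of inverting links,
// not the actual bin/nutch invertlinks implementation.
public class LinkInversion {

    // Input:  source URL -> URLs it links to (outlinks).
    // Output: target URL -> URLs that link to it (inlinks).
    public static Map<String, Set<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, Set<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String from = entry.getKey();
            for (String to : entry.getValue()) {
                // "Map" side: emit (to, from). "Reduce" side: group by target URL.
                inlinks.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new TreeMap<>();
        outlinks.put("http://a.example/",
                Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/",
                Arrays.asList("http://c.example/"));

        // Print each URL followed by the set of pages linking to it.
        for (Map.Entry<String, Set<String>> entry : invert(outlinks).entrySet()) {
            System.out.println(entry.getKey() + " <- " + entry.getValue());
        }
    }
}
```

Each entry of the resulting map plays the role of one <key,value> pair in the linkdb: the key is a URL and the value is the collection of its incoming links. Note that a page nobody links to gets no entry at all in this sketch.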
DS jha wrote:
Hi -
I want to read the map of incoming and outgoing links of a document
and use it for some analysis. Does Nutch store the link graph
once fetch/parse/index is complete?
After browsing through the code, it does seem that during document
parsing and storing, incoming and outgoing links are passed
around between objects, but is that information available once the
process is complete, by reading segment or index information?
Thanks,
Jha