Indexing documents with all incoming links

After implementing my own org.apache.nutch.indexer.IndexWriter, I checked the incoming data and see only these fields:


url
tstamp
digest
boost
segment
cache
host
title
content


In particular I would like to see all incoming links for the document.

I think I am calling the indexer correctly, because the linkdb is given on the command line and I see this in the logs:

2014-04-28 15:53:39,963 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: nutch-crawldata/linkdb

I did not find the place in the code where a NutchDocument, as passed to IndexWriter.write(), is created and filled.

Is it possible in principle to get incoming links into a NutchDocument for indexing, or is this not implemented at all?


Harald.
