After implementing my own org.apache.nutch.indexer.IndexWriter, I checked
the data that arrives in it and I only see these fields:
url
tstamp
digest
boost
segment
cache
host
title
content
In particular, I would like to see all incoming links for the document.
I think I call the indexer correctly, because the linkdb is given on the
command line and I see in the logs:
2014-04-28 15:53:39,963 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: nutch-crawldata/linkdb
I could not find the place in the code where the NutchDocument that is
passed to IndexWriter.write() is created and filled.
Is it possible in principle to get incoming links into a NutchDocument for
indexing, or is this not implemented at all?
Harald.