Hi Mike,
AFAIK, Nutch can't do that out of the box. But it would be really useful for
archiving, post-processing, data mining, etc.
Have a look at NUTCH-1047 and NUTCH-1088. Currently, you would need to write an
XMLIndexWriter class that implements the NutchIndexWriter interface and use it via
NutchIndexWriterFactory.ad
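Here is a rough sketch of the idea in plain Java with no Nutch dependency. The `DocWriter` interface is a hypothetical stand-in. It only mirrors the open/write/close lifecycle of NutchIndexWriter and is not the real interface or signature. A document is modeled as a map from field names to values. Each `write()` call produces its own file. The `doc-0.xml`, `doc-1.xml`, ... naming and the `<doc>`/`<field>` layout are my assumptions.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL stand-in for NutchIndexWriter: it only mirrors the open/write/close
// lifecycle and is not the real Nutch interface or signature.
interface DocWriter {
    void open(Path outputDir) throws IOException;
    void write(Map<String, List<String>> doc) throws IOException;
    void close() throws IOException;
}

// Sketch of an XMLIndexWriter that emits one XML file per document (i.e. per crawled URL).
class XMLIndexWriter implements DocWriter {
    private Path outputDir;
    private int count;

    @Override
    public void open(Path outputDir) throws IOException {
        this.outputDir = Files.createDirectories(outputDir);
        this.count = 0;
    }

    @Override
    public void write(Map<String, List<String>> doc) throws IOException {
        // Assumed naming scheme: a running counter (doc-0.xml, doc-1.xml, ...).
        Path file = outputDir.resolve("doc-" + count++ + ".xml");
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<doc>\n");
            for (Map.Entry<String, List<String>> field : doc.entrySet()) {
                for (String value : field.getValue()) {
                    w.write("  <field name=\"" + escape(field.getKey()) + "\">"
                            + escape(value) + "</field>\n");
                }
            }
            w.write("</doc>\n");
        }
    }

    @Override
    public void close() {
        // Nothing to flush: each file is closed at the end of write().
    }

    // Minimal XML escaping for text and attribute values; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

A real writer would plug into Nutch's actual interface and job configuration instead of these stand-ins. You would probably also want to name the files after the URL, for example a hash of it, rather than after a counter.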
I have Nutch crawling and Solr indexing working successfully, and I have dumped
the index to XML with Luke.
What I would like to do is generate one XML file per crawled URL for
loading into an XML database (MarkLogic). Yeah, I can write a Java or
XQuery tool to convert the one big XML file that Luke dumps
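For the conversion route you mention, here is a rough Java sketch that uses only the JDK's DOM and identity-transform APIs. I don't know the exact structure of Luke's XML dump, so the name of the per-document element is passed in as a parameter. It also assumes those elements are not nested inside each other. The `doc-N.xml` output naming is my own choice.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DumpSplitter {
    // Copies every element named docElement in the dump into its own file
    // (doc-0.xml, doc-1.xml, ...) under outDir and returns how many were written.
    // The whole dump is parsed into memory (DOM); a very large dump would need StAX instead.
    static int split(Path dump, String docElement, Path outDir) throws Exception {
        Document in = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(dump.toFile());
        NodeList docs = in.getElementsByTagName(docElement);
        Files.createDirectories(outDir);
        // An identity transformer (no stylesheet) serializes each DOM subtree unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        for (int i = 0; i < docs.getLength(); i++) {
            identity.transform(new DOMSource(docs.item(i)),
                    new StreamResult(outDir.resolve("doc-" + i + ".xml").toFile()));
        }
        return docs.getLength();
    }
}
```

Usage would be something like `DumpSplitter.split(Path.of("luke-dump.xml"), "doc", Path.of("out"))`. The file name and the `"doc"` element name there are placeholders; check what Luke actually emits.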