I have Nutch crawling and Solr indexing successfully, and I have dumped
the index to XML with Luke.
What I would like to do is generate one XML file per URL crawled, for
loading into an XML database (MarkLogic). Yeah, I could write a Java or
XQuery tool to convert the one big XML file that Luke dumps into
individual files.
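
For what it's worth, here's a rough sketch of what that splitter might
look like in Java using StAX. It's just an illustration, not a tested
tool: it assumes the Luke dump wraps each indexed document in a
<document> element (adjust DOC_ELEMENT to whatever the dump actually
uses), and it names the output files with a counter rather than the
crawled URL.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

/**
 * Splits a big Luke XML dump into one file per document.
 * Streams the input, so it should cope with large dumps.
 */
public class LukeDumpSplitter {
    // Assumption: each document in the dump is a <document> element.
    private static final String DOC_ELEMENT = "document";

    public static void main(String[] args) throws Exception {
        XMLInputFactory inFactory = XMLInputFactory.newInstance();
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();

        XMLEventReader reader =
            inFactory.createXMLEventReader(new FileInputStream(args[0]));

        int docCount = 0;
        XMLEventWriter writer = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();

            // A new <document> starts: open a fresh output file.
            if (event.isStartElement() && DOC_ELEMENT.equals(
                    event.asStartElement().getName().getLocalPart())) {
                writer = outFactory.createXMLEventWriter(
                    new FileOutputStream("doc-" + (++docCount) + ".xml"),
                    "UTF-8");
                writer.add(eventFactory.createStartDocument("UTF-8"));
            }

            // Copy everything inside the current <document> verbatim.
            if (writer != null) {
                writer.add(event);
            }

            // </document> closes out the current file.
            if (event.isEndElement() && DOC_ELEMENT.equals(
                    event.asEndElement().getName().getLocalPart())) {
                writer.add(eventFactory.createEndDocument());
                writer.close();
                writer = null;
            }
        }
        reader.close();
        System.out.println("Wrote " + docCount + " files.");
    }
}

Running "java LukeDumpSplitter dump.xml" would leave doc-1.xml,
doc-2.xml, ... in the working directory; naming the files from the URL
field instead would just mean peeking at the field events as they
stream by. But that's exactly the extra moving part I'd rather avoid: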
Ideally, Nutch would output these files directly so I wouldn't need
Solr, Luke, and a tool I'd have to write in the content processing
chain. KISS, right?
Any thoughts on how to do this in the simplest way?
Thanks,
Mike