I have Nutch crawling and Solr indexing successfully, and I have dumped the index to XML with Luke.

What I would like to do is generate one XML file per crawled URL for loading into an XML database (MarkLogic). Sure, I could write a Java or XQuery tool to convert the one big XML file that Luke dumps into individual files.
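
For what it's worth, a splitter like that can stay pretty small. Below is a rough Java sketch using StAX: it streams the big dump and copies each per-document subtree out to its own file. The element name "document", the doc-N.xml naming scheme, and the command-line arguments are all assumptions on my part, not anything Luke guarantees; adjust them to match the actual dump structure.

import java.io.File;
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class LukeDumpSplitter {
    public static void main(String[] args) throws Exception {
        File dump = new File(args[0]);    // the one big XML file from Luke
        File outDir = new File(args[1]);  // target directory for the pieces
        outDir.mkdirs();

        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(dump));
        // Identity transformer: copies whatever subtree the reader sits on.
        Transformer copy = TransformerFactory.newInstance().newTransformer();

        int n = 0;
        while (in.hasNext()) {
            // "document" is an assumption about the dump's per-doc element.
            if (in.next() == XMLStreamConstants.START_ELEMENT
                    && "document".equals(in.getLocalName())) {
                // Naming the files doc-N.xml here; naming them by URL would
                // mean peeking at the URL field first, which takes a little
                // buffering.
                File out = new File(outDir, "doc-" + (n++) + ".xml");
                copy.transform(new StAXSource(in), new StreamResult(out));
            }
        }
        in.close();
    }
}

An XQuery version run inside MarkLogic itself might be even shorter, since you could load the whole dump and call xdmp:document-insert once per document element.
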

Ideally, though, Nutch would output these files itself, so I wouldn't need Solr, Luke, and a tool I'd have to write in the content-processing chain. KISS, right?

Any thoughts on how to do this in the simplest way?

Thanks,

Mike
