Re: can nutch output xml?

2012-10-24 Thread Sebastian Nagel
Hi Mike, AFAIK, it can't, though it would be really useful for archiving, post-processing, data mining, etc. Have a look at NUTCH-1047 and NUTCH-1088. Currently, you would need to write a class XMLIndexWriter that implements the NutchIndexWriter interface and use it via NutchIndexWriterFactory.ad

can nutch output xml?

2012-10-24 Thread Mike Whitman
I have Nutch crawling and Solr indexing successfully, and I have dumped the index to XML with Luke. What I would like to do is generate one XML file per URL crawled, for loading into an XML database (MarkLogic). Yeah, I can write a Java or XQuery tool to convert the one big XML file that Luke dumps
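
For illustration only, the Java conversion tool mentioned above might look roughly like the sketch below. It is not taken from the thread. It assumes the dump wraps each indexed document in a repeating element (the tag name `doc` used here is a guess and is passed in as a parameter), and the class name `XmlDumpSplitter` and the `doc-N.xml` output naming are hypothetical. It uses only the JDK's built-in DOM and Transformer APIs and loads the whole dump into memory, so a very large dump would need a streaming parser instead.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: splits one large XML dump into one XML document per record.
class XmlDumpSplitter {

    // Returns one standalone XML string for each element named recordTag.
    // Assumption: record elements are not nested inside each other.
    static List<String> split(InputStream dump, String recordTag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(dump); // whole dump is held in memory
        NodeList records = source.getElementsByTagName(recordTag);

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> result = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            // Copy the record into a fresh document so it serializes on its own.
            Document single = builder.newDocument();
            single.appendChild(single.importNode(records.item(i), true));
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(single), new StreamResult(out));
            result.add(out.toString());
        }
        return result;
    }

    // Writes each record to outDir/doc-<index>.xml (naming scheme is an assumption).
    static void splitToFiles(Path dump, Path outDir, String recordTag) throws Exception {
        Files.createDirectories(outDir);
        try (InputStream in = Files.newInputStream(dump)) {
            List<String> records = split(in, recordTag);
            for (int i = 0; i < records.size(); i++) {
                Path target = outDir.resolve("doc-" + i + ".xml");
                Files.write(target, records.get(i).getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    // Usage: java XmlDumpSplitter <dump.xml> <output-dir> <record-tag>
    public static void main(String[] args) throws Exception {
        splitToFiles(Paths.get(args[0]), Paths.get(args[1]), args[2]);
    }
}
```

Naming the files by index rather than by URL sidesteps escaping URLs into file names. If the URL is needed as the document key in the target database, it could be read from the corresponding field of each record at load time.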