Hi, I was looking at Nutch as a crawler for indexing into Indri. Indri's docs list "warc" as a corpus class option, described as "WARC (Web ARChive) format, such as is output by the Nutch webcrawler" (cf. http://lemur.sourceforge.net/indri/IndriIndexer.html).
After finishing a short crawl using Nutch (v1.2), I found no way to produce WARC output: neither the native data store nor any of the export/dump options appear to be WARC. I've asked about this on the Indri/Lemur forums, but I thought I'd also check here in case anyone knows what the docs might be referring to, or how else I might proceed. Thanks! -Michael

