I am currently attempting to dump the contents of a crawl into multiple WARC files using:

./bin/nutch commoncrawldump -outputDir nameOfOutputDir -segment crawl/segments/segmentDir -warc

However, I get multiple occurrences of "URL skipped. Content of size X was truncated to Y". I have set both http.content.limit and file.content.limit to -1 in order to remove any limits, but I'm guessing neither applies to this situation.
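For reference, this is roughly how I have those two properties set in conf/nutch-site.xml (inside the usual configuration element; -1 is meant to disable the limit):

<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>
<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>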
Is there any way to remove that cap?

Thanks,
JJAM

