I am currently attempting to dump the contents of a crawl into multiple
WARC files using 

./bin/nutch commoncrawldump -outputDir nameOfOutputDir -segment
crawl/segments/segmentDir -warc

However, I get multiple occurrences of 

URL skipped. Content of size X was truncated to Y. 

I have set both http.content.limit and file.content.limit to -1 in order
to remove any content-size limits, but I'm guessing neither applies to this
situation. Is there any way to remove this cap?
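For reference, this is how I set the two properties (assuming
conf/nutch-site.xml is the right place for these overrides):

<!-- in conf/nutch-site.xml; assuming these are the relevant properties -->
<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>
<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>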

Thanks, 

JJAM

  
