Yes, there is a way to download web pages without truncation: set http.content.limit to -1 in your nutch-site.xml file.

<property>
  <name>http.content.limit</name>
  <value>-1</value>
  <description>The length limit for downloaded content, in bytes.
  If this value is nonnegative (>=0), content longer than it will be truncated;
  otherwise, no truncation at all.
  </description>
</property>

On Sun, Apr 16, 2017 at 7:34 PM Fabio Ricci <[email protected]> wrote:

> Hi
>
> is there somebody here ;) - Don’t expect you on Easter…
>
> NUTCH 1.13 stores in the dump incomplete websites.
>
> Is there a way to instruct it to download all content of a website, from
> <html> to </html> ?
>
> Thank you very much in advance
>
> Regards
> Fabio
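
In case it helps, here is a rough sketch of how the whole nutch-site.xml might look with this override in place. It assumes an otherwise empty site file in your conf directory. If you already have other properties there, keep them and just add this one inside the existing <configuration> element.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Site-specific overrides; values here take precedence over nutch-default.xml. -->
<configuration>

  <!-- A negative value disables truncation of content fetched over HTTP. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>

</configuration>
```

Pages that were already fetched with the old limit stay truncated in the existing segments, so they need to be fetched again before the dump shows the complete content.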

