Hello Sazedul,

Thank you for your hint - indeed I was hoping it would work as you said. I am using the URL http://amwmg.com/ for tests; this is quite a long page.

Unfortunately, even after changing the value of http.content.limit to -1 in nutch-site.xml, truncation still occurs. The same happened even with a value of 5000000 … (So it seems I have to download the URL contents myself… a rough sketch of what I mean is below.)
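To make concrete what "downloading the contents myself" could look like: a minimal sketch in plain Java, with no Nutch involved, so no content limit applies. The class name FullPageFetcher and the User-Agent string are just placeholders for illustration; I am reusing the test URL from above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal fetcher sketch: reads the whole response body with no size cap,
// unlike a Nutch fetch that honours http.content.limit.
public class FullPageFetcher {

    public static String fetch(String pageUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(pageUrl).openConnection();
        // Placeholder agent string, only for illustration.
        conn.setRequestProperty("User-Agent", "full-page-fetcher-test");
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n'); // no truncation: keep everything
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("http://amwmg.com/");
        // The result should span from <html> to </html> in full.
        System.out.println("Fetched " + html.length() + " characters");
    }
}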
Thanks a lot anyway!
Fabio

> On 16 Apr 2017, at 15:50, Sazedul Islam <[email protected]> wrote:
>
> Yes, there is a way to download webpages without truncating. Just put
> http.content.limit in the nutch-site.xml file with the value -1.
>
> <property>
>   <name>http.content.limit</name>
>   <value>-1</value>
>   <description>The length limit for downloaded content, in bytes. If
>     this value is nonnegative (>=0), content longer than it will be
>     truncated; otherwise, no truncation at all.
>   </description>
> </property>
>
> On Sun, Apr 16, 2017 at 7:34 PM Fabio Ricci <[email protected]> wrote:
>
>> Hi
>>
>> is there somebody here ;) - Don't expect you on Easter…
>>
>> NUTCH 1.13 stores in the dump incomplete websites.
>>
>> Is there a way to instruct it to download all content of a website, from
>> <html> to </html>?
>>
>> Thank you very much in advance
>>
>> Regards
>> Fabio

