Yes, you can download webpages without truncation. Set
http.content.limit to -1 in your nutch-site.xml file:

<property>
  <name>http.content.limit</name>
  <value>-1</value>
  <description>The length limit for downloaded content, in bytes. If
  this value is nonnegative (>=0), content longer than it will be
  truncated; otherwise, no truncation at all.
  </description>
</property>
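
If you want to double-check that the override is really in place, here is a rough sketch in Python. It is not part of Nutch: the helper names read_property and truncates are made up for illustration, and the embedded SAMPLE_SITE_XML only stands in for your real nutch-site.xml, assuming the usual <configuration>/<property> layout.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of nutch-site.xml (assumed layout, not read from disk).
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the <value> text of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def truncates(limit):
    """Per the property description: nonnegative limits truncate, negative ones do not."""
    return int(limit) >= 0


if __name__ == "__main__":
    limit = read_property(SAMPLE_SITE_XML, "http.content.limit")
    print("http.content.limit =", limit, "| truncates:", truncates(limit))
```

To run it against a real file, read the file's text and pass it to read_property instead of the sample string.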


On Sun, Apr 16, 2017 at 7:34 PM Fabio Ricci <[email protected]>
wrote:

> Hi
>
> is there somebody here ;) - Don’t expect you on Easter…
>
> NUTCH 1.13 stores in the dump incomplete websites.
>
> Is there a way to instruct it to download all content of a website, from
> <html> to </html> ?
>
> Thank you very much in advance
>
> Regards
> Fabio
