What do your regex URL filter rules look like?
And is the URL actually fetched?

On Thu, Jan 31, 2013 at 8:47 PM, Sourajit Basak
<[email protected]> wrote:
> Here it goes.
>
> Try to dump the content from this url with the following settings.
> http://www.nytimes.com/2013/01/31/technology/chinese-hackers-infiltrate-new-york-times-computers.html?pagewanted=2&_r=0&ref=global-home
>
>   <property>
>     <name>http.content.limit</name>
>     <value>-1</value>
>   </property>
>
> This page is gzip-encoded. You will see that the fetcher is unable to
> download any content. You can check this by inspecting the content length.
> Initially I thought it was a problem with the parse-html plugin, but
> now it seems that the fetcher returns null content.
>
> This seems related to NUTCH-374.
>
> Let me know if you need further info.
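
To separate a server-side issue from a fetcher issue, it may help to check
outside Nutch what a gzip-encoded body looks like before and after decoding.
Below is a minimal sketch using only the Python standard library. The helper
name decode_body and the sample payload are assumptions made up for
illustration; this is not Nutch code and not how the Nutch fetcher is
implemented.

```python
import gzip
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode a response body according to its Content-Encoding header.

    Hypothetical helper for illustration only; not part of Nutch.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # zlib-wrapped deflate stream
            return zlib.decompress(raw)
        except zlib.error:
            # some servers send a raw deflate stream without the zlib header
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # no encoding (or an unknown one): return the bytes unchanged
    return raw


# Made-up sample payload standing in for a fetched page.
page = b"<html><body>" + b"hello " * 100 + b"</body></html>"
compressed = gzip.compress(page)
decoded = decode_body(compressed, "gzip")

# The on-the-wire length and the decoded length differ, so a zero or
# tiny content length after decoding points at the decoding step.
print("compressed length:", len(compressed))
print("decoded length:", len(decoded))
```

For a live check, the raw bytes and the Content-Encoding value would come from
an actual HTTP response (for example via urllib.request with an
"Accept-Encoding: gzip" request header). If the decoded length is non-zero
there but Nutch still stores empty content, the problem is more likely on the
fetcher side.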
