It could be related to http://issues.apache.org/jira/browse/NUTCH-374: when the property http.content.limit is set to -1 and the data from the server is gzip'ed, the content is not decoded properly.

Jason
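
To illustrate the kind of failure described above, here is a minimal Java sketch of gzip decoding under a byte limit where a negative limit means "no limit". It is not Nutch's code: the class name GzipLimitSketch and the helpers gzip and gunzip are hypothetical, and whether NUTCH-374 fails in exactly this way is an assumption. The point is only that a loop written as "while (out.size() < limit)" reads nothing when the limit is -1, so the unlimited case needs its own branch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; this is not Nutch's code.
public class GzipLimitSketch {

    // Compress bytes so the example has gzip'ed input to work with.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Decompress at most `limit` bytes; a negative limit means "no limit".
    static byte[] gunzip(byte[] compressed, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            // Without the explicit `limit < 0` check, -1 would end the loop at once.
            while (limit < 0 || out.size() < limit) {
                int want = (limit < 0)
                        ? buf.length
                        : Math.min(buf.length, limit - out.size());
                int n = in.read(buf, 0, want);
                if (n == -1) {
                    break; // end of the compressed stream
                }
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(raw);
        // Unlimited: the whole body should round-trip.
        System.out.println(Arrays.equals(raw, gunzip(gz, -1)));
        // Limited: the output is truncated to the requested number of bytes.
        System.out.println(gunzip(gz, 5).length);
    }
}
```

If the symptom disappears with a positive http.content.limit, that would be consistent with the issue above, though it would not prove it.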
On Feb 8, 2007, at 6:45 AM, wangxu wrote:

> wangxu wrote:
>> when I fetched some certain sites,
>> I got empty content, contentType, but the fetch status was
>> "fetch_success" and the metadata was sometimes not empty.
>>
>> how does website configure itself to achieve this?
>> any methods to avoid this situation?
>> I used agent-name:
>> Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; MyIE2; .NET CLR
>> 1.1.4322)
>>
>>
> sorry, empty content, parsedtext/parseddata

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
