Hi,
I set http.content.limit to -1 so that fetched data is never
truncated. However, when the fetched data was compressed (HTTP response
header Content-Encoding: gzip), Nutch was not able to uncompress it.
With http.content.limit at its default value of 65536, Nutch had no
problem. I debugged Nutch in Eclipse and I think the problem is this
check inside the loop in GZIPUtils.java:
    if ((written + size) > sizeLimit) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
    }
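It should truncate the data only if sizeLimit >= 0. With sizeLimit = -1
the condition is true for any positive size, and sizeLimit - written is
negative, so write() is called with a negative length. The check
should therefore read: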
    if ((written + size) > sizeLimit && sizeLimit >= 0) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
    }
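To make the proposed behaviour concrete, here is a minimal
self-contained sketch. It is not the actual GZIPUtils code: the class
name GzipLimitSketch and the helpers gzip() and unzip() are made up for
illustration, and only the variable names are borrowed from the snippet
above. It assumes a negative sizeLimit should mean "no truncation".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLimitSketch {

    // Hypothetical helper: gzip-compress a byte array so the example has input data.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(raw);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: decompress, keeping at most sizeLimit bytes.
    // Assumption: a negative sizeLimit means "do not truncate".
    static byte[] unzip(byte[] compressed, int sizeLimit) throws IOException {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int written = 0;
        try (GZIPInputStream inStream =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int size;
            while ((size = inStream.read(buf)) != -1) {
                // Truncate only when a non-negative limit is configured.
                if (sizeLimit >= 0 && (written + size) > sizeLimit) {
                    outStream.write(buf, 0, sizeLimit - written);
                    break;
                }
                outStream.write(buf, 0, size);
                written += size;
            }
        }
        return outStream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[100000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i % 251);
        }
        byte[] compressed = gzip(raw);
        System.out.println(unzip(compressed, -1).length);    // no limit: 100000
        System.out.println(unzip(compressed, 65536).length); // truncated: 65536
    }
}
```

In this sketch the sizeLimit >= 0 test comes first, so the size
comparison is skipped entirely when no limit is set.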
Has anyone seen this before and is this solution correct?
Thanks,
Meghna
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general