Patch for httpResponse

2011-08-23 Thread Simone Frenzel
-- Forwarded message --
From: Simone Frenzel psimon...@googlemail.com
Date: 2011/8/22
Subject: Patch for httpResponse
To: dev-subscr...@nutch.apache.org


Hi,

I tested Nutch on different web pages. For short gzipped pages it throws
an IOException:
java.io.IOException: unzipBestEffort returned null
2011-08-19 17:06:55,190 ERROR httpclient.Http - at
org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:310)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.httpclient.HttpResponse.init(HttpResponse.java:163)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)

A small change to HttpResponse solves the problem. There are no more
problems with zipped pages, BasicAuth and zipped pages, etc.

The patch is attached.

Greetings and thanks
Index: trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java
===
--- trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java	(revision 1160266)
+++ trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpResponse.java	(working copy)
@@ -124,7 +124,7 @@
 int totalRead = 0;
 ByteArrayOutputStream out = new ByteArrayOutputStream();
 while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
- && totalRead + bufferFilled < contentLength) {
+ && totalRead + bufferFilled <= contentLength) {
   totalRead += bufferFilled;
   out.write(buffer, 0, bufferFilled);
 }
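
A minimal sketch of why the comparison matters, assuming the patch changes the loop condition from < to <= and that contentLength equals the exact size of the response body. The class ReadLoopSketch, the helper readBody, the inclusive flag and the 8192-byte buffer are made up for illustration and are not part of Nutch. With the strict comparison, a body that fits into a single read and is exactly contentLength bytes long is never written to the output, which would leave nothing to unzip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoopSketch {

  // Hypothetical helper that mirrors the read loop from the patch.
  // inclusive = false uses the old "<" check, true uses the patched "<=".
  static byte[] readBody(InputStream in, int contentLength, boolean inclusive)
      throws IOException {
    byte[] buffer = new byte[8192]; // assumed buffer size
    int bufferFilled;
    int totalRead = 0;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
        && (inclusive
            ? totalRead + bufferFilled <= contentLength
            : totalRead + bufferFilled < contentLength)) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = new byte[100]; // stands in for a short gzipped page
    int contentLength = body.length;

    byte[] before = readBody(new ByteArrayInputStream(body), contentLength, false);
    byte[] after = readBody(new ByteArrayInputStream(body), contentLength, true);

    System.out.println("strict <:     " + before.length + " bytes"); // 0 bytes
    System.out.println("inclusive <=: " + after.length + " bytes");  // 100 bytes
  }
}
```

In this sketch the strict check drops the whole 100-byte body because 0 + 100 < 100 is false on the first read, while the inclusive check keeps it.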


Re: Patch for httpResponse

2011-08-23 Thread Julien Nioche
Simone,

Would you mind opening a JIRA issue for this, attaching your patch, and
granting it to the ASF? I know it is fairly small, but it makes it easier to
track progress, link to svn commits, etc.

Thanks

Julien

On 23 August 2011 07:53, Simone Frenzel psimon...@googlemail.com wrote:



 -- Forwarded message --
 From: Simone Frenzel psimon...@googlemail.com
 Date: 2011/8/22
 Subject: Patch for httpResponse
 To: dev-subscr...@nutch.apache.org


 Hi,

 I tested Nutch on different web pages. For short gzipped pages it
 throws an IOException:
 java.io.IOException: unzipBestEffort returned null
 2011-08-19 17:06:55,190 ERROR httpclient.Http - at
 org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:310)
 2011-08-19 17:06:55,191 ERROR httpclient.Http - at
 org.apache.nutch.protocol.httpclient.HttpResponse.init(HttpResponse.java:163)
 2011-08-19 17:06:55,191 ERROR httpclient.Http - at
 org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
 2011-08-19 17:06:55,191 ERROR httpclient.Http - at
 org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
 2011-08-19 17:06:55,191 ERROR httpclient.Http - at
 org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)

 A small change to HttpResponse solves the problem. There are no more
 problems with zipped pages, BasicAuth and zipped pages, etc.

 The patch is attached.

 Greetings and thanks




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com