[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-24 Thread Omkar Reddy (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488900#comment-16488900 ]

Omkar Reddy commented on NUTCH-2575:


I have taken up [NUTCH-2557|https://issues.apache.org/jira/browse/NUTCH-2557] 
and started working on it. Thanks. 

> protocol-http does not respect the maximum content-size for chunked responses
> -
>
>                 Key: NUTCH-2575
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2575
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: protocol
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Critical
>             Fix For: 1.15
>
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from
> stopping when the content exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404]
> that is used to check how much content has been read, but it is never
> updated, so it always stays 0, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442]
> always evaluates to false (unless a single chunk is larger than the maximum
> allowed content size).
> This allows any server to cause out-of-memory errors on our side.
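
The failure mode described above can be modelled with plain arithmetic. This is a hypothetical sketch, not Nutch code: the class and method names are made up, chunk lengths are passed in directly instead of being parsed from a stream, and it assumes the size check compares the running total plus the next chunk length against the limit, as the description implies. When the counter is never updated, every chunk that is individually below the limit passes the check, so four 100-byte chunks against a 250-byte limit are all buffered.

```java
/**
 * Hypothetical model of the size check described in NUTCH-2575 (not Nutch code).
 * Chunk lengths are passed in directly instead of being parsed from a stream.
 */
final class ChunkSizeCheckModel {

    /**
     * Returns how many content bytes would be buffered for the given chunk lengths.
     * trackTotal == false reproduces the bug (running total never updated);
     * trackTotal == true adds each chunk to the total, as the fix does.
     * A negative maxContent means "no limit".
     */
    static long bytesBuffered(int[] chunkLengths, int maxContent, boolean trackTotal) {
        int contentBytesRead = 0; // running total compared against the limit
        long buffered = 0;        // bytes that would end up in memory
        for (int chunkLen : chunkLengths) {
            if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
                // Trim the chunk so the total stays within the limit.
                chunkLen = maxContent - contentBytesRead;
            }
            buffered += chunkLen;
            if (trackTotal) {
                contentBytesRead += chunkLen;
            }
            if (maxContent >= 0 && contentBytesRead >= maxContent) {
                break; // limit reached: stop reading further chunks
            }
        }
        return buffered;
    }

    public static void main(String[] args) {
        int[] chunks = {100, 100, 100, 100};
        System.out.println("counter never updated: " + bytesBuffered(chunks, 250, false)); // 400
        System.out.println("counter updated:       " + bytesBuffered(chunks, 250, true));  // 250
    }
}
```

Only a single chunk larger than the limit (for example one 1000-byte chunk against 250) is trimmed in the buggy variant, which matches the exception noted in the report.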



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-24 Thread Gerard Bouchar (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1649#comment-1649 ]

Gerard Bouchar commented on NUTCH-2575:
---

Thank you for the fix! Is work being done on the other sub-issues?



[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-11 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471706#comment-16471706 ]

ASF GitHub Bot commented on NUTCH-2575:
---

Omkar20895 commented on issue #327: NUTCH-2575 Storing total number of bytes read after every chunk
URL: https://github.com/apache/nutch/pull/327#issuecomment-388318227
 
 
   Thank you @sebastian-nagel 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-10 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471213#comment-16471213 ]

Hudson commented on NUTCH-2575:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3524 (See [https://builds.apache.org/job/Nutch-trunk/3524/])
NUTCH-2575 Storing total number of bytes read after every chunk (omkarreddy2008: [https://github.com/apache/nutch/commit/b541de8ff20b818667e2765664ae2f133b439dc3])
* (edit) src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java




[jira] [Commented] (NUTCH-2575) protocol-http does not respect the maximum content-size for chunked responses

2018-05-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471105#comment-16471105 ]

ASF GitHub Bot commented on NUTCH-2575:
---

sebastian-nagel closed pull request #327: NUTCH-2575 Storing total number of bytes read after every chunk
URL: https://github.com/apache/nutch/pull/327
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
index c87c11125..591b94298 100644
--- a/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
+++ b/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
@@ -464,6 +464,7 @@ private void readChunkedContent(PushbackInputStream in, StringBuffer line)
         chunkBytesRead += len;
       }
 
+      contentBytesRead += chunkBytesRead;
       readLine(in, line, false);
 
     }
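
For context, below is a minimal, self-contained sketch of a chunked-body reader that updates the running total at the same point as the patch, right after each chunk's data has been read. It is a simplified stand-in, not Nutch's HttpResponse: ChunkedDecoderSketch, readChunked and the local readLine helper are hypothetical names, trailers after the last chunk are ignored, and reading simply stops once the limit is reached.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified decoder for a chunked HTTP body (not Nutch's HttpResponse).
 * The running total is updated after every chunk, mirroring the patch, so the
 * size limit is enforced across chunks and not only within a single chunk.
 */
final class ChunkedDecoderSketch {

    /** Reads one line terminated by LF (CR characters are dropped); returns null at EOF. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return (c == -1 && sb.length() == 0) ? null : sb.toString();
    }

    /** Decodes a chunked body, keeping at most maxContent bytes (negative means no limit). */
    static byte[] readChunked(InputStream in, int maxContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int contentBytesRead = 0; // total content bytes read across all chunks
        while (true) {
            String header = readLine(in);
            if (header == null) {
                throw new IOException("EOF before the last chunk");
            }
            int semi = header.indexOf(';'); // ignore chunk extensions
            String lenStr = (semi < 0 ? header : header.substring(0, semi)).trim();
            int chunkLen;
            try {
                chunkLen = Integer.parseInt(lenStr, 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk length: " + header, e);
            }
            if (chunkLen < 0) {
                throw new IOException("bad chunk length: " + header);
            }
            if (chunkLen == 0) {
                break; // last chunk
            }
            boolean truncated = false;
            // long arithmetic avoids int overflow for very large chunk lengths
            if (maxContent >= 0 && (long) contentBytesRead + chunkLen > maxContent) {
                chunkLen = maxContent - contentBytesRead; // keep only what fits
                truncated = true;
            }
            int chunkBytesRead = 0;
            while (chunkBytesRead < chunkLen) {
                int toRead = Math.min(buf.length, chunkLen - chunkBytesRead);
                int len = in.read(buf, 0, toRead);
                if (len == -1) {
                    throw new IOException("EOF inside a chunk");
                }
                out.write(buf, 0, len);
                chunkBytesRead += len;
            }
            contentBytesRead += chunkBytesRead; // the line added by the patch
            if (truncated) {
                break; // limit reached: stop reading the body
            }
            readLine(in); // consume the CRLF that ends the chunk data
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] all = readChunked(new ByteArrayInputStream(body), -1);
        byte[] capped = readChunked(new ByteArrayInputStream(body), 6);
        System.out.println(new String(all, StandardCharsets.US_ASCII));    // Wikipedia
        System.out.println(new String(capped, StandardCharsets.US_ASCII)); // Wikipe
    }
}
```

With a 6-byte limit, the two-chunk body "Wiki" + "pedia" is cut to "Wikipe". Without the `contentBytesRead += chunkBytesRead;` line, the total would stay 0, the 5-byte second chunk would pass the check, and the whole body would be kept.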


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> protocol-http does not respect the maximum content-size for chunked responses
> -
>
> Key: NUTCH-2575
> URL: https://issues.apache.org/jira/browse/NUTCH-2575
> Project: Nutch
>  Issue Type: Sub-task
>Affects Versions: 1.14
>Reporter: Gerard Bouchar
>Priority: Critical
>
> There is a bug in HttpResponse::readChunkedContent that prevents it to stop 
> reading content when it exceeds the maximum allowed size.
> There [is a variable 
> contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404]
>  that is used to check how much content has been read, but it is never 
> updated, so it always stays null, and [the size 
> check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442]
>  always returns false (unless a single chunk is larger than the maximum 
> allowed content size).
> This allows any server to cause out-of-memory errors on our size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)