[jira] [Commented] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-12 Thread Hudson (JIRA)


[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509872#comment-16509872 ]

Hudson commented on NUTCH-2560:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See 
[https://builds.apache.org/job/Nutch-trunk/3534/])
NUTCH-2560 protocol-http throws an error when an http header spans over multiple lines
(snagel: [https://github.com/apache/nutch/commit/a2771dc0d1f551b8dd1e07609ce978251a05f34a])
* (edit) src/plugin/protocol-http/src/test/org/apache/nutch/protocol/http/TestBadServerResponses.java


> protocol-http throws an error when an http header spans over multiple lines
> ---
>
> Key: NUTCH-2560
> URL: https://issues.apache.org/jira/browse/NUTCH-2560
> Project: Nutch
>  Issue Type: Sub-task
>Affects Versions: 1.14
>Reporter: Gerard Bouchar
>Priority: Major
> Fix For: 1.15
>
>
> Some servers send invalid headers that span multiple lines. In that case,
> browsers simply ignore the subsequent lines, but protocol-http throws an
> error, which prevents us from fetching the content of the page.
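The browser-like behaviour described in the issue can be approximated by a header parser that skips malformed lines instead of failing. The following is a minimal sketch, not the actual protocol-http code: the class `LenientHeaderParser` and its `parse` method are hypothetical, and it assumes the status line has already been consumed and the header block has been split into lines without their CRLF terminators.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical lenient header parser (illustration only): lines that are
 * not "name: value" pairs are skipped instead of raising an error,
 * mimicking how browsers ignore stray continuation lines.
 */
public class LenientHeaderParser {

    public static Map<String, String> parse(List<String> headerLines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                // No field name before a colon: ignore the line
                continue;
            }
            String name = line.substring(0, colon);
            if (name.indexOf(' ') >= 0 || name.indexOf('\t') >= 0) {
                // Whitespace is not allowed in or after a field name: ignore
                continue;
            }
            // Strip optional whitespace around the value; later duplicates
            // overwrite earlier ones (a simplification)
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Content-Type: text/html",
                "X-Broken: first part",
                "second part without leading whitespace",
                "Content-Length: 42");
        System.out.println(parse(lines));
    }
}
```

Note that this sketch also drops spec-compliant folded lines (those starting with a space or tab); handling them requires unfolding them into the previous value first.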



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-11 Thread Sebastian Nagel (JIRA)


[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508028#comment-16508028 ]

Sebastian Nagel commented on NUTCH-2560:


See [RFC 7230, section 3.2.4|https://tools.ietf.org/html/rfc7230#section-3.2.4]:
{quote}Historically, HTTP header field values could be extended over
   multiple lines by preceding each extra line with at least one space
   or horizontal tab (obs-fold).  This specification deprecates such
   line folding{quote}
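RFC 7230, section 3.2.4 also states that a user agent receiving an obs-fold in a response must replace each one with one or more SP octets before interpreting the field value. A minimal sketch of that unfolding step follows; the class `ObsFoldUnfolder` is hypothetical and not taken from Nutch, and it assumes header lines arrive without their CRLF terminators.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical obs-fold unfolder (illustration only): a line starting with
 * SP or HTAB continues the previous header field, and the fold is replaced
 * by a single space.
 */
public class ObsFoldUnfolder {

    public static List<String> unfold(List<String> rawLines) {
        List<String> out = new ArrayList<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation) {
                if (!out.isEmpty()) {
                    // Join with the previous field using one SP
                    int last = out.size() - 1;
                    out.set(last, out.get(last) + " " + line.trim());
                }
                // A continuation with no preceding field is dropped here
                continue;
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Content-Type: text/html",
                "X-Folded: first part",
                "\tsecond part",
                "Content-Length: 42");
        unfold(lines).forEach(System.out::println);
    }
}
```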

Actually, this seems to work if multi-line headers follow the spec (extra space
at the beginning of the line): the unit test in [commit
a2771dc|https://github.com/apache/nutch/pull/347/commits/a2771dc0d1f551b8dd1e07609ce978251a05f34a]
passes when ported to Nutch 1.14.
