[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-06-08 Thread Sebastian Nagel (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2555:
---
Fix Version/s: 1.15

> URL normalization problem: path not starting with a '/'
> ---
>
> Key: NUTCH-2555
> URL: https://issues.apache.org/jira/browse/NUTCH-2555
> Project: Nutch
> Issue Type: Sub-task
> Affects Versions: 1.14
> Reporter: Gerard Bouchar
> Priority: Major
> Fix For: 1.15
>
>
> When a URL does not have a path but has GET parameters (for instance
> 'http://example.com?a=1'), it should be normalized to add a '/' at the
> beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
> non-normalized URLs reach protocol-http, which then uses URL::getFile() to
> get the path, and tries to send an invalid HTTP request:
> GET ?a=1 HTTP/1.0
> instead of
> GET /?a=1 HTTP/1.0
>
> Example URL for which this poses a problem:
> http://news.fx678.com?171
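
The behaviour described in the report can be reproduced outside Nutch with plain java.net.URL. The sketch below is not the protocol-http code; the class name GetFileDemo and the helper requestLine are made up for illustration, and the request line is simply assembled from URL.getFile() as the report describes.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class GetFileDemo {

    /**
     * Hypothetical helper: builds an HTTP/1.0 request line from
     * URL.getFile(), mirroring what the report says protocol-http does.
     */
    static String requestLine(String spec) {
        try {
            // new URL(String) is deprecated in recent JDKs but still available.
            URL url = new URL(spec);
            // getFile() is path + "?" + query; with an empty path it starts with '?'.
            return "GET " + url.getFile() + " HTTP/1.0";
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // No path before the query: request target lacks the leading '/'.
        System.out.println(requestLine("http://example.com?a=1"));  // GET ?a=1 HTTP/1.0
        // Normalized form: request target is valid.
        System.out.println(requestLine("http://example.com/?a=1")); // GET /?a=1 HTTP/1.0
    }
}
```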



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-06-08 Thread Sebastian Nagel (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2555:
---
Affects Version/s: 1.14






[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-04-09 Thread Gerard Bouchar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerard Bouchar updated NUTCH-2555:
--
Description: 
When a URL does not have a path but has GET parameters (for instance
'http://example.com?a=1'), it should be normalized to add a '/' at the
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
non-normalized URLs reach protocol-http, which then uses URL::getFile() to get
the path, and tries to send an invalid HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0

Example URL for which this poses a problem:
http://news.fx678.com?171

  was:
When a URL does not have a path but has GET parameters (for instance
'http://example.com?a=1'), it should be normalized to add a '/' at the
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
non-normalized URLs reach protocol-http, which then tries to send an invalid
HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0

Example URL for which this poses a problem:
http://news.fx678.com?171
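
One way to picture the requested normalization is a small helper that inserts '/' when an http(s) URL has an authority but an empty path. This is only a sketch under that assumption, not the change that was committed for this issue; the class EmptyPathNormalizer and the method normalize are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class EmptyPathNormalizer {

    /**
     * Hypothetical normalizer: returns the URL with "/" as its path when the
     * original path is empty; every other URL is returned unchanged.
     */
    static String normalize(String spec) {
        URL url;
        try {
            url = new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
        String protocol = url.getProtocol();
        boolean http = protocol.equals("http") || protocol.equals("https");
        if (!http || url.getAuthority() == null || !url.getPath().isEmpty()) {
            return spec; // nothing to fix
        }
        // Rebuild the URL with an explicit root path, keeping query and fragment.
        StringBuilder sb = new StringBuilder();
        sb.append(protocol).append("://").append(url.getAuthority()).append('/');
        if (url.getQuery() != null) {
            sb.append('?').append(url.getQuery());
        }
        if (url.getRef() != null) {
            sb.append('#').append(url.getRef());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com?a=1"));    // http://example.com/?a=1
        System.out.println(normalize("http://news.fx678.com?171")); // http://news.fx678.com/?171
    }
}
```

After this step, URL.getFile() on the result starts with '/', so a request line built from it is well-formed.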







[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-04-09 Thread Gerard Bouchar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerard Bouchar updated NUTCH-2555:
--
Description: 
When a URL does not have a path but has GET parameters (for instance
'http://example.com?a=1'), it should be normalized to add a '/' at the
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
non-normalized URLs reach protocol-http, which then tries to send an invalid
HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0

Example URL for which this poses a problem:
http://news.fx678.com?171

  was:
When a URL does not have a path but has GET parameters (for instance
'http://example.com?a=1'), it should be normalized to add a '/' at the
beginning of the path (giving 'http://example.com/?a=1'). Our logs show that
non-normalized URLs reach protocol-http, which then tries to send an invalid
HTTP request:

GET ?a=1 HTTP/1.0

instead of

GET /?a=1 HTTP/1.0




