[jira] [Commented] (NUTCH-1344) BasicURLNormalizer to normalize https same as http

2012-10-10 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473066#comment-13473066
 ] 

Julien Nioche commented on NUTCH-1344:
--

Good catch, Sebastian. Please commit to both trunk and 2.x.

 BasicURLNormalizer to normalize https same as http 
 ---

 Key: NUTCH-1344
 URL: https://issues.apache.org/jira/browse/NUTCH-1344
 Project: Nutch
  Issue Type: Bug
Affects Versions: nutchgora, 1.6
Reporter: Sebastian Nagel
 Attachments: NUTCH-1344.patch


 Most of the normalization done by BasicURLNormalizer (lowercasing host, 
 removing default port, removal of page anchors, cleaning . and .. in the path) 
 is not done for URLs with protocol https.
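
The four steps listed above can be sketched as follows. This is an illustration only, not the code in BasicURLNormalizer or in the attached patch: the class name UrlNormalizeSketch and its helper methods are hypothetical, and it relies on java.net.URI rather than on whatever Nutch uses internally. The point is that https goes through exactly the same branch as http and ftp.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch; not the Nutch implementation.
public class UrlNormalizeSketch {

    // Default port per scheme; https gets the same treatment as http and ftp.
    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1; // scheme not handled by this sketch
        }
    }

    static String normalize(String url) {
        URI uri = URI.create(url.trim());
        String scheme = uri.getScheme() == null
                ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        if (defaultPort(scheme) == -1 || uri.getHost() == null) {
            return url; // leave anything else untouched
        }

        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        // Lowercase the host.
        sb.append(uri.getHost().toLowerCase(Locale.ROOT));
        // Drop the port when it is the default for the scheme.
        if (uri.getPort() != -1 && uri.getPort() != defaultPort(scheme)) {
            sb.append(':').append(uri.getPort());
        }
        String path = uri.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment (page anchor) is deliberately not appended.
        // URI.normalize() resolves "." and ".." path segments.
        return URI.create(sb.toString()).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("HTTP://WWW.Example.COM:80/a/./b/../c#top"));
        System.out.println(normalize("HTTPS://WWW.Example.COM:443/a/./b/../c#top"));
    }
}
```

With this sketch both inputs in main come out in the same shape, differing only in the scheme, which is the behaviour the issue asks for.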

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1344) BasicURLNormalizer to normalize https same as http

2012-10-09 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472219#comment-13472219
 ] 

Markus Jelsma commented on NUTCH-1344:
--

I wouldn't know why. I think they should be treated equally. 



[jira] [Commented] (NUTCH-1344) BasicURLNormalizer to normalize https same as http

2012-10-08 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471915#comment-13471915
 ] 

Sebastian Nagel commented on NUTCH-1344:


Is there any reason why https should be treated differently from http (and ftp)?
