[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment

2019-09-27 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-685:
--
Fix Version/s: 1.17

> Content-level redirect status lost in ParseSegment
> --
>
> Key: NUTCH-685
> URL: https://issues.apache.org/jira/browse/NUTCH-685
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrzej Bialecki
> Priority: Major
> Fix For: 1.17
>
>
> When Fetcher runs in parsing mode, content-level redirects (HTML meta tag 
> "Refresh") are properly discovered and recorded in crawl_fetch under source 
> URL and target URL. If Fetcher runs in non-parsing mode, and ParseSegment is 
> run as a separate step, the content-level redirection data is used only to 
> add the new (target) URL, but the status of the original URL is not reset to 
> indicate a redirect. Consequently, status of the original URL will be 
> different depending on the way you run Fetcher, whereas it should be the same.
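
The discrepancy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nutch code: the class RedirectStatusSketch, the Status enum, ParseOutcome, and both method names are hypothetical stand-ins invented for this example. The first method mirrors the behaviour reported for the separate parse step (target URL added, source status untouched); the second mirrors what the report says should happen in both modes (source status reset to indicate a redirect).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are real Nutch classes or methods.
final class RedirectStatusSketch {

    /** Simplified stand-ins for the per-URL statuses recorded for a segment. */
    enum Status { FETCH_SUCCESS, REDIRECT, LINKED }

    /** Result of parsing one page; refreshTarget is null when no meta refresh was found. */
    static final class ParseOutcome {
        final String refreshTarget;

        ParseOutcome(String refreshTarget) {
            this.refreshTarget = refreshTarget;
        }
    }

    /** Reported behaviour of the separate parse step: only the target URL is added. */
    static Map<String, Status> reportedSeparateParse(String url, ParseOutcome parse) {
        Map<String, Status> statuses = new LinkedHashMap<>();
        statuses.put(url, Status.FETCH_SUCCESS); // source status is never reset
        if (parse.refreshTarget != null) {
            statuses.put(parse.refreshTarget, Status.LINKED);
        }
        return statuses;
    }

    /** Expected behaviour in both modes: the source URL is marked as a redirect. */
    static Map<String, Status> expectedInBothModes(String url, ParseOutcome parse) {
        Map<String, Status> statuses = new LinkedHashMap<>();
        if (parse.refreshTarget != null) {
            statuses.put(url, Status.REDIRECT);
            statuses.put(parse.refreshTarget, Status.LINKED);
        } else {
            statuses.put(url, Status.FETCH_SUCCESS);
        }
        return statuses;
    }
}
```

For a page with a meta refresh, the two methods disagree only on the status of the source URL, which is the inconsistency this issue is about.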



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment

2015-01-29 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-685:

Assignee: (was: Julien Nioche)

> Content-level redirect status lost in ParseSegment
> --
>
> Key: NUTCH-685
> URL: https://issues.apache.org/jira/browse/NUTCH-685
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrzej Bialecki
> Fix For: 1.11







[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment

2014-03-14 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-685:
--

Fix Version/s: 1.9

> Content-level redirect status lost in ParseSegment
> --
>
> Key: NUTCH-685
> URL: https://issues.apache.org/jira/browse/NUTCH-685
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 2.4, 1.9







[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-685:
---

Fix Version/s: 2.2, 1.7

> Content-level redirect status lost in ParseSegment
> --
>
> Key: NUTCH-685
> URL: https://issues.apache.org/jira/browse/NUTCH-685
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 1.7, 2.2


