[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment
[ https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-685:
----------------------------------
    Fix Version/s: 1.17

> Content-level redirect status lost in ParseSegment
> --------------------------------------------------
>
>                 Key: NUTCH-685
>                 URL: https://issues.apache.org/jira/browse/NUTCH-685
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Priority: Major
>             Fix For: 1.17
>
>
> When Fetcher runs in parsing mode, content-level redirects (HTML meta tag
> "Refresh") are properly discovered and recorded in crawl_fetch under source
> URL and target URL. If Fetcher runs in non-parsing mode, and ParseSegment is
> run as a separate step, the content-level redirection data is used only to
> add the new (target) URL, but the status of the original URL is not reset to
> indicate a redirect. Consequently, status of the original URL will be
> different depending on the way you run Fetcher, whereas it should be the same.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
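The inconsistency described in the report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Nutch code: the class `RedirectStatusSketch`, the `Status` enum, and both methods are hypothetical names made up for illustration, and the status labels are not Nutch's actual `CrawlDatum` constants. It models the two ways of running the fetch as the report describes them and shows that the source URL ends up with a different status in each.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RedirectStatusSketch {

    // Hypothetical status labels for this sketch only (not Nutch constants).
    public enum Status { FETCHED, REDIRECT, LINKED }

    // Models Fetcher in parsing mode as described in the report:
    // the meta-refresh is seen during the fetch, so the source URL is
    // recorded as a redirect and the target URL is recorded as well.
    public static Map<String, Status> fetchWithParsing(String source, String refreshTarget) {
        Map<String, Status> recorded = new LinkedHashMap<>();
        recorded.put(source, refreshTarget == null ? Status.FETCHED : Status.REDIRECT);
        if (refreshTarget != null) {
            recorded.put(refreshTarget, Status.LINKED);
        }
        return recorded;
    }

    // Models the reported behavior of non-parsing fetch followed by a
    // separate parse step: the target URL is added, but the status of the
    // source URL is never reset to indicate a redirect.
    public static Map<String, Status> fetchThenParse(String source, String refreshTarget) {
        Map<String, Status> recorded = new LinkedHashMap<>();
        // Fetch step: content is not parsed, so no redirect is known yet.
        recorded.put(source, Status.FETCHED);
        // Parse step: only the new (target) URL is added.
        if (refreshTarget != null) {
            recorded.put(refreshTarget, Status.LINKED);
        }
        return recorded;
    }

    public static void main(String[] args) {
        String source = "http://a.example/";
        String target = "http://b.example/";
        System.out.println("parsing mode:     " + fetchWithParsing(source, target));
        System.out.println("non-parsing mode: " + fetchThenParse(source, target));
    }
}
```

In this model both runs record the target URL, and only the status of the source URL differs, which is the discrepancy the report says should not exist.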
[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment
[ https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-685:
--------------------------------
    Assignee:     (was: Julien Nioche)

Content-level redirect status lost in ParseSegment
--------------------------------------------------

                Key: NUTCH-685
                URL: https://issues.apache.org/jira/browse/NUTCH-685
            Project: Nutch
         Issue Type: Bug
   Affects Versions: 1.0.0
           Reporter: Andrzej Bialecki
            Fix For: 1.11
[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment
[ https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-685:
----------------------------------
    Fix Version/s: 1.9

Content-level redirect status lost in ParseSegment
--------------------------------------------------

                Key: NUTCH-685
                URL: https://issues.apache.org/jira/browse/NUTCH-685
            Project: Nutch
         Issue Type: Bug
   Affects Versions: 1.0.0
           Reporter: Andrzej Bialecki
           Assignee: Andrzej Bialecki
            Fix For: 2.4, 1.9
[jira] [Updated] (NUTCH-685) Content-level redirect status lost in ParseSegment
[ https://issues.apache.org/jira/browse/NUTCH-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-685:
---------------------------------------
    Fix Version/s: 2.2
                   1.7

Content-level redirect status lost in ParseSegment
--------------------------------------------------

                Key: NUTCH-685
                URL: https://issues.apache.org/jira/browse/NUTCH-685
            Project: Nutch
         Issue Type: Bug
   Affects Versions: 1.0.0
           Reporter: Andrzej Bialecki
           Assignee: Andrzej Bialecki
            Fix For: 1.7, 2.2