[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-952:
----------------------------------

    Attachment: test_nutch_952.html

Was fixed by NUTCH-797 for v 1.4 (2.x will follow soon).
Example link (attached) works now for 1.8 (both with parse-html and parse-tika):
{code}
% nutch parsechecker http://localhost/test_nutch_952.html
...
Outlinks: 1
  outlink: toUrl: http://bbs.soso.com/search?w=ruby%20on%20railsty=csd=0
{code}

fix outlink which started with '?' in html parser
-------------------------------------------------

                Key: NUTCH-952
                URL: https://issues.apache.org/jira/browse/NUTCH-952
            Project: Nutch
         Issue Type: Bug
         Components: parser
   Affects Versions: nutchgora
           Reporter: Stondet
        Attachments: NUTCH-952-v2.patch, test_nutch_952.html

<a href=?w=ruby%20on%20railsty=csd=0>ruby on rails</a> (a snippet from http://bbs.soso.com/search?ty=csd=0w=rails)
outlink parsed from above link: http://bbs.soso.com/?w=ruby%20on%20railsty=csd=0
but expected is http://bbs.soso.com/search?w=ruby%20on%20railsty=csd=0

--
This message was sent by Atlassian JIRA (v6.2#6252)
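The difference between the parsed and the expected outlink in this issue can be illustrated with a small Java sketch. It is not the NUTCH-797 patch: the class and method names (`QueryOnlyResolver`, `resolve`) are hypothetical, and the sketch covers only the pure-query case. It assumes the RFC 3986 rule for a reference with an empty path, where the base path is kept and only the query is replaced.

```java
// Hypothetical helper for illustration; not taken from the Nutch code base.
class QueryOnlyResolver {

    // Resolves a link target that consists only of a query string ("?a=b")
    // against the page URL: the base path is kept, the base query and
    // fragment are dropped, and the target's query is appended.
    static String resolve(String base, String target) {
        if (!target.startsWith("?")) {
            throw new IllegalArgumentException("not a query-only target: " + target);
        }
        int cut = base.length();
        int hash = base.indexOf('#');
        if (hash >= 0) {
            cut = hash;          // drop the fragment of the base URL
        }
        int query = base.indexOf('?');
        if (query >= 0 && query < cut) {
            cut = query;         // drop the query of the base URL
        }
        return base.substring(0, cut) + target;
    }

    public static void main(String[] args) {
        // Page URL and href value as quoted in the issue description.
        String base = "http://bbs.soso.com/search?ty=csd=0w=rails";
        String target = "?w=ruby%20on%20railsty=csd=0";
        System.out.println(resolve(base, target));
    }
}
```

With the values from the issue, the sketch keeps `/search` in the path instead of collapsing it to `/`, which is the expected outlink named in the report.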
[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-952:
----------------------------------

    Fix Version/s:     (was: 1.9)

fix outlink which started with '?' in html parser
-------------------------------------------------

                Key: NUTCH-952
                URL: https://issues.apache.org/jira/browse/NUTCH-952
            Project: Nutch
         Issue Type: Bug
         Components: parser
   Affects Versions: nutchgora
           Reporter: Stondet
        Attachments: NUTCH-952-v2.patch, test_nutch_952.html
[jira] [Updated] (NUTCH-952) fix outlink which started with '?' in html parser
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-952:
---------------------------------------

    Fix Version/s: 1.7

fix outlink which started with '?' in html parser
-------------------------------------------------

                Key: NUTCH-952
                URL: https://issues.apache.org/jira/browse/NUTCH-952
            Project: Nutch
         Issue Type: Bug
         Components: parser
   Affects Versions: nutchgora
           Reporter: Stondet
            Fix For: 1.7
        Attachments: NUTCH-952-v2.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-952) fix outlink which started with '?' in html parser
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stondet updated NUTCH-952:
--------------------------

    Affects Version/s: (was: 1.3) 2.0

fix outlink which started with '?' in html parser
-------------------------------------------------

                Key: NUTCH-952
                URL: https://issues.apache.org/jira/browse/NUTCH-952
            Project: Nutch
         Issue Type: Bug
         Components: parser
   Affects Versions: 2.0
           Reporter: Stondet
        Attachments: NUTCH-952-v2.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-952) fix outlink which started with '?' in html parser
[ https://issues.apache.org/jira/browse/NUTCH-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stondet updated NUTCH-952:
--------------------------

    Attachment: NUTCH-952.patch

fix outlink which started with '?'

fix outlink which started with '?' in html parser
-------------------------------------------------

                Key: NUTCH-952
                URL: https://issues.apache.org/jira/browse/NUTCH-952
            Project: Nutch
         Issue Type: Bug
         Components: parser
           Reporter: Stondet
        Attachments: NUTCH-952.patch