[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-1250:
-----------------------------------
    Fix Version/s: 1.8

parse-html does not parse links with empty anchor
-------------------------------------------------

Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
Fix For: 2.3, 1.8
Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, TestDomContentUitls_v1.patch

The parse-html plugin does not generate an outlink if the link has no anchor.
For example, the following HTML code does not create an Outlink:
{code:html}
<a href="example.com"></a>
{code}
The JUnit test TestDOMContentUtils tries to test this case but fails to do so, since there is a comment inside the a-tag.
{code:title=TestDOMContentUtils.java|borderStyle=solid}
new String("<html><head><title> title </title>" + "</head><body>"
    + "<a href=\"g\"><!--no anchor--></a>"
    + "<a href=\"g1\"> <!--whitespace--> </a>"
    + "<a href=\"g2\"> <img src=test.gif alt='bla bla'> </a>"
    + "</body></html>"),
{code}
When you remove the comment, the test fails.
{code:title=TestDOMContentUtils.java Test fails|borderStyle=solid}
new String("<html><head><title> title </title>" + "</head><body>"
    + "<a href=\"g\"></a>" // no anchor
    + "<a href=\"g1\"> <!--whitespace--> </a>"
    + "<a href=\"g2\"> <img src=test.gif alt='bla bla'> </a>"
    + "</body></html>"),
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
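As an illustration of why the comment in the existing test input may matter, the following minimal sketch compares the DOM of the two anchor variants from the report. It uses the JDK's built-in XML parser rather than the HTML parser used by parse-html, and the class name `AnchorChildCount` and method `childCount` are hypothetical names introduced here. It only shows that an anchor containing a comment has one child node while a truly empty anchor has none. That the plugin's link handling depends on this difference is an assumption, not something the report states.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AnchorChildCount {

  /**
   * Parses a well-formed snippet and returns the number of child nodes
   * of the first a element. Hypothetical helper for illustration only.
   */
  static int childCount(String xml) throws Exception {
    DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    Node anchor = doc.getElementsByTagName("a").item(0);
    return anchor.getChildNodes().getLength();
  }

  public static void main(String[] args) throws Exception {
    // Anchor as in the existing test: the comment is a child node.
    System.out.println(
        childCount("<body><a href=\"g\"><!--no anchor--></a></body>")); // prints 1
    // Anchor as in the bug report: no child nodes at all.
    System.out.println(
        childCount("<body><a href=\"g\"></a></body>")); // prints 0
  }
}
```

If the link extraction treats an anchor without child nodes differently from one with children, a test input that contains a comment would never reach that code path, which would be consistent with the behaviour described in the report.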
[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1250:
----------------------------------------
    Patch Info: Patch Available

parse-html does not parse links with empty anchor
-------------------------------------------------

Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
Fix For: 1.7, 2.2
Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, TestDomContentUitls_v1.patch
[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng updated NUTCH-1250:
--------------------------
    Attachment: DOMContentUtils_v2.patch

parse-html does not parse links with empty anchor
-------------------------------------------------

Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
Fix For: 1.7, 2.2
Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch
[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng updated NUTCH-1250:
--------------------------
    Attachment: TestDomContentUitls_v1.patch

The TestDOMContentUtils patch adds a test case for a link with no anchor.

parse-html does not parse links with empty anchor
-------------------------------------------------

Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
Fix For: 1.7, 2.2
Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, TestDomContentUitls_v1.patch
[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1250:
----------------------------------------
    Fix Version/s: 2.2
                   1.7

parse-html does not parse links with empty anchor
-------------------------------------------------

Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
Fix For: 1.7, 2.2