[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-05-22 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1250:
---

Fix Version/s: 1.8

 parse-html does not parse links with empty anchor
 -

 Key: NUTCH-1250
 URL: https://issues.apache.org/jira/browse/NUTCH-1250
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
 Fix For: 2.3, 1.8

 Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, 
 TestDomContentUitls_v1.patch


 The parse-html plugin does not generate an outlink if the link has no anchor text.
 For example, the following HTML code does not create an Outlink:
 {code:html}
   <a href="example.com"></a>
 {code}
 The JUnit test TestDOMContentUtils is meant to cover this case, but it does not
 actually exercise it, because there is a comment inside the a tag.
 {code:title=TestDOMContentUtils.java|borderStyle=solid}
 new String("<html><head><title> title </title>"
 + "</head><body>"
 + "<a href=\"g\"><!--no anchor--></a>"
 + "<a href=\"g1\"> <!--whitespace-->  </a>"
 + "<a href=\"g2\">  <img src=test.gif alt='bla bla'> </a>"
 + "</body></html>"),
 {code}
 When you remove the comment, the test fails.
 {code:title=TestDOMContentUtils.java Test fails|borderStyle=solid}
 new String("<html><head><title> title </title>"
 + "</head><body>"
 + "<a href=\"g\"></a>" // no anchor
 + "<a href=\"g1\"> <!--whitespace-->  </a>"
 + "<a href=\"g2\">  <img src=test.gif alt='bla bla'> </a>"
 + "</body></html>"),
 {code}
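
The expected behaviour can be illustrated with a small standalone sketch. This is not the code in Nutch's DOMContentUtils and not the attached patch: the class EmptyAnchorSketch, the Link holder, and the methods collectLinks and parse are hypothetical names, and the sketch assumes well-formed markup so that the JDK's XML parser can stand in for the HTML parser. It only shows a DOM walk that records an a element with an href whether or not it has anchor text or child nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class EmptyAnchorSketch {

  /** Hypothetical stand-in for an outlink: target URL plus anchor text. */
  static final class Link {
    final String url;
    final String anchor;

    Link(String url, String anchor) {
      this.url = url;
      this.anchor = anchor;
    }
  }

  /** Records every a element that has an href, even if its anchor text is empty. */
  static void collectLinks(Node node, List<Link> out) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && "a".equalsIgnoreCase(node.getNodeName())) {
      Element a = (Element) node;
      if (a.hasAttribute("href")) {
        // For an element, getTextContent() concatenates descendant text and
        // ignores comments, so a comment-only or childless link yields "".
        out.add(new Link(a.getAttribute("href"), a.getTextContent().trim()));
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      collectLinks(children.item(i), out);
    }
  }

  /** Parses well-formed markup (an assumption of this sketch) and returns its links. */
  static List<Link> parse(String xhtml) {
    try {
      Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
      List<Link> links = new ArrayList<>();
      collectLinks(doc, links);
      return links;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse markup", e);
    }
  }

  public static void main(String[] args) {
    String page = "<html><head><title> title </title></head><body>"
        + "<a href=\"g\"></a>"
        + "<a href=\"g1\"> <!--whitespace-->  </a>"
        + "<a href=\"g2\">  <img src=\"test.gif\" alt=\"bla bla\"/> </a>"
        + "</body></html>";
    for (Link link : parse(page)) {
      System.out.println(link.url + " -> '" + link.anchor + "'");
    }
  }
}
```

In this sketch the link is emitted as soon as the href is seen, and the anchor text is treated as optional data, so the childless link and the comment-only link are handled the same way.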

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-04-24 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1250:


Patch Info: Patch Available

 parse-html does not parse links with empty anchor
 -

 Key: NUTCH-1250
 URL: https://issues.apache.org/jira/browse/NUTCH-1250
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
 Fix For: 1.7, 2.2

 Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, 
 TestDomContentUitls_v1.patch




[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-27 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng updated NUTCH-1250:
--

Attachment: DOMContentUtils_v2.patch

 parse-html does not parse links with empty anchor
 -

 Key: NUTCH-1250
 URL: https://issues.apache.org/jira/browse/NUTCH-1250
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
 Fix For: 1.7, 2.2

 Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch




[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-27 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng updated NUTCH-1250:
--

Attachment: TestDomContentUitls_v1.patch

The TestDOMContentUtils patch adds a test case for a link with no anchor.

 parse-html does not parse links with empty anchor
 -

 Key: NUTCH-1250
 URL: https://issues.apache.org/jira/browse/NUTCH-1250
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
 Fix For: 1.7, 2.2

 Attachments: DOMContentUtils_v1.patch, DOMContentUtils_v2.patch, 
 TestDomContentUitls_v1.patch




[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1250:


Fix Version/s: 2.2
   1.7

 parse-html does not parse links with empty anchor
 -

 Key: NUTCH-1250
 URL: https://issues.apache.org/jira/browse/NUTCH-1250
 Project: Nutch
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Andreas Janning
 Fix For: 1.7, 2.2

