HtmlParser should strip linefeeds out of links
----------------------------------------------

                 Key: TIKA-381
                 URL: https://issues.apache.org/jira/browse/TIKA-381
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.6
            Reporter: Ken Krugler
            Assignee: Ken Krugler


A number of HTML pages contain links where the URL has a linefeed in the middle 
of it.

Browsers such as Firefox automatically remove the character, but Tika 
passes it back unchanged, which results in a broken URL.
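
As a rough illustration of the behavior being requested, here is a minimal
sketch of stripping line breaks from a link value before it is handed back.
The class and method names (LinkCleaner, stripLineBreaks) are hypothetical and
are not part of Tika's API. Removing carriage returns as well as linefeeds is
an assumption on my part; the issue itself only mentions linefeeds.

```java
// Hypothetical helper, not Tika code: removes line-break characters
// that appear anywhere inside a raw href/src attribute value.
public final class LinkCleaner {

    private LinkCleaner() {
        // utility class, no instances
    }

    /**
     * Returns the given link with every '\n' and '\r' removed.
     * All other characters, including spaces, are left untouched.
     * A null input is returned as null.
     */
    public static String stripLineBreaks(String link) {
        if (link == null) {
            return null;
        }
        StringBuilder cleaned = new StringBuilder(link.length());
        for (int i = 0; i < link.length(); i++) {
            char c = link.charAt(i);
            // Assumption: treat CR the same as LF, since CRLF line endings
            // would otherwise leave a stray '\r' in the URL.
            if (c != '\n' && c != '\r') {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        // A link whose URL was wrapped across two lines in the HTML source.
        String raw = "http://example.com/some/\npath.html";
        System.out.println(stripLineBreaks(raw));
    }
}
```

Where exactly such a step would belong in the parser (for example, at the
point where link attributes are read) is left open here.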

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
