[ https://issues.apache.org/jira/browse/TIKA-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sami Siren updated TIKA-58:
---------------------------
Attachment: TIKA-58_2.diff
Modified according to comments from Jukka.
> Replace jtidy html parser with nekohtml based parser
> ----------------------------------------------------
>
> Key: TIKA-58
> URL: https://issues.apache.org/jira/browse/TIKA-58
> Project: Tika
> Issue Type: Improvement
> Components: general
> Reporter: Sami Siren
> Assignee: Sami Siren
> Priority: Minor
> Attachments: TIKA-58.diff, TIKA-58_2.diff
>
>
> The following patch replaces the JTidy-based HTML parser with a NekoHTML-based
> SAX parser. It provides only the same functionality as the JTidy-based one
> (it extracts the title into metadata) and passes all other SAX events through.
> The speed improvement is around 100%.
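
For readers unfamiliar with the approach described above, here is a minimal
sketch of a SAX filter that captures the title text while forwarding every
event downstream. It is not the code from the attached patch. The class name
TitleExtractingFilter and the helper extractTitle are hypothetical, and the
JDK's built-in XML parser stands in for NekoHTML as the event source, so the
input here must be well-formed markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: records the first <title> and passes all events through.
class TitleExtractingFilter extends XMLFilterImpl {
    private final StringBuilder title = new StringBuilder();
    private boolean inTitle = false;
    private boolean titleSeen = false;

    TitleExtractingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (!titleSeen && "title".equalsIgnoreCase(qName)) {
            inTitle = true;
        }
        super.startElement(uri, localName, qName, atts); // forward unchanged
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (inTitle) {
            title.append(ch, start, length);
        }
        super.characters(ch, start, length); // forward unchanged
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (inTitle && "title".equalsIgnoreCase(qName)) {
            inTitle = false;
            titleSeen = true; // only the first title is kept
        }
        super.endElement(uri, localName, qName); // forward unchanged
    }

    String getTitle() {
        return title.toString().trim();
    }

    // Hypothetical helper: parses well-formed markup and returns the title.
    // Assumption: the JDK parser is used here in place of NekoHTML.
    static String extractTitle(String markup) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TitleExtractingFilter filter = new TitleExtractingFilter(reader);
            filter.setContentHandler(new DefaultHandler()); // downstream consumer
            filter.parse(new InputSource(new StringReader(markup)));
            return filter.getTitle();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real setup the downstream DefaultHandler would be replaced by whatever
handler consumes the document content, and the captured title would be copied
into the metadata object after parsing.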