[ https://issues.apache.org/jira/browse/TIKA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656484#action_12656484 ]
Jukka Zitting commented on TIKA-182:
------------------------------------

After thinking about this a bit more, I find myself reluctant to apply this patch. Adding such a low-level extension point essentially prevents us from changing to some other parser library that doesn't generate those low-level SAX events. For example I wouldn't count out the possibility that at some point we'd want to replace NekoHTML with a higher level HTML parser that better expresses how the HTML content gets expressed to the user.

> Allow clients to listen to the raw SAX events if available
> ----------------------------------------------------------
>
>                 Key: TIKA-182
>                 URL: https://issues.apache.org/jira/browse/TIKA-182
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> As discussed on the mailing list
> (http://markmail.org/message/gojiffbhlcuifnzd) it would be nice to allow
> clients to listen to the raw SAX events of an underlying XML-based (or -like)
> document.
> There's a proposed patch for the HTML parser in
> http://markmail.org/message/l72v6ybf4jjrcp7p

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
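
For readers unfamiliar with the kind of extension point under discussion, here is a minimal sketch of a "tee" that forwards raw SAX events to a client-supplied listener as well as to the parser's own handler. It is not the proposed patch and not Tika's API: the names RawSaxTee, TeeHandler, Recorder and rawEvents are hypothetical, and the JDK's built-in SAX parser stands in for NekoHTML. It assumes the underlying parser emits SAX events in the first place, which is exactly the dependency the comment is wary of.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RawSaxTee {

    /** Forwards element and text events to two handlers in turn (hypothetical helper). */
    static class TeeHandler extends DefaultHandler {
        private final DefaultHandler primary;     // stands in for the parser's own content handler
        private final DefaultHandler rawListener; // the client's listener for raw events

        TeeHandler(DefaultHandler primary, DefaultHandler rawListener) {
            this.primary = primary;
            this.rawListener = rawListener;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            primary.startElement(uri, localName, qName, atts);
            rawListener.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            primary.endElement(uri, localName, qName);
            rawListener.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            primary.characters(ch, start, length);
            rawListener.characters(ch, start, length);
        }
    }

    /** Example client listener: records the qualified names of start and end tags. */
    static class Recorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Parses the given XML and returns the raw element events the client listener saw. */
    static List<String> rawEvents(String xml) throws Exception {
        Recorder raw = new Recorder();
        DefaultHandler primary = new DefaultHandler(); // no-op placeholder for the real handler
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TeeHandler(primary, raw));
        return raw.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rawEvents("<html><body><p>hi</p></body></html>"));
    }
}
```

The Recorder only receives events because the parser behind it speaks SAX. If that parser were swapped for a library with a different, higher-level output model, a listener like this would have nothing to attach to, which is the compatibility cost weighed in the comment.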