Hi,

On 10/15/07, Sami Siren (JIRA) <[EMAIL PROTECTED]> wrote:
> Add encode detection support for HTML parser

This feature sounds like something that the upstream HTML parser library might want to provide. I'm not sure whether NekoHTML is still maintained anywhere, but if it is, we should probably consider sending a patch upstream.

More generally, IMHO the Tika parsers should ideally be lightweight adapters to the native interface of the underlying parsing library. Whenever we find ourselves adding non-trivial features to the Parser classes within Tika, we should at least consider sending the improvements as patches to the upstream parser projects. Otherwise we'll soon end up with tons of bug reports about the details of parsing specific content types.

BR,

Jukka Zitting
