Pushing functionality to upstream projects (Was: [jira] Resolved: (TIKA-65) Add encode detection support for HTML parser)

Jukka Zitting Mon, 15 Oct 2007 09:28:57 -0700

Hi,

On 10/15/07, Sami Siren (JIRA) <[EMAIL PROTECTED]> wrote:
> Add encode detection support for HTML parser


This feature sounds like something that the upstream HTML parser
library might want to do. I'm not sure whether NekoHTML is actively
maintained anywhere, but if it is, we should probably consider sending
a patch upstream.

More generally, IMHO the Tika parsers should ideally be lightweight
adapters to the native interface of the underlying parsing library.
Whenever we come across cases where we find ourselves adding
non-trivial features within Tika to the Parser classes, we should at
least consider sending the improvements as patches to the upstream
parser projects. Otherwise we'll soon end up with tons of bug reports
about the details of parsing specific content types.
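
As a rough illustration of the "lightweight adapter" idea, here is a
minimal sketch. All names in it (UpstreamHtmlLibrary, Parser,
HtmlParserAdapter, ToyLibrary) are hypothetical stand-ins and not the
actual Tika or NekoHTML APIs; the toy tag stripper only plays the role
of an upstream library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class AdapterSketch {

    /** Hypothetical stand-in for the native interface of an upstream library. */
    interface UpstreamHtmlLibrary {
        String extractText(Reader html) throws IOException;
    }

    /** Hypothetical stand-in for a Tika-style parser interface. */
    interface Parser {
        String parse(Reader input) throws IOException;
    }

    /**
     * The adapter only translates between the two interfaces.
     * Non-trivial features (for example encoding detection) would
     * live in the upstream library, not here.
     */
    static final class HtmlParserAdapter implements Parser {
        private final UpstreamHtmlLibrary library;

        HtmlParserAdapter(UpstreamHtmlLibrary library) {
            this.library = library;
        }

        @Override
        public String parse(Reader input) throws IOException {
            return library.extractText(input);
        }
    }

    /** Toy "upstream" implementation: drops everything between '<' and '>'. */
    static final class ToyLibrary implements UpstreamHtmlLibrary {
        @Override
        public String extractText(Reader html) throws IOException {
            StringBuilder text = new StringBuilder();
            boolean inTag = false;
            int c;
            while ((c = html.read()) != -1) {
                if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                } else if (!inTag) {
                    text.append((char) c);
                }
            }
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Parser parser = new HtmlParserAdapter(new ToyLibrary());
        // prints "Hello, Tika!"
        System.out.println(parser.parse(new StringReader("<p>Hello, <b>Tika</b>!</p>")));
    }
}
```

The point of the sketch is that the adapter class has no parsing logic
of its own, so a fix or feature in the library benefits every user of
that library, not just Tika.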

BR,

Jukka Zitting
