Re: Pushing functionality to upstream projects (Was: [jira] Resolved: (TIKA-65) Add encode detection support for HTML parser)

Sami Siren Mon, 15 Oct 2007 10:27:17 -0700

Jukka Zitting wrote:
> Hi,
> 
> On 10/15/07, Sami Siren (JIRA) <[EMAIL PROTECTED]> wrote:
>> Add encode detection support for HTML parser
> 
> This feature sounds like something that the upstream HTML parser
> library might want to do. I'm not sure if NekoHTML is maintained
> anywhere, but if it was we should probably consider sending a patch
> for that.


Well, I used the same piece of code that you used for the txt parser, so
detection and decoding are provided by icu4j, not by Tika. I was
basically just using icu4j to get a properly decoded reader, nothing
fancier than that.

Perhaps we could extract the decoding functionality into a separate step,
so that it is clearer (for both txt and html) and nothing is handled
magically "under the covers".
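A minimal sketch of what such a separate decoding step might look like,
assuming a small pluggable detector interface. The names DecodingStep,
EncodingDetector, BomDetector, decode and readAll are hypothetical and
are not Tika API. The TIKA-65 code itself relied on icu4j for detection.
To stay dependency-free, this sketch swaps in plain byte-order-mark
sniffing with a fixed fallback charset, which is much weaker than
icu4j's statistical detection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodingStep {

    /** Hypothetical pluggable detector; an icu4j-backed one could implement this. */
    interface EncodingDetector {
        /** Detects the charset; consumes a BOM if present, resets any other read-ahead. */
        Charset detect(InputStream in) throws IOException;
    }

    /** Stand-in detector: BOM sniffing with a fixed fallback (not what Tika used). */
    static final class BomDetector implements EncodingDetector {
        private final Charset fallback;

        BomDetector(Charset fallback) {
            this.fallback = fallback;
        }

        @Override
        public Charset detect(InputStream in) throws IOException {
            in.mark(3);
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3);
            in.reset();
            int b0 = n > 0 ? head[0] & 0xFF : -1;
            int b1 = n > 1 ? head[1] & 0xFF : -1;
            int b2 = n > 2 ? head[2] & 0xFF : -1;
            if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
                in.readNBytes(3); // skip the UTF-8 BOM so it is not decoded as U+FEFF
                return StandardCharsets.UTF_8;
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                in.readNBytes(2); // skip the UTF-16BE BOM
                return StandardCharsets.UTF_16BE;
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                in.readNBytes(2); // skip the UTF-16LE BOM
                return StandardCharsets.UTF_16LE;
            }
            return fallback; // no BOM: nothing consumed
        }
    }

    /** The explicit step: raw bytes in, decoded Reader out, before any parser runs. */
    static Reader decode(InputStream raw, EncodingDetector detector) throws IOException {
        // Detection needs mark/reset, so wrap streams that do not support it.
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        Charset charset = detector.detect(in);
        return new InputStreamReader(in, charset);
    }

    /** Helper that drains a Reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        EncodingDetector detector = new BomDetector(StandardCharsets.ISO_8859_1);
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // Both a txt parser and an html parser would receive this Reader.
        System.out.println(readAll(decode(new ByteArrayInputStream(utf8WithBom), detector)));
    }
}
```

With detection pulled out like this, the txt and html parsers would only
ever see a Reader. Swapping the stand-in for a detector that wraps icu4j's
CharsetDetector would then be a change in one place rather than in each
parser.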

-- 
 Sami Siren
