Jukka Zitting wrote:
> Hi,
>
> On 10/15/07, Sami Siren (JIRA) <[EMAIL PROTECTED]> wrote:
>> Add encode detection support for HTML parser
>
> This feature sounds like something that the upstream HTML parser
> library might want to do. I'm not sure if NekoHTML is maintained
> anywhere, but if it was we should probably consider sending a patch
> for that.
Well, I used the same piece of code that you used for the txtparser, so detection/decoding is provided by icu4j and not by Tika. I was basically just using icu to get a properly decoded reader, nothing fancier than that. Perhaps we could extract the decoding functionality into a separate step so it would be clearer (for both txt and html) and nothing would be handled "magically" under the covers.

--
Sami Siren
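A minimal sketch of what "decoding as a separate step" could look like, shared by the txt and html parsers: one helper turns the raw InputStream into a Reader, and the parsers only ever see decoded characters. The class `DecodingStep` and its methods are hypothetical, not existing Tika or ICU4J API. To stay dependency-free, the sketch swaps ICU4J's statistical detection for a much cruder heuristic (byte order mark sniffing, then a strict UTF-8 validity check, falling back to ISO-8859-1). It buffers the whole stream and assumes Java 9+ for `readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: charset detection and decoding as one explicit step,
// run before any text or HTML parsing. Detection is a crude stand-in for
// ICU4J's statistical CharsetDetector.
final class DecodingStep {

    private DecodingStep() {}

    /** Guesses the charset of the given bytes. */
    public static Charset detect(byte[] bytes) {
        // Byte order marks are unambiguous, so honor them first.
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(bytes, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(bytes, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // No BOM: accept UTF-8 only if the bytes are strictly valid UTF-8.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte value, so decoding never fails,
            // though the guess may be wrong.
            return StandardCharsets.ISO_8859_1;
        }
    }

    /** Reads the whole stream and returns a Reader over the decoded text, BOM removed. */
    public static Reader openDecodedReader(InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes();
        Charset charset = detect(bytes);
        int skip = bomLength(bytes);  // the explicit UTF-16BE/LE and UTF-8 decoders keep BOMs
        return new InputStreamReader(
                new ByteArrayInputStream(bytes, skip, bytes.length - skip), charset);
    }

    /** Length of a leading UTF-8 or UTF-16 byte order mark, or 0 if none. */
    static int bomLength(byte[] bytes) {
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return 3;
        if (startsWith(bytes, 0xFE, 0xFF) || startsWith(bytes, 0xFF, 0xFE)) return 2;
        return 0;
    }

    private static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

In this sketch a parser calls `openDecodedReader` once and hands the resulting Reader to its text or HTML handling, so charset handling lives in one visible place rather than inside each parser. Plugging ICU4J back in would, under these assumptions, only change the body of `detect`.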
