Jukka Zitting wrote:
> Agreed, it's a relatively simple and straightforward enhancement, but
> wouldn't it be useful also for other users of NekoHTML?

I guess so. Alternatively, as I mentioned previously, it could be made a
completely separate (and visible) step of the process.

> Also, how
> about handling <meta http-equiv='Content-Type'
> content='text/html;charset=...'> tags or <?xml version="1.0"
> encoding="..."?> prefixes?

I think NekoHTML already does some of this. But even if it does obey them,
the problem remains (at least up to a point), because on the wild web
people tend to put incorrect values in those hints.

> IMHO concerns like that are a slippery slope that we should avoid
> getting involved with within Tika. It's best if all such knowledge is
> embedded in the external parser libraries we use.

Yes, I agree.

--
Sami Siren
