Re: HTML Text Extraction fails even with Jackrabbit 2.1

Jukka Zitting Fri, 30 Apr 2010 02:21:40 -0700

Hi,

On Thu, Apr 29, 2010 at 5:26 PM, Jawad Bokhari <[email protected]> wrote:
> Caused by: java.nio.charset.IllegalCharsetNameException:


It looks like the HTML documents you have declare a character encoding
whose name is not accepted by the underlying Java platform.
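
To illustrate the kind of failure in the quoted trace, here is a minimal
sketch of how java.nio.charset.Charset.forName reacts to a declared
encoding name. It is not the Tika or Jackrabbit code; the class name
CharsetFallback, the resolve helper, the sample strings, and the choice
of UTF-8 as the fallback are all assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of Tika or Jackrabbit.
class CharsetFallback {

    // Resolve a charset name taken from a document, falling back to
    // UTF-8 (an assumed default) when the platform rejects the name.
    static Charset resolve(String declared) {
        if (declared == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(declared.trim());
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a
            // charset name, e.g. a whole Content-Type value.
            return StandardCharsets.UTF_8;
        } catch (UnsupportedCharsetException e) {
            // The name is well formed, but this JVM has no such charset.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        // Sample inputs are made up for illustration.
        System.out.println(resolve("ISO-8859-1"));
        System.out.println(resolve("text/html; charset=utf-8"));
        System.out.println(resolve("x-no-such-charset"));
    }
}
```

The two exceptions are distinct: IllegalCharsetNameException means the
name itself is malformed, while UnsupportedCharsetException means the
name is legal but unknown to the JVM. Seeing the exact string from a
failing document would show which case applies here.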

Can you file a bug about this in
https://issues.apache.org/jira/browse/TIKA for the Tika project that
Jackrabbit nowadays uses for full text extraction? It would be great
if you could also attach a troublesome HTML file to the bug report.

BR,

Jukka Zitting
