Hi,

On Wed, Jun 22, 2011 at 1:37 PM, Denis Voloshin <[email protected]> wrote:
> I'd like to verify whether Tika doesn't support non-West European languages
> or I'm just missing something in my client code.

Tika uses Unicode internally and should be able to handle pretty much
all languages in the world with few problems.

The output example you attached (with plenty of "?" characters)
suggests that your default output encoding (see [1]) is not able to
represent all these characters and simply falls back to the default
"?" replacement character.
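
A minimal sketch of the effect, using only the standard Java library and
no Tika calls. The class name EncodingDemo, the roundTrip helper and the
sample string are made up for illustration; the string stands in for
whatever text Tika extracted. Writing through a US-ASCII writer yields
"?" for every Cyrillic letter, while an explicit UTF-8 writer keeps the
text intact:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: writes the text through a writer that uses the
    // given charset, then decodes the resulting bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(buffer, charset), true);
        out.print(text);
        out.flush();
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as Unicode escapes so the source
        // file itself does not depend on any particular encoding.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // US-ASCII cannot represent these characters, so each becomes "?".
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));

        // Wrapping System.out with an explicit UTF-8 writer avoids relying
        // on the platform default encoding.
        PrintWriter utf8Out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        utf8Out.println(text);
    }
}
```

Whether the UTF-8 line then displays correctly still depends on the
terminal or file viewer being set to UTF-8.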

[1] http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#default-encoding

BR,

Jukka Zitting
