Hi,

On Wed, Jun 22, 2011 at 1:37 PM, Denis Voloshin <[email protected]> wrote:
> I'd like to verify either Tika doesn't support non-West European languages
> or I'm just missing something in my client code.

Tika uses Unicode internally and should be able to handle pretty much all
languages in the world with few problems. The output example you attached
(with plenty of "?" characters) suggests that your default output encoding
(see [1]) is not able to represent all these characters and simply falls
back to the default "?" replacement character.

[1] http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#default-encoding

BR,

Jukka Zitting
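
A minimal sketch of the effect described above, using only the standard Java library and no Tika calls. The class name EncodingDemo and the helper roundTrip are hypothetical and exist only for this illustration. The sketch shows that text encoded with a charset that cannot represent it (US-ASCII here, as an assumed stand-in for a limited default encoding) comes back as "?" characters, while UTF-8 keeps it intact. Whether the last line displays correctly still depends on the console being set to UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: encode the text with the given charset and decode it again.
    // String.getBytes(Charset) replaces unmappable characters with the charset's
    // replacement bytes, which is "?" for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Six Cyrillic letters, written as escapes so the source file encoding does not matter.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // The platform default is what System.out and similar writers use unless told otherwise.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // A charset that cannot represent the text loses every character.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints "??????"

        // Writing through a stream with an explicit UTF-8 encoding avoids the lossy default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(roundTrip(text, StandardCharsets.UTF_8));
    }
}
```

If the first line printed by this sketch reports something other than a Unicode encoding, passing an explicit encoding to whatever writer or stream receives the extracted text is the likely fix.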
