Hi,

On Thu, Jun 23, 2011 at 11:59 AM, Denis Voloshin <[email protected]> wrote:
> Thanks for relaying, the problem was indeed in the way I was converting
> the extracted data from a string to a byte array.
> Btw, is there a way in the Tika API to obtain the extracted data as an
> InputStream, and not only as a string from the ContentHandler object?

Not as an InputStream (because of the encoding question), but you can
use the parseToString() and parse() methods of the
org.apache.tika.Tika facade to get a String or a java.io.Reader for
reading the extracted text.
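
If you really do need an InputStream on your side, you have to pick the
encoding yourself once you have the text. A minimal sketch using only the
JDK; the class and method names (ExtractedTextStreams, toBytes, asStream)
are made up for illustration, and the literal string stands in for
whatever parseToString() returned:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika.
class ExtractedTextStreams {

    // Encode the text with an explicit charset instead of the platform default.
    static byte[] toBytes(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Expose the encoded text as an InputStream.
    static InputStream asStream(String text, Charset charset) {
        return new ByteArrayInputStream(toBytes(text, charset));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for text obtained from Tika (assumption).
        String text = "caf\u00e9";
        try (InputStream in = asStream(text, StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            System.out.println("UTF-8 stream length: " + count);
        }
    }
}
```

The byte count depends on the charset you choose, which is why the
conversion is left to the caller.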

Alternatively, if you want to output the extracted text to a Writer or
an OutputStream, you can use the org.apache.tika.sax.WriteOutContentHandler
class for that. To explicitly specify the output encoding you want, use a
java.io.OutputStreamWriter wrapper around your output stream.
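
The OutputStreamWriter part can be sketched with the JDK alone. The class
and method names here (EncodedTextOutput, encode) are hypothetical, and
the plain writer.write() call stands in for the content handler writing
the extracted text; in real use you would hand the Writer to the handler
instead:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika.
class EncodedTextOutput {

    // Write text to an in-memory stream through a Writer with an explicit charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the writer flushes any buffered characters to the stream.
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            // Stand-in for the handler writing extracted text (assumption).
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println("UTF-8: " + encode(text, StandardCharsets.UTF_8).length + " bytes");
        System.out.println("UTF-16BE: " + encode(text, StandardCharsets.UTF_16BE).length + " bytes");
    }
}
```

Passing the charset to the OutputStreamWriter constructor keeps the output
independent of the platform default encoding.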

BR,

Jukka Zitting
