Nick, I was calling it two ways for debugging purposes. Yes, Tika is on the
classpath and it can extract the metadata. The only thing it cannot
extract is the document's contents, even for a TXT file. I can see the
full content of the uploaded file in the byte array.
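
One thing that might be relevant to the two-way call: both calls were given
the same InputStream (`is`), so the second one would start from wherever the
first one stopped reading. This would not by itself explain the first call
coming back empty. Below is a minimal sketch of the effect using only the JDK.
The class name and `readAll` are hypothetical stand-ins for a parser that reads
the stream to its end, not Tika calls, and `readAllBytes` assumes Java 9 or
later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamReuseDemo {

    // Hypothetical stand-in for any consumer that reads a stream to its end,
    // as a parser typically would. This is not a Tika API.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the uploaded file's byte array.
        byte[] uploaded = "hello from the upload".getBytes(StandardCharsets.UTF_8);

        // One shared stream: the first reader drains it, so the second
        // reader finds nothing left to read.
        InputStream shared = new ByteArrayInputStream(uploaded);
        String first = readAll(shared);
        String second = readAll(shared);
        System.out.println("shared stream, first read:  [" + first + "]");
        System.out.println("shared stream, second read: [" + second + "]");

        // A fresh stream per reader: each one sees the full content.
        String a = readAll(new ByteArrayInputStream(uploaded));
        String b = readAll(new ByteArrayInputStream(uploaded));
        System.out.println("fresh streams match: " + a.equals(b));
    }
}
```

Since the bytes are already in memory, wrapping them in a new
ByteArrayInputStream for each call is cheap and keeps the two debugging paths
independent of each other.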

I wonder if I need some logic to call the proper parser based on
the metadata information: for example, if it is a PDF, call the PDF parser;
if it is a DOC, call the DOC parser; and so on.


On Tue, Jun 10, 2014 at 8:30 AM, Nick Burch <[email protected]> wrote:

> On Tue, 10 Jun 2014, Carlos Scheidecker wrote:
>
>>    Parser parser = new AutoDetectParser(); // Should auto-detect!
>>    StringWriter textBuffer = new StringWriter();
>>    BodyContentHandler handler = new BodyContentHandler(textBuffer);
>>    ParseContext context = new ParseContext();
>>    parser.parse(is, handler, metadata, context);
>>    String content2 = textBuffer.toString();
>>    String content1 = handler.toString();
>>
>>    Tika tk = new Tika();
>>    String text = tk.parseToString(is, metadata);
>>
>
> You appear to be calling Tika twice, once via a parser explicitly, and
> once via the Tika facade - did you really mean to do that?
>
> Otherwise, are you sure you have the tika-parsers jar on your classpath,
> along with all of its dependencies? Try asking DefaultParser what Parsers
> it knows about, and ensure you've not lost any. The standalone tika-app
> can tell you which ones to expect.
>
> Nick
>
