Nick, for debugging purposes I was calling Tika two ways. Yes, Tika is on the classpath and it can extract the metadata; the only thing it cannot extract is the document's contents, even for a TXT file. I can see the full content of the uploaded file in the byte array.
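One thing that may be worth ruling out, assuming both calls were handed the same `is` as in the snippet quoted further down: a plain InputStream can only be read once, so whichever call runs second would find nothing left to read. The sketch below uses only the JDK, not Tika; `StreamReuseDemo` and `readAll` are made-up names standing in for any consumer that reads a stream to the end. It does not show that this is the cause here, only how the effect looks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that a second read of the same stream is empty.
final class StreamReuseDemo {

    // Stand-in for any consumer that reads the stream to the end, as a parser would.
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data in place of the uploaded file's bytes.
        byte[] bytes = "plain text body".getBytes(StandardCharsets.UTF_8);

        // Same stream passed to two consumers: the second one gets nothing.
        InputStream shared = new ByteArrayInputStream(bytes);
        String first = readAll(shared);
        String second = readAll(shared);

        // A fresh stream over the same bytes gives the full content again.
        String fresh = readAll(new ByteArrayInputStream(bytes));

        System.out.println("first  = [" + first + "]");
        System.out.println("second = [" + second + "]");
        System.out.println("fresh  = [" + fresh + "]");
    }
}
```

Since the bytes are already in memory, wrapping them in a new ByteArrayInputStream for each call would be a cheap way to test this.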
I wonder if I need some logic to call the proper parser based on the metadata: for example, if it is a PDF, call the PDF parser; if it is a DOC, call the DOC parser; and so on.

On Tue, Jun 10, 2014 at 8:30 AM, Nick Burch <[email protected]> wrote:

> On Tue, 10 Jun 2014, Carlos Scheidecker wrote:
>
>> Parser parser = new AutoDetectParser(); // Should auto-detect!
>> StringWriter textBuffer = new StringWriter();
>> BodyContentHandler handler = new BodyContentHandler(textBuffer);
>> ParseContext context = new ParseContext();
>> parser.parse(is, handler, metadata, context);
>> String content2 = textBuffer.toString();
>> String content1 = handler.toString();
>>
>> Tika tk = new Tika();
>> String text = tk.parseToString(is, metadata);
>
> You appear to be calling tika twice, once via a parser explicitly, and
> once via the Tika facade - did you really mean to do that?
>
> Otherwise, are you sure you have the tika parser jar on your classpath,
> along with all of the dependencies? Try asking DefaultParser what Parsers
> it knows about, and ensure you've not lost any. The standalone tika-app
> can tell you what ones to expect
>
> Nick
