I've tried it with multiple documents, both .doc and .pdf, so I'm inclined to believe the issue is not with the stream itself, unless the problem lies in the Sling method used to retrieve the stream.
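One way to rule the stream in or out would be to buffer it before it reaches Tika. InputStream.available() only estimates how many bytes can be read without blocking, so it does not prove the stream is complete or unread. Below is a minimal sketch in plain Java, with no Tika or Sling calls; StreamCheck and its two methods are hypothetical names for illustration, not part of either library. It reads the whole stream into a byte array so the real length can be checked, then returns a fresh stream over those bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic helper; not part of Tika or Sling.
final class StreamCheck {

    private StreamCheck() {}

    // Reads the entire stream into memory so its actual length can be inspected.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Returns a fresh, unread stream over the buffered bytes,
    // or fails loudly if nothing was read from the original stream.
    static InputStream freshStream(byte[] data) {
        if (data.length == 0) {
            throw new IllegalStateException("stream was empty or already consumed");
        }
        return new ByteArrayInputStream(data);
    }
}
```

If readFully() returns zero bytes, the problem is upstream of Tika. If it returns the expected number of bytes, the result of freshStream() can be passed to the parser in place of the original stream, which takes the stream retrieval out of the picture.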
I have tika-core, tika-bundle and tika-parsers listed as dependencies within my project. My console is listing that both tika-core and tika-bundle are active bundles within the project. Is there a conflict between tika-bundle and tika-parsers?

On Mon, Feb 18, 2013 at 9:52 AM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Mon, Feb 18, 2013 at 4:46 PM, Matthew Taylor <[email protected]> wrote:
> > Thanks for the response. Unfortunately, when I tried that, it returned an
> > empty string. The same thing happened when I tried parser.parse() and used
> > BodyContentHandler.toString().
> >
> > The input stream says that data is available, however, before it is passed
> > into Tika. Any other ideas?
>
> Perhaps the stream simply can't be parsed by Tika? Have you tried
>
>     java -jar tika-app-1.3.jar --text < /path/to/file
>
> on the document?
>
> Alternatively, if you're running Tika in an OSGi environment like
> Sling, do you have just tika-core deployed (AFAIUI that's the default
> with Sling)? The core bundle doesn't contain any parser components, so
> it won't be able to extract text from any documents. Deploying
> tika-bundle along with core should fix that.
>
> BR,
>
> Jukka Zitting

--
Matthew Taylor
Software Consultant
Behavioral Media Networks - http://launch.bmedianet.com/
Email: [email protected]
