I've tried it with multiple documents, both .doc and .pdf, so I am
inclined to believe the issue is not with the stream, unless the problem
lies in the Sling method for retrieving the stream.
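
To rule out the stream itself, a check along the following lines could be
used. This is only a sketch: StreamCheck, readFully and replay are
hypothetical names, and the code uses nothing beyond java.io. The idea is
to buffer the stream once, confirm that bytes actually arrive (available()
only reports an estimate), and then hand a fresh in-memory stream to the
parser instead of the original one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika or Sling.
public class StreamCheck {

    /** Reads the whole stream into memory so its real size can be checked. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns a fresh stream over the buffered bytes, positioned at the start. */
    public static InputStream replay(byte[] data) {
        return new ByteArrayInputStream(data);
    }
}
```

If readFully returns a non-empty array and parsing the replayed stream
still yields an empty string, the stream is probably not the culprit.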

I have tika-core, tika-bundle, and tika-parsers listed as dependencies
in my project. My console shows both tika-core and tika-bundle as
active bundles. Is there a conflict between tika-bundle and
tika-parsers?


On Mon, Feb 18, 2013 at 9:52 AM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Mon, Feb 18, 2013 at 4:46 PM, Matthew Taylor <[email protected]>
> wrote:
> > Thanks for the response. Unfortunately, when I tried that, it returned an
> > empty string. The same thing happened when I tried parser.parse() and
> > used BodyContentHandler.toString().
> >
> > The input stream says that data is available, however, before it is
> > passed into Tika. Any other ideas?
>
> Perhaps the stream simply can't be parsed by Tika? Have you tried
>
>     java -jar tika-app-1.3.jar --text < /path/to/file
>
> on the document?
>
> Alternatively, if you're running Tika in an OSGi environment like
> Sling, do you have just tika-core deployed (AFAIUI that's the default
> with Sling)? The core bundle doesn't contain any parser components, so
> it won't be able to extract text from any documents. Deploying
> tika-bundle along with core should fix that.
>
> BR,
>
> Jukka Zitting
>



-- 
Matthew Taylor
Software Consultant
Behavioral Media Networks - http://launch.bmedianet.com/
Email: [email protected]
