On Tue, 18 Mar 2014, optimusfan wrote:
Hello.  I am currently trying to use POI to extract text to be indexed in Solr.  The sources include Word .doc and .docx files stored in a SharePoint repository and accessed via a URL.

My issue is that whenever I call ExtractorFactory.createExtractor(inputstream) I receive the following exception:

java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 
stream, nor an OOXML stream

Can you get a real URL for the documents, without any SharePoint wrapping? (Something that OpenOffice can open, for example.)

Otherwise, try using something like CMIS (Apache Chemistry provides a library) to fetch the real file, which you can then pass to POI.
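
To see what the URL is really handing back before it reaches POI, it can help to look at the first few bytes of the stream. The following is only a rough sketch and not part of POI: the class name StreamSniffer and the method sniff are made up for illustration. It relies on two well-known signatures: OLE2 files (.doc) begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and OOXML files (.docx) are ZIP packages that begin with "PK" followed by 03 04. If the stream is something else, such as an HTML page, neither signature will match.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper for diagnosis only; not a POI class.
class StreamSniffer {
    // First 8 bytes of an OLE2 compound document (.doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // First 4 bytes of a ZIP local file header; OOXML files are ZIP packages.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Reads up to 8 bytes, pushes them back, and reports what they look like.
    // The PushbackInputStream must be created with a buffer of at least 8 bytes.
    static String sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before 8 bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // leave the stream as we found it
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, int len, int[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Wrap the stream from the URL in new PushbackInputStream(stream, 8), call StreamSniffer.sniff on it, and pass the same wrapped stream on to ExtractorFactory.createExtractor only if the result is not "UNKNOWN". A result of "UNKNOWN" would suggest the URL is not serving the raw document.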

Nick