On Tue, 18 Mar 2014, optimusfan wrote:
> Hello. I am currently trying to use POI to extract text to be indexed in
> Solr. The sources include Word .doc and .docx files stored in a
> SharePoint repository and accessed via a URL. My issue is that whenever I
> call ExtractorFactory.createExtractor(inputstream) I receive the
> following exception:
>
> java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
Can you get a real URL for the documents, without any SharePoint wrapping?
(Something that OpenOffice can open, for example.)
Otherwise, try using something like CMIS (Apache Chemistry provides a
library) to fetch the real file, which you can then pass to POI.
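A quick way to see what a URL is actually handing you is to peek at the first few bytes of the stream before giving it to POI. The sketch below mirrors the distinction the exception message draws: an OLE2 file (.doc) starts with the signature D0 CF 11 E0 A1 B1 1A E1, and an OOXML file (.docx) is a ZIP package starting with "PK\3\4". If you instead get something starting with "<", the server is probably returning an HTML page (a login or viewer page, say) rather than the document. The class StreamSniffer and its Kind values are hypothetical names for illustration, not part of POI, and the sketch assumes the stream supports mark/reset (wrap it in a BufferedInputStream if it does not).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of POI): peeks at the first bytes of a
 * stream to guess whether it is OLE2 (.doc), a ZIP/OOXML package (.docx),
 * or markup such as an HTML page returned by a web server.
 */
final class StreamSniffer {

    enum Kind { OLE2, OOXML_ZIP, MARKUP, UNKNOWN }

    // OLE2 compound document header signature: D0 CF 11 E0 A1 B1 1A E1
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"; any ZIP matches, not only OOXML
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private StreamSniffer() {}

    /**
     * Reads up to 8 bytes, then resets the stream so it can still be passed
     * on (e.g. to ExtractorFactory.createExtractor). Assumes mark/reset support.
     */
    static Kind detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(8);
        byte[] head = new byte[8];
        int n = 0;
        try {
            // Loop because a single read() may return fewer bytes than asked for
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // end of stream reached early
                }
                n += r;
            }
        } finally {
            in.reset(); // rewind to the marked position
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return Kind.OOXML_ZIP;
        }
        // Heuristic: text starting with '<' is likely an HTML/XML page
        String text = new String(head, 0, n, StandardCharsets.ISO_8859_1).trim();
        if (text.startsWith("<")) {
            return Kind.MARKUP;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int len, byte[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, wrapping the URL's stream in a BufferedInputStream and calling StreamSniffer.detect on it before createExtractor should report MARKUP if SharePoint is serving a web page instead of the raw file, which would explain the exception.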
Nick