Hi Justin,

I guess this is a bug. Can you please file a JIRA issue and post more of the stack trace? The part you included does not show which part of Jackrabbit causes the exception.
Thanks,
regards
Marcel

Justin Grunau wrote:
> Jackrabbit text extractors return Readers from their extractText methods.
>
> In the case of PowerPoint files, I am finding that on Linux alone, I get the
> following exception stack trace when I attempt to read anything from the
> Reader returned from the MsPowerPointTextExtractor.extractText method:
>
> sun.io.MalformedInputException
>     at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:262)
>     at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:314)
>     at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:345)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:250)
>     at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:199)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:185)
>     at java.io.InputStreamReader.read(InputStreamReader.java:196)
>
> Of course I have no control over what encoding any PowerPoint documents
> happen to be in (nor can I determine the encoding without using some sort of
> parser to read the file). I also know of no way to tell an InputStreamReader
> what encoding to convert into. It simply appears that whatever the default
> encoding of the operating system is (in this case, UTF-8) will be used.
>
> As of now, I have no way to reliably use the Jackrabbit
> MsPowerPointTextExtractor on Linux at all; it works fine for me on Windows.
> Any suggestions?
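For code that builds its own Reader, java.io.InputStreamReader does accept an explicit Charset or CharsetDecoder instead of falling back to the platform default encoding. The sketch below assumes you have the raw bytes yourself; it does not change how MsPowerPointTextExtractor builds its Reader internally, and it does not detect the real encoding of a PowerPoint file. The class and method names (ExplicitCharsetReader, lenientUtf8Reader, decodeLeniently) are hypothetical. It decodes as UTF-8 and substitutes U+FFFD for malformed byte sequences instead of throwing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetReader {

    // Hypothetical helper: wraps a byte stream in a Reader that decodes it as
    // UTF-8 (not the platform default) and replaces malformed or unmappable
    // input with U+FFFD instead of throwing an exception.
    static Reader lenientUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Hypothetical helper: drains the lenient Reader into a String.
    static String decodeLeniently(byte[] bytes) {
        try (Reader reader = lenientUtf8Reader(new ByteArrayInputStream(bytes))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Hi" followed by 0xFF, a byte that is never valid in UTF-8.
        byte[] bytes = {0x48, 0x69, (byte) 0xFF};
        System.out.println(decodeLeniently(bytes).equals("Hi\uFFFD"));
    }
}
```

Replacing bad input trades a hard failure for possibly garbled characters in the extracted text, which may or may not be acceptable for indexing.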
