Re: MalformedInputException on Linux with MsPowerPointTextExtractor

Marcel Reutegger Tue, 04 Nov 2008 05:03:06 -0800

Hi Justin,

I guess this is a bug. Can you please file a Jira issue and include more of the
stack trace? It does not show which part of Jackrabbit causes the exception.


thanks

regards
 marcel

Justin Grunau wrote:
> Jackrabbit text extractors return Readers from their extractText methods.
> 
> In the case of PowerPoint files, I am finding that only on Linux do I get the
> following exception stack trace when I attempt to read anything from the
> Reader returned by the MsPowerPointTextExtractor.extractText method:
> 
> sun.io.MalformedInputException
>         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:262)
>         at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:314)
>         at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:345)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:250)
>         at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:199)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:185)
>         at java.io.InputStreamReader.read(InputStreamReader.java:196)
> 
> Of course I have no control over what encoding any PowerPoint documents 
> happen to be in (nor can I determine the encoding without using some sort of 
> parser to read the file).  I also know of no way to tell an InputStreamReader 
> what encoding to convert into.  It simply appears that whatever the default 
> encoding of the operating system is (in this case, UTF8) will be used.
> 
> As of now, I have no way to reliably use the Jackrabbit 
> MsPowerPointTextExtractor on Linux at all -- it works fine for me on Windows. 
> Any suggestions?
>
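
On the point about telling an InputStreamReader which encoding to use: the
class does have constructors that take a charset, so the platform default is
only used when none is given. The sketch below is a minimal illustration of
that, not the extractor's actual code. The class name ExplicitCharsetRead, the
helper readAll, and the choice of ISO-8859-1 for the sample bytes are all
assumptions made for the example; which charset the extracted PowerPoint text
really uses is not known from this thread. It uses Java 7+ syntax; on older
JDKs the constructor taking a charset name as a String does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {

    // Hypothetical helper: decodes the whole stream with the given charset
    // instead of relying on the platform default encoding.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes (assumed ISO-8859-1): 0xE9 alone is not valid UTF-8,
        // so the result depends on which charset the reader is given.
        byte[] data = new byte[] {'c', 'a', 'f', (byte) 0xE9};
        String text = readAll(new ByteArrayInputStream(data),
                              StandardCharsets.ISO_8859_1);
        System.out.println(text.length());
    }
}
```

Passing the charset explicitly makes the result the same on Linux and Windows,
since the default encoding of the operating system no longer plays a part.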
