Hello,

I've been trying to extract text from a couple of different MS Word files and I'm getting mixed results. Seemingly at random, I get this error:

java.lang.StringIndexOutOfBoundsException: String index out of range: -21047
    at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:886)
    at java.lang.StringBuffer.substring(StringBuffer.java:417)
    at org.apache.poi.hwpf.model.TextPiece.substring(TextPiece.java:88)
    at org.apache.tika.parser.microsoft.WordParser.extractText(WordParser.java:163)
Looking at TextPiece in POI, I can see that its substring method is called with a negative value for end:

    public String substring(int start, int end)
    {
        int denominator = _usesUnicode ? 2 : 1;
        return ((StringBuffer)_buf).substring(start/denominator, end/denominator);
    }

What I can't see is how runEnd - currentTextStart in the calling line can end up negative:

    String str = currentPiece.substring(0, runEnd - currentTextStart);

Any ideas?

Regards,
Mats
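To make the failing arithmetic concrete, here is a small self-contained sketch. TextPieceSketch and clampedPrefix are hypothetical names, not POI classes or methods; substring copies the byte-offset-to-char-index logic quoted above (offsets halved for Unicode pieces). The sketch shows that whenever runEnd is smaller than currentTextStart, the end index is negative and StringBuffer.substring throws StringIndexOutOfBoundsException. The clamping variant is only an assumed defensive workaround: it avoids the exception but does not explain why a run would end before the piece starts.

```java
// Hypothetical stand-in for org.apache.poi.hwpf.model.TextPiece (not the real POI class).
class TextPieceSketch {
    private final StringBuffer buf;
    private final boolean usesUnicode;

    TextPieceSketch(String text, boolean usesUnicode) {
        this.buf = new StringBuffer(text);
        this.usesUnicode = usesUnicode;
    }

    // Same arithmetic as the quoted TextPiece.substring: start/end are byte
    // offsets, divided by 2 for Unicode (two bytes per char) pieces.
    String substring(int start, int end) {
        int denominator = usesUnicode ? 2 : 1;
        return buf.substring(start / denominator, end / denominator);
    }

    // Hypothetical defensive variant: clamp runEnd - currentTextStart into
    // [0, piece length in bytes] instead of letting StringBuffer throw.
    String clampedPrefix(int runEnd, int currentTextStart) {
        int denominator = usesUnicode ? 2 : 1;
        int byteLength = buf.length() * denominator;
        int end = Math.max(0, Math.min(runEnd - currentTextStart, byteLength));
        return substring(0, end);
    }

    public static void main(String[] args) {
        TextPieceSketch piece = new TextPieceSketch("Hello, world", true);
        int currentTextStart = 1000;

        // Run ends 10 bytes into the piece: 10 / 2 = 5 chars.
        System.out.println(piece.substring(0, 1010 - currentTextStart)); // Hello

        // Run ends before the piece starts: end offset is -100, index -50.
        try {
            piece.substring(0, 900 - currentTextStart);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("negative end: " + e.getMessage());
        }

        // The clamped variant returns an empty prefix instead of throwing.
        System.out.println("[" + piece.clampedPrefix(900, currentTextStart) + "]"); // []
    }
}
```

If the real runEnd values turn out to precede currentTextStart, that would suggest the run and the text piece being paired in extractText do not actually overlap, which may be worth checking before reaching for a clamp.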