I get this error when indexing files with embedded Visio diagrams:
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.rtfpar...@1335b86
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:134)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
    at cc.TikaCC.getDocument(TikaCC.java:94)
    at cc.IndexCC.indexFile(IndexCC.java:120)
    at cc.IndexCC.index(IndexCC.java:70)
    at cc.TikaCC.main(TikaCC.java:67)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 42
    at javax.swing.text.rtf.RTFReader$AttributeTrackingDestination.handleKeyword(Unknown Source)
    at javax.swing.text.rtf.RTFReader.handleKeyword(Unknown Source)
    at javax.swing.text.rtf.RTFParser.write(Unknown Source)
    at javax.swing.text.rtf.RTFParser.write(Unknown Source)
    at javax.swing.text.rtf.AbstractFilter.write(Unknown Source)
    at javax.swing.text.rtf.AbstractFilter.readFromStream(Unknown Source)
    at javax.swing.text.rtf.RTFEditorKit.read(Unknown Source)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:58)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
I suppose it's the same problem as when parsing .vsd files directly.
Is it possible to simply disregard embedded Visio diagrams and parse
only the text from RTF, DOC, ... files?
Has anyone found the solution to parsing Visio files without problems?
I know about the proposed patch, but I am either doing something
wrong or the patch is not working.
Any suggestions?
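
For reference, here is a minimal sketch of a possible stopgap: catch the failure per document so that one bad file does not abort the whole indexing run. It does not recover the text of the failing file and it does not skip only the embedded diagram; it just lets the loop move on to the next file. The class name RtfTextSketch and the method extractTextOrNull are hypothetical names made up for this illustration. The sketch calls javax.swing.text.rtf.RTFEditorKit directly (the class at the bottom of the trace above) so that it runs with the JDK alone. I assume the same try/catch pattern would apply around the Tika parse call, catching TikaException there.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper, not part of Tika: extracts plain text from an RTF
// stream and returns null instead of propagating a parser failure.
public class RtfTextSketch {

    public static String extractTextOrNull(InputStream in) {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = new DefaultStyledDocument();
        try {
            // Same entry point that appears in the stack trace.
            kit.read(in, doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException | RuntimeException e) {
            // RuntimeException covers failures such as the
            // ArrayIndexOutOfBoundsException in the trace.
            System.err.println("Skipping document: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        // Tiny hand-written RTF sample, used only for this demonstration.
        byte[] rtf = "{\\rtf1\\ansi Hello from RTF\\par}"
                .getBytes(StandardCharsets.US_ASCII);
        String text = extractTextOrNull(new ByteArrayInputStream(rtf));
        System.out.println(text == null ? "(skipped)" : text.trim());
    }
}
```

In an indexing loop, a null result would mean "log the file name and continue with the next file". Catching RuntimeException this broadly is a blunt workaround and can hide other bugs, so it is only a stopgap until the parser itself handles these files.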