Folks,
I was trying to upgrade to Tika 1.0 and found that some MSG files break
tika-app :-(
I have a Windows (Outlook) .msg file with an attached PDF that parses
fine in tika-app 0.7, 0.9 and 0.10, but tika-app 1.0 throws a stack trace:
<error>
Apache Tika was unable to parse the document
at \\....XYZ.msg
The full exception stack trace is included below:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@57284c88
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
    at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
    at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
    at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
[rest of stack trace removed]
</error>
Since the exception comes from ...microsoft.OfficeParser, I'm guessing
it is the message itself, not the attached PDF, that it is falling over on.
Is there anything I can configure in the app to work around this?
I start every version of tika-app with a trivial command similar to
C:\dev\tools\Tika\1.0\tika-app-1.0.jar -g
and then drag and drop the file onto it.
Interestingly enough, running it from the command line produces what
looks like good output with each of the switches -m, -t, -x and -h.
-Paul