This sounds like a Tika issue, so let's move the discussion to the Tika users' list.
If you still have problems after upgrading to Tika 1.8, please at least
submit the stack traces (if you can) to the Tika JIRA. We may be able to find
a document that triggers that stack trace in govdocs1 or the slice of
I know that [1] is quite common in govdocs1, and it might be the source
of your problems with MS Word files.
If you can share a stack trace, we'll be better able to diagnose.
Best,
Tim
[1]