I'm just getting started with Tika. I tried the basic AutoDetectParser and the basic ParsingReader (tika-app 1.0) on a batch of a few thousand .docx files, and on my laptop they extracted text at a rate of 200 docs per minute. When I ran POI's XWPFWordExtractor (POI 3.8) on the same files, the rate was 1000 docs per minute, roughly five times faster. Is there a faster way to use Tika to extract text from a file? Is this performance difference expected, and have others seen it too?
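For anyone trying to reproduce the comparison, here is a minimal benchmark sketch. It assumes tika-app 1.0 and poi-ooxml 3.8 are on the classpath and Java 8+ (for lambdas and try-with-resources). The class name `ExtractionBenchmark`, the `Extractor` interface and the helper methods are hypothetical, not part of either library. It times three paths over the same files: Tika parsing straight into a `BodyContentHandler`, Tika through `ParsingReader`, and POI's `XWPFWordExtractor`. A single `AutoDetectParser` is reused for every file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.Reader;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.ParsingReader;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical benchmark harness; assumes tika-app 1.0 and poi-ooxml 3.8 jars on the classpath.
public class ExtractionBenchmark {

    /** One extraction strategy; returns the number of characters extracted. */
    interface Extractor {
        int extract(File file) throws Exception;
    }

    // Built once and reused for every file, so parser setup is not paid per document.
    private static final AutoDetectParser TIKA = new AutoDetectParser();

    /** Tika: parse straight into a SAX handler on the calling thread. */
    static int tikaHandler(File file) throws Exception {
        try (InputStream in = new FileInputStream(file)) {
            BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
            TIKA.parse(in, handler, new Metadata(), new ParseContext());
            return handler.toString().length();
        }
    }

    /** Tika: ParsingReader, which runs the parse in a background thread. */
    static int tikaReader(File file) throws Exception {
        try (InputStream in = new FileInputStream(file);
             Reader reader = new ParsingReader(TIKA, in, new Metadata(), new ParseContext())) {
            char[] buf = new char[8192];
            int total = 0;
            for (int n; (n = reader.read(buf)) != -1; ) {
                total += n;
            }
            return total;
        }
    }

    /** POI: XWPFWordExtractor on an XWPFDocument (the .docx-only path). */
    static int poi(File file) throws Exception {
        try (InputStream in = new FileInputStream(file)) {
            XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in));
            return extractor.getText().length();
        }
    }

    /** Throughput in documents per minute for a given elapsed wall-clock time. */
    static double docsPerMinute(int docs, long elapsedNanos) {
        return docs * 60e9 / elapsedNanos;
    }

    static void time(String label, Extractor extractor, File[] files) throws Exception {
        long start = System.nanoTime();
        for (File f : files) {
            extractor.extract(f);
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%-20s %.0f docs/min%n", label, docsPerMinute(files.length, elapsed));
    }

    public static void main(String[] args) throws Exception {
        File[] files = new File(args[0]).listFiles((dir, name) -> name.endsWith(".docx"));
        if (files == null || files.length == 0) {
            throw new IllegalArgumentException("no .docx files in " + args[0]);
        }
        Extractor[] all = {
            ExtractionBenchmark::tikaHandler, ExtractionBenchmark::tikaReader, ExtractionBenchmark::poi
        };
        // Warm-up pass so JIT compilation and class loading are not counted in the timings.
        for (int i = 0; i < Math.min(20, files.length); i++) {
            for (Extractor e : all) {
                e.extract(files[i]);
            }
        }
        time("Tika handler", all[0], files);
        time("Tika ParsingReader", all[1], files);
        time("POI XWPF", all[2], files);
    }
}
```

Pass the directory of .docx files as the only argument. Timing the in-thread `BodyContentHandler` path separately from `ParsingReader` may help show how much of the gap, if any, comes from `ParsingReader`'s background thread rather than from the .docx parsing itself. The character counts are not comparable across strategies, because Tika and POI may emit slightly different text for the same document.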
Thank you.
