FW: Default Tika extraction of docx 5X slower than XWPFWordExtractor?

Allison, Timothy B. Fri, 20 Jan 2012 04:43:26 -0800

  I'm just getting started with Tika, and I tried the basic AutoDetectParser 
and the basic ParsingReader on a batch of a few thousand docx files (tika-app 
v1.0). On my laptop, I was able to extract text at a rate of about 200 docs per 
minute. When I ran XWPFWordExtractor (POI 3.8) on the same docs, the rate was 
about 1000 docs per minute, roughly 5X faster. Is there a faster way to use 
Tika to extract text from a file? Is this performance difference expected, and 
have others seen it?
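
For anyone wanting to reproduce this kind of comparison, here is a minimal 
timing sketch. It is not the code behind the numbers above; the class name 
ThroughputSketch, the methods docsPerMinute and speedup, and the placeholder 
extractor in main are all hypothetical. The idea is to pass in a function that 
wraps whichever extraction call is being measured (the Tika and POI calls 
themselves are not shown), run it over the same batch, and compare rates.

```java
import java.util.List;
import java.util.function.Function;

public class ThroughputSketch {

    /** Runs the extractor over every document and returns documents per minute. */
    static <T> double docsPerMinute(List<T> docs, Function<T, String> extractor) {
        long start = System.nanoTime();
        long totalChars = 0;
        for (T doc : docs) {
            // Use the result so the extraction work cannot be skipped.
            totalChars += extractor.apply(doc).length();
        }
        // Guard against a zero elapsed time on very small batches.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        System.out.println("extracted " + totalChars + " chars");
        return docs.size() * 60.0e9 / elapsedNanos;
    }

    /** Ratio of two rates, for example 1000 docs/min over 200 docs/min. */
    static double speedup(double fasterRate, double slowerRate) {
        return fasterRate / slowerRate;
    }

    public static void main(String[] args) {
        // Placeholder inputs and extractor (assumptions for illustration only).
        // In a real test the list would hold file paths and the function would
        // wrap the Tika or POI extraction call being measured.
        List<String> docs = List.of("first document", "second document");
        double rate = docsPerMinute(docs, String::toUpperCase);
        System.out.println("rate (docs/min): " + rate);
        System.out.println("speedup for 1000 vs 200: " + speedup(1000.0, 200.0));
    }
}
```

Running each extractor over the whole batch once before timing would help keep 
JVM warm-up and disk caching from skewing the comparison.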


     Thank you.
