Hello, I am using the latest version of Tika (with Tesseract).
Some of the words in an image embedded in a Microsoft Office document come out misspelt. What is the best way to handle this? Could I extend Tika to look up corrections in, say, a cache of key-value pairs and use them to fix Tika's output? Any suggestions are welcome.

Thanks,
Naga

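A minimal sketch of the key-value idea, under stated assumptions: instead of changing Tika or Tesseract, post-process the extracted text (for example, the string returned by `new Tika().parseToString(file)`) and replace whole words that appear in a correction map (misspelt form to intended form). The class `OcrCorrector` is hypothetical, where the map is loaded from (a cache, a file, a database) is left open, and matching is exact and case-sensitive.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical post-processor for OCR text extracted by Tika.
 * Replaces whole words found in a correction map (misspelt -> intended);
 * punctuation, whitespace and unknown words are left unchanged.
 */
final class OcrCorrector {
    // A "word" here is a run of Unicode letters and/or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private final Map<String, String> corrections;

    OcrCorrector(Map<String, String> corrections) {
        // Defensive, immutable copy of the caller's key-value pairs.
        this.corrections = Map.copyOf(corrections);
    }

    String correct(String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String word = m.group();
            // Exact, case-sensitive lookup; fall back to the original word.
            String fixed = corrections.getOrDefault(word, word);
            // quoteReplacement stops '$' or '\' in the fix being treated as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Working on the plain-text output keeps the correction step independent of Tika's parsers; a word-level map only fixes errors you have already seen, so a spell-checker or Tesseract language/dictionary settings may be worth comparing for errors that are not known in advance.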