Just to confirm about the misspelt words: if you open the Word doc, do
you see them spelt the same way?

i.e.
1) is the Word doc itself wrong, or
2) is Tika reading something incorrectly?

If it's (2), then I would patch Tika to correct the parser.

If it's (1), then I would extend the parser currently being used so
that it does what you need.
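
For the key-value idea in the quoted message, a minimal sketch of a
post-processing step might look like the following. It is written in
Python purely for illustration and does not use any Tika API;
CORRECTIONS and correct_text are hypothetical names, and the sample
entries are made up. It assumes the keys are plain words, so
whole-word matching applies.

```python
import re

# Hypothetical corrections table: misread word -> intended word.
# In practice this could be loaded from whatever cache or file you keep.
CORRECTIONS = {
    "recieve": "receive",
    "lnvoice": "invoice",
}


def correct_text(text, corrections):
    """Replace whole words that appear as keys in `corrections`.

    Matching is case-insensitive; the replacement is inserted exactly
    as written in the table.
    """
    if not corrections:
        return text
    # Build one alternation of all known misspellings, anchored on
    # word boundaries so that substrings of longer words are untouched.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

The same lookup could equally live inside an extended parser, applied
to the extracted text before it is handed back to the caller.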



On 24 August 2017 at 01:34, Naga Vijay <[email protected]> wrote:
>
> (+) [email protected]
>
> On Mon, Aug 21, 2017 at 4:14 PM, Naga Vijay <[email protected]> wrote:
>>
>> Hello,
>>
>> I am using latest version of Tika (with Tesseract).
>>
>> Some of the words in embedded image in a Microsoft doc are mis-spelt in
>> the Tika output.
>>
>> What is the best way to handle this?
>>
>> Can I extend Tika to read from a cache having key-value pairs to correct
>> the output of Tika?
>>
>> Please suggest.
>>
>> Thanks
>> Naga
>
>
