Amit Humnabadkar created TIKA-2441:
--------------------------------------

             Summary: Unable to extract text present in a table inside a textbox in MS Word
                 Key: TIKA-2441
                 URL: https://issues.apache.org/jira/browse/TIKA-2441
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.15
         Environment: Windows, Linux, Apache Tika 1.15 used with Apache Solr 6.6.0
            Reporter: Amit Humnabadkar


Hello,

I am using Tika 1.15 with Solr 6.6.0 for indexing and searching. This setup 
fails to index text that sits in a table inside a textbox in a Word document.

The attached MS Word document contains two words:
1. Germany - present in a table inside a textbox
2. Africa - present in a textbox

Germany is not indexed, while Africa is indexed successfully. It looks 
like Tika fails to extract content from a table inside a textbox.
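
For anyone trying to reproduce this, the following is a minimal sketch for 
confirming what the document itself contains, independent of Tika and Solr. 
It assumes the file is a .docx (OOXML) document and that textbox content is 
stored under w:txbxContent in word/document.xml; it will not work on a binary 
.doc file. The function name split_textbox_text is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def split_textbox_text(docx_bytes):
    """Return (text_in_tables, other_text) for all textboxes in a .docx.

    Assumption: textbox content lives under w:txbxContent in
    word/document.xml. Word often stores each textbox twice (a DrawingML
    copy and a VML fallback), so entries may appear more than once.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    in_tables, other = [], []
    for box in root.iter(W + "txbxContent"):
        # Text runs that sit under a table within this textbox.
        table_runs = {t for tbl in box.iter(W + "tbl") for t in tbl.iter(W + "t")}
        for t in box.iter(W + "t"):
            (in_tables if t in table_runs else other).append(t.text or "")
    return in_tables, other
```

Calling split_textbox_text(open(path, "rb").read()) on the document and 
comparing the two lists with the plain-text output of the Tika app (for 
example, java -jar tika-app-1.15.jar --text on the same file) should show 
whether only the text in the first list is missing from the extraction.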

Please have a look.

Thanks,
Amit Humnabadkar
[^doc001.zip]





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
