[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction
[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash Sudhakar updated TIKA-2077:
---
    Priority: Major  (was: Minor)

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>         Attachments: TestData.docx
>
>
> During docx file extraction using tika 1.13, special character is extracted
> as .
> How to avoid this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction
[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash Sudhakar updated TIKA-2077:
---
    Attachment: TestData.docx

Attached test file. Below is the code used.

    BodyContentHandler handler = new BodyContentHandler();
    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    InputStream stream = new BufferedInputStream(new FileInputStream(file));
    parser.parse(stream, handler, metadata);

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>            Priority: Minor
>         Attachments: TestData.docx
>
>
> During docx file extraction using tika 1.13, special character is extracted
> as .
> How to avoid this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction
[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash Sudhakar updated TIKA-2077:
---
    Affects Version/s:     (was: 1.10)
                       1.13

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>            Priority: Minor
>
> During docx file extraction using tika 1.13, special character is extracted
> as .
> How to avoid this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction
[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash Sudhakar updated TIKA-2077:
---
    Description:
During docx file extraction using tika 1.13, special character is extracted as .
How to avoid this.

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>            Priority: Minor
>
> During docx file extraction using tika 1.13, special character is extracted
> as .
> How to avoid this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)