[jira] [Comment Edited] (TIKA-3841) 使用tika解析部分word文档出现异常,tika_exception [Exception when using tika to parse some Word documents, tika_exception]
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816035#comment-17816035 ]

Lonzak edited comment on TIKA-3841 at 2/9/24 12:20 PM:
---

My Chinese is a bit rusty so can someone change the title to:

Exception when using tika to parse some Word documents, tika_exception

? Thanks

was (Author: tom_1st):
My chinese is a bit rusty so can someone change the title to: Exception when using tika to parse some Word documents, tika_exception ? Thanks

> 使用tika解析部分word文档出现异常,tika_exception
> ---
>
> Key: TIKA-3841
> URL: https://issues.apache.org/jira/browse/TIKA-3841
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24, 2.4.1, 1.28.4
> Environment: h3. Java Version
> java version "1.8.0_291"
> h3. OS Version
> Linux localhost.localdomain 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: lxz
> Priority: Blocker
>
> {
>   "error": {
>     "root_cause": [
>       { "type": "parse_exception", "reason": "Error parsing document in field [content]" }
>     ],
>     "type": "parse_exception",
>     "reason": "Error parsing document in field [content]",
>     "caused_by": {
>       "type": "tika_exception",
>       "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3b5e180a",
>       "caused_by": { "type": "array_index_out_of_bounds_exception", "reason": "351" }
>     }
>   },
>   "status": 400
> }

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Comment Edited] (TIKA-3841) 使用tika解析部分word文档出现异常,tika_exception [Exception when using tika to parse some Word documents, tika_exception]
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597261#comment-17597261 ]

Tim Allison edited comment on TIKA-3841 at 8/29/22 3:42 PM:

Agreed, [~tilman]. I've opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=66245 .

Thank you [~lxz] for sharing an example file.

was (Author: talli...@mitre.org):
Agreed, [~tilman]. I've opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=66245

> 使用tika解析部分word文档出现异常,tika_exception
> ---
>
> Key: TIKA-3841
> URL: https://issues.apache.org/jira/browse/TIKA-3841
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24, 2.4.1, 1.28.4
> Environment: h3. Java Version
> java version "1.8.0_291"
> h3. OS Version
> Linux localhost.localdomain 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: lxz
> Priority: Blocker
> Attachments: 22030714121143428592.doc
>
> {
>   "error": {
>     "root_cause": [
>       { "type": "parse_exception", "reason": "Error parsing document in field [content]" }
>     ],
>     "type": "parse_exception",
>     "reason": "Error parsing document in field [content]",
>     "caused_by": {
>       "type": "tika_exception",
>       "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3b5e180a",
>       "caused_by": { "type": "array_index_out_of_bounds_exception", "reason": "351" }
>     }
>   },
>   "status": 400
> }
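For anyone triaging the report: the error payload quoted in the issue nests each cause under `caused_by`, so the innermost entry is the failure that actually occurred inside the parser. A minimal Python sketch that walks that chain is below. The helper name `cause_chain` is hypothetical (it is not part of Tika or any other library), and the payload is simply copied from the issue description.

```python
import json

# Error payload copied from the issue description (line wraps rejoined).
RESPONSE = """
{
  "error": {
    "root_cause": [
      {"type": "parse_exception", "reason": "Error parsing document in field [content]"}
    ],
    "type": "parse_exception",
    "reason": "Error parsing document in field [content]",
    "caused_by": {
      "type": "tika_exception",
      "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3b5e180a",
      "caused_by": {"type": "array_index_out_of_bounds_exception", "reason": "351"}
    }
  },
  "status": 400
}
"""


def cause_chain(error):
    """Hypothetical helper: list (type, reason) pairs from outermost error to innermost cause."""
    chain = []
    node = error
    # Follow the nested "caused_by" objects until there are none left.
    while isinstance(node, dict):
        chain.append((node.get("type"), node.get("reason")))
        node = node.get("caused_by")
    return chain


chain = cause_chain(json.loads(RESPONSE)["error"])
for depth, (kind, reason) in enumerate(chain):
    # Indent each level so the nesting is visible.
    print("  " * depth + f"{kind}: {reason}")
```

The last pair in `chain` is the innermost cause, here the `array_index_out_of_bounds_exception` with reason `351` that the `tika_exception` attributes to `OfficeParser`.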