[jira] [Comment Edited] (TIKA-3138) PDF parser with XFA produce malformed XML

2021-10-12 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427623#comment-17427623 ] Radim Rehurek edited comment on TIKA-3138 at 10/12/21, 11:13 AM: ---

[jira] [Comment Edited] (TIKA-3138) PDF parser with XFA produce malformed XML

2021-10-12 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427623#comment-17427623 ] Radim Rehurek edited comment on TIKA-3138 at 10/12/21, 11:06 AM: ---

[jira] [Comment Edited] (TIKA-3138) PDF parser with XFA produce malformed XML

2021-10-12 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427623#comment-17427623 ] Radim Rehurek edited comment on TIKA-3138 at 10/12/21, 11:05 AM: ---

[jira] [Commented] (TIKA-3138) PDF parser with XFA produce malformed XML

2021-10-12 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427623#comment-17427623 ] Radim Rehurek commented on TIKA-3138: - [~tallison] I don't see TIKA-3138 anywhere in [

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-28 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118916#comment-17118916 ] Radim Rehurek edited comment on TIKA-3103 at 5/28/20, 5:21 PM: -

[jira] [Commented] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-28 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118916#comment-17118916 ] Radim Rehurek commented on TIKA-3103: - We do use Linux, so your option 1) would work.

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:45 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:44 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:40 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:37 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:34 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:33 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 3:32 PM: -

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Attachment: (was: Screen Shot 2020-05-20 at 17.30.04.png) > Tesseract fails to respect timeouts

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Attachment: (was: Screen Shot 2020-05-20 at 17.28.45.png) > Tesseract fails to respect timeouts

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Attachment: Screen Shot 2020-05-20 at 17.30.04.png > Tesseract fails to respect timeouts and clean u

[jira] [Commented] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112356#comment-17112356 ] Radim Rehurek commented on TIKA-3103: - I take it back. There are still Tesseract proce

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Attachment: Screen Shot 2020-05-20 at 17.28.45.png > Tesseract fails to respect timeouts and clean u

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112301#comment-17112301 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 2:48 PM: -

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112301#comment-17112301 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 2:47 PM: -

[jira] [Commented] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112301#comment-17112301 ] Radim Rehurek commented on TIKA-3103: - FYI, in case anyone hits this in the future: se

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112161#comment-17112161 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 1:11 PM: -

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Description: We're using the Tika Server with OCR: _java -jar /opt/tika/tika-server-1.24.1.jar -p 9

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112161#comment-17112161 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 12:55 PM:

[jira] [Comment Edited] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112157#comment-17112157 ] Radim Rehurek edited comment on TIKA-3103 at 5/20/20, 12:55 PM:

[jira] [Commented] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112161#comment-17112161 ] Radim Rehurek commented on TIKA-3103: - I confirm reducing the `timeout` Tesseract para

[jira] [Commented] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112157#comment-17112157 ] Radim Rehurek commented on TIKA-3103: - Thanks for the quick response Tim. > {{apache-

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Description: We're using the Tika Server with OCR: _java -jar /opt/tika/tika-server-1.24.1.jar -p 9

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Affects Version/s: (was: 1.22) > Tesseract fails to respect timeouts and clean up after itself >

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Description: We're using the Tika Server with OCR: _java -jar /opt/tika/tika-server-1.24.1.jar -p 9

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Affects Version/s: 1.22 > Tesseract fails to respect timeouts and clean up after itself > --

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Description: We're using the Tika Server with OCR: _java -jar /opt/tika/tika-server-1.24.1.jar -p 9

[jira] [Updated] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Rehurek updated TIKA-3103: Description: We're using the Tika Server with OCR: _java -jar /pii_tools/tika/tika-server-1.24.1.ja

[jira] [Created] (TIKA-3103) Tesseract fails to respect timeouts and clean up after itself

2020-05-20 Thread Radim Rehurek (Jira)
Radim Rehurek created TIKA-3103: --- Summary: Tesseract fails to respect timeouts and clean up after itself Key: TIKA-3103 URL: https://issues.apache.org/jira/browse/TIKA-3103 Project: Tika Issue

[jira] [Comment Edited] (TIKA-1020) Excel 2010 parser missing cell values are not reported resulting in missing columns values

2018-03-07 Thread Radim Rehurek (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389884#comment-16389884 ] Radim Rehurek edited comment on TIKA-1020 at 3/7/18 5:57 PM: - W

[jira] [Comment Edited] (TIKA-1020) Excel 2010 parser missing cell values are not reported resulting in missing columns values

2018-03-07 Thread Radim Rehurek (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389884#comment-16389884 ] Radim Rehurek edited comment on TIKA-1020 at 3/7/18 5:56 PM: - W

[jira] [Commented] (TIKA-1020) Excel 2010 parser missing cell values are not reported resulting in missing columns values

2018-03-07 Thread Radim Rehurek (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389884#comment-16389884 ] Radim Rehurek commented on TIKA-1020: - We just hit this bug too. I say "bug" because E