[jira] [Comment Edited] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Naama Hophstatder (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500554#comment-17500554 ] Naama Hophstatder edited comment on TIKA-3684 at 3/3/22, 7:42 AM: Thanks

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Naama Hophstatder (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500554#comment-17500554 ] Naama Hophstatder commented on TIKA-3684: Thanks for your efforts, I took your xml config file

[GitHub] [tika] dependabot[bot] opened a new pull request #519: Bump commons-net from 3.7.2 to 3.8.0

2022-03-02 Thread GitBox
dependabot[bot] opened a new pull request #519: URL: https://github.com/apache/tika/pull/519 Bumps commons-net from 3.7.2 to 3.8.0. [![Dependabot compatibility

[GitHub] [tika] dependabot[bot] opened a new pull request #518: Bump build-helper-maven-plugin from 3.0.0 to 3.3.0

2022-03-02 Thread GitBox
dependabot[bot] opened a new pull request #518: URL: https://github.com/apache/tika/pull/518 Bumps [build-helper-maven-plugin](https://github.com/mojohaus/build-helper-maven-plugin) from 3.0.0 to 3.3.0. Release notes Sourced from

[jira] [Commented] (TIKA-3668) High CPU utilization in Tika 2.2.0

2022-03-02 Thread Manjunath Dhongadi (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500510#comment-17500510 ] Manjunath Dhongadi commented on TIKA-3668: In performance testing, we measure both time and CPU

[GitHub] [tika] dependabot[bot] opened a new pull request #517: Bump bndlib from 1.50.0 to 2.0.0.20130123-133441

2022-03-02 Thread GitBox
dependabot[bot] opened a new pull request #517: URL: https://github.com/apache/tika/pull/517 Bumps [bndlib](https://github.com/bndtools/bnd) from 1.50.0 to 2.0.0.20130123-133441. Commits See full diff in [compare view](https://github.com/bndtools/bnd/commits)

[jira] [Commented] (TIKA-3668) High CPU utilization in Tika 2.2.0

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500329#comment-17500329 ] Tim Allison commented on TIKA-3668: When you say performance testing, is it taking longer to process

[jira] [Commented] (TIKA-3668) High CPU utilization in Tika 2.2.0

2022-03-02 Thread Manjunath Dhongadi (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500277#comment-17500277 ] Manjunath Dhongadi commented on TIKA-3668: We have observed this scenario during performance

[jira] [Created] (TIKA-3685) Update .gitattributes for eol to fix build on Windows with autocrlf

2022-03-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-3685: Summary: Update .gitattributes for eol to fix build on Windows with autocrlf Key: TIKA-3685 URL: https://issues.apache.org/jira/browse/TIKA-3685 Project: Tika

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500173#comment-17500173 ] Tim Allison commented on TIKA-3684: I attached an example for turning off the WMFParser and the

[jira] [Updated] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3684: Attachment: tika-config-no-xmf.xml > Extract text returns the text multiple times >

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500168#comment-17500168 ] Tim Allison commented on TIKA-3684: We could also parameterize the WMF and EMF parsers to turn off text

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500170#comment-17500170 ] Tim Allison commented on TIKA-3684: Sorry, didn't see your response. bq. has no "text" meaning in

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Naama Hophstatder (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500143#comment-17500143 ] Naama Hophstatder commented on TIKA-3684: I see the results of the /rmeta endpoint, understand

[jira] [Comment Edited] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500066#comment-17500066 ] Tim Allison edited comment on TIKA-3684 at 3/2/22, 11:29 AM: Thank you for

[jira] [Commented] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500066#comment-17500066 ] Tim Allison commented on TIKA-3684: If you use the /rmeta endpoint (attached), you can see that there's

[jira] [Updated] (TIKA-3684) Extract text returns the text multiple times

2022-03-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3684: Attachment: example.json > Extract text returns the text multiple times >

[GitHub] [tika] tballison merged pull request #515: Bump aws.version from 1.12.164 to 1.12.169

2022-03-02 Thread GitBox
tballison merged pull request #515: URL: https://github.com/apache/tika/pull/515 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [tika] tballison merged pull request #516: Bump log4j2.version from 2.17.1 to 2.17.2

2022-03-02 Thread GitBox
tballison merged pull request #516: URL: https://github.com/apache/tika/pull/516