[jira] [Commented] (TIKA-3004) OutlookPSTParser missing emails attached to other emails

2020-11-26 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239454#comment-17239454 ] Hudson commented on TIKA-3004: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #56 (See

[jira] [Commented] (TIKA-3004) OutlookPSTParser missing emails attached to other emails

2020-11-26 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239453#comment-17239453 ] Hudson commented on TIKA-3004: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #38 (See

[jira] [Commented] (TIKA-3004) OutlookPSTParser missing emails attached to other emails

2020-11-26 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239444#comment-17239444 ] Hudson commented on TIKA-3004: -- UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #37 (See

[jira] [Resolved] (TIKA-3004) OutlookPSTParser missing emails attached to other emails

2020-11-26 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luís Filipe Nassif resolved TIKA-3004. -- Fix Version/s: 2.0 Resolution: Fixed resolved by 

[jira] [Commented] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239429#comment-17239429 ] Hudson commented on TIKA-3237: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #55 (See

[jira] [Updated] (TIKA-3238) RTFParser fails to generate full content of an RTF file that has been generated in libreoffice

2020-11-26 Thread Bruno (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno updated TIKA-3238: Flags: Important Description: Some RTF files, when created in libreoffice writer seem to not be parsed

[jira] [Created] (TIKA-3238) RTFParser fails to generate full content of an RTF file that has been generated in libreoffice

2020-11-26 Thread Bruno (Jira)
Bruno created TIKA-3238: --- Summary: RTFParser fails to generate full content of an RTF file that has been generated in libreoffice Key: TIKA-3238 URL: https://issues.apache.org/jira/browse/TIKA-3238 Project:

Re: How to configure Apache Tika in a kube environment to obtain maximum throughput when parsing a massive number of documents?

2020-11-26 Thread Luís Filipe Nassif
Yes, tika-server is the better long-term choice, as discussed in a recent thread on the user list. I hope I will have time in the future to migrate to it and get rid of jar hell problems for good... On Thu, Nov 26, 2020 at 14:32, Nicholas DiPiazza < nicholas.dipia...@gmail.com> wrote: > I created a

Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-26 Thread Luís Filipe Nassif
Thank you, Peter, for all your contributions and welcome! On Wed, Nov 25, 2020 at 23:21, Chris Mattmann wrote: > Welcome Peter! > > *From: *Peter Lee > *Reply-To: * > *Date: *Wednesday, November 25, 2020 at 6:08 PM > *To: *"dev@tika.apache.org" , "talli...@apache.org" <

[jira] [Commented] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239412#comment-17239412 ] Luís Filipe Nassif commented on TIKA-3237: -- Should I open another issue for the

[jira] [Commented] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239411#comment-17239411 ] Hudson commented on TIKA-3237: -- UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #36 (See

[jira] [Resolved] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luís Filipe Nassif resolved TIKA-3237. -- Fix Version/s: 2.0 Resolution: Fixed closed by 

[jira] [Updated] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luís Filipe Nassif updated TIKA-3237: - Description: There is a huge overhead in ForkParser ContentHandlerProxy and

[jira] [Created] (TIKA-3237) Great optimization in ForkParser

2020-11-26 Thread Jira
Luís Filipe Nassif created TIKA-3237: Summary: Great optimization in ForkParser Key: TIKA-3237 URL: https://issues.apache.org/jira/browse/TIKA-3237 Project: Tika Issue Type: Improvement

Re: How to configure Apache Tika in a kube environment to obtain maximum throughput when parsing a massive number of documents?

2020-11-26 Thread Nicholas DiPiazza
I created a tika fork example I want to add to the documentation as well: https://github.com/nddipiazza/tika-fork-parser-example When we submit your fixes, we should update this example with multi-threading. On Thu, Nov 26, 2020 at 11:28 AM Nicholas DiPiazza < nicholas.dipia...@gmail.com> wrote:

Re: How to configure Apache Tika in a kube environment to obtain maximum throughput when parsing a massive number of documents?

2020-11-26 Thread Nicholas DiPiazza
Hey Luis, It is related because after your fixes I might be able to gain a significant performance advantage by switching to ForkParser. I would make great use of an example from someone who has set up a multi-threaded ForkParser processing program that can gracefully handle the huge