Re: How to configure Apache Tika in a kube environment to obtain maximum throughput when parsing a massive number of documents?

2020-11-25 Thread Luís Filipe Nassif
Not what you asked, but related :) Luis On Wed, 25 Nov 2020 23:20, Luís Filipe Nassif wrote: > I've made a few improvements to ForkParser performance in an internal > fork. Will try to contribute them upstream... > > On Mon, 23 Nov 2020 12:05, Nicholas DiPiazza < >

Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-25 Thread Chris Mattmann
Welcome Peter!  From: Peter Lee Reply-To: Date: Wednesday, November 25, 2020 at 6:08 PM To: "dev@tika.apache.org" , "talli...@apache.org" Cc: "u...@tika.apache.org" Subject: Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer Many thanks to you, Tim. :) Hi,

Re: How to configure Apache Tika in a kube environment to obtain maximum throughput when parsing a massive number of documents?

2020-11-25 Thread Luís Filipe Nassif
I've made a few improvements to ForkParser performance in an internal fork. Will try to contribute them upstream... On Mon, 23 Nov 2020 12:05, Nicholas DiPiazza <nicholas.dipia...@gmail.com> wrote: > I am attempting to Tika parse dozens of millions of office documents. PDFs, > docs,

[jira] [Commented] (TIKA-3221) /rmeta/text endpoint - allow a "max parse time" parameter; once it is exceeded, return the bytes/metadata managed up to that point

2020-11-25 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239012#comment-17239012 ] Luís Filipe Nassif commented on TIKA-3221: -- Sorry! I misunderstood the request. I use ForkParser

Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-25 Thread Peter Lee
Many thanks to you, Tim. :) Hi all, I'm Peter Lee and I was an Apache Commons committer. I'm familiar with many archivers and compressors, so feel free to ask me if you have any problems with compression. I'm honored to be part of Tika. Tika is great and it has helped me a lot. Besides, Tika is a great

Re: [VOTE] Release Apache Tika 1.25 Candidate #2

2020-11-25 Thread Ken Krugler
+1 Thanks Tim. — Ken > On Nov 25, 2020, at 4:20 AM, Tim Allison wrote: > > A candidate for the Tika 1.25 release is available at: > https://dist.apache.org/repos/dist/dev/tika/ > > > The release candidate is a zip archive of the sources in: >

Re: [VOTE] Release Apache Tika 1.25 Candidate #2

2020-11-25 Thread Dave Meikle
On Wed, 25 Nov 2020 at 12:20, Tim Allison wrote: > Please vote on releasing this package as Apache Tika 1.25. > The vote is open for the next 72 hours and passes if a majority of at > least three +1 Tika PMC votes are cast. > > [ ] +1 Release this package as Apache Tika 1.25 > [ ] -1 Do not

[ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-25 Thread Tim Allison
All, The Tika PMC has elected to add Peter Lee to our ranks. Lee, please introduce yourself, and welcome aboard! Cheers, Tim

[VOTE] Release Apache Tika 1.25 Candidate #2

2020-11-25 Thread Tim Allison
A candidate for the Tika 1.25 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/1.25-rc2/ The SHA-512 checksum of the archive is
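Checking the published SHA-512 checksum is part of reviewing a release candidate like the one above. The following is a minimal sketch, assuming you have already downloaded the candidate zip locally, of how a voter might compute the digest in plain Java and compare it with the value from the announcement. The class name `Sha512Check` and its command-line arguments are illustrative only and are not part of Tika or the Apache release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded archive against a published SHA-512 value.
public class Sha512Check {

    // Hex-encode bytes with lowercase digits, the usual format of .sha512 files.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest an in-memory byte array (useful for sanity-checking the helper).
    static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        return toHex(md.digest(data));
    }

    // Stream a file through SHA-512 so a large archive need not fit in memory.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check <archive.zip> <expected-sha512-hex>");
            System.exit(2);
        }
        String actual = sha512Hex(Paths.get(args[0]));
        // Published checksums may be upper- or lowercase; compare case-insensitively.
        boolean ok = actual.equalsIgnoreCase(args[1].trim());
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH: " + actual);
        System.exit(ok ? 0 : 1);
    }
}
```

Streaming the file in fixed-size chunks keeps memory use constant regardless of archive size. A signature check with the project's KEYS file is a separate step that this sketch does not cover.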