Re: [EXTERNAL] Re: Subset(s) of Tika?

2023-01-05 Thread Chris Mattmann
Not sure of your operating environment, but if you are using Python you can also use http://github.com/chrismattmann/tika-python: run ‘pip install tika’ and you have a Python wrapper around the latest Tika server (2.6.0). Thanks, Chris From: Bridger
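A minimal sketch of that route, assuming the `tika` package is installed and a Java runtime is available so the wrapper can start its Tika server. `extract_text`, `extract_all` and the `parse` parameter are hypothetical names introduced here for illustration; `tika.parser.from_file` is the wrapper's own call and returns a dict whose `content` entry holds the extracted text.

```python
from pathlib import Path


def extract_text(path, parse=None):
    """Return the extracted text of one file, or '' if none was found."""
    if parse is None:
        # Imported lazily so this module loads even without tika installed.
        from tika import parser
        parse = parser.from_file
    parsed = parse(str(path))
    # 'content' can be None when no text could be extracted.
    return (parsed.get("content") or "").strip()


def extract_all(folder, parse=None):
    """Map each PDF file name in `folder` to its extracted text."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    return {p.name: extract_text(p, parse) for p in pdfs}
```

Calling `extract_all("pdfs")` on a folder of PDFs (the folder name is only an example) would process them all from one Python process instead of one command per file.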

Re: Subset(s) of Tika?

2023-01-05 Thread Bridger Dyson-Smith
Hi Nick and Georg
On Thu, Jan 5, 2023 at 9:34 AM Nick Burch wrote:
> On Thu, 5 Jan 2023, Georg.Fischer wrote:
> > The tika.jar has >54 MB, and I suspect that the loading of the big jar
> > (under Windows) is hindering the performance. I should perhaps move to
> > Linux, or try the Tika server.

Re: Subset(s) of Tika?

2023-01-05 Thread Nick Burch
On Thu, 5 Jan 2023, Georg.Fischer wrote:
> The tika.jar has >54 MB, and I suspect that the loading of the big jar (under Windows) is hindering the performance. I should perhaps move to Linux, or try the Tika server.
The Tika App jar has always been the "kitchen sink included quickstart" option
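For the Tika server route mentioned in the quoted message, a rough sketch follows. It assumes a tika-server instance is already running locally on port 9998 (its usual default) and uses the server's `PUT /tika` endpoint with an `Accept: text/plain` header. `TIKA_URL`, `build_request` and `extract_via_server` are hypothetical names chosen for this example.

```python
import urllib.request

TIKA_URL = "http://localhost:9998/tika"  # assumed address of a local server


def build_request(data, url=TIKA_URL):
    """Build a PUT request asking the server for plain text."""
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )


def extract_via_server(path, url=TIKA_URL):
    """Send one file to the running server and return the extracted text."""
    with open(path, "rb") as fh:
        request = build_request(fh.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Because the server process stays up between requests, the jar is loaded once rather than once per file, which is the cost the original question suspects.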

Subset(s) of Tika?

2023-01-05 Thread Georg.Fischer
I used a recent tika.jar on the Windows 10 command line to extract text from some 30 PDF files, with a makefile converting one file per command. That was quite successful, but it took some time, and the approach will perhaps not be appropriate for 300 or 1000 PDFs. The tika.jar has >54 MB, and I