Hi Nick and Georg

On Thu, Jan 5, 2023 at 9:34 AM Nick Burch <[email protected]> wrote:

> On Thu, 5 Jan 2023, Georg.Fischer wrote:
> > The tika.jar is over 54 MB, and I suspect that loading such a big jar
> > (under Windows) is hurting performance. I should perhaps move to
> > Linux, or try the Tika server.
>
> The Tika App jar has always been the "kitchen sink included quickstart"
> option.
>
> The Tika Java library and the Tika Server both support including or
> excluding groups of file format parsers.
>
> > I used a recent tika.jar on the Windows 10 commandline to extract text
> > from some 30 PDF files, with a makefile converting one file per command.
> > That was quite successful, but it took some time, and the approach will
> > perhaps not be appropriate for 300 or 1000 PDFs.
>
> For a folder of files, you might be better off with Tika Batch, which is
> aimed at batch processing a large number of files. It can respawn failed
> child processes, doesn't require starting a JVM for every file, etc.
>
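For reference, a minimal sketch of kicking off that batch mode from Python rather than from a makefile. It assumes the Tika App jar's `-i` (input directory) and `-o` (output directory) options are what trigger batch processing in your version; the jar name and the directory names are placeholders, not anything from this thread.

```python
import subprocess
from pathlib import Path


def build_batch_command(jar, input_dir, output_dir):
    """Return the argument list for one batch run over a whole folder.

    Assumption: passing -i and -o to the Tika App jar selects batch mode.
    """
    return ["java", "-jar", str(jar), "-i", str(input_dir), "-o", str(output_dir)]


def run_batch(jar, input_dir, output_dir):
    """Start a single JVM that processes every file under input_dir."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the Java process exits non-zero.
    return subprocess.run(build_batch_command(jar, input_dir, output_dir), check=True)


if __name__ == "__main__":
    # Hypothetical names; adjust to the local jar and folders.
    run_batch("tika-app.jar", "pdfs", "extracted")
```

The point of the design is that the JVM starts once for the whole folder instead of once per PDF.
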
> Otherwise, the Tika Server is a good option. If you're doing everything
> locally, turn on "-enableUnsecureFeatures -enableFileUrl" and then you can
> pass it a file path to process (but not on a publicly available
> machine!)
>
Now that's a neat trick - I was just going to suggest the Server, but those
switches are definitely something to add to my notes. Also, thanks for
suggesting Tika Batch - I didn't know about that either.
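
For my notes, a rough sketch of what the file-path trick might look like from a Python client. It assumes a local Tika Server on its default port 9998, started with the two switches Nick mentions, and it assumes the server reads the file location from a `fileUrl` request header on a PUT to `/tika`; that header name and the helper functions are my assumptions, so check them against the server version in use.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Assumption: Tika Server running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_request(pdf_path, tika_url=TIKA_URL):
    """Build a PUT request that asks the server to open a local file itself."""
    # Turn the path into an absolute file:// URL.
    file_url = Path(pdf_path).resolve().as_uri()
    return Request(
        tika_url,
        method="PUT",
        # "fileUrl" is the assumed header name; Accept asks for plain text.
        headers={"fileUrl": file_url, "Accept": "text/plain"},
    )


def extract_text(pdf_path, tika_url=TIKA_URL):
    """Send the request and return the extracted text (needs a running server)."""
    with urlopen(build_request(pdf_path, tika_url)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical folder name; one long-lived server handles every file.
    for pdf in sorted(Path("pdfs").glob("*.pdf")):
        print(pdf.name, len(extract_text(pdf)))
```

Since the server reads the path from its own disk, this only makes sense when client and server are on the same machine, as Nick warns.
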


> Nick
>

Best,
Bridger
