RE: Very slow parsing of a few PDF files

2017-11-21 Thread Jim Idle
I didn't know that there was a ForkParser, but that might be a significant overhead on the application. It looks like it has a pool, though I don't know if it gives the ability to, say, kill a long-running parser and restart the pool. I will look into it: one thing I see already is that

RE: Very slow parsing of a few PDF files

2017-11-21 Thread Nick Burch
On Tue, 21 Nov 2017, Jim Idle wrote:
> Following up on this, I will try cancelling my thread-based tasks after a pre-set time limit. That is only going to work if Tika and the underlying parsers behave correctly with the interrupted exception. Anyone had any success with that? I am mainly
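
For anyone following the thread, here is a rough sketch of the "cancel after a pre-set time limit" idea using only the standard java.util.concurrent classes. It is not Tika code: the class name TimeLimitedTask and the method runWithTimeout are made up for illustration, and the Callable stands in for whatever parse call you would actually submit. As noted above, Future.cancel(true) only delivers an interrupt, so this only stops work that actually responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: runs a task with a time limit.
final class TimeLimitedTask {

    // Returns the task's result, or null if the limit expired first.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests cancellation by interrupting the worker thread.
            // A task that ignores interrupts will keep running regardless.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            // The calling thread was interrupted while waiting.
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            // The task itself threw; surface the underlying cause.
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            // Stop accepting work and interrupt anything still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow parse: sleep() responds to interruption.
        String result = runWithTimeout(() -> {
            Thread.sleep(10_000);
            return "parsed";
        }, 200);
        System.out.println(result == null ? "timed out" : result);
    }
}
```

Creating an executor per call keeps the sketch short; a real application would more likely share one pool. If the underlying parser never checks the interrupt flag, the worker thread stays busy even after the caller has given up, which is the case where running the parser in a separate process is the safer option.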