> but I am having a problem: the thread that processes the pdf file keeps
> running, creating images and performing OCR. Is this supposed to happen?
TL;DR: yes, because there is no safe way to kill a running thread.
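A timeout only limits how long the caller waits. It does not stop the worker. Below is a minimal Java sketch of this behavior. It assumes the extraction is submitted to an `ExecutorService`, which is not necessarily how Oak's `FulltextBinaryTextExtractor` does it. `nonCooperativeWork` is a hypothetical stand-in for PDF rendering and OCR that never checks its interrupt flag. Because of that, `Future.cancel(true)` only sets the flag, and the thread keeps running until the work finishes. `Thread.stop()` would force the thread to end, but it is deprecated because it can leave shared state corrupted.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutDemo {

    /** Hypothetical stand-in for PDF/OCR work: spins for workMillis and ignores interruption. */
    static void nonCooperativeWork(long workMillis, AtomicBoolean finished) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMillis);
        while (System.nanoTime() < end) {
            // Busy loop that never calls Thread.interrupted(), so cancel(true) has no effect.
        }
        finished.set(true);
    }

    /**
     * Waits at most timeoutMillis for the task, then cancels it.
     * Returns true if the task was still running after the caller gave up.
     */
    static boolean stillRunningAfterTimeout(long timeoutMillis, long workMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        Future<?> future = executor.submit(() -> nonCooperativeWork(workMillis, finished));
        boolean running = false;
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only sets the interrupt flag; it cannot kill the thread
            running = !finished.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        executor.shutdown();
        try {
            // The pool thread exits only after the work returns on its own.
            executor.awaitTermination(workMillis * 2, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return running;
    }

    public static void main(String[] args) {
        System.out.println(stillRunningAfterTimeout(100, 1000));
    }
}
```

With these values, `stillRunningAfterTimeout(100, 1000)` returns `true`. The caller gives up after 100 ms, but the worker still has about 900 ms of spinning left.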
Yes, that's supposed to happen. This feature was implemented because, in
most
Hi,
I increased the maximum time for text extraction (I set it to 300) and
tested it using a PDF file with many pages. The timeout appears in the log
at the expected time:
2019-08-23 09:02:38,380 DEBUG
[org.apache.jackrabbit.oak.plugins.index.search.spi.binary.FulltextBinaryTextExtractor]
Hi Vikas,
Thank you for your reply. I will try changing those parameters and see
what happens.
To answer one of my own questions: I found that text is extracted from PDFs
only if I add application/pdf to the DefaultParser in the index's Tika
config file.
Regards.
Jorge Flórez
On Thu, 22 Aug 2019 at
Hi,
> Is it possible to change the maximum time for that text extraction
You should be able to configure the timeout by setting
-Doak.extraction.timeoutSeconds=120
[0] on the JVM command line.
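A `-D` flag on the JVM command line becomes a system property that code can read at runtime. The sketch below shows one common way to read such a property. The class name and the 60-second fallback are assumptions for illustration only. They are not Oak's actual code or its real default.

```java
import java.util.concurrent.TimeUnit;

public class ExtractionTimeoutConfig {

    // Hypothetical fallback used only in this sketch; Oak's real default may differ.
    static final int ASSUMED_DEFAULT_SECONDS = 60;

    /** Reads oak.extraction.timeoutSeconds (falling back if unset) and converts it to milliseconds. */
    static long timeoutMillis() {
        int seconds = Integer.getInteger("oak.extraction.timeoutSeconds", ASSUMED_DEFAULT_SECONDS);
        return TimeUnit.SECONDS.toMillis(seconds);
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis());
    }
}
```

Running it as `java -Doak.extraction.timeoutSeconds=120 ExtractionTimeoutConfig` prints `120000`. Without the flag, it falls back to the assumed 60 seconds.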
Alternatively, you could disable running the extraction in a separate
thread by setting