Re: About text extraction for index

jorgeeflorez . Fri, 23 Aug 2019 05:14:03 -0700

Hi Vikas,

Thank you for your reply. I will try changing those parameters and see
what happens.
To answer one of my own questions: I found that text is extracted only from
PDF files if I add <mime>application/pdf</mime> to the DefaultParser entry
in the index's Tika config file.
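
For reference, a minimal sketch of such a config, assuming the standard Tika
XML layout. Only the <mime> line comes from what I described above; the
surrounding elements are an assumption about the rest of the file and may
differ in your setup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Restrict the default parser to PDF only; other MIME types
         are then not handled by it, so no text is extracted from them. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```

Adding further <mime> lines inside the same <parser> element should extend
extraction to those types as well.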


Regards.
Jorge Flórez


On Thu, 22 Aug 2019 at 12:43, Vikas Saurabh (<vikas.saur...@gmail.com>)
wrote:

> Hi,
>
> > Is it possible to change the maximum time for that text extraction
>
> You should be able to configure the timeout by setting
> -Doak.extraction.timeoutSeconds=120
> [0] on the JVM command line.
>
> Alternatively, you could disable running the extraction in a separate
> thread by setting -Doak.extraction.inCallerThread=true
>
> Hope that helps.
>
> [0]: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ExtractedTextCache.java?view=markup&pathrev=1814745#l61
>
> --Vikas
> (sent from mobile)
>
