Hi,

Thanks for the quick fix!
The "path" parameter in the code you committed (the parse method in the
TikaResource class) is always set to "unpack/all" when I run the
indexing on the file share. Shouldn't it be the file path? I do
not understand why it has this value.

Thanks,
Best regards,

Olivier 


> On Oct 11, 2018, at 19:46, Tim Allison <[email protected]> wrote:
> 
> Doh. Sorry.  I just added that in bf75e39.  Please let us know what
> else you find!
> 
> Aside from the unit tests, I haven't had a chance to try to break the
> -spawnChild option with our regression corpus.
> On Thu, Oct 11, 2018 at 9:59 AM Olivier Tavard
> <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I have a question about logging in Tika, and in Tika server specifically.
>> We use Tika server to index millions of files on a Windows file share.
>> To be more precise, we use Apache ManifoldCF to crawl the files, and the text
>> extraction is done by Tika server 1.19.
>> The spawnChild option is active. With very big files we get some OOM
>> errors, and the Tika server parent kills and restarts the child process as
>> it should. It works great; I just wanted to know whether it would be
>> possible to have the name of the file that caused the OOM in the Tika
>> server child log. So far, in the Tika log I can find the error and the date
>> of the error but not the filename. I changed the log level to debug, but the
>> filename did not appear either.
>> 
>> To find this information, I first have to find the date and time of the
>> restart of the child in the Tika server log. Then I open the Apache
>> ManifoldCF log and search it for that date and time to finally find the
>> problematic file sent to Tika.
>> Did I miss something? Can the filename be found in the Tika log? If Tika
>> could add the filename to its own log, it would be very helpful for us.
>> 
>> Thanks,
>> Best regards,
>> Olivier
