Hi,

Thanks for the quick fix!

The value of the "path" parameter at the point where you made the commit (the parse method in the TikaResource class) is always set to "unpack/all" when I run the indexing on the file share. Normally it should be the file path, right? I do not understand why it has this value.
Thanks,
Best regards,
Olivier

> On 11 Oct 2018, at 19:46, Tim Allison <[email protected]> wrote:
>
> Doh. Sorry. I just added that in bf75e39. Please let us know what
> else you find!
>
> Aside from the unit tests, I haven't had a chance to try to break the
> -spawnChild option with our regression corpus.
> On Thu, Oct 11, 2018 at 9:59 AM Olivier Tavard
> <[email protected]> wrote:
>>
>> Hi,
>>
>> I have a question about logging in Tika, and in Tika server specifically.
>> We use Tika server for indexing millions of files on a Windows file share.
>> To be more precise, we use Apache ManifoldCF to crawl the files, and the text
>> extraction is done by Tika server 1.19.
>> The spawnChild option is active. With very big files, we get some
>> OOM errors, and the Tika server parent kills and restarts the child process
>> as it should. It works great; I just wanted to know whether the Tika server
>> child log could include the name of the file that caused the
>> OOM. So far, in the Tika log I can find the error and its date
>> but not the filename. I changed the log level to debug, but the filename did
>> not appear either.
>>
>> To find this information, I first have to find the date and time of the
>> child restart in the Tika server log. Then I open the Apache ManifoldCF
>> log and search it at that date and time to finally find the problematic
>> file sent to Tika.
>> Did I miss something, and can the filename be found in the Tika log? If Tika
>> could add the filename to its own log, it would be very helpful for us.
>>
>> Thanks,
>> Best regards,
>> Olivier
