Hi,
While running a job with a file-system repository connection and Solr as
the output connection, with a Tika transformation in between, we see that
a tar.gz file is extracted again and again without ever reaching the Solr
ingestion phase. We are seeing the following in the history screen:
05-30-2018 10:45:22.659 extract [TikaTransformer] file: /file1.. ...
Projects/ImageProcessing/Girod/public_package.tar.gz
OK 3544906667 503767
05-30-2018 10:37:11.598 extract [TikaTransformer] file:/file1..
Projects/ImageProcessing/Girod/public_package.tar.gz
OK 3544906667 489356
05-30-2018 10:28:49.251 extract [TikaTransformer] file: /file1.. ..
Projects/ImageProcessing/Girod/public_package.tar.gz
OK 3544906667 501580
05-30-2018 10:20:35.719 extract [TikaTransformer] file:/ /file1.. ...
Projects/ImageProcessing/Girod/public_package.tar.gz
OK 3544906667 489647
05-30-2018 10:12:24.859 extract [TikaTransformer] file: /file1.. ...
Projects/ImageProcessing/Girod/public_package.tar.gz
OK 3544906667 489811
05-30-2018 10:03:57.290 extract [TikaTransformer] file: /file1.. ...
Projects/ImageProcessing/Girod/public_package.tar.gz
Any idea why ManifoldCF attempts the extraction multiple times? Also, can
we set a limit so that a job terminates if it fails at a particular phase
a certain number of times? For example, if Solr ingestion fails 5 times,
the job should terminate by itself.
Thanks and regards,
Vinay B S