Detect corrupt files and build a list of them before indexing in Solr

kostali hassan Fri, 15 Jul 2016 02:16:58 -0700

I am looking to index MS Word and PDF documents by uploading data with
Solr Cell, which uses Apache Tika.
I would like to use Tika to detect corrupt files before indexing and get a
list of the corrupted files, if that is possible.
I tried running java -jar tika-app.jar <input_dir> <output_dir>. In
output_dir I get every file from <input_dir> converted to XML, and each
corrupt file comes out with a size of 0 KB (empty).
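
One way to turn that observation into a list is to scan output_dir for
zero-byte files after the Tika run. The following is only a minimal sketch in
Python: the function name find_empty_outputs is made up for illustration, and
it assumes (as observed above, not as a documented Tika guarantee) that an
empty output file means the corresponding input could not be parsed.

```python
import os


def find_empty_outputs(output_dir):
    """Return sorted relative paths of zero-byte files under output_dir.

    Assumption: an empty output file marks an input that Tika failed to
    parse. This mirrors the behaviour described in the post; it is not a
    documented contract of tika-app.
    """
    empty = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            path = os.path.join(root, name)
            # A size of 0 bytes is the only signal this sketch relies on.
            if os.path.getsize(path) == 0:
                empty.append(os.path.relpath(path, output_dir))
    return sorted(empty)
```

Calling find_empty_outputs("<output_dir>") returns the relative paths of the
empty outputs, which can then be mapped back to the originals in <input_dir>
and skipped when posting to Solr. A file that parses but yields no text at
all might also show up here, so the list is a set of candidates to check
rather than a definitive verdict.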
