When I set inputDIR to d:\test, the log tells me: java.lang.RuntimeException:
Crawler couldn't find this directory:D:\tika_batch_config\test
The same happens if I set inputDIR to d:\Cvs; the log is: java.lang.RuntimeException:
Crawler couldn't find this directory: D:\tika_batch_config\Cvs
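
A quick way to rule out a path problem before launching the batch script is
to resolve the configured value and confirm that it is a directory. Below is
a minimal sketch in Python. It is not part of Tika: the function names are
hypothetical, and the idea that a relative value is interpreted against the
working directory is an assumption made for illustration.

```python
from pathlib import Path


def resolve_input_dir(configured: str, working_dir: str) -> Path:
    """Return the path the configured value points to when read from working_dir.

    Absolute values are returned unchanged; relative values are joined onto
    working_dir (an assumption about how the value is interpreted).
    """
    path = Path(configured)
    return path if path.is_absolute() else Path(working_dir) / path


def check_input_dir(configured: str, working_dir: str = ".") -> Path:
    """Resolve the configured input directory and fail early if it is missing."""
    resolved = resolve_input_dir(configured, working_dir).resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"input directory not found: {resolved}")
    return resolved
```

Calling check_input_dir with the value from the script and the directory the
script is started from shows which absolute path would be looked up.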

2016-07-15 17:54 GMT+01:00 kostali hassan <[email protected]>:

> I added this directory and it is still not working.
>
> 2016-07-15 17:42 GMT+01:00 Allison, Timothy B. <[email protected]>:
>
>> Yes, the log tells you that the input directory wasn’t specified correctly:
>>
>>
>>
>> 1375 2016-07-15 17:33:17,354 [Thread-2] INFO
>> org.apache.tika.batch.BatchProcessDriverCLI  - BatchProcess:
>> java.lang.RuntimeException: Crawler couldn't find this
>> directory:D:\tika_batch_config\test
>>
>>
>>
>> *From:* kostali hassan [mailto:[email protected]]
>> *Sent:* Friday, July 15, 2016 12:40 PM
>>
>> *To:* [email protected]
>> *Subject:* Re: detect corrupt file and build a list of them before
>> indexing in solr
>>
>>
>>
>> Only -JXmx1g works, the inputDIR is empty, and I get these empty files in
>> the logs:
>>
>> batch-driver-warn.log
>>
>> batch-process-warn.log
>>
>> tika-batch-pdfbox.log
>>
>>
>>
>> and these attached files.
>>
>>
>>
>> 2016-07-15 16:36 GMT+01:00 Allison, Timothy B. <[email protected]>:
>>
>> Try changing the max heap to something that will work on your computer:
>>
>>
>>
>> -JXmx5g
>>
>>
>>
>> To (say):
>>
>>
>>
>> -JXmx1g
>>
>> *From:* kostali hassan [mailto:[email protected]]
>> *Sent:* Friday, July 15, 2016 11:27 AM
>> *To:* [email protected]
>> *Subject:* Re: detect corrupt file and build a list of them before
>> indexing in solr
>>
>>
>>
>> I get these files in the logs, and when I run the script it doesn't finish;
>> it restarts all the time.
>>
>>
>>
>> 2016-07-15 13:19 GMT+01:00 Allison, Timothy B. <[email protected]>:
>>
>> Sorry, you’ll get 0 byte files for an error that caused Tika batch to do
>> a restart (hang/oom); and depending on cause, you may get an error logged
>> in batch-process-error.xml.  If your OS kills the process or something
>> truly catastrophic happens, the only trace you have is the 0 byte file.
>>
>>
>>
>> For regular caught exceptions, you can look in the .json file (key:
>> TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX + "runtime")
>> for the stack trace, or you can look in the logs as described below.
>>
>>
>>
>> *From:* Allison, Timothy B. [mailto:[email protected]]
>> *Sent:* Friday, July 15, 2016 8:11 AM
>> *To:* [email protected]
>> *Subject:* RE: detect corrupt file and build a list of them before
>> indexing in solr
>>
>>
>>
>> Checking for 0 byte files is one option.  The other option is to
>> configure the logs to capture exceptions.  I’ve attached the config files
>> and the shell script that I use when running our large scale regression
>> testing here:
>> https://wiki.apache.org/tika/TikaBatchUsage?action=AttachFile&do=view&target=tika-batch-sh.zip
>>
>>
>>
>> To run those, unzip the folder, put the tika-app.jar in the bin/
>> directory, update the shell script for your <input_dir> and your
>> <output_dir> and you should be good to go.  You may need to create a “logs”
>> directory.  Exceptions will be recorded in the batch-process-warn.log, and
>> original file names are included along with stack traces.
>>
>>
>>
>> *From:* kostali hassan [mailto:[email protected]
>> <[email protected]>]
>> *Sent:* Friday, July 15, 2016 5:17 AM
>> *To:* [email protected]
>> *Subject:* detect corrupt file and build a list of them before indexing
>> in solr
>>
>>
>>
>> I am looking to index MS Word and PDF files by uploading data with Solr Cell
>> using Apache Tika.
>>
>> I hope to use Tika to detect corrupt files before indexing and to get a
>> list of the corrupted files, if that is possible.
>>
>> I tried running java -jar tika-app.jar <input_dir> <output_dir>. In the
>> output_dir I get all the files of <input_dir> in XML format, and all the
>> corrupt files have a size of 0 KB (empty).
>>
>>
>>
>>
>>
>
>
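
The zero-byte check discussed in the thread can be sketched as a small Python
script that walks the output directory and collects every empty file. This is
only an illustration: the function names and the list file are hypothetical,
and an empty output file hints at a problem (restart, hang, out-of-memory, or
a corrupt input) rather than proving that the input is corrupt.

```python
from pathlib import Path


def find_empty_outputs(output_dir):
    """Return sorted paths (relative to output_dir) of all zero-byte files."""
    root = Path(output_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


def write_suspect_list(output_dir, list_file):
    """Write one suspect path per line to list_file and return the list."""
    suspects = find_empty_outputs(output_dir)
    text = "\n".join(suspects) + ("\n" if suspects else "")
    Path(list_file).write_text(text, encoding="utf-8")
    return suspects
```

The resulting list names output files; mapping each entry back to its input
file depends on how the output names were generated.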
