Hi,
  I tried requesting a Jira account (birdya22) to report this issue
but the request was denied.

The reply said I could submit PRs on GitHub (I have an account), but I
couldn't work out how to do that at https://github.com/apache/tika/.

So I've subscribed to this list and here are the details.

I tried to get Tika and ExifTool to work together to process some JPEG
image files and came across a number of issues.
1) Tika and ExifTool don't work on Windows
I used the Wiki page
'https://cwiki.apache.org/confluence/display/TIKA/EXIFToolParser' to
understand how to do the integration.
Because I wasn't getting the metadata I expected, I used the
'--verbose' option and got a Java Exception which contained this text:
 "WARN  [main] 07:13:34,699 org.apache.tika.parser.external.ExternalParser problem with process exec
java.io.IOException: Cannot run program "env": CreateProcess error=2, The system cannot find the file specified"
The exception occurs because 'env' is a Unix utility and is not
available as a command on a standard Windows installation.
I tracked this down to the file
'org\apache\tika\parser\external\tika-external-parsers.xml' in the
Tika App jar, where the ExifTool command is defined as:
'<command>env FOO=${OUTPUT} exiftool ${INPUT}</command>'
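
One possible workaround, sketched here in Java, is to drop the leading
'env NAME=value' prefix before the command is started on Windows. This
is only an illustration: the class EnvPrefix and its methods are
hypothetical names, not part of Tika, and the sketch assumes that
ExifTool does not actually need the FOO variable, which has not been
verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Tika. */
final class EnvPrefix {

    /** Returns the command tokens without a leading "env NAME=value ..." prefix. */
    static List<String> stripEnvPrefix(List<String> tokens) {
        if (tokens.isEmpty() || !"env".equals(tokens.get(0))) {
            return new ArrayList<>(tokens);
        }
        int i = 1;
        // Skip every NAME=value assignment that follows "env".
        while (i < tokens.size()
                && tokens.get(i).matches("[A-Za-z_][A-Za-z0-9_]*=.*")) {
            i++;
        }
        return new ArrayList<>(tokens.subList(i, tokens.size()));
    }

    /** True when the JVM reports a Windows operating system. */
    static boolean isWindows() {
        return System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
    }

    /** Splits a whitespace-separated command and adapts it to the current OS. */
    static List<String> forCurrentOs(String command) {
        List<String> tokens = Arrays.asList(command.trim().split("\\s+"));
        return isWindows() ? stripEnvPrefix(tokens) : new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        // Assumption: the placeholders are substituted elsewhere before execution.
        System.out.println(forCurrentOs("env FOO=${OUTPUT} exiftool ${INPUT}"));
    }
}
```

The resulting token list could then be handed to
java.lang.ProcessBuilder, which is what raises the "Cannot run program"
IOException when the first token is not an executable on the PATH.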

2) In the same file I noticed an entry for 'sox'. For the same reason
as with ExifTool, Tika and sox won't work together on Windows.
The command is:
<command>env FOO=${OUTPUT} sox --info ${INPUT}</command>
Note: I didn't find any information about 'sox' on the Wiki.

3) Looking at the file
'org\apache\tika\parser\external\tika-external-parsers.xml' I noticed
that it only contains video-related mime-types, meaning that I cannot
use it with image files. The Wiki page says:
'EXIFTool is a wonderful tool that reads videos, images, audio and
other media files and that extracts EXIF metadata from them.'
I took this to mean that Tika can use ExifTool to extract metadata
from all three kinds of file (video, image and audio), but that isn't
the case: as configured, it only supports video files.
Given this, may I suggest that the Wiki page be updated to make this
clear?
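
For anyone who wants to check which types a given copy of the file
covers, here is a minimal Java sketch that lists the mime-types in an
external-parsers XML document. The class MimeTypeLister is a
hypothetical name, and the element names other than <command>
(<external-parsers>, <parser>, <mime-types>, <mime-type>) are my
assumption about the file's layout; the sample XML is made up for the
example and is not a copy of the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper for illustration only; not part of Tika. */
final class MimeTypeLister {

    /** Collects the trimmed text of every <mime-type> element (assumed name). */
    static List<String> mimeTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("mime-type");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** True if any listed type starts with the given prefix, e.g. "image/". */
    static boolean coversPrefix(List<String> types, String prefix) {
        return types.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        // Made-up sample; the real file's contents may differ.
        String sample = "<external-parsers><parser>"
                + "<command>env FOO=${OUTPUT} exiftool ${INPUT}</command>"
                + "<mime-types>"
                + "<mime-type>video/avi</mime-type>"
                + "<mime-type>video/mpeg</mime-type>"
                + "</mime-types>"
                + "</parser></external-parsers>";
        List<String> types = mimeTypes(sample);
        System.out.println(types);
        System.out.println("image types listed: " + coversPrefix(types, "image/"));
    }
}
```

To run it against the real configuration, the XML would first have to
be extracted from the Tika App jar and read into a string.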

Adrian
