While gathering details of the issues I had with Tika Batch and
Tika Pipes, I worked out what my problem was. The
external2.ExternalParser now works with ExifTool in both Tika Batch
and Tika Pipes.
My problem was that I hadn't put ExifTool on the PATH when running
Tika Batch and Tika Pipes, and the exception I got wasn't very helpful
in diagnosing that. The exception was:
WARN  [main] 08:49:39,817 org.apache.tika.pipes.PipesServer parse exception: 20230811_131237.jpg
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.external2.ExternalParser@7fcbe147
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:312)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:168)
    at org.apache.tika.pipes.PipesServer.parseRecursive(PipesServer.java:659)
    at org.apache.tika.pipes.PipesServer.parseWithStream(PipesServer.java:546)
    at org.apache.tika.pipes.PipesServer.parseFromTuple(PipesServer.java:487)
    at org.apache.tika.pipes.PipesServer.actuallyParse(PipesServer.java:377)
    at org.apache.tika.pipes.PipesServer.parseOne(PipesServer.java:344)
    at org.apache.tika.pipes.PipesServer.processRequests(PipesServer.java:246)
    at org.apache.tika.pipes.PipesServer.main(PipesServer.java:180)
Caused by: java.lang.NullPointerException
    at java.base/java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1111)
    at java.base/java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1102)
    at org.apache.tika.utils.ProcessUtils.release(ProcessUtils.java:45)
    at org.apache.tika.utils.ProcessUtils.execute(ProcessUtils.java:215)
    at org.apache.tika.parser.external2.ExternalParser.parse(ExternalParser.java:134)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
    ... 10 more
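
Since the NullPointerException gives no hint that the executable is missing, one way to catch this earlier is to check the PATH before starting Tika Batch or Tika Pipes. The following is a minimal sketch and not part of Tika: the class name ExecutableOnPath and its find method are hypothetical, and the list of Windows file suffixes is an assumption.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper (not a Tika class): looks for a command on a PATH-style string.
public class ExecutableOnPath {

    /** Returns the first executable file matching the command in the given PATH value, if any. */
    public static Optional<Path> find(String command, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        boolean windows = System.getProperty("os.name", "").toLowerCase().startsWith("windows");
        // Assumption: these suffixes cover the common Windows cases.
        String[] suffixes = windows ? new String[] {"", ".exe", ".bat", ".cmd"} : new String[] {""};
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String suffix : suffixes) {
                try {
                    Path candidate = Path.of(dir, command + suffix);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // Skip malformed PATH entries rather than failing the whole check.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> exiftool = find("exiftool", System.getenv("PATH"));
        if (exiftool.isPresent()) {
            System.out.println("Found exiftool at " + exiftool.get());
        } else {
            System.out.println("exiftool is not on the PATH; the external parser will fail.");
        }
    }
}
```

Running this in the same shell or service environment that launches Tika matters, because the PATH seen by a forked process may differ from the one in an interactive session.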

I have a question about Tika Pipes and Tika-app.jar: is it possible to
use the FetchEmitTuple to change the output format, or am I stuck with
JSON?

On Tue, 26 Aug 2025 at 22:42, Adrian Bird <adrian.bi...@googlemail.com> wrote:
>
> By looking at the test cases I have got the external2.ExternalParser
> to work with ExifTool for a single file.
>
> But when I tried using Tika Batch and Tika Pipes to process a
> directory of images I got exceptions in both cases. I can add details
> here, but I need to gather the details of the issues first.
>
> I will raise a JIRA about my original issue.
>
>
>
> On Tue, 26 Aug 2025 at 22:03, Tim Allison <talli...@apache.org> wrote:
> >
> > This is bad. I’m sorry for your experience with this.
> >
> > Let me see if I can get something working with our v2 external parsers.
> >
> > At the least, I agree that we need to fix our documentation.
> >
> > On Tue, Aug 26, 2025 at 6:11 AM Adrian Bird <adrian.bi...@googlemail.com> wrote:
> >>
> >> Hi,
> >>   I tried requesting a Jira account (birdya22) to report this issue
> >> but the request was denied.
> >>
> >> The reply said I could submit PRs on github (I have an account), but I
> >> didn't see how to do it (https://github.com/apache/tika/).
> >>
> >> So I've subscribed to this list and here are the details.
> >>
> >> I tried to get Tika and ExifTool to work together to process some JPEG
> >> image files and came across a number of issues.
> >> 1) Tika and ExifTool don't work on Windows
> >> I used the Wiki page
> >> 'https://cwiki.apache.org/confluence/display/TIKA/EXIFToolParser' to
> >> understand how to do the integration.
> >> Because I wasn't getting the metadata I expected, I used the
> >> '--verbose' option and got a Java Exception which contained this text:
> >>  "WARN  [main] 07:13:34,699
> >> org.apache.tika.parser.external.ExternalParser problem with process
> >> exec
> >> java.io.IOException: Cannot run program "env": CreateProcess error=2,
> >> The system cannot find the file specified"
> >> The exception occurs because 'env' is not a valid Windows command.
> >> I tracked this down to the file
> >> 'org\apache\tika\parser\external\tika-external-parsers.xml' in the
> >> Tika App jar where the command is:
> >> '<command>env FOO=${OUTPUT} exiftool ${INPUT}</command>'
> >> This doesn't work on Windows because 'env' does not exist as a command.
> >>
> >> 2) In the same file I noticed an entry for 'sox'. For the same reason
> >> as ExifTool, Tika and sox won't work on Windows
> >> The command is:
> >> <command>env FOO=${OUTPUT} sox --info ${INPUT}</command>
> >> Note I didn't find any information on 'sox' on the Wiki.
> >>
> >> 3) Looking at the file
> >> 'org\apache\tika\parser\external\tika-external-parsers.xml' I noticed
> >> that it only contains video related mime-types, meaning that I cannot
> >> use it with image files. The Wiki page says:
> >> 'EXIFTool is a wonderful tool that reads videos, images, audio and
> >> other media files and that extracts EXIF metadata from them.'
> >> I took this to mean that Tika can extract metadata from all 3 file
> >> types, but that isn't the case as it only supports video files.
> > >> Given this, can I suggest that the Wiki page be updated to make
> > >> this clear?
> >>
> >> Adrian
