I have recently moved my PHP command-line script to a new server, installed Java 1.5.0, and am trying to run Tika.
Each time I run it, I get the same error:

-bash-4.1$ java -jar tika-app-1.2.jar
Exception in thread "main" java.lang.RuntimeException: Unable to parse the default media type registry
        at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:482)
        at org.apache.tika.detect.DefaultDetector.<init>(DefaultDetector.java:98)
        at org.apache.tika.cli.TikaCLI.<init>(TikaCLI.java:303)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:105)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:149)
        at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:479)
        ... 3 more
Caused by: org.apache.tika.mime.MimeTypeException: Invalid media type name: application/dita+xml;format=map
        at org.apache.tika.mime.MimeTypesReader.startElement(MimeTypesReader.java:148)
        at gnu.xml.stream.SAXParser.parse(libgcj.so.10)
        at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
        at javax.xml.parsers.SAXParser.parse(libgcj.so.10)
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:115)
        ... 7 more

My PHP script runs command lines like this:

java -jar tika-app-1.2.jar -eUTF-8 --text "/var/www/vhosts/example/sample-docs/welsh_corpus.txt" >/tmp/phpVAc0aW 2>/tmp/phpcKRcIo

They all fail the same way.

I have no idea what "Unable to parse the default media type registry" means, how to fix it, or what is different on this new server (I installed Java from the same source, and moved from CloudLinux, where it was previously working, to CentOS 6).

Has anyone else had this error? I expect it has an easy fix, but the error means nothing to me, and my Google-fu cannot find any explanation apart from the line of Tika source that generates the error.

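In case it helps anyone reproduce this, here is a minimal shell sketch of the capture pattern the PHP script uses (stdout and stderr redirected to separate temp files), plus a check of which Java runtime is first on the PATH. The run_captured function and the CAP_OUT, CAP_ERR and CAP_STATUS variables are hypothetical names made up for this illustration. They are not part of Tika or of the real script.

```shell
# Hypothetical helper (illustration only): run a command with stdout and
# stderr captured in separate temp files, like the PHP script's redirects.
run_captured() {
    CAP_OUT=$(mktemp) || return 1
    CAP_ERR=$(mktemp) || return 1
    "$@" >"$CAP_OUT" 2>"$CAP_ERR"
    CAP_STATUS=$?   # exit status of the captured command
}

# Record which Java runtime the shell actually picks up, if there is one.
if command -v java >/dev/null 2>&1; then
    run_captured java -version
    echo "java resolves to: $(command -v java) (exit status $CAP_STATUS)"
    # Version banners may go to stdout or stderr depending on the runtime.
    cat "$CAP_OUT" "$CAP_ERR"
    rm -f "$CAP_OUT" "$CAP_ERR"
fi
```

The failing call can be captured the same way with run_captured java -jar tika-app-1.2.jar -eUTF-8 --text "/var/www/vhosts/example/sample-docs/welsh_corpus.txt", after which the stack trace should be in the file named by $CAP_ERR.
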
-- Jason
