I'm trying to get Tika to detect .bat and .cmd files, but both are returned as 
text/plain.

In tika-mimetypes.xml 
(https://github.com/apache/tika/blob/master/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml)
.bat is listed under application/x-msdownload, yet detection still returns text/plain.

Surprisingly, .cmd is listed under text/plain. I would have expected it to be 
grouped with .bat.

Has anyone gotten Tika to detect batch script files properly?

The closest thing I can find when searching for this is this unresolved ticket: 
https://issues.apache.org/jira/browse/TIKA-1148


When I run the tika-app jar by itself, I get the same result (text/plain) as 
when I do this through Java code.

> java -jar tika-app-1.16.jar -d BatchInstall.bat
Aug 23, 2017 9:40:22 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Aug 23, 2017 9:40:23 AM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
text/plain

=====================
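Java version:
import org.apache.tika.Tika;

private static final Tika CONTENT_TYPE_DETECTOR = new Tika();
// fileItem.get() supplies the file bytes, fileItem.getName() the original file name
return CONTENT_TYPE_DETECTOR.detect(fileItem.get(), fileItem.getName());
// Returns text/plain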
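
A possible stopgap while detection itself returns text/plain is to post-process the result using the file extension. The following is only a sketch under stated assumptions: BatchTypeOverride and refine are hypothetical names that are not part of Tika, and "application/x-bat" is an assumed label chosen for illustration, not the type Tika 1.16 maps .bat to.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Tika: refines an already-detected
// media type using the file extension.
final class BatchTypeOverride {

    // Assumed label for batch scripts; substitute whatever your application expects.
    static final String BATCH_TYPE = "application/x-bat";

    // Returns BATCH_TYPE when the detector reported text/plain and the name
    // ends in .bat or .cmd (case-insensitive); otherwise returns the
    // detected type unchanged.
    static String refine(String detectedType, String fileName) {
        if (detectedType == null || fileName == null) {
            return detectedType;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean batchName = lower.endsWith(".bat") || lower.endsWith(".cmd");
        if (batchName && "text/plain".equals(detectedType)) {
            return BATCH_TYPE;
        }
        return detectedType;
    }

    public static void main(String[] args) {
        // Small demonstration with hard-coded inputs.
        System.out.println(refine("text/plain", "BatchInstall.bat"));
        System.out.println(refine("text/plain", "notes.txt"));
    }
}
```

With the snippet above, the detect call would be wrapped as BatchTypeOverride.refine(CONTENT_TYPE_DETECTOR.detect(fileItem.get(), fileItem.getName()), fileItem.getName()). This only trusts the extension when the content already looks like plain text, so a binary file renamed to .bat keeps whatever type was detected for it.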
