After upgrading to Tika 1.21, I have noticed that several known XLSX
files are detected by Tika as "application/x-tika-ooxml". I think I've
narrowed it down to the new StreamingZipContainerDetector. When I
inspect the "[Content_Types].xml" of these XLSX files, I find no
reference to any of the XLSX content types configured in
OOXML_CONTENT_TYPES in StreamingZipContainerDetector. Specifically,
none of the following appear:

"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"
"application/vnd.ms-excel.sheet.macroEnabled.main+xml"
"application/vnd.ms-excel.sheet.binary.macroEnabled.main"

I do see a content type of

"application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml"

in "[Content_Types].xml". Is the StreamingZipContainerDetector missing
the XSSFRelation TEMPLATE_WORKBOOK in OOXML_CONTENT_TYPES?
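
For anyone who wants to run the same check on their own files, here is
a minimal sketch of how the Override content types can be listed from
"[Content_Types].xml". It uses only the JDK (java.util.zip and the
built-in DOM parser), not Tika or POI. The class name ContentTypesDump
and the method overrideContentTypes are names I made up for this
illustration, and this is not how StreamingZipContainerDetector itself
reads the entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Tika or POI.
class ContentTypesDump {

    /** Returns the ContentType attribute of every Override element in [Content_Types].xml. */
    static List<String> overrideContentTypes(InputStream zipStream) throws Exception {
        List<String> types = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!"[Content_Types].xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the XML parser cannot close the zip stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zis.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid XXE on untrusted files.
                dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = dbf.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()));
                // "*" matches Override elements in any namespace.
                NodeList overrides = doc.getElementsByTagNameNS("*", "Override");
                for (int i = 0; i < overrides.getLength(); i++) {
                    Element e = (Element) overrides.item(i);
                    types.add(e.getAttribute("ContentType"));
                }
                break;
            }
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ContentTypesDump path/to/file.xlsx
        try (InputStream in = new FileInputStream(args[0])) {
            for (String type : overrideContentTypes(in)) {
                System.out.println(type);
            }
        }
    }
}
```

If the output contains the "...spreadsheetml.template.main+xml" type
and none of the three types listed above, the file matches the case
described in this message.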
