Hello,
I've been struggling with this for a while and was hoping you guys could help.
I have written a custom detector for a subclass of application/zip,
specifically for files from the Apple App Store, which are basically zip files
with a few special files in them (similar to Android APK files). The code is
below. I am also using the magic mime type detector for some special mime
types (such as Android dex files). Here's the situation:
1) When I just include the new custom detector in the build.xml like this (no
other changes):
<service type="org.apache.tika.detect.Detector" provider="<fqn of custom detector>"/>
It detects the Apple App Store files perfectly, but no longer detects the dex
files via magic type detection (it returns application/octet-stream).
2) When I remove the custom detector, dex files are detected just fine again,
but the Apple App Store files no longer are (of course, since I removed the
custom detector).
So it seems to me that there is some conflict between the way I am using the
custom detector and the MagicTypeDetector used by DefaultDetector. Or is there
something wrong with my tika-config (see below)? Any help would be greatly
appreciated. Thanks!
-Paul
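To make the suspected conflict concrete, here is a minimal sketch in plain java.io (no Tika classes). It assumes, without confirmation from the code below, that the problem is one detector leaving the stream consumed so that a later magic-byte check sees no data. The class and method names are hypothetical stand-ins, not Tika APIs.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class DetectorOrderSketch {

    // First four bytes of a dex file: "dex\n".
    private static final byte[] DEX_MAGIC = {'d', 'e', 'x', '\n'};

    /** Hypothetical stand-in for a magic-byte detector; leaves the stream position unchanged. */
    public static boolean looksLikeDex(InputStream in) throws IOException {
        in.mark(DEX_MAGIC.length);
        try {
            byte[] head = new byte[DEX_MAGIC.length];
            int n = in.readNBytes(head, 0, head.length); // requires Java 9+
            return n == DEX_MAGIC.length && Arrays.equals(head, DEX_MAGIC);
        } finally {
            in.reset();
        }
    }

    /** Hypothetical stand-in for a detector that scans the whole stream, as an archive walk would. */
    public static void scanAll(InputStream in, boolean rewind) throws IOException {
        if (rewind) {
            in.mark(Integer.MAX_VALUE);
        }
        while (in.read() != -1) {
            // consume every byte
        }
        if (rewind) {
            in.reset();
        }
    }

    /** Runs the scanning detector first, then the magic-byte check, on the same stream. */
    public static boolean dexDetectedAfterScan(byte[] data, boolean rewind) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        scanAll(in, rewind);
        return looksLikeDex(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] dex = {'d', 'e', 'x', '\n', '0', '3', '5', 0};
        System.out.println("without rewind: " + dexDetectedAfterScan(dex, false));
        System.out.println("with rewind:    " + dexDetectedAfterScan(dex, true));
    }
}
```

In this sketch the magic check only succeeds when the scanning step marks the stream before reading and resets it afterwards. Whether the same thing is happening inside DefaultDetector here is exactly what I am unsure about.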
tika-config.xml:
<properties>
  <mimeTypeRepository resource="/etc/tika-mimetypes.xml" magic="true"/>
</properties>
custom detector:
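import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveException;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
// (imports for IOUtils and the ReusableBufferedInputStream* / Ipa* helper classes omitted)

/**
 * A simple detector that takes an input stream and detects whether the file
 * is an App Store file.
 */
public class AppleAppDetector implements Detector {

    private static final long serialVersionUID = 1L;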
    /**
     * Main entry point for checking whether the file in an InputStream is an
     * App Store file.
     */
    @Override
    public MediaType detect(InputStream input, Metadata metadata) throws IOException {
        ReusableBufferedInputStream buffered;
        try {
            buffered = ReusableBufferedInputStreamPool.getInstance().borrowObject();
        } catch (Exception e) {
            return MediaType.OCTET_STREAM;
        }
        try {
            return detect(buffered, input, metadata);
        } finally {
            // Return the buffer to the pool only after detection has finished with it.
            ReusableBufferedInputStreamPool.getInstance().returnObjectQuietly(buffered);
        }
    }
    /**
     * Checks whether the input stream is an App Store file.
     *
     * @param inputStream the buffered stream to read the archive from
     * @param archiveStreamFactory factory used to open the archive stream
     * @param metadata document metadata (not used here)
     * @return application/x-itunes-ipa if all required entries are found,
     *         application/octet-stream otherwise
     * @throws IOException if reading the archive fails
     */
    public MediaType detect(ReusableBufferedInputStream inputStream,
                            ArchiveStreamFactory archiveStreamFactory,
                            Metadata metadata) throws IOException {
        ArchiveInputStream archiveInput = null;
        try {
            archiveInput = archiveStreamFactory.createArchiveInputStream(inputStream);
            MediaType type = MediaType.OCTET_STREAM;
            ArchiveEntry entry = null;
            IpaFileMatcher matcher = new IpaFilePatternBuilder().build();
            // This will short circuit as soon as all the required files are matched
            while (((entry = archiveInput.getNextEntry()) != null) && !matcher.isIpa()) {
                matcher.checkMatch(entry.getName());
            }
            if (matcher.isIpa()) {
                type = MediaType.application("x-itunes-ipa");
            }
            return type;
        } catch (ArchiveException e) {
            return MediaType.OCTET_STREAM;
        } finally {
            IOUtils.closeQuietly(archiveInput);
        }
    }
    /**
     * Attempts to acquire the input stream and then passes it on to the
     * archive-based check.
     *
     * @param inputStream the pooled buffered stream that wraps the input
     * @param input the raw input stream to acquire
     * @param metadata document metadata
     * @return the detected media type
     * @throws IOException if the stream cannot be acquired or read
     */
    public MediaType detect(ReusableBufferedInputStream inputStream,
                            InputStream input,
                            Metadata metadata) throws IOException {
        try {
            inputStream.acquire(input);
        } catch (Exception e) {
            throw new IOException(e);
        }
        return detect(inputStream, new ArchiveStreamFactory(), metadata);
    }
}
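The IpaFileMatcher and IpaFilePatternBuilder classes are not shown above. As a rough idea of the kind of name check involved, here is a self-contained sketch using only java.util.zip. The pattern (Payload/<name>.app/Info.plist) is an assumption about what marks an App Store archive, and the class and method names are hypothetical, not the real helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class IpaNameCheckSketch {

    // Assumed marker entry: Payload/<name>.app/Info.plist
    private static final Pattern INFO_PLIST =
            Pattern.compile("^Payload/[^/]+\\.app/Info\\.plist$");

    /** Walks the zip entries and stops at the first name that matches the marker. */
    public static boolean looksLikeIpa(InputStream in) throws IOException {
        // The ZipInputStream is deliberately not closed: the caller owns the stream.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (INFO_PLIST.matcher(entry.getName()).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload for in-memory data. */
    public static boolean looksLikeIpa(byte[] data) throws IOException {
        return looksLikeIpa(new ByteArrayInputStream(data));
    }

    /** Builds a small in-memory zip with the given entry names, for trying the check out. */
    public static byte[] zipOf(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(0); // one placeholder byte of content
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] ipaLike = zipOf("iTunesMetadata.plist", "Payload/Demo.app/Info.plist");
        byte[] apkLike = zipOf("AndroidManifest.xml", "classes.dex");
        System.out.println("ipa-like: " + looksLikeIpa(ipaLike));
        System.out.println("apk-like: " + looksLikeIpa(apkLike));
    }
}
```

Like the detector above, this reads entries sequentially and short-circuits on the first match, so it never needs the zip central directory.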