Hello,

I've been struggling with this for a bit and was hoping you guys could help.
I have written a custom detector for a subclass of application/zip:
specifically, it properly detects files from the Apple App Store, which are
basically zip files with a few special files in them (similar to Android APK
files). The code is below. However, I am also using the magic MIME type
detector for some special MIME types (such as Android dex files). Here's the
situation:

1) When I just include the new custom detector in build.xml like this (no
other changes):

<service type="org.apache.tika.detect.Detector" provider="<fqn of custom detector>"/>

It detects the Apple App Store files perfectly, but the dex files are no longer
detected by magic type detection (it returns application/octet-stream).

2) When I remove the custom detector, dex files are detected just fine again,
but Apple App Store files are no longer detected (of course, since I removed
the custom detector).
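One way to picture how the two detectors could interfere: registering a detector through `<service>` adds it to the detectors that Tika's DefaultDetector loads, and those run alongside the magic-based MIME type detection on the same input stream. Tika's `Detector` javadoc asks each detector to `mark()` the stream before reading, `reset()` it before returning, and never close it. The stdlib-only simulation below is a hypothetical model, not Tika's `CompositeDetector`: `DetectorChain`, `SimpleDetector`, and the `application/x-dex` label are illustrative names, and the chain simply keeps the first answer that is not `application/octet-stream`. The only real-world fact it relies on is that DEX files begin with the bytes `dex\n`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of several detectors sharing one stream (not Tika code).
final class DetectorChain {
    static final String OCTET = "application/octet-stream";

    interface SimpleDetector {
        String detect(InputStream in) throws IOException;
    }

    // Magic check for the DEX signature "dex\n"; marks and resets as required.
    static final SimpleDetector MAGIC = in -> {
        byte[] want = "dex\n".getBytes(StandardCharsets.US_ASCII);
        in.mark(want.length);
        byte[] got = in.readNBytes(want.length);
        in.reset();
        return Arrays.equals(want, got) ? "application/x-dex" : OCTET;
    };

    // Misbehaving detector: consumes the stream and never resets it.
    static final SimpleDetector GREEDY = in -> {
        in.readAllBytes();
        return OCTET;
    };

    // Well-behaved detector: same reads, but restores the stream position.
    static final SimpleDetector POLITE = in -> {
        in.mark(Integer.MAX_VALUE);
        try {
            in.readAllBytes();
            return OCTET;
        } finally {
            in.reset();
        }
    };

    // Runs each detector on one shared stream; keeps the first non-octet answer.
    static String detect(byte[] data, SimpleDetector... detectors) {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        String result = OCTET;
        try {
            for (SimpleDetector detector : detectors) {
                String type = detector.detect(in);
                if (result.equals(OCTET) && !type.equals(OCTET)) {
                    result = type;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] dex = "dex\n035\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(dex, GREEDY, MAGIC)); // application/octet-stream
        System.out.println(detect(dex, POLITE, MAGIC)); // application/x-dex
    }
}
```

With the greedy detector first, the magic check only sees an exhausted stream and the chain falls back to application/octet-stream; with the well-behaved one first, the DEX signature is still found.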

So it seems to me that there is some conflict between the way I am using the
custom detector and the MagicTypeDetector used by DefaultDetector. Or is there
something wrong with my tika-config.xml (see below)? Any help would be greatly
appreciated. Thanks!

-Paul

tika-config.xml:

<properties>

  <mimeTypeRepository resource="/etc/tika-mimetypes.xml" magic="true"/>

</properties>


custom detector:

/**
 * A simple class that takes in an input stream and detects whether the file
 * is an App Store file.
 */
public class AppleAppDetector implements Detector {

   private static final long serialVersionUID = 1L;

   /**
    * Main entry point for checking whether the file in an InputStream is an
    * App Store file.
    */
   @Override
   public MediaType detect(InputStream input, Metadata metadata) throws IOException {
       ReusableBufferedInputStream buffered = null;
       try {
           buffered = ReusableBufferedInputStreamPool.getInstance().borrowObject();
       } catch (Exception e) {
           return MediaType.OCTET_STREAM;
       } finally {
           ReusableBufferedInputStreamPool.getInstance().returnObjectQuietly(buffered);
       }

       return detect(buffered, input, metadata);
   }

   /**
    * Checks whether the input stream contains an App Store file.
    *
    * @param inputStream the buffered stream to scan
    * @param archiveStreamFactory factory used to open the archive stream
    * @param metadata document metadata (not used by this method)
    * @return application/x-itunes-ipa if the archive matches, otherwise
    *         application/octet-stream
    * @throws IOException if the archive cannot be read
    */
   public MediaType detect(ReusableBufferedInputStream inputStream,
                   ArchiveStreamFactory archiveStreamFactory,
                   Metadata metadata) throws IOException {
       ArchiveInputStream archiveInput = null;
       try {
           archiveInput = archiveStreamFactory.createArchiveInputStream(inputStream);

           MediaType type = MediaType.OCTET_STREAM;
           ArchiveEntry entry = null;
           IpaFileMatcher matcher = new IpaFilePatternBuilder().build();
           while (((entry = archiveInput.getNextEntry()) != null) && !matcher.isIpa()) {
               // This will short circuit as soon as all the required files are matched
               matcher.checkMatch(entry.getName());
           }

           if (matcher.isIpa()) {
               type = MediaType.application("x-itunes-ipa");
           }
           return type;
       } catch (ArchiveException e) {
           return MediaType.OCTET_STREAM;
       } finally {
           IOUtils.closeQuietly(archiveInput);
       }
   }

   /**
    * Attempts to acquire the given input stream and then passes it on to the
    * archive check.
    *
    * @param inputStream reusable buffer that acquires the original stream
    * @param input the original input stream
    * @param metadata document metadata
    * @return the detected media type
    * @throws IOException if the stream cannot be acquired or read
    */
   public MediaType detect(ReusableBufferedInputStream inputStream, InputStream input,
                   Metadata metadata) throws IOException {
       try {
           inputStream.acquire(input);
       } catch (Exception e) {
           throw new IOException(e);
       }
       return detect(inputStream, new ArchiveStreamFactory(), metadata);
   }
}
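For comparison, here is a minimal stdlib-only sketch of a similar archive scan written to respect the stream contract from Tika's `Detector` javadoc: it marks the stream, hides `close()` from the ZIP reader with a small shield wrapper (closing an archive stream normally closes the stream it wraps as well, which is what `IOUtils.closeQuietly(archiveInput)` does above), and resets in a `finally` block. It is not the posted detector. It uses `java.util.zip` instead of Commons Compress and returns a plain string instead of Tika's `MediaType`. `IpaScanSketch`, `CloseShield`, and the read limit are hypothetical, and the IPA rule (an entry under `Payload/<Name>.app/`) is an assumption standing in for `IpaFileMatcher`. In real Tika code, Commons IO's `CloseShieldInputStream` plays the same role as the wrapper here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical stdlib-only sketch; not the posted AppleAppDetector.
final class IpaScanSketch {
    static final String OCTET = "application/octet-stream";
    static final String IPA = "application/x-itunes-ipa";
    static final int READ_LIMIT = 64 * 1024 * 1024; // assumed upper bound on bytes scanned

    // Wrapper whose close() does nothing, so the ZIP reader cannot close `in`.
    static final class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally left open; the caller owns the stream
        }
    }

    // Assumed rule: the first path component under "Payload/" ends in ".app".
    static boolean looksLikeIpaEntry(String name) {
        if (!name.startsWith("Payload/")) {
            return false;
        }
        String rest = name.substring("Payload/".length());
        int slash = rest.indexOf('/');
        String top = slash < 0 ? rest : rest.substring(0, slash);
        return top.length() > ".app".length() && top.endsWith(".app");
    }

    // Marks, scans entry names, and always resets before returning.
    static String detect(InputStream in) throws IOException {
        if (in == null || !in.markSupported()) {
            return OCTET; // cannot look ahead safely without mark/reset
        }
        in.mark(READ_LIMIT);
        try (ZipInputStream zip = new ZipInputStream(new CloseShield(in))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (looksLikeIpaEntry(entry.getName())) {
                    return IPA;
                }
            }
            return OCTET; // not a ZIP, or no IPA-style entries
        } finally {
            in.reset(); // later detectors see the stream from the start
        }
    }

    // Builds an in-memory ZIP with empty entries, for the demo below.
    static byte[] zipWith(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream(zipWith("Payload/Demo.app/Info.plist")));
        System.out.println(detect(in));       // application/x-itunes-ipa
        System.out.println(in.read() == 'P'); // true: back at the "PK" header
    }
}
```

The point of the design is that the detector leaves no trace on the shared stream: whatever it returns, the next detector (for example the magic-byte check) starts reading from the same position.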
