Hi,

On Sun, May 17, 2009 at 9:03 AM, Robert Burrell Donkin
<[email protected]> wrote:
> IMHO it makes sense to factor out an interface, retain the existing
> implementation and create a separate module. This will allow assemblers
> who don't want to use Tika to create applications that don't use it.

How about using the org.apache.tika.detect.Detector interface (see below)?

Tika comes with default implementations of the interface, but it
should also be straightforward to implement the interface using
alternative detection code.
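To illustrate what a non-Tika implementation might look like, here is a
minimal sketch that delegates to the JDK's
URLConnection.guessContentTypeFromStream. So that it compiles without Tika
on the classpath, it is written against a hypothetical, simplified stand-in
for the interface (SimpleDetector, with String and Map in place of
MediaType and Metadata). JdkSniffingDetector is likewise a hypothetical
name, not an existing class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.util.Map;

// Hypothetical stand-in for org.apache.tika.detect.Detector: same shape,
// but returns a plain String and takes a Map so it compiles without Tika.
interface SimpleDetector {
    String detect(InputStream input, Map<String, String> metadata) throws IOException;
}

// Alternative implementation backed by the JDK instead of Tika's detectors.
class JdkSniffingDetector implements SimpleDetector {
    private static final String FALLBACK = "application/octet-stream";

    @Override
    public String detect(InputStream input, Map<String, String> metadata) throws IOException {
        if (input == null) {
            return FALLBACK; // no stream available, nothing to sniff
        }
        // guessContentTypeFromStream requires mark support; it marks the
        // stream, peeks at the first bytes and resets it before returning.
        // It returns null when it does not recognize the content.
        String type = URLConnection.guessContentTypeFromStream(input);
        return type != null ? type : FALLBACK;
    }
}
```

The JDK sniffer only recognizes a handful of formats (GIF, PNG, JPEG, some
HTML/XML and a few others), so an adapter like this is mostly useful as a
lightweight fallback for applications that want to avoid the Tika
dependency.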

BR,

Jukka Zitting


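package org.apache.tika.detect;

import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

/**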
 * Content type detector. Implementations of this interface use various
 * heuristics to detect the content type of a document based on given
 * input metadata or the first few bytes of the document stream.
 *
 * @since Apache Tika 0.3
 */
public interface Detector {

    /**
     * Detects the content type of the given input document. Returns
     * <code>application/octet-stream</code> if the type of the document
     * cannot be detected.
     * <p>
     * If the document input stream is not available, then the first
     * argument may be <code>null</code>. Otherwise the detector may
     * read bytes from the start of the stream to help in type detection.
     * The given stream is guaranteed to support the
     * {@link InputStream#markSupported() mark feature} and the detector
     * is expected to {@link InputStream#mark(int) mark} the stream before
     * reading any bytes from it, and to {@link InputStream#reset() reset}
     * the stream before returning. The stream must not be closed by the
     * detector.
     * <p>
     * The given input metadata is only read, not modified, by the detector.
     *
     * @param input document input stream, or <code>null</code>
     * @param metadata input metadata for the document
     * @return detected media type, or <code>application/octet-stream</code>
     * @throws IOException if the document input stream could not be read
     */
    MediaType detect(InputStream input, Metadata metadata) throws IOException;

}
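The mark/reset contract described in the Javadoc above can be sketched as
a small magic-number detector. This is hypothetical code, again written
against a simplified stand-in interface (StreamDetector, using String and
Map instead of MediaType and Metadata) rather than Tika's real types; the
PDF and ZIP signatures and the 8-byte lookahead are illustrative choices,
not anything Tika prescribes. The metadata argument is simply ignored here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in mirroring Detector.detect(InputStream, Metadata).
interface StreamDetector {
    String detect(InputStream input, Map<String, String> metadata) throws IOException;
}

// Magic-number detector following the documented contract: null input is
// allowed, the stream is marked before reading, reset before returning,
// and never closed.
class MagicDetector implements StreamDetector {
    private static final String FALLBACK = "application/octet-stream";
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final int LOOKAHEAD = 8; // enough for the prefixes above

    @Override
    public String detect(InputStream input, Map<String, String> metadata) throws IOException {
        if (input == null) {
            return FALLBACK;
        }
        input.mark(LOOKAHEAD);
        byte[] head = new byte[LOOKAHEAD];
        int n = 0;
        try {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends
            while (n < head.length) {
                int r = input.read(head, n, head.length - n);
                if (r == -1) {
                    break;
                }
                n += r;
            }
        } finally {
            input.reset(); // hand the stream back at its original position
        }
        if (startsWith(head, n, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, n, ZIP)) {
            return "application/zip";
        }
        return FALLBACK;
    }

    // True if the first `length` bytes of data begin with prefix.
    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Resetting in a finally block keeps the stream usable by the caller even if
reading fails partway, which is what lets several detectors be tried in
turn on the same stream.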
