Jukka Zitting wrote:
> Hi,
>
> On Sun, May 17, 2009 at 9:03 AM, Robert Burrell Donkin
> <[email protected]> wrote:
>> IMHO it makes sense to factor out an interface, retain the existing
>> implementation and create a separate module. this will allow assemblers
>> who don't want to use tika to create applications that don't use it.
>
> How about using the org.apache.tika.detect.Detector interface (see below)?
>
> Tika comes with default implementations of the interface, but it
> should be straightforward to implement the interface also based on
> alternative implementations.

i see this as a stepping stone. tika already supports most of the heuristics rat uses, so IMHO it would make more sense to feed rules back upstream (either into the default typer or into a variant tuned for development).

a couple of issues suggest that this might be better than jumping to tika right away:

1. in terms of interface reuse, ATM tika trunk doesn't offer a minimal API [1]
2. the latest release (tika 0.2) is not modular

- robert

[1] IMHO breaking out a tika API module with minimal dependencies would encourage wider use of tika's basic abstractions
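A minimal Java sketch of the approach proposed in the quoted message, assuming a plain-Java setup: the rest of the code depends only on a small interface, the existing heuristics stay as the default implementation, and an alternative (for example Tika-backed) implementation could live in a separate module. The names `DocumentTyper` and `HeuristicTyper`, and the extension and NUL-byte rules, are hypothetical illustrations, not RAT's or Tika's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class TyperSketch {

    /** Hypothetical minimal interface that the rest of the code would depend on. */
    public interface DocumentTyper {
        /** Returns a coarse document category such as "binary" or "standard". */
        String detect(String name, InputStream input) throws IOException;
    }

    /**
     * Hypothetical stand-in for the existing built-in heuristics.
     * A separate module could provide another DocumentTyper that delegates to Tika.
     */
    public static class HeuristicTyper implements DocumentTyper {
        @Override
        public String detect(String name, InputStream input) throws IOException {
            // First heuristic: well-known binary file extensions (illustrative list only).
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".png") || lower.endsWith(".gif") || lower.endsWith(".jar")) {
                return "binary";
            }
            // Second heuristic: a NUL byte among the first 512 bytes suggests binary content.
            byte[] head = new byte[512];
            int n = input.readNBytes(head, 0, head.length);
            for (int i = 0; i < n; i++) {
                if (head[i] == 0) {
                    return "binary";
                }
            }
            return "standard";
        }
    }

    public static void main(String[] args) throws IOException {
        // Callers hold only the interface, so the implementation can be chosen at assembly time.
        DocumentTyper typer = new HeuristicTyper();
        System.out.println(typer.detect("logo.PNG", new ByteArrayInputStream(new byte[0]))); // prints "binary"
        System.out.println(typer.detect("notes.txt",
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)))); // prints "standard"
    }
}
```

Because callers only see `DocumentTyper`, an assembler who does not want Tika never needs it on the classpath; a separate module could supply an adapter that delegates to `org.apache.tika.detect.Detector`.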
