i'd like to switch to tika for mime type detection in rat. the number of dependencies pulled in by org.apache.tika.parser worries me a little. i think it should be possible just to exclude them using maven (and i'll probably begin by doing that), but the detection stuff is cool and would be more generally useful without the parser dependencies.
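for reference, a rough sketch of the exclusion approach in a pom. everything specific here is an assumption for illustration: the tika coordinates may not match the release you use, the version is left as a property, and the excluded artifact is a placeholder rather than a confirmed tika dependency. running `mvn dependency:tree` should show what would actually need excluding.

```xml
<dependency>
  <!-- assumed coordinates; check them against the tika release in use -->
  <groupId>org.apache.tika</groupId>
  <artifactId>tika</artifactId>
  <version>${tika.version}</version>
  <exclusions>
    <!-- placeholder entry: add one exclusion per parser library
         that detection does not need -->
    <exclusion>
      <groupId>example.parser.group</groupId>
      <artifactId>example-parser-lib</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

this only holds up if the detection classes never load anything from the excluded libraries at runtime, which is part of why a separate detection module would be cleaner.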
what's the consensus about modularisation? BTW i could probably write something up on detection if that'd be useful. (these days, i find confluence has a much quicker documentation cycle than maven, so i wondered whether there are any plans to move tika's main documentation to confluence.)

- robert