Thanks Julien, I quickly figured out how to create a TikaConfig object configured the way I wanted it.
It took me a while to figure out that the CompositeParser returned by TikaConfig.getParser() doesn't automatically detect the MediaType, and instead I needed to create an AutoDetectParser, passing it the TikaConfig I wanted to use. Once I figured that out, your example works great!

Thanks for the help,
Paul

On Thu, May 5, 2011 at 2:00 PM, Julien Nioche <[email protected]> wrote:

> Hi Paul,
>
> You can do that with a custom XML config. See
> https://issues.apache.org/jira/browse/TIKA-527 for details.
>
> <properties>
>   <parsers>
>     <!-- Load all available parsers -->
>     <parser class="org.apache.tika.parser.DefaultParser"/>
>
>     <!-- Override parsing of all types supported by CustomParser -->
>     <parser class="com.sealsoftware.tika.mail.MyFunkyCustomParser"/>
>   </parsers>
> </properties>
>
> HTH
>
> Julien
>
>
> On 5 May 2011 18:57, Paul Jakubik <[email protected]> wrote:
>
>> Hi,
>>
>> The quick guide to adding parsers (
>> http://tika.apache.org/0.9/parser_guide.html) says that you should modify
>> the following to make tika aware of a new parser:
>>
>>    - tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
>>    - tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
>>
>> Is there a way to add and/or replace parsers without modifying Tika
>> source?
>>
>> For instance, in one place where I use Tika I might want to replace the
>> standard Tika RFC822 parser with one that captures more email headers as
>> metadata. Can I change the parser used by AutoDetectParser by calling
>> getParsers()/setParsers() on the AutoDetectParser? Is there some other
>> preferred way to programmatically change the mapping from some MediaType to
>> a specific parser?
>>
>> Thanks,
>> Paul
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
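
For anyone who finds this thread later, here is a minimal sketch of the approach described at the top of this message: load the custom XML config into a TikaConfig, then wrap it in an AutoDetectParser so the MediaType is detected before a parser is chosen. It assumes tika-core and tika-parsers are on the classpath and that your Tika version accepts the <parsers> config syntax from TIKA-527. The class name CustomTikaExample, its helper methods, and the two command-line file paths are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class; not part of Tika.
class CustomTikaExample {

    // Build a type-detecting parser from a custom XML config.
    // Handing the TikaConfig to AutoDetectParser (rather than using
    // config.getParser() directly) is what adds MediaType detection.
    static AutoDetectParser parserFromConfig(InputStream configXml) throws Exception {
        TikaConfig config = new TikaConfig(configXml);
        return new AutoDetectParser(config);
    }

    // Parse a document and return its body text; the parser fills in metadata.
    static String extractText(Parser parser, InputStream document, Metadata metadata)
            throws Exception {
        BodyContentHandler handler = new BodyContentHandler();
        parser.parse(document, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: CustomTikaExample <tika-config.xml> <document>");
            return;
        }
        try (InputStream configXml = new FileInputStream(args[0]);
             InputStream document = new FileInputStream(args[1])) {
            Parser parser = parserFromConfig(configXml);
            Metadata metadata = new Metadata();
            String text = extractText(parser, document, metadata);

            // Print whatever metadata the selected parser captured.
            for (String name : metadata.names()) {
                System.out.println(name + ": " + metadata.get(name));
            }
            System.out.println(text);
        }
    }
}
```

With a config like the one Julien posted, a custom parser listed after DefaultParser should take over the types it declares support for, so an RFC822 message passed as the second argument would go to the custom mail parser instead of the stock one.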
