Thanks Julien,

I quickly figured out how to create a TikaConfig object configured the way I
wanted it.

It took me a while to figure out that the CompositeParser returned by
TikaConfig.getParser() doesn't automatically detect the MediaType. Instead,
I needed to create an AutoDetectParser, passing it the TikaConfig I wanted
to use. Once I figured that out, your example worked great!
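The distinction above can be illustrated with a small, self-contained toy model. It is not Tika's API: `ToyParser`, `ToyCompositeParser`, `ToyAutoDetectParser`, and the deliberately naive `detect` method are hypothetical names introduced only for illustration. The composite chooses a parser solely from a declared media type and otherwise falls back to one that extracts nothing. The auto-detecting wrapper detects the type from the bytes first and then delegates, roughly the way an AutoDetectParser wraps the parsers from a TikaConfig.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are NOT Tika's API. They mimic the difference
// between dispatching on a declared media type and detecting it first.
public class DetectThenDispatch {

    // Hypothetical stand-in for a parser: turns raw bytes into extracted text.
    interface ToyParser {
        String parse(byte[] content);
    }

    // Picks a parser from the media type the caller declares; with no declared
    // (or an unknown) type it falls back to a parser that extracts nothing.
    static class ToyCompositeParser {
        private final Map<String, ToyParser> parsers = new HashMap<>();
        private final ToyParser fallback = content -> "";

        void register(String mediaType, ToyParser parser) {
            parsers.put(mediaType, parser);
        }

        String parse(byte[] content, String declaredType) {
            ToyParser parser = declaredType == null
                    ? fallback
                    : parsers.getOrDefault(declaredType, fallback);
            return parser.parse(content);
        }
    }

    // Detects the media type from the bytes, then delegates to the composite.
    static class ToyAutoDetectParser {
        private final ToyCompositeParser delegate;

        ToyAutoDetectParser(ToyCompositeParser delegate) {
            this.delegate = delegate;
        }

        String parse(byte[] content) {
            return delegate.parse(content, detect(content));
        }
    }

    // Hypothetical, naive detector: a leading "From:" header is treated as email.
    static String detect(byte[] content) {
        String text = new String(content, StandardCharsets.US_ASCII);
        return text.startsWith("From:") ? "message/rfc822" : "text/plain";
    }

    static ToyCompositeParser buildComposite() {
        ToyCompositeParser composite = new ToyCompositeParser();
        composite.register("message/rfc822",
                content -> "email: " + new String(content, StandardCharsets.US_ASCII));
        composite.register("text/plain",
                content -> "text: " + new String(content, StandardCharsets.US_ASCII));
        return composite;
    }

    public static void main(String[] args) {
        byte[] mail = "From: paul".getBytes(StandardCharsets.US_ASCII);
        ToyCompositeParser composite = buildComposite();
        // No declared type: the composite alone cannot choose a parser.
        System.out.println("[" + composite.parse(mail, null) + "]"); // prints "[]"
        // The auto-detecting wrapper detects message/rfc822 before dispatching.
        System.out.println(new ToyAutoDetectParser(composite).parse(mail)); // prints "email: From: paul"
    }
}
```

In real code the equivalent step is constructing an AutoDetectParser from the TikaConfig, as described above; the toy only shows why detection has to happen before dispatch.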

Thanks for the help,
Paul

On Thu, May 5, 2011 at 2:00 PM, Julien Nioche <[email protected]> wrote:

> Hi Paul,
>
> You can do that with a custom XML config. See
> https://issues.apache.org/jira/browse/TIKA-527 for details.
> <properties>
>     <parsers>
>         <!-- Load all available parsers -->
>         <parser class="org.apache.tika.parser.DefaultParser"/>
>
>         <!-- Override parsing of all types supported by MyFunkyCustomParser -->
>         <parser class="com.sealsoftware.tika.mail.MyFunkyCustomParser"/>
>     </parsers>
> </properties>
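The override comment in the config can be modelled as an ordered table build, assuming (as that comment suggests) that the parser list is read in order and a parser listed later takes over the media types it supports. The sketch below is a hypothetical toy, not Tika code: `ParserOverrideSketch`, `ToyParser`, and `resolve` are illustrative names, and the media-type sets are examples only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only, not Tika's API: builds a media-type -> parser table from an
// ordered parser list, assuming later entries win for the types they support.
public class ParserOverrideSketch {

    // Hypothetical stand-in for a configured parser: a name plus the types it claims.
    static final class ToyParser {
        final String name;
        final Set<String> supportedTypes;

        ToyParser(String name, Set<String> supportedTypes) {
            this.name = name;
            this.supportedTypes = supportedTypes;
        }
    }

    // Walks the list in declaration order; a later parser replaces any earlier
    // mapping for each type it supports, so it overrides those types only.
    static Map<String, String> resolve(List<ToyParser> parsers) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (ToyParser parser : parsers) {
            for (String type : parser.supportedTypes) {
                byType.put(type, parser.name);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        // Mirrors the config: default parsers first, the custom email parser second.
        ToyParser defaults = new ToyParser("DefaultParser",
                new HashSet<>(Arrays.asList("message/rfc822", "text/plain", "application/pdf")));
        ToyParser custom = new ToyParser("MyFunkyCustomParser",
                new HashSet<>(Arrays.asList("message/rfc822")));
        Map<String, String> table = resolve(Arrays.asList(defaults, custom));
        System.out.println(table.get("message/rfc822")); // prints "MyFunkyCustomParser"
        System.out.println(table.get("text/plain"));     // prints "DefaultParser"
    }
}
```

Because only the types the custom parser declares are remapped, every other type keeps its default parser, which is what lets the config load DefaultParser first and then override just the email handling.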
> HTH
>
> Julien
>
>
> On 5 May 2011 18:57, Paul Jakubik <[email protected]> wrote:
>
>> Hi,
>>
>> The quick guide to adding parsers (
>> http://tika.apache.org/0.9/parser_guide.html) says that you should modify
>> the following files to make Tika aware of a new parser:
>>
>>    - tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
>>    - tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
>>
>> Is there a way to add and/or replace parsers without modifying the Tika
>> source?
>>
>> For instance, in one place where I use Tika I might want to replace the
>> standard Tika RFC822 parser with one that captures more email headers as
>> metadata. Can I change the parser used by AutoDetectParser by calling
>> getParsers()/setParsers() on the AutoDetectParser? Is there some other
>> preferred way to programmatically change the mapping from some MediaType to
>> a specific parser?
>>
>> Thanks,
>> Paul
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
