Hello!

Thank you. It worked.

Regards,
Florin

--- On Wed, 7/13/11, Julien Nioche <[email protected]> wrote:
From: Julien Nioche <[email protected]>
Subject: Re: Changing existing PDFParser
To: [email protected]
Date: Wednesday, July 13, 2011, 9:36 AM

Hi Florin

You can create a custom config:

    <properties>
      <parsers>
        <!-- Load all available parsers -->
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <!-- Override parsing of all types supported by CustomParser -->
        <parser class="com.digitalpebble.tika.pdf.TETPDFParser"/>
      </parsers>
    </properties>

then call

    conf = new TikaConfig(customConf);

Your custom parser will then be used for PDF documents, assuming that your parser has something like:

    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("pdf"));

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

HTH

Julien

On 13 July 2011 14:11, Florin P <[email protected]> wrote:

Hello!

We would like to replace the existing PDFParser with our custom one. Moreover, we would like our CustomPDFParser to be used for all PDF documents that we are parsing. How can we achieve this using the Java API? We are using Apache Tika 0.9.

Thank you,
Florin

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
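
For anyone reading this thread later: the config above appears to rely on ordering, with the parser listed last taking over the media types it declares. The following is only a toy model of that idea in plain Java, not Tika code. ParserOverrideSketch, ParserEntry and buildRegistry are hypothetical names made up for illustration, and the "last entry wins" rule is an assumption about how the composite parser resolves types, so please check it against the Tika version you use.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not part of Tika.
class ParserOverrideSketch {

    // Stand-in for a parser: a name plus the media types it claims to support.
    static final class ParserEntry {
        final String name;
        final List<String> supportedTypes;

        ParserEntry(String name, String... supportedTypes) {
            this.name = name;
            this.supportedTypes = Arrays.asList(supportedTypes);
        }
    }

    // Walk the parsers in config order. A later entry replaces an earlier one
    // for every media type it declares (assumed "last entry wins" rule).
    static Map<String, String> buildRegistry(List<ParserEntry> parsersInConfigOrder) {
        Map<String, String> registry = new LinkedHashMap<String, String>();
        for (ParserEntry parser : parsersInConfigOrder) {
            for (String type : parser.supportedTypes) {
                registry.put(type, parser.name);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        // Same order as in the config: the default parser first, the custom one last.
        List<ParserEntry> parsers = Arrays.asList(
                new ParserEntry("DefaultParser", "application/pdf", "text/html"),
                new ParserEntry("TETPDFParser", "application/pdf"));

        Map<String, String> registry = buildRegistry(parsers);
        System.out.println(registry.get("application/pdf")); // prints "TETPDFParser"
        System.out.println(registry.get("text/html"));       // prints "DefaultParser"
    }
}
```

In this model the custom parser only takes over the types it returns from its supported-types set, which is why the getSupportedTypes method quoted above matters: every other type stays with the default parser.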
