Re: Changing existing PDFParser

Florin P Thu, 14 Jul 2011 01:17:39 -0700

Hello!

Thank you. It worked.

Regards,
Florin

--- On Wed, 7/13/11, Julien Nioche <[email protected]> wrote:


From: Julien Nioche <[email protected]>
Subject: Re: Changing existing PDFParser
To: [email protected]
Date: Wednesday, July 13, 2011, 9:36 AM

Hi Florin,

You can create a custom config file:

<properties>
    <parsers>
        <!-- Load all available parsers -->
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <!-- Override parsing of all types supported by TETPDFParser -->
        <parser class="com.digitalpebble.tika.pdf.TETPDFParser"/>
    </parsers>
</properties>
and then load it with:

        // customConf is the XML file above, e.g. a java.io.File
        TikaConfig conf = new TikaConfig(customConf);
Pass conf to the parser you use (e.g. new AutoDetectParser(conf)) and your
custom parser will then be used for PDF documents, assuming that your parser
declares something like:


        private static final Set<MediaType> SUPPORTED_TYPES =
                Collections.singleton(MediaType.application("pdf"));

        public Set<MediaType> getSupportedTypes(ParseContext context) {
            return SUPPORTED_TYPES;
        }
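Why the second entry wins: as the comment in the config says, a parser listed later overrides the earlier ones for the media types it supports. Below is a toy model of that rule in plain Java, with no Tika on the classpath. FakeParser and resolve() are hypothetical names for illustration, not Tika classes, and the sketch assumes the composite parser keeps one parser per media type and lets the last registration win.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ParserOverrideModel {

    /** Hypothetical stand-in for a parser: a name plus the media types it claims. */
    static final class FakeParser {
        final String name;
        final Set<String> supportedTypes;

        FakeParser(String name, String... types) {
            this.name = name;
            this.supportedTypes = new HashSet<>(Arrays.asList(types));
        }
    }

    /**
     * Builds a media-type -> parser map, visiting parsers in config order.
     * A parser listed later replaces any earlier parser for the same type.
     */
    static Map<String, FakeParser> resolve(List<FakeParser> parsersInConfigOrder) {
        Map<String, FakeParser> byType = new LinkedHashMap<>();
        for (FakeParser parser : parsersInConfigOrder) {
            for (String type : parser.supportedTypes) {
                byType.put(type, parser); // later entries overwrite earlier ones
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        // Mirrors the config: the defaults first, then the custom PDF parser.
        FakeParser defaults = new FakeParser("DefaultParser",
                "application/pdf", "text/html", "text/plain");
        FakeParser custom = new FakeParser("TETPDFParser", "application/pdf");

        Map<String, FakeParser> byType = resolve(Arrays.asList(defaults, custom));
        System.out.println("application/pdf -> " + byType.get("application/pdf").name);
        System.out.println("text/html -> " + byType.get("text/html").name);
    }
}
```

Running it prints "application/pdf -> TETPDFParser" and "text/html -> DefaultParser": only the PDF mapping changes, every other type keeps its default parser.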
HTH
Julien


On 13 July 2011 14:11, Florin P <[email protected]> wrote:

Hello!

We would like to replace the existing PDFParser with our custom one.
Moreover, we would like our CustomPDFParser to be used for all PDF documents
that we are parsing. How can we achieve this using the Java API? We are
using Apache Tika 0.9.

Thank you,

Florin

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
