Hi Florin,
You can create a custom config file:
<properties>
  <parsers>
    <!-- Load all available parsers -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- Override parsing of all types supported by CustomParser -->
    <parser class="com.digitalpebble.tika.pdf.TETPDFParser"/>
  </parsers>
</properties>
Then load it with:
    conf = new TikaConfig(customConf);
Your custom parser will then be used for PDF documents, assuming that it
declares the PDF media type with something like:
    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("pdf"));

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }
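To illustrate why the order of the <parser> entries matters, here is a toy model in plain Java. It is not Tika's code: ToyParser and buildLookup are hypothetical names, and the sketch simply assumes that when two configured parsers claim the same media type, the one listed later wins, which is what the "Override" comment in the config relies on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ParserOverrideSketch {

    /** Hypothetical stand-in for a parser: a name plus the media types it claims. */
    static final class ToyParser {
        final String name;
        final Set<String> supportedTypes;

        ToyParser(String name, String... types) {
            this.name = name;
            this.supportedTypes = new LinkedHashSet<>(Arrays.asList(types));
        }
    }

    /**
     * Builds a media type -> parser lookup by walking the parsers in
     * configuration order. A later parser replaces an earlier one for
     * every type they both claim (assumed behaviour, see lead-in).
     */
    static Map<String, ToyParser> buildLookup(List<ToyParser> parsers) {
        Map<String, ToyParser> byType = new HashMap<>();
        for (ToyParser parser : parsers) {
            for (String type : parser.supportedTypes) {
                byType.put(type, parser);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        // Same order as in the config: defaults first, custom parser second.
        ToyParser defaults = new ToyParser("DefaultParser", "application/pdf", "text/html");
        ToyParser custom = new ToyParser("TETPDFParser", "application/pdf");

        Map<String, ToyParser> lookup = buildLookup(Arrays.asList(defaults, custom));

        System.out.println(lookup.get("application/pdf").name); // prints TETPDFParser
        System.out.println(lookup.get("text/html").name);       // prints DefaultParser
    }
}
```

Under that assumption, only the types returned by the custom parser's getSupportedTypes are taken over; everything else still goes to the default parsers.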
HTH
Julien
On 13 July 2011 14:11, Florin P <[email protected]> wrote:
> Hello!
> We would like to replace the existing PDFParser with
> our custom one. Moreover, we would like our
> CustomPDFParser to be used for all PDF documents that we
> parse. How can we achieve this using the Java API? We
> are using Apache Tika 0.9.
>
> Thank you,
>
> Florin
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com