I have already tried instantiating the parser from a Tika config, but it
still makes calls to ExternalParsers.
My config and test code are:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <!-- Default Parser for everything, minus the external
             (ExifTool/FFmpeg) parser and the Tesseract OCR parser -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.external.CompositeExternalParser"/>
            <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
        </parser>
    </parsers>

</properties>
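
To rule out a typo in the config itself, the excludes can be listed with the JDK's XML parser alone. This is only a minimal sketch: the class name ConfigExcludeCheck and the method excludedParsers are made up for illustration, it does not touch Tika at all, and it only shows what the file declares, not what Tika actually loads. The XML is inlined here so the example runs as is; in practice one would pass a stream opened on tika-config.xml instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: lists the classes named in <parser-exclude> elements.
class ConfigExcludeCheck {

    /** Returns the "class" attribute of every parser-exclude element, in document order. */
    static List<String> excludedParsers(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        // Inline copy of the config above; swap in a file stream for a real check.
        String config =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
            + "<parser-exclude class=\"org.apache.tika.parser.external.CompositeExternalParser\"/>"
            + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
            + "</parser></parsers></properties>";
        InputStream in = new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8));
        // Print one excluded class name per line.
        for (String cls : excludedParsers(in)) {
            System.out.println(cls);
        }
    }
}
```

If the two class names print as expected, the file is well-formed and the excludes are spelled correctly, so the remaining question is whether that file is the one being loaded and how Tika applies it.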



import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

import java.io.FileInputStream;
import java.io.FileOutputStream;

public class Autodetect
{
    public static void main(String[] args) throws Exception
    {
        long start = System.currentTimeMillis();
        // Build the parser from the custom config so the parser-exclude entries apply
        TikaConfig config = new TikaConfig("tika-config.xml");
        AutoDetectParser parser = new AutoDetectParser(config);
        System.out.println("Time for init " + (System.currentTimeMillis() - start));

        // try-with-resources closes both streams even if parsing fails
        try (FileInputStream is = new FileInputStream("test.zip");
             FileOutputStream os = new FileOutputStream("out.txt")) {
            BodyContentHandler contentHandler = new BodyContentHandler(os);
            Metadata metadata = new Metadata();
            ParseContext parseContext = new ParseContext();
            // Register the parser so embedded documents (the zip entries) are parsed too
            parseContext.set(Parser.class, parser);
            parser.parse(is, contentHandler, metadata, parseContext);
        }

    }

}



On Fri, Aug 4, 2017 at 2:31 PM, Nick Burch <[email protected]> wrote:

> On Fri, 4 Aug 2017, aravinth thangasami wrote:
>
>> we are using Tika 1.13.
>>
>
> 1.15 is out!
>
>> While instantiating AutoDetectParser we found that the
>> CompositeExternalParser, which we don't actually need, takes up more time.
>> It is because of ExifTool & FFmpeg.
>>
>> I tried removing CompositeExternalParser from the jar and we are seeing
>> an improvement.
>>
>
> You should be able to exclude that from DefaultParser in config with a
> parser-exclude:
> http://tika.apache.org/1.16/configuring.html#Configuring_Parsers
>
> Then make sure you create your AutoDetectParser from the config with that
> exclude
>
> Nick
>
