I have already tried instantiating AutoDetectParser from a Tika config, but it
still calls the ExternalParsers.

My config (tika-config.xml) is:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default Parser for most things, but never use the
         CompositeExternalParser or the TesseractOCRParser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.external.CompositeExternalParser"/>
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>

My code is:

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

import java.io.FileInputStream;
import java.io.FileOutputStream;

public class Autodetect
{
    public static void main(String[] args) throws Exception
    {
        long start = System.currentTimeMillis();
        // Build the parser from the custom config that carries the parser-excludes
        TikaConfig config = new TikaConfig("tika-config.xml");
        AutoDetectParser parser = new AutoDetectParser(config);
        System.out.println("Time for init " + (System.currentTimeMillis() - start));

        // try-with-resources closes both streams even if parsing fails
        try (FileInputStream is = new FileInputStream("test.zip");
             FileOutputStream os = new FileOutputStream("out.txt"))
        {
            BodyContentHandler contentHandler = new BodyContentHandler(os);
            Metadata metadata = new Metadata();
            ParseContext parseContext = new ParseContext();
            // Register the same parser so documents embedded in the zip are parsed too
            parseContext.set(Parser.class, parser);
            parser.parse(is, contentHandler, metadata, parseContext);
        }
    }
}
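
To rule out a typo in the config file itself, here is a small sanity check I
could run. It is only a sketch: it uses the JDK's XML parser and no Tika
classes, the names ConfigExcludeCheck and excludedParsers are made up for
illustration, and it only shows what the XML contains, not whether Tika
honours the excludes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigExcludeCheck
{
    // Hypothetical helper: returns the class attribute of every
    // <parser-exclude> element found in the given XML text, in document order.
    public static List<String> excludedParsers(String xml) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++)
        {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception
    {
        // Same content as the tika-config.xml shown above, inlined so the
        // check runs without any files on disk.
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
            + "<parser-exclude class=\"org.apache.tika.parser.external.CompositeExternalParser\"/>"
            + "<parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>"
            + "</parser></parsers></properties>";
        System.out.println(excludedParsers(xml));
    }
}
```

With the config above this lists the two excluded class names, so the XML
itself looks fine and the problem is more likely in how the config gets
picked up.
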
On Fri, Aug 4, 2017 at 2:31 PM, Nick Burch <[email protected]> wrote:
> On Fri, 4 Aug 2017, aravinth thangasami wrote:
>
>> we are using Tika 1.13.
>>
>
> 1.15 is out!
>
>> While instantiating AutoDetectParser we found that the
>> CompositeExternalParser, which we don't actually need, takes up more time.
>> It is because of ExifTool & FFmpeg.
>>
>> I tried removing CompositeExternalParser from the jar and we are seeing
>> an improvement.
>>
>
> You should be able to exclude that from DefaultParser in config with a
> parser-exclude:
> http://tika.apache.org/1.16/configuring.html#Configuring_Parsers
>
> Then make sure you create your AutoDetectParser from the config with that
> exclude
>
> Nick
>