New config paradigm

Peter Kronenberg Wed, 10 Feb 2021 07:12:32 -0800

Ok, I’m gonna have questions 😊

I assume the code below extracts the settings defined in tika-config.xml,
and that we have to extract them one parser at a time. Is that right?


try (InputStream is = TikaOCRParser.class.getResourceAsStream("/tika-config.xml")) {
    tikaConfig = new TikaConfig(is);
}
Parser pdfParser = findParser(tikaConfig.getParser(), org.apache.tika.parser.pdf.PDFParser.class);
PDFParserConfig pdfParserConfig = ((PDFParser) pdfParser).getPDFParserConfig();
System.out.println("OCR Strategy: " + pdfParserConfig.getOcrStrategy());

If I then proceed to do this:


final PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.AUTO);


final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
final ParseContext parseContext = new ParseContext();

parseContext.set(AutoDetectParser.class, parser);
parseContext.set(PDFParserConfig.class, pdfConfig);


How do I now get the values that are actually in effect for the composite
parseContext? I want to confirm that they are as expected.
