Ok, I'm gonna have questions.
I assume the code below extracts the settings that are in the
tika-config, and that we have to extract one parser at a time, right?
try (InputStream is = TikaOCRParser.class.getResourceAsStream("/tika-config.xml")) {
    tikaConfig = new TikaConfig(is);
}
Parser pdfParser = findParser(tikaConfig.getParser(),
        org.apache.tika.parser.pdf.PDFParser.class);
PDFParserConfig pdfParserConfig = ((PDFParser) pdfParser).getPDFParserConfig();
System.out.println("OCR Strategy: " + pdfParserConfig.getOcrStrategy());
If I then proceed to do this:
final PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.AUTO);
final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
final ParseContext parseContext = new ParseContext();
parseContext.set(AutoDetectParser.class, parser);
parseContext.set(PDFParserConfig.class, pdfConfig);
How do I now get the values that are being used in the composite parseContext?
I want to confirm that the values are as expected.