Well, both, I guess. When I send the result back to the user, I want to be able to also capture all the options that were used. I can get them from what was set, but I’d rather get them from the ‘source’. Although, if there is no practical way to do it, once I’m convinced that it all ‘works’, I’ll be OK with just echoing back what was set.

From: Tim Allison <[email protected]>
Sent: Friday, February 12, 2021 11:40 AM
To: [email protected]
Subject: Re: {EXTERNAL}New config paradigm

Is the goal to do this on an ongoing/programmatic basis, or do you just want debugging info during development?

On Fri, Feb 12, 2021 at 9:54 AM Peter Kronenberg <[email protected]> wrote:

Still trying to understand how I get the settings that have been set on a parseContext. In other words, let’s say that I just have a parseContext. I have no idea what configs have been added to it. Is there a way to extract the parsers or the configs from the parseContext and view the settings? I can use the settings that I *think* I passed into it, but I would rather get the settings from the parseContext itself, to ensure that they are what I think they are.

From: Peter Kronenberg <[email protected]>
Sent: Wednesday, February 10, 2021 10:12 AM
To: [email protected]
Subject: {EXTERNAL}New config paradigm

Ok, I’m gonna have questions 😊

In this code, I assume that this extracts the settings that are in the tika-config. And we have to extract one parser at a time, right?
    try (InputStream is = TikaOCRParser.class.getResourceAsStream("/tika-config.xml")) {
        tikaConfig = new TikaConfig(is);
    }
    Parser pdfParser = findParser(tikaConfig.getParser(), org.apache.tika.parser.pdf.PDFParser.class);
    PDFParserConfig pdfParserConfig = ((PDFParser) pdfParser).getPDFParserConfig();
    System.out.println("OCR Strategy: " + pdfParserConfig.getOcrStrategy());

If I then proceed to do this

    final PDFParserConfig pdfConfig = new PDFParserConfig();
    pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.AUTO);
    final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
    final ParseContext parseContext = new ParseContext();
    parseContext.set(AutoDetectParser.class, parser);
    parseContext.set(PDFParserConfig.class, pdfConfig);

How do I now get the values that are being used in the composite parseContext? I want to confirm that the values are as expected.
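
For what it's worth, the read-back being asked about can be sketched without Tika on the classpath. The classes below (TypedContext, PdfSettings, effectiveOcrStrategy) are hypothetical stand-ins, not Tika's API: TypedContext only imitates a context keyed by class, and the "fall back to the config-file default when nothing was set" rule is an assumption about how a parser might resolve its settings, not a statement of what Tika does internally. With the real library, the analogous call would presumably be parseContext.get(PDFParserConfig.class), the counterpart of the set(...) calls above.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextEcho {

    /** Hypothetical stand-in for a class-keyed context (NOT Tika's ParseContext). */
    public static final class TypedContext {
        private final Map<String, Object> entries = new HashMap<>();

        /** Stores a value under its class; a null value removes the entry. */
        public <T> void set(Class<T> key, T value) {
            if (value != null) {
                entries.put(key.getName(), value);
            } else {
                entries.remove(key.getName());
            }
        }

        /** Returns the stored value, or null if nothing was set for this class. */
        public <T> T get(Class<T> key) {
            return key.cast(entries.get(key.getName()));
        }

        /** Returns the stored value, or the supplied default if nothing was set. */
        public <T> T get(Class<T> key, T defaultValue) {
            T value = get(key);
            return value != null ? value : defaultValue;
        }
    }

    /** Hypothetical stand-in for a per-parser config object. */
    public static final class PdfSettings {
        private final String ocrStrategy;

        public PdfSettings(String ocrStrategy) {
            this.ocrStrategy = ocrStrategy;
        }

        public String getOcrStrategy() {
            return ocrStrategy;
        }
    }

    /** Assumed resolution rule: a per-request override wins, else the default applies. */
    public static String effectiveOcrStrategy(TypedContext context, PdfSettings defaults) {
        return context.get(PdfSettings.class, defaults).getOcrStrategy();
    }

    public static void main(String[] args) {
        PdfSettings fromConfigFile = new PdfSettings("NO_OCR");
        TypedContext context = new TypedContext();

        // Nothing set on the context yet, so the default is reported.
        System.out.println("OCR Strategy: " + effectiveOcrStrategy(context, fromConfigFile));

        // After an override is set, the same lookup reports the override.
        context.set(PdfSettings.class, new PdfSettings("AUTO"));
        System.out.println("OCR Strategy: " + effectiveOcrStrategy(context, fromConfigFile));
    }
}
```

The point of the sketch is that the object read back from the context is the very instance that was set, so echoing it to the user reports the override rather than a separately remembered copy. Whether that override is then merged with tika-config.xml defaults at parse time is a separate question that this stand-in does not answer.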
