Well, both, I guess.  When I send the result back to the user, I want to be able 
to also capture all the options that were used.  I can get them from what was 
set, but I’d rather get them from the ‘source’.  Although, if there is no 
practical way to do it, once I’m convinced that it all ‘works’, I’ll be OK with 
just echoing back what was set.

From: Tim Allison <[email protected]>
Sent: Friday, February 12, 2021 11:40 AM
To: [email protected]
Subject: Re: {EXTERNAL}New config paradigm

Is the goal to do this on an ongoing/programmatic basis, or do you just want 
debugging info during development?

On Fri, Feb 12, 2021 at 9:54 AM Peter Kronenberg <[email protected]> wrote:
Still trying to understand how I get the settings that have been set on a 
parseContext.  In other words, let’s say that I just have a parseContext. I 
have no idea what configs have been added to it.  Is there a way to extract the 
parsers or the configs from the parseContext and view the settings?
I can use the settings that I *think* I passed into it, but I would rather get 
the settings from the parseContext itself, to ensure that they are what I think 
they are.


From: Peter Kronenberg <[email protected]>
Sent: Wednesday, February 10, 2021 10:12 AM
To: [email protected]
Subject: {EXTERNAL}New config paradigm

Ok, I’m gonna have questions 😊

In this code, I assume that this extracts the settings that are in the 
tika-config.  And we have to extract one parser at a time, right?

try (InputStream is = TikaOCRParser.class.getResourceAsStream("/tika-config.xml")) {
    tikaConfig = new TikaConfig(is);
}
Parser pdfParser = findParser(tikaConfig.getParser(), org.apache.tika.parser.pdf.PDFParser.class);
PDFParserConfig pdfParserConfig = ((PDFParser) pdfParser).getPDFParserConfig();
System.out.println("OCR Strategy: " + pdfParserConfig.getOcrStrategy());

If I then proceed to do this:

final PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.AUTO);

final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
final ParseContext parseContext = new ParseContext();

parseContext.set(AutoDetectParser.class, parser);
parseContext.set(PDFParserConfig.class, pdfConfig);

How do I now get the values that are being used in the composite parseContext?  
I want to confirm that the values are as expected.
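
For illustration only, here is a minimal sketch of reading a config object back out of a type-keyed context, which is the shape of lookup the question is after. TypedContext, PdfSettings, and SettingsEcho are hypothetical stand-ins written so the example runs without Tika on the classpath; they are not Tika classes. With a real ParseContext, the equivalent lookup should be parseContext.get(PDFParserConfig.class), which returns null when nothing was set for that class. Note that this only shows what was explicitly put into the context, not defaults that a parser may fall back to.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a type-keyed context such as Tika's ParseContext.
final class TypedContext {
    private final Map<Class<?>, Object> entries = new LinkedHashMap<>();

    // Store a value under its class; a null value removes the entry.
    <T> void set(Class<T> key, T value) {
        if (value == null) {
            entries.remove(key);
        } else {
            entries.put(key, value);
        }
    }

    // Return the value stored for the class, or null if none was set.
    <T> T get(Class<T> key) {
        return key.cast(entries.get(key));
    }
}

// Hypothetical stand-in for a parser config object such as PDFParserConfig.
final class PdfSettings {
    enum OcrStrategy { NO_OCR, AUTO }

    // Arbitrary default chosen for this sketch only.
    private OcrStrategy ocrStrategy = OcrStrategy.NO_OCR;

    void setOcrStrategy(OcrStrategy strategy) {
        this.ocrStrategy = strategy;
    }

    OcrStrategy getOcrStrategy() {
        return ocrStrategy;
    }
}

final class SettingsEcho {
    // Report the setting as found in the context itself, not as the caller remembers it.
    static String describe(TypedContext context) {
        PdfSettings settings = context.get(PdfSettings.class);
        if (settings == null) {
            return "PDF config: not set in context";
        }
        return "OCR Strategy: " + settings.getOcrStrategy();
    }

    public static void main(String[] args) {
        TypedContext context = new TypedContext();
        System.out.println(describe(context));

        PdfSettings settings = new PdfSettings();
        settings.setOcrStrategy(PdfSettings.OcrStrategy.AUTO);
        context.set(PdfSettings.class, settings);
        System.out.println(describe(context));
    }
}
```

The null check matters because a context that was never given a config for that class has nothing to echo back; in that case the values in effect would have to come from the parser's own config, as in the first snippet of the message.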
