Give or take, yes.  The problem is that findParser is reliable enough
for unit tests but not for production.  If there's a CompoundParser
class that the search code doesn't know how to unwrap, the search can
miss the underlying parser, e.g. it might not be able to find the
underlying TesseractOCRParser.  I'd strongly encourage the first
option over the second.
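
To make the failure mode concrete, here is a toy model in plain Java.
It is only a sketch: every type in it (OcrParser, OcrConfig,
KnownComposite, UnknownWrapper, Context) is a hypothetical stand-in,
not a Tika class, and its findParser is an assumed shape for that kind
of helper, not the actual one.  It shows a recursive search that stops
at a wrapper it doesn't understand, next to a per-parse context lookup
that doesn't care how the parsers are nested.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Tika classes.
class ContextVsSearchSketch {

    interface Parser {}

    static class OcrConfig {
        boolean imageProcessing = false;
    }

    // Stand-in for an OCR parser that carries a mutable default config.
    static class OcrParser implements Parser {
        final OcrConfig defaultConfig = new OcrConfig();
    }

    // A composite that the search below knows how to descend into.
    static class KnownComposite implements Parser {
        final List<Parser> children;
        KnownComposite(Parser... children) { this.children = List.of(children); }
    }

    // A wrapper that the search below was never taught about.
    static class UnknownWrapper implements Parser {
        final Parser wrapped;
        UnknownWrapper(Parser wrapped) { this.wrapped = wrapped; }
    }

    // Recursive search that only descends into KnownComposite.
    static <T extends Parser> T findParser(Parser root, Class<T> type) {
        if (type.isInstance(root)) {
            return type.cast(root);
        }
        if (root instanceof KnownComposite) {
            for (Parser child : ((KnownComposite) root).children) {
                T hit = findParser(child, type);
                if (hit != null) {
                    return hit;
                }
            }
        }
        // An UnknownWrapper ends up here: its wrapped parser is never visited.
        return null;
    }

    // Stand-in for a per-parse context keyed by class.
    static class Context {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <T> void set(Class<T> key, T value) { values.put(key, value); }

        <T> T get(Class<T> key, T fallback) {
            Object v = values.get(key);
            return v == null ? fallback : key.cast(v);
        }
    }

    // The OCR step prefers a config from the context, else its own default.
    static boolean effectiveImageProcessing(OcrParser ocr, Context ctx) {
        return ctx.get(OcrConfig.class, ocr.defaultConfig).imageProcessing;
    }
}
```

In this model, the search finds the OcrParser when it sits directly
inside a KnownComposite and returns null once an UnknownWrapper is in
between.  A config placed in the Context is picked up either way,
because the lookup never walks the parser tree.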

On Wed, Feb 10, 2021 at 1:22 PM Peter Kronenberg wrote:
>
> What is the difference between these two examples?
>
> In the first one, I construct a new TesseractOCRConfig, make changes, and
> add it to the parseContext.
>
> In the second one, I get the TesseractOCRConfig from the TesseractOCRParser
> in the TikaConfig and make changes.  I don’t add it to the parseContext since
> it doesn’t seem to be necessary.
>
> Are these two things equivalent?
>
> public static String parse(String file) throws TikaException, SAXException, IOException {
>     AutoDetectParser parser = new AutoDetectParser(new TikaConfig());
>     ParseContext parseContext = new ParseContext();
>
>     TesseractOCRConfig tessConfig = new TesseractOCRConfig();
>     parseContext.set(AutoDetectParser.class, parser);
>     parseContext.set(TesseractOCRConfig.class, tessConfig);
>
>     tessConfig.setEnableImageProcessing(true);
> }
>
> public static String parse(String file) throws TikaException, SAXException, IOException {
>     // Keep a reference to the TikaConfig so the lookup below can use it.
>     TikaConfig tikaConfig = new TikaConfig();
>     AutoDetectParser parser = new AutoDetectParser(tikaConfig);
>
>     Parser tesseractOcrParser = findParser(tikaConfig.getParser(),
>             org.apache.tika.parser.ocr.TesseractOCRParser.class);
>     TesseractOCRConfig tessConfig =
>             ((TesseractOCRParser) tesseractOcrParser).getDefaultConfig();
>
>     //parseContext.set(AutoDetectParser.class, parser);
>     //parseContext.set(TesseractOCRConfig.class, tessConfig);
>
>     tessConfig.setEnableImageProcessing(true);
> }
>
>