OK, your last point might be the issue. If I don't set it in 
tesseractOCRConfig, then setting it in tika-config has no effect? I'm not sure I 
understand the thinking or logic behind this.


________________________________
From: Tim Allison <[email protected]>
Sent: Monday, February 8, 2021 8:47:07 PM
To: Peter Kronenberg <[email protected]>; [email protected] 
<[email protected]>
Subject: Re: Tika-config

I regret that I'm not able to reproduce this...that is, this works for me:


@Test
public void oneOff() throws Exception {
    System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
    TikaConfig config = new TikaConfig();
    AutoDetectParser parser = new AutoDetectParser(config);
    assertContains("quick brown fox",
            getXML("testOCR_spacing.png", parser).xml);
}

where myconfig.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser">
        </parser>

        <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
            <params>
                <param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
                <param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
            </params>
        </parser>
    </parsers>
</properties>

Whatever you set in your tessConfig will _override_ the underlying settings of 
the parser...all of them.  So, if you aren't setting the path there, then, yes, 
you won't see any effect from the values in tika-config.
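
To make the override behaviour concrete, here is a minimal sketch in plain Java. It is a toy model, not Tika's code: OverrideSketch, OcrConfig and effective() are hypothetical names made up for illustration, and it only assumes what is described above, namely that a config object passed in the ParseContext replaces the parser's XML-configured settings wholesale rather than being merged with them.

```java
public class OverrideSketch {

    // Hypothetical stand-in for an OCR config object; NOT Tika's TesseractOCRConfig.
    static final class OcrConfig {
        final String tesseractPath;
        final String language;

        OcrConfig(String tesseractPath, String language) {
            this.tesseractPath = tesseractPath;
            this.language = language;
        }
    }

    // Models the behaviour described above: if a config was supplied in the
    // context, it is used as-is; the XML-derived config is only a fallback.
    // There is no field-by-field merge.
    static OcrConfig effective(OcrConfig fromXml, OcrConfig fromContext) {
        return fromContext != null ? fromContext : fromXml;
    }

    public static void main(String[] args) {
        // Settings as they would come from the tika-config file.
        OcrConfig fromXml = new OcrConfig("C:\\Program Files\\Tesseract-OCR2", "eng");

        // A config built in code where the path was never set (left empty).
        OcrConfig fromContext = new OcrConfig("", "fra");

        // Context config present: the XML path is ignored entirely.
        System.out.println("[" + effective(fromXml, fromContext).tesseractPath + "]");

        // No context config: the XML path is used.
        System.out.println("[" + effective(fromXml, null).tesseractPath + "]");
    }
}
```

Under that assumption there are two ways out: set the paths on the tessConfig you put in the ParseContext, or don't put a tessConfig in the ParseContext at all and let the tika-config values apply.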

On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]> wrote:
Like this.


        TikaConfig tikaConfig = new TikaConfig();

        final AutoDetectParser parser = new AutoDetectParser(tikaConfig);

        final ParseContext parseContext = new ParseContext();

        parseContext.set(AutoDetectParser.class, parser);
        parseContext.set(PDFParserConfig.class, pdfConfig);
        parseContext.set(TesseractOCRConfig.class, tessConfig);

-----Original Message-----
From: Tim Allison <[email protected]>
Sent: Monday, February 8, 2021 5:31 PM
To: [email protected]
Subject: Re: Tika-config

How are you using the TikaConfig?

On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]> wrote:
>
> What is wrong with this?
>
> I specified the tika-config env variable.  I know it works because if
> I make a syntax error in the tika-config.xml, it complains.  So it’s
> finding the file.  But it’s not applying the properties.
>
>
>
> I have this tika-config.  I tried forward slashes instead of the double 
> backslashes.  Same result.  No errors.  It’s just not applying the values.
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>     <parsers>
>         <parser class="org.apache.tika.parser.DefaultParser">
>         </parser>
>
>         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
>             <params>
>                 <param name="tesseractPath" type="string">c:\\tesseract_config</param>
>                 <param name="tessdataPath" type="string">c:\\tessdata_config</param>
>             </params>
>         </parser>
>     </parsers>
> </properties>
>
>
