I regret that I'm not able to reproduce this. That is, this works for me:

@Test
public void oneOff() throws Exception {
    System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
    TikaConfig config = new TikaConfig();
    AutoDetectParser parser = new AutoDetectParser(config);
    assertContains("quick brown fox",
            getXML("testOCR_spacing.png", parser).xml);
}


where myconfig.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser">
        </parser>

        <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
            <params>
                <param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
                <param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
            </params>
        </parser>
    </parsers>
</properties>

Whatever you set in your tessConfig will _override_ the underlying settings
of the parser, all of them.  So if you aren't setting the paths there,
then yes, you won't see any effect from the values in your tika-config.
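
To make the all-or-nothing part concrete, here is a toy model in plain Java. It is not Tika code: OverrideDemo, OcrSettings and effective() are made-up names for illustration only. The one thing it assumes is what I described above, namely that a config object handed in by the caller is used in place of the parser's own settings rather than merged with them field by field.

```java
// Toy model only. OverrideDemo, OcrSettings and effective() are hypothetical
// names for illustration; none of them are Tika classes or methods.
public class OverrideDemo {

    /** Hypothetical stand-in for an OCR config object; null means "not set". */
    static final class OcrSettings {
        final String tesseractPath;
        final String tessdataPath;

        OcrSettings(String tesseractPath, String tessdataPath) {
            this.tesseractPath = tesseractPath;
            this.tessdataPath = tessdataPath;
        }
    }

    /**
     * All-or-nothing selection: if the caller supplied settings, they are used
     * as-is. The parser's own settings are never consulted field by field.
     */
    static OcrSettings effective(OcrSettings fromParser, OcrSettings fromContext) {
        return fromContext != null ? fromContext : fromParser;
    }

    public static void main(String[] args) {
        // Settings as they would come from the XML config file.
        OcrSettings fromXml = new OcrSettings(
                "C:\\Program Files\\Tesseract-OCR2",
                "C:\\Program Files\\Tesseract-OCR2\\tessdata");

        // Settings built in code with no paths set.
        OcrSettings fromCode = new OcrSettings(null, null);

        // The code-built object wins outright, so the XML paths are lost.
        System.out.println(effective(fromXml, fromCode).tesseractPath);

        // With nothing supplied by the caller, the XML paths are used.
        System.out.println(effective(fromXml, null).tesseractPath);
    }
}
```

If that is what is happening in your setup, the two ways out would be to set the paths on the tessConfig that you put in the ParseContext, or to not put a tessConfig in the ParseContext at all.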

On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]>
wrote:

> Like this.
>
>
>         TikaConfig tikaConfig = new TikaConfig();
>
>         final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
>
>         final ParseContext parseContext = new ParseContext();
>
>         parseContext.set(AutoDetectParser.class, parser);
>         parseContext.set(PDFParserConfig.class, pdfConfig);
>         parseContext.set(TesseractOCRConfig.class, tessConfig);
>
> -----Original Message-----
> From: Tim Allison <[email protected]>
> Sent: Monday, February 8, 2021 5:31 PM
> To: [email protected]
> Subject: Re: Tika-config
>
> How are you using the TikaConfig?
>
> On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]>
> wrote:
> >
> > What is wrong with this?
> >
> > I specified the tika-config env variable.  I know it works because if
> > I make a syntax error in the tika-config.xml, it complains.  So it’s
> > finding the file.  But it’s not applying the properties
> >
> >
> >
> > I have this tika-config.  I tried forward slashes instead of the double
> > backslashes.  Same result.  No errors.  It’s just not applying the values.
> >
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <properties>
> >     <parsers>
> >         <parser class="org.apache.tika.parser.DefaultParser">
> >         </parser>
> >
> >         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> >             <params>
> >                 <param name="tesseractPath" type="string">c:\\tesseract_config</param>
> >                 <param name="tessdataPath" type="string">c:\\tessdata_config</param>
> >             </params>
> >         </parser>
> >     </parsers>
> > </properties>
> >
> >
>
