I'm sorry, but I'm not able to reproduce this. That is, the following works for me:
@Test
public void oneOff() throws Exception {
System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
TikaConfig config = new TikaConfig();
AutoDetectParser parser = new AutoDetectParser(config);
assertContains("quick brown fox", getXML("testOCR_spacing.png", parser).xml);
}
where myconfig.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser">
</parser>
<parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
<params>
<param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
<param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
</params>
</parser>
</parsers>
</properties>
Whatever you set in your tessConfig will _override_ the underlying settings
of the parser, all of them. So, if you aren't setting the path there,
then, yes, the paths from your tika-config won't have any effect.
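
To illustrate the override point, here is a toy model in plain Java. It is not Tika's code: the class and method names (OverrideSketch, effectiveSetting) are made up, and plain maps stand in for the parser's settings and for the tessConfig placed in the ParseContext. It only sketches the "replace everything, merge nothing" behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: OverrideSketch and effectiveSetting are hypothetical names,
// not part of Tika. Maps stand in for the real configuration objects.
public class OverrideSketch {

    // Returns the value a parse would see for the given key.
    // If a per-parse config is supplied, it replaces the parser's own
    // settings wholesale; nothing is merged in from the parser.
    static String effectiveSetting(Map<String, String> parserSettings,
                                   Map<String, String> contextConfig,
                                   String key) {
        Map<String, String> effective =
                (contextConfig != null) ? contextConfig : parserSettings;
        return effective.getOrDefault(key, "");
    }

    public static void main(String[] args) {
        // Stand-in for what was loaded from the tika-config XML.
        Map<String, String> fromXml = new HashMap<>();
        fromXml.put("tesseractPath", "C:\\Program Files\\Tesseract-OCR2");

        // No per-parse config: the value from the XML is used.
        System.out.println(effectiveSetting(fromXml, null, "tesseractPath"));

        // A per-parse config that never sets the path: the XML value is lost.
        Map<String, String> tessConfig = new HashMap<>();
        tessConfig.put("language", "eng");
        System.out.println("[" + effectiveSetting(fromXml, tessConfig, "tesseractPath") + "]");
    }
}
```

If the real parser behaves like this model, the options are to set the paths on the tessConfig you put in the ParseContext, or to leave the TesseractOCRConfig out of the ParseContext so the values from the tika-config are used.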
On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]>
wrote:
> Like this.
>
>
> TikaConfig tikaConfig = new TikaConfig();
>
> final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
>
> final ParseContext parseContext = new ParseContext();
>
> parseContext.set(AutoDetectParser.class, parser);
> parseContext.set(PDFParserConfig.class, pdfConfig);
> parseContext.set(TesseractOCRConfig.class, tessConfig);
>
> -----Original Message-----
> From: Tim Allison <[email protected]>
> Sent: Monday, February 8, 2021 5:31 PM
> To: [email protected]
> Subject: Re: Tika-config
>
> How are you using the TikaConfig?
>
> On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]>
> wrote:
> >
> > What is wrong with this?
> >
> > I specified the tika-config env variable. I know it works because if
> > I make a syntax error in the tika-config.xml, it complains. So it’s
> > finding the file. But it’s not applying the properties
> >
> >
> >
> > I have this tika-config. I tried forward slashes instead of the double
> > backslashes. Same result. No errors. It’s just not applying the values.
> >
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <properties>
> > <parsers>
> > <parser class="org.apache.tika.parser.DefaultParser">
> > </parser>
> >
> > <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> > <params>
> > <param name="tesseractPath" type="string">c:\\tesseract_config</param>
> > <param name="tessdataPath" type="string">c:\\tessdata_config</param>
> > </params>
> > </parser>
> > </parsers>
> > </properties>
> >
> >
>