Sorry, I meant an "internal tessconfig".
On Mon, Feb 8, 2021 at 9:23 PM Tim Allison <[email protected]> wrote:
>
> Let's say you have an internal tessconfig file in the parser that
> you've configured through a tikaconfig. When, at runtime, you send in
> a new tessconfig via the parsecontext, how can we tell which
> parameters you want to change from the new tessconfig?
>
> Yes, I realize that it would be possible to keep track of what
> parameters have been changed in the runtime config and then do
> something smart, but this hasn't been an issue to date.
>
> On Mon, Feb 8, 2021 at 9:15 PM Peter Kronenberg
> <[email protected]> wrote:
> >
> > Ok, your last point might be the issue. If I don't set it in
> > tesseractOCRConfig, then setting it in tika-config has no effect? I'm not
> > sure I understand the thinking or logic behind this.
> >
> >
> > ________________________________
> > From: Tim Allison <[email protected]>
> > Sent: Monday, February 8, 2021 8:47:07 PM
> > To: Peter Kronenberg <[email protected]>; [email protected]
> > <[email protected]>
> > Subject: Re: Tika-config
> >
> > I regret that I'm not able to reproduce this...that is, this works for me:
> >
> > @Test
> > public void oneOff() throws Exception {
> >     System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
> >     TikaConfig config = new TikaConfig();
> >     AutoDetectParser parser = new AutoDetectParser(config);
> >     assertContains("quick brown fox",
> >             getXML("testOCR_spacing.png", parser).xml);
> > }
> >
> >
> > where myconfig.xml is:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <properties>
> >   <parsers>
> >     <parser class="org.apache.tika.parser.DefaultParser">
> >     </parser>
> >
> >     <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> >       <params>
> >         <param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
> >         <param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
> >       </params>
> >     </parser>
> >   </parsers>
> > </properties>
> >
> > Whatever you set in your tessConfig will _override_ the underlying settings
> > of the parser...all of them. So, if you aren't setting the path there,
> > then, yes, you won't see any effect.
> >
> > On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]>
> > wrote:
> >
> > Like this.
> >
> > TikaConfig tikaConfig = new TikaConfig();
> >
> > final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
> >
> > final ParseContext parseContext = new ParseContext();
> >
> > parseContext.set(AutoDetectParser.class, parser);
> > parseContext.set(PDFParserConfig.class, pdfConfig);
> > parseContext.set(TesseractOCRConfig.class, tessConfig);
> >
> > -----Original Message-----
> > From: Tim Allison <[email protected]>
> > Sent: Monday, February 8, 2021 5:31 PM
> > To: [email protected]
> > Subject: Re: Tika-config
> >
> > How are you using the TikaConfig?
> >
> > On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]>
> > wrote:
> > >
> > > What is wrong with this?
> > >
> > > I specified the tika-config env variable. I know it works because if
> > > I make a syntax error in the tika-config.xml, it complains. So it’s
> > > finding the file. But it’s not applying the properties.
> > >
> > > I have this tika-config. I tried forward slashes instead of the double
> > > backslashes. Same result. No errors. It’s just not applying the values.
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <properties>
> > >   <parsers>
> > >     <parser class="org.apache.tika.parser.DefaultParser">
> > >     </parser>
> > >
> > >     <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> > >       <params>
> > >         <param name="tesseractPath" type="string">c:\\tesseract_config</param>
> > >         <param name="tessdataPath" type="string">c:\\tessdata_config</param>
> > >       </params>
> > >     </parser>
> > >   </parsers>
> > > </properties>
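
For anyone reading this thread later, here is a rough sketch of the two behaviours discussed above: a runtime config that replaces the parser-level config wholesale, versus the "something smart" option of tracking which parameters were explicitly set and overlaying only those. This is plain Java with a made-up OcrSettings class; it is not Tika's TesseractOCRConfig and not how Tika is implemented. The method names and the "language" key are assumptions chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an OCR config. NOT Tika's TesseractOCRConfig.
final class OcrSettings {
    // Only parameters that were explicitly set are recorded here.
    private final Map<String, String> explicit = new LinkedHashMap<>();

    OcrSettings set(String name, String value) {
        explicit.put(name, value);
        return this;
    }

    // Returns null when the parameter was never set on this instance.
    String get(String name) {
        return explicit.get(name);
    }

    // Behaviour described in the thread: a runtime config, if present,
    // replaces the parser-level config entirely.
    static OcrSettings replaceAll(OcrSettings parserLevel, OcrSettings runtime) {
        return runtime != null ? runtime : parserLevel;
    }

    // The alternative: start from the parser-level values and overlay
    // only the parameters the runtime config explicitly set.
    static OcrSettings overlay(OcrSettings parserLevel, OcrSettings runtime) {
        OcrSettings merged = new OcrSettings();
        merged.explicit.putAll(parserLevel.explicit);
        if (runtime != null) {
            merged.explicit.putAll(runtime.explicit);
        }
        return merged;
    }
}

class OcrSettingsDemo {
    public static void main(String[] args) {
        // Values as they would come from the tika-config in this thread.
        OcrSettings fromTikaConfig = new OcrSettings()
                .set("tesseractPath", "c:\\tesseract_config")
                .set("tessdataPath", "c:\\tessdata_config");
        // A runtime config that never mentions the paths.
        OcrSettings fromParseContext = new OcrSettings().set("language", "eng");

        OcrSettings replaced = OcrSettings.replaceAll(fromTikaConfig, fromParseContext);
        OcrSettings merged = OcrSettings.overlay(fromTikaConfig, fromParseContext);

        System.out.println(replaced.get("tesseractPath")); // path is lost
        System.out.println(merged.get("tesseractPath"));   // path survives
    }
}
```

With the wholesale replacement, the paths from the tika-config disappear as soon as any runtime config is supplied, which matches the symptom reported in this thread. The practical workaround under that behaviour is to set the paths on the runtime config as well.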
