Let's say the parser has an internal tessconfig that you've configured
through a tikaconfig.  When, at runtime, you send in a new tessconfig
via the parsecontext, how can the parser tell which parameters in the
new tessconfig you actually meant to change?

Yes, I realize that it would be possible to keep track of which
parameters have been changed in the runtime config and then do
something smart, but this hasn't been an issue to date.
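
To make the difference concrete, here is a minimal sketch in plain Java. It does not use Tika's classes and is not how the parser is implemented; SketchConfig, override and merge are hypothetical names invented for the illustration. It contrasts the wholesale override described above with the "keep track of what was changed" idea.

```java
import java.util.HashSet;
import java.util.Set;

public class OcrConfigMergeSketch {

    /** Hypothetical stand-in for an OCR config object; NOT Tika's TesseractOCRConfig. */
    public static class SketchConfig {
        private String tesseractPath = "";   // default: no explicit path
        private String language = "eng";     // default language
        // Names of the fields a caller set explicitly (the bookkeeping step).
        private final Set<String> touched = new HashSet<>();

        public void setTesseractPath(String path) {
            this.tesseractPath = path;
            touched.add("tesseractPath");
        }

        public void setLanguage(String language) {
            this.language = language;
            touched.add("language");
        }

        public String getTesseractPath() { return tesseractPath; }
        public String getLanguage() { return language; }
    }

    /** Wholesale override: the runtime config wins for every field, defaults included. */
    public static SketchConfig override(SketchConfig configured, SketchConfig runtime) {
        return runtime;
    }

    /** Merge: keep configured values unless the runtime config explicitly set the field. */
    public static SketchConfig merge(SketchConfig configured, SketchConfig runtime) {
        SketchConfig merged = new SketchConfig();
        merged.setTesseractPath(runtime.touched.contains("tesseractPath")
                ? runtime.tesseractPath : configured.tesseractPath);
        merged.setLanguage(runtime.touched.contains("language")
                ? runtime.language : configured.language);
        return merged;
    }

    public static void main(String[] args) {
        SketchConfig configured = new SketchConfig();
        configured.setTesseractPath("C:\\Program Files\\Tesseract-OCR2");

        SketchConfig runtime = new SketchConfig();
        runtime.setLanguage("deu");   // the caller only meant to change the language

        System.out.println("override path: '" + override(configured, runtime).getTesseractPath() + "'");
        System.out.println("merge path:    '" + merge(configured, runtime).getTesseractPath() + "'");
    }
}
```

With the override, the runtime config's untouched default path replaces the configured one, which matches the symptom in this thread. The merge variant needs the extra bookkeeping in every setter.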

On Mon, Feb 8, 2021 at 9:15 PM Peter Kronenberg
<[email protected]> wrote:
>
> Ok, your last point might be the issue. If I don't set it in
> TesseractOCRConfig, then setting it in tika-config has no effect? I'm not sure
> I understand the thinking or logic behind this.
>
>
> ________________________________
> From: Tim Allison <[email protected]>
> Sent: Monday, February 8, 2021 8:47:07 PM
> To: Peter Kronenberg <[email protected]>; [email protected] 
> <[email protected]>
> Subject: Re: Tika-config
>
> I regret that I'm not able to reproduce this...that is, this works for me:
>
> @Test
> public void oneOff() throws Exception {
>     System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
>     TikaConfig config = new TikaConfig();
>     AutoDetectParser parser = new AutoDetectParser(config);
>     assertContains("quick brown fox", getXML("testOCR_spacing.png", parser).xml);
> }
>
>
> where myconfig.xml is:
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>     <parsers>
>         <parser class="org.apache.tika.parser.DefaultParser">
>         </parser>
>
>         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
>             <params>
>                 <param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
>                 <param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
>             </params>
>         </parser>
>     </parsers>
> </properties>
>
> Whatever you set in your tessConfig will _override_ the underlying settings
> of the parser...all of them.  So, if you aren't setting the path there, then,
> yes, you won't see any effect.
>
> On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]> 
> wrote:
>
> Like this.
>
>
>         TikaConfig tikaConfig = new TikaConfig();
>
>         final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
>
>         final ParseContext parseContext = new ParseContext();
>
>         parseContext.set(AutoDetectParser.class, parser);
>         parseContext.set(PDFParserConfig.class, pdfConfig);
>         parseContext.set(TesseractOCRConfig.class, tessConfig);
>
> -----Original Message-----
> From: Tim Allison <[email protected]>
> Sent: Monday, February 8, 2021 5:31 PM
> To: [email protected]
> Subject: Re: Tika-config
>
> How are you using the TikaConfig?
>
> On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]> 
> wrote:
> >
> > What is wrong with this?
> >
> > I specified the tika-config env variable.  I know it works because if
> > I make a syntax error in the tika-config.xml, it complains.  So it’s
> > finding the file.  But it’s not applying the properties.
> >
> >
> >
> > I have this tika-config.  I tried forward slashes instead of the double 
> > backslashes.  Same result.  No errors.  It’s just not applying the values.
> >
> >
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <properties>
> >     <parsers>
> >         <parser class="org.apache.tika.parser.DefaultParser">
> >         </parser>
> >
> >         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> >             <params>
> >                 <param name="tesseractPath" type="string">c:\\tesseract_config</param>
> >                 <param name="tessdataPath" type="string">c:\\tessdata_config</param>
> >             </params>
> >         </parser>
> >     </parsers>
> > </properties>
> >
> >
