Sorry, that should read an "internal tessconfig".

On Mon, Feb 8, 2021 at 9:23 PM Tim Allison <[email protected]> wrote:
>
> Let's say you have an internal tessconfig file in the parser that
> you've configured through a tikaconfig.  When, at runtime, you send in
> a new tessconfig via the parsecontext, how can we tell which
> parameters you want to change from the new tessconfig?
>
> Yes, I realize that it would be possible to keep track of what
> parameters have been changed in the runtime config and then do
> something smart, but this hasn't been an issue to date.
>
> On Mon, Feb 8, 2021 at 9:15 PM Peter Kronenberg
> <[email protected]> wrote:
> >
> > Ok, your last point might be the issue. If I don't set it in
> > tesseractOCRConfig, then setting it in tika-config has no effect? I'm not
> > sure I understand the thinking or logic behind this.
> >
> >
> > ________________________________
> > From: Tim Allison <[email protected]>
> > Sent: Monday, February 8, 2021 8:47:07 PM
> > To: Peter Kronenberg <[email protected]>; [email protected] 
> > <[email protected]>
> > Subject: Re: Tika-config
> >
> > I regret that I'm not able to reproduce this; that is, this works for me:
> >
> > @Test
> > public void oneOff() throws Exception {
> >     System.setProperty("tika.config", "C:\\users\\talli\\myconfig.xml");
> >     TikaConfig config = new TikaConfig();
> >     AutoDetectParser parser = new AutoDetectParser(config);
> >     assertContains("quick brown fox",
> >             getXML("testOCR_spacing.png", parser).xml);
> > }
> >
> >
> > where myconfig.xml is:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <properties>
> >     <parsers>
> >         <parser class="org.apache.tika.parser.DefaultParser">
> >         </parser>
> >
> >         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> >             <params>
> >                 <param name="tesseractPath" type="string">C:\Program Files\Tesseract-OCR2</param>
> >                 <param name="tessdataPath" type="string">C:\Program Files\Tesseract-OCR2\tessdata</param>
> >             </params>
> >         </parser>
> >     </parsers>
> > </properties>
> >
> > Whatever you set in your tessConfig will _override_ the underlying settings
> > of the parser, all of them.  So, if you aren't setting the path there,
> > then, yes, you won't see any effect.
> >
> > On Mon, Feb 8, 2021 at 5:35 PM Peter Kronenberg <[email protected]> 
> > wrote:
> >
> > Like this.
> >
> >
> >         TikaConfig tikaConfig = new TikaConfig();
> >
> >         final AutoDetectParser parser = new AutoDetectParser(tikaConfig);
> >
> >         final ParseContext parseContext = new ParseContext();
> >
> >         parseContext.set(AutoDetectParser.class, parser);
> >         parseContext.set(PDFParserConfig.class, pdfConfig);
> >         parseContext.set(TesseractOCRConfig.class, tessConfig);
> >
> > -----Original Message-----
> > From: Tim Allison <[email protected]>
> > Sent: Monday, February 8, 2021 5:31 PM
> > To: [email protected]
> > Subject: Re: Tika-config
> >
> > How are you using the TikaConfig?
> >
> > On Mon, Feb 8, 2021 at 4:11 PM Peter Kronenberg <[email protected]> 
> > wrote:
> > >
> > > What is wrong with this?
> > >
> > > I specified the tika-config env variable.  I know it works because if
> > > I make a syntax error in the tika-config.xml, it complains.  So it’s
> > > finding the file.  But it’s not applying the properties
> > >
> > >
> > >
> > > I have this tika-config.  I tried forward slashes instead of the double 
> > > backslashes.  Same result.  No errors.  It’s just not applying the values.
> > >
> > >
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <properties>
> > >     <parsers>
> > >         <parser class="org.apache.tika.parser.DefaultParser">
> > >         </parser>
> > >
> > >         <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
> > >             <params>
> > >                 <param name="tesseractPath" type="string">c:\\tesseract_config</param>
> > >                 <param name="tessdataPath" type="string">c:\\tessdata_config</param>
> > >             </params>
> > >         </parser>
> > >     </parsers>
> > > </properties>
> > >
> > >

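To make the behaviour discussed in this thread concrete, here is a minimal sketch in plain Java of the two options: a runtime config that replaces the parser's settings wholesale (what the thread describes for a TesseractOCRConfig passed via the ParseContext), and a merge that tracks which parameters were explicitly set at runtime (the "something smart" alternative). The names `OcrSettings`, `replace`, and `merge` are hypothetical and invented for illustration only; this is not Tika's TesseractOCRConfig, and it says nothing about how Tika is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an OCR config object; NOT Tika's TesseractOCRConfig.
final class OcrSettings {
    // Only parameters that were explicitly set are stored here, so the map
    // doubles as the record of "what the caller actually changed".
    private final Map<String, String> explicit = new HashMap<>();

    OcrSettings set(String name, String value) {
        explicit.put(name, value);
        return this;
    }

    // Returns null when the parameter was never set.
    String get(String name) {
        return explicit.get(name);
    }

    // Whole-object override: the runtime settings win entirely, so anything
    // configured only on the base (e.g. from a config file) is lost.
    static OcrSettings replace(OcrSettings base, OcrSettings runtime) {
        return runtime;
    }

    // Merge: start from the base and overlay only the parameters that the
    // runtime settings explicitly set.
    static OcrSettings merge(OcrSettings base, OcrSettings runtime) {
        OcrSettings merged = new OcrSettings();
        merged.explicit.putAll(base.explicit);
        merged.explicit.putAll(runtime.explicit);
        return merged;
    }
}
```

With a base that sets only `tesseractPath` and a runtime object that sets only a language, `replace` yields settings with no `tesseractPath` at all, which matches the symptom reported above, while `merge` keeps the path and adds the language. Under the replace semantics, the practical workaround is to set the paths on the runtime config object as well.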