Hi Dave,

I like the idea of passing the Tesseract properties as request headers named X-Tika-OCR<propertyname>. Very cool idea!
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: David Meikle <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, November 17, 2014 at 3:54 PM
To: "[email protected]" <[email protected]>
Subject: Re: Setting tesseract properties when using tika-server

>Hi Nick,
>
>On 16 Nov 2014, at 11:16, Nick Burch <[email protected]> wrote:
>
>>Maybe we could say that the default Tika URL won't include Tesseract. We then
>>provide another one that does bring it in, and offers parameters to hint
>>which languages to try for on that request?
>
>Considering this again, we have already set the pattern that you can hint
>via headers (i.e. our File-Name header), so why not do this via a header?
>
>Thinking about calling this X-Tika-OCRLanguage. Any other preferences?
>
>Cheers,
>Dave
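
For illustration only, here is a rough client-side sketch of the header scheme discussed in this thread: each Tesseract property is sent as a request header named X-Tika-OCR<propertyname>. The header names are the ones proposed above, not a confirmed API. The helper names (ocr_headers, build_ocr_request), the property name "Language", and the server URL are assumptions made up for this example. The request is only constructed, never sent.

```python
from urllib.request import Request

# Prefix proposed in this thread; not a confirmed tika-server API.
HEADER_PREFIX = "X-Tika-OCR"


def ocr_headers(properties):
    """Turn Tesseract property names into prefixed request headers.

    `properties` maps a property name (e.g. "Language") to its value.
    """
    return {HEADER_PREFIX + name: str(value) for name, value in properties.items()}


def build_ocr_request(data, properties, url="http://localhost:9998/tika"):
    """Build (but do not send) a PUT request carrying the OCR hints.

    The URL is an assumed local tika-server address; adjust as needed.
    """
    return Request(url, data=data, headers=ocr_headers(properties), method="PUT")


# Hypothetical usage: hint that the document should be OCR'd as English.
req = build_ocr_request(b"placeholder image bytes", {"Language": "eng"})
```

Passing the request to urllib.request.urlopen would actually submit it. Whether the server honours the headers depends on how the proposal ends up being implemented.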
