On Sat, 15 Nov 2014, David Meikle wrote:
The OP is using the Tika Server though. I guess we'd need to allow for an extra header in the server to get this set on the context used in the server's parsing?

We could do something like this to allow users to set the language per request. I am using the parser wrapped via its own server API, so all I am doing is capturing a request parameter and then setting the context to override a patched TesseractOCRConfig, which loads from an external properties file akin to the PDFConfig file. I will add that in at least.
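
As a rough illustration of the per-request override described above, here is a minimal, standard-library-only sketch. The class name OcrLanguageResolver, the "language" property key, the "eng" fallback and the validation pattern are all assumptions made for illustration; none of this is taken from the actual patch.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical helper: picks the OCR language for one request.
// The default comes from a properties file (key "language" is an assumption),
// and a request parameter, when present and well-formed, overrides it.
final class OcrLanguageResolver {

    // Assumed shape of a language spec: codes joined by '+', e.g. "eng" or "eng+fra".
    private static final Pattern LANG_SPEC =
            Pattern.compile("[A-Za-z_]+(\\+[A-Za-z_]+)*");

    private final String defaultLanguage;

    OcrLanguageResolver(Properties config) {
        // Fall back to "eng" if the properties file does not set a language.
        this.defaultLanguage = config.getProperty("language", "eng");
    }

    // Returns the request's language if it is usable, otherwise the configured default.
    String resolve(String requestLanguage) {
        if (requestLanguage == null) {
            return defaultLanguage;
        }
        String trimmed = requestLanguage.trim();
        if (trimmed.isEmpty() || !LANG_SPEC.matcher(trimmed).matches()) {
            return defaultLanguage;
        }
        return trimmed;
    }
}
```

The resolved string would then presumably be set on a TesseractOCRConfig instance placed in the ParseContext for that request; rejecting malformed values up front matters because the value ends up on the tesseract command line.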

I personally don’t like custom headers that modify behaviour, although you do see it in POST requests at times. Same difference really between this and an optional parameter. Maybe the config file will be enough: having added the above, I don’t see much difference between a call with a single language and one with all languages configured.

Maybe we could say that the default Tika URL won't include Tesseract. We then provide another one that does bring it in, and offers parameters to hint which languages to try on that request?

Nick
