[tesseract-ocr] text2image creates char boxes for 'fi' and 'fl'. Why?

2016-09-03 Thread Brais Gabín Moreira
Hi, I'm trying to train tesseract. But text2image creates a single box for 'fi' or 'fl'. Why does it think that 'fi' or 'fl' is a single character instead of two? How can I fix this?

[tesseract-ocr] Re: text2image creates char boxes for 'fi' and 'fl'. Why?

2016-09-08 Thread Brais Gabín Moreira
>> en chasing other issues and haven't verified a solution. >> On Saturday, September 3, 2016 at 5:23:55 AM UTC-4, Brais Gabín Moreira wrote: >>> Hi, I'm trying to train tesseract. But text2image creates a single box

[tesseract-ocr] Tesseract improves after every recognition. How can I persist these improvements?

2016-09-11 Thread Brais Gabín Moreira
I'm running tesseract to read screenshots. I noticed that when I run the "easy screenshots" first and then the more difficult ones, I get better recognition. This is not just an impression: I can reproduce the behaviour. Is it possible to export this new "knowledge" that tesseract learned to a new .train

[tesseract-ocr] Reduce the weight of eng.traineddata using only one font

2016-09-11 Thread Brais Gabín Moreira
I'm using tesseract to recognize some screenshots. I'm building this into an Android app, so ~20MB of traineddata is a lot of weight. I know the font in those screenshots. How can I reproduce the steps to generate the eng.traineddata? I want to use the same data: text, dictionary, patterns, etc. O

[tesseract-ocr] Re: Help - Simple Example

2016-09-11 Thread Brais Gabín Moreira
You can try something like this: http://www.imagemagick.org/Usage/color_mods/#level to make the light colors completely white and the dark colors completely black. I use something similar with my images and it works great (I can't use imagemagick). On Sunday, September 11, 2016, 21:19:23 (UT
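
For anyone who also can't use imagemagick, here is a rough sketch of the same idea in plain Python. It is not the code from the message above, just an illustration of a linear level stretch on 8-bit grayscale values. The function name `level` and the default `black_point` / `white_point` values are made up for the example and would need tuning for real screenshots.

```python
def level(pixels, black_point=64, white_point=192):
    """Stretch 8-bit grayscale values linearly.

    Values at or below black_point become 0, values at or above
    white_point become 255, and values in between are scaled.
    The default thresholds are illustrative assumptions only.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    result = []
    for row in pixels:
        # Integer arithmetic, then clamp to the valid 0..255 range.
        result.append([
            max(0, min(255, (value - black_point) * 255 // span))
            for value in row
        ])
    return result


# A tiny 2x3 "image" given as rows of grayscale values.
image = [
    [12, 40, 230],
    [128, 200, 250],
]
leveled = level(image)
```

Moving the two thresholds closer together pushes more pixels to pure black or pure white, which gets closer to a hard threshold.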

[tesseract-ocr] Re: Reduce the weight of eng.traineddata using only one font

2016-09-12 Thread Brais Gabín Moreira
> ile, one of which is only 3MB. > https://sourceforge.net/projects/tesseract-ocr-alt/files/ > On Sunday, September 11, 2016 at 7:02:54 AM UTC-5, Brais Gabín Moreira wrote: >> I'm using tesseract to recognize some screenshots. I'm building this in

[tesseract-ocr] A docker image with the training tools

2016-09-21 Thread Brais Gabín Moreira
I couldn't find a Docker image with the training tools on it so I built one! Code: https://github.com/BraisGabin/docker-tesseract Image: https://hub.docker.com/r/braisgabin/tesseract/ I hope you have better luck training your OCR than me :P