Re: [tesseract-ocr] Tesseract makes different predictions on seemingly equal images. How to make it more robust?

2020-07-15 Thread Lorenzo Bolzani
I think the reason is that your input is bad, so the model gets confused and a few pixels are enough to make it see an extra letter. Your input is "bad" because it differs from the data used to train the neural network. The difference between the two images is small, but the difference from the training …
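Lorenzo's point is that when the input is far from the training data, a handful of pixels can tip the prediction. One way to check how small the difference between good.png and bad.png really is would be to count the pixels that differ. Below is a minimal pure-Python sketch. It assumes both images are already loaded as equal-sized grayscale grids of 0-255 integers, which is a hypothetical in-memory representation; loading them with an image library is left out.

```python
def pixel_diff_stats(img_a, img_b, tolerance=0):
    """Compare two same-sized grayscale images given as lists of rows of 0-255 ints.

    Returns (changed, mean_abs_diff): the number of pixels whose values differ
    by more than `tolerance`, and the mean absolute difference over all pixels.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")

    changed = 0
    total_diff = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            diff = abs(a - b)
            total_diff += diff
            count += 1
            if diff > tolerance:
                changed += 1
    mean_abs_diff = total_diff / count if count else 0.0
    return changed, mean_abs_diff


# Toy 2x2 example: only the top-right pixel differs (255 vs 200).
good = [[0, 255], [255, 255]]
bad = [[0, 200], [255, 255]]
print(pixel_diff_stats(good, bad))  # (1, 13.75)
```

A non-zero `tolerance` ignores tiny compression-level differences, so only pixels that changed noticeably are counted.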

2020-07-15 Thread MysteriousGuy
This seems like an ad-hoc approach. I am already converting the images to grayscale. If I apply blurring, binarisation, etc., I will fix this case but cause another case to fail as a result. Something in Tesseract fails to generalize across clearly near-identical images, …
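One alternative to searching for a single preprocessing pipeline that never fails is to OCR several preprocessed variants of the same crop and keep the reading that most variants agree on. Nobody in this thread proposed this. The sketch below covers only the voting step and assumes the readings have already been collected; how the variants are produced (grayscale, blur, binarisation, scaling) is left open.

```python
from collections import Counter


def vote(readings):
    """Majority vote over OCR readings of the same crop.

    `readings` is a list of strings, e.g. one Tesseract output per
    preprocessed variant of the image.
    Returns (best_reading, share), where share is the fraction of non-empty
    readings that agree with the winner. Ties go to the reading seen first.
    """
    cleaned = [r.strip() for r in readings if r and r.strip()]
    if not cleaned:
        return None, 0.0
    best, votes = Counter(cleaned).most_common(1)[0]
    return best, votes / len(cleaned)


print(vote(["ELHADJ", "ELHADIJ", "ELHADJ", ""]))
```

The returned share gives a cheap confidence signal. A crop where the variants disagree heavily is a good candidate for manual review, rather than silently trusting whichever preprocessing happened to be applied.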

2020-07-15 Thread Tuan Ardouin
You need to apply some pre-processing to your image.

On Wednesday, July 15, 2020 at 9:01:14 AM UTC+2, MysteriousGuy wrote:
> Hi. Latest stable version (4.1.1) produces the same error
>
> On Tuesday, July 14, 2020 at 17:13:40 UTC+3, zdenop wrote:
>> Try to use the latest version of …
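A concrete example of such pre-processing is global Otsu thresholding. It picks the gray level that best separates dark and light pixels by maximizing the between-class variance. Tesseract also binarizes internally, so doing it yourself mainly gives you control over that step and lets you inspect its result. The sketch below is pure Python and operates on a grayscale grid of 0-255 integers. That grid format is a hypothetical in-memory representation; in practice a library such as OpenCV (`cv2.threshold` with `THRESH_OTSU`) would do the same job.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for an iterable of 0-255 gray levels.

    Pixels <= t form one class and pixels > t the other; t maximizes
    w_b * w_f * (mean_b - mean_f)**2, which is proportional to the
    between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))

    w_b = 0    # number of pixels at or below t
    sum_b = 0  # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img):
    """Binarize a grayscale image (list of rows) with a global Otsu threshold."""
    t = otsu_threshold(p for row in img for p in row)
    # Values above the threshold become white (255), the rest black (0),
    # so dark text on a light background keeps its polarity.
    return [[255 if p > t else 0 for p in row] for row in img], t


img = [[10, 12, 200], [11, 210, 205]]
print(binarize(img))  # ([[0, 0, 255], [0, 255, 255]], 12)
```

Because a single global threshold is applied to the whole crop, two nearly identical images should binarize almost identically. This is one reason binarizing beforehand can make the downstream OCR input more consistent, though, as the reply above points out, it does not guarantee correct output.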

2020-07-15 Thread MysteriousGuy
Hi. The latest stable version (4.1.1) produces the same error.

On Tuesday, July 14, 2020 at 17:13:40 UTC+3, zdenop wrote:
> Try to use the latest version of tesseract.
>
> Zdenko
>
> On Tue, 14 Jul 2020 at 16:04, MysteriousGuy wrote:
>> I am using Tesseract to extract text from images …

2020-07-14 Thread Zdenko Podobny
Try to use the latest version of tesseract.

Zdenko

On Tue, 14 Jul 2020 at 16:04, MysteriousGuy wrote:
> I am using Tesseract to extract text from images attached. For some
> reason, even though the images are nearly identical, tesseract makes a
> mistake in one of them: for 'bad.png' the output …

[tesseract-ocr] Tesseract makes different predictions on seemingly equal images. How to make it more robust?

2020-07-14 Thread MysteriousGuy
I am using Tesseract to extract text from the attached images. For some reason, even though the images are nearly identical, Tesseract makes a mistake in one of them: for 'bad.png' the output is ELHADIJ, whereas for 'good.png' it is ELHADJ.

Here is my setup and what I have done:
- tesseract version: …
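The exact commands from the original post are cut off in this archive, so the following is only a hypothetical way to reproduce the good.png/bad.png comparison by calling the tesseract command-line tool from Python. The options `-l eng` and `--psm 7` (treat the image as a single text line) are assumptions based on the crops holding a single word such as ELHADJ. Adjust both to match your setup.

```python
import subprocess


def tesseract_cmd(image_path, lang="eng", psm=7):
    """Build a tesseract CLI call that prints the recognized text to stdout.

    psm=7 ("single text line") is an assumption based on the crops holding
    a single word; change it if your images contain more layout.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr(image_path, **kwargs):
    """Run the tesseract binary (must be on PATH) and return its stripped output."""
    result = subprocess.run(
        tesseract_cmd(image_path, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def compare(paths, ocr_fn=ocr):
    """Map each image path to its OCR output so near-identical inputs can be diffed."""
    return {path: ocr_fn(path) for path in paths}


# Usage (requires tesseract installed and both images present):
# print(compare(["good.png", "bad.png"]))
```

Keeping the command construction separate from execution makes it easy to confirm that both images are processed with identical options. It also lets you substitute a different `ocr_fn` when testing without Tesseract installed.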