I think the reason is that your input is bad, so the model is confused and
a few pixels are enough for it to see an extra letter.
Your input is "bad" because it is different from the one used to train the
neural network. The difference between the two images is small but the
difference from the training
This seems like an ad-hoc approach. I am already converting images to
grayscale. If I apply blurring, binarisation, etc., I will fix this
case but cause another case to fail as a result. There is something
with tesseract that fails to generalize on clearly near-identical images,
You need to apply some pre-processing to your image.
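As a rough illustration of one such pre-processing step, here is a minimal sketch of Otsu binarisation in plain Python. It is not a confirmed fix for this image. The function names `otsu_threshold` and `binarise` are made up for this example, and the image is assumed to be a list of rows of 8-bit grey levels (0-255). In practice you would more likely use an image library for this.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels.

    `pixels` is a flat sequence of integer grey levels (assumed 0-255).
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels with level <= t
    sum_bg = 0    # sum of grey levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(image):
    """Map each pixel to 0 or 255 using the Otsu threshold of the whole image.

    `image` is a list of rows, each row a list of grey levels (assumed 0-255).
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]
```

The binarised image would then be passed to tesseract in place of the original. Whether this removes the extra letter for 'bad.png' would need to be checked.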
On Wednesday, July 15, 2020 at 9:01:14 AM UTC+2, MysteriousGuy wrote:
>
> Hi. Latest stable version (4.1.1) produces the same error
>
> On Tuesday, July 14, 2020 at 17:13:40 UTC+3, zdenop wrote:
>>
>> Try to use the latest version of
Hi. Latest stable version (4.1.1) produces the same error
On Tuesday, July 14, 2020 at 17:13:40 UTC+3, zdenop wrote:
>
> Try to use the latest version of tesseract.
>
> Zdenko
>
>
> On Tue, 14 Jul 2020 at 16:04, MysteriousGuy wrote:
>
>> I am using Tesseract to extract text from images
Try to use the latest version of tesseract.
Zdenko
On Tue, 14 Jul 2020 at 16:04, MysteriousGuy wrote:
> I am using Tesseract to extract text from images attached. For some
> reason, even though the images are nearly identical, tesseract makes a
> mistake in one of them: for 'bad.png' the output
I am using Tesseract to extract text from images attached. For some reason,
even though the images are nearly identical, tesseract makes a mistake in
one of them: for 'bad.png' the output is ELHADIJ, whereas for 'good.png' it
is ELHADJ
Here is what I have done:
- tesseract version: