Thanks, but as far as I can see that issue has been open since 2017, and 
there is still no clear solution.

I have now tried getting the recognition result through the iterator API, 
and the output is really strange.
All the characters are listed, and the "duplicates" share the same 
coordinates as the correct ones but have different confidence values.
My first idea was to sort the characters by X coordinate and keep the best 
match at each position, BUT the X coordinates returned by 
TessPageIteratorBoundingBox turn out *to be totally invalid*.
This looks like a critical bug in Tesseract!
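
For reference, the "keep the best match per position" idea could be sketched roughly like this in Python. This is only an illustration: the Symbol type and best_per_box function are made-up names, not part of any Tesseract binding, and the low-confidence "l" entry is a hypothetical duplicate added to show the effect. It also only helps if the boxes themselves can be trusted.

```python
from typing import NamedTuple, Tuple


class Symbol(NamedTuple):
    """One symbol as read from the iterator (hypothetical container type)."""
    text: str
    conf: float
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)


def best_per_box(symbols):
    """Keep only the highest-confidence symbol for each distinct box."""
    best = {}
    for sym in symbols:
        kept = best.get(sym.box)
        if kept is None or sym.conf > kept.conf:
            best[sym.box] = sym
    # Restore reading order: sort by left edge, then by right edge.
    return sorted(best.values(), key=lambda s: (s.box[0], s.box[2]))


symbols = [
    Symbol("1", 98.65, (1805, 771, 1843, 813)),
    Symbol("l", 41.20, (1805, 771, 1843, 813)),  # hypothetical duplicate
    Symbol("2", 99.00, (1811, 771, 1875, 813)),
]
result = best_per_box(symbols)
print("".join(s.text for s in result))
```

Grouping on the exact box works only because the duplicates are reported with identical coordinates; if they differed by a pixel or two, an overlap test would be needed instead.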

Take a line containing "1234567890". The result returned by the iterator is:
>> 1
Conf: 98,65
Box: 1805, 771, 1843, 813
>> 2
Conf: 99,00
Box: 1811, 771, 1875, 813
>> 3  
Conf: 99,00
Box: 1843, 771, 1927, 813
>> 4
Conf: 99,00
Box: 1890, 771, 1964, 813
>> 5  *<<< What is going on here?! Why is "5" reported with a left X 
coordinate of 1927, right where "3" ends, when it should come after "4"?!*
Conf: 99,00
Box: 1927, 771, 2001, 813
>> 6 << This one is even more surprising. "6" is reported at the same 
position as "1" (left X = 1805), and its box is 30+ mm wide!
Conf: 99,02
Box: 1805, 771, 2195, 813
>> 7
Conf: 98,99
Box: 2005, 771, 2090, 813
>> 8
Conf: 98,96
Box: 2053, 771, 2127, 813
>> 9
Conf: 99,01
Box: 2095, 771, 2158, 813
>> 0
Conf: 98,98
Box: 2126, 771, 2190, 813
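
A quick way to check a listing like this automatically is sketched below in Python, with the box values copied from the output above. The two rules used (a box more than twice the median width, or a left edge that moves backwards) and the factor of 2 are arbitrary assumptions for illustration, not anything Tesseract defines.

```python
from statistics import median

# (text, left, top, right, bottom), copied from the iterator output above
boxes = [
    ("1", 1805, 771, 1843, 813),
    ("2", 1811, 771, 1875, 813),
    ("3", 1843, 771, 1927, 813),
    ("4", 1890, 771, 1964, 813),
    ("5", 1927, 771, 2001, 813),
    ("6", 1805, 771, 2195, 813),
    ("7", 2005, 771, 2090, 813),
    ("8", 2053, 771, 2127, 813),
    ("9", 2095, 771, 2158, 813),
    ("0", 2126, 771, 2190, 813),
]

widths = [right - left for _, left, _, right, _ in boxes]
typical = median(widths)

suspicious = []
prev_left = None
for (text, left, _, right, _), width in zip(boxes, widths):
    too_wide = width > 2 * typical  # arbitrary threshold
    goes_back = prev_left is not None and left < prev_left
    if too_wide or goes_back:
        suspicious.append((text, width))
    prev_left = left

for text, width in suspicious:
    print(f"suspicious box for {text!r}: width {width} px")
```

With these numbers, only the box for "6" is flagged (390 px wide against a median of 74 px). The other boxes pass both rules even though each one overlaps its neighbour by 30+ px, so a stricter check would also have to look at overlaps.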

On Thursday, July 4, 2019 at 15:09:13 UTC+3, shree wrote:
>
> This is an open issue - see 
> https://github.com/tesseract-ocr/tesseract/issues/1060
> and other related issues
>
>
