Update: For now, I have developed a workaround that detects such
representations of the digit 1 by measuring how rectangular the symbol is.
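
For anyone who wants to try something similar, here is a minimal sketch of one
way such a check could look. It is an illustration, not necessarily the exact
workaround: it assumes the symbol is already segmented into a binary mask
(a list of rows, nonzero = ink), and the names bounding_box_stats and
looks_like_bar_one as well as the thresholds min_fill and min_aspect are
hypothetical and would need tuning for a real font.

```python
def bounding_box_stats(mask):
    """Return (fill_ratio, aspect_ratio) of the ink in a binary mask.

    fill_ratio   = ink pixels / bounding-box area (1.0 for a solid rectangle)
    aspect_ratio = bounding-box height / width
    Returns None if the mask contains no ink at all.
    """
    # Rows and columns that contain at least one ink pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]

    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    ink = sum(1 for row in mask for value in row if value)

    return ink / (height * width), height / width


def looks_like_bar_one(mask, min_fill=0.9, min_aspect=2.0):
    """Heuristic: a tall, almost completely filled box is treated as a '1'.

    Both thresholds are assumed values chosen for illustration only.
    """
    stats = bounding_box_stats(mask)
    if stats is None:
        return False
    fill, aspect = stats
    return fill >= min_fill and aspect >= min_aspect
```

A check like this would run on symbols for which Tesseract returns no text, so
that the other digits still go through the normal recognition path.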

Yet it would be nice to hear something from the Tesseract developers. Maybe 
they can shed some light on this issue.

On Tuesday, 22 April 2014 15:54:12 UTC+2, Robert Nitsch wrote:
>
> Hello everybody,
>
> I am currently dealing with a font that has a very simple representation 
> of the digit 1: It's basically just a vertical rectangle. It seems that 
> this is quite troublesome for Tesseract, because the digit 1 is almost 
> never recognized. By "not recognized" I mean that the GetUTF8Text method of
> the ResultIterator returns a NULL pointer. This seems to be even worse than a
> misclassification, because it appears that this can't be fixed by adding 
> more training samples. Please note that all of the other digits are 
> recognized quite well.
>
> I have tried all kinds of fixes. I have tried various resizing factors, 
> various blurring and sharpening operations, morphological filters, as well 
> as a number of different paddings, but to no avail. Maybe you can help me
> out.
>
> The attachments consist of the 3 images I use for training and 1 example 
> input image for which Tesseract returns a NULL pointer.
>
> Thank you and best regards,
> Robert
>
