[tesseract-ocr] Numeric recognition accuracy

Vasa Serafin Sun, 03 Apr 2016 23:43:17 -0700

Hi community,

I have been experimenting with the engine and have run into problems with 
some images. The inputs are computer-generated bitmaps of diagrams that I 
create and that change regularly.


The problem is that the text, which is numeric, is either not recognized 
at all or is recognized incorrectly (not by much, but enough to matter).

Attached is an example image. It shows 13.00%, which is sometimes 
recognized as I3.00%, I 3.00X, or I3.0096.

I can understand why this happens, since these characters look similar to 
the engine (1 and I, % and X or 96). Recognition improves when I increase 
the image size, which is expected and consistent with the optimization 
documentation, which gives 300 DPI as the optimal resolution.
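
A rough sketch of how an enlargement factor could be chosen so that the 
image approaches 300 DPI. The class and method names (OcrScaling, 
ScaleFactorFor) are hypothetical and not part of Tesseract, and the source 
resolution has to be supplied by the caller; the 96 DPI in the usage note 
is only an assumption about screen-rendered bitmaps.

```csharp
using System;

static class OcrScaling
{
    // Hypothetical helper: smallest whole-number factor that brings an image
    // rendered at sourceDpi up to at least targetDpi (300 by default).
    public static int ScaleFactorFor(double sourceDpi, double targetDpi = 300.0)
    {
        if (sourceDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(sourceDpi));

        // Round up so the result never falls short of the target,
        // and never shrink the image (factor is at least 1).
        return Math.Max(1, (int)Math.Ceiling(targetDpi / sourceDpi));
    }
}
```

For a bitmap assumed to be at 96 DPI, ScaleFactorFor(96) gives 4, which 
could be passed as both blowupW and blowupH in the code below.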

I would like some guidance on any flags or settings, or even a specialized 
numeric traineddata file, that could help here.

Any advice or tips or even a guide to better utilization of the engine 
would be appreciated.

Thanks.

PS. Current code:

engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractOnly, "config");

// Upscales the bitmap by the given factors, then runs OCR on the result.
// ResizeImage is my own helper (not shown here).
private string Decypher_add_entries(Bitmap bitmap, int blowupW, int blowupH)
{
    // Dispose the enlarged copy and the page once the text has been read.
    using (var resized = ResizeImage(bitmap, bitmap.Width * blowupW, bitmap.Height * blowupH))
    using (var page = engine.Process(resized))
    {
        return page.GetText();
    }
}

I might not be using all the available options that could help. That is 
all the code I use for the implementation, a fairly simple 3-4 lines.
