Hi community,
I have been experimenting with the engine and have run into recognition
problems with some images. I am using computer-generated bitmaps of diagrams
that I create and that change regularly.
The issue is that the text, which is numeric, is either not recognized or
recognized incorrectly (not by much, but enough to matter).
Attached is an example image. It shows 13.00%, but this is sometimes
recognized as I3.00%, I 3.00X, or I3.0096.
I can understand why this happens, since the characters look similar to the
engine. When I increase the image size it works better, which is expected and
consistent with the optimization documentation, which gives 300 DPI as the
optimal resolution.
I would like some guidance on any flags or similar settings, or even a
numeric-only traineddata file, that could help in this regard.
Any advice, tips, or a guide to making better use of the engine would be
appreciated.
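One candidate setting seems to be Tesseract's tessedit_char_whitelist variable, which limits the characters the engine may output. Below is a minimal sketch of how it could be set through the wrapper's SetVariable call, assuming the values only ever contain digits, a decimal point, and a percent sign. The NumericEngineFactory class is a hypothetical name used only for illustration, and whether this actually removes the I/1 confusion on these images is untested.

```csharp
using Tesseract;

// Hypothetical helper for illustration; only TesseractEngine, EngineMode,
// and SetVariable come from the Tesseract .NET wrapper.
public static class NumericEngineFactory
{
    public static TesseractEngine Create()
    {
        // Same constructor arguments as in the current code.
        var engine = new TesseractEngine(@"./tessdata", "eng",
            EngineMode.TesseractOnly, "config");

        // Assumption: values look like 13.00%, so only these characters
        // are allowed in the output.
        engine.SetVariable("tessedit_char_whitelist", "0123456789.%");
        return engine;
    }
}
```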
Thanks.
PS. Current code:
// Created once and reused; "config" is the config file name passed to the engine.
engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.TesseractOnly,
    "config");

private string Decypher_add_entries(Bitmap bitmap, int blowupW, int blowupH)
{
    // ResizeImage is my own helper (not shown) and is assumed to return a
    // new Bitmap; the scaled copy is disposed after OCR.
    using (var scaled = ResizeImage(bitmap, bitmap.Width * blowupW,
        bitmap.Height * blowupH))
    using (var page = engine.Process(scaled))
    {
        return page.GetText();
    }
}
I might not be using all the available options that could help me. That is
all the code I use for the implementation, which is a fairly simple 3-4
lines.
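As a stopgap, the misreadings listed above (I3.00%, I 3.00X, I3.0096) could also be corrected after recognition. This is only a sketch, and it rests on an assumption that is mine rather than the engine's: every value is a percentage with exactly two decimals, like 13.00%. PercentCleanup and Normalize are hypothetical names, not part of Tesseract or its wrapper.

```csharp
using System;
using System.Text;

// Hypothetical post-processing helper; not part of Tesseract.
public static class PercentCleanup
{
    // Assumes the input is a single percentage with exactly two decimals.
    public static string Normalize(string ocrText)
    {
        if (ocrText == null) return string.Empty;

        var sb = new StringBuilder();
        foreach (char c in ocrText)
        {
            if (char.IsWhiteSpace(c)) continue; // "I 3.00X" becomes "I3.00X"
            switch (c)
            {
                // Letters commonly confused with digits.
                case 'I': case 'l': case '|': sb.Append('1'); break;
                case 'O': case 'o': sb.Append('0'); break;
                default: sb.Append(c); break;
            }
        }
        string s = sb.ToString();

        // Without a decimal point and two following characters there is
        // nothing to anchor on, so return the text as cleaned so far.
        int dot = s.IndexOf('.');
        if (dot < 0 || s.Length < dot + 3) return s;

        // Keep the number up to two decimals and treat whatever follows
        // ("X", "96", or nothing) as the percent sign.
        return s.Substring(0, dot + 3) + "%";
    }
}
```

With this, PercentCleanup.Normalize(page.GetText()) would be called on the recognized text before it is used. It hides recognition errors rather than fixing them, so I would still prefer a proper engine setting.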