Hello,
I am trying to use Tessnet2 (which uses the Tesseract engine) in C#. For
many of the test images I give to Tessnet2, the output is very poor and
almost nothing is recognized correctly.
This is my code in Program.cs of a C# console project:
static void Main(string[] args)
{
    try
    {
        Bitmap image = new Bitmap(@"C:\Users\hp\Desktop\eurotext.tif");
        var ocr = new Tesseract();
        // When I tried to add SetVariable(...), it didn't change the output much
        ocr.Init(@"C:\Program Files (x86)\Tesseract-OCR", "eng", true);
        var result = ocr.DoOCR(image, Rectangle.Empty);
        // Print each recognized word with its confidence value
        foreach (Word word in result)
            Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
        Console.ReadLine();
    }
    catch (Exception exception)
    {
        // Show the exception message so a failure is not silently hidden
        Console.WriteLine("Error: " + exception.Message);
    }
}
For example, this is a sample test image, "eurotext.tif" (large, binary,
300 dpi):
[image: eurotext.tif test image]
And this is the Tessnet2 output for this image:
[image: Tessnet2 output]
I have been using this website to learn the steps to use Tessnet2:
https://code.msdn.microsoft.com/windowsdesktop/How-to-use-Tessnet2-library-716be12f
I also followed this website to try to use the SetVariable(...) function
correctly, but with no luck; it made little difference to the output:
http://www.sk-spell.sk.cx/tesseract-ocr-en
I found the Tesseract guidelines for reducing the engine's error rate:
http://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
- It says "Tesseract works best with text using a DPI of at least 300
  dpi". This sample image is 300 dpi.
- This sample image is also binary, which should give better output, as
  many people on various websites advise.
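In case it helps anyone reproduce this, here is a minimal sketch of how the two properties above (resolution and bit depth) could be checked from C# with System.Drawing. The class and helper names (ImageCheck, HasEnoughDpi, IsBinary) are hypothetical and only for illustration; the 300 dpi threshold comes from the guideline quoted above, and the file path is the same one used in my code.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

static class ImageCheck
{
    // Hypothetical helper: true when both resolutions reach the minimum DPI.
    public static bool HasEnoughDpi(float horizontalDpi, float verticalDpi, float minDpi = 300f)
    {
        return horizontalDpi >= minDpi && verticalDpi >= minDpi;
    }

    // Hypothetical helper: true only for 1-bit-per-pixel images.
    // Note: a black-and-white image saved as 8 bpp or 24 bpp would return false here.
    public static bool IsBinary(PixelFormat format)
    {
        return format == PixelFormat.Format1bppIndexed;
    }

    static void Main()
    {
        using (var image = new Bitmap(@"C:\Users\hp\Desktop\eurotext.tif"))
        {
            Console.WriteLine("DPI: {0} x {1}", image.HorizontalResolution, image.VerticalResolution);
            Console.WriteLine("Pixel format: {0}", image.PixelFormat);
            Console.WriteLine("At least 300 dpi: {0}",
                HasEnoughDpi(image.HorizontalResolution, image.VerticalResolution));
            Console.WriteLine("1 bpp binary: {0}", IsBinary(image.PixelFormat));
        }
    }
}
```

This only reports what the file's metadata and pixel format say; it does not judge whether the scan itself is clean.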
I looked everywhere for a solution that can increase the accuracy, and I
found many posts and people with similar problems, but with no working
solution.
What could be the reason for this problem? How can I solve it?
I am a beginner in this topic, so please bear with me if the solution is
too trivial.
Thanks!