[tesseract-ocr] Tesseract not working for some single examples.

Filip Bry Tue, 30 Jul 2024 07:45:52 -0700


I'm trying to use Tesseract in a project written in C#. I have a problem 
reading text from part of an image. I'm trying to find the four characters 
(0000 in the example) and the number after the "e". For some examples it 
works perfectly, but for others it prints "Empty page!!!". The difference 
between the examples is the background color; the image processing is the 
same on every run. What should I do to minimize the probability of error?



This is an image where OCR works correctly:
[image: working.jpg]

and here is one where it does not work:

[image: not working.jpg]



Part of the code in C#:


using System;
using System.Drawing;
using OpenCvSharp;
using Tesseract;

public static class Sign
{
    public static void Verify()
    {
        string imagePath = "path.bmp";
        Mat imageSign = new Mat(imagePath);

        int h = imageSign.Rows;
        int w = imageSign.Cols;
        int point1 = (int)(0.01 * w);
        int point2 = (int)(0.6 * h);
        int point3 = (int)(0.3 * w);
        int point4 = (int)(0.9 * h);
        OpenCvSharp.Point start_point = new OpenCvSharp.Point(point1, point2);
        OpenCvSharp.Point end_point = new OpenCvSharp.Point(point3, point4);
        imageSign = new Mat(imageSign, new OpenCvSharp.Rect(point1, point2, point3 - point1, point4 - point2));
        Cv2.Resize(imageSign, imageSign, new OpenCvSharp.Size(), 2, 2);
        imageSign.SaveImage(imagePath);
        
        using (Bitmap bitmap = (Bitmap)Image.FromFile(imagePath))
        {
            using (Bitmap newBitmap = new Bitmap(bitmap))
            {
                string imagePathA = "2nd image path.bmp";
                newBitmap.SetResolution(300, 300);
                newBitmap.Save(imagePathA);
            }
        }




        string imagePathB = "2nd image path.bmp";
        var pixFromFile = Pix.LoadFromFile(imagePathB);
        string customConfig = "--psm 10 --oem 3";
        using (var engine = new TesseractEngine(@"C:\Program Files\Tesseract-OCR\tessdata", "eng", EngineMode.Default))
        {

            engine.SetVariable("tessedit_char_whitelist", "0123456789");
            using (var page = engine.Process(pixFromFile, customConfig))
            {
                string text = page.GetText();
                Console.Write(text);

                string[] lines = text.Split('\n');
                bool linijka = false;

                foreach (string line in lines)
                {
                    if (line.Length == 4 || line.Length == 5)
                    {
                        Console.WriteLine("Oznaczenie e5: "); // Polish: "Marking e5: "
                        Console.WriteLine(line);
                        linijka = true;
                    }
                    if (line.Length == 1)
                    {
                        Console.WriteLine("e_:");
                        Console.WriteLine(line);
                    }
                }

               
                Cv2.ImShow("koniec", imageSign); // "koniec" is Polish for "end"
                Cv2.WaitKey(0);
            }
        }
    }
}

I tried cropping the image differently, and for some reason making the crop 
bigger or smaller than it is now makes the results worse. I also tried some 
other Tesseract PSM configurations and changed the DPI of the image to 300.
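
Because the two cases differ only in background color, one preprocessing step 
that may be worth testing is to convert the crop to grayscale, binarize it with 
Otsu's threshold, and invert it when the background comes out dark, so that 
Tesseract always receives dark text on a light background. The following is 
only a sketch with OpenCvSharp and has not been checked against the images 
above; the names Preprocess and NormalizeForOcr and the 10-pixel white border 
are assumptions made for illustration.

```csharp
using OpenCvSharp;

public static class Preprocess
{
    // Hypothetical helper: returns a binarized copy of a 3-channel BGR image
    // with dark text on a white background.
    public static Mat NormalizeForOcr(Mat bgr)
    {
        using (Mat gray = new Mat())
        using (Mat binary = new Mat())
        {
            // Work on a single channel so the background hue no longer matters.
            Cv2.CvtColor(bgr, gray, ColorConversionCodes.BGR2GRAY);

            // Otsu derives the threshold from the histogram; the 0 passed here is ignored.
            Cv2.Threshold(gray, binary, 0, 255, ThresholdTypes.Binary | ThresholdTypes.Otsu);

            // If most pixels are black, assume a dark background and invert.
            if (Cv2.Mean(binary).Val0 < 127)
            {
                Cv2.BitwiseNot(binary, binary);
            }

            // Assumed 10 px white margin so characters do not touch the image edge.
            Mat padded = new Mat();
            Cv2.CopyMakeBorder(binary, padded, 10, 10, 10, 10,
                               BorderTypes.Constant, Scalar.All(255));
            return padded;
        }
    }
}
```

The returned Mat could be written with SaveImage in place of the raw crop 
before the file is handed to Tesseract.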
