Thanks, Sven!

Actually, that is what I did, and it is working: the results are now
somewhere between better and great (of course, my expectations go up
as I see what is possible).

I have also written my own convert-to-grayscale and max-contrast
routines. The contrast stretch deliberately overshoots a little, so
that values close to the minimum and maximum are also pushed to pure
black and white.

It is working a lot better.

I still get occasional bad results (though way less frequently).

So, for anyone wanting to do this as well: scaling up the image by a
factor of 3 and increasing the contrast improves recognition quality a
lot.
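
A minimal sketch of such preprocessing in C#, assuming System.Drawing
is available. This is an illustration, not the exact code used above:
the class name OcrPreprocess, the method names, and the clipFraction
parameter are all made up for this example, and the luma weights
(299/587/114) are the common ones, not necessarily the ones used here.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

// Hypothetical helper class; all names here are illustrative.
public static class OcrPreprocess
{
    // Enlarge the bitmap by an integer factor with bicubic interpolation.
    public static Bitmap Upscale(Bitmap src, int factor)
    {
        Bitmap dst = new Bitmap(src.Width * factor, src.Height * factor);
        using (Graphics g = Graphics.FromImage(dst))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(src, 0, 0, dst.Width, dst.Height);
        }
        return dst;
    }

    // Stretch gray values to the full 0..255 range. At most clipFraction
    // of the pixels at each end fall outside the stretched range and are
    // clamped to pure black or pure white.
    public static byte[] StretchContrast(byte[] gray, double clipFraction)
    {
        int[] hist = new int[256];
        foreach (byte v in gray) hist[v]++;
        int clip = (int)(gray.Length * clipFraction);

        // Lowest and highest gray levels that survive the clipping.
        int lo = 0, acc = hist[0];
        while (acc <= clip && lo < 255) { lo++; acc += hist[lo]; }
        int hi = 255; acc = hist[255];
        while (acc <= clip && hi > 0) { hi--; acc += hist[hi]; }

        byte[] result = (byte[])gray.Clone();
        if (hi <= lo) return result; // flat image: nothing to stretch

        for (int i = 0; i < gray.Length; i++)
        {
            int s = (gray[i] - lo) * 255 / (hi - lo);
            result[i] = (byte)Math.Max(0, Math.Min(255, s));
        }
        return result;
    }

    // Upscale, convert to grayscale, then stretch the contrast.
    public static Bitmap Prepare(Bitmap src, int factor, double clipFraction)
    {
        Bitmap big = Upscale(src, factor);
        byte[] gray = new byte[big.Width * big.Height];
        for (int y = 0; y < big.Height; y++)
            for (int x = 0; x < big.Width; x++)
            {
                Color c = big.GetPixel(x, y);
                gray[y * big.Width + x] =
                    (byte)((299 * c.R + 587 * c.G + 114 * c.B) / 1000);
            }

        byte[] stretched = StretchContrast(gray, clipFraction);
        for (int y = 0; y < big.Height; y++)
            for (int x = 0; x < big.Width; x++)
            {
                int v = stretched[y * big.Width + x];
                big.SetPixel(x, y, Color.FromArgb(v, v, v));
            }
        return big;
    }
}
```

With this, something like OcrPreprocess.Prepare(bmp, 3, 0.01) would
produce the bitmap to hand to processor.Recognize. GetPixel/SetPixel
are slow; for large captures, LockBits would likely be the better
choice.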

I wonder: is there any trained data (tessdata) for Windows with
standard fonts? If not, how should I approach this?

Best wishes
Andreas

On 6 Jul., 17:57, Sven Pedersen <[email protected]> wrote:
> For screen captures it is necessary to increase the resolution. Since
> it is usually 72-90 dpi, you must rescale them to 200-300 dpi; then
> you'll see a drastic improvement in accuracy. I don't know anything
> about the C# stuff though...
> --Sven
>
> On Wed, Jul 6, 2011 at 9:02 AM, Andreas Reiff <[email protected]> wrote:
> > Hello Quan!
>
> > That did the trick, many thanks!
>
> > By the way, I am using unsafe, because it is in the example
> > Simple1.cs.
>
> > Apart from that, I would rather not use it, since it propagates up..
> > and it doesn't prevent an application from crashing anyway.
>
> > If you find the time, could you answer one more related question: I
> > want to do screen text recognition, like text on menus, in notepad,
> > and the like. Your testdata seems to be rather bad for this (now that
> > it is running, I could test). How best to handle this? Create/get new
> > testdata? Is it possible to use it without testdata at all?
>
> > I would have expected screen recognition to be especially easy, since
> > there is no noise. But then again, I have spent too little time to
> > look into this yet.
>
> > Best wishes,
> > Andreas
>
> > On 6 Jul., 14:05, Quan Nguyen <[email protected]> wrote:
> >> Andreas,
>
> >> Try adding a slash to the data path, such as:
>
> >> string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata\";
>
> >> I'm curious as to why you use unsafe block in your code.
>
> >> Quan
>
> >> On Jul 6, 5:01 am, Andreas Reiff <[email protected]> wrote:
>
> >> > I get an AccessViolationException, trying to adapt your code for my
> >> > needs: Attempted to read or write protected memory. This is often an
> >> > indication that other memory is corrupt.
>
> >> > The code is more or less copied from your simple1 - my bitmap does not
> >> > come out of a file but from a screenshot (part of the screen).
>
> >> > public static void Recognize(Bitmap bmp)
> >> > {
> >> >     string language = "eng";
> >> >     int oem = (int)eOcrEngineMode.OEM_DEFAULT;
>
> >> >     using (TesseractProcessor processor = new TesseractProcessor())
> >> >     {
> >> >         DateTime started = DateTime.Now;
> >> >         DateTime ended = DateTime.Now;
>
> >> >         string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata";
>
> >> >         processor.Init(tessdataFolder, language, oem);
>
> >> >         string text = "";
> >> >         unsafe
> >> >         {
> >> >             started = DateTime.Now;
> >> >             text = processor.Recognize(bmp);
> >> >             ended = DateTime.Now;
>
> >> >             Console.WriteLine("Duration recognition: {0} ms\n\n",
> >> >                 (ended - started).TotalMilliseconds);
> >> >         }
>
> >> >         Console.WriteLine(
> >> >             string.Format("RecognizeMode: {1}\nRecognized Text:\n{0}\n++++++++++++++++++++++++++++++++\n",
> >> >                 text, ((eOcrEngineMode)oem).ToString()));
>
> >> >     }
>
> >> > }
>
> >> > BTW, thx for writing a wrapper - if it works, it solves just about all
> >> > my problems. :)
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
