Andreas,

By scaling the screenshots to a higher resolution, to about 300 DPI,
you'd likely get better results. VietOCR.NET has a Screenshot mode
that performs this rescaling. You may want to check it out.
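
If you would rather do the rescaling in your own code, something along these lines might serve as a starting point. It is only a sketch: ScreenshotScaler and ScaleForOcr are made-up names, not part of tesseractdotnet or VietOCR.NET, and it assumes the screenshot was captured at roughly 96 DPI, so a factor of 3 lands near 288 DPI, which is close enough to the 300 DPI target.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper for illustration; not part of tesseractdotnet or VietOCR.NET.
public static class ScreenshotScaler
{
    // Returns a new bitmap that is the source enlarged by the given factor.
    // The caller owns the returned bitmap and should dispose it.
    public static Bitmap ScaleForOcr(Bitmap source, float factor)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (factor <= 0) throw new ArgumentOutOfRangeException("factor");

        int width = (int)Math.Round(source.Width * factor);
        int height = (int)Math.Round(source.Height * factor);

        Bitmap scaled = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(scaled))
        {
            // Bicubic interpolation keeps glyph edges smoother than the default.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }
        return scaled;
    }
}
```

You would then pass the enlarged bitmap to processor.Recognize in place of the original, wrapping it in a using block so that it gets disposed. The factor and the interpolation mode are guesses worth experimenting with.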

I believe the language packs included in tesseractdotnet are
Tesseract's standard ones. The eng pack seems to work very well for
Windows' standard fonts. Check the site http://code.google.com/p/tesseract-ocr/
for more info.

Quan

On Jul 6, 12:13 pm, Andreas Reiff <[email protected]> wrote:
> Thanks, Sven!
>
> Actually, that is what I did, and it is working, somewhere between
> better and great (of course, my expectations go up as I see what is
> possible).
>
> I have also written my own convert-to-grayscale and max-contrast,
> allowing even for some increase above max values (putting some values
> close to max and min to black and white as well).
>
> It is working a lot better.
>
> I still get occasional bad results (though way less frequently).
>
> So, for anyone wanting to do this as well: scaling up the image by a
> factor of 3 and increasing the contrast improves recognition quality a
> lot.
>
> I wonder: is there any testdata for Windows with standard fonts? Or
> how to approach this?
>
> Best wishes
> Andreas
>
> On 6 Jul., 17:57, Sven Pedersen <[email protected]> wrote:
>
>
>
> > For screen captures it is necessary to increase the resolution. Since
> > they are usually 72-90 dpi, you must rescale them to 200-300 dpi; then
> > you'll see a drastic improvement in accuracy. I don't know anything
> > about the C# stuff though...
> > --Sven
>
> > On Wed, Jul 6, 2011 at 9:02 AM, Andreas Reiff <[email protected]> wrote:
> > > Hello Quan!
>
> > > That did the trick, many thanks!
>
> > > By the way, I am using unsafe, because it is in the example
> > > Simple1.cs.
>
> > > Apart from that, I would rather not use it, since it propagates up..
> > > and it doesn't prevent an application from crashing anyway.
>
> > > If you find the time, could you answer one more related question: I
> > > want to do screen text recognition, like text on menus, in notepad,
> > > and the like. Your testdata seems to be rather bad for this (now that
> > > it is running, I could test). How best to handle this? Create/get new
> > > testdata? Is it possible to use it without testdata at all?
>
> > > I would have expected screen recognition to be especially easy, since
> > > there is no noise. But then again, I have spent too little time to
> > > look into this yet.
>
> > > Best wishes,
> > > Andreas
>
> > > On 6 Jul., 14:05, Quan Nguyen <[email protected]> wrote:
> > >> Andreas,
>
> > > >> Try adding a trailing backslash to the data path, such as:
>
> > >> string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata\";
>
> > >> I'm curious as to why you use unsafe block in your code.
>
> > >> Quan
>
> > >> On Jul 6, 5:01 am, Andreas Reiff <[email protected]> wrote:
>
> > >> > I get an AccessViolationException, trying to adapt your code for my
> > >> > needs: Attempted to read or write protected memory. This is often an
> > >> > indication that other memory is corrupt.
>
> > >> > The code is more or less copied from your simple1 - my bitmap does not
> > >> > come out of a file but from a screenshot (part of the screen).
>
> > >> > public static void Recognize(Bitmap bmp)
> > >> > {
> > >> >     string language = "eng";
> > >> >     int oem = (int)eOcrEngineMode.OEM_DEFAULT;
>
> > >> >     using (TesseractProcessor processor = new TesseractProcessor())
> > >> >     {
> > >> >         DateTime started = DateTime.Now;
> > >> >         DateTime ended = DateTime.Now;
>
> > > >> >         string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata";
>
> > >> >         processor.Init(tessdataFolder, language, oem);
>
> > >> >         string text = "";
> > >> >         unsafe
> > >> >         {
> > >> >             started = DateTime.Now;
> > >> >             text = processor.Recognize(bmp);
> > >> >             ended = DateTime.Now;
>
> > > >> >             Console.WriteLine("Duration recognition: {0} ms\n\n",
> > > >> >                 (ended - started).TotalMilliseconds);
> > >> >         }
>
> > > >> >         Console.WriteLine(
> > > >> >             string.Format("RecognizeMode: {1}\nRecognized Text:\n{0}\n++++++++++++++++++++++++++++++++\n",
> > > >> >                 text, ((eOcrEngineMode)oem).ToString()));
>
> > >> >     }
>
> > >> > }
>
> > >> > BTW, thx for writing a wrapper - if it works, it solves just about all
> > > >> > my problems. :)
>
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > > http://groups.google.com/group/tesseract-ocr?hl=en
>
> > --
> > ``All that is gold does not glitter,
> >   not all those who wander are lost;
> > the old that is strong does not wither,
> >   deep roots are not reached by the frost.
> > From the ashes a fire shall be woken,
> >   a light from the shadows shall spring;
> > renewed shall be blade that was broken,
> >   the crownless again shall be king.”