Andreas,

By scaling the screenshots up to a higher resolution, about 300 DPI, you'd likely get better results. VietOCR.NET has a Screenshot mode that performs this rescaling; you may want to check it out.
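A minimal C# sketch of that rescaling step, assuming the screenshot is a `System.Drawing.Bitmap` (as in the `Recognize(Bitmap bmp)` method quoted further down) and a typical 96 DPI Windows display (Sven's message puts screens at 72-90 DPI). The `ScreenshotScaler` class and its parameters are hypothetical, not part of tesseractdotnet or VietOCR.NET; it only shows one way to enlarge the capture with bicubic interpolation before OCR.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

public static class ScreenshotScaler
{
    // Hypothetical helper: pixel size a capture needs so that text drawn at
    // sourceDpi ends up with roughly the density of a targetDpi scan.
    public static Size TargetSize(int width, int height, float sourceDpi, float targetDpi)
    {
        float factor = targetDpi / sourceDpi; // e.g. 96 -> 300 DPI is about 3.1x
        return new Size((int)Math.Round(width * factor), (int)Math.Round(height * factor));
    }

    // Returns a new, enlarged copy of the screenshot; the caller disposes it.
    public static Bitmap Upscale(Bitmap source, float sourceDpi = 96f, float targetDpi = 300f)
    {
        Size size = TargetSize(source.Width, source.Height, sourceDpi, targetDpi);
        var result = new Bitmap(size.Width, size.Height);

        // Record the intended resolution in the image metadata (assumption:
        // the OCR wrapper may or may not read it; the enlargement is what matters).
        result.SetResolution(targetDpi, targetDpi);

        using (Graphics g = Graphics.FromImage(result))
        {
            // Bicubic resampling keeps glyph edges smoother than nearest-neighbour.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, 0, 0, size.Width, size.Height);
        }
        return result;
    }
}
```

In the quoted `Recognize` method, one would pass the result of `ScreenshotScaler.Upscale(bmp)` to `processor.Recognize` instead of `bmp` itself, and dispose the enlarged bitmap afterwards. Andreas's fixed factor of 3 corresponds to roughly 96 to 300 DPI.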
I believe the language packs included in tesseractdotnet are Tesseract's standard ones. The eng pack seems to work very well for Windows' standard fonts. Check the site http://code.google.com/p/tesseract-ocr/ for more info.

Quan

On Jul 6, 12:13 pm, Andreas Reiff wrote:
> Thanks, Sven!
>
> Actually, that is what I did, and it is working, somewhere between
> better and great (of course, my expectations go up as I see what is
> possible).
>
> I have also written my own convert-to-grayscale and max-contrast,
> allowing even for some increase above max values (putting some values
> close to max and min to black and white as well).
>
> It is working a lot better.
>
> I still get occasional bad results (though way less frequently).
>
> So, for anyone wanting to do this as well: scaling up the image by a
> factor of 3 and increasing the contrast improves recognition quality a
> lot.
>
> I wonder: is there any testdata for Windows with standard fonts? Or
> how should I approach this?
>
> Best wishes
> Andreas
>
> On 6 Jul., 17:57, Sven Pedersen wrote:
> > For screen captures it is necessary to increase the resolution: since
> > it is usually 72-90dpi, you must rescale them to 200-300dpi, and then
> > you'll see a drastic improvement in accuracy. I don't know anything
> > about the C# stuff though...
> > --Sven
> >
> > On Wed, Jul 6, 2011 at 9:02 AM, Andreas Reiff
> > wrote:
> > > Hello Quan!
> > >
> > > That did the trick, many thanks!
> > >
> > > By the way, I am using unsafe because it is in the example
> > > Simple1.cs.
> > >
> > > Apart from that, I would rather not use it, since it propagates up...
> > > and it doesn't prevent an application from crashing anyway.
> > >
> > > If you find the time, could you answer one more related question: I
> > > want to do screen text recognition, like text on menus, in Notepad,
> > > and the like. Your testdata seems to be rather bad for this (now that
> > > it is running, I could test). How best to handle this? Create/get new
> > > testdata? Is it possible to use it without testdata at all?
> > >
> > > I would have expected screen recognition to be especially easy, since
> > > there is no noise. But then again, I have spent too little time
> > > looking into this so far.
> > >
> > > Best wishes,
> > > Andreas
> > >
> > > On 6 Jul., 14:05, Quan Nguyen wrote:
> > > > Andreas,
> > > >
> > > > Try adding a slash to the data path, such as:
> > > >
> > > > string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata\";
> > > >
> > > > I'm curious as to why you use an unsafe block in your code.
> > > >
> > > > Quan
> > > >
> > > > On Jul 6, 5:01 am, Andreas Reiff wrote:
> > > > > I get an AccessViolationException trying to adapt your code for my
> > > > > needs: Attempted to read or write protected memory. This is often an
> > > > > indication that other memory is corrupt.
> > > > >
> > > > > The code is more or less copied from your simple1 - my bitmap does not
> > > > > come out of a file but from a screenshot (part of the screen).
> > > > >
> > > > > public static void Recognize(Bitmap bmp)
> > > > > {
> > > > >     string language = "eng";
> > > > >     int oem = (int)eOcrEngineMode.OEM_DEFAULT;
> > > > >
> > > > >     using (TesseractProcessor processor = new TesseractProcessor())
> > > > >     {
> > > > >         DateTime started = DateTime.Now;
> > > > >         DateTime ended = DateTime.Now;
> > > > >
> > > > >         string tessdataFolder = @"D:\Temp\IPoVnOCRer\IPoVn\Test\Tessdata";
> > > > >
> > > > >         processor.Init(tessdataFolder, language, oem);
> > > > >
> > > > >         string text = "";
> > > > >         unsafe
> > > > >         {
> > > > >             started = DateTime.Now;
> > > > >             text = processor.Recognize(bmp);
> > > > >             ended = DateTime.Now;
> > > > >
> > > > >             Console.WriteLine("Duration recognition: {0} ms\n\n",
> > > > >                 (ended - started).TotalMilliseconds);
> > > > >         }
> > > > >
> > > > >         Console.WriteLine(
> > > > >             string.Format("RecognizeMode: {1}\nRecognized Text:\n{0}\n++++++++++++++++++++++++++++++++\n",
> > > > >                 text, ((eOcrEngineMode)oem).ToString()));
> > > > >     }
> > > > > }
> > > > >
> > > > > BTW, thx for writing a wrapper - if it works, it solves just about all
> > > > > my problems. :)
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
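Andreas's grayscale/max-contrast routine is not shown in the thread. One possible reading, sketched here in C#: convert each pixel to gray with the Rec. 601 luma weights, then stretch the observed range to 0-255 while forcing values within a `clip` margin of the darkest and brightest pixels to pure black or white. The `ContrastStretch` class, the weights, and the `clip` parameter are assumptions for illustration, not his code; the pixel bytes would in practice come from something like `Bitmap.LockBits`, which is omitted to keep the sketch self-contained.

```csharp
using System;

public static class ContrastStretch
{
    // Rec. 601 luma weights: a common (assumed) choice for RGB -> gray.
    public static byte ToGray(byte r, byte g, byte b)
    {
        double luma = 0.299 * r + 0.587 * g + 0.114 * b;
        return (byte)Math.Min(255.0, Math.Round(luma));
    }

    // Linear contrast stretch with clipping. Values within `clip` of the
    // darkest/brightest pixel become pure black/white; the rest are rescaled
    // linearly to fill 0..255 (integer division truncates).
    public static byte[] Stretch(byte[] gray, int clip)
    {
        var result = new byte[gray.Length];
        if (gray.Length == 0) return result;

        int min = 255, max = 0;
        foreach (byte v in gray)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        int lo = min + clip;
        int hi = max - clip;
        if (hi <= lo)
        {
            // Range too narrow to stretch: return an unchanged copy.
            Array.Copy(gray, result, gray.Length);
            return result;
        }

        for (int i = 0; i < gray.Length; i++)
        {
            int v = gray[i];
            if (v <= lo) result[i] = 0;
            else if (v >= hi) result[i] = 255;
            else result[i] = (byte)((v - lo) * 255 / (hi - lo));
        }
        return result;
    }
}
```

For example, `Stretch(new byte[] { 10, 20, 130, 240, 250 }, 10)` yields `{ 0, 0, 127, 255, 255 }`: the near-extreme values snap to black and white, and the mid-gray is spread across the full range, which is the effect Andreas describes combining with the 3x upscale.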

