2009/2/27 Albert Law <[email protected]>:
>
> Hi,
>
> Yes, I would also like to find the "don't suck button". Just kidding.
>
> It just sounds like typical OCR problems. Being a human I can figure out 0
> from O from o from Q from @. But for a computer to do
> so is hard especially with small DPIs and with font modifiers (e.g. bold and
> italics). So I would just accept it as reality and add
> a spell checker of sorts to scan the output.
>
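As a rough illustration of the "spell checker of sorts" idea, here is a minimal Python sketch that flags likely scannos. It undoes one common OCR character confusion at a time (e.g. "rn" read as "m", "0" read as "o") and reports the word if a variant is in a word list. It also lets you mark some real words as suspicious, since a correctly spelled word such as 'modem' can itself be a misread 'modern'. The confusion table and the `candidates`/`flag_scannos` names are hypothetical; this is not how Gutcheck or Tesseract implement anything.

```python
import re

# Hypothetical table of (as_read, probably_meant) OCR confusions, lowercase only.
CONFUSIONS = [("m", "rn"), ("rn", "m"), ("0", "o"), ("1", "l"), ("vv", "w"), ("cl", "d")]


def candidates(word):
    """Return every variant of `word` produced by undoing one confusion at one position."""
    out = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            out.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    out.discard(word)
    return out


def flag_scannos(text, dictionary, suspicious=frozenset()):
    """Return (word, [suggestions]) for words that are unknown or marked suspicious
    and have at least one single-confusion variant in `dictionary`."""
    flags = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        lower = word.lower()
        if lower in dictionary and lower not in suspicious:
            continue  # a known, trusted word: leave it alone
        fixes = sorted(c for c in candidates(lower) if c in dictionary)
        if fixes:
            flags.append((word, fixes))
    return flags


words = {"modern", "modem", "the", "world", "is", "a", "one"}
print(flag_scannos("The modem world is a m0dern one", words, suspicious={"modem"}))
```

With 'modem' marked suspicious, both 'modem' and 'm0dern' are reported with 'modern' as the suggestion; without that mark, 'modem' passes because it is a real word. A real checker would need a much larger confusion table and a proper dictionary.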
Gutcheck (http://gutcheck.sourceforge.net/) is a tool for catching common scannos. It's designed for use at Project Gutenberg/Distributed Proofreaders, though, so it has some specific checks to find words that wouldn't have occurred in old text: for example, it will flag 'modem' as a scanno for 'modern'.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---

