An easier solution would be to post-process with regular expressions.

--Sven
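A minimal sketch of that kind of post-processing, in Python. It assumes the scrollbar shows up in the OCR output as stray { } | characters, which are the only ones named in the thread below; the function name and the character set are hypothetical and would need tuning against real images.

```python
import re

# Assumption: characters that scrollbar artifacts tend to be misread as.
# The thread only names { } and |; extend this class for your own images.
SCROLLBAR_NOISE = re.compile(r"[{}|]+")


def strip_scrollbar_noise(ocr_text: str) -> str:
    """Remove scrollbar-like characters from OCR output, line by line."""
    cleaned_lines = []
    for line in ocr_text.splitlines():
        # Drop every run of the suspected scrollbar characters.
        line = SCROLLBAR_NOISE.sub("", line)
        # Collapse the whitespace left behind and trim both ends of the line.
        line = re.sub(r"[ \t]+", " ", line).strip()
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = "12345 |\n678 {\n90 }"
    print(strip_scrollbar_noise(sample))
```

This only works if the banned characters never occur in the text you actually want to read, which seems to hold for the digits-only case described in the thread.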
On Friday, November 18, 2011, speeder <[email protected]> wrote:
> Unfortunately, I do not know the scrollbar positions. The OCR is done on a
> picture taken manually with an iPhone, from another phone, so it may or may
> not have a scrollbar, and the scrollbar may or may not be in a certain place.
>
> On Fri, Nov 18, 2011 at 3:04 PM, WalterA <[email protected]> wrote:
>>
>> I may be wrong, but I don't believe that setting a blacklist to the
>> character(s) often recognized in the scroll bar position will work,
>> since it will just force the engine to interpret the perceived
>> character as something else. However, since you probably know the
>> position of the scrollbars, you might want to use a rectangular input
>> region definition to include everything but the scrollbar area. Look
>> at TessBaseAPI::TesseractRect() or TessBaseAPI::SetRectangle() in the
>> baseapi.h header.
>>
>> -Walter
>>
>>
>> On Nov 17, 5:05 am, speeder <[email protected]> wrote:
>> > I used a whitelist to detect only numbers, and noticed that all characters
>> > are read as numbers.
>> >
>> > I assume a blacklist does the opposite, and makes a character get read as
>> > something else (i.e. if you blacklist lowercase L, then it will be read as
>> > uppercase "i").
>> >
>> > How do I actually ban a character from being read? (i.e. see my post about
>> > scrollbars... I think if I ban { } | and some other characters, the
>> > scrollbars will not bother me again.)
>> >
>> > Maurício Gomes
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

