An easier solution would be to post-process with regular expressions.
--Sven
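
A minimal sketch of what such post-processing might look like, in Python. The set of noise characters (the { } | mentioned further down in the thread, plus square brackets) and the function names are assumptions for illustration, not something taken from Tesseract itself; the input is just the plain text the OCR engine returned.

```python
import re

# Assumed noise set: characters a scrollbar is often misread as.
# Adjust this to whatever actually shows up in your OCR output.
SCROLLBAR_NOISE = re.compile(r"[{}|\[\]]")


def strip_scrollbar_noise(ocr_text):
    """Drop the noise characters, then trim trailing whitespace per line."""
    cleaned = SCROLLBAR_NOISE.sub("", ocr_text)
    return "\n".join(line.rstrip() for line in cleaned.splitlines())


def extract_numbers(ocr_text):
    """Return every run of digits left after the noise is removed."""
    return re.findall(r"\d+", strip_scrollbar_noise(ocr_text))
```

This only helps if OCR is run without a digits-only whitelist, since with the whitelist the scrollbar would already have been turned into digits before the regular expression ever sees it.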

On Friday, November 18, 2011, speeder <[email protected]> wrote:
> Unfortunately, I do not know the scrollbar positions. The OCR input is a
> photo taken by hand with an iPhone of another phone's screen, so it may or
> may not contain a scrollbar, and the scrollbar may or may not be in any
> particular place.
>
> On Fri, Nov 18, 2011 at 3:04 PM, WalterA <[email protected]> wrote:
>>
>> I may be wrong, but I don't believe that setting a blacklist to the
>> character(s) often recognized in the scroll bar position will work,
>> since it will just force the engine to interpret the perceived
>> character as something else.  However, since you probably know the
>> position of the scrollbars, you might want to use a rectangular input
>> region definition to include everything but the scrollbar area.  Look
>> at TessBaseAPI::TesseractRect() or TessBaseAPI::SetRectangle() in the
>> baseapi.h header.
>>
>> -Walter
>>
>>
>> On Nov 17, 5:05 am, speeder <[email protected]> wrote:
>> > I used a whitelist to detect only numbers, and noticed that all
>> > characters are read as numbers.
>> >
>> > I assume a blacklist does the opposite and makes a character get read
>> > as something else (i.e., if you blacklist lowercase L, it will be read
>> > as uppercase "i").
>> >
>> > How do I actually ban a character from being read? (See my post about
>> > scrollbars. I think if I ban { } | and some other characters, the
>> > scrollbars will not bother me again.)
>> >
>> > Maurício Gomes
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>

-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
