Actually, there's an issue already on this point:
http://code.google.com/p/tesseract-ocr/issues/detail?id=455&sort=-id
I don't see any progress on it, though.

Warm regards,
Dmitri Silaev

On Thu, Mar 31, 2011 at 7:55 AM, patrickq wrote:
> Upon further experimentation I think I found out that the whole
> whitelist is rendered irrelevant whenever a character in the blacklist
> is NOT in the training set ... this is crazy of course but it appears
> to be the case, as if the code handling this list decides to stop
> processing the list if one of the characters is not in the training
> set in the first place.
>
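
If that hypothesis holds, one possible workaround is to drop from the
blacklist any character the training set does not know before passing the
string on. The following is only a sketch under that assumption, not
confirmed Tesseract behavior. FilterBlacklist, Utf8Length and the "known"
set are hypothetical names introduced for illustration; how "known" gets
filled (for example from the language's unicharset) is left open here.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Byte length of the UTF-8 sequence that starts with lead byte b.
// Returns 1 for ASCII and also for a stray continuation byte, so the
// caller always advances.
static std::size_t Utf8Length(unsigned char b) {
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Keep only the characters of `blacklist` that appear in `known`.
// ASSUMPTION: `known` is a hypothetical stand-in for the set of characters
// present in the training set; it is not part of the Tesseract API.
std::string FilterBlacklist(const std::string& blacklist,
                            const std::set<std::string>& known) {
  std::string kept;
  for (std::size_t i = 0; i < blacklist.size();) {
    const std::size_t len =
        Utf8Length(static_cast<unsigned char>(blacklist[i]));
    // substr() clamps at the end of the string if the input is truncated.
    const std::string ch = blacklist.substr(i, len);
    if (known.count(ch) > 0) kept += ch;
    i += len;
  }
  return kept;
}
```

The filtered result would then be passed on with something like
myTess->SetVariable("tessedit_char_blacklist", filtered.c_str()).
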
> On Mar 30, 10:33 pm, patrickq wrote:
>> I am trying to provide a black list with UTF8 characters specified
>> using their byte codes, as follows:
>>
>>         // U+FB00       ff     ef ac 80        LATIN SMALL LIGATURE FF
>>         // U+FB01       fi     ef ac 81        LATIN SMALL LIGATURE FI
>>
>>         myTess->SetVariable("tessedit_char_blacklist", "\xef\xac\x80\xef\xac\x81");
>>
>> But this doesn't work. I tried "\x0ef\x0ac\x080" (adding a leading 0)
>> but got the same result. The call doesn't return an error but the
>> characters in question are not blacklisted.
>>
>> Is this string variable not in UTF8 format? Is there a problem in the
>> C syntax I used to provide the hex codes?
>>
>> Thanks!
>> Patrick
>
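
On the C syntax question: as far as the language goes, the literal looks
fine. A hex escape consumes every hex digit that follows it, so "\x0ef" is
still the single byte 0xEF, and both spellings produce the same bytes. A
quick way to confirm this is to dump the bytes of the literal. The sketch
below does that; HexDump and kLigatureBlacklist are illustrative names and
not part of Tesseract.

```cpp
#include <cstdio>
#include <string>

// Blacklist literal from the original post: U+FB00 (ff ligature) and
// U+FB01 (fi ligature), each encoded as three UTF-8 bytes.
static const char kLigatureBlacklist[] = "\xef\xac\x80\xef\xac\x81";

// Render every byte of s as two lowercase hex digits, separated by spaces.
std::string HexDump(const std::string& s) {
  std::string out;
  char buf[4];
  for (unsigned char c : s) {
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(c));
    if (!out.empty()) out += ' ';
    out += buf;
  }
  return out;
}
```

Printing HexDump(kLigatureBlacklist) should show the six bytes
ef ac 80 ef ac 81. The one case to watch for is a hex escape followed
directly by a literal hex digit (0-9, a-f), which gets absorbed into the
escape; that does not happen in this string.
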
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>
