Upon further experimentation, I think I have found that the whole
whitelist is rendered irrelevant whenever a character in the blacklist
is NOT in the training set. This is crazy, of course, but it appears
to be the case: it is as if the code handling this list decides to stop
processing the list if one of the characters is not in the training
set in the first place.

On Mar 30, 10:33 pm, patrickq <[email protected]> wrote:
> I am trying to provide a blacklist with UTF8 characters specified
> using their byte codes, as follows:
>
>         // U+FB00       ff     ef ac 80        LATIN SMALL LIGATURE FF
>         // U+FB01       fi     ef ac 81        LATIN SMALL LIGATURE FI
>
>         myTess->SetVariable("tessedit_char_blacklist", "\xef\xac\x80\xef\xac\x81");
>
> But this doesn't work. I tried "\x0ef\x0ac\x080" (adding a leading 0)
> but same result. The call doesn't return an error but the characters
> in question are not blacklisted.
>
> Is this string variable not in UTF8 format? Is there a problem in the
> C syntax I used to provide the hex codes?
>
> Thanks!
> Patrick
