Re: German letters and Symbols in Whitelist

Ray Smith Thu, 12 Mar 2009 16:48:14 -0700

This looks like a Latin-1 vs. UTF-8 issue. You need to use
"\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc\u00df\u20ac"
which most compilers accept and convert to UTF-8.
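
If you would rather not rely on how the compiler handles \u escapes, you can
build the UTF-8 bytes yourself. The following is only a minimal sketch, not
code from Tesseract or tessnet2: append_utf8 and german_whitelist_utf8 are
hypothetical helper names, and it assumes the whitelist is expected as a
UTF-8 encoded byte string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// (Hypothetical helper, not part of Tesseract.)
void append_utf8(std::uint32_t cp, std::string& out) {
    // Reject values outside Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: covers the umlauts and sharp s
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: covers the euro sign
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: everything above U+FFFF
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// The eight characters from the escaped string above, encoded as UTF-8.
std::string german_whitelist_utf8() {
    const std::uint32_t code_points[] = {
        0x00E4, 0x00C4, 0x00F6, 0x00D6,    // a-umlaut, A-umlaut, o-umlaut, O-umlaut
        0x00FC, 0x00DC, 0x00DF, 0x20AC     // u-umlaut, U-umlaut, sharp s, euro sign
    };
    std::string out;
    for (std::uint32_t cp : code_points) {
        append_utf8(cp, out);
    }
    return out;
}
```

In UTF-8 the euro sign takes three bytes and each umlaut takes two, whereas
Latin-1 uses one byte per umlaut and has no euro sign at all, so a length
that does not match the encoding is a plausible sign of this kind of
mismatch.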
Ray.


On Wed, Feb 11, 2009 at 3:24 AM, paulwesterkamp <
[email protected]> wrote:

>
> Me again.
>
> I use Visual Studio 2005. I tried using tessnet2.dll,
> but when I use one of these symbols [ä Ä ö Ö ü Ü ß €] in the whitelist
> via
>
> ocr.SetVariable("tessedit_char_whitelist", "äÄ0123€");
>
> the application crashes and throws an Assertion Failed error at line
> 76 of unicharset.cpp:
>
> assert(ids.contains(unichar_repr, length));
>
> Stepping through in the debugger, I found that it passes
> unichar_repr = € and length = 2, so the contains method returns false.
> When length is 2 it sets current_nodes to the child nodes, but these
> are deleted in
>
> UNICHARMAP::UNICHARMAP_NODE::~UNICHARMAP_NODE() {
>  if (children != 0) {
>    delete[] children;
>  }
> }
>
> So "return current_nodes != 0" always returns false,
> and so the assert fails.
> >
>

