OK, I won't create an issue for this for now, but the reason I brought it up is that I saw isspace fail last night on text that was out of range (the char value was -128).
When I changed isspace to iswspace, the problem was resolved. I'm no expert at this and I may well be fixing it incorrectly, but isspace was crashing, which meant I was unable to do any training. I can supply you with the tiff / box file if you want.

On Feb 9, 3:07 am, "Jimmy O'Regan" <[email protected]> wrote:
> On 9 February 2012 00:02, Wil Hadden <[email protected]> wrote:
> > Hi,
> >
> > I thought I should let you know of an issue I may have uncovered.
> >
> > In paragraphs.cpp in InitializeRowInfo there is a call to GetUTF8Text
> > followed by a while loop that uses isspace.
> >
> > The problem is that isspace expects a char, not utf8 and it can throw
> > an assert. Changing the isspace to iswspace fixes the issue on Windows
> > builds, you may need another solution for other platforms.
>
> No, you're imagining there's a problem where none exists, and your
> change might introduce a problem where none had existed because
> GetUTF8Text returns char*, not wchar_t* (UTF-8 is backwards compatible
> with ASCII, there's no problem with isspace).
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
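
For reference, here is a minimal sketch of the usual way to call isspace on UTF-8 bytes held in a char buffer. It assumes the failure described in this thread is the debug-runtime assertion that fires when isspace receives a negative value: where char is signed, any byte of 0x80 or above (such as the -128 mentioned) falls outside the range isspace is defined for. The helper names IsSpaceByte and CountLeadingSpaces are hypothetical and are not taken from the Tesseract source.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Tesseract.
// Casting to unsigned char maps bytes 0x80..0xFF (negative when char is
// signed) into 128..255, which is within the range isspace accepts.
inline bool IsSpaceByte(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: counts the whitespace bytes at the start of a
// UTF-8 encoded string, stopping at the first non-space byte.
inline std::size_t CountLeadingSpaces(const std::string& utf8) {
  std::size_t i = 0;
  while (i < utf8.size() && IsSpaceByte(utf8[i])) {
    ++i;
  }
  return i;
}
```

The text stays a char* and only the argument passed to isspace changes, so ASCII whitespace is still detected while the bytes of multi-byte UTF-8 sequences no longer reach isspace as negative values. Whether this matches the loop in InitializeRowInfo would need checking against the actual source.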

