Seems to be worth posting an issue. Please refer to http://code.google.com/p/tesseract-ocr/issues/list
Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Fri, Aug 12, 2011 at 3:30 AM, Stane <[email protected]> wrote:
> My original image file was 12600 × 6670 in size, which is within the max.
> of 7FFFx7FFF, and contains about 7000 different chars.
> That is kind of small according to the wiki, which says you should have
> tens of thousands of chars for large char sets.
> Anyway, I chopped this big image into smaller ones of 500 chars per image,
> I trained tesseract with all these pieces, and the above mentioned
> error came up for just one piece. So I continued, thinking that losing
> 500 chars is better than not being able to train anything.
> I merged the remaining .tr files and .box files into one big .tr
> and .box file.
>
> Everything goes fine until I need to use mftraining.
> The command:
> mftraining -F font_properties -U unicharset -O jpn.unicharset
> jpn.fontname.exp0.tr
> causes "Error: Illegal short name for a feature!"
> I tried this step with many different training images; the error
> always appears, regardless of the size of the .tr file.
> I should mention that until now I was working on MacOS Lion.
>
> I tried the whole thing again on my Windows system, where I also get
> the "Assertion failed" error but not the "Error: Illegal short name
> for a feature!" Instead, mftraining just crashes when the .tr
> file is too big (in my case 45MB). It crashes in
> SetUpForFloat2Int(unicharset_training, ClassList). I also noticed that
> the Microfeat file is written before it crashes. When this file is
> below 32MB the program continues normally, but if this file is bigger
> than 32MB it crashes.
> Looks like a variable that is too small. Is this bug known? Have others
> encountered similar problems with large char sets?
>
> Is there actually an official char limit?
> I saw that the original jpn.tessdata can recognize about 4000
> different chars; are 7000 too many?
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
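
A minimal Python sketch of the sanity checks described in the quoted message, which might be useful to attach to an issue report. It checks image dimensions against the 7FFF-per-side limit, concatenates .tr files, and flags a merged size above the 32MB point where the poster saw mftraining crash. The function names are hypothetical, plain concatenation is only an assumption about how the .tr files were merged, and the 32MB figure is the poster's observation, not a documented Tesseract limit.

```python
import os

# Per-side image limit quoted in the message (0x7FFF = 32767).
MAX_DIM = 0x7FFF

# Size above which the poster observed crashes on Windows.
# This is an observation from the thread, not a documented limit.
OBSERVED_CRASH_BYTES = 32 * 1024 * 1024


def image_within_limit(width, height, max_dim=MAX_DIM):
    """Return True if both sides fit within the quoted 7FFF limit."""
    return 0 < width <= max_dim and 0 < height <= max_dim


def merge_tr_files(paths, out_path):
    """Concatenate the given files in order and return the merged size.

    Assumption: "merging" .tr files means plain concatenation, with a
    newline added between files when one does not end with a newline.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return os.path.getsize(out_path)


def exceeds_observed_threshold(size_bytes, threshold=OBSERVED_CRASH_BYTES):
    """Return True if a file size is above the poster's 32MB observation."""
    return size_bytes > threshold
```

With the numbers from the message, `image_within_limit(12600, 6670)` is true, and a 45MB file makes `exceeds_observed_threshold` return true, which matches the situation the poster describes.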

