My original image file was 12600 × 6670 pixels, which is within the
maximum of 0x7FFF × 0x7FFF (32767 × 32767), and it contains about 7000
different characters. That is rather small according to the wiki, which
says you should have tens of thousands of characters for large
character sets.
Anyway, I chopped this big image into smaller ones of 500 characters
each and trained Tesseract with all of these pieces. The
above-mentioned error came up for just one piece, so I continued,
thinking that losing 500 characters is better than not being able to
train anything.
I merged the remaining .tr and .box files into one big .tr file and
one big .box file.
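
For anyone who wants to reproduce the merge step, here is a minimal
Python sketch of plain concatenation. It is only an illustration under
assumptions: the function name concatenate_files and the file names in
the usage note are hypothetical, and the files are treated as opaque
text, so it makes no claim about what mftraining expects from the
result.

```python
from pathlib import Path
from typing import Iterable


def concatenate_files(parts: Iterable[Path], merged: Path) -> int:
    """Append each part to `merged` in the given order.

    Returns the number of bytes written. Hypothetical helper, not part
    of the Tesseract training tools.
    """
    written = 0
    with merged.open("wb") as out:
        for part in parts:
            data = part.read_bytes()
            # If a part lacks a trailing newline, add one so its last
            # line does not run into the first line of the next part.
            if data and not data.endswith(b"\n"):
                data += b"\n"
            out.write(data)
            written += len(data)
    return written
```

A call such as concatenate_files(sorted(Path(".").glob("piece*.tr")),
Path("merged.tr")) would join all matching pieces; the output name
should not match the input pattern, otherwise the merged file would be
read as one of its own parts.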

Everything goes fine till I need to use mftraining.
The command:
mftraining -F font_properties -U unicharset -O jpn.unicharset
jpn.fontname.exp0.tr
causes "Error: Illegal short name for a feature!"
I tried this step with many different training images; the error
always appears, regardless of the size of the .tr file.
I should mention that until now I was working on Mac OS X Lion.

I tried the whole thing again on my Windows system, where I also get
the "Assertion failed" error but not the "Error: Illegal short name
for a feature!" error. Instead, mftraining just crashes when the .tr
file is too big (in my case 45MB). It crashes in
SetUpForFloat2Int(unicharset_training, ClassList). I also noticed that
the Microfeat file is written before the crash: when this file is
below 32MB the program continues normally, but when it is bigger
than 32MB it crashes.
Looks like a variable that is too small. Is this bug known? Have others
encountered similar problems with large character sets?
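
To check whether a given run falls on the crashing side of that
threshold, a small Python sketch like the following could be used. The
32MB figure is only my observation, not a documented limit, and I am
assuming it means 32 * 1024 * 1024 bytes; the function name
exceeds_observed_limit is hypothetical.

```python
import os

# Assumption: "32MB" means 32 * 1024 * 1024 bytes. This is the size at
# which the crash was observed, not an official Tesseract limit.
OBSERVED_LIMIT_BYTES = 32 * 1024 * 1024


def exceeds_observed_limit(path: str, limit: int = OBSERVED_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

Running this on the Microfeat file right after it is written would
show whether the crash really lines up with the file size.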

Is there actually an official character limit?
I saw that the original jpn.tessdata can recognize about 4000
different characters; are 7000 too many?
