This seems worth filing as an issue. Please refer to
http://code.google.com/p/tesseract-ocr/issues/list

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Fri, Aug 12, 2011 at 3:30 AM, Stane <[email protected]> wrote:
> My original image file was 12600 × 6670 pixels, which is within the maximum
> of 7FFFx7FFF, and contains about 7000 different characters.
> That is rather small according to the wiki, which says you should have
> tens of thousands of characters for large character sets.
> Anyway, I chopped this big image into smaller ones of 500 characters each
> and trained tesseract with all these pieces. The above-mentioned
> error came up for just one piece. So I continued, thinking that losing
> 500 characters is better than not being able to train anything.
> I merged the remaining .tr files and .box files into one big .tr
> file and one big .box file.
>
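The size figures above are easy to sanity-check. A minimal sketch in Python, assuming the 7FFF maximum quoted above applies to each dimension separately; `fits_limit` and `pieces_needed` are hypothetical helper names, not part of Tesseract:

```python
import math

MAX_DIM = 0x7FFF  # 32767, the per-dimension maximum quoted in the message


def fits_limit(width, height, limit=MAX_DIM):
    """Return True if both image dimensions are within the limit."""
    return 0 < width <= limit and 0 < height <= limit


def pieces_needed(total_chars, chars_per_piece):
    """Number of images needed when splitting into fixed-size pieces."""
    return math.ceil(total_chars / chars_per_piece)


print(fits_limit(12600, 6670))   # dimensions from the message
print(pieces_needed(7000, 500))  # about 7000 characters, 500 per image
```

With the numbers from the message, the image is within the limit and the split gives 14 pieces, so dropping one piece loses roughly 7% of the characters.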
> Everything goes fine until I need to use mftraining.
> The command:
> mftraining -F font_properties -U unicharset -O jpn.unicharset
> jpn.fontname.exp0.tr
> causes "Error: Illegal short name for a feature!"
> I tried this step with many different training images; the error
> always appears, regardless of the size of the .tr file.
> I should mention that until now I was working on Mac OS X Lion.
>
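One way to narrow this down is to run the same command once per .tr piece instead of on the merged file, and see whether any piece trains cleanly. A rough sketch in Python, using only the flags from the command above; the helper names and the example file names are hypothetical, and it is an assumption that mftraining signals the error through its exit code or stderr:

```python
import subprocess


def mftraining_command(tr_files, lang="jpn"):
    """Build the mftraining argument list using the flags from the message."""
    return ["mftraining", "-F", "font_properties", "-U", "unicharset",
            "-O", f"{lang}.unicharset", *tr_files]


def run_each(tr_files):
    """Run mftraining once per .tr file; collect exit code and stderr."""
    results = {}
    for tr in tr_files:
        proc = subprocess.run(mftraining_command([tr]),
                              capture_output=True, text=True)
        results[tr] = (proc.returncode, proc.stderr.strip())
    return results


if __name__ == "__main__":
    # Hypothetical file names; substitute the real pieces.
    print(mftraining_command(["jpn.fontname.exp0.tr", "jpn.fontname.exp1.tr"]))
```

`run_each` needs mftraining on the PATH and raises FileNotFoundError otherwise. If the exit code turns out not to reflect the error, inspect the collected stderr text instead.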
> I tried the whole thing again on my Windows system, where I also get
> the "Assertion failed" error, but not the "Error: Illegal short name
> for a feature!" error. Instead, mftraining just crashes when the .tr
> file is too big (in my case 45MB). It crashes in
> SetUpForFloat2Int(unicharset_training, ClassList). I also noticed that
> the Microfeat file is written before the crash: when this file is
> below 32MB the program continues normally, but when it is bigger
> than 32MB it crashes.
> It looks like a variable that is too small. Is this bug known? Have
> others encountered similar problems with large character sets?
>
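For an issue report it helps to state exact byte counts rather than rounded megabytes. A small sketch in Python that compares file sizes against the 32MB figure; the threshold is only the observation from the message, not a documented Tesseract limit, and whether "MB" there means 2^20 or 10^6 bytes is an assumption:

```python
import os

# 32 MB as observed in the message, assumed here to mean 32 * 2**20 bytes.
THRESHOLD_BYTES = 32 * 1024 * 1024


def exceeds_threshold(size_bytes, threshold=THRESHOLD_BYTES):
    """Return True if the size is strictly above the threshold."""
    return size_bytes > threshold


def report(paths):
    """Print the exact size of each file and which side of 32 MB it is on."""
    for path in paths:
        size = os.path.getsize(path)
        side = "over" if exceeds_threshold(size) else "not over"
        print(f"{path}: {size} bytes ({side} 32 MB)")
```

Calling `report` on the Microfeat file from a run that works and from one that crashes would show how close to the boundary the failure starts.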
> Is there actually an official character limit?
> I saw that the original jpn.tessdata can recognize about 4000
> different characters. Are 7000 too many?
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
