I experimented with the number of Kannada tif files used for mftraining,
cntraining, and generating the unicharset.
The attached rtf file is self-explanatory.
I observed that when 25, 26, or more tif files are used, the *error*
"X classes in inttemp while unicharset contains Y unichars" is generated,
whereas when 24 or fewer tif files are used there is *no error*. In this
case, the question of apply-box failures or overlap does not arise.
According to the wiki instructions, tesseract does not support more than 32
files. How can this restriction of 32 files be overcome?
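
To narrow down where the counts start to diverge, the distinct labels in the
box files can be tallied file by file. The following is only a minimal sketch
in Python, not part of tesseract: the function names are hypothetical, and it
assumes each box file line has the layout "<label> <left> <bottom> <right>
<top>" with an optional page number, encoded as UTF-8. The cumulative total
may not match Y exactly, since unicharset can contain entries that do not come
from the box files.

```python
from pathlib import Path


def box_labels(box_text):
    """Return the set of labels (first field of each line) in box-file text."""
    labels = set()
    for line in box_text.splitlines():
        fields = line.split()
        # Assumed layout: <label> <left> <bottom> <right> <top> [<page>]
        if len(fields) >= 5:
            labels.add(fields[0])
    return labels


def labels_per_file(directory, pattern="*.box"):
    """Map each box file name in `directory` to its set of labels."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        result[path.name] = box_labels(path.read_text(encoding="utf-8"))
    return result


def cumulative_counts(per_file):
    """Return (file name, labels in file, distinct labels so far) per file."""
    seen = set()
    rows = []
    for name, labels in per_file.items():
        seen |= labels
        rows.append((name, len(labels), len(seen)))
    return rows
```

Calling cumulative_counts(labels_per_file("path/to/boxfiles")) (the directory
is a placeholder) lists how the number of distinct labels grows as each file
is added, which can be compared with the X and Y values in the error message
for the 24-file and 25-file runs.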
With regards,
-sriranga(77yrsold)

On Mon, Mar 22, 2010 at 9:13 PM, 74yrs old <[email protected]> wrote:

> Does the message "APPLY_BOXES: Unlabelled word blk: *X* row:* Y* allrows*:Y*"
> cause the error noted below**, or is it related to the inttemp and
> unicharset data files? If so, how can the APPLY_BOXES error be overcome or
> avoided?
> **Error: X classes in inttemp while unicharset contains Y unichars.
> APPLY_BOXES: Unlabelled word blk:1 row:5 allrows:5
> APPLY_BOXES: Unlabelled word blk:1 row:10 allrows:10
> APPLY_BOXES: Unlabelled word blk:1 row:12 allrows:12
> APPLY_BOXES: Unlabelled word blk:1 row:13 allrows:13
> APPLY_BOXES: Unlabelled word blk:1 row:14 allrows:14
> APPLY_BOXES: Unlabelled word blk:1 row:16 allrows:16
> APPLY_BOXES: Unlabelled word blk:1 row:18 allrows:18
> APPLY_BOXES: Unlabelled word blk:1 row:18 allrows:18
> APPLY_BOXES: Unlabelled word blk:1 row:23 allrows:23
> APPLY_BOXES: Unlabelled word blk:1 row:26 allrows:26
> APPLY_BOXES: Unlabelled word blk:1 row:26 allrows:26
> APPLY_BOXES: Unlabelled word blk:1 row:27 allrows:27
> APPLY_BOXES: Unlabelled word blk:1 row:28 allrows:28
> APPLY_BOXES: Unlabelled word blk:1 row:29 allrows:29
> APPLY_BOXES: Unlabelled word blk:1 row:34 allrows:34
> APPLY_BOXES: Unlabelled word blk:1 row:36 allrows:36
> APPLY_BOXES: Unlabelled word blk:1 row:38 allrows:38
> APPLY_BOXES:
>    Boxes read from boxfile:     628
>    Initially labelled blobs:    628 in 41 rows
>    Box failures detected:             0
>    Duped blobs for rebalance:     0
>    "ಪ್ರೀ" has fewest samples:     1
>                 Total unlabelled words:       17
>                 Final labelled words:        628
> Generating training data
> TRAINING ... Font name = UnknownFont.
> Generated training data for 628 blobs
>
> Valuable guidance is solicited.
> -sriranga(77yrsold)
>


Attachment: error X and Y.rtf
Description: RTF file
