Re: Tesseract training problems and dictionary problems

bo gao Tue, 23 Aug 2011 23:41:03 -0700

During training, although I have all the files I need, I get a number of
failure reports: "Couldn't find a matching blob".
Is that normal?

Thanks!
......
APPLY_BOXES: boxfile line 28970/g ((20496,698),(20515,729)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:   29046
   Boxes failed resegmentation:     463
......
APPLY_BOXES: Unlabelled word at :Bounding box=(5908,960)->(5944,971)
APPLY_BOXES: Unlabelled word at :Bounding box=(690,962)->(761,994)
APPLY_BOXES: Unlabelled word at :Bounding box=(2307,959)->(2345,972)
   Found 28583 good blobs and 1026 unlabelled blobs in 0 words.
   74 remaining unlabelled words deleted.
TRAINING ... Font name = arial
Generated training data for 5943 words
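
To put the numbers above in proportion, here is a minimal sketch that pulls
the two counts out of the APPLY_BOXES summary and computes the share of boxes
that failed resegmentation (roughly 1.6% for this run). It is not part of
Tesseract; the function name apply_boxes_summary is made up for illustration,
and it assumes the summary lines are worded exactly as in the log above.

```python
import re


def apply_boxes_summary(log_text):
    """Return (boxes_read, boxes_failed, failure_rate) from an APPLY_BOXES log.

    Assumes the summary lines look like the ones in the log above.
    """
    read = re.search(r"Boxes read from boxfile:\s*(\d+)", log_text)
    failed = re.search(r"Boxes failed resegmentation:\s*(\d+)", log_text)
    if not read or not failed:
        raise ValueError("APPLY_BOXES summary not found in log")
    n_read = int(read.group(1))
    n_failed = int(failed.group(1))
    # Guard against an empty box file to avoid dividing by zero.
    rate = n_failed / n_read if n_read else 0.0
    return n_read, n_failed, rate


# Summary lines copied from the training log above.
sample = """APPLY_BOXES:
   Boxes read from boxfile:   29046
   Boxes failed resegmentation:     463
"""

n_read, n_failed, rate = apply_boxes_summary(sample)
print(f"{n_failed} of {n_read} boxes failed ({rate:.2%})")
```

Running the same function over the logs of each training page would show
whether the failures are spread evenly or concentrated in a few images.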

On Tue, Aug 23, 2011 at 7:32 PM, bo gao <[email protected]> wrote:

> Hi, All,
>
> For dictionary:
>
> I added a dictionary for Tesseract 3, but I did not see the output change.
>
> Then I tried to raise the parameters as described on the Wiki page:
>
> Try upping NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or
> even 5.
>
> There is no NON_WERD or GARBAGE_STRING in dict/permute.cpp. Should I use
> segment_penalty_garbage and segment_penalty_dict_nonword in dict/dict.h
> instead?
>
> How can I give the dictionary more weight?
>
> For training:
>
> I used the 32 TIFF files, but after training the performance degraded, and
> the traineddata file is much smaller. How can I improve the performance?
> Has anyone trained a better classifier than the provided eng.traineddata?
>
> Thanks!
> --
>
> Best,
>
> Bo
>

-- 

Best,

Bo

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
