Re: Tesseract training problems and dictionary problems

bo gao Wed, 24 Aug 2011 19:32:38 -0700

Does anyone know about this?

On Tue, Aug 23, 2011 at 8:13 PM, bo gao <[email protected]> wrote:


> In the training process, although I have all the files I need, I get some
> failure reports: "Couldn't find a matching blob".
> Is that normal?
>
> Thanks!
> ......
> APPLY_BOXES: boxfile line 28970/g ((20496,698),(20515,729)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES:
>    Boxes read from boxfile:   29046
>    Boxes failed resegmentation:     463
> ......
> APPLY_BOXES: Unlabelled word at :Bounding box=(5908,960)->(5944,971)
> APPLY_BOXES: Unlabelled word at :Bounding box=(690,962)->(761,994)
> APPLY_BOXES: Unlabelled word at :Bounding box=(2307,959)->(2345,972)
>    Found 28583 good blobs and 1026 unlabelled blobs in 0 words.
>    74 remaining unlabelled words deleted.
> TRAINING ... Font name = arial
> Generated training data for 5943 words
>
>
> On Tue, Aug 23, 2011 at 7:32 PM, bo gao <[email protected]> wrote:
>
>> Hi, All,
>>
>> For dictionary:
>>
>> I added a dictionary for Tesseract 3, but the output did not change.
>>
>> Then I tried to raise the parameters as described on the Wiki page:
>>
>> Try upping NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or
>> even 5.
>>
>> There is no NON_WERD or GARBAGE_STRING in dict/permute.cpp. Should I
>> refer to segment_penalty_garbage and segment_penalty_dict_nonword in
>> dict/dict.h instead?
>>
>> How can I give the dictionary more weight?
>>
>> For training:
>>
>> I used the 32 TIFF files, but after training the performance degraded, and
>> the traineddata file is much smaller. How can I improve the performance?
>> Has anyone trained a better classifier than the provided eng.traineddata?
>>
>> Thanks!
>> --
>>
>> Best,
>>
>> Bo
>>



-- 

Best,

Bo

