Dear Prof. Breuel,
Thank you for your informative reply. We are again
testing the performance of bpnet: we train on isolated characters, each
drawn in a single color, and then try to recognize the test images. We keep
experimenting with the bpnet parameters. However, we have yet to reach 100%
accuracy even when recognizing the training images. Could you please suggest
whether we are missing anything that is affecting the
recognition?
On Fri, Feb 6, 2009 at 8:30 PM, Thomas Breuel <[email protected]> wrote:
> Dear Mezhirov,
>> I need to know the procedure for training *"i"* using the
>> make_ConnectComponentSegmenter() method. When I provide
>> *i* as the transcription of the image, the Lua script
>> (train-bpnet-lines.lua) outputs this error message:
>>
>> *narray: index out of range in function addTrainingLine (at
>> train-bpnet-lines.lua file)*
>>
>> Well, I understand that since the connected component segmenter produces two
>> colors for the character image "i", we should provide two transcriptions for
>> it.
>
>
> There are two cases you need to distinguish.
>
> For training, you need a correct segmentation, plus the transcription. The
> only way to get a correct segmentation is by creating it by hand, or by
> using an alignment procedure. None of the segmentation methods in OCRopus
> will give you a correct segmentation in general.
>
> For recognition, you use one of the built-in segmenters. They generate an
> oversegmentation. CurvedCutSegmenter was designed for handwriting and works
> passably well for printed Western languages. It probably won't work well
> for Bangla. Connected component segmenter doesn't work well for anything
> other than very clean printed Western fonts. It is mostly there for control
> experiments.
>
>
>> However, at this moment I just want to know exactly what strategy you
>> follow to train "i". In our script (Bangla) there are many characters
>> that have a disjoint shape, and we need to settle on a common strategy for
>> training them, like the one you follow to train *"i"*.
>
>
> There are several different strategies you can use, and nobody knows what
> the best one is. You can divide characters into small parts and then train
> each small part, giving you a fairly small number of characters, or you can
> train larger pieces and have a larger character set.
>
> However, none of the built-in segmenters will likely work well for Bangla.
> The CurvedCutSegmenter might work well, if you modify it to do right cuts
> instead of left cuts (since you want to cut to the right of the vertical
> lines).
>
> The next version of OCRopus (soon) will have large character set training
> support. I think a simple segmentation plus large character set training
> will be important.
>
> Please see the discussion here:
>
> http://sites.google.com/site/ocropus/languages/devanagari-hindi-sanskrit
>
> So, the specific answer to your question is: if you want to train the
> letter "i" as the letter "i", then you need to ensure that all its pixels
> have the same color, and you need to transcribe it with exactly one
> character.
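The rule above (one color per character, one transcription character per color) can be checked before a sample is handed to the trainer. What follows is only a minimal sketch in Python, not OCRopus or train-bpnet-lines.lua code. It assumes the segmentation is available as a 2D list of integer labels with 0 as background; the function names are hypothetical and chosen for illustration.

```python
def count_segments(labels, background=0):
    """Count the distinct non-background labels (colors) in a 2D label image."""
    return len({v for row in labels for v in row if v != background})


def merge_segments(labels, old, new):
    """Return a copy in which every pixel labelled `old` is relabelled `new`."""
    return [[new if v == old else v for v in row] for row in labels]


def matches_transcription(labels, transcription, background=0):
    """True if there is exactly one segment per transcription character.

    Assumption: len() counts Unicode code points, which may differ from the
    number of visual characters in scripts such as Bangla.
    """
    return count_segments(labels, background) == len(transcription)


# Toy "i": a connected-component pass labels the dot (1) and the stem (2) separately.
i_cc = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 2, 0],
    [0, 2, 0],
]

print(matches_transcription(i_cc, "i"))     # False: two colors, one character

# Give the stem the same label as the dot, so "i" is a single segment.
i_fixed = merge_segments(i_cc, old=2, new=1)
print(matches_transcription(i_fixed, "i"))  # True: one color, one character
```

A mismatch between the two counts is one plausible reason for an index-out-of-range error when the training line is added, so a check like this may help locate bad samples early.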
>
> One more thing: there are many ways in which training can fail and throw an
> exception. Our code is exception safe, so if you get an exception in some
> main loop, you can simply continue training or processing the next image.
> There won't be any storage leak or undefined data structures (if there are,
> it's a bug and please report it).
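The "catch the exception and continue with the next image" pattern described above can be sketched as follows. This is Python for illustration only, not the Lua training script; `train_all` and the `add_training_line` callable passed to it are hypothetical stand-ins for whatever the real training call is.

```python
def train_all(samples, add_training_line):
    """Feed each (name, image, transcription) sample to the trainer.

    A sample that raises is recorded and skipped, and training continues
    with the next one. Returns (number_trained, list_of_failures).
    """
    trained = 0
    failed = []
    for name, image, transcription in samples:
        try:
            add_training_line(image, transcription)
            trained += 1
        except Exception as exc:  # keep going, but remember what failed
            failed.append((name, str(exc)))
    return trained, failed


def fake_trainer(image, transcription):
    """Stand-in trainer that rejects empty transcriptions."""
    if not transcription:
        raise IndexError("index out of range")


samples = [("a.png", None, "i"), ("b.png", None, ""), ("c.png", None, "k")]
trained, failed = train_all(samples, fake_trainer)
print(trained, failed)
```

Recording the failures instead of discarding them keeps the list of problem images available for later inspection or for a bug report.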
>
> Tom
>
Regards,
--
Hasnat
Center for Research on Bangla Language Processing (CRBLP)
http://mhasnat.googlepages.com/