Hello,

After reading some documentation about FSTs in general, and about OpenFST, I
tried to build my own simple language model.
After many attempts I still have not gotten any result.
Following the www.openfst.org documentation
and some other tutorials, such as
http://www.stringology.org/event/CIAA2007/pres/Tue2/Riley.pdf
I tried to build an FST for my recognition task.
I still have a lot of doubts about how OCRopus uses the FST.

1) If I have to recognize A, should the input label be just A?
2) On what principle is the weight defined?

Anyway, I notice that all the final weights in the default.fst file are
really big, like 1.99999994e+38.
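To make question 2 concrete, here is a minimal Python sketch. It assumes OCRopus interprets weights like OpenFST's default tropical semiring (plausible, since OCRopus uses the same FST representation as OpenFST, but still an assumption). In the tropical semiring a weight is a cost, often a negative log probability: the costs along a path are added, the final weight of the state where the path ends is added too, and among competing paths the smallest total cost wins. The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def prob_to_cost(p: float) -> float:
    """Turn a probability into a tropical-semiring cost (negative natural log)."""
    return -math.log(p)


def path_cost(arc_weights: List[float], final_weight: float) -> float:
    """Tropical 'times' is ordinary addition: arc costs plus the final-state cost."""
    return sum(arc_weights) + final_weight


def best_cost(paths: List[Tuple[List[float], float]]) -> float:
    """Tropical 'plus' is min: the cheapest complete path is preferred."""
    return min(path_cost(arcs, final) for arcs, final in paths)


if __name__ == "__main__":
    arcs = [0.6, 0.6]  # the two A:A arcs of the transducer in this message
    print(path_cost(arcs, 0.0))             # a final weight of 0 adds nothing
    print(path_cost(arcs, 1.99999994e+38))  # a huge final weight dominates the total
    print(round(prob_to_cost(0.5), 4))      # probability 0.5 -> cost of about 0.6931
    print(best_cost([(arcs, 0.0), (arcs, 1.0)]))  # picks the cheaper of the two paths
```

Under this assumption, a final weight of 1.99999994e+38 adds an enormous cost to every path that ends in that state, while a final weight of 0 adds nothing.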

I did a lot of tests, but I report just the last one.
The other tests produced either an empty string or a strange character
(a little square with four little numbers inside, arranged 2x2).

The last test:
I tried to recognize a really simple text: AA
(just one letter, which is normally recognized well).

To do that, I built a really simple FST.

For the input symbols (test.isyms file):

    A 1

For the output symbols (test.osyms file):

    A 1

The textual representation of the transducer file:

    0 1 A A 0.6
    1 2 A A 0.6
    2 1.99999994e+38

This gives the FST shown in the attached figure (test.png).
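For comparison, here is a small Python sketch that regenerates the transducer above with two labeled assumptions: OpenFST conventionally reserves label 0 for epsilon, so each symbol table starts with `<eps> 0`, and the final weight is set to 0 (the tropical semiring's "One", meaning no extra cost) instead of 1.99999994e+38. The sketch does not settle whether OCRopus expects symbol names or numeric character codes as labels, and the function names are hypothetical.

```python
from typing import List


def symbol_table_text(symbols: List[str]) -> str:
    """OpenFST-style symbol table: one 'name id' pair per line, 0 kept for epsilon."""
    lines = ["<eps> 0"] + [f"{s} {i}" for i, s in enumerate(symbols, start=1)]
    return "\n".join(lines) + "\n"


def aa_fst_text(arc_weight: float = 0.6, final_weight: float = 0.0) -> str:
    """Text form of the 'AA' transducer: 'src dst ilabel olabel weight' per arc,
    then 'state weight' for the final state. The first line's source is the start state."""
    rows = [
        f"0 1 A A {arc_weight}",
        f"1 2 A A {arc_weight}",
        f"2 {final_weight}",  # final state; 0 means no extra cost in the tropical semiring
    ]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Print the contents intended for test.isyms / test.osyms and for the text FST.
    print(symbol_table_text(["A"]), end="")
    print(aa_fst_text(), end="")
```

With OpenFST's command-line tools, the text file could then be compiled with something like `fstcompile --isymbols=test.isyms --osymbols=test.osyms test.txt test.fst`, where test.txt and test.fst are hypothetical file names.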
When I run ocropus with my FST, I get:

    overco...@overcomer-laptop:~/Scrivania/prova$ ocropus page AA.png
    [beam search failed]

If I change the weight of the last state (2), setting it to 1 for
example, I get an empty string:

    overco...@overcomer-laptop:~/Scrivania/prova$ ocropus page AA.png

    overco...@overcomer-laptop:~/Scrivania/prova$

Of course, running ocropus with the default.fst file works perfectly.

What's wrong with my FST?

I really hope someone can help me. I am running a lot of tests, proceeding
by trial and error, and I am going crazy.
I hope I was clear, and I am sorry for the long message.

Thanks.


2009/5/15 Thomas Breuel <[email protected]>

> On Fri, May 15, 2009 at 10:18, Pierpaolo Monaco <
> [email protected]> wrote:
>
>> Using Tesseract, I can limit the output with a shell command.
>>
>> I just need to create a file in tesseract-ocr/tessdata/configs/ which,
>> for example, I call myletters.
>> In that file I define the whitelist by writing:
>>
>> tessedit_char_whitelist QWERTYUIOPASDFGHJKLZXCVBNM
>>
>> After that i can process an image writing:
>>
>> $ tesseract prova.tif out nobatch myletters
>>
>> I will get just upper-case letters as the result (letters from my
>> whitelist).
>>
>> Can I do something like that in ocropus, or do I need to do that with a
>> language model?
>>
>
> You need a language model for that, but a pretty simple one.  The language
> model you need is the equivalent of "[A-Z]*".  You can create something as
> simple as that by hand even; you just need one or two states, plus a
> transition for each permitted letter.  See the OpenFST documentation (you do
> not need to use OpenFST, but OCRopus uses the same representation).
>
> If you want good recognition performance, you should also retrain the
> classifier on just your target character set.
>
> I've written an overview paper describing how all the bits and pieces of
> OCRopus fit together; I'll try and put that up publicly in a couple of
> weeks.
>
> After that, I'll revise the tutorial to conform to OCRopus 0.4.
>
> Tom
>
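Tom's suggestion of an "[A-Z]*" language model can be sketched in a few lines of Python. This is a minimal sketch assuming the OpenFST text format: a single state 0 serves as both the start state and the final state (final cost 0), with one self-loop arc per permitted letter, mirroring the Tesseract whitelist above. The zero arc weights and the function names are assumptions for illustration; with all arcs at cost 0, the language model itself prefers no letter over another.

```python
import string


def whitelist_symbols(letters: str) -> str:
    """Symbol table text for the permitted letters; label 0 is kept for epsilon."""
    lines = ["<eps> 0"] + [f"{ch} {i}" for i, ch in enumerate(letters, start=1)]
    return "\n".join(lines) + "\n"


def whitelist_fst_text(letters: str, arc_weight: float = 0.0) -> str:
    """One-state FST for [letters]*: every arc loops from state 0 back to state 0."""
    rows = [f"0 0 {ch} {ch} {arc_weight}" for ch in letters]
    rows.append("0 0.0")  # state 0 is also final, with cost 0
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    letters = string.ascii_uppercase  # same set as tessedit_char_whitelist above
    print(whitelist_symbols(letters), end="")
    print(whitelist_fst_text(letters), end="")
```

Because the only state is final, the empty string is accepted as well, which matches the `*` in "[A-Z]*".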


-- 
-----------------------------
Pierpaolo Monaco
----------------------------

You received this message because you are subscribed to the Google Groups
"ocropus" group.

<<attachment: test.png>>
