OCRopus doesn't really work that way; it recognizes each text line
as a whole, with segmentation and classification carried out together.
You can ask it to give you words, characters, and bigrams, but it will
give you that information only after it has already performed
recognition.

The problem is that segmentation just cannot be carried out
unambiguously in a top-down way.  For example, what are the words in
"He said: 'h e l l o  w o r l d'"?  And character shapes like "rn"
and "m" are inherently ambiguous: in an image, the two can look
nearly identical.
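
To make the ambiguity concrete, here is a toy sketch in Python.  It is
not OCRopus code and does not use any OCRopus API; the function name
best_reading, the segment layout, and all scores are invented for
illustration.  It assumes a line has been over-segmented into four
pieces ("c", "o", "r", "n"), that a classifier has scored each
candidate span as a character, and that a bigram model scores adjacent
letters.  The cut points are then chosen together with the labels,
rather than before them.

```python
# Hypothetical log-probability scores for reading segments [start, stop)
# as a single character. Segments 2 and 3 can be read separately as
# "r" and "n", or merged and read as "m". All numbers are invented.
SPAN_SCORES = {
    (0, 1): {"c": -0.1},
    (1, 2): {"o": -0.1},
    (2, 3): {"r": -0.7},
    (3, 4): {"n": -0.6},
    (2, 4): {"m": -0.9},
}

# Hypothetical bigram log-probabilities between adjacent letters.
BIGRAM = {("c", "o"): -0.5, ("o", "r"): -1.0, ("r", "n"): -1.2, ("o", "m"): -4.0}


def best_reading(n_segments, span_scores, bigram, default_bigram):
    """Pick labels and cut points jointly by dynamic programming.

    Returns (text, cuts, score) for the highest-scoring path.
    """
    # best[pos] maps the last character read -> (score, text, cuts)
    best = [dict() for _ in range(n_segments + 1)]
    best[0][""] = (0.0, "", [0])
    for end in range(1, n_segments + 1):
        for (start, stop), hypotheses in span_scores.items():
            if stop != end:
                continue
            for prev_char, (score, text, cuts) in best[start].items():
                for char, char_score in hypotheses.items():
                    # No bigram term for the first character on the line.
                    if prev_char == "":
                        lm = 0.0
                    else:
                        lm = bigram.get((prev_char, char), default_bigram)
                    total = score + char_score + lm
                    if char not in best[end] or total > best[end][char][0]:
                        best[end][char] = (total, text + char, cuts + [end])
    score, text, cuts = max(best[n_segments].values(), key=lambda item: item[0])
    return text, cuts, score


# With the bigram model, and with character scores alone.
with_lm = best_reading(4, SPAN_SCORES, BIGRAM, default_bigram=-5.0)
without_lm = best_reading(4, SPAN_SCORES, {}, default_bigram=0.0)
print(with_lm[0], with_lm[1])
print(without_lm[0], without_lm[1])
```

With these made-up numbers, the character scores alone prefer merging
the last two pieces into "m", while adding the bigram scores flips the
result to "r" followed by "n".  The segmentation changes even though
the image evidence is the same, which is why the cut points cannot be
fixed before recognition.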

If it were possible to reliably partition lines into words and words
into characters prior to character recognition, then OCR would be
much, much simpler.  (In fact, in some languages it is.)

Tom

On Aug 11, 2:09 am, eel <[email protected]> wrote:
> Hello Everybody,
>
> For academic purposes, I need to use OCRopus to simulate text
> recognition, but the simulation should keep some specific steps of
> the recognition separate.
> To achieve that, I need some specific features.
>
> Here is the idea:
>
> Input: an image that represents a line of text.
>
> - I need a feature that segments the line into words.
>
> Go through each word image and segment it into characters.
>
> Take each image that represents a character and try to recognize it.
>
> I need a feature that gives me a bigram score between two letters.
>
> So, to summarize, I need:
>
> A feature that segments the line into words.
> A feature that segments a word into characters.
> A feature that recognizes a character and returns the ASCII result
> and a score.
> A feature that gives me a bigram score between two letters.
>
> Can anyone tell me how to do that with OCRopus?
>
> Any help would be much appreciated.
>
> Thanks in advance.
