On Tue, Apr 14, 2009 at 05:04, overcomer <[email protected]> wrote:

> Although the image is pretty clear and the text is quite readable, if
> I process the image with tesseract the result is disappointing.
> Tesseract recognizes only about 15 percent of the characters.
> I think that is due to incorrect use of the dictionary.
> Now, I can improve this result because the strings are related to
> each other, and some of them are, for example, just the name of a person
> or of a city, so the set of possible outputs is limited.
> What I need to know is whether there is some function that analyzes a
> character and returns a value representing the probability that the
> character is one particular character or another.


Yes; OCRopus 0.3 has the ocr-bpnet classifier; OCRopus 0.4 replaces that
with a new classifier.  Both the old and the new classifiers output
posterior probabilities.

> In this way, when I rebuild the string, I can use just these
> probabilities and other implicit information in the document to improve
> my results.


Yes; not only can you do that, but OCRopus supports it directly through its
use of statistical language models.  That is, you can define a statistical
language model that says something like:

5.1% London
4.9% Paris
4.7% New York
4.3% Berlin
...

If you give OCRopus an input image containing just a city name, run it
through its recognizer, and then apply the statistical language model,
it will give you the most probable interpretation of the input image.
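
To make the idea concrete, here is a minimal Python sketch of how
per-character posterior probabilities can be combined with a prior over
city names.  It does not use OCRopus or its API: the function name, the
posterior values, and the restriction to candidates of the same length as
the recognized string are all assumptions made for illustration.  The
priors are the example figures from the list above.

```python
import math

def best_interpretation(char_posteriors, word_priors, floor=1e-6):
    """Return the (word, log_score) pair with the highest combined score.

    char_posteriors: one dict per character position, mapping a candidate
        character to its posterior probability (hypothetical values).
    word_priors: dict mapping each allowed word to its prior probability.
    floor: probability assumed for characters the classifier did not list.
    """
    best_word, best_score = None, -math.inf
    for word, prior in word_priors.items():
        # Simplifying assumption: only same-length candidates are scored.
        if len(word) != len(char_posteriors):
            continue
        score = math.log(prior)
        for ch, posterior in zip(word, char_posteriors):
            score += math.log(posterior.get(ch, floor))
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Priors taken from the example language model above.
word_priors = {"London": 0.051, "Paris": 0.049,
               "New York": 0.047, "Berlin": 0.043}

# Made-up classifier output for a six-character image.
char_posteriors = [
    {"B": 0.60, "L": 0.40},
    {"c": 0.55, "e": 0.40, "o": 0.05},
    {"r": 0.70, "n": 0.30},
    {"l": 0.60, "d": 0.40},
    {"i": 0.80, "o": 0.20},
    {"n": 0.90, "m": 0.10},
]

# Taking the single most probable character at each position.
greedy = "".join(max(p, key=p.get) for p in char_posteriors)

word, log_score = best_interpretation(char_posteriors, word_priors)
print(greedy, "->", word)
```

With these made-up numbers, picking the best character at each position
gives "Bcrlin", while weighting by the language model selects "Berlin".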

In 0.3, this process is still a little obscure; in 0.4, you will be able to
run it directly from the command line.

Tom
