No, no preprocessing is normally required. Whatever text you give it is 
simply used to estimate the probabilities. Note that line breaks matter, 
since the model also models the start of a line. Furthermore, contexts 
longer than 3-4 characters may make the model too sparse (there is no 
back-off right now). The trickiest part of getting the language model to 
work is finding the right weights for characters, the language model, and 
whitespace (specified with command-line parameters to ocropus-ngraphs 
during matching). These weights are a tradeoff between how well your 
documents match your corpus, document quality, and recognizer quality.
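
To make the points about line starts and sparsity concrete, here is a 
rough Python sketch of a character n-gram counter with no back-off. It is 
not the ocropus-ngraphs code; the START marker and the function names 
train_char_ngrams and prob are made up for illustration, and the toy 
corpus is just an example.

```python
from collections import Counter, defaultdict

# Hypothetical start-of-line marker; an assumption for this sketch,
# not necessarily what ocropus-ngraphs uses internally.
START = "\x02"

def train_char_ngrams(text, order):
    """Count next-character frequencies for each (order - 1)-character context."""
    counts = defaultdict(Counter)
    for line in text.splitlines():
        if not line:
            continue
        # Pad each line so its first characters are conditioned on "start of line".
        padded = START * (order - 1) + line
        for i in range(order - 1, len(padded)):
            context = padded[i - order + 1:i]
            counts[context][padded[i]] += 1
    return counts

def prob(counts, context, char):
    """Maximum-likelihood estimate without back-off: unseen contexts score 0."""
    seen = counts.get(context)
    if not seen:
        return 0.0
    return seen[char] / sum(seen.values())

if __name__ == "__main__":
    corpus = "the cat sat\nthe dog sat\n"
    for order in (2, 3, 5):
        counts = train_char_ngrams(corpus, order)
        # Contexts observed only once are a crude indicator of sparsity.
        singletons = sum(1 for c in counts.values() if sum(c.values()) == 1)
        print(order, len(counts), singletons)
```

Because every line is padded separately, joining or re-wrapping lines in 
the training text changes the counts for line-initial characters. And 
since prob returns 0 for any context it has not seen, longer contexts 
need much more text before they stop hurting.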

Tom

On Thursday, August 23, 2012 10:51:56 PM UTC+2, Luciano Édipo wrote:
>
> I am creating a language model using OCRopus-ngraph. Does the set of 
> text used to generate the model need any pre-processing or preparation? 
> Any pointers on this?
>
