Also, I tried testing some of the images with standalone Tesseract as the 
OCR recognizer and found that the results were on average better when 
the images do not demand any layout analysis (which is expected, I believe). I 
had come across discussion threads stating that Tesseract is not the default OCR 
for OCRopus and that the pluggable integration is still in the works. Any updates on 
this?


Yes, Tesseract is pretty fast and fairly good; a lot of time has gone into 
tuning it.  Mostly, we haven't been tracking Tesseract because its API has 
been in flux.  If you can get Tesseract to recognize lines, just run it over 
the line images.  Keep in mind, however, that Tesseract does not output 
probabilities and its language models work differently.  
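
As a rough illustration of running Tesseract over line images, here is a minimal sketch in Python. It assumes the `tesseract` command-line tool is on your PATH, that it is a version accepting `--psm 7` (treat the image as a single text line; older releases spelled this `-psm 7`), and that the line images are PNG files in one directory. The function names and the directory layout are made up for this example and are not part of OCRopus or Tesseract.

```python
import subprocess
from pathlib import Path


def tesseract_line_command(image_path, lang="eng"):
    """Build the tesseract command for one single-line image.

    Assumption: "--psm 7" (single text line) is supported by the
    installed tesseract; "stdout" makes it print the text instead of
    writing an output file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", "7"]


def recognize_lines(line_dir, lang="eng"):
    """Run tesseract on every PNG in line_dir, in sorted filename order.

    Returns a dict mapping file name to the recognized text. The
    directory layout (one PNG per text line) is hypothetical.
    """
    results = {}
    for image_path in sorted(Path(line_dir).glob("*.png")):
        proc = subprocess.run(
            tesseract_line_command(image_path, lang),
            capture_output=True,
            text=True,
            check=True,  # raise if tesseract exits with an error
        )
        results[image_path.name] = proc.stdout.strip()
    return results
```

Calling `recognize_lines("lines/")` on a directory of extracted line images would return plain text per line. As noted above, this output carries no per-character probabilities, so it cannot be fed directly into a separate language model the way recognizer output with scores can.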

The default character and language models you're using with OCRopus right 
now are not very good; we're training new ones that work better. 
Furthermore, you're probably seeing Tesseract output with adaptation and 
language modeling and OCRopus without adaptation and without language 
modeling.  Finally, Unicode and ligature support was buggy but is much 
better now (ligatures are used for recognizing hard-to-segment characters 
like "oo" and have a significant influence on recognition rates).

Tom

