>
> I'm currently trying to find a heuristic to decide whether OCROpus
> recognize has returned garbage (e.g. find the most common word length
> and, if it is 1 or 2, consider the output garbage).
>
> Do you plan to have a way to directly report when OCROpus recognize
> has failed (i.e. recognized garbage)?


Yes, the next release will contain several techniques for page rotation
detection and correction.  The code is already integrated.  One of the
techniques is indeed to just run the OCR and see what comes out (however,
doing that efficiently and reliably requires a bit of post-processing).
There are a couple of other techniques that work independently of OCR.
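For illustration, here is a minimal Python sketch of the word-length heuristic from the question, plus one hypothetical way a score like it could pick an orientation when you run the OCR on each candidate rotation. The function names, the 3-character threshold, and the assumption that you already have the recognized text for each angle are all made up for this example. This is not the post-processing OCRopus itself uses.

```python
import re
from collections import Counter


def most_common_word_length(text: str) -> int:
    """Return the most frequent word length in text, or 0 if there are no words."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0
    # most_common(1) returns [(length, count)] for the modal length
    return Counter(len(w) for w in words).most_common(1)[0][0]


def looks_like_garbage(text: str, max_suspicious_length: int = 2) -> bool:
    """The heuristic from the question: flag output whose modal word length is 1 or 2."""
    mode = most_common_word_length(text)
    return mode == 0 or mode <= max_suspicious_length


def readability_score(text: str) -> float:
    """Hypothetical score: fraction of words with at least 3 characters."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return sum(1 for w in words if len(w) >= 3) / len(words)


def pick_orientation(candidates: dict[int, str]) -> int:
    """Given {rotation angle: OCR text}, return the angle whose text scores best."""
    return max(candidates, key=lambda angle: readability_score(candidates[angle]))
```

Short words are common in ordinary English ("of", "to", "in"), so a modal word length of 2 can occur on clean pages. A ratio-based score is probably a little more robust than the mode alone, but either one should be tuned on real output.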


> Will OCROpus perform OCR on rotated pages? (I guess this is somewhat
> undesirable when using hOCR output, because there is no clear way to
> indicate that sentences are vertical rather than horizontal.)


hOCR can represent page segmentations in arbitrary orientations; hOCR is
agnostic about how you actually render that in HTML.  If you want to both
represent and render text vertically, you use hOCR to indicate where on the
page the vertical text occurred, and you use CSS to indicate that the text
should be rendered vertically.
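Here is a minimal sketch of that split, assuming the hOCR `textangle` property records the rotation and the CSS `writing-mode` property handles display. The function name is hypothetical. The choice of `vertical-rl` is also illustrative: it is mainly intended for CJK-style vertical text, and rotated Latin text might instead use a CSS `transform`.

```python
from html import escape


def vertical_line_hocr(text: str, bbox: tuple[int, int, int, int], angle: int = 90) -> str:
    """Build a hypothetical hOCR ocr_line span for a vertical line of text."""
    x0, y0, x1, y1 = bbox
    # hOCR keeps geometry in the title attribute; textangle records the rotation
    title = f"bbox {x0} {y0} {x1} {y1}; textangle {angle}"
    # Rendering is a separate concern: CSS decides how the line is displayed
    style = "writing-mode: vertical-rl"
    return f'<span class="ocr_line" title="{title}" style="{style}">{escape(text)}</span>'
```

The hOCR attributes say where the vertical text is on the page, and the `style` can be swapped for a stylesheet rule without touching the hOCR data.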

Tom

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
-~----------~----~----~----~------~----~------~--~---
