OCRopus has good automatic skew correction, but it isn't enabled by
default.  The components that implement it are DeskewPageByRAST and
DeskewGrayPageByRAST.

Pages are not deskewed by default by the OCR system because that makes
it more difficult to relate bounding boxes and geometric information
back to the original document image.  OCRopus itself doesn't care
(since it only uses pixel-accurate information internally), but the
bounding boxes in hOCR would no longer agree with the original page
images.

The best solution is probably to record the deskew information in the
hOCR output and to report bounding boxes in the deskewed coordinate
system; applications that want to relate them back to the original
image would then transform the boxes back into the original image's
coordinate system.
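
As a rough illustration of that last step, here is a minimal Python
sketch of mapping a box from deskewed coordinates back to the original
image.  It is not OCRopus code: the function name, the angle
convention, and the assumption that deskewing is a pure rotation about
a known center (no cropping or scaling) are all hypothetical, and a
real implementation would have to match whatever the deskew step
actually records.

```python
import math


def deskewed_bbox_to_original(bbox, angle_deg, center):
    """Map a box from deskewed coordinates back to the original image.

    Assumption (not from OCRopus): the deskewed page was produced by
    rotating the original by angle_deg around center, with no cropping
    or scaling.  bbox is (x0, y0, x1, y1).
    """
    x0, y0, x1, y1 = bbox
    cx, cy = center
    t = math.radians(-angle_deg)  # undo the deskew rotation
    cos_t, sin_t = math.cos(t), math.sin(t)

    # Rotate each corner of the box back around the same center.
    corners = []
    for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1)):
        dx, dy = x - cx, y - cy
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))

    # The rotated box is no longer axis-aligned, so also return the
    # smallest axis-aligned box that encloses it.
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return corners, (min(xs), min(ys), max(xs), max(ys))
```

The function returns both the four rotated corners and their enclosing
axis-aligned box, since a box that was upright on the deskewed page is
a tilted quadrilateral on the original one.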

Another reason not to deskew is that deskewing degrades image quality
slightly, while OCRopus layout analysis can cope with skew on its own.
The range of skews it accepts is limited by default, however, because
large skews aren't recognized well anyway (the recognizer hasn't been
trained much on them).

So, if your documents are only slightly skewed, the current setup is
better overall.  If they exceed the range of skews that is enabled,
you should probably deskew your pages in a preprocessing step.  For a
quick solution, write a small C++ or script wrapper around
DeskewPageByRAST.

Tom

On Sat, Jul 18, 2009 at 04:26, travis<[email protected]> wrote:
>
> in my limited testing slightly rotated images cause horrible problems.
> i know garbage in means garbage out. but is there a good way to
> mitigate this? do i need to do more testing? or is this simply
> inherent in the system? thanks in advance.
>
>   .travis
