OCRopus has good automatic skew correction, but it isn't used by default. The components that implement this are DeskewPageByRAST and DeskewGrayPageByRAST.
Pages are not deskewed by default because deskewing makes it harder to relate bounding boxes and other geometric information back to the original document image. OCRopus itself doesn't care (it only uses pixel-accurate information internally), but the bounding boxes in hOCR wouldn't agree with the original page images. The best solution will likely be to add deskew information to the hOCR output format and output bounding boxes in the deskewed coordinate system; applications that want to relate these back to the original image would then need to transform the bounding boxes back into the original image's coordinate system.

Another reason not to deskew is that deskewing degrades image quality slightly, whereas OCRopus layout analysis can actually cope with skew. The range of skews allowed is limited by default, though, because large skews aren't recognized well anyway (the recognizer hasn't been trained much on them).

So, if your documents are slightly skewed, the current setup is better overall. If your documents exceed the range of skews that is enabled, you should probably deskew your pages in a preprocessing step. If you need a quick solution, write a small C++ or script wrapper around DeskewPageByRAST and run it over your pages before OCR.

Tom

On Sat, Jul 18, 2009 at 04:26, travis<[email protected]> wrote:
> in my limited testing slightly rotated images cause horrible problems.
> i know garbage in means garbage out. but is there a good way to
> mitigate this? do i need to do more testing? or is this simply
> inherent in the system? thanks in advance.
>
> .travis
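
The step of transforming bounding boxes from the deskewed coordinate system back into the original image's coordinate system can be sketched roughly as follows. This is only an illustration, not OCRopus code: the function names are hypothetical, and it assumes the deskew was a pure rotation by a known angle about a known center, with the same canvas size before and after. The sign convention of the angle is also an assumption and would have to match whatever the deskew step actually reports.

```python
import math

def deskewed_to_original(x, y, angle_deg, center):
    """Map one point from deskewed coordinates back to original coordinates.

    Assumption: the deskew step mapped an original point p to
    center + R(angle) * (p - center), so the inverse applies R(-angle).
    """
    cx, cy = center
    t = math.radians(-angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))

def bbox_to_original(bbox, angle_deg, center):
    """Map a deskewed-space box (x0, y0, x1, y1) back to the original image.

    Returns the four rotated corners and the axis-aligned box enclosing them.
    """
    x0, y0, x1, y1 = bbox
    corners = [deskewed_to_original(x, y, angle_deg, center)
               for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return corners, (min(xs), min(ys), max(xs), max(ys))
```

Note that a box that is axis-aligned in the deskewed image is a rotated rectangle in the original image, so the enclosing axis-aligned box returned here is slightly larger than the text it covers; an application that needs a tight fit would keep the four corners instead.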
