Dear all,
I'm using OCROpus to perform OCR on scanned PDFs. I've noticed that
OCROpus reliably extracts text only if the pages are properly
oriented.
If a page is rotated, e.g. by 90 degrees, OCROpus sometimes fails with
an error message, but more often it simply returns garbage.
I've tried deskewing the image first, but I've found that deskewing
works only when the image is already roughly upright. Otherwise it
either fails with an error (saying it cannot find any text) or, more
often, produces a page rotated by 15 to 30 degrees.
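Since deskewing only copes with small angles, one possible pre-check is to decide first whether the text lines run horizontally or vertically. A common technique for this is to compare projection profiles: on a page with horizontal lines, the per-row ink density alternates strongly between text lines and the gaps between them, while the per-column density is usually smoother. The sketch below assumes a hypothetical binarized page given as a non-empty list of rows of 0/1 ints (1 = ink). It is not part of OCROpus, and it cannot tell 0 from 180 degrees (or 90 from 270), only horizontal from vertical.

```python
from statistics import pvariance
from typing import List


def text_runs_horizontally(image: List[List[int]]) -> bool:
    """Hypothetical helper: projection-profile check on a binary page.

    Returns True if the row-wise ink profile varies at least as much as the
    column-wise profile, which usually indicates horizontal text lines.
    """
    height, width = len(image), len(image[0])
    # Fraction of ink pixels in each row and in each column, so that both
    # profiles are on the same 0..1 scale regardless of page shape.
    row_density = [sum(row) / width for row in image]
    col_density = [sum(image[r][c] for r in range(height)) / height
                   for c in range(width)]
    # Strong alternation (high variance) marks the direction across the lines.
    return pvariance(row_density) >= pvariance(col_density)
```

If this returns False, the page could be rotated by 90 degrees before running deskew, so that deskew only has to correct a small residual angle.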
I'm currently trying to find a heuristic to decide whether OCROpus
recognize has returned garbage (e.g. finding the most common word
length and treating the output as garbage if it is 1 or 2).
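The word-length heuristic described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the recognized text is already available as a plain string (e.g. read from the OCR output) and that whitespace-separated tokens are a good enough approximation of "words"; the function names and the `max_bad_length` parameter are hypothetical.

```python
from collections import Counter


def most_common_word_length(text: str) -> int:
    """Return the most frequent token length in `text`, or 0 if it is empty."""
    lengths = [len(token) for token in text.split()]
    if not lengths:
        return 0
    # most_common(1) returns [(length, count)]; ties go to the first length seen.
    return Counter(lengths).most_common(1)[0][0]


def looks_like_garbage(text: str, max_bad_length: int = 2) -> bool:
    """Treat the output as garbage if the modal word length is 1 or 2.

    Empty output has a modal length of 0, so it also counts as garbage.
    """
    return most_common_word_length(text) <= max_bad_length
```

Garbage from a rotated page tends to come out as many one- or two-character fragments, which this check flags; a short line of real text dominated by words like "a", "of" or "to" could be misclassified, so it works best on whole pages.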
Do you plan to add a way for OCROpus recognize to report directly when
it has failed (i.e. when it has recognized garbage)?
Will OCROpus be able to perform OCR on rotated pages? (I suspect this
is somewhat undesirable when using hOCR output, because there is no
clear way to indicate that lines run vertically rather than
horizontally.)
Will deskew be able to autorotate pages to the correct orientation? If
not, do you have a suggestion on how I could do this myself (by
modifying deskew.lua)?
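Short of changing deskew.lua, a brute-force workaround is to run recognition once per candidate rotation (0, 90, 180 and 270 degrees) and keep the result whose text looks most like real words. The sketch below assumes a hypothetical `recognize(angle)` callable that rotates the page by `angle` and returns the recognized text (for example, rotating with Pillow's `Image.rotate(angle, expand=True)` and then running whatever OCROpus command you already use). The scoring rule, the fraction of tokens that are alphabetic and at least 3 characters long, is likewise an assumption, not something OCROpus provides.

```python
from typing import Callable, Iterable, Tuple


def word_score(text: str) -> float:
    """Fraction of tokens that are alphabetic and at least 3 characters long."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(1 for token in tokens if len(token) >= 3 and token.isalpha())
    return good / len(tokens)


def best_rotation(recognize: Callable[[int], str],
                  angles: Iterable[int] = (0, 90, 180, 270)) -> Tuple[int, str]:
    """Run OCR at each candidate angle and return (angle, text) with the best score.

    `recognize` is a hypothetical callable: rotate the page by `angle`,
    run OCR, return the text. On ties, the earliest angle wins.
    """
    results = [(angle, recognize(angle)) for angle in angles]
    return max(results, key=lambda pair: word_score(pair[1]))
```

This costs up to four OCR runs per page, so it may be worth trying 0 degrees first and only falling back to the other angles when the output looks like garbage.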
Best regards,
Samuele
You received this message because you are subscribed to the Google Groups
"ocropus" group.
For more options, visit this group at
http://groups.google.com/group/ocropus?hl=en