Hi!

Is there a way to use tesseract with ocropus 0.4 (mercurial)?

I am running ocropus.exe page mytif.tif and having some problems:

The text is German, but ü, ä, ö are recognized as (ii 0), (a 4), o, so I assume the language model is wrong. Any hints about that?

Some lines weren't recognized at all:

[beam search failed]
[beam search failed]
[beam search failed]
[beam search failed]

They were the title and a logo.

Underlines prevent g, p and y from being recognized correctly (d , d, 6v).

After the last line it segfaults:
Stack trace:
Frame     Function  Args
0022C028  7C802532  (00000718, 0000EA60, 000000A4, 0022C070)
0022C148  61097F54  (00000000, 7C8025F0, 7C802532, 000000A4)
0022C238  61095AEB  (00000000, 003B0023, 00230000, 0022CE68)
0022C298  61095FCB  (0022C2B0, 00000000, 00000094, 61020C1B)
0022C358  61096182  (00001718, 00000006, 0022C388, 61096383)
0022C368  610961AC  (00000006, 0022CE88, 0022C3D8, 00571B23)
0022C388  61096383  (0022C3B8, FFFFFFFF, 0022C48C, 00000001)
0022C3D8  00571B37  (0022C260, 005EEF24, 0022C528, 005D0E86)
0022C3E8  0056EA76  (03D76418, 00612C78, 00000000, 3FF00000)
0022C528  005D0E86  (00FE07B8, 0022C7E0, 0022C7E0, 42C80000)
0022C818  005CB137  (00FE07B8, 0022C870, 00FF2DC0, 0022C9F0)
0022C8A8  005CC876  (00FE07B8, 00FF2DC0, 0022C9F0, 0000002C)
0022CBD8  0040D5D7  (00000002, 6116B67C, 0022CC08, 7C80AE00)
0022CCB8  0041C86A  (00000003, 6116B678, 00401260, 00401273)
0022CCE8  004012AB  (00000003, 6116B678, 00FE0090, 00000000)
0022CD98  610060D8  (00000000, 0022CDD0, 61005450, 0022CDD0)
End of stack trace (more stack frames may be present)

I have another document with the same problems, but its last block isn't processed because it segfaults before reaching it (same stack trace).

Any ideas?

Fernando Benites
