I use OCRopus to recognise a large amount of early modern text set in Fraktur. I'm very impressed by its high accuracy. However, while it recognises individual characters quite well, it doesn't seem to separate words correctly in my use cases.
Example (pre-processed): <https://lh3.googleusercontent.com/-yEJqvSDKd9E/UqUP-VVcHYI/AAAAAAAAABk/AqoB5fQO-T4/s1600/Bildschirmfoto+2013-12-09+um+01.33.09.png>

Output:

MittlcindischenMeer gehabt-welcherso bald die Sonn untergangen kein siickgesehenxund aber durchdas essenrauer Leberen von Hùˆnerenist zu rechtgebrachtworden ) von diserLeberdes Fisches

What I'd expect:

Mittlcindischen Meer gehabt-welcher so bald die Sonn untergangen kein siick gesehen und aber durch das essen rauer Leberen von Hùˆnerenist zu recht gebracht worden ) von diser Leber des Fisches

Compared to the character size, the spaces look quite clear to me. *Is there anything I can do to improve OCRopus' behaviour concerning the spaces?*

Thanks in advance.
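One way to check the observation that the spaces are wide relative to the characters is to measure the blank column runs in a binarized line image. The following is a minimal, hypothetical diagnostic in plain Python. It is not how OCRopus itself decides where spaces go. It assumes the line has already been binarized into rows of 0 (background) and 1 (ink), and it uses the line height as a rough stand-in for character size, with an arbitrary threshold ratio of 0.5. The function names `blank_runs` and `word_gaps` are illustrative only.

```python
from typing import List, Tuple

Line = List[List[int]]  # rows of 0 (background) / 1 (ink)


def blank_runs(line: Line) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain no ink."""
    if not line or not line[0]:
        return []
    width = len(line[0])
    # Column projection: number of ink pixels in each column.
    ink = [sum(row[x] for row in line) for x in range(width)]
    runs: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(ink):
        if count == 0 and start is None:
            start = x  # a blank run begins
        elif count > 0 and start is not None:
            runs.append((start, x))  # a blank run ends
            start = None
    if start is not None:
        runs.append((start, width))  # blank run reaching the right edge
    return runs


def word_gaps(line: Line, ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Interior blank runs at least `ratio` * line height wide.

    Hypothetical heuristic: line height approximates character size,
    and `ratio` is an assumed threshold, not an OCRopus parameter.
    """
    if not line or not line[0]:
        return []
    height, width = len(line), len(line[0])
    gaps = []
    for start, end in blank_runs(line):
        if start == 0 or end == width:
            continue  # skip left and right margins
        if end - start >= ratio * height:
            gaps.append((start, end))
    return gaps


# Toy 4-row line: a 1-column gap inside the first "word"
# and a 3-column gap between the two "words".
line = [[0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0] for _ in range(4)]
print(blank_runs(line))  # [(0, 1), (3, 4), (5, 8), (10, 11)]
print(word_gaps(line))   # [(5, 8)]
```

If the gaps this reports on a real line image match the spaces you expect, that would support the impression that the spaces are visually clear. The narrow intra-word gap is ignored, while the wide inter-word gap is kept.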
