The problem is that many words in the original book were hyphenated at the end of a line, so the OCR output contains a hyphen followed by a line break, like this:
'The Encyclapedia Arnericana libro-
pika kashnami nin:'

Spellchecking that is difficult, because it really reads "libro-\npika", where \n is a newline. Removing hyphens followed by newlines is not possible with OpenOffice's standard search and replace, but installing the alternative search/replace plugin solves the problem:

http://extensions.services.openoffice.org/project/AltSearch

Search for "-\p" (without the quotes; \p means paragraph mark) and replace with "" (nothing).

If anyone has an easier solution (sed or something?), please tell me.
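For anyone who would rather script this than use the plugin, here is a minimal Python sketch of the same idea: delete every hyphen that is immediately followed by a line break. The function name `dehyphenate` is made up for illustration, and this is only one possible approach, not something tested against real tesseract output.

```python
import re

def dehyphenate(text):
    """Join words that were split by a hyphen at the end of a line.

    Assumption: every hyphen directly before a line break is a line-end
    hyphenation. Genuine hyphenated compounds that happen to break at a
    line end will be joined as well.
    """
    # Match a hyphen, optional trailing spaces/tabs, then a Unix or Windows newline.
    return re.sub(r"-[ \t]*\r?\n", "", text)

if __name__ == "__main__":
    sample = "The Encyclapedia Arnericana libro-\npika kashnami nin:"
    print(dehyphenate(sample))
    # prints: The Encyclapedia Arnericana libropika kashnami nin:
```

Since the pattern cannot tell a line-end hyphenation from a real compound word, it is worth running the spellchecker on the result afterwards.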

