The problem is that the original book hyphenated a lot of words at
line ends, so the OCR output has a hyphen followed by a line break,
like this:

'The Encyclapedia Arnericana libro-
pika kashnami nin:

Spellchecking that is difficult because the text really reads
"libro-\npika", where \n is a newline.

Removing hyphens followed by newlines is not possible with the
built-in search and replace in OpenOffice. But installing the
alternative search/replace extension (AltSearch) solves the problem:

http://extensions.services.openoffice.org/project/AltSearch

Search for "-\p" (without the quotes; \p means paragraph mark).
Replace with "" (nothing).

If anyone has an easier solution (sed or something?), please tell me.
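
For what it's worth, a small script could probably do the same join
outside OpenOffice. This is only a minimal Python sketch, assuming the
OCR output is plain text with "\n" or "\r\n" line endings. The name
dehyphenate is made up for illustration and is not part of tesseract
or OpenOffice.

```python
import re


def dehyphenate(text):
    """Join words that were split by a hyphen at the end of a line.

    Hypothetical helper for illustration only. A hyphen is removed only
    when it sits directly between a word character and a line break that
    is followed by another word character, so "libro-" + newline + "pika"
    becomes "libropika".
    """
    return re.sub(r"(?<=\w)-\r?\n(?=\w)", "", text)


sample = "The Encyclapedia Arnericana libro-\npika kashnami nin:"
print(dehyphenate(sample))
```

Like the AltSearch replacement, this also joins words that really are
hyphenated and just happened to break at the hyphen ("well-" + newline
+ "known" becomes "wellknown"), so a spellcheck pass afterwards is
still useful.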