This only occurs when converting the hOCR into plain text. When I just use ocroscript rec-tess, everything appears correctly and the text remains on the correct lines.

    ocroscript rec-tess image.png > output.htm

That he did not transmute that thought into action is known to-day the world over. That, instead, he went ruggedly forward overcoming all obstacles, to hew out a new career, greater by

But for some reason, when I then convert the hOCR to text with the following command, a line break is inserted before the end of each line, so the last one or two words of each line end up on a new line of their own.

    ocroscript hocr-to-text output.htm > output.txt

That he did not transmute that thought into action is known to-day the world over. That, instead, he went ruggedly forward overcoming all obstacles, to hew out a new career, greater by

Why is this occurring, and is there any way to prevent it?
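One possible workaround, if hocr-to-text keeps splitting lines, would be to pull the text out of the hOCR file directly. The following is only a minimal sketch in Python using the standard library; it assumes that each text line in output.htm is wrapped in an element whose class includes "ocr_line" (the class the hOCR format defines for lines), which I have not verified for this particular output. The names HocrLineExtractor and hocr_to_lines are made up for this example and are not part of ocropus.

```python
from html.parser import HTMLParser


class HocrLineExtractor(HTMLParser):
    """Collect the text of each element whose class list contains 'ocr_line'.

    Assumption: the hOCR marks every text line with class="ocr_line".
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []      # finished lines, in document order
        self._parts = None   # text pieces of the line being read, or None
        self._tag = None     # tag name of the current ocr_line element
        self._depth = 0      # nesting depth of same-named tags inside it

    def handle_starttag(self, tag, attrs):
        if self._parts is not None:
            # Already inside a line: only track nested tags of the same name
            # so that void elements such as <br> cannot unbalance the count.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            self._parts, self._tag, self._depth = [], tag, 1

    def handle_endtag(self, tag):
        if self._parts is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse any whitespace (including stray newlines) inside
            # the line into single spaces, then store the line.
            self.lines.append(" ".join("".join(self._parts).split()))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def hocr_to_lines(hocr):
    """Return one string per ocr_line element found in the hOCR markup."""
    parser = HocrLineExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.lines
```

To try it, read output.htm into a string, pass it to hocr_to_lines, and write the returned lines to output.txt joined with newlines. If the file does not use the ocr_line class, the function returns an empty list and the class name in the sketch would need to be adjusted.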
