I'm currently using OCRopus v0.2 with Tesseract v2.03. I use the
following commands to convert images into plain text.

ocroscript rec-tess image.png > output.htm
ocroscript hocr-to-text output.htm > output.txt

Unfortunately, these commands seem to always wrap the last word or two
of each line onto the next line. The following output is an example
of this behavior.

That he did not transmute that thought
into
action is known to-day the world over.
That,
instead, he went ruggedly forward overcoming
all
obstacles, to hew out a new career, greater
by

As you can see, the words "into", "That,", "all", and "by" have each
been wrapped onto the next line. Is there any way that I can prevent
this word wrapping from occurring?
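
A possible stopgap, not a fix for the underlying behavior, would be to
post-process output.txt and re-attach the stray words. The following is
only a minimal sketch under an assumption taken from the example above:
any line of at most two words that follows a non-empty line is treated
as a wrapped fragment. The function name rejoin_wrapped_words and the
max_words parameter are hypothetical and are not part of OCRopus or
Tesseract. A paragraph that genuinely ends in a very short line would
be merged as well.

```python
def rejoin_wrapped_words(lines, max_words=2):
    """Append short fragment lines to the line that precedes them.

    Assumption (not confirmed by OCRopus): a line with at most
    `max_words` words directly after a non-empty line is a wrapped
    fragment of that line.
    """
    joined = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Keep blank lines so paragraph breaks survive.
            joined.append("")
            continue
        is_fragment = len(line.split()) <= max_words
        if is_fragment and joined and joined[-1]:
            # Re-attach the fragment to the previous text line.
            joined[-1] = joined[-1] + " " + line
        else:
            joined.append(line)
    return joined
```

Applied to the lines read from output.txt, this would turn the sample
above back into four lines, the first ending in "thought into".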