Problem with linefeed

Christoph Reimmann Fri, 07 Nov 2008 04:37:05 -0800

Hi all,

I have the following problem with tesseract's output:


I have documents with some address field such as

Lastname                                   more information
Firstname                                  01.01.89
Street No
D 12345 Town                               more information

I just need the address in 4 lines, without the birthdate.

Calling tesseract with shellexec and a uzn zone file works fine, but
I can't tell when tesseract has finished, so I run into timing
problems on older computers.
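
For illustration, the timing problem goes away if the launcher blocks
until the child process exits instead of returning immediately. Below
is a minimal sketch in Python, not the setup used above. The helper
name run_and_wait and the file names in the usage comment are
hypothetical, and it assumes tesseract is on the PATH.

```python
import subprocess
import sys


def run_and_wait(cmd, timeout=None):
    """Run cmd (a list of strings) and block until the process exits.

    Returns the exit code. Raises subprocess.TimeoutExpired if
    `timeout` (seconds) elapses first.
    """
    completed = subprocess.run(cmd, timeout=timeout)
    return completed.returncode


# Stand-in command so the sketch runs without tesseract installed.
exit_code = run_and_wait([sys.executable, "-c", "print('finished')"])

# Hypothetical real call (file names are placeholders):
# exit_code = run_and_wait(["tesseract", "page.tif", "page"])
```

Once the call returns, the output file can be read without polling or
fixed delays, whatever the speed of the machine.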

Now I would like to do it using the Windows DLL.

If I use recognize_all_words the result seems to be OK, but it contains
all the information, so the address I need is quite difficult to extract.

If I use recognize_a_block the resulting string contains only the
address information, but not in 4 lines. The result looks like this:

LastnameFirstnameStreetNo <nl>
D 12345 Town

There is only one linefeed, after Street No. I think the reason is that
there is no additional information in that line.

Does anyone have an idea what I can do or what I did wrong?

I know the EANYCODE_CHAR structure has the box coordinates of each
letter so I can look for new lines myself, but I think there must be
an easier way to receive the correct result.
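
For reference, the manual approach could look roughly like the sketch
below (Python, for illustration only). The CharBox type and its field
names are hypothetical stand-ins for the per-character box data, not
the actual EANYCODE_CHAR layout. It assumes the characters arrive in
reading order and that y grows downward; spaces are ignored.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    # Hypothetical stand-in for one recognized character and its box.
    char: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_lines(boxes):
    """Split characters into text lines by vertical overlap.

    A character that does not overlap the vertical band of the
    current line starts a new line.
    """
    lines = []
    band_top = band_bottom = None
    for box in boxes:
        overlaps = (
            band_top is not None
            and box.top < band_bottom
            and box.bottom > band_top
        )
        if overlaps:
            lines[-1].append(box.char)
            # Widen the band so short glyphs do not shrink it.
            band_top = min(band_top, box.top)
            band_bottom = max(band_bottom, box.bottom)
        else:
            lines.append([box.char])
            band_top, band_bottom = box.top, box.bottom
    return ["".join(chars) for chars in lines]


sample = [
    CharBox("A", 0, 0, 8, 10),
    CharBox("b", 9, 2, 15, 10),
    CharBox("C", 0, 20, 8, 30),
    CharBox("d", 9, 22, 15, 30),
]
result = group_into_lines(sample)
```

With the sample boxes, the first two characters share one band and the
last two another, so they end up in separate lines. If the real
coordinates are measured from the bottom of the image, the overlap
test stays the same as long as top and bottom are swapped consistently.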

Thx in advance and cheers,

Chris from Aachen, Germany
